GREEKC hackathon training event
This folder contains the results obtained during the GREEKC hackathon (May 23-26, 2019), and the subsequent development until July 9, 2019. After this date, the workflow will be further developed in a separate repository.
https://github.com/YvonFrid/cisreg-GWAS/
The aim of the project is to apply bioinformatic methods to detect non-coding disease-associated variants that may affect transcriptional regulation by modifying transcription factor binding sites. The approach integrates information collected automatically from various genomic databases (BioMart, dbSNP, Ensembl, HaploReg), and selects the variations that may affect regulation by combining specialized bioinformatic resources: Regulatory Sequence Analysis Tools (RSAT) and ChIP-seq data (ReMap).
For this, we develop an analysis workflow in the R statistical language, with Bioconductor and CRAN packages, to invoke remote resources (Web services). The tool is designed generically, and can be adapted to study the regulatory variants of any disease documented in the GWAS catalog.
To facilitate its use by biologists, the tool automatically generates an analysis report (in R markdown) illustrated with figures and tables.
The table below provides the URL of each resource used by the workflow, and indicates its API if available.
Resource name | Data types | URL | Access mode in the workflow |
---|---|---|---|
GWAS catalog | SNPs associated with a query disease | https://www.ebi.ac.uk/gwas/ | ftp download |
HaploReg | Collect the SNPs in linkage disequilibrium (LD) | https://pubs.broadinstitute.org/mammals/haploreg/ | R package |
BioMart | Collect SNP missing data | http://www.biomart.org | R package |
ReMap | Collect transcriptional regulators ChIP-seq experiments | http://remap.cisreg.eu/ | Web interface, to be converted to REST |
Jaspar | Collect all matrices corresponding to transcription factor names | http://jaspar2018.genereg.net | ftp download, to be converted to REST |
RSAT | Prediction of polymorphic variations affecting transcription factor binding | http://rsat.sb-roscoff.fr/ | Web interface, to be converted to REST |
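As an illustration of the "R package" access mode listed in the table, the sketch below queries the Ensembl variation BioMart for the position and alleles of a few SNPs. It assumes that the Bioconductor package `biomaRt` is used to reach BioMart (the table does not name the package). The helpers `clean_rsids()` and `get_snp_info()` are hypothetical, and the selected attributes are only one plausible choice.

```r
# Sketch: retrieve SNP coordinates and alleles from the Ensembl variation BioMart.
# Assumption: the Bioconductor package biomaRt is installed. clean_rsids() and
# get_snp_info() are hypothetical helpers, not part of the workflow.

# Keep only well-formed dbSNP identifiers ("rs" followed by digits), without duplicates.
clean_rsids <- function(ids) {
  ids <- trimws(ids)
  unique(ids[grepl("^rs[0-9]+$", ids)])
}

get_snp_info <- function(rsids) {
  rsids <- clean_rsids(rsids)
  if (length(rsids) == 0) stop("No valid rsID provided")
  if (!requireNamespace("biomaRt", quietly = TRUE)) stop("Please install biomaRt")
  # Connect to the human SNP dataset (requires network access)
  mart <- biomaRt::useEnsembl(biomart = "snps", dataset = "hsapiens_snp")
  # One row per SNP mapping, with its chromosome, coordinates and alleles
  biomaRt::getBM(
    attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end", "allele"),
    filters = "snp_filter",
    values = rsids,
    mart = mart
  )
}
```

A call such as `get_snp_info(c("rs699"))` (an arbitrary example identifier) would return a data frame, provided the Ensembl server is reachable.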
The workflow is written in R code embedded in an R markdown document, which automatically generates a report in HTML, PDF or Word (.docx) format.
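The report generation can be sketched as follows, assuming the `rmarkdown` package and pandoc are installed (PDF output additionally requires a LaTeX distribution). The helpers `report_format()` and `render_report()`, as well as the file name used in the usage note, are hypothetical and only illustrate how one R markdown source can produce the three formats.

```r
# Sketch: render the R markdown workflow into one of the report formats.
# Assumptions: rmarkdown and pandoc are installed. report_format() and
# render_report() are hypothetical helpers, not part of the workflow.

# Translate a short label into the name of an rmarkdown output format.
report_format <- function(fmt = c("html", "pdf", "word")) {
  fmt <- match.arg(fmt)
  switch(fmt,
         html = "html_document",
         pdf  = "pdf_document",
         word = "word_document")
}

# Render the given .Rmd file in the requested format, next to the source by default.
render_report <- function(rmd_file, fmt = "html", output_dir = dirname(rmd_file)) {
  if (!file.exists(rmd_file)) stop("File not found: ", rmd_file)
  if (!requireNamespace("rmarkdown", quietly = TRUE)) stop("Please install rmarkdown")
  rmarkdown::render(input = rmd_file,
                    output_format = report_format(fmt),
                    output_dir = output_dir)
}
```

For example, `render_report("workflow.Rmd", fmt = "word")` would produce a .docx report, where `workflow.Rmd` is a placeholder name for the actual workflow document.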
Main R packages
Replace the downloads and manual analyses with programmatic access
By the end of the hackathon, we aim to provide a fully automated workflow that relies as much as possible on APIs, without having to download the full datasets and parse them locally.
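To make the idea of API-based access concrete, here is a minimal sketch that retrieves the record of a single SNP over REST rather than from a downloaded dataset. It uses the public Ensembl REST service as a stand-in, which is an assumption: the workflow may target other endpoints. The helpers `ensembl_variation_url()` and `fetch_variation()` are hypothetical, and the `httr` and `jsonlite` packages are assumed to be installed.

```r
# Sketch: fetch one SNP record through a REST API instead of downloading a dump.
# Assumptions: the Ensembl REST service stands in for the APIs targeted by the
# workflow. ensembl_variation_url() and fetch_variation() are hypothetical helpers.

# Build the URL of the Ensembl "variation" endpoint for one rsID.
ensembl_variation_url <- function(rsid, species = "human",
                                  server = "https://rest.ensembl.org") {
  stopifnot(length(rsid) == 1, grepl("^rs[0-9]+$", rsid))
  paste0(server, "/variation/", species, "/", rsid)
}

# Send the request and parse the JSON answer into an R list (requires network access).
fetch_variation <- function(rsid) {
  if (!requireNamespace("httr", quietly = TRUE) ||
      !requireNamespace("jsonlite", quietly = TRUE)) {
    stop("Please install httr and jsonlite")
  }
  response <- httr::GET(ensembl_variation_url(rsid),
                        httr::content_type("application/json"))
  httr::stop_for_status(response)  # fail loudly on HTTP errors
  jsonlite::fromJSON(httr::content(response, as = "text", encoding = "UTF-8"))
}
```

Only the queried record travels over the network, so nothing needs to be stored or parsed locally beyond the JSON answer.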
After day 1, …
After day 2, …