hackathon-marseille

GREEKC hackathon training event

View the Project on GitHub GREEKC/hackathon-marseille

A workflow to analyse disease-associated regulatory variants

Warning

This folder contains the results obtained during the GREEKC hackathon (May 23-26, 2019), and the subsequent development until July 9, 2019. After this date, the workflow will be further developed in a separate repository.

https://github.com/YvonFrid/cisreg-GWAS/

Project proponent

Participants

Motivation

The aim of the project is to apply bioinformatic methods to detect non-coding disease-associated variants that may affect transcriptional regulation by modifying transcription factor binding sites. The approach integrates information collected automatically from various genomic databases (BioMart, dbSNP, Ensembl, HaploReg), and selects the variations that may affect regulation by combining specialized bioinformatic resources: Regulatory Sequence Analysis Tools (RSAT) and ChIP-seq data (ReMap).

For this, we develop an analysis workflow in the R statistical language, with Bioconductor and CRAN libraries, to invoke remote resources (Web services). The tool is designed to be generic, and can be adapted to study the regulatory variants of any disease documented in the GWAS catalog.

In order to facilitate its use by a biologist, the tool automatically generates (in R markdown) an analysis report illustrated by figures and tables.

Interoperability issues

Mobilized resources

The table below provides the URL of each resource mobilised by the workflow, and indicates its API if available.

| Resource name | Data types | URL | Access mode in the workflow |
|---|---|---|---|
| GWAS catalog | SNPs associated to a query disease | https://www.ebi.ac.uk/gwas/ | ftp download |
| HaploReg | Collect the SNPs in linkage disequilibrium (LD) | https://pubs.broadinstitute.org/mammals/haploreg/ | R package |
| BioMart | Collect SNP missing data | http://www.biomart.org | R package |
| ReMap | Collect transcriptional regulators ChIP-seq experiments | http://remap.cisreg.eu/ | Web interface, to be converted to REST |
| Jaspar | Collect all matrices corresponding to transcription factor names | http://jaspar2018.genereg.net | ftp download, to be converted to REST |
| RSAT | Prediction of polymorphic variations affecting transcription factor binding | http://rsat.sb-roscoff.fr/ | Web interface, to be converted to REST |

Languages, libraries and tools used in the workflow

The workflow is written in R code embedded in an R Markdown document, which automatically generates a report in HTML, PDF or Word (.docx) format.
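As a minimal sketch of how one R Markdown source could be rendered to any of these three formats, the snippet below wraps `rmarkdown::render()`. It assumes the `rmarkdown` package and pandoc are installed (PDF output also needs a LaTeX distribution). The helper `render_report()` and the file name `report.Rmd` are hypothetical and are not taken from the workflow itself.

```r
# Map a short format name to the corresponding rmarkdown output format.
output_formats <- c(
  html = "html_document",
  pdf  = "pdf_document",
  word = "word_document"
)

# Hypothetical helper (not part of the workflow): render one R Markdown
# file to the requested format and return the path of the generated report.
render_report <- function(rmd_file, fmt = "html") {
  if (!fmt %in% names(output_formats)) {
    stop("Unsupported format: ", fmt)
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The 'rmarkdown' package is required to render the report")
  }
  rmarkdown::render(input = rmd_file, output_format = output_formats[[fmt]])
}
```

For example, `render_report("report.Rmd", "word")` would produce a .docx file next to the source document, provided that the hypothetical `report.Rmd` exists.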

Main R packages

Needs

Requested skills for the hacking

Expected deliveries

Final goal

At the end of the hackathon, we aim to provide a fully automated workflow that relies as much as possible on APIs, without having to download the full datasets and parse them locally.
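To illustrate the difference between an API call and a bulk download, the sketch below queries a single variant through the public Ensembl REST service (`GET variation/:species/:id`) instead of parsing a full dump. This is only an illustration under assumptions: the workflow does not state that it uses this endpoint, the helpers `variant_url()` and `fetch_variant()` are hypothetical, and `fetch_variant()` needs the `jsonlite` package and network access.

```r
# Base URL of the public Ensembl REST service.
ensembl_rest <- "https://rest.ensembl.org"

# Build the request URL for one variant (pure function, no network access).
variant_url <- function(rsid, species = "human") {
  stopifnot(is.character(rsid), length(rsid) == 1, grepl("^rs[0-9]+$", rsid))
  paste0(ensembl_rest, "/variation/", species, "/", rsid,
         "?content-type=application/json")
}

# Hypothetical helper (not part of the workflow): fetch the JSON record
# of one variant and return it as an R list.
fetch_variant <- function(rsid) {
  if (!requireNamespace("jsonlite", quietly = TRUE)) {
    stop("The 'jsonlite' package is required to parse the response")
  }
  jsonlite::fromJSON(variant_url(rsid))
}
```

Calling `fetch_variant("rs699")` retrieves the record of that SNP only, so a list of disease-associated SNPs can be annotated one request at a time without storing the complete variation database locally.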

Intermediate goals and milestones

After day 1, …

After day 2, …

Deliverables