Characterization of regulatory variants in promoters with enhancer activity and their relation with human diseases

Abstract

Gene regulation is driven by the interaction of regulatory sequences, commonly categorized as either enhancers or promoters. Recently, using a modification of the STARR-seq assay, we identified sets of promoters with enhancer potential. In a first publication, our group characterized these promoters with enhancer activity (ePromoters) in HeLa and K562 cells, finding that these sequences share epigenetic characteristics with enhancers and contact other promoters in 3D interactions. ePromoters represent between 2% and 6% of the promoters in a given cell line, and they show cell-type specificity. Moreover, genes regulated by ePromoters are enriched in gene ontology terms related to inflammatory or stress response.

Given that the majority of genetic variants associated with human diseases and traits (93.7%) are located in non-coding DNA, in this follow-up analysis we set out to characterize regulatory variants in ePromoters. Using genetic variants associated with traits and diseases (GWAS Catalog), we found a significant enrichment of GWAS variants associated with hematological measurements in HeLa ePromoters. In K562 ePromoters we additionally see enrichment of the ‘Other measurements’ category, which tends to be related to conditions such as asthma or osteoarthritis that involve the inflammatory response.
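As a rough illustration of how an enrichment of this kind could be assessed, the sketch below computes a one-sided hypergeometric p-value for the overlap between a trait category and the variants falling in ePromoters. The abstract does not state which test was used, so this is only one plausible choice; the function name and all counts are hypothetical and are not results from this work.

```python
from math import comb


def hypergeom_enrichment_p(overlap, category_total, set_size, background_total):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background_total: all variants considered (hypothetical background)
    category_total:   background variants annotated with the trait category
    set_size:         background variants that fall in ePromoters
    overlap:          ePromoter variants that also belong to the category
    """
    max_overlap = min(category_total, set_size)
    denominator = comb(background_total, set_size)
    numerator = sum(
        comb(category_total, k)
        * comb(background_total - category_total, set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numerator / denominator


# Made-up counts, for illustration only.
p_value = hypergeom_enrichment_p(
    overlap=12, category_total=100, set_size=50, background_total=1000
)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice the p-values for all trait categories would also need a multiple-testing correction, and the choice of background set strongly influences the result.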

We hypothesize that genetic variants within ePromoters are likely to affect transcription factor (TF) binding. Therefore, we aimed to identify which TFs interact with these regulatory regions. Using pattern-matching approaches, we identified an enrichment for TFs belonging to the bZIP family, which have been implicated in pathways such as Toll-like receptor signaling and B and T cell receptor signaling, as well as in inflammation-related diseases such as acute myeloid leukemia and diabetes.

To obtain a comprehensive collection of variants, we included variants from the GTEx project and ClinVar together with variants from the GWAS Catalog. Annotated genetic variants are commonly tag SNPs and not necessarily the causal variants; for this reason we also identified SNPs in linkage disequilibrium (LD) with this collection. In total, 1,515 and 2,014 variants of our extended collection fall within the ePromoter neighbourhood in HeLa and K562, respectively. Of these, we identified 109 and 190 as likely to affect the binding of 42 and 46 transcription factors, respectively, that were reported as enriched in ePromoter sequences. In particular, we found that the asthma-associated variant rs3771180 affects the binding of FOS, FOSL1, JUN(var.2) and NFE2, and that rs990171, which is related to lymphocyte counts and celiac disease, disrupts the binding of MEF2C. These transcription factors have been implicated in TLR4 signalling, T-cell leukemia virus 1 infection, and megakaryocyte development and platelet production, supporting our hypothesis that ePromoters are associated with the inflammatory response and with hematological measurements.
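One common way to flag a variant as likely to affect TF binding is to score the reference and alternative alleles against a position weight matrix and compare the best-scoring windows. The sketch below shows that idea on a toy example. It is not the pipeline used in this work: the motif counts are invented (loosely resembling a TGACTCA-like site, not taken from any motif database), only the forward strand is scanned, and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background)
             for b in BASES}
        )
    return matrix


def best_window_score(seq, pwm, variant_index):
    """Best forward-strand score over all windows that overlap the variant.

    Assumes seq is at least as long as the motif.
    """
    width = len(pwm)
    first = max(0, variant_index - width + 1)
    last = min(variant_index, len(seq) - width)
    best = -math.inf
    for start in range(first, last + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


def binding_delta(ref_seq, alt_base, variant_index, pwm):
    """Alternative-minus-reference score; negative values suggest weaker binding."""
    alt_seq = ref_seq[:variant_index] + alt_base + ref_seq[variant_index + 1:]
    return (best_window_score(alt_seq, pwm, variant_index)
            - best_window_score(ref_seq, pwm, variant_index))


# Invented counts: 20 observations of the consensus base at each position.
toy_counts = [{b: (20 if b == c else 0) for b in BASES} for c in "TGACTCA"]
toy_pwm = log_odds_matrix(toy_counts)

# Hypothetical variant: A>C at index 4 of a sequence containing the toy site.
delta = binding_delta("GGTGACTCAGG", "C", 4, toy_pwm)
print(f"score change (alt - ref): {delta:.2f}")
```

A real analysis would use curated motifs, scan both strands, and set a threshold on the score change (or on a p-value for each allele) before calling a variant disruptive.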

Understanding ePromoters and the regulatory mechanisms that affect their dual function will help identify the causes of human diseases and traits.

Date
Jul 13, 2020 12:00 AM — Jul 16, 2020 12:00 AM
Event
ISMB2020
Location
Virtual event
Lucia Ramirez Navarro
Master's student

Student at the Wellcome Sanger Institute. Interested in genomics, bioinformatics and the immune system.