By Yuchun Guo, Ph.D., Associate Director, Computational Biology and Machine Learning, and Yuting Liu, Ph.D., Head of Data Science, CAMP4 Therapeutics
The Dark Side of the Genome, the roughly 98% that does not code for proteins but helps regulate the expression of the roughly 2% that does, presents a significant opportunity for discovering new therapeutic targets, including the regulatory RNAs (regRNAs) that CAMP4 pursues.
Enhancer RNAs (eRNAs) are one type of regRNA we target. eRNAs are transcribed from enhancers, DNA sequences located within the same chromosomal domain, or “neighborhood”, as the genes they regulate. Within these neighborhoods, one gene may be regulated by several enhancers, and one enhancer may regulate several genes. Understanding these interactions is key to modulating gene expression for therapeutic purposes. Computational tools can help unravel these complex interactions and accelerate the discovery of the regRNAs that regulate the gene whose expression we aim to modulate therapeutically.
To guide our target discovery process, CAMP4 has developed a proprietary AI-based model, EPIC™ (Enhancer-Promoter Interaction Characterization), that predicts functional enhancer-gene interactions with higher accuracy than other methods.
The foundation for EPIC’s high predictive power is our vast, proprietary set of next-generation sequencing (NGS) data that allows us to map enhancers and their chromosomal domains across the entire genome for any gene expressed in a given cell type. These maps are unmatched by any publicly available data set.
Our team draws on its deep expertise to run a range of assays in our chosen cell type, generating data on three genomic features for EPIC: which regions of the genome are accessible enough for enhancer-gene interactions to occur, how active the enhancers are, and which enhancers interact with which genes.
Standard models for predicting these interactions rely on the same basic features, but their predictive power is limited by the narrow scope of the data they use and by the suboptimal way those data are integrated. EPIC improves significantly on the standard model in two ways:
First, we collect broader and more detailed data on the three basic features listed above in the cell type of interest using our internally developed assays, which we have found to be more informative than those used by the standard model. In particular, the assay for locating enhancer-gene interactions requires specialized expertise that many groups lack, and publicly available data cover only a limited number of cell types. CAMP4 has the expertise to run these assays in any selected cell type to obtain precise cell-specific information for EPIC.
Second, EPIC incorporates “engineered features” that combine the basic features in novel ways to create new features. These new features give EPIC an extra level of sophistication and predictive power for identifying enhancer-gene interactions that other approaches cannot achieve, as we recently reported in a poster presented at the 15th Cold Spring Harbor conference on Systems Biology: Global Regulation of Gene Expression.
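To make the general idea of an engineered feature more concrete, here is a minimal sketch of one well-known way to combine the basic features. It is modeled on the published Activity-by-Contact (ABC) approach, not on EPIC, whose engineered features are proprietary and are not described here. The `Enhancer` fields, the `abc_style_scores` function, and the example values are hypothetical: activity is taken as the geometric mean of an accessibility signal and an enhancer-activity signal, multiplied by the enhancer's contact frequency with the gene's promoter, and then normalized across all candidate enhancers in that gene's neighborhood.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Enhancer:
    """Hypothetical per-enhancer measurements for one target gene."""
    name: str
    accessibility: float  # chromatin-accessibility signal (assumed non-negative)
    activity_mark: float  # enhancer-activity signal (assumed non-negative)
    contact: float        # contact frequency with the target gene's promoter


def activity(e: Enhancer) -> float:
    """Combine two basic features into one: geometric mean of accessibility and activity."""
    if e.accessibility < 0 or e.activity_mark < 0:
        raise ValueError(f"signals must be non-negative for {e.name}")
    return sqrt(e.accessibility * e.activity_mark)


def abc_style_scores(enhancers: list[Enhancer]) -> dict[str, float]:
    """Engineered feature (ABC-style sketch, not EPIC): activity x contact,
    normalized over all candidate enhancers of the same gene so scores sum to 1."""
    raw = {e.name: activity(e) * e.contact for e in enhancers}
    total = sum(raw.values())
    if total == 0:
        # No signal anywhere in the neighborhood: no enhancer gets credit.
        return {name: 0.0 for name in raw}
    return {name: value / total for name, value in raw.items()}


if __name__ == "__main__":
    # Toy neighborhood with three hypothetical enhancers for one gene.
    neighborhood = [
        Enhancer("e1", accessibility=4.0, activity_mark=9.0, contact=0.5),
        Enhancer("e2", accessibility=1.0, activity_mark=1.0, contact=0.2),
        Enhancer("e3", accessibility=16.0, activity_mark=4.0, contact=0.1),
    ]
    ranked = sorted(abc_style_scores(neighborhood).items(),
                    key=lambda kv: kv[1], reverse=True)
    for name, score in ranked:
        print(f"{name}\t{score:.2f}")
```

In this toy neighborhood, e1 receives 0.75 of the total score, e3 0.20 and e2 0.05. Although e3 has the highest activity, its weak contact with the promoter lowers its share, which is the kind of interplay between basic features that a combined feature is meant to capture.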
The power of EPIC is its ability to quickly home in on the enhancers that produce regRNAs of interest to us. In this way, EPIC gives us a high degree of confidence that we are fully illuminating the Dark Side of the Genome to find the optimal regRNA targets for our therapeutic programs – and we continue to optimize EPIC by engineering more new features.
Today’s convergence of biology and technology is driving remarkable discoveries that were unimaginable just five years ago. With EPIC, we have coupled the computational power of AI with our unparalleled insights into the genome to shed light on its Dark Side and discover new therapeutic targets for developing the equally groundbreaking medicines of today and tomorrow.