Saturday, June 18, 2022

Transcription Factors that Bind to Ultraconserved Sequences

We recently added more than 700 human chip-seq results to our database. If, say, your list of genes upregulated on XYZ knockout overlaps with genes that are bound by transcription factor ABC, you’ll probably know shortly after you plug your list into our “Fisher” tool. Conversely, if your own chip-seq list corresponds with a knockout or overexpression result, you’ll know.
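For readers who want to see what this kind of overlap test looks like, here is a minimal sketch. It assumes the comparison is a one-sided Fisher exact test (equivalently, a hypergeometric tail) against a fixed background of genes. The function name `overlap_p_value`, the gene names, and the background size of 20,000 are all made up for illustration; this is not the code behind our tool.

```python
from math import comb


def overlap_p_value(list_a, list_b, background_size):
    """Return (overlap size, one-sided P-value) for two gene lists.

    The P-value is the hypergeometric upper tail, i.e. the one-sided
    Fisher exact test: the chance of seeing at least this many shared
    genes if list_b were drawn at random from the background.
    """
    a, b = set(list_a), set(list_b)
    if len(a | b) > background_size:
        raise ValueError("background_size is smaller than the lists")
    k = len(a & b)
    n_a, n_b = len(a), len(b)
    # Sum the probabilities of every overlap at least as large as k.
    tail = sum(
        comb(n_a, i) * comb(background_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / comb(background_size, n_b)


# Toy data: gene names and background size are made up for illustration.
knockout_up = ["GENE1", "GENE2", "GENE3", "GENE4"]
tf_targets = ["GENE2", "GENE3", "GENE4", "GENE5", "GENE6"]
shared, p = overlap_p_value(knockout_up, tf_targets, background_size=20000)
print(f"{shared} shared genes, P = {p:.3g}")
```

The choice of background matters a great deal: a smaller background (say, only genes expressed in the tissue of interest) will give a less extreme P-value for the same overlap.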

We’ve got a lot to say about transcription factors. But for now, we’ll point to a single result of interest: there are a number of transcription factors that seem to have a very strong proclivity to bind on or near ultraconserved DNA regions. If you’re not informed on the subject of ultraconservation, google it. As far as I’m concerned, it’s the biggest mystery in biology. How is it that certain sequences of DNA, some exceeding 1000 bases, show perfect homology to DNA found in chickens and/or fish? Bear in mind that many of these sequences are not protein-coding sequences, and that even in the case of ultraconserved protein-coding sequences, no synonymous mutations are seen between humans and chickens (which diverged more than 300 million years ago). Adding to the weirdness, some deletions of ultraconserved regions in mice result in…perfectly happy mice.

Given our fascination with the subject, we’ve loaded a number of lists relating to ultraconservation into our database. Using the list “longer list of genes within or near ultraconserved sequences (ucnebase)” (database ID 112314101), we looked for overlaps with freshly-added chip-seq results. The results were quite powerful: outside of other ultraconservation-related lists, the single best correlation with this “longer list” was with DNA sequences that are bound by the transcription factor PCGF2. Here are the chip-seq-related P-values:

PCGF2 chip-seq targets in multiple cell lines: 10^-83

PHC1 chip-seq targets in HEK293: 10^-74

AEBP2 chip-seq targets in multiple cell lines: 10^-65

EPOP chip-seq targets in NT2-D1 line: 10^-62

JARID2 chip-seq targets in multiple cell lines: 10^-52

PCGF1 chip-seq targets in HEK293: 10^-27

EZH2 chip-seq targets in multiple cell lines: 10^-22

Needless to say, the vast majority of chip-seq results in our database do not overlap with ultraconserved DNA at any significance. Note that 4 of the 7 results above involve the Polycomb group of proteins. These proteins also tend to interact. For example, check out the PCGF2 interaction network.

These results could be very interesting and worth some follow-up. It is worth remembering, however, that ultraconserved sequences are known to have high AT content, and at least some of the above TFs (e.g. JARID2) bind high-AT sequences. Thus, the binding of these TFs to ultraconserved sequences does not necessarily help to solve the mystery of ultraconservation. One step for further inquiry would be to examine the specific DNA stretches that are pulled down with these TFs. Are they uniformly AT-enriched? Are these stretches themselves ultraconserved, or simply near ultraconserved genes (our gene-centric lists do not discriminate between the two)?
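The AT-enrichment question is easy to check once the pulled-down sequences are in hand. Here is a rough sketch, assuming the bound stretches and a set of control regions are available as plain DNA strings. The function names and the sequences are invented for illustration, and ambiguous bases such as N are simply ignored.

```python
def at_fraction(seq):
    """Fraction of A/T among the unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(b in "AT" for b in bases) / len(bases)


def mean_at_fraction(seqs):
    """Unweighted mean AT fraction across a collection of sequences."""
    fractions = [at_fraction(s) for s in seqs]
    return sum(fractions) / len(fractions)


# Made-up sequences standing in for bound peaks and for control regions.
bound_peaks = ["ATTTAATGCA", "TTAAATATCG"]
controls = ["GCGCATGCCG", "CCGGATCGGA"]
print(mean_at_fraction(bound_peaks), mean_at_fraction(controls))
```

If the bound stretches turn out to be no more AT-rich than matched controls, AT content alone is a less likely explanation for the overlap.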

A 2013 study pulled down protein-bound ultraconserved sequences and performed mass-spec on the bound proteins. The resulting list of binding proteins is not enriched for the above 7 chip-seq-derived proteins. An explanation for ultraconservation offered in the paper is that ultraconserved sequences appear to have an excess of overlapping TF binding sites (relative to non-ultraconserved sequences). That’s a partial explanation for ultraconservation at best. Presumably, there’s evolutionary pressure for these TFs to bind at these sites…what is the nature of that pressure?*

Non-chip-seq lists that strongly overlap with ultraconserved genes include genes with homeoboxes, human accelerated regions (HARs), the GO “Forebrain development” list, transcription factors in general, genes upregulated in a mouse DCX mutant brain, GWAS autism results, and, perhaps most interestingly, genes that are downregulated in the mouse embryonic brain when two ultraconserved regions are knocked out. Just plug the above database ID into our Fisher tool for a complete list of results.

More on transcription factors shortly. More on ultraconserved DNA later.

 

*My own crazy hypothesis: there is little or no fitness conferred by these sequences, at least on the part of the host organism. Instead, cells will kill themselves if two ultraconserved sequences fail to match perfectly. Here, we’re talking about selfish DNA. How are the two sequences compared for mutations? How is apoptosis triggered? That would be worth investigating.

whatismygene.com 

