We recently added more than 700 human chip-seq results to our database. If, say, your list of genes upregulated on XYZ knockout overlaps with genes that are bound by transcription factor ABC, you’ll probably know shortly after you plug your list into our “Fisher” tool. Conversely, if your own chip-seq list corresponds with a knockout or overexpression result, you’ll know.
We’ve got a lot to say about transcription factors. But for
now, we’ll point to a single result of interest: there are a number of
transcription factors that seem to have a very strong proclivity to bind on or
near ultraconserved DNA regions. If you’re not informed on the subject of
ultraconservation, google it. As far as I’m concerned, it’s the biggest mystery
in biology. How is it that certain sequences of DNA, some exceeding 1000 bases,
show perfect homology to DNA found in chickens and/or fish? Bear in mind that
many of these sequences are not protein-coding sequences, and that even in the
case of ultraconserved protein-coding sequences, no synonymous mutations are
seen between humans and chickens (which diverged more than 300 million years
ago). Adding to the weirdness, some deletions of ultraconserved regions in mice
result in…perfectly happy mice.
Given our fascination with the subject, we’ve loaded a
number of lists relating to ultraconservation into our database. Using the list
“longer list of genes within or near ultraconserved sequences (ucnebase)”
(database ID 112314101), we looked for overlaps with freshly-added chip-seq
results. The results were quite powerful: outside of other
ultraconservation-related lists, the single best correlation with this “longer
list” was with DNA sequences that are bound by the transcription factor PCGF2.
Here are the chip-seq-related P-values:
PCGF2 chip-seq targets in multiple cell lines: 10^-83
PHC1 chip-seq targets in HEK293: 10^-74
AEBP2 chip-seq targets in multiple cell lines: 10^-65
EPOP chip-seq targets in NT2-D1 line: 10^-62
JARID2 chip-seq targets in multiple cell lines: 10^-52
PCGF1 chip-seq targets in HEK293: 10^-27
EZH2 chip-seq targets in multiple cell lines: 10^-22
Needless to say, the vast majority of chip-seq results in
our database do not overlap with ultraconserved DNA at any significance. Note
that 4 of the 7 results above involve the Polycomb group of proteins. These
proteins also tend to interact. For example, check out the PCGF2 interaction network.
These results could be very interesting and worth some
follow-up. It is worth remembering, however, that ultraconserved sequences are
known to have high AT content, and at least some of the above TFs (e.g. JARID2)
bind high-AT sequences. Thus, the binding of these TFs to ultraconserved
sequences does not necessarily help to solve the mystery of ultraconservation.
One step for further inquiry would be to examine the specific DNA stretches
that are pulled down with these TFs. Are they uniformly AT-enriched? Are these
stretches themselves ultraconserved, or simply near ultraconserved genes (our
gene-centric lists do not discriminate between the two)?
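As a starting point for the AT-enrichment question, here’s a rough sketch of how one might score a pulled-down stretch for AT content. The function name and the example sequences are invented for illustration; a real analysis would use the actual peak sequences from the chip-seq data and compare them against a sensible genomic background.

```python
def at_fraction(sequence):
    """Fraction of A/T bases among the unambiguous bases in a DNA sequence."""
    seq = sequence.upper()
    # Ignore N and any other ambiguity codes.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("no A/C/G/T bases in sequence")
    return sum(base in "AT" for base in counted) / len(counted)

# Hypothetical peak sequences, not real data.
peaks = {
    "peak_1": "AATTATTAGCATTA",
    "peak_2": "GCGCGGCCATGCGC",
}
for name, seq in peaks.items():
    print(name, round(at_fraction(seq), 2))
```

If the bound stretches turn out to be no more AT-rich than their surroundings, the high-AT explanation becomes less convincing.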
A 2013 study pulled down protein-bound ultraconserved
sequences and performed mass-spec on these proteins. The resulting list of
binding proteins is not enriched for the above 7 chip-seq-derived proteins. An
explanation for ultraconservation offered in the paper is that ultraconserved
sequences appear to have an excess of overlapping TF binding sites (relative to
non-ultraconserved sequences). That’s a partial explanation for
ultraconservation at best. Presumably, there’s evolutionary pressure for these
TFs to bind at these sites…what is the nature of that pressure?*
Non-chip-seq lists that strongly overlap with ultraconserved
genes include genes with homeoboxes, human accelerated regions (HARs), the GO
“Forebrain development” list, transcription factors in general, genes
upregulated in a mouse DCX mutant brain, GWAS autism results, and, perhaps most
interestingly, genes that are downregulated in the mouse embryonic brain when
two ultraconserved regions are knocked out. Just plug the above database ID
into our Fisher tool for a complete list of results.
More on transcription factors shortly. More on
ultraconserved DNA later.
*My own crazy hypothesis: there is little or no fitness
conferred by these sequences, at least on the part of the host organism.
Instead, cells will kill themselves if two ultraconserved sequences fail to
match perfectly. Here, we’re talking about selfish DNA. How are the two
sequences compared for mutations? How is apoptosis triggered? That would be
worth investigating.