You probably recall the lac operon model of gene regulation in prokaryotes. It was easy…mutation of a regulator alters the output of the genes that are bound by the regulator. However, having tinkered with both transcription factor (TF) binding data and transcriptome data, you may also have found that the two forms of data fail to overlap, at least when higher eukaryotes are involved. If the prokaryotic model holds, after all, shouldn’t transcripts that are repressed by a TF be strongly upregulated when the TF is knocked out?
We searched over 3,000,000 combinations of TF binding data
and TF alteration (knockout/down, overexpression, mutation) data for cases
where TFs actually do strongly and directly regulate their bound genes. We
found that such cases are indeed quite rare, but they do exist. Take a look.
| GENE | study | type | log(P) | effect |
|---|---|---|---|---|
| xbp1 | GSE44949 | oe | -65.92 | A |
| irf1 | GSE26817 | oe | -63.22 | A |
| rest | GSE34625 | ko | -50.11 | I |
| rest | 28178232 | kd | -41.63 | I |
| rest | GSE43033 | kd | -37.18 | I |
| xbp1 | GSE78070 | mut vs wt | -34.74 | |
| rest | GSE98871 | kd | -27.35 | I |
| hmgxb3 | 35688146 | kd | -25.41 | A |
| srebp2 | GSE51736 | oe | -24.21 | A |
| znf335 | 35688146 | kd | -22.93 | A |
| rest | GSE51461 | kd | -19.1 | I |
| atf4 | GSE39194 | mut vs wt | -18.32 | A |
| cdkn1b | GSE98546 | mut vs wt | -17.68 | A |
| kdm5a | GSE147435 | ko | -17.2 | I |
| cic | GSE80359 | ko | -16.36 | I |
| hif1a | GSE27415 | kd | -15.53 | I |
| irf2 | GSE132178 | ko | -14.72 | A |
| esr1 | GSE18431 | kd | -14.18 | A |
| ar | GSE151429 | mut vs wt | -14.15 | |
| nfkb2 | GSE64234 | kd | -13.9 | A |
| yap | GSE60579 | oe | -13.28 | A |
| tead | GSE156912 | drug | -12.36 | A |
| rela | GSE65040 | oe | -11.86 | A |
| grhl1 | GSE47407 | oe | -10.78 | A |
| p53 | GSE98727 | ko | -10.75 | A |
| prdm10 | GSE135021 | ko | -10.71 | A |
| gata1 | GSE43356 | mut vs wt | -10.31 | A |
| twist1 | GSE43495 | oe | -9.15 | A |
| twist1 | GSE62739 | oe | -8.95 | A |
| tfeb | GSE108384 | oe | -8.65 | A |
| ring1 | 29874594 | ko | -8.54 | I |
| ezh2 | GSE84243 | ko | -8.39 | I |
| smad4 | GSE70940 | kd | -8.38 | I |
| atf4 | GSE72658 | ko | -8.2 | A |
| e2f1 | GSE16454 | ko | -7.63 | I |
| taf1 | 35688146 | kd | -7.51 | A |
| tp73 | GSE162860 | oe | -7.15 | A |
| ar | GSE66722 | kd | -7 | A |
| myod | GSE168703 | oe | -6.97 | A |
| nrf4a1 | GSE79490 | oe | -6.79 | A |
| otx2 | GSE21900 | ko | -6.5 | A |
| spdef | GSE48928 | kd | -6.33 | A |
| six2 | GSE79024 | cell type | -6.03 | A |
| stat1 | GSE49519 | ko | -5.85 | A |
| taf2 | 35688146 | kd | -5.62 | A |
| ronin | GSE120008 | ko | -5.43 | A |
| tal1 | GSE46970 | ko | -5.26 | A |
| e2f6 | 35688146 | kd | -5.15 | I |
| hoxc9 | GSE34420 | oe | -4.36 | I |
| znf750 | GSE32685 | kd | -3.92 | I |
The binding data against which the above studies were
compared was primarily retrieved from ChIP-Atlas. Whenever possible, we provide the
GSE identifier of the data we used, so you can tinker with the underlying data
yourself.* In other cases, the PMID is provided. Regarding the type of TF
alteration (e.g. knockout or “ko”), splicing alterations are listed as
mutations. The effect of the alteration may be A or I (activation or
inhibition). In two experiments where the TF isoform was altered (against the
dominant form), we refrain from the A/I designation. Otherwise, the designation is
not difficult to assign. If, for example, a TF is knocked down and the genes it
binds are upregulated, the TF is predominantly an
inhibitor, at least in the context of the experiment. The P-values are
not adjusted; an adjusted P=.05 cutoff falls at a log(P) of around -8.5 in the table above. Some results
are included because, despite unimpressive P-values,
the study was the single-best “aligner” with the TF binding data.
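The A/I reasoning above can be written down as a small rule. What follows is only a sketch: the function name, the "up"/"down" labels, and the grouping of alteration types are our assumptions for illustration, not the code behind the table. Alteration types with no obvious direction (e.g. "mut vs wt", "drug", "cell type") need a case-by-case call, so the sketch leaves them undesignated.

```python
# Hypothetical helper illustrating the A/I logic described above.
LOSS_OF_FUNCTION = {"ko", "kd"}   # knockout, knockdown
GAIN_OF_FUNCTION = {"oe"}         # overexpression

def infer_effect(alteration, bound_genes_direction):
    """Return "A" (activator), "I" (inhibitor), or "" (undesignated).

    alteration: the "type" column, e.g. "ko", "kd", "oe".
    bound_genes_direction: "up" or "down", the predominant change in the
    TF-bound genes after the alteration.
    """
    if bound_genes_direction not in {"up", "down"}:
        raise ValueError("bound_genes_direction must be 'up' or 'down'")
    went_up = bound_genes_direction == "up"
    if alteration in LOSS_OF_FUNCTION:
        # Removing the TF raises its bound genes -> the TF was holding them down.
        return "I" if went_up else "A"
    if alteration in GAIN_OF_FUNCTION:
        # Adding more TF raises its bound genes -> the TF activates them.
        return "A" if went_up else "I"
    # Other alteration types have no generic direction; decide these by hand.
    return ""

print(infer_effect("kd", "up"))
```

The designation always describes the TF in the context of one experiment; the same TF could come out differently in another cell type.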
Despite the fact that all the TF-binding data was derived from human cells, 12 of the 50 studies above are mouse studies. This seems like a high number. Unlike human studies, mouse studies are usually conducted in vivo; is it possible that we'd see more TF-binding/transcriptome correspondence if human studies could be conducted in vivo? (No, I'm not bemoaning this state of affairs!)
You can see some repeat offenders, as well as some
impressive P-values above. REST is seen five times, while ATF4, AR,
TWIST1, and XBP1 are seen twice. Perhaps it’s not surprising that a number of
these genes are involved in stress, hormone, or antiviral responses; these are
actions that should be taken immediately, without dithering about waiting for
confirmatory signals.
ChIP-seq binding data can align strongly with studies that don’t
seem to directly involve the transcription factor in question. For example, an
MDM2 antagonist strongly downregulates LIN9 ChIP-seq targets (log(P)=-87,
see GSE189152). TCF2 kd downregulates STAT2 ChIP-seq targets (-83, GSE48367).
PRDM10 ko strongly downregulates BRCA2 ChIP-seq targets (-65, GSE135021). Needless
to say, there may be therapeutic implications there. Along this line, the
BRAF V600E mutation seems to downregulate SREBF2 targets. Sometimes, related
TFs seem to regulate each other; SNAI1 oe downregulates SNAIL2 ChIP-seq targets
(GSE169735). In general, the transcripts perturbed upon PRDM10, PRDM1, PRDM6, ATF6,
and EZH2 alterations seem to do a fine job of overlapping the binding sites of
numerous TFs.
*You can, of course, tinker with the data at our website. Just
enter the gene name or GSE-id in the keyword box in the “Relevant Studies” app,
and you’ll find the relevant database ID, which you can use for further
investigations (e.g. plug the database ID into the Fisher app).
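If you'd rather test an overlap yourself, here is a minimal sketch of a one-sided Fisher exact test (the upper tail of the hypergeometric distribution) for the overlap between a TF's bound genes and the genes changed in a perturbation study. This is not necessarily how the log(P) values in this post or in the Fisher app are computed: the function name, the choice of base-10 logs, and the gene-universe size are all assumptions on our part.

```python
from math import comb, log10

def overlap_log10_p(n_universe, n_bound, n_changed, n_overlap):
    """log10 of P(overlap >= n_overlap) under random sampling.

    n_universe: total genes considered (an assumption you must choose).
    n_bound:    genes bound by the TF.
    n_changed:  genes altered in the perturbation study.
    n_overlap:  genes that are both bound and altered.
    """
    max_overlap = min(n_bound, n_changed)
    # Count the ways to draw n_changed genes with at least n_overlap bound ones.
    tail = sum(
        comb(n_bound, k) * comb(n_universe - n_bound, n_changed - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    if tail == 0:
        return float("-inf")  # the requested overlap is impossible
    # Exact integer arithmetic until the final logs, so tiny P-values don't underflow.
    return log10(tail) - log10(comb(n_universe, n_changed))

# Made-up numbers, purely for illustration.
print(overlap_log10_p(n_universe=20000, n_bound=1500, n_changed=400, n_overlap=90))
```

Keeping the counts as exact integers and taking logs only at the end is deliberate: P-values as small as the ones in the table above would underflow if intermediate probabilities were held in ordinary floating point.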
8/2023: Having greatly expanded our database, we found another TF that strongly activates the genes it binds: znf335.