Wednesday, June 22, 2022

Transcription Factors that Strongly Regulate the Genes They Bind

You probably recall the lac operon model of gene regulation in prokaryotes. It was easy: mutation of a regulator alters the output of the genes that are bound by the regulator. However, having tinkered with both transcription factor (TF) binding data and transcriptome data, you may also have found that the two forms of data fail to overlap, at least when higher eukaryotes are involved. If the prokaryotic model holds, after all, shouldn’t transcripts that are repressed by a TF be strongly upregulated when the TF is knocked out?

We searched over 3,000,000 combinations of TF binding data and TF alteration (knockout/down, overexpression, mutation) data for cases where TFs actually do strongly and directly regulate their bound genes. We found that such cases are indeed quite rare, but they do exist. Take a look.

Gene      Study        Type         log(P)    Effect
xbp1      GSE44949     oe           -65.92    A
irf1      GSE26817     oe           -63.22    A
rest      GSE34625     ko           -50.11    I
rest      28178232     kd           -41.63    I
rest      GSE43033     kd           -37.18    I
xbp1      GSE78070     mut vs wt    -34.74
rest      GSE98871     kd           -27.35    I
hmgxb3    35688146     kd           -25.41    A
srebp2    GSE51736     oe           -24.21    A
znf335    35688146     kd           -22.93    A
rest      GSE51461     kd           -19.1     I
atf4      GSE39194     mut vs wt    -18.32    A
cdkn1b    GSE98546     mut vs wt    -17.68    A
kdm5a     GSE147435    ko           -17.2     I
cic       GSE80359     ko           -16.36    I
hif1a     GSE27415     kd           -15.53    I
irf2      GSE132178    ko           -14.72    A
esr1      GSE18431     kd           -14.18    A
ar        GSE151429    mut vs wt    -14.15
nfkb2     GSE64234     kd           -13.9     A
yap       GSE60579     oe           -13.28    A
tead      GSE156912    drug         -12.36    A
rela      GSE65040     oe           -11.86    A
grhl1     GSE47407     oe           -10.78    A
p53       GSE98727     ko           -10.75    A
prdm10    GSE135021    ko           -10.71    A
gata1     GSE43356     mut vs wt    -10.31    A
twist1    GSE43495     oe           -9.15     A
twist1    GSE62739     oe           -8.95     A
tfeb      GSE108384    oe           -8.65     A
ring1     29874594     ko           -8.54     I
ezh2      GSE84243     ko           -8.39     I
smad4     GSE70940     kd           -8.38     I
atf4      GSE72658     ko           -8.2      A
e2f1      GSE16454     ko           -7.63     I
taf1      35688146     kd           -7.51     A
tp73      GSE162860    oe           -7.15     A
ar        GSE66722     kd           -7        A
myod      GSE168703    oe           -6.97     A
nrf4a1    GSE79490     oe           -6.79     A
otx2      GSE21900     ko           -6.5      A
spdef     GSE48928     kd           -6.33     A
six2      GSE79024     cell type    -6.03     A
stat1     GSE49519     ko           -5.85     A
taf2      35688146     kd           -5.62     A
ronin     GSE120008    ko           -5.43     A
tal1      GSE46970     ko           -5.26     A
e2f6      35688146     kd           -5.15     I
hoxc9     GSE34420     oe           -4.36     I
znf750    GSE32685     kd           -3.92     I

The binding data against which the above studies were compared were primarily retrieved from ChIP-Atlas. Whenever possible, we provide the GSE identifier of the data we used, so you can tinker with the underlying data yourself.* In other cases, the PMID is provided. Regarding the type of TF alteration (e.g. knockout or “ko”), splicing alterations are listed as mutations. The effect of the alteration may be A or I (activation or inhibition). In two experiments where the TF isoform was altered (against the dominant form), we refrain from the A/I designation. It’s not difficult to assign the A/I designation. If, for example, a TF is knocked down, and the transcripts of the genes it binds are upregulated, it’s obvious that the TF is predominantly an inhibitor, at least in the context of the experiment. The log(P) values are not adjusted for multiple testing; an adjusted P = 0.05 cutoff falls at a log(P) of roughly -8.5 in the table above. Some results are included because, despite unimpressive P-values, the study was the single-best “aligner” with the TF binding data.
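The A/I logic described above can be written down in a few lines. This is only a minimal sketch, not our actual pipeline: the function name, the "up"/"down" labels, and the choice to leave everything other than ko, kd, and oe undetermined are assumptions made for illustration.

```python
from typing import Optional

# Alteration types as abbreviated in the table. Grouping them into
# loss- and gain-of-function sets is an assumption of this sketch.
LOSS_OF_FUNCTION = {"ko", "kd"}
GAIN_OF_FUNCTION = {"oe"}


def assign_effect(alteration: str, bound_gene_change: str) -> Optional[str]:
    """Return "A" (activator), "I" (inhibitor), or None if undetermined.

    bound_gene_change is "up" or "down": the direction in which the
    TF-bound genes move after the alteration.
    """
    if bound_gene_change not in {"up", "down"}:
        raise ValueError("bound_gene_change must be 'up' or 'down'")
    if alteration in LOSS_OF_FUNCTION:
        # Losing an inhibitor releases its targets; losing an activator drops them.
        return "I" if bound_gene_change == "up" else "A"
    if alteration in GAIN_OF_FUNCTION:
        # Extra activator raises its targets; extra inhibitor lowers them.
        return "A" if bound_gene_change == "up" else "I"
    # Assumption: mutations, drugs, and cell-type comparisons need
    # case-by-case judgment, so this sketch does not guess.
    return None
```

For example, `assign_effect("kd", "up")` returns "I", matching the knockdown case described above.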

Although all of the TF-binding data were derived from human cells, 12 of the 50 studies above are mouse studies. This seems like a high number. Unlike human studies, mouse studies are usually conducted in vivo; is it possible that we'd see more TF-binding/transcriptome correspondence if human studies could be conducted in vivo? (No, I'm not bemoaning this state of affairs!)

You can see some repeat offenders above, as well as some impressive P-values. REST is seen five times, while ATF4, AR, TWIST1, and XBP1 are seen twice. Perhaps it’s not surprising that a number of these genes are involved in stress, hormone, or antiviral responses; these are actions that should be taken immediately, without dithering while waiting for confirmatory signals.

ChIP-seq binding data can align strongly with studies that don’t seem to directly involve the transcription factor in question. For example, an MDM2 antagonist strongly downregulates LIN9 ChIP-seq targets (log(P) = -87, see GSE189152). TCF2 kd downregulates STAT2 ChIP-seq targets (-83, GSE48367). PRDM10 ko strongly downregulates BRCA2 ChIP-seq targets (-65, GSE135021); needless to say, there may be therapeutic implications there. Along this line, the BRAF V600E mutation seems to downregulate SREBF2 targets. Sometimes, related TFs seem to regulate each other; SNAI1 oe downregulates SNAIL2 ChIP-seq targets (GSE169735). In general, the transcripts perturbed upon alteration of PRDM10, PRDM1, PRDM6, ATF6, and EZH2 seem to do a fine job of overlapping the binding sites of numerous TFs.

*You can, of course, tinker with the data at our website. Just enter the gene name or GSE-id in the keyword box in the “Relevant Studies” app, and you’ll find the relevant database ID, which you can use for further investigations (e.g. plug the database ID into the Fisher app).
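If you would rather score an overlap offline, here is a rough sketch of the kind of test involved. It assumes a one-sided Fisher's exact (hypergeometric) test on the overlap between TF-bound genes and perturbed genes, and it assumes a base-10 log; neither detail is spelled out in this post, and the function and parameter names are made up for illustration.

```python
from math import comb, log10


def overlap_log_p(universe: int, bound: int, changed: int, overlap: int) -> float:
    """One-sided Fisher's exact (hypergeometric) test for enrichment.

    Returns log10 of the probability of seeing at least `overlap` genes in
    common when `changed` genes are drawn at random from `universe` genes,
    `bound` of which are TF-bound.
    """
    if not (0 <= overlap <= min(bound, changed) and max(bound, changed) <= universe):
        raise ValueError("inconsistent counts")
    upper = min(bound, changed)
    # Count the ways to draw k bound genes and (changed - k) unbound genes,
    # for every k at least as large as the observed overlap.
    favorable = sum(
        comb(bound, k) * comb(universe - bound, changed - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer counts are divided once, so no precision is lost early.
    p = favorable / comb(universe, changed)
    return log10(p) if p > 0 else float("-inf")
```

Calling `overlap_log_p(20000, 800, 500, 120)` scores 120 shared genes between 800 bound genes and 500 perturbed genes in a hypothetical 20,000-gene universe. The result is an unadjusted log(P), so the multiple-testing caveat above still applies.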

8/2023: Having greatly expanded our database, here's another TF that strongly activates the genes it binds: znf335.


whatismygene.com 

