WhatIsMyGene: April 2025

I'll add to the below list as thoughts pop into my brain....

*"Celebrity" genes are over-rated. Last I looked there were something like 10,000 papers primarily devoted to tp53. Every now and then I stumble across a knockout that has not, to my knowledge, been performed before. One might think that such knockouts would be less likely to generate a long list of significantly perturbed transcripts than, say, a tp53 knockout. I just entered a study involving a CHSY3 knockout into the database. That's the first instance of a CHSY3 perturbation in the database, and the knockout had a major effect on transcript abundances in the underlying study. This sort of thing happens again and again...it's not as if the list of genes whose knockout strongly alters cell activity was exhausted a decade ago.

*Our understanding of biology is strongly biased by the order of discovery. As an example, it seems that folks have a fairly fixed idea of what micro-RNAs do, if they do anything at all. When perusing micro-RNA overexpression and inhibition studies, the studies in which large numbers of transcripts are significantly altered (versus few or none) usually seem to involve the symbol "mir" followed by a small number, not a large number (e.g. mir1 vs mir1234). This may seem odd until you consider that, in the early days of miRNA research, a new miRNA would simply be given the first integer that had not already been taken. In other words, the early miRNAs, which were discovered because they actually did something in cells, may have created the illusion that there are thousands of interesting miRNAs, all of which act according to the principles associated with the earliest miRNAs.

*Also, regarding miRNAs: it's possible that, at "ground truth" level, the typical miRNA only targets one or a few transcripts (1). This is based on an observation I haven't quantified: that miRNA overexpression and inhibition studies, in contrast to the same kinds of experiments conducted on ordinary transcripts, often seem to strongly alter the expression of one or a handful of transcripts, followed by a clear drop-off in significance and/or fold-change.
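
One way the drop-off could be quantified, as a rough sketch: rank transcripts by -log10(P) and measure the gap between the top hit and the runner-up. The function name `dropoff_gap` is made up for illustration. Only the 204 and 20 figures come from footnote 1 (GSE216981); every other number below is invented.

```python
def dropoff_gap(neg_log10_p):
    """Gap, in -log10(P) units, between the most significant transcript
    and the second most significant one."""
    ranked = sorted(neg_log10_p, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two transcripts")
    return ranked[0] - ranked[1]

# mir150 knockout (footnote 1): top hit at 10^-204, next at 10^-20.
# The values after the first two are invented for illustration.
mir150_like = [204.0, 20.0, 18.5, 17.0, 15.2]

# An invented "ordinary" knockout where significance decays gradually.
gradual = [60.0, 55.0, 51.0, 48.0, 44.0]

print(dropoff_gap(mir150_like))  # 184.0
print(dropoff_gap(gradual))      # 5.0
```

Comparing the distribution of this gap across miRNA studies versus ordinary knockouts would be one way to test the impression, though a ratio or a fold-change version might be just as defensible.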

*Similarly, it's possible that in a typical lab-generated list of significantly altered genes, relatively few matter. That is, a large portion of these transcripts or proteins are basically junk, possibly generated to maintain the proper concentration of RNA and/or protein. I base this on admittedly flimsy evidence (4): that if you take a large database of perturbations (e.g. WhatIsMyGene's), generate all study/study overlap P-values, and cluster the data, you might be surprised at how few clusters you generate (using standard methods to determine optimal cluster numbers...e.g. the "elbow" method). To put it overly-dramatically, one study looks like the next. How can sophisticated biological decisions be made in that case? Because a lot of the sameness between studies is not interesting, while a handful of truly interesting genes make a big difference.
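
The procedure described above can be sketched on toy data. This is not WhatIsMyGene's actual pipeline: the six "studies" are invented gene sets (integers standing in for genes), the overlap score is a one-sided hypergeometric tail standing in for whatever test the database uses, and the k-means is a small hand-rolled version with a deterministic start. All function names are hypothetical.

```python
from math import comb, log10

def overlap_score(a, b, n_background):
    """-log10 of the one-sided hypergeometric P for the overlap of sets a and b."""
    shared = len(a & b)
    total = comb(n_background, len(b))
    tail = sum(
        comb(len(a), k) * comb(n_background - len(a), len(b) - k)
        for k in range(shared, min(len(a), len(b)) + 1)
    )
    return -log10(tail / total)

def score_matrix(studies, n_background):
    """All study/study overlap scores; row i is study i's profile."""
    return [[overlap_score(a, b, n_background) for b in studies] for a in studies]

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans_wcss(rows, k, n_iter=20):
    """Within-cluster sum of squares after a basic k-means run.
    Centers start at the first row, then the rows farthest from chosen centers."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for r in rows:
            nearest = min(range(k), key=lambda i: sq_dist(r, centers[i]))
            groups[nearest].append(r)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return sum(min(sq_dist(r, c) for c in centers) for r in rows)

# Invented data: two families of three overlapping 50-gene lists, 1000-gene background.
studies = [set(range(start, start + 50)) for start in (0, 5, 10, 500, 505, 510)]
rows = score_matrix(studies, n_background=1000)

# Elbow check: the drop from k=1 to k=2 should dwarf the drop from k=2 to k=3.
wcss = [kmeans_wcss(rows, k) for k in (1, 2, 3)]
print([round(w) for w in wcss])
```

With real data the profiles would have thousands of columns, and a library implementation of clustering would be the sensible choice; the point here is only to show what "generate all overlap P-values, cluster, look for the elbow" means mechanically.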

*Some genes may reach celebrity status because they are exceptions to a rule: that one type of study (e.g. knockouts) usually overlaps weakly, if at all, with other types (e.g. chip-seq). P53, for example, breaks the rule: the genes altered in P53 knockouts often do overlap with P53 chip-seq results. Of course, P53 also has the property that it is often mutated in cancer, so another possible "truth" is that genes that break the rule are precisely the ones that get targeted by viruses or cancers.

*Modern biology is hugely skewed by the results of experiments in which cells are essentially blasted with extreme conditions that are rarely, if ever, experienced in normal living creatures...complete knockouts, targeted alterations of single genes, micro-RNA levels 100X greater than anything ever experienced in nature, etc. These conditions are very often measured over extremely short time scales, largely due to the limitations of cell culture approaches. It is possible that these approaches have skewed biology in massive fashion. Consider Alzheimer's, a disease whose seeds may be planted decades before symptoms become obvious...I've yet to see any sort of experiment, either with mice or cell culture, that parallels the set of genes that are typically upregulated in Alzheimer's (2). Perhaps this is simply because of the near-impossibility of conducting experiments over the course of decades.

*OK, maybe more of a critique than a "truth": let's say you do a chip-seq experiment using transcription factor XYZ. You collect the list of most-strongly bound genes and test your results against a parallel XYZ knockout experiment. Let's assume your background figures are good. You perform Fisher's test and get -log(P) = 4. Should you be impressed with this statistic? Should you perform further experiments based on this very significant number? I say no. If you had tested your chip-seq list against 150,000 other lists, you may have found 5,000 lists that out-performed your knockout study. In fact, after correction for multiple testing, your knockout results would be rendered insignificant. Yes, I'm plugging WhatIsMyGene.
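
The arithmetic behind that claim, as a minimal sketch. Two assumptions that the text does not state: "log" is taken as base 10, and the multiple-testing correction is Bonferroni (other corrections would give different numbers). The function names are hypothetical. The overlap P is computed as a hypergeometric upper tail, which is what a one-sided Fisher's exact test on a 2x2 overlap table gives.

```python
from math import comb, log10

def overlap_p(n_background, n_a, n_b, n_shared):
    """One-sided overlap P: chance of at least n_shared genes in common
    between lists of size n_a and n_b drawn from n_background genes."""
    total = comb(n_background, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_background - n_a, n_b - k)
        for k in range(n_shared, min(n_a, n_b) + 1)
    )
    return tail / total

def bonferroni(p, n_tests):
    """Bonferroni-adjusted P, capped at 1."""
    return min(1.0, p * n_tests)

# Tiny sanity check: two 5-gene lists from a 10-gene background, fully overlapping.
print(overlap_p(10, 5, 5, 5))  # 1/252

# The scenario in the text: -log10(P) = 4, tested against 150,000 lists.
p_raw = 10 ** -4
n_lists = 150_000
print(bonferroni(p_raw, n_lists))  # 1.0, i.e. no longer significant

# -log10(P) needed to stay under 0.05 after correction (about 6.5).
print(-log10(0.05 / n_lists))
```

Under these assumptions, a result would need a -log10(P) of roughly 6.5 or better before it survives a comparison against 150,000 lists.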

*Maybe the "replication crisis" is a bit exaggerated. Cells and organisms are simply very sensitive to seemingly minor differences in experimental settings. Perhaps stochastic effects are much more powerful than we generally believe (3). Let's say you knock out gene ABC in mouse kidneys in your lab. Somebody else does the same thing in their lab. The results overlap weakly. Did somebody screw up? Maybe not. Using a tool like WhatIsMyGene, you may find that both studies nevertheless overlap rather nicely with a third study involving gene XYZ. (Also, recall the above point that maybe only a few transcripts really "matter" in a list of significantly altered genes, with the rest being relatively unimportant.)

*Alzheimer's has some relationship to stem cells. I say this because time and time again the genes downregulated in Alzheimer's overlap strongly with studies involving stem cells and embryonic cells. The problem is...brains don't have a lot of stem cells, especially if you exclude the SVZ. I don't know how to work around this issue...perhaps some brain cells, neurons in particular, have a stem-like signature but lack the standard markers for stem-cells.

*A bit more speculatively, Alzheimer's may also have some connection to the appendix and appendicitis. In addition to papers suggesting a link (google it), I'd also point out an odd overlap between our own list of transcripts typically up-regulated in the Alzheimer's brain and a list of transcripts up-regulated in the mouse distal colon following appendicitis (GSE23914). The significance is not impressive...but it's difficult to find any studies that overlap strongly with those up-regulated in the human Alzheimer's brain. The study ranks as the 332nd best match to the Alzheimer's list (against 145,000 other studies), competing against studies primarily derived from human cells and brain cells.

*Some in-vivo studies may produce distorted results because of the time of day at which they cut open their subjects. I'm looking at a 12 week study where mice were treated with control vs drug. Out of 145,000 studies examined, the best match (at p=10^-45) would be one that examined mouse livers at zt21 vs zt12. It turns out that the drug in question, minocycline, actually does recalibrate circadian rhythms. But...how often is the possibility of circadian effects totally ignored?

*Methylation of DNA may do more than repress or activate transcription. It may also regulate consistency/variability in expression. I have admittedly shoddy evidence for this notion: a list of genes that do not commonly correlate with batch effects is replete with genes that are often seen in DNA methylation experiments (see "Quantifying batch effects for individual genes in single-cell data").

1. Just as an example, a study in which mir138 was inhibited (GSE173982) results in very significant downregulation of a single transcript, NDUFA9. Another one...mir-222-3p treatment results in the very significant downregulation of a single transcript (Gm10925) in GSE167753. Another one: mir144-3p inhibition in stress susceptible mice results in significant downregulation of a single gene (kcnj8) in GSE209673. Also, in GSE211749, only one transcript is downregulated with strong significance (zgrf1) on a triple miR-322-503-351 ko in white adipose. Also, in GSE216981, mir150 knockout downregulates tnfrsf26 at a significance of 10^-204, while the next most significant alteration comes in at 10^-20.

2. Downregulated genes in Alzheimer's are a different case...these are seen in many kinds of perturbation and clustering experiments involving brain tissue.

3. Another issue is this: if the only way you can replicate a study is via extreme rigor, how generalizable/interesting are your conclusions about your gene of interest?

4. Here's some more evidence: if you take two mouse strains and compare transcriptomics from a particular organ, you'll get a long list of differentially regulated genes...it wouldn't be surprising to see more than 50% of transcripts significantly altered. Thus, you have very different transcriptomics, but a very similar product...a mouse. One could surmise, therefore, that most of these transcripts aren't doing anything.

whatismygene.com

WhatIsMyGene

Friday, April 25, 2025

Stuff that might be true
