The answer is drawn from "HSF1 Inhibits Antitumor Immune Activity in Breast Cancer by Suppressing CCL5." Here, the list of genes downregulated in the c4-2 line upon dthib treatment overlaps significantly with about 10,000 other lists. Unlike the name of a GO list, the name of this list doesn't make it immediately obvious what dthib treatment is expected to do. A few seconds of googling reveals that it's an HSF1 inhibitor. Studies that intersect with extreme significance involve a diverse array of perturbations: hdac5 knockdown, her2 inhibition, CDK expression, fgf1 treatment, and many more. The highest-ranking GO list comes in at position 1595: "GO:0022402 cell cycle process." It seems that the dthib list encapsulates some process far better than the GO list does. If the easily grasped wording of the GO lists is appealing to you, then we could just rename the dthib list something like this: "WIMG:0000001 DTHIB downregulated." 😀
Interestingly, the dthib study intersects very strongly with our list of genes that are rarely downregulated in cancer. The drug is indeed being studied as a cancer treatment.
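For readers who want to see what "overlaps significantly" can mean in practice, here is a minimal sketch. It assumes a one-sided hypergeometric test, which is a common choice for gene-list overlap but is not necessarily the test behind the numbers in this post. The function name, the `universe_size` parameter, and the use of base-10 logs are all assumptions made for illustration.

```python
from math import comb, log10

def neg_log_p_overlap(list_a, list_b, universe_size):
    """Return -log10(P) for the overlap of two gene lists.

    Assumption: a one-sided hypergeometric test, where P is the chance of
    seeing at least the observed overlap if list_b were drawn at random
    from a universe of `universe_size` genes that contains all of list_a.
    """
    a, b = set(list_a), set(list_b)
    n_a, n_b = len(a), len(b)
    k = len(a & b)  # observed overlap

    # Count the ways to draw n_b genes with at least k of them in list_a.
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    total = comb(universe_size, n_b)  # all ways to draw n_b genes

    # Work with logs of exact integers so tiny P-values don't underflow.
    return log10(total) - log10(tail)

# Toy example with made-up gene identifiers (integers stand in for symbols).
study_a = range(0, 100)      # 100 genes
study_b = range(50, 150)     # 100 genes, 50 shared with study_a
score = neg_log_p_overlap(study_a, study_b, universe_size=20000)
```

With 20,000 genes in the universe, two random 100-gene lists would be expected to share less than one gene, so sharing 50 gives a score far above the cutoff of 20 used later in this post.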
How about the study that intersects second-best with all other lists in the database? Actually, this is not so easy to determine. That's because the second-best, and even the 500th-best, study intersects with the dthib theme. Therefore, we add a requirement: the list we deem to be second-best cannot overlap with the dthib list with a significance greater than -log(P) = 20. A cutoff of 20 might seem very liberal, but in the case of the dthib study, for example, there are 5924 lists that overlap with at least this level of significance. Given this requirement, genes upregulated in mouse plantaris muscle one day after synergist ablation ("Time course of gene expression during mouse skeletal muscle hypertrophy") win the silver medal. There's actually quite a drop-off from the dthib study here, with only about 5% of database studies significantly overlapping. Again, it's not so obvious what's going on in this study. For a clue, the highest-ranking GO list (#1301) is "GO:0006955 immune response." Studies involving viral infections, adjuvant treatment, ischemia, various injuries, and radiation treatment match strongly. Again, to our way of thinking, the sheer volume of studies outperforming the GO list suggests a process, however murky or difficult to name, that should be considered "fundamental."
The third best list cannot overlap with the first or second-best list at -log(P)>20. These are genes upregulated in mouse medullary epithelial cells on raver2 knockdown (Aire-dependent transcripts escape Raver2-induced splice-event inclusion in the thymic epithelium). There is an impressive variety of means to recapitulate this result: lncrna over-expression, enhancer repression, various diets, aging, ezh2 over-expression, mettl3 knockout, etc. The best ranking GO list (#339) is "GO:0046649 lymphocyte activation." Do you think this GO list really captures what's happening here?
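The selection rule used for the second- and third-best lists can be sketched as a simple greedy loop. This is only an illustration under stated assumptions: the function name and the `scores` layout are hypothetical, and "best" is assumed here to mean "has the most other lists overlapping at -log(P) of at least the cutoff," which the post does not spell out exactly.

```python
def pick_distinct_themes(scores, n_themes, cutoff=20.0):
    """Greedily pick lists that overlap broadly but not with each other.

    scores[a][b] holds -log(P) for the overlap of lists a and b
    (hypothetical layout; assumed symmetric, missing pairs count as 0).
    """
    def breadth(name):
        # Number of other lists overlapping at or above the cutoff.
        return sum(
            1 for other, s in scores[name].items()
            if other != name and s >= cutoff
        )

    # Rank every list by how many other lists it overlaps significantly.
    ranked = sorted(scores, key=breadth, reverse=True)

    chosen = []
    for name in ranked:
        # Skip any list that overlaps an already-chosen list above the cutoff.
        if all(scores[name].get(prev, 0.0) <= cutoff for prev in chosen):
            chosen.append(name)
        if len(chosen) == n_themes:
            break
    return chosen

# Toy scores with made-up list names; "A" plays the role of the top study.
toy_scores = {
    "A": {"B": 50, "C": 30, "D": 25, "E": 5},
    "B": {"A": 50, "C": 40, "D": 3, "E": 2},
    "C": {"A": 30, "B": 40, "D": 1, "E": 0},
    "D": {"A": 25, "B": 3, "C": 1, "E": 22},
    "E": {"A": 5, "B": 2, "C": 0, "D": 22},
}
top_two = pick_distinct_themes(toy_scores, n_themes=2)
```

In the toy data, lists B, C, and D all overlap A above the cutoff, so they are treated as part of A's theme and skipped, just as the 500th-best study above still belongs to the dthib theme.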
The fourth-best list involves genes upregulated in the a549 cell line on IRF1 overexpression. Simply knowing that IRF1 is "interferon regulatory factor 1" lets us know that we're talking about the innate immune response. Indeed, studies involving infection and interferon treatment dominate the top-ranked intersecting lists. Finally, a category that looks something like what we were taught in college! Nevertheless, the highest-ranked GO list comes in at position 749: "GO:0140546 defense response to symbiont."
The next three lists are these: 5) genes downregulated in the hair-m line on 12 hours of copanlisib treatment ("Copanlisib synergizes with conventional and targeted agents including venetoclax in B- and T-cell lymphoma models"), 6) genes upregulated in the hn4 line on ngf treatment ("Nerve growth factor (NGF)-TrkA axis in head and neck squamous cell carcinoma triggers EMT and confers resistance to the EGFR inhibitor erlotinib"), and 7) genes downregulated in rat lumbar dorsal spinal cord on injection with coronavirus p65-derived peptide ("A human coronavirus OC43-derived polypeptide causes neuropathic pain"). Some quick notes: the best GO match to the copanlisib study comes in at rank #2717; the NGF study matches up nicely with numerous tgfb treatment studies, simplifying conceptualization a bit; and the coronavirus p65-derived peptide study aligns well with numerous studies involving sub-cellular organization.
I ran the above text through a chat bot, hoping that it could return my words in a more succinct, elegant, or insightful form. It often works, but not this time. Thus, to wrap things up, I once again offer this: curated gene lists (CGLs, like GO) suck. It's difficult to imagine the number of experiments that never were performed and the potential insights that have been lost because of misleading and/or un-insightful CGL outputs. More generally, I think biology really suffers from an over-enthusiasm for categorization. On the positive side, there's plenty of room for improved delineation of patterns and processes in biology.