Why isn’t the gene enrichment tool you’re using spurring further insight and new hypotheses? Or, worse yet, is it possible that the tool is spurring unjustified insights and hypotheses? Without mentioning specific tools, here are some potential causes:
*With some standard tools, relatively few studies may be combined to create a “canonical response” list of genes, while the majority of relevant studies are ignored or excluded. Why do I say this? I note, for example, that relatively few studies appear to underlie a particular “response to hypoxia” list, whereas there are certainly more than a hundred deep transcriptomic/proteomic studies on the subject.
*Conversely, it’s possible that the response to hypoxia, for
example, is strongly dependent on cell type. In this case, a researcher may fail
to recognize that a particular drug indeed induced a hypoxia response if he/she
compares results against a “canonical regulation” list. It might have been
better to examine a single previous study that more closely mirrored the
researcher’s own study setup.
*“Low quality” genes may be excluded from canonical lists. That is good, of course, and one assumes that there are reasonably stringent criteria for such exclusion. If, however, a “low quality” gene (for example, a probe that lacks a well-described transcript) appears again and again in studies of, say, hypoxia, perhaps it’s more relevant than had been thought.
*Researchers are not immune to fads in their own fields.
I’ve seen it myself…RNA-seq is performed and the 25th most
significantly altered transcript is selected for functional studies, not the
first.1 Why? It could be because the 25th gene rings a
bell for the researcher. Or…it fits his/her preconceptions as to what should be
altered in the experiment. Or…it’s a “hot” gene that would be more likely to
draw in grant money. Or…it’s easy to study, as antibodies against the protein
are already in the lab freezer. Or…the 25th gene is the subject of
previous studies, making it easier to formulate a hypothesis for its
involvement in a process. What other factors unrelated to biological
significance cause researchers to mention one entity versus another in papers? The
point here is that the folks who screen studies for genes that can be
incorporated into lists will be victims of these biases.
*To what extent are human screeners subject to their own
biases? Do they examine supplemental data?2
*A single study may contribute an excess of data to a
transcriptomic database. You could examine the effects of viral infection on a
cell line at 1, 2, 3, 4…72 hours and compare the transcriptomic results
against controls for each timepoint. Such studies could inflate the size of a
database to an impressive degree. However, does insight follow? Does one really
expect that the result at 16 hours is going to be interestingly different from
the result at 18 hours? Inclusion of multiple highly similar studies will also
confound large-scale co-expression analysis (e.g. gene ABC could be lumped
together with gene XYZ 72 times, even though the two genes aren’t associated in
other studies, in other cell types, under other infection conditions).
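As a rough sketch of the co-expression problem just described (the study names, gene symbols, and counts below are all invented), compare what happens when every timepoint gets its own vote versus one vote per study:

```python
from collections import defaultdict

# Hypothetical differentially-expressed (DE) lists; "viral_tc" is one
# time-course study contributing 72 near-identical timepoint results.
de_lists = [("viral_tc", t, {"ABC", "XYZ", "DEF"}) for t in range(1, 73)]
de_lists += [
    ("hypoxia_study", 1, {"ABC", "GHI"}),
    ("drug_study", 1, {"XYZ", "JKL"}),
]

def co_occurrence(lists, collapse_by_study=False):
    """Count how often gene pairs appear together on DE lists."""
    if collapse_by_study:
        # One vote per study: the union of all its timepoint lists.
        merged = defaultdict(set)
        for study, _, genes in lists:
            merged[study] |= genes
        lists = [(study, 0, genes) for study, genes in merged.items()]
    counts = defaultdict(int)
    for _, _, genes in lists:
        for a in genes:
            for b in genes:
                if a < b:
                    counts[(a, b)] += 1
    return counts

print(co_occurrence(de_lists)[("ABC", "XYZ")])                          # 72
print(co_occurrence(de_lists, collapse_by_study=True)[("ABC", "XYZ")])  # 1
```

Collapsing to one vote per study keeps a 72-timepoint experiment from manufacturing a “co-expression” signal out of what is really a single biological observation.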
*Rare entities may be excluded from canonical lists.
Consider two transcripts. Transcript ABC is upregulated in hypoxia in 6 out of
10 studies. ABC is abundant and also tends to be altered in numerous
non-hypoxia studies. That is, if you perturb a cell, there’s a good chance
you’re perturbing ABC. Transcript XYZ, which may not even be represented by
probes in some microarrays, is upregulated in 2 out of 10 hypoxia studies. It’s
never mentioned in the body of hypoxia papers, and it’s rarely seen in
non-hypoxia studies. Shall we exclude XYZ from our list of transcripts altered
in hypoxia?
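Here’s a toy calculation of that trade-off (all tallies invented). The question isn’t how often a transcript shows up, but how specific it is to hypoxia relative to its behavior everywhere else:

```python
from scipy.stats import fisher_exact

# Hypothetical tallies: (upregulated, not upregulated) in hypoxia vs.
# non-hypoxia studies. ABC moves in almost any perturbation; XYZ rarely does.
abc = [[6, 4], [500, 500]]   # ABC: 6/10 hypoxia studies, 500/1000 non-hypoxia
xyz = [[2, 8], [5, 995]]     # XYZ: 2/10 hypoxia studies, 5/1000 non-hypoxia

for name, table in (("ABC", abc), ("XYZ", xyz)):
    odds, p = fisher_exact(table, alternative="greater")
    print(f"{name}: odds ratio = {odds:.1f}, p = {p:.3g}")
```

On these made-up numbers, XYZ is by far the more hypoxia-specific transcript despite appearing in only 2 of 10 hypoxia studies; dropping it from the list throws away exactly the kind of signal enrichment analysis is supposed to surface.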
*Some enrichment tools do not incorporate estimates of the
“background” of an experiment. Even if a background is incorporated, shall we
assume that all gene ontology lists share the same background? As we’ve noted
previously, some of these lists are heavily overloaded with extremely abundant
proteins/transcripts. In these cases, it would appear that the genes that
compose these lists are more likely to be drawn from a pool of 2,000, as
opposed to 20,000, possible genes. In other cases, a gene ontology list does
not over-represent abundant entities, meaning that a background of 20,000 might
be appropriate for comparison against your own list of genes.3,4
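To put rough numbers on the background issue (a toy sketch; the overlap and list sizes are hypothetical): suppose 40 of your 200 significant genes land in a 300-gene ontology list. Scored against a 20,000-gene background, that overlap looks overwhelming; scored against the roughly 2,000 abundant genes such a list may effectively draw from, it’s marginal at best:

```python
from scipy.stats import fisher_exact

overlap, my_list, go_list = 40, 200, 300

def enrichment_p(background):
    """One-sided Fisher's exact p-value for the overlap, given a background size."""
    table = [
        [overlap, my_list - overlap],
        [go_list - overlap, background - my_list - go_list + overlap],
    ]
    _, p = fisher_exact(table, alternative="greater")
    return p

print(f"background 20,000: p = {enrichment_p(20_000):.2e}")  # overwhelming
print(f"background  2,000: p = {enrichment_p(2_000):.2e}")   # marginal at best
```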
*You add a drug to cell culture and perform transcriptomics
against controls. Performing “pathway analysis” on your list of up- and
down-regulated transcripts could certainly prove insightful. However, is that
all you wish to do with respect to enrichment analysis? Bear in mind that your significantly
altered transcripts may be more likely to be bundled in “modules” than in groups
of genes found in particular pathways. In other words, your transcripts may
contain a large dose of genes downregulated in autophagy, a moderate dose of genes involved in mitochondrial processes, a smattering of genes upregulated in antiviral response,
and a heavy dose of genes upregulated in an esoteric process that isn’t even
represented in popular gene ontology lists. If there are other studies that
match up with your results, will you know?
*Try entering a standard gene-enrichment list (GO, Reactome, whatever) into our Fisher app. Despite the fact that a majority of the lists in our database are derived from individual studies, not mere copies of gene-enrichment lists that other folks have created, you'll probably find that the output is dominated by other gene-enrichment lists (be sure to set the "regulation" filter to "Any"). Basically: GO lists (and the like) best overlap with other GO lists, not with data generated from studies involving specific tests versus controls.
The solution to the above concerns is not necessarily tricky. All you need is a database of results from specific studies, as opposed to (or in addition to) compiled lists of genes. To maximize the chance that your own results will strongly align with results from another specific study, the database should be large, and it should contain a roughly randomized set of studies rather than, say, a strong focus on cancer. Inclusion of
multiple results from a single study should be avoided. Rare and/or
uncharacterized genes should not be eliminated without very good cause.
The above describes our database fairly well. Have we fully
eliminated all the above concerns? No. In addition to specific studies, we do
offer some compiled lists, described in some of our previous blog posts. On
some occasions, we do include multiple results from one study. However, we take
steps to make sure that such studies do not confound results from our
co-expression app.
1) Yes, I’ve got a particular study in mind. In fact, the
single most significantly altered transcript was not even mentioned in this
study.
2) Plenty of biologists believe that confirmation of a
protein alteration requires a Western blot. The mass-spec community scoffs at
this, believing that blots are vastly inferior to MS and that antibody studies are a
waste of time if MS is performed properly. I side with the MS folks. In any
case, though, where do the screeners draw their particular lines? Even if
they’re consistently following particular criteria, can we assume those criteria
are reasonable?
3) If this bit seems difficult to understand, my apologies. It might help to bear in mind that Fisher’s exact test, or similar tests, require a “background” figure. Strictly speaking, this should be the intersection of ALL identified entities in study A with ALL identified entities in study B, regardless of metrics like significance and fold-change. This is not so difficult if you’re comparing results from two studies that used, say, the same brand of microarray. But what if Study A is generated by compiling multiple studies, or if study A is generated by humans who screen papers for genes involved in various processes? What is the sum of all identified (not simply "mentioned") transcripts/proteins in study A? This gets tricky. Things get particularly tricky if the process of compilation results in an excess of highly abundant entities. And we certainly do see cases where abundant entities are strongly over-represented.
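For the simple case of two studies run on comparable platforms, the background really is just a set intersection. A minimal sketch with invented identifiers:

```python
# Everything each (hypothetical) study actually identified/measured,
# regardless of significance or fold-change.
study_a_identified = {"ABC", "DEF", "GHI", "JKL", "MNO"}
study_b_identified = {"ABC", "DEF", "GHI", "XYZ"}

# Significant hits within each study.
study_a_sig = {"ABC", "JKL"}
study_b_sig = {"ABC", "XYZ"}

# The Fisher background is the set of entities both studies could have seen.
background = study_a_identified & study_b_identified

# Hits should be restricted to that background before building the 2x2 table.
a_hits = study_a_sig & background
b_hits = study_b_sig & background

print(len(background), a_hits, b_hits)  # 3 {'ABC'} {'ABC'}
```

The trouble described above is that a compiled or human-screened list has no well-defined “all identified entities” set to intersect with in the first place.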
4) If you've tinkered with Fisher's exact test, you know that small/moderate errors in the background figure don't necessarily make much difference. However, some potential errors go way beyond the "small/moderate" level. In gene enrichment analysis, the output often consists of a list of enriched groups ranked from most to least significant. Here, one naturally pays most attention to the top-ranked groups. If the background is badly off, however, perhaps the group ranked first should really sit in the 20th position.
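A quick sketch of that sensitivity (the overlap and list sizes are hypothetical): the same 30-gene overlap between a 150-gene result list and a 400-gene group, scored under a slightly wrong and then a badly wrong background:

```python
from scipy.stats import fisher_exact

def enrichment_p(overlap, list_size, group_size, background):
    """One-sided Fisher's exact p-value for an overlap, given a background size."""
    table = [
        [overlap, list_size - overlap],
        [group_size - overlap, background - list_size - group_size + overlap],
    ]
    _, p = fisher_exact(table, alternative="greater")
    return p

# Same overlap, three different background assumptions.
for bg in (20_000, 18_000, 5_000):
    print(f"background {bg:>6}: p = {enrichment_p(30, 150, 400, bg):.2e}")
```

On these invented numbers, the roughly 10% error (20,000 versus 18,000) barely moves the p-value, while assuming 20,000 when the effective pool is closer to 5,000 shifts it by many orders of magnitude, easily enough to reshuffle the ranking.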