As long as we’re in heatmap mode, we thought we’d throw some fairly “conventional” data into R’s “heatmap.2” function (from the gplots package). This time, it’s simply the most abundant transcripts found in 61 different tissues. All the data, of course, is found in our database; if your data strongly overlaps with a particular tissue type and you run it through our apps, you’ll certainly know. You can find the underlying data (and a lot more) here: https://www.proteinatlas.org/about/download
The presence of abundance data in our database is also
useful in another respect. If your data is biased toward abundant
transcripts/proteins, as opposed to a more typical mix of abundant and rare
entities, and you apply our Fisher app to the data, the app’s output will
be overloaded with abundance studies. This is a bit of a warning. It’s possible
that you simply need to adjust the “background” for your data (e.g. a typical
proteomic study contains around 4,000 proteins…our default background of 20,000
is not appropriate in this case, and you’ll need to tweak it). There may be
other, more problematic reasons for the bias in your data. Alternatively, there
may be a silly mistake…your data was sorted according to abundance. We’ve noted
problems in our own data entries this way (and then fixed them!). We also note
issues in external datasets. Our favorite is probably the GO “ESTABLISHMENT OF
PROTEIN LOCALIZATION TO ENDOPLASMIC RETICULUM” group, which is a wonderful proxy for
the most abundant proteins in human tissue. Of course, a final possibility is
this: your data is legitimately tweaked toward or against abundance. One
phenomenon we note is that cancer tissues often lose, to some extent, the
underlying tissue identity; stomach cancer tissue, for example, will become
less stomach-y, and you may note this via a cancer-related decrease in the most
abundant stomach entities.
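To see why the background matters, here’s a minimal sketch in base R. The counts are made up purely for illustration, and the one-sided test and the 2x2 layout are assumptions about how an overlap test might be set up, not a description of how our Fisher app works internally. The helper name overlap_p is hypothetical.

```r
# Hypothetical counts: two lists of 200 entities sharing 40 members.
overlap_p <- function(n_overlap, n_yours, n_reference, background) {
  # 2x2 table: in/out of your list vs. in/out of the reference list
  tab <- matrix(
    c(n_overlap,
      n_yours - n_overlap,
      n_reference - n_overlap,
      background - n_yours - n_reference + n_overlap),
    nrow = 2
  )
  # One-sided test for enrichment (an assumption for this sketch)
  fisher.test(tab, alternative = "greater")$p.value
}

p_default   <- overlap_p(40, 200, 200, background = 20000)  # default background
p_proteomic <- overlap_p(40, 200, 200, background = 4000)   # proteomics-sized background
```

The same overlap looks far more extreme against 20,000 entities than against 4,000, because the expected overlap by chance rises from about 2 to about 10 when the background shrinks.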
One final technical point…if you’re using our co-expression
app, these abundance lists will not be examined unless you manually tweak the “regulation”
feature to “ANY.” In other words, only lists involving up/down-regulated
entities will be examined unless you overrule the “regulation” feature.
Here’s the heatmap, with “values” being –log(P-values),
as generated by Fisher’s exact test, applied to all combinations of cell types:
It shouldn’t be surprising that some extreme P-values were generated. The basic components for metabolism and structure (etc.) are plentiful and don’t vary much from cell to cell.
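If you’d like to build a similar matrix from your own lists, something along these lines might work. It’s only a sketch: the three toy “tissue” lists, the 1,000-gene universe, the one-sided test, and the choice of log base 10 are all assumptions for illustration, and pair_p is a hypothetical helper.

```r
# Toy stand-ins for "most abundant transcripts" lists (hypothetical data).
universe <- paste0("gene", 1:1000)
tissue_lists <- list(
  tissueA = universe[1:100],
  tissueB = universe[51:150],   # shares 50 genes with tissueA
  tissueC = universe[501:600]   # shares nothing with tissueA or tissueB
)

# Fisher's exact test on the overlap of two lists, given a background size.
pair_p <- function(a, b, background) {
  k <- length(intersect(a, b))
  tab <- matrix(c(k, length(a) - k, length(b) - k,
                  background - length(a) - length(b) + k), nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
}

# Fill a square matrix with -log10(P) for every pair of lists.
n <- length(tissue_lists)
scores <- matrix(0, n, n,
                 dimnames = list(names(tissue_lists), names(tissue_lists)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    p <- pair_p(tissue_lists[[i]], tissue_lists[[j]],
                background = length(universe))
    scores[i, j] <- -log10(p)
  }
}
```

A matrix like scores is what gets handed to heatmap.2; lists with no overlap score 0, and heavily overlapping lists score high.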
The image probably aligns with your own ideas about
similarities between various cell types. I was surprised, however, by how cleanly
the cell types clustered into various groups (see the dendrogram on top). The
left-most columns contain cell types that don’t seem to overlap with other cell
types with great significance: testis, liver, parathyroid, and placenta, in
particular, followed by cerebellum, granulocytes, skeletal muscle, thymus, heart,
and intestines. Next, there’s a square of red/orange/yellow color. That’s all
brain tissue: basal ganglia, pons/medulla, spinal cord, olfactory gland,
hypothalamus, amygdala, midbrain, cerebral cortex, corpus callosum, thalamus,
hippocampus. Perhaps it’s interesting that the cerebellum was not found in this
group. Next is a grouping of cell types that don’t overlap any other types with
extreme significance, with a few exceptions (total PBMCs/monocytes,
duodenum/colon, monocytes/dendritic cells). The next strong red/orange patch
consists of gall bladder, vagina, skeletal muscle, cervix, prostate, fallopian
tube, endometrium, and bladder. The next red/orange patch contains T-cells, NK
cells, B-cells, PBMCs, and the spleen. Next, the appendix, lymph nodes, and
tonsils group together strongly.
Examining the underlying data, the weakest overlap belongs to the cerebellum/liver pairing, with a P-value that doesn't even reach 0.05.
Oddly, the midbrain and the amygdala match up with the
rectum fairly significantly!
*Note to self and anybody who 1) doesn’t think my heatmap is
utter garbage and 2) would like to do something similar. Here’s the code that
generates the colors:
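library(gplots)  # heatmap.2() lives in the gplots package
col <- c("navy", "blue", "dodgerblue", "lightskyblue", "palegreen", "yellow", "orange", "red")
breaks <- c(0, 2, 12, 25, 50, 100, 150, 200, 325)
# "x" stands in for your numeric matrix; the remaining arguments are omitted here
heatmap.2(x, col = col, breaks = breaks)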
I like this approach because it’s easy. Just make sure you’ve
got one more break than colors (above there are 9 breaks and 8 colors). Of
course, if you must make a gradient, you can’t use this easy method. In any
case, here’s a nice color “cheatsheet”: https://www.nceas.ucsb.edu/sites/default/files/2020-04/colorPaletteCheatsheet.pdf
The bottom of the sheet contains names for something like 600 colors that you
can plug in as above.
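If you do want a gradient, one possible approach (not what was used for the heatmap above) is to interpolate between the same named colors with colorRampPalette and generate the breaks to match. The number of shades and the evenly spaced breaks are arbitrary choices for this sketch; the hand-picked breaks above are deliberately uneven, so the result will look different.

```r
# Same eight anchor colors as above, interpolated into a smooth ramp.
anchor_cols <- c("navy", "blue", "dodgerblue", "lightskyblue",
                 "palegreen", "yellow", "orange", "red")
n_colors <- 100                                   # hypothetical number of shades
grad_col <- colorRampPalette(anchor_cols)(n_colors)

# Evenly spaced breaks over the same 0-325 range; always one more break than colors.
grad_breaks <- seq(0, 325, length.out = n_colors + 1)
stopifnot(length(grad_breaks) == length(grad_col) + 1)

# With gplots loaded and "x" as your matrix, the call would then look like:
# heatmap.2(x, col = grad_col, breaks = grad_breaks)
```

The stopifnot line is just a guard for the “one more break than colors” rule, which is easy to get wrong once the colors are generated rather than typed by hand.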