When focusing on cell types, we could make a list of the most
abundant transcripts in particular cell types. We could also focus on the
proteome. We could even ask, “what are the common transcripts/proteins that are
rarely seen in a particular cell type?” We could search for cell type markers
that are rarely or never seen in other cell types, even if these markers are
not particularly abundant in the cell type of interest. Our database is
chock-full of the above sorts of lists.
There’s another sort of list we are able to prepare, largely
because the sheer size of our database affords the opportunity. The largest
portion of the database falls into the category of “perturbation studies”
wherein cells are perturbed via drug, knockout, heat, whatever. We can thus ask
the question, “what transcripts/proteins are most commonly perturbed in a
particular cell type?” We can also ask which entities are least frequently
perturbed in a particular cell type. This is not a question of abundance or of
“uniqueness” to a particular cell type. Rather, we’re focusing on the entities
that fluctuate when you tweak a particular sort of cell.
Pulling data from about 10,000 studies, we’ve constructed
these lists for 20 different cell types: brain, liver, skin, muscle,
lymphocytes, stem, kidney, breast, colon, prostate, heart, lung, intestines,
glands, pancreas, dendritic, ovaries, adipose, fibroblasts, epithelial. Why not
other cell types in our database? Primarily because these 20 designations are
the most common in our database; we required at least 100 studies for each
cell type. We could also have included “blood” as a category, but we chose to
break it down into two common subtypes: lymphocytes and dendritic cells. Some
other choices were somewhat arbitrary (we have a lot of macrophage studies, so
why didn’t we include them?). Note also that some of these cell types can
overlap: skin and liver are different tissues, but skin can contain stem
cells, and breast cells can be epithelial. For this initial stab at the
“perturbome”, we don’t consider this a problem.
With the 20 cell types, we generated 40 lists, two per cell
type: one of entities that are frequently perturbed, and one of entities that
are rarely or never perturbed. Entities that are “rarely perturbed” most likely
are simply never expressed in the particular cell type, though it is possible
that they are indeed expressed but difficult to tweak; we don’t discriminate
between these two cases.
If you’re interested, the dirty details are as follows: We
first generated a list of all genes found in the above studies. We then simply
counted their occurrences in the above 20 cell types. We then used the binomial
distribution to calculate how significantly a particular gene may be
over/under-represented in a particular cell type. The “probability” input for
the binomial distribution (which is 0.5 if you’re talking about coin flips) is
calculated by dividing the total genes perturbed in a tissue (e.g. brain) by
the total genes perturbed over all 20 tissues. Liver, for example, constitutes
7% (0.07) of all genes in our database’s perturbome. Thus, if you know that gene
ABC is found 100 times in our liver studies, and 500 times over all studies,
you’re equipped to perform a probability calculation: by chance you would
expect about 0.07 × 500 = 35 liver occurrences, so 100 is an excess. In the
final step, we simply rank genes according to these probabilities, making sure
to discriminate between significance generated from an excess, versus a
depletion, of a particular gene.
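For the curious, here is a minimal Python sketch of that sort of calculation. It is an illustration, not our production code: the function names are made up for this example, and it assumes a one-sided binomial tail is taken in whichever direction the observed count deviates from expectation. The numbers plugged in are the gene ABC example from the paragraph above.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P[X = k] for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def binom_tails(k, n, p):
    """Return (P[X <= k], P[X >= k]) for X ~ Binomial(n, p), with 0 < p < 1."""
    lower = sum(binom_pmf(i, n, p) for i in range(0, k + 1))
    upper = sum(binom_pmf(i, n, p) for i in range(k, n + 1))
    return lower, upper


def classify(k, n, p):
    """Label a gene as an 'excess' or 'depletion' case and return the matching tail.

    k: occurrences of the gene in the tissue of interest
    n: occurrences of the gene over all tissues
    p: the tissue's share of all perturbations (e.g. 0.07 for liver)
    """
    lower, upper = binom_tails(k, n, p)
    if k > n * p:  # more occurrences than expected by chance
        return "excess", upper
    return "depletion", lower


# Gene ABC from the text: 100 liver occurrences out of 500, liver share 0.07.
direction, p_value = classify(100, 500, 0.07)
print(direction, p_value)
```

Keeping the two directions separate is what lets you build both a “most frequently perturbed” and a “least frequently perturbed” list from the same ranking step.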
So, what did we find? First, what is the most commonly
perturbed gene over all tissue types? The answer is EGR1, perturbed 1581 times,
followed by SERPINA3, IFIT1, GDF15, and FOS. Which gene is most commonly
perturbed overall despite never being perturbed in one particular tissue? That
distinction belongs to LUM (lumican), which was never perturbed in dendritic
cells, despite being altered a total of 758 times over the other 19 tissues.
Perhaps dendritic cells are adamant that they not be confused with other cell
types that express lumican, which is largely an extracellular protein.
Of note, GAPDH, commonly used as a housekeeping control gene, was only
perturbed 358 times. Beta-actin (ACTB) was seen 610 times. Our gene lists
primarily reflect perturbations, not abundance.
Below is one of the more obscure gene tables you’ll ever
stumble across. It details the top genes that were never perturbed in
particular cell types. The table is ordered by the count over all other
tissues; thus, the cell types at the bottom of the table are those in which
nearly every commonly perturbed gene shows up at least once, suggesting that
they express a large array of transcripts/proteins.
CELL TYPE | GENE | COUNT OVER OTHER TISSUES
dendritic | lum | 758
adipose | hopx | 603
intestines | nav2 | 591
ovary | lam1 | 481
heart | jdp1 | 469
prostate | pck1 | 456
gland | hpp1 | 435
pancreas | cd38 | 422
muscle | slc6a14 | 405
colon | kcnk2 | 339
fibroblast | c1orf116 | 329
kidney | scca1 | 312
skin | sizn | 230
brain | ugt2b15 | 205
lung | gpr37l1 | 183
breast | miat | 175
stem | cyp2c9 | 164
liver | blcap | 149
epithelial | ces1g | 121
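If you want to build a table like this from your own counts, the logic can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the nested-dictionary layout, the function name, and the gene names and counts (GENE_A and friends) are all invented for the example and do not come from our database.

```python
def top_never_perturbed(counts, tissues):
    """For each tissue, find the gene never perturbed there that is most
    often perturbed in the other tissues.

    counts: {gene: {tissue: number of perturbations}}; missing tissue means 0.
    Returns a list of (tissue, (gene, count_elsewhere)), highest count first.
    """
    result = {}
    for tissue in tissues:
        best_gene, best_total = None, 0
        for gene, per_tissue in counts.items():
            if per_tissue.get(tissue, 0) != 0:
                continue  # gene was perturbed in this tissue, so skip it
            elsewhere = sum(c for t, c in per_tissue.items() if t != tissue)
            if elsewhere > best_total:
                best_gene, best_total = gene, elsewhere
        if best_gene is not None:
            result[tissue] = (best_gene, best_total)
    # Order rows by the count over all other tissues, as in the table above.
    return sorted(result.items(), key=lambda kv: kv[1][1], reverse=True)


# Hypothetical toy counts, for illustration only.
toy_tissues = ["brain", "liver", "dendritic"]
toy_counts = {
    "GENE_A": {"brain": 5, "liver": 7},
    "GENE_B": {"brain": 2, "dendritic": 1},
    "GENE_C": {"liver": 3, "dendritic": 4},
}
for tissue, (gene, count) in top_never_perturbed(toy_counts, toy_tissues):
    print(tissue, gene, count)
```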
Another question: what are the genes that were uniquely
perturbed in particular tissues? The champion is probably GM1818, a mouse gene
that was perturbed 21 times in the brain, but never elsewhere. For human genes,
we have FAM90A7P, which was perturbed 16 times in the brain, and never
elsewhere. The brain, in fact, seems to have the largest number of uniquely
perturbed genes by a large margin; the first case of a non-brain gene that was
uniquely perturbed was the mouse gene AI132709 (liver), which was tweaked in 8
studies in our database, while 201 brain-unique genes are tweaked at least as
frequently. The lncRNA Lnc-CHSY1-3 was uniquely perturbed in lymphocytes,
albeit with a mere 4 occurrences.
The special status of the
brain is also seen in the heatmap below. We took our 40 perturbation lists and
performed Fisher’s exact test on all combinations of lists, for a total of 760 P-values.
The first row is labeled “LO_COL”, which means “genes that were least
frequently perturbed in the colon.” Hopefully the other 39 labels are self-explanatory.
The color key shows the –log(P-values). Combinations with very significant
P-values tend to make sense: highly perturbed genes in “breast” and “gland”
overlap with extreme significance, as do non-perturbed genes in the lymphocyte/dendritic
categories, and perturbed genes in the colon/intestine. There are, however, some
very interesting overlaps that might not be so intuitively obvious. For
example:
1) genes that are rarely
perturbed in the brain are rarely perturbed in stem cells.
2) genes highly perturbed
in glands are rarely perturbed in the brain.
3) genes highly tweaked
in the breast are rarely tweaked in stem cells.
4) genes that are rarely
perturbed in the brain are also rarely perturbed in muscle and lymphocytes.
5) looking at the “high_BR”
(highly perturbed in the brain) group, the best matching “highly perturbed”
cell type would be “stem”, with a –log(P-value) of about 4. This is a
bit of a cheat, since stem cells and brain cells are not exclusive (i.e. some
brain cells are stem cells). In truth, then, highly perturbed genes in brain cells do not overlap with the highly perturbed genes of “pure” cell types with any significance.
6) unlike in the brain, the
rarely perturbed genes in some tissues don’t overlap the rarely perturbed genes
of other tissues with great significance. For example, the best match for the
rarely perturbed genes in the pancreas is the intestines, with a –log(P) of
only 7.
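For readers who want to try this sort of list-versus-list comparison themselves, here is a minimal Python sketch of a Fisher’s exact test on the overlap of two gene lists. Several things here are assumptions rather than a description of how the heatmap was made: the test is one-sided (asking whether the overlap is larger than chance), the size of the gene “universe” is something you must supply, the logarithm is taken as base 10, and the function names are invented for the example.

```python
from math import comb, inf, log10


def fisher_overlap_pvalue(list_a, list_b, universe_size):
    """One-sided Fisher's exact test for the overlap of two gene lists.

    Returns P[overlap >= observed] when both lists are drawn at random from
    a universe of `universe_size` genes (a hypergeometric tail).
    """
    set_a, set_b = set(list_a), set(list_b)
    a, b = len(set_a), len(set_b)
    observed = len(set_a & set_b)
    # Exact integer arithmetic for the numerator, one division at the end.
    numerator = sum(
        comb(a, i) * comb(universe_size - a, b - i)
        for i in range(observed, min(a, b) + 1)
    )
    return numerator / comb(universe_size, b)


def neg_log10(p_value):
    """The quantity shown on a heatmap color key, assuming base-10 logs."""
    return inf if p_value == 0 else -log10(p_value)


# Hypothetical toy lists in a universe of 10 genes, for illustration only.
p = fisher_overlap_pvalue(["g1", "g2", "g3"], ["g1", "g2", "g4"], 10)
print(p, neg_log10(p))
```

As an aside, 40 lists give 780 unordered pairs; leaving out the 20 pairings of a tissue’s own “high” and “low” lists would give 760, though that reading of the count is only a guess.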
You can tinker with the
data yourself at whatismygene.com. The table below gives you the dbase IDs that
allow you to perform operations with our various apps.
DBASE ID | CELLS
132346123 | most frequently perturbed in the brain
132346124 | least frequently perturbed in brain
132346125 | most frequently perturbed in the liver
132346126 | least frequently perturbed in the liver
132346127 | most frequently perturbed in skin
132346128 | least frequently perturbed in skin
132346129 | most frequently perturbed in muscle
132346130 | least frequently perturbed in muscle
132346131 | most frequently perturbed in lymphocytes
132346132 | least frequently perturbed in lymphocytes
132346133 | most frequently perturbed in stem cells
132346134 | least frequently perturbed in stem cells
132346135 | most frequently perturbed in the kidney
132346136 | least frequently perturbed in the kidney
132346137 | most frequently perturbed in the breast
132346138 | least frequently perturbed in the breast
132346139 | most frequently perturbed in the colon
132346140 | least frequently perturbed in the colon
132346141 | most frequently perturbed in the prostate
132346142 | least frequently perturbed in the prostate
132346143 | most frequently perturbed in the heart
132346144 | least frequently perturbed in the heart
132346145 | most frequently perturbed in the lung
132346146 | least frequently perturbed in the lung
132346147 | most frequently perturbed in the intestines
132346148 | least frequently perturbed in the intestines
132346149 | most frequently perturbed in glands
132346150 | least frequently perturbed in glands
132346151 | most frequently perturbed in the pancreas
132346152 | least frequently perturbed in the pancreas
132346153 | most frequently perturbed in dendritic cells
132346154 | least frequently perturbed in dendritic cells
132346155 | most frequently perturbed in ovaries
132346156 | least frequently perturbed in ovaries
132346157 | most frequently perturbed in adipose tissue
132346158 | least frequently perturbed in adipose tissue
132346159 | most frequently perturbed in fibroblasts
132346160 | least frequently perturbed in fibroblasts
132346161 | most frequently perturbed in epithelial cells
132346162 | least frequently perturbed in epithelial cells
We’re not finished with our dissection of the perturbome. We’ll
resume the discussion in a couple weeks.