Wednesday, February 17, 2021

Underrated Genes

The journal Nature recently published a study of... biological studies. The authors asked a number of questions, one of them being, “Which genes are most represented in the literature?” Not surprisingly, TP53 is the champion, with 9,232 publications. It’s a good read.

A question not addressed is, “What are the most under-represented genes in the literature?” Of course, it’s trivial to find genes that have no mentions at all. What we can do, however, is use our own database and ask, “Which genes have the largest disparity between inclusions in our database and inclusions in the literature?” The exercise is simple on its face, but there are a number of technicalities that make it a bit tricky. If we were writing an academic paper, we’d have to do 100X the work we’re putting into this post. Basically, though, our procedure works like this:

1. Download a list of genes ranked according to literature mentions.
2. Convert these gene IDs into the format used in the database.
3. Generate a frequency table of all genes in our database.
4. Compare the frequencies in our database against the frequencies in the literature.

The list of literature mentions per gene comes from NCBI’s GeneRIF (Gene Reference Into Function) file: ftp://ftp.ncbi.nih.gov/gene/GeneRIF/generifs_basic.gz
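The first step of the procedure, counting literature mentions per gene, might look something like the minimal Python sketch below. It assumes the file keeps its usual tab-separated layout (Tax ID, Gene ID, comma-separated PubMed ID list, update timestamp, GeneRIF text) and that a “literature mention” means a distinct PubMed ID linked to a human gene (taxonomy ID 9606). Both are our assumptions for illustration, not a description of exactly how the list below was built, and the function name `count_literature_mentions` is hypothetical.

```python
import gzip
from collections import defaultdict

HUMAN_TAX_ID = "9606"  # NCBI taxonomy ID for Homo sapiens


def count_literature_mentions(path="generifs_basic.gz", tax_id=HUMAN_TAX_ID):
    """Return {Gene ID: number of distinct PubMed IDs} for one organism.

    Assumes tab-separated rows of the form:
    Tax ID, Gene ID, PubMed ID list (comma-separated), timestamp, GeneRIF text.
    """
    pmids_by_gene = defaultdict(set)
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3 or fields[0] != tax_id:
                continue  # malformed row or another organism
            gene_id, pmid_list = fields[1], fields[2]
            # A set, so a paper cited by several GeneRIFs counts once per gene.
            pmids_by_gene[gene_id].update(p for p in pmid_list.split(",") if p)
    return {gene_id: len(pmids) for gene_id, pmids in pmids_by_gene.items()}
```

Sorting the result with `sorted(counts.items(), key=lambda kv: kv[1], reverse=True)` then gives genes ranked by literature mentions, keyed by NCBI Gene ID rather than by symbol.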

Bearing in mind that the results can be skewed in a number of ways, here’s a list of the most underrated players in the genomic universe:

RTP4

VSIG2

CLIC6

FAM198B

HIST1H2BD

MT1L

MOXD1

CENPK

ANKRD37

CMBL

PLBD1

TUBA1C

ARHGAP11A

TMEM154

HIST1H2BI

NMES1

TMEM140

PKIA

ADGRL2

KBTBD11

NT5DC2

C15orf15

RSL24D1

RPL27A

FAM49A

PGM5

RGL1

CLMN

EVI2A

TFEC

RPL18A

RPL21

SRM

CALML4

OLFML2A

RPS8

ENDOD1

KDELR3

RPS11

GNG4

TMEM56

SH3BGRL2

CIART

ENPP5

GBP6

RSRP1

COX6A2

GPRIN3

GPRC5C

TMEM71

NRIP3

MFAP3L

CPNE2

ABLIM3

SMIM14

HIST1H2BM

SLC46A3

EVI2B

PCP4L1

TRNP1

GBP4

SLC16A14

RBP7

SLFN13

FAM84A

RAPGEF5

TM6SF1

NSG2

VAT1L

EPPK1

RPL27

DNAJA4

PGAM2

TTC39C

TRANK1

GBP7

N4BP2L2

MEGF6

CDH19

FIBIN

TINAGL1

CCDC3

LONRF2

DDX60L

MXRA7

GPR137B

CENPV

GNG12

CCDC85A

GRAMD3

FAM105A

STRBP

ZNF608

KIAA1551

LRRC2

UAP1L1

MEGF9

EPB41L4A

PLEKHA4

METTL7B


RTP4 is the champion, with few mentions in the literature but more than 700 appearances in our database. A Google search for RTP4 suggests there’s no dearth of studies on this gene, but we’re sticking with the NIH list of literature mentions above. Next on the list is VSIG2. A Google search does seem to indicate that nobody cares about this sad gene; it’s hard to even get a clue as to its function.* Nevertheless, it appears 699 times in the database; perturb a cell and there’s a decent chance you’ll alter VSIG2 expression.
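The post doesn’t spell out how the database-versus-literature disparity is scored. Here is a minimal Python sketch of the remaining steps (ID conversion, the database frequency table, and the comparison) under our own assumptions: database entries are lists of gene symbols, a hypothetical `symbol_to_gene_id` dictionary handles the ID conversion, and the score is the log2 ratio of a gene’s share of database appearances to its share of literature mentions, with a pseudocount so genes with zero mentions don’t cause a division by zero. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def database_frequencies(entries):
    """Count how many database entries (gene lists) mention each symbol."""
    return Counter(symbol for genes in entries for symbol in set(genes))


def rank_underrated(db_counts, lit_counts, symbol_to_gene_id, pseudocount=1.0):
    """Map symbols to Gene IDs, then rank genes by disparity.

    db_counts: symbol -> database appearances
    lit_counts: Gene ID -> literature mentions (e.g. distinct PubMed IDs)
    symbol_to_gene_id: hypothetical symbol -> Gene ID mapping
    Returns (symbol, log2 score) pairs, most underrated first.
    """
    db_total = sum(db_counts.values())
    lit_total = sum(lit_counts.values())
    scores = {}
    for symbol, db_n in db_counts.items():
        gene_id = symbol_to_gene_id.get(symbol)
        if gene_id is None or db_n <= 0:
            continue  # unmappable symbols are one way results get skewed
        lit_n = lit_counts.get(gene_id, 0)
        db_share = db_n / db_total
        lit_share = (lit_n + pseudocount) / (lit_total + pseudocount)
        scores[symbol] = math.log2(db_share / lit_share)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Because genes with no mentions at all get the largest scores here, and the post treats those as the trivial case, one might drop genes whose literature count is zero before ranking.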

We ran a Fisher analysis of an extended list of 500 undervalued genes against our entire database. As might be expected, no particular group is massively enriched. There does seem to be a tendency for genes with short transcripts and genes depleted in P-bodies to be over-represented on the list (unadjusted log P-values of -7.5 and -5.8, respectively). Eyeballing the list, one can also spot a number of ribosomal proteins. Perhaps folks view the ribosome as a big unified glob and don’t care to tinker with its individual components.
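For readers who want to reproduce this kind of enrichment check, a one-sided Fisher’s exact test reduces to a hypergeometric upper tail. The Python sketch below assumes a 2x2 setup chosen for illustration: the gene universe is every gene in the database, and the category is a gene set such as short-transcript genes. The post doesn’t describe how the Fisher app builds its tables, and `fisher_enrichment_p` is a hypothetical name.

```python
from math import comb


def fisher_enrichment_p(overlap, list_size, category_size, universe_size):
    """One-sided Fisher's exact test for over-representation.

    P(X >= overlap), where X counts category members in a list of
    `list_size` genes drawn without replacement from `universe_size` genes,
    `category_size` of which belong to the category (hypergeometric tail).
    """
    total = comb(universe_size, list_size)
    max_k = min(list_size, category_size)
    tail = sum(
        comb(category_size, k) * comb(universe_size - category_size, list_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total  # exact integer arithmetic until this final division
```

For example, `fisher_enrichment_p(overlap=40, list_size=500, category_size=1000, universe_size=20000)` asks how surprising 40 category genes would be in a 500-gene list (the numbers are made up). SciPy’s `scipy.stats.fisher_exact` with `alternative="greater"` on the corresponding 2x2 table should give the same P-value.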

The opposite task, that of generating a list of “overrated” genes, is even trickier, and we won’t bother with it here. In the end, genes like TP53 would dominate the list and, given TP53’s role in cancer, labeling it “overrated” or “overstudied” would hardly be fair.

 

*Let’s say you want to know about VSIG2’s function. You can use our tools. First, enter VSIG2 into our Coregulation app to get a list of coregulated genes. Then enter that list into the Fisher app. To spare you even this minimal trouble: the swarm of genes coexpressed with VSIG2 looks to be heavily involved in the cell cycle, altered by a large array of common drugs (e.g., glucosamine), and relevant to viral infections. The Coregulation tool alone shows that individual genes strongly coexpressed with VSIG2 include TRIB3, CHAC1, ASNS, and many more. You can also see that CA9, FAM111B, NREP, and others have a fairly strong tendency to be expressed in the opposite direction to VSIG2 (i.e., when VSIG2 is up, CA9 tends to be down).
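The post doesn’t say how the Coregulation app scores coexpression. Purely as a hypothetical illustration of the idea (genes that move with VSIG2 versus against it), the Python sketch below assumes each gene has a list of per-experiment direction calls (+1 up, -1 down, 0 unchanged) aligned across the same experiments, and scores each gene by its average directional agreement with the query gene. The input format and the function names `coregulation_scores` and `top_partners` are our assumptions.

```python
def coregulation_scores(directions, query, min_shared=1):
    """Score each gene's directional agreement with `query`.

    directions: gene -> per-experiment calls (+1 up, -1 down, 0 unchanged),
    all aligned to the same experiments (hypothetical format).
    Returns gene -> score in [-1, 1]: +1 means always the same direction as
    `query`, -1 means always opposite. Only experiments where both genes
    changed are counted; genes with fewer than `min_shared` such
    experiments are skipped.
    """
    q_calls = directions[query]
    scores = {}
    for gene, calls in directions.items():
        if gene == query:
            continue
        # +1 when both moved the same way, -1 when they moved in opposite ways.
        products = [a * b for a, b in zip(q_calls, calls) if a != 0 and b != 0]
        if products and len(products) >= min_shared:
            scores[gene] = sum(products) / len(products)
    return scores


def top_partners(scores, n=3):
    """Return the n most coregulated and n most anti-regulated genes."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    coregulated = [g for g, s in ordered[:n] if s > 0]
    anti = [g for g, s in reversed(ordered[-n:]) if s < 0]
    return coregulated, anti
```

In the workflow the footnote describes, the coregulated list from a step like this would then go into the Fisher app for enrichment against gene categories.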


whatismygene.com 


