Monday, November 23, 2020

A New Tool: Intersect

We've added a new tool that we simply named "intersect." This tool allows you to find the genes that are common to two sets.

If you enter two gene sets of the same type (and check the "same type" box), you'll receive the intersecting genes. You could just as easily do this in an Excel spreadsheet using, for example, the VLOOKUP function. In this case, you'll receive output fairly quickly because there's no need to dip into any database of gene IDs.
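For readers who'd rather script the same-type case than use a spreadsheet, here is a minimal sketch in Python. It is not the code behind our tool; the function name and the choice to trim whitespace and ignore letter case are assumptions made for illustration.

```python
def intersect_same_type(list_a, list_b):
    """Return the IDs from list_a that also appear in list_b.

    Assumption for this sketch: IDs are compared after trimming
    whitespace and upper-casing, and results keep list_a's order.
    """
    ids_b = {gene.strip().upper() for gene in list_b if gene.strip()}
    common = []
    already_added = set()
    for gene in list_a:
        key = gene.strip().upper()
        # Keep each shared ID once, in the order it appears in list_a.
        if key and key in ids_b and key not in already_added:
            already_added.add(key)
            common.append(key)
    return common


print(intersect_same_type(["STAT6", "TP53", "brca1"], ["BRCA1", "STAT6", "MYC"]))
# prints ['STAT6', 'BRCA1']
```

Because both lists already use the same ID type, nothing needs to be looked up, which is why this case is quick.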

However, this tool handles two situations that would be problematic for Excel. First, you can find the intersection of two gene sets that have different ID types (e.g. ENSG0000012345 and STAT6). The algorithm is simple but requires a fair amount of processing: convert both gene lists to the format used in our database, find all common IDs, and then convert those IDs back to UniProt (e.g. STAT6) format. If a UniProt ID is not available for a particular gene, we may output a different format. Second, you can use our own database IDs as input. This way you can find the genes that your own set has in common with the sets found in our database. Given the processor-intensive nature of the tool, we limit each list of genes to 400 IDs unless the "same type" box is checked.
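The convert, intersect, and re-convert steps described above can be sketched roughly as follows. This is only an illustration, not our production code: the lookup tables, the made-up IDs in them (ENS_A, SYM_A and so on), and the function name are all hypothetical stand-ins for the real database.

```python
MAX_MIXED_LIST_SIZE = 400  # the limit mentioned above for mixed-type input

# Hypothetical lookup tables with made-up IDs; a real system would
# query a database of gene IDs instead.
TO_INTERNAL = {"ENS_A": 1, "SYM_A": 1, "ENS_B": 2, "SYM_B": 2, "ENS_C": 3}
PREFERRED_OUTPUT = {1: "SYM_A", 2: "SYM_B"}  # internal ID 3 has no preferred form
FALLBACK_OUTPUT = {3: "ENS_C"}


def intersect_mixed(list_a, list_b):
    """Intersect two gene lists whose ID types may differ."""
    for genes in (list_a, list_b):
        if len(genes) > MAX_MIXED_LIST_SIZE:
            raise ValueError("each list is limited to 400 IDs")

    # Step 1: convert both lists to the internal format, skipping unknown IDs.
    internal_a = {TO_INTERNAL[g] for g in list_a if g in TO_INTERNAL}
    internal_b = {TO_INTERNAL[g] for g in list_b if g in TO_INTERNAL}

    # Step 2: find the IDs common to both lists.
    common = internal_a & internal_b

    # Step 3: convert back to the preferred output format, falling back
    # to another format when no preferred ID is available.
    return sorted(
        PREFERRED_OUTPUT.get(i, FALLBACK_OUTPUT.get(i, str(i))) for i in common
    )


print(intersect_mixed(["ENS_A", "ENS_B", "ENS_C"], ["SYM_A", "ENS_C", "SYM_Z"]))
# prints ['ENS_C', 'SYM_A']
```

The two conversion steps are where the processing cost comes from, since every ID in both lists has to be looked up before any comparison can happen.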

Ideally, we'd output the intersecting genes any time you use, say, our "Fisher" tool. Again, though, the processing involved makes this problematic.

Have fun with the tool. What happens if you enter a list of human genes and a list of C. elegans genes? You'll get a list of orthologs that are found in both sets, output as human UniProt IDs. We can't claim the process is infallible since, for example, our database may not have an absolutely complete list of C. elegans genes that are orthologous to human genes. But it should work quite well.


whatismygene.com 


 
