GeneDistiller - technical documentation

Gene prioritisation

This figure depicts different ways in which genes can be prioritised. There are model-free approaches that simply rely on gene length or the number of interaction partners, and approaches based on the researcher's assumptions. Data sources incorporated into GeneDistiller are shown as yellow boxes.

GeneDistiller supports the combination of different approaches. It can either prioritise genes automatically or simply print all data selected by the researcher to an HTML page that includes hyperlinks to the external data. The HTML page can be printed or saved for later reference. GeneDistiller can also be used to examine relevant information on genes obtained by other applications, so that those findings can be validated manually.

Updates (data)

GeneDistiller is updated on a regular basis, using the data sources described in the main manual. Gene-specific data are linked to the genes via the Entrez Gene ID, if applicable, or otherwise via the ENSEMBL gene ID. All positional data refer to the NCBI Build 36.3 reference sequence (Celera positions are ignored). The date of the last update of every entity is shown below the output.
Due to limited resources, old entries are not stored, i.e. it is not possible to restore past results.

Updates (software)

The software is updated whenever errors have to be fixed or new features are added.

API

Although we do not offer a fully featured API yet, GeneDistiller can already be called from external applications. There are two possibilities: you can open either the query interface with your own data filled in, or the results page. The parameters are the same for both.

target URL
query interface http://www.genedistiller.org/GD/API.cgi
results page http://www.genedistiller.org/GD/results.cgi

parameter parameter (CGI) values example

sorting
     
order / prioritisation order
meaning value
order by position start_pos
order by gene symbol genesymbol
order by expression similarity expression_similarity
order by interaction score interaction_score
order by Maestro score maestro
order by Mitopred score mitopred
order by tissue-specific expression expression
prioritise with focus on mitochondrial genes overall_score_mitochondrial
prioritise with focus on possible pathways (interaction and expression similarity) overall_score_pathway
prioritise with focus on tissue specificity overall_score_tissue_specifity
order=start_pos

interval definition / candidate genes
     
species (so far human only; mouse is in beta state and will follow) txid 9606 (Homo sapiens) txid=9606
chromosome chromosome 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 0
(23: X, 24: Y, 0: MT)
chromosome=21
region start: phys. position start_pos position in bases or Mb (values below 400 are considered as Mb) start_pos=160000000
region end: phys. position end_pos position in bases or Mb (values below 400 are considered as Mb) end_pos=170
region start: start gene (gene symbol) start_gene gene symbol start_gene=rasgrp1
region end: end gene (gene symbol) end_gene gene symbol end_gene=rasgrp1
region start: SNP start_snp dbSNP ID (with or without leading rs) or
Affymetrix SNP ID (complete)
start_snp=36111378
region end: SNP end_snp dbSNP ID (with or without leading rs) or
Affymetrix SNP ID (complete)
end_snp=36111378
region start: microsatellite start_microsat microsatellite name start_microsat=d15s1042
region end: microsatellite end_microsat microsatellite name end_microsat=d15s1042
genes (Entrez Gene ID) gene_no number(s), comma-separated gene_no=10125,400
genes (gene symbol) genesymbol gene symbols, comma-separated, % or * as placeholders genesymbol=rasgrp1,lbr
gene type gene_type
all 0
snRNA 1
pseudo 2
rRNA 3
unknown 4
other 5
snoRNA 6
protein-coding 7
tRNA 8
miscRNA 9
gene_type=0
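
The interval rules above (numeric chromosome codes, positions below 400 read as Mb) can be mirrored on the client side before a request is built. The following is only a sketch: the function names are hypothetical, and the rounding used for Mb values is an assumption, since the documentation does not state how the server rounds.

```python
def chromosome_code(name):
    """Map a chromosome label to the numeric code of the 'chromosome' parameter."""
    special = {"X": 23, "Y": 24, "MT": 0}
    label = str(name).strip().upper()
    if label in special:
        return special[label]
    number = int(label)  # raises ValueError for anything non-numeric
    if not 0 <= number <= 24:
        raise ValueError(f"unknown chromosome: {name!r}")
    return number


def position_in_bases(value):
    """Apply the documented rule: values below 400 are Mb, larger ones are bases."""
    value = float(value)
    if value < 400:
        return int(round(value * 1_000_000))  # rounding is an assumption
    return int(value)


def interval_params(chromosome, start, end):
    """Build the interval parameters, rejecting intervals that end before they start."""
    if position_in_bases(start) > position_in_bases(end):
        raise ValueError("region start lies behind region end")
    return {
        "chromosome": chromosome_code(chromosome),
        "start_pos": start,
        "end_pos": end,
    }


print(position_in_bases(170))
print(interval_params("X", 1.5, 2500000))
```

The original values are passed through unchanged; the conversion is used only to check that the interval is plausible.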

comparison with known genes
     
compare gene expression with known (disease) genes compare_expression 1 if true compare_expression=1
search for interactions with known (disease) genes compare_interactions 1 if true compare_interactions=1
comparison with these genes comparison_genes Entrez gene IDs or gene symbols, comma separated comparison_genes=10125,TP53
comparison with these OMIM entries comparison_mim OMIM ID or keyword, comma separated comparison_mim=epilepsy
expression similarity filter expression_similarity_cutoff -1 .. 1
values ~ 1: highly similar
values ~ 0: no similarity
values ~ -1: negative correlation
expression_similarity_cutoff=0.8

display options
     
output as spreadsheet create_spreadsheet empty for HTML
tsv for plain text
xls for Microsoft Excel
create_spreadsheet=tsv
show synonyms show_synonyms 1 if true show_synonyms=1
show NCBI geneRIFs show_generifs 1 if true show_generifs=1
show exons show_exons 1 if true show_exons=1
show transcripts show_transcripts 1 if true show_transcripts=1
show protein families show_proteinfamilies 1 if true show_proteinfamilies=1
show InterPro domains show_interpro 1 if true show_interpro=1
show Pfam domains show_pfam 1 if true show_pfam=1
show paralogs show_paralogs 1 if true show_paralogs=1
show pathways show_pathways 1 if true show_pathways=1
show OMIM record(s) show_omim 1 if true show_omim=1
(OMIM records full length) omim_full 1 if true omim_full=1
show MGD phenotypes show_mgd_phenotypes 1 if true show_mgd_phenotypes=1
(MGD phenotypes full length) mgd_full 1 if true mgd_full=1
show Maestro & Mitopred scores show_maestro 1 if true show_maestro=1
show Gene Ontology show_GO 1 if true show_GO=1
show interaction data show_interaction 1 if true show_interaction=1
show expression data as table
(both options for expression data can be used)
expression_as_table 1 if true expression_as_table=1
show expression data as image
(both options for expression data can be used)
expression_as_image 1 if true expression_as_image=1

highlighting / filtering
     
full-text search in OMIM etc. for these keywords
(will be highlighted)
keywords words, separated by commas or spaces keywords=epilepsy,neuro,brain
show only genes to which at least one of the keywords applies restrict_to_keywords 1 if true restrict_to_keywords=1
highlight these MGD phenotypes mgd_phenotypes
pigmentation phenotype 1186
tumorigenesis 2006
normal phenotype 2873
no phenotypic analysis 3012
nervous system phenotype 3631
renal/urinary system phenotype 5367
muscle phenotype 5369
liver/biliary system phenotype 5370
limbs/digits/tail phenotype 5371
life span-post-weaning/aging 5372
lethality-postnatal 5373
lethality-embryonic/perinatal 5374
adipose tissue phenotype 5375
homeostasis/metabolism phenotype 5376
hearing/vestibular/ear phenotype 5377
growth/size phenotype 5378
endocrine/exocrine gland phenotype 5379
embryogenesis phenotype 5380
digestive/alimentary phenotype 5381
craniofacial phenotype 5382
cellular phenotype 5384
cardiovascular system phenotype 5385
behaviour/neurological phenotype 5386
immune system phenotype 5387
respiratory system phenotype 5388
reproductive system phenotype 5389
skeleton phenotype 5390
vision/eye phenotype 5391
touch/vibrissae phenotype 5392
skin/coat/nails phenotype 5393
taste/olfaction phenotype 5394
other phenotype 5395
haematopoietic system phenotype 5397
mgd_phenotypes=5386,2006
show only genes to which at least one of the MGD phenotypes applies restrict_to_mgd_phenotypes 1 if true restrict_to_mgd_phenotypes=1
highlight these GO IDs (or their children in the GO DAG) go_ids GO IDs, comma-separated go_ids=5248,6814
show only genes to which at least one of these GO IDs (or their children in the GO DAG) applies restrict_to_go_ids 1 if true restrict_to_go_ids=1
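
The go_ids example uses bare numbers (5248, 6814) rather than the usual GO:0005248 notation. Assuming that the numeric part without leading zeros is what the parameter expects (this is inferred from the example only), a small helper with hypothetical names could normalise either form:

```python
import re


def go_number(go_id):
    """Return the numeric part of a GO ID, e.g. 'GO:0005248' or '5248' -> 5248."""
    match = re.fullmatch(r"(?:GO:)?(\d{1,7})", str(go_id).strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a GO ID: {go_id!r}")
    return int(match.group(1))


def go_ids_value(go_ids):
    """Join several GO IDs into the comma-separated value of the go_ids parameter."""
    return ",".join(str(go_number(go_id)) for go_id in go_ids)


print(go_ids_value(["GO:0005248", "6814"]))
```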

gene expression
     
expression conditions (general)      
show expression in a tissue expression_show_TISSUE# 1 if true expression_show_7=1
  expression_operator_TISSUE# >: empty, <: '<' expression_operator_7=%3C
  expression_value_TISSUE# x times of median expression expression_value_7=2
  tissue numbers
heart 27
atrioventricular node 7
cardiac myocytes 14
kidney 29
adrenal cortex 3
adrenal gland 4
wholeblood 78
BDCA4+ dendritic cells 45
CD14+ monocytes 46
CD19+ B cells 47
CD4+ T cells 48
CD56+ NK cells 49
CD8+ T cells 50
721 B lymphoblasts 1
bone marrow 12
CD105+ endothelial 8
CD33+ myeloid 9
CD34+ 10
CD71+ early erythroid 11
appendix 6
tongue 72
tonsil 73
trachea 74
thymus 70
thyroid 71
uterus 76
uterus corpus 77
salivary gland 56
liver 33
lung 34
bronchial epithelial cells 13
lymph node 35
ovary 41
pancreas 42
pancreatic islets 43
placenta 52
prostate 55
skin 58
adipocyte 2
testis 64
testis germ cell 65
testis interstitial 66
Leydig cell (testis) 67
testis seminiferous tubule 68
whole brain 79
thalamus 69
hypothalamus 28
cerebellum 16
cerebellar peduncles 17
amygdala 5
prefrontal cortex 54
cingulate cortex 19
pons 53
parietal lobe 44
pituitary 51
temporal lobe 63
medulla oblongata 38
occipital lobe 39
olfactory bulb 40
subthalamic nucleus 61
globus pallidus 26
caudate nucleus 15
superior cervical ganglion 62
trigeminal ganglion 75
ciliary ganglion 18
spinal cord 60
dorsal root ganglion 21
smooth muscle 59
skeletal muscle 57
fetal brain 22
fetal liver 23
fetal lung 24
fetal thyroid 25
leukemia, chronic myelogenous (k562) 30
leukemia, lymphoblastic (molt4) 31
leukemia, promyelocytic (hl60) 32
colorectal adenocarcinoma 20
Burkitts lymphoma (Daudi) 36
Burkitts lymphoma (Raji) 37
 
shall expression conditions be joined by 'AND' or 'OR'? expression_logic AND, OR expression_logic=AND
shall genes be filtered according to their expression? expression_logic2
exclusive show only genes to which the conditions apply
scoring score only, don't suppress anything
expression_logic2=scoring
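
Each expression condition consists of up to three parameters that share the tissue number. The following sketch shows how such a set could be assembled; the helper names are hypothetical and only a few tissue numbers from the list above are included. Sending an empty operator for '>' follows the table ("empty"), but whether the parameter may also be omitted entirely is not documented.

```python
# A few tissue numbers copied from the list above.
TISSUES = {"heart": 27, "kidney": 29, "liver": 33, "whole brain": 79}


def expression_condition(tissue, fold, below=False):
    """Parameters for one tissue: expression above (default) or below fold x median."""
    number = TISSUES[tissue]
    return {
        f"expression_show_{number}": 1,
        f"expression_operator_{number}": "<" if below else "",
        f"expression_value_{number}": fold,
    }


def expression_params(conditions, logic="AND", mode="scoring"):
    """Combine several (tissue, fold, below) conditions into one parameter set."""
    if logic not in ("AND", "OR"):
        raise ValueError("expression_logic must be AND or OR")
    if mode not in ("exclusive", "scoring"):
        raise ValueError("expression_logic2 must be exclusive or scoring")
    params = {}
    for tissue, fold, below in conditions:
        params.update(expression_condition(tissue, fold, below))
    params["expression_logic"] = logic
    params["expression_logic2"] = mode
    return params


# More than twice the median in heart AND less than the median in liver.
print(expression_params([("heart", 2, False), ("liver", 1, True)]))
```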
(extra-) cellular localisation      
localisations to highlight (children are also highlighted) cell_comp_COMP# 1 if true

COMP#:
cell 5623
cell part 44464
cell fraction 267
PME fraction 1950
membrane fraction 5624
integral to membrane of membrane fraction 299
peripheral to membrane of membrane fraction 300
synaptosome 19717
vesicular fraction 42598
microsome 5792
rough microsome 19718
smooth microsome 19719
intracellular 5622
intracellular part 44424
exosome (RNase complex) 178
proteasome complex (sensu Eukaryota) 502
extrachromosomal circular DNA 5727
cytoplasm 5737
sarcoplasm 16528
sarcoplasmic reticulum 16529
sarcoplasmic reticulum membrane 33017
sarcoplasmic reticulum lumen 33018
junctional membrane complex 30314
cytoplasmic part 44444
cytoplasmic chromosome 229
mitochondrion 5739
mitochondrial chromosome 262
mitochondrial envelope 5740
mitochondrial permeability transition pore complex 5757
mitochondrial intermembrane space 5758
mitochondrial membrane 31966
mitochondrial matrix 5759
kinetoplast 20023
mitochondrial lumen 31980
mitochondrial degradosome 45025
endosome 5768
early endosome 5769
late endosome 5770
recycling endosome 55037
vacuole 5773
storage vacuole 322
lytic vacuole 323
lysosome 5764
endoplasmic reticulum 5783
smooth endoplasmic reticulum 5790
rough endoplasmic reticulum 5791
ER-Golgi intermediate compartment 5793
Golgi apparatus 5794
Golgi stack 5795
Golgi lumen 5796
Golgi-associated vesicle 5798
cis-Golgi network 5801
trans-Golgi network 5802
Golgi transport complex 17119
lipid particle 5811
microtubule organizing center 5815
cytosol 5829
ribosome 5840
large ribosomal subunit 15934
small ribosomal subunit 15935
cell cortex 5938
membrane coat 30117
small cytoplasmic ribonucleoprotein complex 30531
preribosome 30684
cytoplasmic vesicle 31410
mitochondrial cloud 32019
mitosome 32047
microbody 42579
peroxisome 5777
glyoxysome 9514
glycogen granule 42587
yolk granule 42718
contractile fiber 43292
myofibril 30016
smooth muscle contractile fiber 30485
signal recognition particle 48500
inclusion body 16234
intracellular organelle 43229
nucleus 5634
nuclear chromosome 228
nuclear nucleosome 788
nuclear chromatin 790
nuclear envelope 5635
nuclear envelope lumen 5641
nuclear lamina 5652
nuclear membrane 31965
nuclear inner membrane 5637
nuclear outer membrane 5640
nucleoplasm 5654
nucleoplasm part 44451
nucleolus 5730
nuclear microtubule 5880
nuclear matrix 16363
nuclear lumen 31981
cilium 5929
axoneme 5930
glycosome 20015
chromosome 5694
cytoskeleton 5856
cell surface 9986
membrane 16020
outer membrane 19867
external encapsulating structure 30312
cell wall 5618
cell septum 30428
periplasmic space 42597
cell projection 42995
neuron projection 43005
axon 30424
dendrite 30425
nerve terminal 43679
cell projection part 44463
cell soma 43025
axon hillock 43203
perikaryon 43204
type IV protein secretion system complex 43684
apical part of cell 45177
basal part of cell 45178
extracellular matrix 31012
proteinaceous extracellular matrix 5578
collagen 5581
basement membrane 5604
basal lamina 5605
interstitial matrix 5614
fibril 43205
collagen and cuticulin-based cuticle extracellular matrix 60102
extracellular matrix part 44420
synaptic cleft 43083
extracellular region 5576
fibrinogen complex 5577
extracellular space 5615
host cell nucleus 42025
extracellular organelle 43230
extraorganismal space 43245
intercellular bridge 45171
macromolecular complex 32991
symplast 55044
synapse 45202
neuromuscular junction 31594
asymmetric synapse 32279
symmetric synapse 32280
synapse part 44456
excitatory synapse 60076
inhibitory synapse 60077
cell_comp_1950=1
show only genes to which at least one of these localisations (or their children in the GO DAG) applies restrict_to_cell_comp 1 if true restrict_to_cell_comp=1
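
The localisation parameters follow the same pattern as the example cell_comp_1950=1: one parameter per COMP# number. A minimal sketch with a hypothetical helper name:

```python
def localisation_params(comp_numbers, restrict=False):
    """One cell_comp_COMP#=1 parameter per localisation, optionally with the filter."""
    params = {f"cell_comp_{int(number)}": 1 for number in comp_numbers}
    if restrict:
        params["restrict_to_cell_comp"] = 1
    return params


# Highlight mitochondrion (5739) and peroxisome (5777), and show only matching genes.
print(localisation_params([5739, 5777], restrict=True))
```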
       
prioritisation weights / fine-tuning      
Mitopred score weight_mitopred [real] weight_mitopred=5
Maestro score weight_maestro [real] weight_maestro=5
any keyword found in OMIM title weight_omim_title [real] weight_omim_title=3
any keyword found in OMIM clinical synopsis weight_omim_cs [real] weight_omim_cs=2
any keyword found in OMIM record weight_omim_text [real] weight_omim_text=1
any keyword found in geneRIFs weight_generifs [real] weight_generifs=1
MGD phenotype weight_mgd_phenotype [real] weight_mgd_phenotype=2
another MGD phenotype found in the same gene weight_successive_mgd_phenotype [real] weight_successive_mgd_phenotype=2
expression similarity weight_expression_similarity [real] weight_expression_similarity=1
tissue-specific expression (Pearson correlation) weight_tissuespecific_expression [real] weight_tissuespecific_expression=1
cellular localisation (or children) found weight_cellular_component [real] weight_cellular_component=1
any GO term (or children) found weight_go_term [real] weight_go_term=1
another GO term found in the same gene weight_successive_go_term [real] weight_successive_go_term=1
interaction with a target gene
(self-self interactions are not counted!)
weight_interaction [real] weight_interaction=2
interaction with another target gene weight_successive_interaction [real] weight_successive_interaction=1
interaction confirmed by another interaction network weight_successive_network_interaction [real] weight_successive_network_interaction=1
existence of a target PFAM domain weight_pfam [real] weight_pfam=2
existence of another target PFAM domain weight_successive_pfam [real] weight_successive_pfam=1
existence of a target InterPro domain weight_interpro [real] weight_interpro=2
existence of another target InterPro domain weight_successive_interpro [real] weight_successive_interpro=1
member of a target pathway weight_pathway [real] weight_pathway=2
member of another target pathway weight_successive_pathway [real] weight_successive_pathway=1
protein paralogs, weight of sequence identity weight_identity [real] weight_identity=2
       

query interface settings
     
show target genes section target_genes_layer_b show, hide target_genes_layer_b=show
show known genes section known_genes_layer_b show, hide known_genes_layer_b=show
show display options display_options_layer_b show, hide display_options_layer_b=show
show phenotype section phenotypes_layer_b show, hide phenotypes_layer_b=show
show expression section expression_layer_b show, hide expression_layer_b=hide
show cellular localisation section cellular_compartments_layer_b show, hide cellular_compartments_layer_b=hide
show prioritisation fine tuning prioritisation_settings_layer_b show, hide prioritisation_settings_layer_b=hide

POST and GET requests are allowed.
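
A POST request could be prepared as in the following sketch, which uses only the Python standard library. The form-encoded body is an assumption (it is the usual encoding for CGI forms, but the documentation does not name one), and the request is built without being sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

RESULTS_PAGE = "http://www.genedistiller.org/GD/results.cgi"


def build_post_request(params):
    """Prepare (but do not send) a form-encoded POST request to the results page."""
    body = urlencode(params).encode("ascii")
    return Request(
        RESULTS_PAGE,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_post_request({"genesymbol": "rasgrp1,lbr", "show_omim": 1})
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
# Sending it would be urllib.request.urlopen(request), which needs network access.
```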

GET requests can easily be formed by appending a '?' to the URL and joining parameter=value pairs with '&'. Example:
http://www.genedistiller.org/GD/API.cgi?start_microsat=D15S1042&end_microsat=D15S659&order=start_pos&gene_type=7
Please note that some special characters ('>', space, etc.) have to be encoded, see
http://www.w3.org/TR/html4/interact/forms.html#h-17.13.3.3
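
Building the GET URL by hand is error-prone once special characters are involved. As a sketch, the standard library function urllib.parse.urlencode joins the pairs with '&' and encodes the values; build_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

QUERY_INTERFACE = "http://www.genedistiller.org/GD/API.cgi"
RESULTS_PAGE = "http://www.genedistiller.org/GD/results.cgi"


def build_url(base, params):
    """Return base + '?' + encoded parameter=value pairs joined by '&'."""
    return base + "?" + urlencode(params)


url = build_url(
    QUERY_INTERFACE,
    {
        "start_microsat": "D15S1042",
        "end_microsat": "D15S659",
        "order": "start_pos",
        "gene_type": 7,
    },
)
print(url)

# Special characters are encoded automatically, e.g. '<' is sent as %3C.
print(build_url(RESULTS_PAGE, {"expression_show_7": 1, "expression_operator_7": "<"}))
```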

All values are case-insensitive. If the 'API' is not working as you expect, please e-mail us the query, the problem you encountered and what you wanted to do. We will try to fix the error or make the manual more precise.

Implementation

GeneDistiller consists of Perl scripts that automatically retrieve data from the Internet and update the database, show the query interface with user-defined parameters, and query the database. The database has a good old plain SQL schema and runs under PostgreSQL 8.2; recursive queries for GO terms and cellular localisations are handled by a PL/pgSQL function. Special thanks to all those who have been developing these wonderful open source products.
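
To illustrate what resolving a GO term "or its children" amounts to, the following sketch collects a term together with all of its descendants in a DAG. It is not the PL/pgSQL function used by GeneDistiller: the function name is hypothetical, the traversal is written iteratively in Python, and the small parent-to-children mapping is a toy example built from a few numbers in the localisation list above.

```python
def with_descendants(term, children):
    """Return the term and every term reachable from it via child links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a DAG node can be reached along several paths
        seen.add(current)
        stack.extend(children.get(current, ()))
    return seen


# Toy mapping for illustration only (parent -> direct children).
toy_children = {
    5739: [5740, 5759],  # mitochondrion -> envelope, matrix
    5740: [31966],       # mitochondrial envelope -> mitochondrial membrane
}

print(sorted(with_descendants(5739, toy_children)))
```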

The source code

... is not secret. Please write to the authors if you are interested in any of our Perl scripts or the database schema. We do not provide downloads of our database tables and their contents, but with our update script you can easily get the data you're interested in from the original sources.

Contact

If you discover bugs or have suggestions or questions, please write an e-mail to
Markus Schuelke (markus.schuelke AT charite.de) or to
Dominik Seelow (dominik.seelow AT charite.de).
We also appreciate hearing about your general experiences using GeneDistiller.