This software is being further developed. You might occasionally discover new features that haven't yet made it into the manual. We apologise in advance if this causes confusion, but we want to make new features available as soon as possible.
general settings
The general section lets you select the species (at present, only human genes can be queried, but mouse genes will be included soon) and, in a drop-down menu, the order in which the results are displayed (order / prioritise genes). Genes can be ordered by their physical position on the chromosome (default) or by other features. GeneDistiller can also perform a prioritisation based on user-defined criteria. These options are indicated by a preceding 'prioritise' and focus the prioritisation on different kinds of information (e.g. phenotype data or tissue-specific expression). When a prioritisation is selected, another segment appears that allows fine-tuning of the weights assigned to each parameter. This can be used to raise or lower the influence of certain kinds of data on the prioritisation.
First, the possible candidate genes must be defined. This can be done in several ways:
* Physical positions below 400 are treated as Mb.
gene-based
Target genes are specified either by their NCBI Entrez gene ID or by their HGNC gene symbol (in distinct input fields). It is also possible to search for known disease genes (by OMIM ID or keyword), which are then treated as candidate genes. These different means of addressing target genes can be combined. However, a search within an interval cannot be combined with a search for single genes: if you specify both, the region will be ignored.
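The precedence rule above, together with the note on physical positions, can be sketched as follows. This is only an illustration: the function names and the returned dictionary are hypothetical and not part of GeneDistiller, and treating values of 400 or more as base pairs is our assumption.

```python
def to_base_pairs(position):
    """Interpret a physical position; values below 400 are taken as Mb."""
    if position < 400:
        return int(round(position * 1_000_000))
    # Assumption: larger values are already given in base pairs.
    return int(position)


def resolve_targets(gene_ids=(), region=None):
    """Return the effective query; single genes win over a region."""
    if gene_ids:
        # If both are specified, the region is ignored.
        return {"mode": "genes", "genes": list(gene_ids)}
    if region is not None:
        start, end = (to_base_pairs(p) for p in region)
        return {"mode": "region", "start": start, "end": end}
    return {"mode": "none"}
```

For instance, `resolve_targets(["TSC2"], region=(1.5, 3.8))` would query the gene only, whereas `resolve_targets(region=(1.5, 3.8))` would query the interval from 1.5 Mb to 3.8 Mb.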
genome-wide / mitochondriome
If whole genome analysis or mitochondrial genes à la MitoCarta analysis is selected, the whole genome or all genes encoding mitochondrial proteins, respectively, will be analysed. This option ignores pre-set intervals!
The output can be restricted to certain types of genes using the type drop-down menu, e.g. to protein-coding genes, non-coding RNAs (ncRNA), etc.
number of target genes
The number of genes within an interval can be quickly accessed by clicking the count button. Here, no filters except for the limits directly above will be applied.
comparison with known genes or phenotypes
Target genes (defined in Target genes) can be compared with genes known to cause similar disease phenotypes. These genes can be defined either by their HGNC symbol (synonyms cannot be used) or by their NCBI ID (both specified in the same input field, comma-separated). Furthermore, OMIM IDs or parts of the title of an OMIM record can be entered; in this case, all genes connected with the respective OMIM records are added to the list of genes to be compared.
Related genes (or proteins) can be found by similarity of expression patterns across all available tissues (co-expression, measured by Pearson correlation) and/or via protein-protein interactions. Both options are activated with the check boxes compare expression (which might take a few extra milliseconds or even seconds) and search for interactions in the first row of this menu. Additionally, the annotations (such as protein domains, KEGG pathways or GO IDs) of the genes with which the target genes are compared are examined. Similarities are highlighted and, if desired, used for the prioritisation.
If a cut-off value is entered, only similarly expressed genes (with a score above the cut-off) are listed. Expression similarity ranges from -1 to 1, where -1 indicates a perfect negative correlation and 1 a perfect positive correlation.
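To make the cut-off concrete, here is a minimal sketch of co-expression by Pearson correlation over per-tissue expression profiles. The function names and the data layout (one list of expression values per gene, in the same tissue order) are assumptions for illustration; the manual does not describe how GeneDistiller computes this internally.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long expression profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def similar_genes(reference, candidates, cutoff):
    """Keep candidates whose correlation with the reference exceeds the cut-off."""
    scores = {name: pearson(reference, profile)
              for name, profile in candidates.items()}
    return {name: r for name, r in scores.items() if r > cutoff}
```

A candidate whose profile rises and falls with the reference across tissues scores close to 1 and passes a cut-off of, say, 0.5; one with the opposite pattern scores close to -1 and is dropped.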
Here, the users can choose which data shall be included in the output and how the output shall be formatted. They can also choose to shorten some phenotypic information to the first paragraph (OMIM) or the title (MGD phenotypes). Please note that the MGD phenotype information for human genes does not include any details.
Hyperlinks to the original data source are provided whenever possible. For each transcript, a further hyperlink to ExonPrimer is generated. The settings made here do not affect the GeneDistiller prioritisation or the selections made by the user, i.e. user-defined conditions are applied even when the corresponding data are not displayed.
phenotypes
This section allows choosing which phenotypic information shall be included.
In this segment, users can choose among different tissues that can be queried for expression. Tissues are sorted hierarchically. The checkboxes on the left include tissue-specific expression in the output. The elements on the right define conditions, which are used either to exclude genes that do not fulfil them (show only genes fulfilling the conditions) or to prioritise genes according to how well they fulfil them (rank genes by their fulfillment of conditions). To include a condition, select the operator (usually >) and specify a value. The value refers to the gene's expression in this tissue and is defined as [value] x median expression. Hence, to find a gene that is expressed above the median, the condition is simply '>1'. Use higher values to screen for genes with an expected high expression. For exclusive queries, conditions can be combined with either AND or OR (Connect expression conditions with).
When more than one probe exists for a given gene (according to Affymetrix' annotations), the mean of the expression values of all these probes will be used.
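The expression conditions described above can be sketched like this. It is an illustrative model only: the names are hypothetical, expression values are assumed to be already expressed as multiples of the median, and only the operators > and < are modelled.

```python
import operator
from statistics import mean

OPERATORS = {">": operator.gt, "<": operator.lt}


def gene_expression(probe_values):
    """Average all probes of a gene, as done when several probes exist."""
    return mean(probe_values)


def fulfils(expression, conditions, connect="AND"):
    """Check a gene against expression conditions.

    expression: tissue -> value in multiples of the median expression
    conditions: list of (tissue, operator, threshold) tuples
    connect:    "AND" requires all conditions, "OR" at least one
    """
    results = [OPERATORS[op](expression[tissue], threshold)
               for tissue, op, threshold in conditions]
    return all(results) if connect == "AND" else any(results)
```

With `expression = {"whole brain": gene_expression([1.2, 1.8]), "prefrontal cortex": 2.0}` and the conditions `[("whole brain", ">", 1), ("prefrontal cortex", ">", 3)]`, the gene is kept when the conditions are connected with OR but excluded with AND.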
Please note that although prioritisation according to tissue-specific expression will normally list the correct gene among the top 5 genes (provided the right tissues have been selected), the correct gene will not always rank first.
In this part of the query interface, the localisation (cellular, extracellular or organellar) of the gene products can be queried. The locations are presented in a hierarchical structure and can be selected by checking the boxes. If a gene product is located in one of the selected structures (or in a substructure of one of them), its location will be highlighted in the output.
Above the localisations, a further checkbox restricts the output to those genes that fulfil the conditions, i.e. whose gene products are located in any of the selected structures.
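Matching against a selected structure or any of its substructures amounts to walking up the hierarchy. The sketch below assumes a simple child-to-parent table; the entries and function names are hypothetical and do not reproduce GeneDistiller's actual localisation tree.

```python
# Hypothetical excerpt of a localisation hierarchy: child -> parent.
PARENT = {
    "mitochondrial inner membrane": "mitochondrion",
    "mitochondrion": "cytoplasm",
    "cytoplasm": "cell",
    "nucleus": "cell",
}


def ancestors_and_self(location):
    """Yield a location followed by all of its parent structures."""
    while location is not None:
        yield location
        location = PARENT.get(location)


def matching_locations(gene_locations, selected):
    """Return the gene's locations that lie in a selected structure
    or in a substructure of one."""
    return [loc for loc in gene_locations
            if any(a in selected for a in ancestors_and_self(loc))]
```

A gene product annotated with "mitochondrial inner membrane" would thus be highlighted when "mitochondrion" is selected; a gene for which the returned list is empty would be removed when the restricting checkbox is ticked.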
The prioritisation settings allow fine-tuning the weight given to each parameter when a prioritisation approach is chosen. For instance, the impact of occurrences of search terms in OMIM reports can be increased by increasing the value assigned to OMIM text. When fields are set to zero, these parameters will not be used for the prioritisation.
This section opens automatically when a prioritisation approach is selected under order / prioritise genes; values entered here are not considered when another sorting approach is chosen.
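The effect of the weights can be illustrated with a simple weighted sum. Note that the manual does not state the actual scoring formula, so the weighted sum, the function names and the evidence values below are assumptions chosen only to show how raising a weight or setting it to zero changes the ranking.

```python
def priority_score(evidence, weights):
    """Weighted sum of per-parameter evidence; a weight of 0 drops the parameter."""
    return sum(weights.get(name, 0) * value for name, value in evidence.items())


def rank(genes, weights):
    """Order gene names by descending score."""
    return sorted(genes, key=lambda g: priority_score(genes[g], weights),
                  reverse=True)


# Hypothetical evidence values for two genes.
genes = {
    "geneA": {"OMIM text": 1.0, "expression": 0.2},
    "geneB": {"OMIM text": 0.0, "expression": 0.9},
}
```

With the weights `{"OMIM text": 5, "expression": 1}`, geneA ranks first; setting the OMIM text weight to zero removes that parameter from the score and geneB moves to the top.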
Show all protein-coding genes in the interval defined by the microsatellite markers D15S1042 and D15S659, ordered by their position. To create the interval, set microsatellite (from) to D15S1042 and microsatellite (to) to D15S659. Now change type to protein coding to display only the protein-coding genes. The sorting order is adjusted with the drop-down menu order / prioritise genes; position is the default value.
Show example #1.
Leigh syndrome is a disease group with a number of different aetiologies. Common to this disease group, however, is that the affected genes have a function in the mitochondrion. As genes involved in the same disease, pathway or organelle (here the mitochondrial genes) are frequently co-regulated, the prioritisation is focussed on genes with a common mitochondrial expression pattern and mitochondrial organellar localisation. Imagine that mutations in the LRPPRC gene had not yet been found as the cause of the French-Canadian type of Leigh syndrome. A candidate region of 5.2 cM between the markers D2S2294 and D2S2291, containing 15 genes, would have been mapped. We therefore choose the prioritisation setting (order / prioritise genes) prioritise with focus on possible pathways (interaction and expression similarity) and additionally increase the weight of the prioritisation settings Maestro score and Mitopred to 5.
Show example #2.
As a result we see the LRPPRC gene ranking top, mainly due to its expression correlation with other Leigh syndrome related genes (PDHA1, COX15, NDUFV1, PC, SURF1, NDUFS3, NDUFS4, NDUFS8, DLD, NDUFS7) and its mitochondrial localisation.
Imagine that TSC2 had not yet been found as a second gene that causes tuberous sclerosis when mutated. A pedigree with several individuals affected with tuberous sclerosis would have been mapped and a candidate region between D16S521 and D16S3124 delineated. This interval comprises 2.3 Mbp and 126 known genes. As the researcher already knows that TSC1 may cause tuberous sclerosis when mutated, he or she may assume that the new candidate in the interval interacts with TSC1 or shows the same expression. We therefore choose the prioritisation setting (order / prioritise genes) prioritise with focus on possible pathways (interaction and expression similarity).
Show example #3.
As a result, we see the TSC2 gene ranking top (score = 36.9; the next gene scores only 8.2) among the 126 genes in the region, mainly because of its protein-protein interaction with TSC1.
In a homozygosity mapping for nephronophthisis, a target region between microsatellite D16S475 and SNP rs1529917 was found to be associated with the disease. These markers limit the interval to be analysed by GeneDistiller, and we enter them in the appropriate fields (microsatellite, from: D16S475; dbSNP ID, to: rs1529917). We also hope to find a renal phenotype described for the disease-causing gene in MGD. In the query interface, we therefore select renal/urinary system phenotype under highlight these MGD phenotypes and check show only genes to which at least one of these phenotypes was assigned to reduce the number of genes. We also expect the gene to be expressed in the kidney, so we open the expression tab and check the box to the left of kidney.
Assuming this is all the information we have in advance, the query mask would look like this:
Show example #4.
After clicking on submit, GeneDistiller will display 2 genes (out of 48 that are located in the interval). While one of the genes (DNASE1) comes with detailed data, the information available for the other one (GLIS2) is scarce (there are, for instance, no interactions listed and no expression values available). However, we can read in the OMIM report that the latter gene is indeed responsible for nephronophthisis.
Note: We have not used OMIM terms, because the OMIM entry was created after this gene was identified.
This example suggests candidate genes on the basis of their tissue-specific expression in the brain or its substructures (ordered by: prioritise with focus on tissue-specific expression). Here, genes likely to be involved in GEFS+ (generalized epilepsy with febrile seizures plus) are ranked. Note that no phenotypic criteria are given in the first example. In example 5b, more background knowledge is applied.
Show example #5a. | Show example #5b.
Genes are filtered for those with a known murine nervous system phenotype and behaviour/neurological phenotype (select both values in the MGD phenotypes and limit the query to the respective genes with the show only genes to which at least one of these phenotypes was assigned checkbox). The list can be condensed further when known human phenotypes are considered: enter the broad term brain into the field highlight these keywords and restrict the search to genes in whose descriptions this keyword appears (check show only genes with at least one of these words in their OMIM reports). Note that the more specific term epilepsy is not used in this example because we cannot be sure in advance that our candidate is already known to cause epilepsy in humans.
Since a gene responsible for epilepsy is likely to be expressed in the brain, open the expression tab and select >1 (x median) for the expression in whole brain. To restrict the list to genes expressed above the median, select show only genes fulfilling the conditions. Setting a filter for prefrontal cortex expression > 3 (x median) and connecting both expression filters with AND further shortens the list. Add the Gene Ontology ID for ion transport (GO:0006811) to the highlight these GO IDs field and restrict the search to genes carrying this GO ID or a subclass (show only genes to which at least one of these GO IDs applies checkbox).
Now only 2 genes, SCN1A and SCN3A, remain in the list, both of which are excellent candidates for an epilepsy phenotype.
If you change the order / prioritise genes drop-down to prioritise with focus on possible pathways, uncheck all the restricting checkboxes and change the expression setting to rank genes by their fulfillment of conditions, a prioritisation strategy will be applied. To search for similarities with genes known to be involved in epilepsy, enter the term epilepsy into the compare with these OMIM entries (MIM ID or keyword) field and check compare expression and search for interactions.
Again, SCN1A will be listed on top. Another gene, SCN2A, will appear as the second-best candidate; it was not considered in the selection approach because no mouse phenotypes have been described yet.
GeneDistiller prints the desired data in HTML format. If figures, e.g.
for expression data, are included, they will be produced as PNG and
seamlessly integrated into the output.
Below the actual output, two hyperlinks are presented. The first one restores the query mask with the settings made by the user; the second restores the actual output. Bookmark the second link if you only want to return to the output page (you might as well save the page); bookmark the first one if you might want to modify your settings at a later date.
The output includes hyperlinks to the original data on the providers' web pages.
If you feel that GeneDistiller has helped you in your research, please cite the following publication: Seelow D, Schwarz JM, Schuelke M.