This software is being further developed. You might occasionally discover new features that haven't yet made it into the manual. We apologise in advance if this causes confusion, but we want to make new features available as soon as possible.
First, the candidate genes must be defined. This can be done in several ways:
interval-based
* Physical positions below 400 are treated as Mb.
gene-based
Target genes are specified either by their NCBI Entrez gene ID or by their
HGNC gene symbol (in separate input fields). It is also possible to
search for known disease genes (by OMIM ID or keyword), which are then treated as candidate genes. These
ways of specifying target genes can be combined with one another. However, a search within an interval cannot be
combined with a search for single genes: if you specify both, the interval will be ignored.
genome-wide / mitochondriome
If whole genome analysis or mitochondrial genes à la
MitoCarta analysis is selected, the whole genome or all genes encoding mitochondrial proteins, respectively, will be
analysed. This option ignores pre-set intervals!
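The two rules stated above (positions below 400 are read as Mb, and an interval is ignored as soon as single genes or a genome-wide option are chosen) can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not GeneDistiller's own code; the function names `to_base_pairs` and `interval_is_used` are hypothetical.

```python
MB_THRESHOLD = 400  # from the manual: physical positions below 400 are treated as Mb


def to_base_pairs(position: float) -> int:
    """Convert a user-entered physical position to base pairs (hypothetical helper)."""
    if position < MB_THRESHOLD:
        # Small numbers are interpreted as megabases.
        return int(round(position * 1_000_000))
    # Everything else is taken as a position in base pairs.
    return int(round(position))


def interval_is_used(interval=None, genes=None, genome_wide=False) -> bool:
    """Return True only if the interval actually defines the candidate genes.

    Per the manual, the interval is ignored when single genes are specified
    or when a genome-wide / mitochondriome analysis is selected.
    """
    return interval is not None and not genes and not genome_wide
```

For instance, `to_base_pairs(12.5)` yields 12500000, while `interval_is_used(interval=(1, 2), genes=["TSC1"])` is `False` because the gene list takes precedence.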
output
The output can be restricted to certain types of genes using the
type drop-down menu, e.g. to
protein-coding genes, non-coding RNAs (ncRNA) etc.
number of target genes
The number of genes within an interval can be quickly accessed by clicking the count button. Here, no filters except for the limits directly above will be applied.
comparison with known genes or phenotypes
Target genes (defined in Target genes) can be compared with genes known to cause similar disease phenotypes. These genes can be defined either by their HGNC symbol (synonyms cannot be used) or by their NCBI ID (both specified in the same input field, comma-separated). Furthermore, OMIM IDs or parts of the title of an OMIM record can be entered; in this case, all genes connected with the respective OMIM records will be added to the list of genes to be compared.
Related genes (or proteins) can be found either by similarity of
expression patterns in all available tissues (co-expression, by
Pearson correlation) and/or via protein-protein interactions. Both
options can be activated by selecting the check boxes compare expression (which might take a few extra
milliseconds or even seconds) or search for
interactions in the first row of this menu. Additionally, the annotations
(such as protein domains, KEGG pathways or GO IDs)
of the comparison genes are examined. Similarities are highlighted and, if desired, used for the
prioritisation.
If a cut-off value is entered, only similarly expressed genes (with a score above the cut-off) will be listed. Expression
similarity ranges from -1 to 1, where -1 indicates perfect negative correlation and 1 perfect positive correlation.
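To make the co-expression score and the cut-off more concrete, here is a minimal Python sketch of a Pearson correlation between two expression profiles (one value per tissue), followed by a cut-off filter. The data layout and the function names `pearson` and `similar_genes` are assumptions made for illustration; how GeneDistiller stores and compares profiles internally is not described in this manual.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 tissues)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def similar_genes(reference, profiles, cutoff):
    """Return (gene, score) pairs whose score exceeds the cut-off, best first.

    `profiles` maps a gene symbol to its expression profile (assumed layout).
    """
    scored = [(gene, pearson(reference, profile)) for gene, profile in profiles.items()]
    kept = [(gene, score) for gene, score in scored if score > cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

With a cut-off of 0.5, a gene whose profile rises and falls with the reference profile is kept, whereas a gene with the opposite pattern (score close to -1) is dropped.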
Here, the users can choose which data shall be included in the output and how the output shall be formatted. They can also choose to shorten some phenotypic information to the first paragraph (OMIM) or the title (MGD phenotypes). Please note that the MGD phenotype information for human genes does not include any details.
Hyperlinks to the original data source are provided whenever possible. For each transcript, a further hyperlink to ExonPrimer is generated.
The settings made here do not affect the GeneDistiller prioritisation or the selections made by the user, i.e. user-defined conditions are regarded even when the corresponding data is not displayed. This section allows choosing which phenotypic information shall be included.
In this segment, users can choose among different tissues which can be
queried for expression. Tissues are sorted hierarchically.
The checkboxes on the left add tissue-specific expression to the output. The elements on the
right define conditions that are used either to exclude genes
that do not fulfil them (show only genes fulfilling the
conditions) or to prioritise genes according to how well they fulfil
them (rank genes by their fulfillment of
conditions). To include a condition, select the operator (usually >) and specify a
value. The value expresses the gene's expression in this tissue as a multiple
of the median expression ([value] x median expression). Hence, to find a gene that is expressed
above the median, the condition is simply '>1'. Use higher values
to screen for genes expected to be highly expressed. For
exclusive queries, the conditions can be combined with either
AND or OR (Connect expression conditions with).
When more than one probe exists for a given gene (according to
Affymetrix' annotations), the mean of the expression values of all
these probes will be used.
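The following Python sketch illustrates how such expression conditions could be evaluated: probe values are averaged per gene, compared with a multiple of the median, and the individual results are either combined with AND/OR (filtering) or counted (ranking). It is a simplified illustration, not the actual implementation. In particular, the reference median is simply passed in by the caller, because the manual does not specify how it is computed, and all names are hypothetical.

```python
import operator
from statistics import mean

OPERATORS = {">": operator.gt, "<": operator.lt}


def check_conditions(probes_by_tissue, medians, conditions):
    """Return one boolean per condition.

    probes_by_tissue: tissue -> list of probe values for one gene
    medians:          tissue -> reference median expression (supplied by the caller)
    conditions:       list of (tissue, operator, factor), e.g. ("kidney", ">", 1)
    """
    results = []
    for tissue, op, factor in conditions:
        # Several probes for the same gene are averaged, as stated in the manual.
        expression = mean(probes_by_tissue[tissue])
        results.append(OPERATORS[op](expression, factor * medians[tissue]))
    return results


def passes_filter(results, connect="AND"):
    """'show only genes fulfilling the conditions', connected with AND or OR."""
    return all(results) if connect == "AND" else any(results)


def fulfilment(results):
    """'rank genes by their fulfillment of conditions': number of conditions met."""
    return sum(results)
```

A gene with a mean whole-brain expression of 400 against a median of 200 fulfils '>1' for that tissue; if it fails a second condition, it is excluded under AND but retained under OR.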
Please note that although prioritisation according to
tissue-specific expression will normally list the correct gene among
the top 5 genes (provided the right tissues have been selected), the
correct gene will not always rank first.
In this part of the query interface, the localisation (cellular, extracellular or organellar, respectively) of the gene products can be queried. The locations are presented in a hierarchical structure and can be selected by checking the boxes. If a gene product is located in one of the selected structures (or a substructure of them), its location will be highlighted in the output.
Above the localisations, a further checkbox restricts the output to those genes that fulfil the conditions, i.e. whose gene products are located in any of the selected structures.
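The hierarchical matching described above (a location counts as selected if it is a selected structure or a substructure of one) can be sketched in Python as follows. The small hierarchy and the function names are invented for illustration and do not reproduce the localisation tree used by GeneDistiller.

```python
# Hypothetical excerpt of a localisation hierarchy: child -> parent.
PARENT = {
    "mitochondrial inner membrane": "mitochondrion",
    "mitochondrion": "cytoplasm",
    "cytoplasm": "cell",
    "nucleus": "cell",
}


def self_and_ancestors(location):
    """Yield a location followed by all structures that contain it."""
    while location is not None:
        yield location
        location = PARENT.get(location)


def matches_selection(gene_locations, selected):
    """True if any location of the gene product lies in a selected structure
    or in a substructure of a selected structure."""
    return any(
        structure in selected
        for location in gene_locations
        for structure in self_and_ancestors(location)
    )
```

Selecting "mitochondrion" therefore also matches a gene product annotated only with "mitochondrial inner membrane", but not one annotated with "nucleus".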
The prioritisation settings allow fine-tuning the weight given to each parameter when a prioritisation approach is chosen. For instance, the impact of occurrences of search terms in OMIM reports can be increased by increasing the value assigned to OMIM text. When fields are set to zero, these parameters will not be used for the prioritisation.
This section opens automatically when a prioritisation approach is selected under order / prioritise genes; values entered here are not considered when another sorting approach is chosen.
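As a rough illustration of how such weights could act, the Python sketch below ranks genes by a weighted sum of their evidence values. The linear formula is an assumption chosen for simplicity; this manual does not document the scoring function GeneDistiller actually uses, and the function names and evidence values are hypothetical.

```python
def priority_score(evidence, weights):
    """Weighted sum of evidence values (assumed formula, for illustration only).

    evidence: parameter name -> raw value for one gene
    weights:  parameter name -> weight; a weight of zero switches the parameter off
    """
    return sum(weights.get(name, 0) * value for name, value in evidence.items())


def rank_genes(evidence_by_gene, weights):
    """Return gene symbols ordered by descending score."""
    return sorted(
        evidence_by_gene,
        key=lambda gene: priority_score(evidence_by_gene[gene], weights),
        reverse=True,
    )
```

Under this sketch, raising the weight of "OMIM text" moves genes with many keyword hits in their OMIM reports up the list, while setting the weight of a parameter to zero removes its influence entirely.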
First example:
Show all protein-coding genes in the interval defined by the
microsatellite markers D15S1042 and D15S659,
ordered by their position. To create the interval, set microsatellite (from) to D15S1042 and
microsatellite (to) to
D15S659. Now change type to
protein coding to display only the protein-coding genes. The
sorting order is adjusted with the drop-down menu order / prioritise genes; position is
the default value.
Show example #1.
Second example:
Leigh syndrome is a disease group
with a number of different aetiologies. Common to this
disease group, however, is that the affected genes have a function in the
mitochondrion. As genes involved in the same disease, pathway or
organelle (here the mitochondrial genes) are frequently co-regulated,
the prioritisation focuses on genes with a common mitochondrial
expression pattern and mitochondrial localisation. Imagine that
mutations in the LRPPRC gene had not yet been found as the
cause of French-Canadian type Leigh syndrome. A candidate region
of 5.2 cM between the markers D2S2294 and D2S2291, containing 15 genes, would
have been mapped. We therefore choose the
prioritisation setting (order / prioritise
genes) prioritise with focus on possible pathways
(interaction and expression similarity) and additionally increase
the weights of the prioritisation settings Maestro score and
Mitopred to 5.
Show example #2.
As a result we see the LRPPRC gene ranking top, mainly due to its expression correlation
with other Leigh syndrome related genes (PDHA1, COX15, NDUFV1, PC,
SURF1, NDUFS3, NDUFS4, NDUFS8, DLD, NDUFS7) and its mitochondrial
localisation.
Third example:
Imagine that TSC2 had not yet been
found as a second gene that causes tuberous sclerosis when mutated. A
pedigree with several individuals affected by tuberous sclerosis
would have been mapped and a candidate region between
D16S521 and D16S3124 delineated.
This interval comprises 2.3 Mbp and 126 known genes. As the researcher
already knows that mutations in TSC1 can cause tuberous sclerosis,
he or she may assume that the new candidate in the interval
interacts with TSC1 or shows a similar expression pattern. We therefore
choose the prioritisation setting (order /
prioritise genes) prioritise with focus on possible
pathways (interaction and expression similarity).
Show example #3.
As a result we see the TSC2 gene ranking top (score = 36.9,
next gene in succession only 8.2) among the 126 genes in the region,
mainly due to its protein-protein interaction with TSC1.
Fourth example:
In a homozygosity mapping for nephronophthisis, a target region
between microsatellite D16S475 and SNP rs1529917 was
found to be associated with the disease. These markers delimit the
interval to be analysed by GeneDistiller, and we enter them in the
corresponding fields (microsatellite,
from: D16S475; dbSNP ID,
to: rs1529917). We also hope to find a renal
phenotype described in MGD for the disease-causing gene. In the query
interface, we therefore select renal/urinary system phenotype
under highlight these MGD phenotypes
and check show only genes to which at least
one of these phenotypes was assigned to reduce the number of
genes. We also expect the gene to be expressed in the kidney, so we
open the expression tab and check
the box to the left of kidney.
Assuming this is all the information we have in advance, the query
mask would look like this:
Show example #4.
After clicking on submit,
GeneDistiller will display 2 genes (out of 48 that are located in the
interval); and we can see that while one of the genes
(DNASE1) comes with detailed data, the information available
for the other one (GLIS1) is scarce (there are, for instance,
no interactions listed and no expression values available). However,
we can read in the OMIM report that the latter gene is indeed
responsible for nephronophthisis.
Note: We have not used OMIM terms, because the OMIM entry was created
after this gene was identified.
Fifth example:
This example suggests candidate genes on the basis of their tissue-specific expression
in the brain or its substructures (ordered
by: prioritise with focus on tissue-specific expression).
Here, genes likely to be involved in GEFS+ (generalized epilepsy with febrile seizures
plus) are ranked. Note that no
phenotypic criteria are given in example 5a. In example 5b,
more background knowledge is applied.
Show example #5a. | Show example #5b.
Selection:
Genes are filtered for those with a known murine nervous system phenotype and behaviour/neurological phenotype (select both values in the MGD phenotypes list and limit the query to the respective genes with the show only genes to which at least one of these phenotypes was assigned checkbox). The list can be condensed further by considering known human phenotypes: enter the broad term brain into the field highlight these keywords and restrict the search to genes in whose descriptions this keyword appears (check show only genes with at least one of these words in their OMIM reports).
Note that the more specific term epilepsy is not used in this example because we cannot be sure in advance that our candidate is already known to cause epilepsy in humans.
Since a gene responsible for epilepsy is likely to be expressed in the brain, open the expression tab and select >1 (x median) for the expression in whole brain. To restrict the list to genes expressed above the median, select show only genes fulfilling the conditions. Setting a filter for prefrontal cortex expression > 3 (x median) and connecting both expression filters with AND shortens the list further. Enter the Gene Ontology ID for ion transport (GO:0006811) into the highlight these GO IDs field and restrict the search to genes carrying this GO ID or a subclass (show only genes to which at least one of these GO IDs applies checkbox).
Selection example
Now only 2 genes, SCN1A and SCN3A, remain in the list, both of which are excellent candidates for an epilepsy phenotype.
Prioritisation:
If you change the order / prioritise genes drop-down to prioritise with focus on possible pathways, uncheck all the restricting checkboxes and change the expression setting to
rank genes by their fulfillment of conditions, a prioritisation strategy will be applied. To search for similarities with genes known to be involved in epilepsy, enter the term epilepsy into the compare with these OMIM entries (MIM ID or keyword) field and check compare expression
and
search for interactions.
Prioritisation example
Again, SCN1A will be listed on top. Another gene, SCN2A, will appear as the second-best candidate; it was not considered in the selection
approach because no mouse phenotypes have been described yet.
GeneDistiller prints the desired data in HTML format. If figures, e.g.
for expression data, are included, they will be produced as PNG and
seamlessly integrated into the output.
Below the actual output, two hyperlinks are presented. The first one
will restore the query mask with the settings made by the user, the
second will restore the actual output. Bookmark the second link if you
only want to return to the output page (you might as well save the
page); bookmark the first one if you might want to modify your
settings at a later date.
The output includes hyperlinks to the original data on the providers' web pages.
GeneDistiller has been developed on Mozilla Firefox 2. It has also been tested with Microsoft Internet Explorer. However, it should work with any web browser with JavaScript enabled.
If you feel that GeneDistiller has helped you in your research, please cite the following publication:
Seelow D, Schwarz JM, Schuelke M.