Help

This section explains how to use this tool and how the data were generated. If something is missing or unclear, please contact Maxime Vallée (valleem@iarc.fr).

Query the database by exon

Selecting an exon opens a new page: the viewer.
For BRCA1, data are available for all exons except exon 1 (5' UTR) and exon 4 (an inserted Alu sequence).

Query the database by nucleotide position

Enter a nucleotide position in the text box (by default, nucleotides are numbered from the A of the ATG start codon: A is #1, T is #2, G is #3, etc.). When you submit, the viewer opens on the corresponding exon and shows the information for the chosen nucleotide (and its 3 possible variants).

If the chosen nucleotide is far enough from the beginning of the exon, the scrollable panel opens with it at its left edge.
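With this numbering, the codon that contains a nucleotide, and the three substitutions shown for it, follow directly from the position. A minimal sketch, assuming positive coding positions only; the helper names are hypothetical and not part of the tool:

```python
BASES = "ACGT"


def codon_of(nt_pos: int) -> tuple[int, int]:
    """Return (codon number, position within the codon, 1 to 3).

    Assumes the default numbering: the A of the ATG start codon is nucleotide 1.
    Intronic and UTR positions are not handled by this sketch.
    """
    if nt_pos < 1:
        raise ValueError("only positive coding positions are handled")
    return (nt_pos + 2) // 3, (nt_pos - 1) % 3 + 1


def substitutions(ref: str) -> list[str]:
    """The three alternative bases that can replace the reference base."""
    return [b for b in BASES if b != ref.upper()]


print(codon_of(1))         # (1, 1): the A of ATG
print(codon_of(181))       # (61, 1)
print(substitutions("A"))  # ['C', 'G', 'T']
```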

Query the database by amino acid position

Enter an amino acid position in the text box. When you submit, the viewer opens on the corresponding exon and shows the information for the chosen amino acid (and its 9 possible variants: 3 substitutions for each of the 3 nucleotides of the codon).

If the chosen amino acid is far enough from the beginning of the exon, the scrollable panel opens with it at its left edge.
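The 9 variants of an amino acid can be enumerated from its codon: amino acid k is encoded by nucleotides 3k-2, 3k-1 and 3k, and each of these has 3 alternative bases. A minimal sketch, assuming the default numbering; the reference codon is a hypothetical input supplied by the caller, and 'TGT' below is only an illustration. Some of the 9 variants may be synonymous or create a stop codon.

```python
def codon_variants(aa_pos: int, codon: str) -> list[tuple[int, str, str]]:
    """List (nucleotide position, reference base, alternative base) for one codon.

    Codon k covers nucleotides 3k-2, 3k-1 and 3k when the A of ATG is #1.
    """
    if aa_pos < 1 or len(codon) != 3:
        raise ValueError("expected a positive amino acid position and a 3-base codon")
    first = 3 * aa_pos - 2
    variants = []
    for offset, ref in enumerate(codon.upper()):
        for alt in "ACGT":
            if alt != ref:
                variants.append((first + offset, ref, alt))
    return variants


for v in codon_variants(61, "TGT"):
    print(v)  # nine tuples, from (181, 'T', 'A') to (183, 'T', 'G')
```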

How to use the viewer page

The viewer has two panels: the first shows the exon and its translation, the second shows the information for the selected position.

The exon is displayed as follows: 25 nucleotides of the preceding intron (in lower case), the exon itself, then 25 nucleotides of the following intron (in lower case).

The amino acid sequence is displayed directly beneath the exonic sequence, with each amino acid placed under its codon, so the reading frame is easy to see. For example, '|M|' appears under the 'ATG' codon.
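The two-line layout described above can be sketched as follows. This is a minimal illustration, not the viewer's actual code: it uses the standard genetic code, assumes the exon starts in frame unless a hypothetical `offset` is given, and only reproduces the text layout (no clicking or hovering).

```python
# Standard genetic code, codons enumerated in TCAG order for each of the 3 bases.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
FLANK = 25  # intronic nucleotides shown on each side of the exon


def render_viewer(upstream: str, exon: str, downstream: str,
                  offset: int = 0) -> tuple[str, str]:
    """Return the sequence line and the aligned translation line.

    `offset` (hypothetical) is the number of leading exon bases that complete a
    codon begun in the previous exon; they are left untranslated here.
    """
    left = upstream[-FLANK:].lower()
    seq_line = left + exon.upper() + downstream[:FLANK].lower()
    track = " " * offset
    i = offset
    while i + 3 <= len(exon):
        track += "|" + CODON_TABLE[exon[i:i + 3].upper()] + "|"
        i += 3
    track += " " * (len(exon) - i)  # blank under a trailing partial codon
    return seq_line, " " * len(left) + track


seq, aa = render_viewer("g" * 30, "ATGGCT", "t" * 30)
print(seq)  # 25 g's, ATGGCT, 25 t's
print(aa)   # 25 spaces, then |M||A|
```

Each '|X|' is three characters wide, which is what keeps every amino acid exactly under its codon.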

Both sequences in the upper panel are clickable. Hovering the mouse over a nucleotide or an amino acid displays its position.

Clicking a nucleotide displays, in the second panel, the information for each possible substitution at that position (for example, if an A is clicked, information is shown for its substitution by a C, a G or a T).

Clicking an amino acid displays the same information for each of the three nucleotides of its codon.

Splice junction scores

Four types of splice site scores are recorded in the database, each computed with two scoring methods:

Wild-type acceptor splice site score
Wild-type donor splice site score
De novo acceptor splice site score
De novo donor splice site score

>> NNSplice

The scores were obtained by a systematic analysis with the NNSPLICE 0.9 algorithm (Reese MG, Eeckman FH, Kulp D, Haussler D. 1997. "Improved Splice Site Detection in Genie". J Comp Biol 4(3):311-23).
NNSPLICE predicts splice sites (donor and acceptor) and is available at http://www.fruitfly.org/seq_tools/splice.html.

A score of 1.0 means that this part of the sequence fully matches the splice junction consensus sequence. Note that the algorithm can fail to detect an actual wild-type splice site, in which case a real splice junction receives a score of 0.0.

>> MaxEnt

The MaxEntScan algorithm (Yeo G, Burge CB. Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals. RECOMB 2003) scores sequences of a fixed length: 23-mers for acceptor splice sites and 9-mers for donor splice sites. It is available at http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html.

Wild-type splice junctions were scored directly, since their locations are known.
For de novo splice junctions, a custom program scored all possible substitutions by systematically sliding 23 bp and 9 bp windows along the gene, one base at a time.
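The window-sliding step can be sketched as below: for a substitution at one position, every 9-mer (donor) or 23-mer (acceptor) window that covers it is rebuilt with the substituted base. This is a minimal sketch, not the program used to build the database; the `score` callable is a hypothetical hook standing in for MaxEntScan, which is not reimplemented here.

```python
from typing import Callable, Iterator


def mutant_windows(seq: str, pos: int, alt: str,
                   k: int) -> Iterator[tuple[int, str, str]]:
    """Yield (start, wild-type window, mutant window) for each k-mer covering `pos`.

    `pos` is 0-based. Windows slide one base at a time and are clipped at the
    sequence ends, so at most k windows contain any given base.
    """
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    mutant = seq[:pos] + alt + seq[pos + 1:]
    for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
        yield start, seq[start:start + k], mutant[start:start + k]


def best_de_novo_score(seq: str, pos: int, alt: str, k: int,
                       score: Callable[[str], float]) -> float:
    """Best score over all mutant k-mers; `score` stands in for MaxEntScan."""
    return max((score(m) for _, _, m in mutant_windows(seq, pos, alt, k)),
               default=float("-inf"))
```

Calling it with k=9 gives the donor windows and with k=23 the acceptor windows for the same substitution.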

Dinucleotide rate constant

Substitution rates were calculated for all single-base substitutions from the dinucleotide substitution rates derived from human-mouse aligned sequences of chromosomes 21 and 10 (Lunter and Hein 2004).

The substitution probabilities for a given single nucleotide substitution are calculated by averaging the dinucleotide substitution rates at that position for the forward and reverse strands.
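One possible reading of this averaging is sketched below. The text does not say which neighbouring base defines the dinucleotide context, so the sketch assumes the substituted base and its 3' neighbour on each strand; the `rates` table is a hypothetical input keyed by (original dinucleotide, substituted dinucleotide), and no values from Lunter and Hein (2004) are reproduced.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def substitution_rate(seq: str, i: int, alt: str,
                      rates: dict[tuple[str, str], float]) -> float:
    """Average of forward- and reverse-strand dinucleotide rates for seq[i] -> alt.

    Assumption: the context is the base followed by its 3' neighbour on each strand.
    """
    if not 0 < i < len(seq) - 1:
        raise ValueError("need one neighbouring base on each side")
    # Forward strand: seq[i] and its 3' neighbour, before and after substitution.
    forward = rates[(seq[i:i + 2], alt + seq[i + 1])]
    # Reverse strand: the complemented base is followed by the complement of
    # seq[i - 1], which is what revcomp of the two-base slice produces.
    reverse = rates[(revcomp(seq[i - 1:i + 1]), revcomp(seq[i - 1] + alt))]
    return (forward + reverse) / 2
```

For example, a C>T change in "ACG" looks up CG>TG on the forward strand and the matching GT>AT on the reverse strand.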

Human Genome Variation Society nomenclature

Single nucleotide substitutions in cDNA sequences are written as follows: for example, the substitution of the T at position 181 by a G is written 'c.181T>G'. The 'c.' prefix stands for cDNA.

At the amino acid level, the substitution of the Cys at position 61 by a Gly, for example, is written 'p.C61G'. The 'p.' prefix stands for protein.
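Both notations can be generated from a coding position, as in the sketch below. It assumes the default numbering (the A of ATG is #1), handles only positive coding positions (not intronic positions such as -19-10), uses one-letter amino acid codes as on this page, and takes the reference codon as an input; 'TGT' is used purely for illustration. Current HGVS recommendations prefer three-letter codes (p.Cys61Gly), and this sketch does not apply the HGVS '=' notation for synonymous changes.

```python
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cdna_hgvs(pos: int, ref: str, alt: str) -> str:
    """cDNA notation for a single nucleotide substitution."""
    return f"c.{pos}{ref}>{alt}"


def protein_hgvs(pos: int, codon: str, alt: str) -> str:
    """One-letter protein notation for the codon hit by a substitution at `pos`."""
    aa_pos = (pos + 2) // 3          # codon number
    offset = (pos - 1) % 3           # 0-based position inside the codon
    mutant = codon[:offset] + alt + codon[offset + 1:]
    return f"p.{CODON_TABLE[codon]}{aa_pos}{CODON_TABLE[mutant]}"


print(cdna_hgvs(181, "T", "G"))       # c.181T>G
print(protein_hgvs(181, "TGT", "G"))  # p.C61G
```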

Align-GVGD scores & grades

Align-GVGD is a freely available, web-based program that combines the biophysical characteristics of amino acids and protein multiple sequence alignments to predict where missense substitutions in genes of interest fall in a spectrum of grades of clinical significance.

Align-GVGD is an extension of the original Grantham difference to multiple sequence alignments and true simultaneous multiple comparisons.
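To make the idea concrete, here is a simplified sketch of the two quantities in the program's name: GV (Grantham Variation), the spread of amino acid properties observed at an alignment position, and GD (Grantham Deviation), how far the missense amino acid falls outside that spread. It assumes each residue is described by Grantham's three properties (composition, polarity, molecular volume); the property values and weights are left as inputs rather than reproduced, and the mapping of GV and GD onto clinical grades is not modelled. Real scores should come from Align-GVGD itself.

```python
from math import sqrt

# (composition, polarity, molecular volume) for one amino acid.
Props = tuple[float, float, float]


def grantham_variation(column: list[Props], weights: Props) -> float:
    """GV-style score: weighted spread of each property across the alignment column."""
    spans = [max(p[n] for p in column) - min(p[n] for p in column) for n in range(3)]
    return sqrt(sum(w * s * s for w, s in zip(weights, spans)))


def grantham_deviation(query: Props, column: list[Props], weights: Props) -> float:
    """GD-style score: weighted distance of the missense residue from the observed range.

    A property value inside the [min, max] range of the column contributes 0.
    """
    total = 0.0
    for n, w in enumerate(weights):
        lo = min(p[n] for p in column)
        hi = max(p[n] for p in column)
        gap = max(lo - query[n], 0.0, query[n] - hi)
        total += w * gap * gap
    return sqrt(total)
```

A position that is invariant across the alignment has GV = 0, so any substitution with different properties there gets a non-zero GD.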

Users can either supply their own protein multiple sequence alignments (in FASTA format) or select one from a built-in library of alignments. Align-GVGD currently has alignments for ATM, BRCA1, BRCA2, CHEK2, and TP53.

The alignment depth used for the scores recorded in this database is:

for ATM: from human to sea urchin
for BRCA1: from human to sea urchin
for BRCA2: from human to sea urchin

BIC information

The Breast Cancer Information Core (BIC) database was established in 1995 to capture information about naturally occurring variation in the human BRCA1 gene (BRCA2 was added to the database after it was cloned).

In addition to mutation information, the database contains a collection of mutation detection protocols, lists of gene-specific DNA primers and published protocols.

Mutation data are entered by individual investigators, hospital-based laboratories and a commercial laboratory that performs the bulk of BRCA1/BRCA2 tests in North America.

If a variant systematically generated in the IARC BReast CAncer genes database has already been reported in BIC, its clinical importance and any additional BIC information are displayed.

URL to a Priors page

To link to a specific nucleotide's Priors page, the following syntax should be used:
hci-priors.hci.utah.edu/PRIORS/BRCA/viewer.php?gene=GGGG&subs_type=SSSS&nt_pos=PPPP where
GGGG (gene) should be BRCA1 or BRCA2
SSSS (substitution type) should be HGVS, BRCA or GENOMIC
PPPP (position) should be a position in the chosen numbering: a number, or an HGVS-style position such as -19-10 when SSSS is HGVS. (Click on a gene name on the homepage for the available range.)
For example:
hci-priors.hci.utah.edu/PRIORS/BRCA/viewer.php?gene=BRCA1&subs_type=HGVS&nt_pos=-19-10
hci-priors.hci.utah.edu/PRIORS/BRCA/viewer.php?gene=BRCA2&subs_type=GENOMIC&nt_pos=32918793

To link to a specific amino acid's Priors page, the following syntax should be used:
hci-priors.hci.utah.edu/PRIORS/BRCA/viewer.php?gene=GGGG&aa_pos=PPPP where
GGGG (gene) should be BRCA1 or BRCA2
PPPP (position) should be a number. (Click on a gene name on the homepage for the available range.)
For example:
hci-priors.hci.utah.edu/PRIORS/BRCA/viewer.php?gene=BRCA1&aa_pos=100

To avoid errors, treat all strings in the URL as case-sensitive.
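Links in the two formats above can be built programmatically. This is a minimal sketch: the base path is copied from the examples (without a scheme, as written there), the function names are hypothetical, and values are checked with exact, case-sensitive matches rather than being silently upper-cased.

```python
from typing import Union
from urllib.parse import urlencode

# Base path as given in the examples; prepend the scheme your site uses.
BASE = "hci-priors.hci.utah.edu/PRIORS/BRCA/viewer.php"
GENES = {"BRCA1", "BRCA2"}
SUBS_TYPES = {"HGVS", "BRCA", "GENOMIC"}


def _check_gene(gene: str) -> None:
    # Exact, case-sensitive match: "brca1" is rejected rather than corrected.
    if gene not in GENES:
        raise ValueError(f"gene must be one of {sorted(GENES)}")


def nucleotide_url(gene: str, subs_type: str, nt_pos: Union[str, int]) -> str:
    """Link to a nucleotide's Priors page."""
    _check_gene(gene)
    if subs_type not in SUBS_TYPES:
        raise ValueError(f"subs_type must be one of {sorted(SUBS_TYPES)}")
    query = urlencode({"gene": gene, "subs_type": subs_type, "nt_pos": nt_pos})
    return f"{BASE}?{query}"


def amino_acid_url(gene: str, aa_pos: int) -> str:
    """Link to an amino acid's Priors page."""
    _check_gene(gene)
    return f"{BASE}?{urlencode({'gene': gene, 'aa_pos': aa_pos})}"


print(nucleotide_url("BRCA1", "HGVS", "-19-10"))
print(amino_acid_url("BRCA1", 100))
```

`urlencode` leaves hyphens and digits unescaped, so positions such as -19-10 appear in the link exactly as typed.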

BReast CAncer genes database