Our approach to identifying AS consists of the following three phases (Figure 1):
Our approach to detecting alternative splicing is based on whole-genome alignment of ESTs. We used MUGUP [1] to align the entire EST set and the mRNAs to the whole genome. The task took thirty days on one hundred personal computers.
Introns that begin with the bases GT or GC and end with the bases AG are referred to as “canonical introns”. The overwhelming majority (98.12%) of introns are of the GT/AG kind, and 0.76% are of the GC/AG kind. For this reason, a splice pair prediction is accepted only if the intron has one of the two canonical forms: “GT..AG” or “GC..AG” on the forward strand, and “CT..AC” or “CT..GC” on the reverse strand. MUGUP parameters: the minimum intron length is 20 bp and the maximum intron length is 40000 bp.
As noted earlier, EST libraries may be contaminated, so only EST alignments with similarity scores greater than 94% were used for further alternative splicing analysis. In addition, ESTs containing any non-canonical intron were excluded from further analysis.
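The filtering rules above can be sketched in Python as follows. This is a minimal illustration and not MUGUP's actual code: the function names, the 0-based half-open coordinates and the representation of the genome as a forward-strand string are all assumptions made for the example.

```python
MIN_INTRON = 20        # bp, minimum intron length quoted in the text
MAX_INTRON = 40000     # bp, maximum intron length quoted in the text
MIN_SIMILARITY = 94.0  # percent; alignments must score above this

# Intron ends as they read on the forward genomic strand.
CANONICAL = {
    "+": {("GT", "AG"), ("GC", "AG")},
    "-": {("CT", "AC"), ("CT", "GC")},
}


def is_canonical_intron(genome, start, end, strand):
    """Check one intron occupying genome[start:end] (hypothetical layout)."""
    length = end - start
    if not (MIN_INTRON <= length <= MAX_INTRON):
        return False
    seq = genome[start:end].upper()
    return (seq[:2], seq[-2:]) in CANONICAL[strand]


def keep_alignment(genome, introns, strand, similarity):
    """Keep an EST alignment only if it passes both filters."""
    if similarity <= MIN_SIMILARITY:
        return False
    return all(is_canonical_intron(genome, s, e, strand) for s, e in introns)
```

For instance, `keep_alignment(genome, [(4, 28)], "+", 97.5)` accepts an alignment whose single 24 bp intron starts with GT and ends with AG, while the same intron reported on the reverse strand is rejected.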
First, we identified the exon and intron boundaries. Introns were grouped when their splice sites matched within 3 bp and they lay in the same orientation (strand). Exons in an EST alignment can be partitioned into three classes (Figure 2). Both boundaries of a middle exon can be determined from the intron information, whereas the upstream boundary of a first exon and the downstream boundary of a last exon remain unknown.
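One way to picture the intron grouping step is the greedy sketch below. The text does not say how the 3 bp tolerance is applied, so comparing each intron with the first member of a group is an assumption, as are the tuple layout and the function name.

```python
TOLERANCE = 3  # bp, splice-site tolerance quoted in the text


def group_introns(introns, tol=TOLERANCE):
    """Group introns whose two splice sites agree within `tol` bp.

    introns: iterable of (est_id, strand, start, end) tuples (hypothetical
    layout). Each intron is compared with the first member of a group.
    """
    groups = []
    for intron in sorted(introns, key=lambda i: (i[1], i[2], i[3])):
        _, strand, start, end = intron
        for group in groups:
            _, g_strand, g_start, g_end = group[0]
            if (strand == g_strand
                    and abs(start - g_start) <= tol
                    and abs(end - g_end) <= tol):
                group.append(intron)
                break
        else:
            # No existing group matched, so this intron starts a new one.
            groups.append([intron])
    return groups
```

Introns from different strands never share a group, even when their coordinates coincide.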

Figure 2. Exon classification of EST alignments

Second, we detected three types of alternative splicing by comparing introns: exon skipping, alternative 3’ splicing and alternative 5’ splicing.
Methods of alternative splicing site detection (see Figure 3).
Detection of exon skipping:
This AS type is characterized by a longer intron that spans two shorter introns. One of the short introns shares its 5’ splice site with the long intron; the other shares its 3’ splice site with the long intron.
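The exon skipping rule can be sketched as a search over intron coordinates. The sketch assumes the introns have already been grouped, lie on one strand, and are given as (start, end) pairs in forward genomic coordinates; it ignores which EST each intron came from, which the text does not specify for this type.

```python
def find_exon_skipping(introns):
    """Return (long, left, right) triples describing exon skipping.

    introns: set of (start, end) pairs on one strand (hypothetical layout).
    `left` shares its start with `long`, `right` shares its end with `long`,
    and the skipped exon lies between left's end and right's start.
    """
    by_start = {}
    by_end = {}
    for start, end in introns:
        by_start.setdefault(start, []).append((start, end))
        by_end.setdefault(end, []).append((start, end))

    events = []
    for long_start, long_end in sorted(introns):
        for left in by_start[long_start]:
            for right in by_end[long_end]:
                # Requiring a gap between the short introns also rules out
                # pairing the long intron with itself.
                if left[1] < right[0]:
                    events.append(((long_start, long_end), left, right))
    return events
```

With introns (100, 500), (100, 200) and (300, 500), the long intron (100, 500) skips the exon between positions 200 and 300.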
Detection of 3’ & 5’ alternative splicing:
Alternative 3’ splicing involves two introns whose 3’ splice sites differ but whose 5’ splice sites, at the upstream exon, coincide. The converse holds for alternative 5’ splicing.
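The pairwise comparison described above might look like the following sketch. The (strand, start, end) layout and the labels "alt3" and "alt5" are illustrative assumptions; on the reverse strand the 5’ site is taken to be the larger genomic coordinate.

```python
from itertools import combinations


def splice_sites(intron):
    """Return (five_prime, three_prime) genomic positions of an intron."""
    strand, start, end = intron
    return (start, end) if strand == "+" else (end, start)


def classify_pair(a, b):
    """Classify two introns as 'alt3', 'alt5' or None."""
    if a[0] != b[0]:
        return None  # different strands are never compared
    five_a, three_a = splice_sites(a)
    five_b, three_b = splice_sites(b)
    if five_a == five_b and three_a != three_b:
        return "alt3"  # same 5' site, different 3' sites
    if three_a == three_b and five_a != five_b:
        return "alt5"  # same 3' site, different 5' sites
    return None


def find_alternative_sites(introns):
    """introns: iterable of (strand, start, end) tuples (hypothetical)."""
    events = []
    for a, b in combinations(sorted(introns), 2):
        kind = classify_pair(a, b)
        if kind:
            events.append((a, b, kind))
    return events
```

Note that the long intron of an exon skipping event also shares one splice site with each short intron, so a fuller pipeline would need to separate those cases; this sketch does not.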

Figure 3. The method of alternative splicing site detection.
Detection of mutually exclusive splicing:
The “mutually exclusive” AS type involves two or more splicing forms supported by different ESTs. Each splicing form must have a shorter intron and a longer intron in the same EST. The first introns (i1, i3) share the same 5’ splice site and the next introns (i2, i4) share the same 3’ splice site (Form 1 and Form 2 have the same splice sites at a and d). Next, we test whether introns i2 and i3 overlap. Note that introns i1 and i2 must be spliced from the same EST; the same is true of Form 2 (i3, i4). The method is illustrated in Figure 4.

Figure 4. Detection of the mutually exclusive type.
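The mutually exclusive test can be sketched as below, assuming forward-strand (start, end) coordinates and a dictionary that maps each EST to its introns sorted by start. These data structures and names are illustrative and not taken from the Avatar implementation.

```python
def is_mutually_exclusive(form1, form2):
    """form1 = (i1, i2) and form2 = (i3, i4), each from a single EST."""
    (i1, i2), (i3, i4) = form1, form2
    # Shared outer splice sites a and d.
    same_outer = i1[0] == i3[0] and i2[1] == i4[1]
    # i2 and i3 must overlap, so the two inner exons do not overlap.
    overlap = i2[0] < i3[1] and i3[0] < i2[1]
    return same_outer and overlap


def find_mutually_exclusive(est_introns):
    """est_introns: dict est_id -> introns sorted by start (hypothetical)."""
    events = []
    ids = sorted(est_introns)
    for x in ids:
        for y in ids:
            if x == y:
                continue  # the two forms must come from different ESTs
            pairs_x = zip(est_introns[x], est_introns[x][1:])
            for form1 in pairs_x:
                pairs_y = zip(est_introns[y], est_introns[y][1:])
                for form2 in pairs_y:
                    if is_mutually_exclusive(form1, form2):
                        events.append((x, y, form1, form2))
    return events
```

Because the overlap test is directional (i2 against i3), each event is reported once, with the EST carrying the upstream exon listed as Form 1.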
All EST sequences were downloaded from NCBI (ftp.ncbi.nih.gov/repository/dbEST/gzipped/dbEST.reports.date.no.gz). As of January 16, 2004, the dbEST database held nearly 5.4 million human ESTs. Human genomic sequences (contig build 34, GenBank format) were retrieved from NCBI (ftp.ncbi.nih.gov/genbank/genomes/). The genomic build numbers of the different organisms are shown in Table 1. Gene information and mRNA sequences were fetched from the NCBI RefSeq project and parsed from the GenBank chromosome flat files.
Table 1. The data build of genomic sequences

| Organism | Data version |
|---|---|
| Homo sapiens | Build 35 |
| Mus musculus | Build 32 |
| Rattus norvegicus | Build 2 |
| Caenorhabditis elegans | Build 1 |
| Drosophila melanogaster | BDGP Release 3.2 |
| Arabidopsis thaliana | Build 1 |

| Database | Clustering method | Tools | No. of genes/spliced transcripts/alternative spliced | AS type | Organism | AS visualized display | Ref. |
|---|---|---|---|---|---|---|---|
| ASAP | 3 | BLAST | 7,991 genes / 30,793 spliced transcripts / | 3’, 5’, exon skipping, mutually exclusive | Human | show splicing sites | 1 |
| ASD | cDNA sequences and literature | literature | AS 8,314 genes | - | Human and mouse | showed cDNAs only | 2 |
| ASDB | SWISS-PROT/GenBank literature | BLAST, CLUSTALW | 1,922 protein and 2,486 DNA sequences | - | Human, mouse, rat, fly, worm, chicken, virus, bovine and rabbit | - | 3 |
| ASG | 3 | BLAST, sim4 | 22,127 genes | 3’, 5’, exon skipping, intron retention | Human | show 3’, 5’, exon skipping AS | 4 |
| ASHESdb | 1 | BLASTN | 1,229 genes / 9,073 spliced transcripts | exon skipping | Human | show exons | 5 |
| EASED | 2 | BLASTN (EST to mRNA), BLAST EST cluster | Near 30,000 spliced transcripts | 3’, 5’, exon skipping, retained intron | Human, plant, cow, worm, fly, fish, mouse, rat and frog | show splicing sites | 6 |
| ECgene | 2 | BLAST, SAGE | - | Inherited from ASmodeler | Human, mouse, rat | UCSC browser | 7 |
| PALS db | 3 | BLAST EST cluster | 19,936 (human), 16,615 (mouse) UniGene clusters / 14,106 AS of 26,324 UniGene clusters (human), 10,129 AS of 18,614 UniGene clusters (mouse), 2,705 AS of 14,393 UniGene clusters (worm) | 3’, 5’, exon skipping | Human, mouse and worm | show all ESTs and mRNAs | 8 |
| SpliceInfo | 3 | BLAST (protein), SIM4 (mRNA and ESTs) | 21,786 genes | 3’, 5’, exon skipping | Human | show proteins, mRNAs and ESTs’ exons only | 9 |
| STACK | 2 | d2_cluster (EST cluster) | 270,515 clusters | - | Human | - | 10 |
| | 1 | WU2BLASTN, sim4 | 1,124 transcripts | - | Human and mouse | - | 11 |
| Avatar | 1 | Mugup | 26,377 genes (human) / *AS 19,175 (six taxa) | 3’, 5’, exon skipping | Human, mouse, rat, fruit fly, worm, thale cress | show 3’, 5’, exon skipping AS | |

Note: * = only splicing sites supported by at least two ESTs are considered.
Clustering method:
1->EST/mRNA/cDNA to genome
2->EST to mRNA
3->Unige
Four functions have been added to Avatar: EST updating, genome sequence updating, a data report and a mail report component. The EST report lets users query, by chromosome, contig and gene name, how many ESTs have recently been added from dbEST. The genome report gives users a structural overview of the two genome builds (build 34 and build 35). The mail report sends users the updated EST and genome report information by e-mail.
Figure 5. The flowchart of the EST updating component
The component includes four steps: aligning the ESTs, grouping them, detecting AS forms (3’, 5’, exon skipping, etc.) and inserting the results into the database (Figure 5). We perform these steps whenever new sequences are added to dbEST.
Genome sequences updating component:
Figure 6. The flowchart of the genome updating component
This component has two inputs: the previous results of aligning ESTs to the genome, and the BAC locations in the annotation. First, we take the previous EST-to-genome alignment results, which serve as the input for the new version of the genome. Second, we parse the detailed BAC information, which is the most important part of this work. We transfer the position of each EST exon that falls within the boundary of a BAC (an example is shown in Figure 7). However, some ESTs do not fall within the boundary of any BAC, because some BACs have been removed or renamed. Those ESTs must be realigned to the new genome.

Figure 7. An example of an EST position transferred to its new position.
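The position transfer can be illustrated with a simple offset calculation. The sketch assumes that a BAC keeps its orientation and internal sequence between builds, so that only its start coordinate shifts; the text does not state this, and the BAC names and dictionary layout below are hypothetical.

```python
def transfer_position(pos, old_bacs, new_bacs):
    """Map one genomic position from the old build to the new build.

    old_bacs / new_bacs: dict bac_name -> (start, end) in each build.
    Returns None when the position lies in no BAC, or when its BAC is
    absent (removed or renamed) in the new build.
    """
    for name, (old_start, old_end) in old_bacs.items():
        if old_start <= pos <= old_end:
            if name not in new_bacs:
                return None
            new_start, _ = new_bacs[name]
            return new_start + (pos - old_start)
    return None


def transfer_est(exons, old_bacs, new_bacs):
    """Transfer every exon of one EST; None means the EST must be realigned."""
    moved = []
    for start, end in exons:
        new_start = transfer_position(start, old_bacs, new_bacs)
        new_end = transfer_position(end, old_bacs, new_bacs)
        if new_start is None or new_end is None:
            return None
        moved.append((new_start, new_end))
    return moved
```

An EST for which `transfer_est` returns `None` corresponds to the case described above, where the alignment has to be repeated against the new genome.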
This component has two parts: genome sequence updating and EST sequence updating.
1. Genome sequence updating
We parse the genomic annotation and compare the genomic structure of build 34 and build 35, then provide a viewer that displays both. Each BAC, gene and contig has detailed information in the annotation files (*.gbs or *.gbk). We use those data to reconstruct the genomic overview.
2. EST sequences updating
When EST sequences are added to our database, this component counts the number of ESTs within each gene’s boundary. We also provide a simple way to query the number of ESTs by chromosome, contig and gene. Each season we count the number of newly added ESTs per gene, calculate the number of AS forms, and report how these numbers differ from the previous count.
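The per-gene counting can be sketched as an interval overlap test. The gene and EST tuples below are hypothetical, and counting an EST for every gene it overlaps is an assumption about how "within a gene’s boundary" is interpreted.

```python
def count_ests_per_gene(genes, ests):
    """Count ESTs overlapping each gene.

    genes: dict gene_name -> (chromosome, start, end)
    ests:  iterable of (chromosome, start, end)
    """
    counts = {name: 0 for name in genes}
    for est_chrom, est_start, est_end in ests:
        for name, (chrom, start, end) in genes.items():
            # Two closed intervals overlap when each starts before the
            # other one ends.
            if chrom == est_chrom and est_start <= end and start <= est_end:
                counts[name] += 1
    return counts


def newly_added(previous, current):
    """Difference between the current and the previous count per gene."""
    return {name: current[name] - previous.get(name, 0) for name in current}
```

Calling `newly_added` with the counts of the previous update and the current one gives the per-gene increase that the report shows.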
We provide users with a free service that sends update messages about our AS database, covering EST and genome updating events. The URL for Avatar is http://avatar.iecs.fcu.edu.tw.
[1] Jeremy D. Glasner, Paul Liss, Guy Plunkett III, Aaron Darling, Tejasvini Prasad, Michael Rusch, Alexis Byrnes, Michael Gilson, Bryan Biehl, Frederick R. Blattner, and Nicole T. Perna. ASAP, a systematic annotation package for community analysis of genomes. Nucleic Acids Res., Jan 2003; 31: 147-151.
[2] T. A. Thanaraj, Stefan Stamm, Francis Clark, Jean-Jack Riethoven, Vincent Le Texier, and Juha Muilu. ASD: the Alternative Splicing Database. Nucleic Acids Res., Jan 2004; 32: 64-69.
[3] I. Dralyuk, M. Brudno, M. S. Gelfand, M. Zorn, and I. Dubchak. ASDB: database of alternatively spliced genes. Nucleic Acids Res., Jan 2000; 28: 296-297.
[4] Jeremy Leipzig, Pavel Pevzner, and Steffen Heber. The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome. Nucleic Acids Res., Aug 2004; 32: 3977-3983.
[5] Sakharkar MK, Perumal BS, Lim Y.P., Lee P.C., Yu Y, Kangueane P. Alternatively spliced human genes by exon skipping: A database (ASHESdb). In Silico Biology, 5, 0021, Dec 2004. Bioinformation Systems e.V.
[6] Heike Pospisil, Alexander Herrmann, Ralf H. Bortfeldt, and Jens G. Reich. EASED: Extended Alternatively Spliced EST Database. Nucleic Acids Res., Jan 2004; 32: 70-74.
[7] Pora Kim, Namshin Kim, Younghee Lee, Bumjin Kim, Youngah Shin, and Sanghyuk Lee. ECgene: genome annotation for alternative splicing. Nucleic Acids Res., Jan 2005; 33: D75-D79.
[8] Y.-H. Huang, Y.-T. Chen, J.-J. Lai, S.-T. Yang, and U.-C. Yang. PALS db: Putative Alternative Splicing database. Nucleic Acids Res., Jan 2002; 30: 186-190.
[9] Hsien-Da Huang, Jorng-Tzong Horng, Feng-Mao Lin, Yu-Chung Chang, and Chen-Chia Huang. SpliceInfo: an information repository for mRNA alternative splicing in human genome. Nucleic Acids Res., Jan 2005; 33: D80-D85.
[10] Alan Christoffels, Antoine van Gelder, Gary Greyling, Robert Miller, Tania Hide, and Winston Hide. STACK: Sequence Tag Alignment and Consensus Knowledgebase. Nucleic Acids Res., Jan 2001; 29: 234-238.
©2004 BioGrid Lab. Tel: +886-4-24517250:3730 FCU, Taichung, TAIWAN 433