Introduction                               


Alternative splicing (AS) is an important mechanism that regulates gene expression in eukaryotic cells. AS allows one pre-mRNA to be processed into different transcript isoforms within a cell. Each isoform contributes to protein diversity and can have a distinct function. Recent bioinformatic analyses have predicted that at least 50% of human genes undergo some form of alternative splicing. A recent analysis of reconstructed mRNAs derived from chromosome 22 indicated that ~60% of genes are represented by more than one transcript. As EST data collection continues, it even seems probable that alternative splicing variants may be observed for all genes. AS is one of the most significant components of the functional complexity of the genome. Processes such as the life cycle of many viruses, or ones as fundamental as the sex-determination pathway in Drosophila, are regulated in large part via alternative pre-mRNA splicing events. A number of studies have shown that alternative splicing is widespread in human as well as mouse genes, and it is apparent that many splicing events are conserved between human and mouse. The large increase in the frequency of recent exon creation and/or loss indicates that alternative splicing in these genomes has been associated with increased evolutionary change. These results indicate that a large number of splice variants have yet to be discovered and analyzed.

To address this problem, we used ESTs to detect alternative splicing variants. ESTs are short, single-pass cDNA sequences generated from randomly selected library clones. They are produced in a high-throughput manner from many tissues, individuals and conditions. However, they are sometimes of poor quality and may be contaminated by vector sequences, unprocessed mRNAs, or genomic sequences. Consequently, this rich source of information must be handled with care and checked cautiously before use.
ESTs are important data with many applications, such as gene discovery, locating exons, comparing and contrasting the genomes of different organisms, and detecting alternative splicing. Transcript information is also essential to applications such as the discovery of new drugs and diagnostics. Therefore, we developed a value-added transcriptome database, Avatar.

Methods

        Our approach to identifying AS consists of the following three phases (Figure 1):

             

Phase 1: EST alignment (a)

Our approach to detecting alternative splicing is based on whole-genome alignment of ESTs. We used MUGUP[1] to align the entire EST set and the mRNAs to the whole genome. This task took thirty days on one hundred personal computers.

Introns that begin with the bases GT or GC and end with the bases AG are referred to as “canonical introns”. The overwhelming majority (98.12%) of introns are of the GT/AG kind, and 0.76% are of the GC/AG kind. For this reason, a splice pair prediction is accepted only if the intron has one of the two canonical forms: “GT..AG” or “GC..AG” on the forward strand, and “CT..AC” or “CT..GC” on the reverse strand. MUGUP parameters: the minimum intron length is 20 bp and the maximum intron length is 40000 bp.
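The acceptance rule above can be sketched in a few lines of Python. This is only an illustration, not MUGUP's actual code: the function name `is_canonical_intron`, the 0-based half-open coordinates, and the inclusive treatment of the 20 bp and 40000 bp limits are assumptions. The reverse-strand pairs are derived from the forward-strand pairs by reverse complement, which reproduces “CT..AC” and “CT..GC”.

```python
# Sketch of the canonical-intron test (hypothetical helper, not MUGUP itself).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

# (first two bases, last two bases) of an intron read on the forward strand.
FORWARD_PAIRS = {("GT", "AG"), ("GC", "AG")}
# The same introns as they appear in the genome when the gene is on the
# reverse strand: the acceptor comes first and both ends are complemented.
REVERSE_PAIRS = {(revcomp(acc), revcomp(don)) for don, acc in FORWARD_PAIRS}

MIN_INTRON = 20      # bp, from the MUGUP parameters in the text
MAX_INTRON = 40000   # bp, from the MUGUP parameters in the text

def is_canonical_intron(genome, start, end, strand):
    """genome[start:end] is the intron (0-based, half-open; an assumption)."""
    length = end - start
    if not (MIN_INTRON <= length <= MAX_INTRON):  # limits assumed inclusive
        return False
    pair = (genome[start:start + 2].upper(), genome[end - 2:end].upper())
    return pair in (FORWARD_PAIRS if strand == "+" else REVERSE_PAIRS)
```

Deriving the reverse-strand set from the forward-strand set keeps the two lists from drifting apart if another intron class is added later.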

Phase 2: EST filtering (b, c)

As noted earlier, EST libraries may be contaminated, so only EST alignments with similarity scores greater than 94% were used for further alternative splicing analysis. In addition, ESTs containing any non-canonical intron were excluded from further analysis.
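A minimal sketch of this filter is shown below. The `Alignment` record and its field names are hypothetical and stand in for whatever the aligner reports; only the 94% threshold and the canonical-intron requirement come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Alignment:
    """Hypothetical record for one EST-to-genome alignment."""
    est_id: str
    similarity: float                      # percent similarity of the alignment
    intron_is_canonical: list = field(default_factory=list)  # one flag per intron

MIN_SIMILARITY = 94.0  # threshold stated in the text (strictly greater than)

def passes_filter(aln):
    # all([]) is True, so an unspliced alignment passes the intron check;
    # whether such alignments are kept in practice is not stated in the text.
    return aln.similarity > MIN_SIMILARITY and all(aln.intron_is_canonical)

def filter_alignments(alignments):
    """Keep only alignments usable for alternative splicing analysis."""
    return [aln for aln in alignments if passes_filter(aln)]
```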

 

Phase 3: Detection of alternative splicing (d)

First, we identified exon and intron boundaries. Introns were grouped when their splice sites matched within 3 bp and they were in the same orientation (strand). Exons in an EST alignment can be partitioned into three classes (Figure 2). Both boundaries of a middle exon can be determined from the intron information. However, the upstream boundary of the first exon and the downstream boundary of the last exon are unknown.
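The grouping step might look like the following sketch. It is an assumption-laden illustration: the `Intron` tuple is hypothetical, "within 3 bp" is taken as inclusive, and each intron is compared with the first member of a group rather than with every member.

```python
from collections import namedtuple

# Hypothetical intron record: EST of origin, strand, and genomic boundaries.
Intron = namedtuple("Intron", "est_id strand start end")

TOLERANCE = 3  # bp, from the text

def group_introns(introns, tol=TOLERANCE):
    """Greedily group introns whose two splice sites agree within `tol` bp."""
    groups = []
    for intron in sorted(introns, key=lambda i: (i.strand, i.start, i.end)):
        for group in groups:
            rep = group[0]  # representative: first intron placed in the group
            if (rep.strand == intron.strand
                    and abs(rep.start - intron.start) <= tol
                    and abs(rep.end - intron.end) <= tol):
                group.append(intron)
                break
        else:
            groups.append([intron])  # no compatible group found
    return groups
```

Comparing against a single representative keeps a group from stretching step by step beyond the tolerance, at the cost of making the result depend on sort order.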

 

Figure 2. Exon classification of EST alignments.

Second, we detected three types of alternative splicing by comparing introns: exon skipping, alternative 3’ splicing, and alternative 5’ splicing.

The methods for detecting alternative splice sites are illustrated in Figure 3.

Detection of exon skipping:

This AS type is characterized by two short introns contained within a longer intron. One of the short introns has the same 5’ splice site as the long intron; the other has the same 3’ splice site as the long intron.
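As a rough sketch of this rule, introns can be represented as (5’ site, 3’ site) pairs in transcript orientation with the 5’ coordinate smaller than the 3’ coordinate. That representation, the function name, and the use of exact coordinate equality (that is, introns already merged by the grouping step) are assumptions made for illustration.

```python
from collections import defaultdict

def find_exon_skipping(introns):
    """Return (long, short_5p, short_3p) triples describing skipped exons.

    `introns` is an iterable of (five_prime, three_prime) integer pairs.
    """
    introns = sorted(set(introns))
    by_5p = defaultdict(list)
    by_3p = defaultdict(list)
    for five, three in introns:
        by_5p[five].append((five, three))
        by_3p[three].append((five, three))

    events = []
    for long_five, long_three in introns:
        for s1 in by_5p[long_five]:          # shares the 5' site with the long intron
            if s1[1] >= long_three:
                continue                     # not shorter than the long intron
            for s2 in by_3p[long_three]:     # shares the 3' site with the long intron
                if s2[0] > s1[1]:            # leaves room for the skipped exon
                    events.append(((long_five, long_three), s1, s2))
    return events
```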

Detection of 3’ & 5’ alternative splicing:

Alternative 3’ splicing involves two introns whose 3’ splice sites differ but whose boundary at the upstream exon (the 5’ splice site) is the same. The converse holds for alternative 5’ splicing.
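The pairwise comparison can be sketched as follows, again treating introns as (5’ site, 3’ site) pairs in transcript orientation, which is an assumption. The labels `"alt3"` and `"alt5"` are hypothetical names. The sketch does not exclude pairs that also take part in an exon-skipping event; how those cases are separated is not described in the text.

```python
def classify_intron_pair(intron_a, intron_b):
    """Classify two introns given as (five_prime, three_prime) pairs."""
    five_a, three_a = intron_a
    five_b, three_b = intron_b
    if five_a == five_b and three_a != three_b:
        return "alt3"   # same upstream exon boundary, different 3' site
    if three_a == three_b and five_a != five_b:
        return "alt5"   # same downstream exon boundary, different 5' site
    return None         # identical introns, or no shared splice site
```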

Figure 3.  The method of alternative splicing site detection.

Detection of mutually exclusive exons:

In the mutually exclusive type, two or more splicing forms are supported by different ESTs. Each form must contain a shorter intron and a longer intron within the same EST. The first introns (i1, i3) share the same 5’ splice site, and the second introns (i2, i4) share the same 3’ splice site (Form 1 and Form 2 have the same splice sites at a and d). Next, we test whether introns i2 and i3 overlap. Note that introns i1 and i2 must be spliced in the same EST; the same is true of Form 2 (i3, i4). The method is illustrated in Figure 4.

Figure 4. Detection of the mutually exclusive type.
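One way to read this test is sketched below. Each form is a pair of consecutive introns from a single EST, with introns given as half-open (5’ site, 3’ site) pairs in transcript orientation; this representation and the function names are assumptions, and the sketch omits the shorter/longer intron condition.

```python
def _ordered_check(form1, form2):
    """Form 1 = (i1, i2), Form 2 = (i3, i4), with Form 1's exon upstream."""
    (i1, i2), (i3, i4) = form1, form2
    same_outer_sites = i1[0] == i3[0] and i2[1] == i4[1]   # sites a and d
    i2_overlaps_i3 = i2[0] < i3[1] and i3[0] < i2[1]
    return same_outer_sites and i2_overlaps_i3

def is_mutually_exclusive(form_a, form_b):
    """True if the two forms use different, non-overlapping internal exons."""
    # The text fixes which form is Form 1; here both orders are tried so the
    # caller does not need to know which internal exon lies upstream.
    return _ordered_check(form_a, form_b) or _ordered_check(form_b, form_a)
```

With shared outer sites, i2 and i3 overlap exactly when the internal exon of Form 1 ends before the internal exon of Form 2 begins, so two ESTs carrying the same form, or forms whose internal exons overlap, are rejected.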

Materials

All EST sequences were downloaded from NCBI (ftp.ncbi.nih.gov/repository/dbEST/gzipped/dbEST.reports.date.no.gz). As of January 16, 2004, the dbEST database contained nearly 5.4 million human ESTs. Human genomic sequences (contig build 34, GenBank format) were retrieved from NCBI (ftp.ncbi.nih.gov/genbank/genomes/). Genomic build numbers of the different organisms are shown in Table 1. Gene information and mRNA sequences were obtained from the NCBI RefSeq project by parsing the GenBank chromosome flat files.

Table 1. The data build of genomic sequences 

Organism | Data version
--- | ---
Homo sapiens | Build 35
Mus musculus | Build 32
Rattus norvegicus | Build 2
Caenorhabditis elegans | Build 1
Drosophila melanogaster | BDGP Release 3.2
Arabidopsis thaliana | Build 1

Comparison of alternative splicing databases

Database | Clustering method | Tools | No. of genes / spliced transcripts / alternatively spliced | AS type | Organism | AS visualized display | Ref.
--- | --- | --- | --- | --- | --- | --- | ---
ASAP | 3 | BLAST | 7,991 genes / 30,793 spliced transcripts / | 3’, 5’, exon skipping, mutually exclusive | Human | show splicing sites | 1
ASD | cDNA sequences and literature | literature | AS 8,314 genes | - | Human and mouse | showed cDNAs only | 2
ASDB | SWISS-PROT / GenBank literature | BLAST, CLUSTALW | 1,922 protein and 2,486 DNA sequences | - | Human, Mouse, Rat, Fly, Worm, Chicken, Virus, Bovine and Rabbit | - | 3
ASG | 3 | BLAST, sim4 | 22,127 genes | 3’, 5’, exon skipping, intron retention | Human | show 3’, 5’, exon skipping AS | 4
ASHESdb | 1 | BLASTN | 1,229 genes / 9,073 spliced transcripts | exon skipping | Human | show exons | 5
EASED | 2 | BLASTN (EST to mRNA), BLAST EST cluster | Near 30,000 spliced transcripts | 3’, 5’, exon skipping, retained intron | Human, Plant, Cow, Worm, Fly, Fish, Mouse, Rat and Frog | show splicing sites | 6
ECgene | 2 | BLAST, SAGE | - | Inherited from ASmodeler | Human, mouse, rat | UCSC browser | 7
PALSDB | 3 | BLAST EST cluster | 19,936 (human), 16,615 (mouse) UniGene clusters / 14,106 AS of 26,324 UniGene clusters (human), 10,129 AS of 18,614 UniGene clusters (mouse), 2,705 AS of 14,393 UniGene clusters (worm) | 3’, 5’, exon skipping | Human, mouse and worm | show all ESTs and mRNAs | 8
ProSplicer | 3 | BLAST (protein), SIM4 (mRNA and ESTs) | 21,786 genes | 3’, 5’, exon skipping | Human | show exons of proteins, mRNAs and ESTs only | 9
STACKDB | 2 | d2_cluster (EST cluster) | 270,515 clusters | - | Human | - | 10
TAP | 1 | WU2BLASTN, sim4 | 1,124 transcripts | - | Human and mouse | - | 11
Avatar | 1 | Mugup | 26,377 genes (human) / *AS 19,175 (six taxa) | 3’, 5’, exon skipping | Human, mouse, rat, fruit fly, worm, thale cress | show 3’, 5’, exon skipping AS | -

 

Note: * = only splice sites supported by at least two ESTs are considered.

Clustering method:
1 -> EST/mRNA/cDNA to genome
2 -> EST to mRNA
3 -> Unige

Released

               Four components (EST updating, genome sequence updating, data report, and mail report) have been added to Avatar. The EST report lets users query, by chromosome, contig, or gene name, how many ESTs have recently been added from dbEST. The genome report gives users a structural overview of the two genome builds (build 34 and build 35). The mail report sends users the updated EST and genome report information by e-mail.

        EST updating component:

 

 

 

 

 

 

Figure 5. Flowchart of the EST updating component

The component consists of four steps: aligning ESTs, grouping ESTs, detecting AS forms (3’, 5’, exon skipping, etc.), and inserting the results into the database (Figure 5). We perform these steps whenever new sequences are added to dbEST.

        Genome sequences updating component:

                                    

Figure 6. Flowchart of the genome updating component

This component takes two inputs: the previous results of aligning ESTs to the genome, and the BAC locations in the annotation. First, we take the previous EST-to-genome alignment results, which serve as the input for the new version of the genome. Second, we parse the detailed BAC information, which is the most important part of this work. We transfer the position of each EST exon that falls within the boundary of a BAC (an example is shown in Figure 7). However, some ESTs do not fall within the boundary of any BAC, because some BACs have been removed or renamed, so those ESTs must be realigned to the new genome.
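The position transfer can be pictured with the sketch below. It is a simplified illustration under strong assumptions: BACs are given as hypothetical name-to-(start, end) maps for the old and new builds, and a BAC that exists in both builds is assumed to keep its length and orientation, so an exon moves by the same offset as its BAC. The function name `transfer_exon` is also hypothetical.

```python
def transfer_exon(exon, old_bacs, new_bacs):
    """Map an exon (start, end) from the old build to the new build.

    Returns the shifted coordinates, or None when the exon lies in no BAC
    that survives in the new build (the EST then has to be realigned).
    """
    start, end = exon
    for name, (old_start, old_end) in old_bacs.items():
        if not (old_start <= start and end <= old_end):
            continue                      # exon is not inside this BAC
        if name not in new_bacs:
            continue                      # BAC was removed or renamed
        shift = new_bacs[name][0] - old_start
        return (start + shift, end + shift)
    return None
```

Returning `None` instead of guessing makes the fallback explicit: only ESTs with at least one untransferable exon need to go through the slower genome alignment again.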

 

Figure 7. Example of an EST position transferred to the new build.

Data reporting component:       

        This component reports two kinds of updates: genome sequence updating and EST sequence updating.

1.      Genome sequence updating

We parse the genomic annotation and compare the genomic structure of build 34 and build 35, then provide a viewer to display them. Each BAC, gene, and contig has detailed information in the annotation files (*.gbs or *.gbk). We use those data to reconstruct the genomic overview.

2.      EST sequences updating

When we add EST sequences to our database, this component counts the number of ESTs within each gene’s boundary. We also provide users with a simple way to query the number of ESTs by chromosome, contig, and gene. Each quarter we count the newly added ESTs per gene, calculate the number of AS forms, and report the difference from the previous count.

       Mail report component:      

We provide users with a free service that sends update messages about our AS database, including EST and genome updating events. The URL for Avatar is http://avatar.iecs.fcu.edu.tw.

Reference    

[1] Jeremy D. Glasner, Paul Liss, Guy Plunkett III, Aaron Darling, Tejasvini Prasad, Michael Rusch, Alexis Byrnes, Michael Gilson, Bryan Biehl, Frederick R. Blattner, and Nicole T. Perna, ASAP, a systematic annotation package for community analysis of genomes, Nucleic Acids Res., Jan 2003; 31: 147 - 151.

[2] T. A. Thanaraj, Stefan Stamm, Francis Clark, Jean-Jack Riethoven, Vincent Le Texier, and Juha Muilu, ASD: the Alternative Splicing Database, Nucleic Acids Res., Jan 2004; 32: 64 - 69.

[3] I. Dralyuk, M. Brudno, M. S. Gelfand, M. Zorn, and I. Dubchak, ASDB: database of alternatively spliced genes, Nucleic Acids Res., Jan 2000; 28: 296 - 297.

[4] Jeremy Leipzig, Pavel Pevzner, and Steffen Heber, The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome, Nucleic Acids Res., Aug 2004; 32: 3977 - 3983.

[5] Sakharkar MK, Perumal BS, Lim Y.P., Lee P.C., Yu Y, Kangueane P., Alternatively spliced human genes by exon skipping: A database (ASHESdb), In Silico Biology, 5, 0021, Dec 2004. Bioinformation Systems e.V.

[6] Heike Pospisil, Alexander Herrmann, Ralf H. Bortfeldt, and Jens G. Reich, EASED: Extended Alternatively Spliced EST Database, Nucleic Acids Res., Jan 2004; 32: 70 - 74.

[7] Pora Kim, Namshin Kim, Younghee Lee, Bumjin Kim, Youngah Shin, and Sanghyuk Lee, ECgene: genome annotation for alternative splicing, Nucleic Acids Res., Jan 2005; 33: D75 - D79.

[8] Y.-H. Huang, Y.-T. Chen, J.-J. Lai, S.-T. Yang, and U.-C. Yang, PALS db: Putative Alternative Splicing database, Nucleic Acids Res., Jan 2002; 30: 186 - 190.

[9] Hsien-Da Huang, Jorng-Tzong Horng, Feng-Mao Lin, Yu-Chung Chang, and Chen-Chia Huang, SpliceInfo: an information repository for mRNA alternative splicing in human genome, Nucleic Acids Res., Jan 2005; 33: D80 - D85.

[10] Alan Christoffels, Antoine van Gelder, Gary Greyling, Robert Miller, Tania Hide, and Winston Hide, STACK: Sequence Tag Alignment and Consensus Knowledgebase, Nucleic Acids Res., Jan 2001; 29: 234 - 238.

[11] Zhengyan Kan, Eric C. Rouchka, Warren R. Gish, and David J. States, Gene Structure Prediction and Alternative Splicing Analysis Using Genomically Aligned ESTs, Genome Res., May 2001; 11: 889 - 900; 10.1101/gr.155001.


©2004 BioGrid Lab. Tel:+886-4-24517250:3730     FCU  Taichung , TAIWAN 433