This article is part of the supplement: 25th European Workshop for Rheumatology Research

Poster presentation

A complete phylogenetic analysis coupling expression data from EST databases. An example with a family of genes: the peptidyl arginine deiminase genes

N Balandraud, P Gouret, E Danchin, M Blanc, D Zinn and P Pontarotti

Author Affiliations

EA 3781 Evolution, Genome, Environment, Université de Provence, Marseille, France

Arthritis Research & Therapy 2005, 7(Suppl 1):P34  doi:10.1186/ar1555

Received: 11 January 2005
Published: 17 February 2005

© 2005 BioMed Central Ltd


For functional annotation, similarity-based approaches [1] do not take into account all the information available from comparative and evolutionary biology. They do not distinguish orthologs from paralogs among homologs and, furthermore, the closest BLAST hit is often not the nearest neighbour [2]. Phylogenetic approaches that take duplication and speciation events into account are necessary to solve these problems. However, they do not incorporate any data on transcriptional behaviour. Yet orthologs can share a very similar 'molecular function' while performing a different 'macroscopic function' because of a transcriptional shift.
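
To make the orthology/paralogy distinction concrete, the sketch below labels each internal node of a rooted gene tree as a duplication or a speciation using the species-overlap rule, then calls two genes paralogs if their last common ancestor is a duplication and orthologs otherwise. This is one common heuristic, assumed here for illustration; it is not necessarily the reconciliation method used in FIGENIX, and the tree, gene names and helper functions (`leaf`, `join`, `relationship`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Node of a rooted binary gene tree; leaves carry a gene name and a species."""
    name: Optional[str] = None
    species: Optional[str] = None
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)


def leaf(name: str, species: str) -> Node:
    return Node(name=name, species=species)


def join(left: Node, right: Node) -> Node:
    """Create an internal node whose two children are `left` and `right`."""
    node = Node(children=[left, right])
    left.parent = right.parent = node
    return node


def species_set(node: Node) -> set[str]:
    """All species represented by the leaves below `node`."""
    if not node.children:
        return {node.species}
    return set().union(*(species_set(c) for c in node.children))


def is_duplication(node: Node) -> bool:
    # Species-overlap rule: if both child subtrees contain genes from a common
    # species, the split must be a duplication; otherwise treat it as a speciation.
    left, right = node.children
    return bool(species_set(left) & species_set(right))


def lca(a: Node, b: Node) -> Node:
    """Lowest common ancestor of two nodes in the same tree."""
    seen = set()
    node = a
    while node is not None:
        seen.add(id(node))
        node = node.parent
    node = b
    while node is not None:
        if id(node) in seen:
            return node
        node = node.parent
    raise ValueError("nodes are not in the same tree")


def relationship(a: Node, b: Node) -> str:
    return "paralogs" if is_duplication(lca(a, b)) else "orthologs"


# Hypothetical tree: an ancestral duplication produced genes A and B, and each
# copy was then inherited by both human and mouse.
human_A, mouse_A = leaf("A_human", "human"), leaf("A_mouse", "mouse")
human_B, mouse_B = leaf("B_human", "human"), leaf("B_mouse", "mouse")
root = join(join(human_A, mouse_A), join(human_B, mouse_B))

print(relationship(human_A, mouse_A))  # orthologs
print(relationship(human_A, human_B))  # paralogs
print(relationship(human_A, mouse_B))  # paralogs
```

Unlike a best-BLAST-hit rule, this decision depends only on tree topology and species content, not on raw sequence similarity.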

A growing amount of gene expression profiling data is available in various databases covering normal and pathological tissues (Expressed Sequence Tags [ESTs] from NR, TIGR, GeneNote, Gepis, etc.). Recent studies have examined the correlation between gene evolution (duplication and speciation) and expression divergence within and between species [3,4], and others have compared the expression profiles of orthologous genes in sequenced species [5].


We performed a phylogenetic analysis of a protein family using EST databases. This allowed us to enlarge the set of species containing homologs and consequently to improve the reconstruction of the genes' evolutionary history. We then extracted all the transcriptional data contained in the EST databases to decipher the genes' expression patterns. Because gene annotation is currently labour intensive, we used a locally developed platform dedicated to phylogenetic annotation, FIGENIX [6]. We validated this approach on a family of genes possibly implicated in rheumatoid arthritis: the peptidyl arginine deiminase (PADI) genes.
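
At its simplest, the EST side of the analysis reduces to counting how many ESTs assigned to each gene come from each tissue. A minimal sketch, assuming the ESTs have already been mapped onto genes and each carries a tissue label from its library annotation; the record format and the gene and tissue names are hypothetical.

```python
from collections import Counter, defaultdict


def expression_footprints(est_records):
    """Count ESTs per tissue for each gene from (gene, tissue) pairs."""
    counts = defaultdict(Counter)
    for gene, tissue in est_records:
        counts[gene][tissue] += 1
    return {gene: dict(tissues) for gene, tissues in counts.items()}


def to_proportions(profile):
    """Convert raw EST counts per tissue into fractions that sum to 1."""
    total = sum(profile.values())
    return {tissue: n / total for tissue, n in profile.items()}


# Hypothetical EST assignments, e.g. after mapping ESTs onto genes of the tree.
records = [
    ("geneX", "skin"), ("geneX", "skin"), ("geneX", "brain"),
    ("geneY", "uterus"),
]
footprints = expression_footprints(records)
print(footprints["geneX"])  # {'skin': 2, 'brain': 1}
print(to_proportions(footprints["geneX"]))
```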


We present here a phylogenetic annotation based on an enlarged dataset that includes EST contigs and expression data. This allowed us to integrate more functional data into the analysis of a set of genes and to derive a transcriptional footprint for each gene. Our analysis showed that the PADI-2 paralog group has retained the ancestral molecular function, together with what is probably the ancestral expression profile. These classified data allowed us to produce an updated transcriptional footprint for each paralog group of this protein family.
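
A group-level footprint can be sketched by pooling the tissue counts of all members of a paralog group and normalizing the result. This assumes paralog-group membership has already been read off the gene tree; the profiles, group names and the pooling choice are illustrative, not the authors' procedure.

```python
from collections import Counter


def group_footprints(gene_profiles, groups):
    """Pool per-gene EST tissue counts into one normalized footprint per paralog group."""
    footprints = {}
    for group, members in groups.items():
        pooled = Counter()
        for gene in members:
            # Counter.update adds counts rather than replacing them.
            pooled.update(gene_profiles.get(gene, {}))
        total = sum(pooled.values())
        footprints[group] = (
            {tissue: n / total for tissue, n in pooled.items()} if total else {}
        )
    return footprints


# Hypothetical per-gene EST counts and paralog-group assignments.
profiles = {
    "g1_human": {"brain": 3, "muscle": 1},
    "g1_mouse": {"brain": 4},
    "g2_human": {"skin": 5},
}
groups = {"group1": ["g1_human", "g1_mouse"], "group2": ["g2_human"]}
print(group_footprints(profiles, groups))
# {'group1': {'brain': 0.875, 'muscle': 0.125}, 'group2': {'skin': 1.0}}
```

Pooling raw counts lets genes or species with more ESTs dominate the group footprint; normalizing each gene first and then averaging is an equally plausible alternative.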


We believe this method opens a new way to annotate uncharacterized ESTs. Beyond classical phylogeny, it highlights transcriptional shifts between paralogs and is thus a useful tool for improving annotation. Our analysis showed that a functional shift can occur through differential tissue expression rather than through a change in the biochemical function of the protein.
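
One way to put a number on a transcriptional shift between two paralogs is to compare their tissue-proportion footprints with a divergence measure. The sketch below uses the Jensen-Shannon divergence with base-2 logarithms, which ranges from 0 (identical profiles) to 1 (no shared tissue); this choice of measure is an assumption, since the abstract does not say how divergence was scored, and the profiles are hypothetical.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (log base 2) between two tissue-proportion profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles with no tissue in common.
    """
    tissues = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in tissues}

    def kl_to_m(a):
        # Kullback-Leibler divergence KL(a || m); tissues where a is 0 contribute 0.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in tissues if a.get(t, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical footprints for two paralogs with the same biochemical activity.
paralog_1 = {"brain": 0.5, "skin": 0.5}
paralog_2 = {"skin": 0.5, "uterus": 0.5}
print(jensen_shannon(paralog_1, paralog_1))  # 0.0
print(jensen_shannon(paralog_1, paralog_2))  # 0.5
```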

This method of analysis is still at an early stage and needs to be extended to all kinds of expression databases, including those in which expression data are normalized, such as UniGene. In the future, it should become an essential step in annotating new, uncharacterized ESTs, for example those highlighted by DNA microarray assays.
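
Raw EST counts are hard to compare across tissues because cDNA libraries differ widely in sequencing depth, which is why normalized expression data matter. A minimal sketch of one such normalization, assuming a simple scaling to ESTs per million sequenced from each tissue; it is not a reproduction of UniGene's own procedure, and the counts and library sizes are hypothetical.

```python
def per_million(gene_counts, library_sizes):
    """Scale raw EST counts per tissue to ESTs per million sequenced from that tissue."""
    return {
        tissue: 1e6 * n / library_sizes[tissue]
        for tissue, n in gene_counts.items()
    }


# Hypothetical counts: same raw number of ESTs, very different sequencing depth.
gene_counts = {"brain": 5, "skin": 5}
library_sizes = {"brain": 500_000, "skin": 50_000}
print(per_million(gene_counts, library_sizes))  # {'brain': 10.0, 'skin': 100.0}
```

The same five ESTs correspond to a tenfold higher relative abundance in the smaller skin library.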


This work is supported by the French Society of Rheumatology (SFR).


  1. Altschul SF, et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

    Nucleic Acids Res 1997, 25:3389-3402.

  2. Koski LB, Golding GB: The closest BLAST hit is often not the nearest neighbor.

    J Mol Evol 2001, 52:540-542.

  3. Gu Z, et al.: Duplicate genes increase gene expression diversity within and between species.

    Nat Genet 2004, 36:577-579.

  4. Huminiecki L, Wolfe KH: Divergence of spatial gene expression profiles following species-specific gene duplications in human and mouse.

    Genome Res 2004, 14:1870-1879.

  5. Yanai I, et al.: Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification.

    Bioinformatics 2004, in press.

  6. Gouret P, et al.: 'Intelligent' automation of genomics annotation: expertise integration in a new software platform, Figenix.

    Genome Res 2004, in press.