Construction of a Medicinal Leech Transcriptome Database and Its Application to the Identification of Leech Homologs of Neural and Innate Immune Genes

Eduardo R. Macagno1,+,*, Terry Gaasterland1,2,+,*, Lee Edsall2, Vineet Bafna3, Marcelo B. Soares4, Todd Scheetz5, Thomas Casavant5, Corinne Da Silva6, Patrick Wincker6, and Michel Salzet7,*

Addresses: 1Division of Biological Sciences, University of California, San Diego, CA, USA; 2Scripps Institution of Oceanography, University of California, San Diego, CA, USA; 3Department of Computer Science, University of California, San Diego, CA, USA; 4Cancer Biology and Epigenomics Program, Children's Memorial Research Center, and Department of Pediatrics, Northwestern University's Feinberg School of Medicine, Chicago, IL, USA; 5Department of Biomedical Engineering, Center for Bioinformatics and Computational Biology, University of Iowa, Iowa, USA; 6CEA, DSV, IG, Genoscope, 2 rue Gaston Crémieux CP5706, 91057 Evry Cedex, France; 7Université de Lille 1, CNRS, Laboratoire de Neuroimmunologie des Annélides, FRE2933, 59655 Villeneuve d'Ascq, France.

Background: The medicinal leech, Hirudo medicinalis, is an important model system for the study of nervous system structure, function, development, regeneration and repair. It is also a unique species in that it is currently approved for use in medical procedures, such as the clearing of pooled blood following certain surgical procedures. It is a current, and potentially future, source of medically useful molecular factors, such as anticoagulants and antibacterial peptides, which may have evolved as a result of its parasitizing large mammals, including humans. Despite the broad focus of research on this system, little has been done at the genomic or transcriptomic levels, and there is a paucity of openly available sequence data. To begin to address this problem, we have constructed whole embryo and adult central nervous system (CNS) EST libraries and have created a clustered sequence database of the Hirudo transcriptome that is available to the scientific community.

Results: A total of ~133,000 EST clones from two directionally cloned cDNA libraries, one constructed from mRNA derived from whole embryos at several developmental stages and the other from adult CNS cords, were sequenced in one or both directions by three different groups: Genoscope (French National Sequencing Center), the University of Iowa Sequencing Facility and the DOE Joint Genome Institute. These were assembled using the PHRAP software package into 31,232 unique contigs and singletons, with an average length of 827 nt. The assembled transcripts were then translated in all six frames and compared to proteins in NCBI's non-redundant (NR) and the Gene Ontology (GO) protein sequence databases, resulting in 15,565 matches to 11,236 proteins in NR and 13,935 matches to 8,073 proteins in GO. Searching the database for transcripts of genes homologous to those thought to be involved in the innate immune responses of vertebrates and other invertebrates yielded a set of nearly one hundred evolutionarily conserved sequences, representing all known pathways involved in these important functions.
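
To illustrate what translating a transcript "in all six frames" involves, the following is a minimal Python sketch. It assumes the standard genetic code and is not the pipeline used to build the database; the function names (`reverse_complement`, `translate`, `six_frame_translations`) and the example sequence are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate three forward frames (+1..+3) and three reverse frames (-1..-3)."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, not a Hirudo transcript.
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six resulting protein strings can then be compared against a protein database; translated-search tools typically perform this step internally.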

Conclusions: The sequences obtained for Hirudo transcripts represent the first major database of genes expressed in this important model system. Comparison of translated open reading frames (ORFs) with the other openly available leech datasets, the genome and transcriptome of Helobdella robusta, shows an average identity at the amino acid level of 58% in matched sequences. Interestingly, comparison with other available Lophotrochozoans shows similarly high levels of amino acid identity where sequences match: 64% for Capitella capitata (a polychaete) and 56% for Aplysia californica (a mollusk). Phylogenetic comparisons of putative Hirudo innate immune response genes in the transcriptome database described here show a strong resemblance to the corresponding mammalian genes, indicating that this important physiological response may have older origins than previously proposed.
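
As a rough illustration of how an amino acid identity percentage can be computed for a pair of matched sequences, here is a minimal Python sketch. It assumes the two sequences have already been aligned (with gaps written as "-") and counts identity only over columns where neither sequence has a gap; this is one of several conventions in use and is not necessarily the one behind the figures reported above. The function name `percent_identity` and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment: 4 identical residues over 5 gap-free columns.
    print(percent_identity("MKT-LV", "MRTALV"))
```

An average over all matched sequence pairs would then be taken across the per-pair values.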

               +: co-first authors
               *: co-corresponding authors

               Leechmaster Database - password required
               Supplementary Table 1  - neural transcripts in the top 30 neural categories
               Supplementary Table 2  - neural transcripts
               Supplementary Table 3  - immune system transcripts