The chemokine information source: identification and characterization of novel chemokines using the WorldWideWeb and expressed sequence tag databases

T N Wells; M C Peitsch

doi:10.1002/jlb.61.5.545

J Leukoc Biol. 1997 May;61(5):545-50. doi: 10.1002/jlb.61.5.545.

Authors

T N Wells¹, M C Peitsch

Affiliation

¹ Geneva Biomedical Research Institute, Switzerland.

PMID: 9129202

Abstract

The chemokine superfamily is a large group of more than 30 small proteins. Many of these were originally identified because of their role in the selective recruitment and activation of leukocytes during inflammation. More recently, some of the chemokine receptors and ligands have been implicated in the mechanism of viral infection for primate lentiviruses such as HIV-1. Since the original identification of interleukin-8 (IL-8; the most studied member of the superfamily), the number of family members has mushroomed over the last few years. Two events have dramatically altered the speed at which sequence information concerning novel chemokines has become available to the scientific community. First, many groups have been obtaining large amounts of sequence information from cDNA libraries by sequencing the clones at random, generating expressed sequence tags (ESTs). Although these ESTs are relatively short, typically fewer than 500 bases, this amount of sequence is usually sufficient to obtain the entire open reading frame for chemokines. Second, there has been a rapid growth in the use of the WorldWideWeb by bioinformatics groups. The Web was originally set up by the European Laboratory for Particle Physics (CERN) in Geneva as a method of transferring data between collaborating groups throughout the world. It has given biologists worldwide almost instantaneous access both to the databases containing the EST sequences and to the automated tools that are required for searching such databases. With such methods, we have rapidly identified more than 10 new human chemokines from public domain databases. In addition to the known categories of chemokines, which are named C, CC, and CXC based on the spacings of N-terminal cysteine residues, we have been able to identify the first member of a new chemokine subfamily with a novel CXXXC cysteine spacing.
Furthermore, we can subdivide the CC chemokines into monocyte chemotactic protein and macrophage inflammatory protein families based not only on their sequence identity levels but also on their clustering in the human genome, as identified on other Web sites. The rapid availability of all these data has reduced the amount of time spent on conventional gene identification, enabling us to move quickly on to trying to understand the biology and physiological relevance of these molecules. The novel chemokine sequences obtained and alignments with existing members of the superfamily are now contained within a Chemokine Information Source on an open access server, allowing further searching of chemokine sequences and increasing the availability of such data to the scientific community.
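
The naming scheme mentioned in the abstract (C, CC, CXC, and CXXXC, by the spacing of the N-terminal cysteines) can be illustrated with a minimal Python sketch. This is not the authors' method. It assumes the input is a mature protein sequence with the signal peptide already removed, it looks only at the gap between the first two cysteines, and the function name `classify_chemokine` and the example strings are hypothetical.

```python
def classify_chemokine(mature_seq: str) -> str:
    """Guess the chemokine subfamily from the spacing of the first two cysteines.

    Assumption: `mature_seq` is a one-letter amino acid string with the
    signal peptide already removed, so the first cysteine found is the
    first conserved N-terminal cysteine.
    """
    seq = mature_seq.upper()
    first = seq.find("C")
    if first == -1:
        return "unclassified"  # no cysteine at all

    second = seq.find("C", first + 1)
    if second == -1:
        return "C"  # a lone cysteine, no partner to pair with

    gap = second - first - 1  # residues between the two cysteines
    if gap == 0:
        return "CC"
    if gap == 1:
        return "CXC"
    if gap == 3:
        return "CXXXC"
    if gap > 3:
        return "C"  # first cysteine stands alone near the N-terminus
    return "unclassified"  # gap of 2 matches none of the named spacings


if __name__ == "__main__":
    # Toy fragments made up for illustration, not real database entries.
    for fragment in ["APDCCFSY", "GSKCQCLKT", "QHHGVTKCNITCSKMT", "VGSEVSDKRTAVSLTC"]:
        print(fragment, classify_chemokine(fragment))
```

A real classification would also check the remaining conserved cysteines and overall sequence similarity, so this heuristic is only a first-pass filter for candidate EST translations.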

Publication types

Review

MeSH terms

Amino Acid Sequence
Chemokines*
Databases, Factual*
Humans
Information Systems*
Molecular Sequence Data
Sequence Homology, Amino Acid

Substances

Chemokines

Associated data

GENBANK/U67775
PDB/1HUM
PDB/1IL8