Glossary of Terms Commonly Used in Genomics Research

 

accession number: a unique code that identifies a sequence in a database

 

algorithm: a step-by-step procedure for solving a problem, typically implemented in a computer program

 

alignment: the process of lining up two or more sequences to assess their degree of similarity and the possibility of homology

 

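alternative splicing: mechanism by which different combinations of exons from the same gene are joined together, after the introns (intervening sequences found within a gene) are removed, to produce alternative mRNAs and therefore different protein products from a single gene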

 

amino acids:  The 20 organic compounds that are building blocks of proteins. The sequence of nucleotide bases in DNA determines the sequence of amino acids in proteins. Some examples of amino acids are: alanine, glycine, arginine, leucine

 

base/base pair: a base is one of the 4 nitrogenous subunits of DNA nucleotides: adenine (abbreviated as A), guanine (G), cytosine (C), and thymine (T). Genomes contain from thousands to billions of these, arranged in a long double-stranded chain that winds into a double helix. The linear order (sequence) of the bases encodes the organism's genetic information. Because DNA is double-stranded and the bases are complementary (A pairs only with T, and G only with C), determining the order of the bases on one strand also gives the order on the other strand. The terms base and base pair are therefore often used interchangeably as a measure of genome size (for example, the human genome is approximately 3 billion base pairs long).
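
The pairing rule described above (A with T, G with C) can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the input is assumed to contain nothing but the letters A, C, G and T.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

print(complement("GCATATTGCT"))          # CGTATAACGA
print(reverse_complement("GCATATTGCT"))  # AGCAATATGC
```

Because the two strands run in opposite directions, sequence databases usually report the partner strand as the reverse complement rather than the plain complement.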

 

BAC: Bacterial artificial chromosome, used as a vector to carry the DNA of another organism for cloning and molecular biology purposes

 

bioinformatics: the field combining computer science, biology, and information technology, involving the storage, management, and analysis of large amounts of data

 

BLAST: Basic Local Alignment Search Tool, an algorithm for comparing a query sequence against the sequences in a database, available through NCBI

 

cDNA: complementary DNA, synthesized from an mRNA template; because the introns have already been spliced out of the mRNA, cDNA contains only expressed sequence

 

chromosome: the structure in cells that carries the linearly arranged genetic material. The number of chromosomes varies among species (for example, humans have 23 pairs, Arabidopsis has 5 pairs)

 

codon: a set of 3 consecutive nucleotides in a DNA or mRNA sequence, which corresponds to a specific amino acid or to a stop signal
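
As a rough sketch of how codons map to amino acids, the Python below reads a DNA coding sequence three bases at a time. The table is deliberately incomplete (6 of the 64 codons, chosen for illustration), and the function names are hypothetical.

```python
# A small excerpt of the standard genetic code (DNA codons).
# Only 6 of the 64 codons are listed; this is an illustration, not a full table.
CODON_TABLE = {
    "ATG": "Met",  # also the usual start codon
    "GCT": "Ala",
    "GGT": "Gly",
    "CGT": "Arg",
    "CTG": "Leu",
    "TAA": "Stop",
}

def split_codons(dna: str) -> list[str]:
    """Split a sequence into consecutive 3-base codons, dropping leftovers."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]

def translate(dna: str) -> list[str]:
    """Translate codons to amino acids, stopping at the first stop codon."""
    protein = []
    for codon in split_codons(dna):
        residue = CODON_TABLE[codon]  # raises KeyError for codons not in the excerpt
        if residue == "Stop":
            break
        protein.append(residue)
    return protein

print(split_codons("ATGGCTGGTTAA"))  # ['ATG', 'GCT', 'GGT', 'TAA']
print(translate("ATGGCTGGTTAA"))     # ['Met', 'Ala', 'Gly']
```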

 

comparative genomics: comparing the genome sequences of 2 or more organisms, used in identifying gene functions and in evolutionary studies

 

computational biology: analyzing and interpreting biological data using computational methods

 

COT analysis: uses the principles of DNA renaturation kinetics, where the rate at which a particular sequence reassociates (returns to the double-stranded state) is proportional to the number of times it is found in the genome (Cot stands for nucleotide concentration times reassociation time). This is used as a way of filtering out highly repetitive sequences, better enabling the sequencing of low copy sequences (more likely to be genes)

 

database: a collection of data. See also relational database

 

DNA: deoxyribonucleic acid, the carrier molecule of genetic information. Made up of 2 long chains of nucleotides, each of which consists of a sugar (deoxyribose), a phosphate group and one of 4 nitrogenous bases (see also base/base pair).

 

DNA chip: see microarray

 

DNA fingerprinting: the creation of a unique DNA profile of an individual using molecular techniques

 

DNA sequence: the order of the nucleotide bases in a DNA molecule, written as a string of the letters representing each of the 4 bases, for example GCATATTGCT. This sequence is specific to each living organism.

 

ESTs: expressed sequence tags; partial gene sequences of the expressed part of the genome; used for gene discovery, particularly in organisms whose genomes have not yet been sequenced

 

exon: the part of a DNA sequence that codes for a protein (usually in conjunction with other exons)

 

FASTA: the first widely used algorithm for database similarity searching; the name is now also used for the text file format in which sequences are commonly stored
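
For readers unfamiliar with the file format: a FASTA record is a header line beginning with ">" followed by one or more lines of sequence. The minimal Python reader below is a sketch under that assumption only. The function name and the sample records are made up, and real files may need more careful handling (comments, blank headers, very large inputs).

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record name (first word after '>') to its full sequence."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}

example = """>seq1 example record
GCATATTGCT
GCATATTGCT
>seq2
ATGGCTGGTTAA
"""

sequences = parse_fasta(example)
print(sequences["seq1"])  # GCATATTGCTGCATATTGCT
print(sequences["seq2"])  # ATGGCTGGTTAA
```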

 

functional genomics: the study of the structure, organization, and function of the genome in developmental and other life processes of an organism

 

gap: a space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another
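
To make gaps concrete, the sketch below takes two sequences that have already been aligned (so they have equal length, with "-" marking a gap) and counts matching columns and gap columns. The function name and the choice to divide by the full alignment length are assumptions for this example; alignment programs differ in how they define percent identity.

```python
def alignment_stats(a: str, b: str, gap: str = "-") -> tuple[int, int, float]:
    """Return (matches, gap columns, identity) for two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(a.upper(), b.upper()))
    matches = sum(1 for x, y in pairs if x == y and x != gap)
    gaps = sum(1 for x, y in pairs if x == gap or y == gap)
    # Identity here is matches divided by the number of alignment columns.
    identity = matches / len(pairs) if pairs else 0.0
    return matches, gaps, identity

# The second sequence lacks one base, so a gap keeps the rest in register.
print(alignment_stats("GCATATTGCT", "GCAT-TTGCT"))  # (9, 1, 0.9)
```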

 

GenBank: the most used public database for sequence data and related information. Managed by NCBI, supported by the National Library of Medicine and NIH, available at http://www.ncbi.nlm.nih.gov

 

gene: the functional subunit of heredity, a sequence of nucleotides at a particular position on a chromosome which usually encodes a specific functional product

 

gene expression: the process by which a gene is "turned on" and its information is used to make a product (RNA or protein)

 

gene family: a group of closely related genes that produce similar protein products

 

genetic engineering: the technique of copying a gene from one living thing, such as a bacterium, plant or animal, and adding it to another. Most commonly used to add a new gene to a crop plant, giving it traits that may be beneficial to the farmer or consumer

 

genome: the entire genetic endowment of an organism. Genome sizes vary widely among organisms

 

genotype: the genetic constitution of an organism, see also phenotype

 

GMO: genetically modified organism. Although technically this could refer to any organism that has been genetically modified, even through traditional breeding and selection methods, the term now typically refers to an organism that has been modified through genetic engineering methods. Also called transgenics

 

haplotype: a collection of variable DNA sequences that tend to be inherited together

 

heuristic: a procedure that approximates the answer to a problem faster or more economically than a mathematically exact algorithm. However, a heuristic is not guaranteed to find the true answer. In computer science, heuristics are applied when finding the exact solution to a problem is computationally impractical.

 

homology: having a common evolutionary origin, relatedness (now often used simply to describe similarity in DNA sequence)

 

imprinting: the phenomenon in which a gene may be expressed differently in an offspring depending on whether it was inherited from the father or the mother.

 

intron: A DNA sequence that interrupts the sequences coding for a gene product (exons).

 

junk DNA: non-coding DNA; DNA that does not directly code for proteins. May function in structural stabilization, in controlling gene expression, or in other roles

 

library: a set of sequences or clones, usually generated at once for a specific purpose. Examples include EST libraries, BAC libraries, etc.

 

mapping: identifying the location of a gene or DNA segment along a chromosome

 

metabolomics: the study of the unique chemical fingerprints that specific cellular processes leave behind; the study of an organism's small-molecule metabolite profiles

 

microarray (or DNA chip, gene chip): device where tens of thousands of genes or DNA segments are attached to a small, thumb-sized chip, and can be simultaneously assessed to detect specific genes or gene activity or expression

 

minimal tiling path: the minimum number of overlapping clones in a physical map needed to generate a sequence of the whole genome
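
One way to picture the selection is as an interval-covering problem. The Python sketch below uses a simple greedy rule: at each step, among the clones that start at or before the end of the region covered so far, take the one that extends farthest. The clone names and coordinates are invented for illustration, and real physical maps need additional checks (for example, a minimum overlap between neighboring clones) that are not modeled here.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones covering [region_start, region_end].

    clones: list of (name, start, end) tuples in map coordinates.
    Returns the clone names in path order.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])  # sort by start
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Consider every clone that begins within the covered region.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone extends coverage past position {covered}")
        path.append(best[0])
        covered = best[2]
    return path

# Hypothetical clones: (name, start, end)
clones = [("A", 0, 100), ("B", 50, 180), ("C", 90, 200),
          ("D", 150, 300), ("E", 190, 320)]
print(minimal_tiling_path(clones, 0, 320))  # ['A', 'C', 'E']
```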

 

molecular marker: gene or DNA segment with a known location on a chromosome (for a good tutorial on the uses of markers, see the downloadable training materials available from the International Plant Genetic Resources Institute, http://www.ipgri.cgiar.org/)

 

mutation: abrupt change in the genotype of an organism that is not the result of recombination

 

NCBI: National Center for Biotechnology Information, which manages GenBank, PubMed (a database of publications), and other databases (available at http://www.ncbi.nlm.nih.gov)

 

nucleic acids: see base/base pair and DNA

 

nucleotide: the building block of DNA, consisting of one base, one phosphate group, and the sugar deoxyribose. The bases in DNA nucleotides are adenine, thymine, guanine, and cytosine (abbreviated A, T, G, and C). See also base/base pair

 

orthologs: homologous genes from different species that are derived from a common ancestral gene at the time of the last common ancestor

 

paralogs: genes within a species that arose from gene duplication

 

PCR: polymerase chain reaction, the process by which a small fragment of DNA can be replicated into millions of copies. It is done in a small benchtop machine called a thermal cycler, which drives repeated rounds of DNA synthesis by cycling through a series of temperatures

 

phenotype: the traits displayed by an organism as a result of its genetic constitution (genotype)

 

phylogenetics: the field of biology that deals with identifying and understanding the evolutionary relationships between the different kinds of life on earth

 

phylogenomics: a method of assigning a function to a gene based on its evolutionary history in a phylogenetic tree; phylogenomics uses knowledge of the evolution of a gene to improve function prediction.

 

physical map: the linear order of sequences along a chromosome, often generated by the use of overlapping clones such as BAC clones

 

polyploidy: the presence of more than two complete sets of chromosomes in an organism, resulting from whole-genome duplication; particularly common in plants

 

promoter: the part of a gene that contains the information to turn the gene on or off

 

proteins: large complex molecules made up of amino acids that make up most cellular structures and catalyze most reactions

 

proteome: the set of all proteins in a cell. Unlike the relatively unchanging genome, the dynamic proteome changes from minute to minute in response to tens of thousands of intra- and extracellular environmental signals

 

proteomics: the large-scale analysis of an organism's proteins to reveal expression and functions

 

recombination: formation in offspring of genetic combinations not present in the parents through the physical exchange of genetic material during cell division

 

regulatory DNA: DNA that controls the activity of genes. These DNA sequences tend to be short and are usually, but not always, located near the genes they control.

 

relational database: a database that cross-references the different types of data it contains, and allows queries of any type (a sequence, the sequence name, etc.) to retrieve data
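
As an illustration of cross-referencing, the sketch below builds a tiny in-memory database with Python's standard sqlite3 module. The table layout, accession codes, and gene names are all invented for the example and do not reflect the schema of GenBank or any real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequence (
        accession TEXT PRIMARY KEY,
        organism  TEXT,
        bases     TEXT
    );
    CREATE TABLE annotation (
        accession TEXT REFERENCES sequence(accession),
        gene_name TEXT
    );
""")

# Hypothetical records; the accession codes are not real.
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?)",
    [("EX000001", "Arabidopsis thaliana", "GCATATTGCT"),
     ("EX000002", "Homo sapiens", "ATGGCTGGTTAA")],
)
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?)",
    [("EX000001", "geneA"), ("EX000002", "geneB")],
)

def find_by_gene(conn, gene_name):
    """Look up sequences through the annotation table (a cross-reference)."""
    rows = conn.execute(
        """SELECT s.accession, s.organism, s.bases
           FROM sequence AS s
           JOIN annotation AS a ON a.accession = s.accession
           WHERE a.gene_name = ?
           ORDER BY s.accession""",
        (gene_name,),
    )
    return rows.fetchall()

print(find_by_gene(conn, "geneA"))
# [('EX000001', 'Arabidopsis thaliana', 'GCATATTGCT')]
```

The same data could equally be queried by accession, by organism, or by sequence, which is the flexibility the definition above refers to.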

 

RNA: Ribonucleic acid, the molecule responsible for translating the genetic information into proteins. Made up of one long chain of nucleotides, the bases of which are the same as DNA except that uracil is used instead of thymine. There are three main types of RNA: messenger RNA, transfer RNA, and ribosomal RNA.

 

RNA interference (RNAi): a system in cells for "turning off," or silencing, particular genes. Scientists can now mimic this process to help identify the functions of individual genes.

 

sequencing: determining the order (sequence) of nucleotide bases in a segment of DNA (or RNA or protein, less commonly). Samples are run on an electrophoresis gel, on which the 4 bases give distinctive banding patterns. By ordering the overlapping fragments, the sequence of the entire DNA segment can be deduced

 

SNPs: single nucleotide polymorphisms (pronounced "snips"). A SNP is a single nucleotide difference between 2 or more sequences, caused by allelic variation or mutations. SNPs can be used as genetic markers to track inheritance in families or species
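
In the simplest case, candidate SNPs can be spotted by comparing two sequences of equal length position by position, as in the Python sketch below. The function name and sequences are hypothetical, and real SNP calling works on many aligned reads and must account for sequencing error, which this example ignores.

```python
def find_single_base_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in a, base in b) wherever a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return [
        (position, x, y)
        for position, (x, y) in enumerate(zip(a.upper(), b.upper()), start=1)
        if x != y
    ]

print(find_single_base_differences("GCATATTGCT", "GCATCTTGCT"))  # [(5, 'A', 'C')]
```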

 

structural genomics: identifying the 3-D structures of proteins, which will help identify their functions and provide targets for drug design

 

syntenic: orthologous loci from different species that are in the same order in their respective species (originally meant only that they were located on the same chromosome, but is now used to mean colinear as well). For example, genes on a human chromosome that are found in the same order on a mouse chromosome

 

transgenic: an organism containing genetic material from another organism transferred by genetic engineering. See also GMO.

 

transcription: the process by which an RNA copy is synthesized from a DNA template; the first step in the expression of a gene

 

transcriptome: the sum of all the regions of a genome that are transcribed

 

transcriptomics: the study of the expression levels of genes, often using technologies such as microarrays that are capable of sampling tens of thousands of different mRNA molecules at a time.

 

transformation: the process of adding a gene from one organism into another

 

transposon: a genetic element that can move within the genome

 

unigene: non-redundant set of gene-oriented clusters, often generated by clustering large amounts of ESTs

 

universal primers: primers that will amplify orthologous sequences in different species

 

UTR: untranslated region, the part of a gene that is transcribed into mRNA but not translated into protein.

 

 

Main Sources and other glossaries

 

Chemis Interactive Molecular Library: nucleic acids http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/DNA.htm, 2000, Dr Didier Collomb 2/13/02

Friend, S.H. and Stoughton, R.B. (2002, February). The magic of microarrays. Scientific American, pp. 44-53

Hartwell, L.H., Hood, L., Goldberg, M., Reynolds, A.E., Silver, L.M., & Veres, R.C. (2000). Genetics: from genes to genomes. New York: McGraw-Hill Companies, Inc.

Interagency Working Group on Plant Genomes (2000). National Plant Genome Initiative. Washington, D.C.: National Science and Technology Council

Genomics Initiative, a supplement to the Cornell Chronicle. (1999, January). Cornell University

Human Genome Management Information System (HGMIS) (2001). Genomics and its impact on medicine and society: a primer, [pdf]. HGMIS at Oak Ridge National Laboratory, Oak Ridge, TN, for the U.S. Department of Energy Human Genome Program. Available at http://www.ornl.gov/hgmis

National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/

National Institutes of Health, National Institute of General Medical Sciences (2001) Genetics Basics. NIH Publication No. 01-662. Also available at: http://publications.nigms.nih.gov/genetics/

Genome News Network glossary http://www.genomenewsnetwork.org/

Wikipedia, the free encyclopedia http://en.wikipedia.org/

 

 

 

For reviews of some online glossaries in genomics and biotechnology, see http://www.sciencegenomics.org