What do you call the process that controls the expression of genetic information?

Genes act by determining the structure of proteins, which are responsible for directing cell metabolism through their activity as enzymes. The identification of DNA as the genetic material and the elucidation of its structure revealed that genetic information must be specified by the order of the four bases (A, C, G, and T) that make up the DNA molecule. Proteins, in turn, are polymers of 20 amino acids, the sequence of which determines their structure and function. The first direct link between a genetic mutation and an alteration in the amino acid sequence of a protein was made in 1957, when it was found that patients with the inherited disease sickle-cell anemia had hemoglobin molecules that differed from normal ones by a single amino acid substitution. Deeper understanding of the molecular relationship between DNA and proteins came, however, from a series of experiments that took advantage of E. coli and its viruses as genetic models.

Colinearity of Genes and Proteins

The simplest hypothesis to account for the relationship between genes and enzymes was that the order of nucleotides in DNA specified the order of amino acids in a protein. Mutations in a gene would correspond to alterations in the sequence of DNA, which might result from the substitution of one nucleotide for another or from the addition or deletion of nucleotides. These changes in the nucleotide sequence of DNA would then lead to corresponding changes in the amino acid sequence of the protein encoded by the gene in question. This hypothesis predicted that different mutations within a single gene could alter different amino acids in the encoded protein, and that the positions of mutations in a gene should reflect the positions of amino acid alterations in its protein product.

The rapid replication and the simplicity of the genetic system of E. coli were of major help in addressing these questions. A variety of mutants of E. coli could be isolated, including nutritional mutants that (like the Neurospora mutants discussed earlier) require particular amino acids for growth. Importantly, the rapid growth of E. coli made feasible the isolation and mapping of multiple mutants in a single gene, leading to the first demonstration of the linear relationship between genes and proteins. In these studies, Charles Yanofsky and his colleagues mapped a series of mutations in the gene that encodes an enzyme required for synthesis of the amino acid tryptophan. Analysis of the enzymes encoded by the mutant genes indicated that the relative positions of the amino acid alterations were the same as those of the corresponding mutations (Figure 3.10). Thus, the sequence of amino acids in the protein was colinear with that of mutations in the gene, as expected if the order of nucleotides in DNA specifies the order of amino acids in proteins.

Figure 3.10

Colinearity of genes and proteins. A series of mutations (arrowheads) were mapped in the E. coli gene encoding tryptophan synthetase (top line). The amino acid substitutions resulting from each of the mutations were then determined by sequence analysis (more...)

The Role of Messenger RNA

Although the sequence of nucleotides in DNA appeared to specify the order of amino acids in proteins, it did not necessarily follow that DNA itself directs protein synthesis. Indeed, this appeared not to be the case, since DNA is located in the nucleus of eukaryotic cells, whereas protein synthesis takes place in the cytoplasm. Some other molecule was therefore needed to convey genetic information from DNA to the sites of protein synthesis (the ribosomes).

RNA appeared a likely candidate for such an intermediate because the similarity of its structure to that of DNA suggested that RNA could be synthesized from a DNA template (Figure 3.11). RNA differs from DNA in that it is single-stranded rather than double-stranded, its sugar component is ribose instead of deoxyribose, and it contains the pyrimidine base uracil (U) instead of thymine (T) (see Figure 2.10). However, neither the change in sugar nor the substitution of U for T alters base pairing, so the synthesis of RNA can be readily directed by a DNA template. Moreover, since RNA is located primarily in the cytoplasm, it appeared a logical intermediate to convey information from DNA to the ribosomes. These characteristics of RNA suggested a pathway for the flow of genetic information that is known as the central dogma of molecular biology:

DNA → RNA → Protein

Figure 3.11

Synthesis of RNA from DNA. The two strands of DNA unwind, and one is used as a template for synthesis of a complementary strand of RNA.

According to this concept, RNA molecules are synthesized from DNA templates (a process called transcription), and proteins are synthesized from RNA templates (a process called translation).
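The base-pairing logic behind transcription can be illustrated with a minimal Python sketch. The function name `transcribe`, the lookup table, and the example sequence are illustrative assumptions, not part of the original text; the sketch models only complementary pairing, not the enzymatic process.

```python
# Pairing rules for copying a DNA template strand into RNA:
# the RNA carries U wherever the DNA template carries A.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (read 5'->3') complementary to a DNA template
    strand written 3'->5', pairing base by base."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

# Hypothetical template strand, chosen only for illustration.
print(transcribe("TACAAAGGG"))  # AUGUUUCCC
```

Writing the template 3'→5' keeps each RNA base aligned with the DNA base it pairs with, so no strand reversal is needed in the sketch.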

Experimental evidence for the RNA intermediates postulated by the central dogma was obtained by Sidney Brenner, Francois Jacob, and Matthew Meselson in studies of E. coli infected with the bacteriophage T4. The synthesis of E. coli RNA stops following infection by T4, and the only new RNA synthesized in infected bacteria is transcribed from T4 DNA. This T4 RNA becomes associated with bacterial ribosomes, thus conveying the information from DNA to the site of protein synthesis. Because of their role as intermediates in the flow of genetic information, RNA molecules that serve as templates for protein synthesis are called messenger RNAs (mRNAs). They are transcribed by an enzyme (RNA polymerase) that catalyzes the synthesis of RNA from a DNA template.

In addition to mRNA, two other types of RNA molecules are important in protein synthesis. Ribosomal RNA (rRNA) is a component of ribosomes, and transfer RNAs (tRNAs) serve as adaptor molecules that align amino acids along the mRNA template. The structures and functions of these molecules are discussed in the following section and in more detail in Chapters 6 and 7.

The Genetic Code

How is the nucleotide sequence of mRNA translated into the amino acid sequence of a protein? In this step of gene expression, genetic information is transferred between chemically unrelated types of macromolecules (nucleic acids and proteins), raising two new types of problems in understanding the action of genes.

First, since amino acids are structurally unrelated to the nucleic acid bases, direct complementary pairing between mRNA and amino acids during the incorporation of amino acids into proteins seemed impossible. How then could amino acids align on an mRNA template during protein synthesis? This question was solved by the discovery that tRNAs serve as adaptors between amino acids and mRNA during translation (Figure 3.12). Prior to its use in protein synthesis, each amino acid is attached by a specific enzyme to its appropriate tRNA. Base pairing between a recognition sequence on each tRNA and a complementary sequence on the mRNA then directs the attached amino acid to its correct position on the mRNA template.

Figure 3.12

Function of transfer RNA. Transfer RNA serves as an adaptor during protein synthesis. Each amino acid (e.g., histidine) is attached to the 3′ end of a specific tRNA by an appropriate enzyme (an aminoacyl tRNA synthetase). The charged tRNAs then (more...)

The second problem in the translation of nucleotide sequence to amino acid sequence was determination of the genetic code. How could the information contained in the sequence of four different nucleotides be converted to the sequences of 20 different amino acids in proteins? Because 20 amino acids must be specified by only four nucleotides, at least three nucleotides must be used to encode each amino acid. Used singly, four nucleotides could encode only four amino acids and, used in pairs, four nucleotides could encode only sixteen (4²) amino acids. Used as triplets, however, four nucleotides could encode 64 (4³) different amino acids, more than enough to account for the 20 amino acids actually found in proteins.
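The counting argument can be checked by enumerating every possible nucleotide "word" of a given length. The following is a small Python sketch; the helper name `words` is an assumption introduced for illustration.

```python
from itertools import product

BASES = "ACGU"  # the four nucleotides of mRNA

def words(length: int) -> list[str]:
    """List every distinct nucleotide sequence of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]

for n in (1, 2, 3):
    # Prints 4, 16, and 64 combinations for lengths 1, 2, and 3.
    print(n, len(words(n)))
```

Only at length three does the number of combinations reach or exceed the 20 amino acids that must be specified.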

Direct experimental evidence for the triplet code was obtained by studies of bacteriophage T4 bearing mutations in an extensively studied gene called rII. Phages with mutations in this gene form abnormally large plaques, which can be clearly distinguished from those formed by wild-type phages. Hence, isolating and mapping a number of rII mutants was easy and led to the establishment of a detailed genetic map of this locus. Study of recombinants between rII mutants that had arisen by additions or deletions of nucleotides revealed that phages containing additions or deletions of one or two nucleotides always exhibited the mutant phenotype. Phages containing additions or deletions of three nucleotides, however, were frequently wild-type in function (Figure 3.13). These findings suggested that the gene is read in groups of three nucleotides, starting from a fixed point. Additions or deletions of one or two nucleotides would then alter the reading frame of the entire gene, leading to the coding of abnormal amino acids throughout the encoded protein. In contrast, additions or deletions of three nucleotides would lead to the addition or deletion of only a single amino acid; the rest of the amino acid sequence would remain unaltered, frequently yielding an active protein.
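The effect of additions on the reading frame can be sketched in Python by splitting a sequence into triplets from a fixed starting point. The sequence below is a made-up repeat chosen for readability, not actual rII sequence, and the function name `codons` is an assumption.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into triplets from a fixed start,
    ignoring any incomplete triplet at the end."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

wild_type = "CATCATCATCATCAT"                         # hypothetical sequence
plus_one = wild_type[:3] + "G" + wild_type[3:]        # one nucleotide added
plus_three = wild_type[:3] + "GGG" + wild_type[3:]    # three nucleotides added

print(codons(wild_type))   # ['CAT', 'CAT', 'CAT', 'CAT', 'CAT']
print(codons(plus_one))    # ['CAT', 'GCA', 'TCA', 'TCA', 'TCA']
print(codons(plus_three))  # ['CAT', 'GGG', 'CAT', 'CAT', 'CAT', 'CAT']
```

A single added nucleotide changes every triplet downstream of the insertion, whereas three added nucleotides introduce one extra triplet and leave the rest of the frame intact.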

Figure 3.13

Genetic evidence for a triplet code. A series of mutations consisting of additions of one, two, or three nucleotides were studied in the rII gene of bacteriophage T4. Additions of one or two nucleotides alter the reading frame of the remainder of the (more...)

Deciphering the genetic code thus became a problem of assigning nucleotide triplets to their corresponding amino acids. This problem was approached using in vitro systems that could carry out protein synthesis (in vitro translation). Cell extracts containing ribosomes, amino acids, tRNAs, and the enzymes responsible for attaching amino acids to the appropriate tRNAs (aminoacyl-tRNA synthetases) were known to catalyze the incorporation of amino acids into proteins. However, such protein synthesis depends on the presence of mRNA bound to the ribosomes, and can be greatly enhanced by the addition of purified mRNA. Since added mRNA directs protein synthesis in such systems, the genetic code could be deciphered by study of the translation of synthetic mRNAs of known base sequence.

The first such experiment, performed by Marshall Nirenberg and Heinrich Matthaei, involved the in vitro translation of a synthetic RNA polymer containing only uracil (Figure 3.14). This poly-U template was found to direct the incorporation of only a single amino acid—phenylalanine—into a polypeptide consisting of repeated phenylalanine residues. Therefore, the triplet UUU encodes the amino acid phenylalanine. Similar experiments with RNA polymers containing only single nucleotides established that AAA encodes lysine and CCC encodes proline. The remainder of the code was deciphered using RNA polymers containing mixtures of nucleotides, leading to the coding assignment of all 64 possible triplets (called codons) (Table 3.1). Of the 64 codons, 61 specify an amino acid; the remaining three (UAA, UAG, and UGA) are stop codons that signal the termination of protein synthesis. The code is degenerate; that is, many amino acids are specified by more than one codon. With few exceptions (discussed in Chapter 10), all organisms utilize the same genetic code, providing strong support for the conclusion that all present-day cells evolved from a common ancestor.
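The logic of the homopolymer experiments can be sketched in Python as a lookup from codons to amino acids. The table below is deliberately partial: it contains only the assignments named in the text (UUU, AAA, CCC, and the three stop codons), and the function name `translate` and the placeholder "???" for unlisted codons are assumptions made for this illustration.

```python
# Only the codon assignments stated in the text; the full code has 64 entries.
PARTIAL_CODE = {"UUU": "Phe", "AAA": "Lys", "CCC": "Pro"}
STOP_CODONS = {"UAA", "UAG", "UGA"}

def translate(mrna: str) -> list[str]:
    """Read an mRNA in triplets from its first base and return the
    amino acids encoded, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        # Codons missing from the partial table are marked, not guessed.
        peptide.append(PARTIAL_CODE.get(codon, "???"))
    return peptide

print(translate("U" * 12))            # ['Phe', 'Phe', 'Phe', 'Phe']
print(translate("UUUAAACCCUAGUUU"))   # ['Phe', 'Lys', 'Pro']
```

The poly-U input yields only phenylalanine, mirroring the experimental result, and the second example shows a stop codon ending the polypeptide before the final UUU is read.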

Figure 3.14

The triplet UUU encodes phenylalanine. In vitro translation of a synthetic RNA consisting of repeated uracils (a poly-U template) results in the synthesis of a polypeptide containing only phenylalanine.

RNA Viruses and Reverse Transcription

With the elucidation of the genetic code, the fundamental principles of the molecular biology of cells appeared to have been established. According to the central dogma, the genetic material consists of DNA, which is capable of self-replication as well as being transcribed into mRNA, which serves in turn as the template for protein synthesis. However, as noted in Chapter 1, many viruses contain RNA rather than DNA as their genetic material, implying the use of other modes of information transfer.

RNA genomes were first discovered in plant viruses, many of which were found to be composed of only RNA and protein. Direct proof that RNA acts as the genetic material of these viruses was obtained in the 1950s by experiments demonstrating that RNA purified from tobacco mosaic virus could infect new host cells, giving rise to infectious progeny virus. The mode of replication of most viral RNA genomes was subsequently determined by studies of the RNA bacteriophages of E. coli. These viruses were found to encode a specific enzyme that could catalyze the synthesis of RNA from an RNA template (RNA-directed RNA synthesis), using the same mechanism of base pairing between complementary strands as is employed during DNA replication or transcription of RNA from DNA.

However, RNA-directed RNA synthesis did not appear to account for the replication of certain animal viruses (RNA tumor viruses), which were of particular interest because of their ability to cause cancer in infected animals. Although these viruses contain genomic RNA in their viral particles, experiments performed by Howard Temin in the early 1960s indicated that their replication requires DNA synthesis in infected cells, leading to the hypothesis that the RNA tumor viruses (now called retroviruses) replicate via synthesis of a DNA intermediate, called a DNA provirus (Figure 3.15). This hypothesis was initially met with widespread disbelief because it involves RNA-directed synthesis of DNA—a reversal of the central dogma. In 1970, however, Temin and David Baltimore independently discovered that the RNA tumor viruses contain a novel enzyme that catalyzes the synthesis of DNA from an RNA template. In addition, clear-cut evidence for the existence of viral DNA sequences in infected cells was obtained. The synthesis of DNA from RNA, now called reverse transcription, was thus established as a mode of information transfer in biological systems.

Figure 3.15

Reverse transcription and retrovirus replication. Retroviruses contain RNA genomes in their viral particles. When a retrovirus infects a host cell, however, a DNA copy of the viral RNA is synthesized via reverse transcription. This viral DNA is then integrated (more...)

Reverse transcription is important not only in the replication of retroviruses, but also in at least two other broad aspects of molecular and cellular biology. First, reverse transcription is not restricted to retroviruses; it also occurs in cells and, as discussed in Chapter 5, is frequently responsible for the transposition of DNA sequences from one chromosomal location to another. Second, enzymes that catalyze RNA-directed DNA synthesis (reverse transcriptases) can be used experimentally to generate DNA copies of any RNA molecule. The use of reverse transcriptase has thus allowed mRNAs of eukaryotic cells to be studied using the molecular approaches that are currently applied to the manipulation of DNA, as discussed in the following section.
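The information transfer carried out by reverse transcriptase can be illustrated with a short Python sketch that derives a complementary DNA strand from an RNA template. The function name `reverse_transcribe` and the example sequence are assumptions; the sketch covers only base pairing, not the enzymology or the later synthesis of a second DNA strand.

```python
# Pairing rules for copying an RNA template into DNA:
# the DNA carries T wherever the RNA template carries A.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna_5_to_3: str) -> str:
    """Return the complementary DNA strand, written 3'->5' so that
    each DNA base lines up with the RNA base it pairs with."""
    return "".join(RNA_TO_DNA[base] for base in rna_5_to_3.upper())

# Hypothetical RNA sequence, chosen only for illustration.
print(reverse_transcribe("AUGUUUCCC"))  # TACAAAGGG
```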

What process controls gene expression?

Specifically, gene expression is controlled on two levels. First, transcription is controlled by limiting the amount of mRNA that is produced from a particular gene. The second level of control is through post-transcriptional events that regulate the translation of mRNA into proteins.

What is controlled by genetic information?

Genetic material, including genes and DNA, controls the development, maintenance and reproduction of organisms. Genetic information is passed from generation to generation through inherited units of chemical information (in most cases, genes).

Which process converts genetic information?

The answer is transcription. The genetic information in DNA (deoxyribonucleic acid) is copied into RNA (ribonucleic acid) by the process of transcription, and the resulting messenger RNA encodes a protein. Transcription starts when RNA polymerase binds to a specific DNA sequence known as the promoter.

What processes affect gene expression?

Epigenetic processes, including DNA methylation, histone modification and various RNA-mediated processes, are thought to influence gene expression chiefly at the level of transcription; however, other steps in the process (for example, translation) may also be regulated epigenetically.