The practical significance of deciphering the genetic code. What is a genetic code: general information

In any cell and organism, all anatomical, morphological and functional features are determined by the structure of the proteins that make them up. The hereditary property of an organism is its ability to synthesize particular proteins. A protein's biological characteristics depend on the order in which amino acids are arranged in its polypeptide chain.
Each cell has its own sequence of nucleotides in the polynucleotide chain of DNA. This is the genetic code of DNA, through which information about the synthesis of particular proteins is recorded. This article describes what the genetic code is, its properties, and genetic information.

A little history

The idea that a genetic code might exist was formulated by G. Gamow and A. Dounce in the mid-twentieth century. They proposed that the nucleotide sequence responsible for specifying a particular amino acid contains at least three units. It was later shown that the number is exactly three nucleotides; this unit of the genetic code was called a triplet, or codon. There are sixty-four codons in total, because the RNA molecule is built from four different nucleotide residues and each codon contains three of them.

What is genetic code

The method of encoding the amino acid sequence of proteins by means of a sequence of nucleotides is common to all living cells and organisms. This is what the genetic code is.
There are four nucleotides in DNA:

  • adenine - A;
  • guanine - G;
  • cytosine - C;
  • thymine - T.

They are denoted by capital Latin or (in Russian-language literature) Russian letters.
RNA also contains four nucleotides, but one of them is different from DNA:

  • adenine - A;
  • guanine - G;
  • cytosine - C;
  • uracil - U.

All nucleotides are arranged in chains: DNA forms a double helix, while RNA is single-stranded.
Proteins are built from amino acids, whose arrangement in a particular sequence determines the protein's biological properties.

Properties of the genetic code

Triplet nature. A unit of the genetic code consists of three letters; it is a triplet. This means that each of the twenty amino acids is encoded by three specific nucleotides, called a codon or triplet. Sixty-four combinations can be formed from four nucleotides taken three at a time. This is more than enough to encode twenty amino acids.
Degeneracy. Each amino acid corresponds to more than one codon, with the exception of methionine and tryptophan.
Unambiguity. One codon codes for one amino acid. For example, in a healthy person's gene carrying information about the beta chain of hemoglobin, the triplets GAG and GAA encode glutamic acid, whereas in everyone with sickle cell disease one nucleotide is changed.
Collinearity. The sequence of amino acids always corresponds to the sequence of nucleotides that the gene contains.
The genetic code is continuous and compact, which means that it has no punctuation marks. That is, starting at a certain codon, reading proceeds continuously. For example, AUGGUGCUUAAUGUG will be read as: AUG, GUG, CUU, AAU, GUG. It will not be read as AUG, UGG and so on, or in any other way.
Universality. The code is essentially the same for all terrestrial organisms, from humans to fish, fungi and bacteria.
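The continuous, non-overlapping reading described above can be illustrated with a short Python sketch. The function name split_codons is hypothetical and chosen only for this illustration; the input is an example mRNA string written in Latin letters.

```python
def split_codons(mrna):
    """Read an mRNA string triplet by triplet, starting at the first letter."""
    # Step through the string three letters at a time; an incomplete
    # trailing fragment (fewer than three letters) is ignored.
    usable_length = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable_length, 3)]


print(split_codons("AUGGUGCUUAAUGUG"))
```

Each letter belongs to exactly one triplet, and no separators are needed between triplets, which is what "no punctuation marks" means here.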

Table

Not all available amino acids are included in the table presented. Hydroxyproline, hydroxylysine, phosphoserine, iodine derivatives of tyrosine, cystine and some others are absent, since they are derivatives of other amino acids encoded by m-RNA and formed after modification of proteins as a result of translation.
From the properties of the genetic code it is known that one codon encodes one amino acid. The exceptions are the codons that perform an additional function while encoding valine and methionine. When such a codon stands at the beginning of the mRNA, it binds the tRNA that carries formylmethionine. Upon completion of synthesis, the formyl residue is cleaved off, leaving a methionine residue. Thus, these codons act as initiators of the synthesis of the polypeptide chain. If they are not at the beginning, they are no different from the others.

Genetic information

This concept means a program of properties that is passed down from ancestors. It is embedded in heredity as a genetic code.
The genetic code is realized during protein synthesis with the participation of:

  • messenger RNA (mRNA);
  • ribosomal RNA (rRNA).

Information is transmitted through direct links (DNA-RNA-protein) and feedback links (environment-protein-DNA).
Organisms can receive, store and transmit this information and use it as effectively as possible.
Passed on by inheritance, this information determines the development of a particular organism. Through interaction with the environment, however, the organism's response is altered, and this drives evolution and development. In this way, new information is introduced into the organism.


The elucidation of the laws of molecular biology and the discovery of the genetic code showed the need to combine genetics with Darwin's theory; on this basis the synthetic theory of evolution emerged - non-classical biology.
Darwin's heredity, variation and natural selection are complemented by genetically determined selection. Evolution is realized at the genetic level through random mutations and the inheritance of the most valuable traits that are most adapted to the environment.

Decoding the human code

In the 1990s the Human Genome Project was launched, and as a result genome fragments containing 99.99% of human genes were identified in the 2000s. Fragments that are not involved in protein synthesis and do not code for proteins remain poorly understood; their role is unknown for now.

Chromosome 1, the last to be sequenced (in 2006), is the longest in the genome. More than three hundred and fifty diseases, including cancer, arise from disorders and mutations in it.

The role of such studies cannot be overestimated. When they discovered what the genetic code is, it became known according to what patterns development occurs, how the morphological structure, psyche, predisposition to certain diseases, metabolism and defects of individuals are formed.

In the body's metabolism, the leading role belongs to proteins and nucleic acids.
Proteins form the basis of all vital cell structures, are unusually reactive, and have catalytic functions.
Nucleic acids are part of the most important organelle of the cell - the nucleus - as well as of the cytoplasm, ribosomes, mitochondria, etc. Nucleic acids play a primary role in heredity, in the variability of organisms, and in protein synthesis.

The plan for protein synthesis is stored in the cell nucleus, while synthesis itself takes place outside the nucleus, so a delivery service is needed to carry the encoded plan from the nucleus to the site of synthesis. This delivery service is performed by RNA molecules.

The process starts in the cell nucleus: part of the DNA “ladder” unwinds and opens. This allows RNA letters to pair with the exposed DNA letters of one of the DNA strands. An enzyme moves the RNA letters into place and joins them into a strand. This is how the letters of DNA are “rewritten” into the letters of RNA. The newly formed RNA chain separates, and the DNA “ladder” twists again. The process of reading information from DNA and synthesizing RNA on the DNA template is called transcription, and the synthesized RNA is called messenger RNA, or mRNA.

After further modifications, this encoded mRNA is ready. The mRNA leaves the nucleus and goes to the site of protein synthesis, where its letters are deciphered. Each set of three mRNA letters forms a “word” that stands for one specific amino acid.

Another type of RNA finds this amino acid, captures it with the help of an enzyme, and delivers it to the site of protein synthesis. This RNA is called transfer RNA, or tRNA. As the mRNA message is read and translated, the chain of amino acids grows. This chain twists and folds into a unique shape, creating one type of protein. Even the protein folding process is remarkable: a computer would need 10^27 (!) years to calculate all the folding options of an average-sized protein consisting of 100 amino acids. Yet it takes no more than one second to form a chain of 20 amino acids in the body, and this process occurs continuously in all cells of the body.

Genes, genetic code and its properties.

About 7 billion people live on Earth. Apart from the 25-30 million pairs of identical twins, all people are genetically different: everyone is unique and has unique hereditary characteristics, character traits, abilities, and temperament.

These differences are explained by differences in genotypes - the sets of genes of organisms; each one is unique. The genetic characteristics of a particular organism are embodied in proteins; therefore, the structure of one person's proteins differs, although very slightly, from that of another person's.

This does not mean that no two people have exactly the same proteins. Proteins that perform the same functions may be identical or differ only slightly, by one or two amino acids. But there are no two people on Earth (with the exception of identical twins) in whom all proteins are the same.

Information about the primary structure of a protein is encoded as a sequence of nucleotides in a section of a DNA molecule, a gene - the unit of hereditary information of an organism. Each DNA molecule contains many genes. The totality of all the genes of an organism constitutes its genotype. Thus,

a gene is a unit of hereditary information of an organism, which corresponds to a separate section of DNA.

Hereditary information is encoded using the genetic code, which is universal for all organisms; organisms differ only in the alternation of nucleotides that form their genes and encode their specific proteins.

The genetic code consists of triplets of DNA nucleotides, combined in different sequences (AAT, GCA, ACG, TGC, etc.), each of which encodes a specific amino acid (which will be built into the polypeptide chain).

Strictly speaking, the code is taken to be the sequence of nucleotides in the mRNA molecule, because mRNA copies information from DNA (the process of transcription) and translates it into a sequence of amino acids in the molecules of synthesized proteins (the process of translation).
mRNA is composed of the nucleotides A, C, G and U, whose triplets are called codons: the DNA triplet CGT becomes the mRNA triplet GCA, and the DNA triplet AAG becomes the triplet UUC. It is in mRNA codons that the genetic code is conventionally written.
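The rewriting of DNA triplets into mRNA codons in the two examples above follows the complementarity rule letter by letter. A minimal Python sketch is given below; the names DNA_TO_MRNA and transcribe_template are hypothetical, and, like the text, the sketch ignores strand orientation and simply pairs each template letter with its complement.

```python
# Base-pairing rules for copying a DNA template strand into mRNA:
# A pairs with U (RNA has uracil instead of thymine), T with A, G with C, C with G.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(dna):
    """Return the mRNA letters complementary to the DNA template letters."""
    return "".join(DNA_TO_MRNA[base] for base in dna)


print(transcribe_template("CGT"))  # GCA
print(transcribe_template("AAG"))  # UUC
```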

Thus, the genetic code is a unified system for recording hereditary information in nucleic acid molecules as a sequence of nucleotides. The genetic code is based on an alphabet of only four letters - nucleotides distinguished by their nitrogenous bases: A, T, G, C.

Basic properties of the genetic code:

1. The genetic code is triplet. A triplet (codon) is a sequence of three nucleotides encoding one amino acid. Since proteins contain 20 amino acids, each of them obviously cannot be encoded by a single nucleotide (there are only four types of nucleotides in DNA, so 16 amino acids would remain unencoded). Two nucleotides are also not enough, since in that case only 16 amino acids could be encoded. This means that the number of nucleotides encoding one amino acid must be at least three. In this case, the number of possible nucleotide triplets is 4^3 = 64.
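The counting argument can be reproduced by enumerating every possible "word" of one, two and three nucleotides. The sketch below is illustrative only; count_words is a hypothetical helper name.

```python
from itertools import product

NUCLEOTIDES = "ACGU"      # the four letters of the mRNA alphabet
AMINO_ACID_COUNT = 20     # number of amino acids that must be encoded


def count_words(length):
    """Count all distinct nucleotide words of the given length."""
    return sum(1 for _ in product(NUCLEOTIDES, repeat=length))


for length in (1, 2, 3):
    words = count_words(length)
    verdict = "enough" if words >= AMINO_ACID_COUNT else "not enough"
    print(f"word length {length}: {words} words, {verdict} for 20 amino acids")
```

Only at length three does the number of words (64) reach or exceed 20.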

2. Redundancy (degeneracy) of the code is a consequence of its triplet nature and means that one amino acid can be encoded by several triplets (since there are 20 amino acids and 64 triplets), with the exception of methionine and tryptophan, which are each encoded by only one triplet. In addition, some triplets perform specific functions: in an mRNA molecule, the triplets UAA, UAG and UGA are stop codons, i.e. stop signals that terminate synthesis of the polypeptide chain. The triplet corresponding to methionine (AUG), when located at the beginning of the coding sequence of the mRNA, also performs the function of initiating reading.

3. Unambiguity of the code - along with redundancy, the code is unambiguous: each codon corresponds to only one particular amino acid.

4. Collinearity of the code, i.e. the nucleotide sequence in a gene corresponds exactly to the sequence of amino acids in the protein.

5. The genetic code is non-overlapping and compact, i.e. it does not contain “punctuation marks”. This means that reading does not allow codons (triplets) to overlap and that, starting at a certain codon, reading proceeds continuously, triplet after triplet, until a stop signal (stop codon) is reached.

6. The genetic code is universal, i.e. the nuclear genes of all organisms encode information about proteins in the same way, regardless of the level of organization and systematic position of these organisms.

There are genetic code tables for decoding mRNA codons and constructing the chains of protein molecules.
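A genetic code table can be used mechanically, as the following Python sketch suggests. It builds the standard table from a compact 64-letter string (one-letter amino acid symbols, "*" for stop codons, codons ordered U, C, A, G) and reads an mRNA triplet by triplet until a stop codon. The names CODON_TABLE and translate are hypothetical, and starting at the first AUG found is a simplification of real initiation.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from the first AUG, triplet by triplet, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA: stop signal
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGUGCUUAAUGUGUAACC"))  # MVLNV
```

Because each codon maps to exactly one table entry (unambiguity) and the loop advances by three letters without gaps (non-overlapping, compact), the amino acid order follows the codon order (collinearity).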

Matrix synthesis reactions.

Reactions unknown in inanimate nature occur in living systems - matrix synthesis reactions.

In technology, the term "matrix" denotes a mold used for casting coins, medals, and typographic fonts: the hardened metal reproduces exactly all the details of the mold used for casting. Matrix synthesis resembles casting in a mold: new molecules are synthesized in exact accordance with the plan laid down in the structure of existing molecules.

The matrix principle underlies the most important synthetic reactions of the cell, such as the synthesis of nucleic acids and proteins. These reactions ensure the exact, strictly specific sequence of monomer units in the synthesized polymers.

Here monomers are drawn in a directed way to a specific place in the cell - to the molecules that serve as the matrix on which the reaction takes place. If such reactions occurred as a result of random collisions of molecules, they would proceed infinitely slowly. The synthesis of complex molecules on the template principle is carried out quickly and accurately. In matrix reactions, the role of the matrix is played by macromolecules of the nucleic acids DNA or RNA.

The monomeric molecules from which the polymer is synthesized - nucleotides or amino acids - are positioned and fixed on the matrix in a strictly defined order, in accordance with the principle of complementarity.

Then the monomer units are "cross-linked" into a polymer chain, and the finished polymer is released from the matrix.

After that, the matrix is ready for the assembly of a new polymer molecule. Clearly, just as only one kind of coin or one letter can be cast in a given mold, so only one kind of polymer can be “assembled” on a given matrix molecule.

The matrix type of reaction is a specific feature of the chemistry of living systems. These reactions are the basis of the fundamental property of all living things - the ability to reproduce their own kind.

Template synthesis reactions

1. Replication (from Latin replicatio - renewal) - the process of synthesis of a daughter molecule of deoxyribonucleic acid on the matrix of the parent DNA molecule. During the subsequent division of the mother cell, each daughter cell receives one copy of a DNA molecule that is identical to the DNA of the original mother cell. This process ensures that genetic information is accurately passed on from generation to generation. DNA replication is carried out by a multi-enzyme complex consisting of 15-20 different proteins, called the replisome. The material for synthesis is free nucleotides present in the cytoplasm of cells. The biological meaning of replication lies in the accurate transfer of hereditary information from the mother molecule to the daughter molecules, which normally occurs during the division of somatic cells.

A DNA molecule consists of two complementary strands. These strands are held together by weak hydrogen bonds, which can be broken under the action of enzymes. The DNA molecule is capable of self-duplication (replication): on each old half of the molecule a new half is synthesized.
In addition, an mRNA molecule can be synthesized on a DNA molecule, which then transfers the information received from DNA to the site of protein synthesis.

Information transfer and protein synthesis proceed according to a matrix principle, comparable to the operation of a printing press in a printing house. Information from DNA is copied many times. If errors occur during copying, they will be repeated in all subsequent copies.

True, some errors made when copying information from a DNA molecule can be corrected - the process of error elimination is called repair. The first of the reactions in the process of information transfer is the replication of the DNA molecule and the synthesis of new DNA chains.

2. Transcription (from Latin transcriptio - rewriting) - the process of RNA synthesis using DNA as a template, occurring in all living cells. In other words, it is the transfer of genetic information from DNA to RNA.

Transcription is catalyzed by the enzyme DNA-dependent RNA polymerase. RNA polymerase moves along the DNA molecule in the direction 3' → 5'. Transcription consists of the stages of initiation, elongation and termination. The unit of transcription is an operon, a fragment of a DNA molecule consisting of a promoter, a transcribed part and a terminator. mRNA consists of a single chain and is synthesized on DNA in accordance with the rule of complementarity with the participation of an enzyme that activates the beginning and end of the synthesis of the mRNA molecule.

The finished mRNA molecule enters the cytoplasm onto ribosomes, where the synthesis of polypeptide chains occurs.

3. Translation (from Latin translatio - transfer, movement) - the process of protein synthesis from amino acids on a matrix of messenger RNA (mRNA), carried out by the ribosome. In other words, this is the process of translating the information contained in the sequence of nucleotides of mRNA into the sequence of amino acids in the polypeptide.

4. Reverse transcription is the process of forming double-stranded DNA based on information from single-stranded RNA. This process is called reverse transcription, since the transfer of genetic information occurs in the “reverse” direction relative to transcription. The idea of reverse transcription was initially very unpopular because it contradicted the central dogma of molecular biology, which assumed that DNA is transcribed into RNA and then translated into proteins.

However, in 1970, Temin and Baltimore independently discovered an enzyme called reverse transcriptase (revertase), and the possibility of reverse transcription was finally confirmed. In 1975, Temin and Baltimore were awarded the Nobel Prize in Physiology or Medicine. Some viruses (such as the human immunodeficiency virus, which causes HIV infection) are able to transcribe RNA into DNA. HIV has an RNA genome that is copied into DNA. As a result, the DNA of the virus can be combined with the genome of the host cell. The main enzyme responsible for synthesizing DNA from RNA is reverse transcriptase. One of its functions is to create complementary DNA (cDNA) from the viral genome. The associated enzyme ribonuclease cleaves the RNA, and reverse transcriptase then synthesizes the DNA double helix from the cDNA. The cDNA is integrated into the host cell genome by integrase. The result is the synthesis of viral proteins by the host cell, which form new viruses. In the case of HIV, apoptosis (programmed cell death) of T-lymphocytes is also triggered. In other cases, the cell may remain a distributor of viruses.

The sequence of matrix reactions during protein biosynthesis can be represented in the form of a diagram.

Thus, protein biosynthesis is one of the types of plastic exchange, during which hereditary information encoded in the genes of DNA is realized as a specific sequence of amino acids in protein molecules.

Protein molecules are essentially polypeptide chains made up of individual amino acids. But amino acids are not active enough to combine with each other on their own. Therefore, before they combine and form a protein molecule, amino acids must be activated. This activation occurs under the action of special enzymes.

As a result of activation, the amino acid becomes more labile and, under the action of the same enzyme, binds to tRNA. Each amino acid corresponds to a strictly specific tRNA, which finds “its” amino acid and carries it into the ribosome.

Consequently, various activated amino acids, each combined with its own tRNA, enter the ribosome. The ribosome is like a conveyor for assembling a protein chain from the various amino acids supplied to it.

Simultaneously with the tRNA on which its own amino acid “sits”, a “signal” arrives at the ribosome from the DNA contained in the nucleus. In accordance with this signal, one or another protein is synthesized in the ribosome.

The directing influence of DNA on protein synthesis is not exerted directly, but with the help of a special intermediary - matrix, or messenger, RNA (mRNA), which is synthesized in the nucleus under the influence of DNA, so its composition reflects the composition of DNA. The RNA molecule is like a cast of the DNA form. The synthesized mRNA enters the ribosome and, as it were, hands this structure a plan: in what order the activated amino acids entering the ribosome must be combined with each other for a specific protein to be synthesized. In other words, the genetic information encoded in DNA is transferred to mRNA and then to protein.

The mRNA molecule enters the ribosome and threads through it. The segment that is located in the ribosome at a given moment, defined by a codon (triplet), interacts in a completely specific manner with the structurally matching triplet (anticodon) in the transfer RNA that brought the amino acid into the ribosome.

The transfer RNA with its amino acid matches a specific codon of the mRNA and binds to it; another tRNA with a different amino acid attaches to the next, neighboring section of the mRNA, and so on until the entire mRNA chain has been read and all the amino acids have been lined up in the appropriate order, forming a protein molecule. The tRNA that delivered its amino acid to a specific part of the polypeptide chain is freed from the amino acid and leaves the ribosome.

Then, back in the cytoplasm, the appropriate amino acid can join it again, and the tRNA again carries it to the ribosome. In the process of protein synthesis, not one but several ribosomes - polyribosomes - are involved simultaneously.

The main stages of the transfer of genetic information:

1. Synthesis of mRNA on DNA as a template (transcription)
2. Synthesis of a polypeptide chain in ribosomes according to the program contained in mRNA (translation) .

The stages are universal for all living beings, but the temporal and spatial relationships of these processes differ in pro- and eukaryotes.

In prokaryotes, transcription and translation can occur simultaneously because the DNA is located in the cytoplasm. In eukaryotes, transcription and translation are strictly separated in space and time: the synthesis of various RNAs occurs in the nucleus, after which the RNA molecules must leave the nucleus by passing through the nuclear membrane. The RNAs are then transported in the cytoplasm to the site of protein synthesis.

Lecture 5. Genetic code

Definition of the concept

The genetic code is a system for recording information about the sequence of amino acids in proteins using the sequence of nucleotides in DNA.

Since DNA is not directly involved in protein synthesis, the code is written in RNA language. RNA contains uracil instead of thymine.

Properties of the genetic code

1. Triplet nature

Each amino acid is encoded by a sequence of 3 nucleotides.

Definition: a triplet or codon is a sequence of three nucleotides encoding one amino acid.

The code cannot be a singlet code, since 4 (the number of different nucleotides in DNA) is less than 20. The code cannot be a doublet code, because 16 (the number of ordered pairs that can be formed from 4 nucleotides, 4^2) is less than 20. The code can be a triplet code, because 64 (the number of ordered triples that can be formed from 4 nucleotides, 4^3) is more than 20.

2. Degeneracy.

All amino acids, with the exception of methionine and tryptophan, are encoded by more than one triplet:

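2 amino acids with 1 triplet each = 2 triplets.

9 amino acids with 2 triplets each = 18 triplets.

1 amino acid with 3 triplets = 3 triplets.

5 amino acids with 4 triplets each = 20 triplets.

3 amino acids with 6 triplets each = 18 triplets.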

A total of 61 triplets encode 20 amino acids.
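The tally above can be recomputed from the standard codon table. The following Python sketch is illustrative; the table is packed into a 64-letter string (one-letter amino acid symbols, "*" for stop codons), and the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons encode each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# How many amino acids have 1, 2, 3, 4 or 6 codons.
tally = Counter(codons_per_amino_acid.values())
for n_codons in sorted(tally):
    n_acids = tally[n_codons]
    print(f"{n_acids} amino acids x {n_codons} triplets = {n_acids * n_codons}")

sense_codons = sum(n_codons * n_acids for n_codons, n_acids in tally.items())
print("sense triplets:", sense_codons, "for", len(codons_per_amino_acid), "amino acids")
```

The remaining 64 - 61 = 3 triplets are the stop codons.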

3. Presence of intergenic punctuation marks.

Definition:

Gene - a section of DNA that encodes one polypeptide chain or one molecule of tRNA, rRNA or sRNA.

The tRNA, rRNA and sRNA genes do not code for proteins.

At the end of each gene encoding a polypeptide there is at least one of three terminating triplets, called stop codons or stop signals. In mRNA they have the following form: UAA, UAG, UGA. They terminate (end) translation.

Conventionally, the codon AUG - the first after the leader sequence - is also counted among the punctuation marks. (See Lecture 8.) It functions as a capital letter. In this position it encodes formylmethionine (in prokaryotes).

4. Unambiguity.

Each triplet encodes only one amino acid or is a translation terminator.

The exception is the codon AUG . In prokaryotes, in the first position (capital letter) it encodes formylmethionine, and in any other position it encodes methionine.

5. Compactness, or absence of intragenic punctuation marks.
Within a gene, each nucleotide is part of a significant codon.

In 1961, Francis Crick, Sydney Brenner and their colleagues experimentally proved the triplet nature of the code and its compactness.

The essence of the experiment: “+” mutation - insertion of one nucleotide. "-" mutation - loss of one nucleotide. A single "+" or "-" mutation at the beginning of a gene spoils the entire gene. A double "+" or "-" mutation also spoils the entire gene.

A triple “+” or “-” mutation at the beginning of a gene spoils only part of it. A quadruple “+” or “-” mutation again spoils the entire gene.

The experiment proves that the code is triplet and that there are no punctuation marks inside the gene. The experiment was carried out on two adjacent phage genes and showed, in addition, the presence of punctuation marks between genes.
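The logic of the experiment can be imitated on a toy sequence: after one or two inserted nucleotides every downstream triplet changes, while after three insertions the original reading frame is restored past the last insertion. In the Python sketch below the sequence GENE, the inserted base and the helper names are all invented for illustration and are not taken from the original experiment.

```python
def codons(seq):
    """Split a sequence into complete triplets, reading from the first letter."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def insert_bases(seq, positions, base="C"):
    """Insert one base before each of the given positions of the original sequence."""
    # Work from right to left so earlier insertions do not shift later positions.
    for pos in sorted(positions, reverse=True):
        seq = seq[:pos] + base + seq[pos:]
    return seq


GENE = "AUGGCUGAUCCAGGAUUU"  # toy gene, six triplets

print("original:", codons(GENE))
print("one '+': ", codons(insert_bases(GENE, [3])))
print("two '+': ", codons(insert_bases(GENE, [3, 7])))
print("three '+':", codons(insert_bases(GENE, [3, 7, 11])))
```

In the triple mutant only the triplets between the first and last insertion are altered; the last two triplets are again GGA and UUU, as in the original.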

6. Universality.

The genetic code is the same for all creatures living on Earth.

In 1979, Barrell discovered the “ideal” code of human mitochondria.

Definition:

“Ideal” is a genetic code in which the rule of degeneracy of the quasi-doublet code is satisfied: If in two triplets the first two nucleotides coincide, and the third nucleotides belong to the same class (both are purines or both are pyrimidines), then these triplets code for the same amino acid .

There are two exceptions to this rule in the universal code. Both deviations from the ideal code in the universal relate to fundamental points: the beginning and end of protein synthesis:

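Table. Codon assignments in the universal code and in mitochondrial codes (vertebrates, invertebrates, yeast, plants). Of the original table, only the rows for the codons CUA and AGA and four STOP entries are recoverable.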
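The quasi-doublet rule defined above can be checked against the universal (standard) code by comparing, for every pair of first two nucleotides, the two purine-ending triplets and the two pyrimidine-ending triplets. The Python sketch below is illustrative; the packed table uses one-letter amino acid symbols with "*" for stop codons, and rule_violations is a hypothetical name.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PURINES = "AG"
PYRIMIDINES = "UC"


def rule_violations():
    """List triplet pairs that share two letters and a third-letter class but differ in meaning."""
    violations = []
    for first, second in product(BASES, repeat=2):
        for third_class in (PURINES, PYRIMIDINES):
            a, b = (first + second + third for third in third_class)
            if CODON_TABLE[a] != CODON_TABLE[b]:
                violations.append((a, b, CODON_TABLE[a], CODON_TABLE[b]))
    return violations


for a, b, meaning_a, meaning_b in rule_violations():
    print(a, "->", meaning_a, "|", b, "->", meaning_b)
```

The check reports two pairs: UGA (stop) with UGG (tryptophan), and AUA (isoleucine) with AUG (methionine, the start codon), which matches the statement that both deviations concern the end and the beginning of protein synthesis.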

230 substitutions do not change the class of the encoded amino acid.

In 1956, George Gamow proposed a variant of an overlapping code. According to the Gamow code, each nucleotide, starting from the third in the gene, is part of 3 codons. When the genetic code was deciphered, it turned out to be non-overlapping, i.e. each nucleotide is part of only one codon.
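The difference between the two schemes can be shown by counting, for each nucleotide of a short sequence, how many triplets contain it. In this Python sketch the function name membership and the test sequence are hypothetical; a step of 1 models a fully overlapping code and a step of 3 the non-overlapping one.

```python
def membership(seq, step):
    """For each nucleotide, count the triplets that include it when reading with the given step."""
    counts = [0] * len(seq)
    for start in range(0, len(seq) - 2, step):
        for k in range(start, start + 3):
            counts[k] += 1
    return counts


SEQ = "AUGGUGCUU"
print("overlapping    :", membership(SEQ, 1))
print("non-overlapping:", membership(SEQ, 3))
```

With a step of 1, every nucleotide from the third onward (except the last two of the sequence) belongs to three triplets; with a step of 3, each nucleotide belongs to exactly one.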

Advantages of an overlapping genetic code: compactness, less dependence of the protein structure on the insertion or deletion of a nucleotide.

Disadvantage: the protein structure is highly dependent on nucleotide replacement and restrictions on neighbors.

In 1976, the DNA of phage φX174 was sequenced. It has single-stranded circular DNA consisting of 5375 nucleotides. The phage was known to encode 9 proteins. For 6 of them, genes located one after another were identified.

It turned out that there is an overlap. Gene E is located entirely within gene D. Its start codon results from a frame shift of one nucleotide. Gene J starts where gene D ends. The start codon of gene J overlaps with the stop codon of gene D as a result of a shift of two nucleotides. This arrangement is called a “reading frame shift” by a number of nucleotides that is not a multiple of three. To date, overlap has only been shown for a few phages.

Information capacity of DNA

There are 6 billion people living on Earth. Hereditary information about them is contained in 6x10^9 spermatozoa. According to various estimates, a person has from 30 to 50 thousand genes. All humans together have ~30x10^13 genes, or 30x10^16 base pairs, which make up 10^17 codons. The average book page contains 25x10^2 characters. The DNA of 6x10^9 sperm contains information equal in volume to approximately 4x10^13 book pages. These pages would take up the space of 6 NSU buildings. 6x10^9 sperm take up half a thimble. Their DNA takes up less than a quarter of a thimble.
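The estimate can be retraced step by step in a few lines of Python. Two inputs are assumptions made explicit here rather than stated in the text: the upper figure of 50 thousand genes per person is used, and a gene length of 1,000 base pairs is inferred from the ratio of base pairs to genes given above. One codon is counted as one book character.

```python
PEOPLE = 6 * 10**9            # sperm cells, one per person
GENES_PER_PERSON = 50_000     # upper estimate from the text
BASE_PAIRS_PER_GENE = 1_000   # assumption implied by the ratio of base pairs to genes
CHARACTERS_PER_PAGE = 2_500   # 25 x 10^2 characters per book page

genes = PEOPLE * GENES_PER_PERSON        # about 30 x 10^13
base_pairs = genes * BASE_PAIRS_PER_GENE  # about 30 x 10^16
codons = base_pairs // 3                  # three base pairs per codon
pages = codons // CHARACTERS_PER_PAGE     # one codon counted as one character

print(f"genes:      {genes:.0e}")
print(f"base pairs: {base_pairs:.0e}")
print(f"codons:     {codons:.0e}")
print(f"pages:      {pages:.0e}")
```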

Each living organism has its own special set of proteins. Particular nucleotides and their sequence in the DNA molecule form the genetic code, which conveys information about the structure of the protein. A concept was adopted in genetics according to which one gene corresponds to one enzyme (polypeptide). Research on nucleic acids and proteins has been carried out over a fairly long period. Later in the article we take a closer look at the genetic code and its properties. A brief chronology of the research is also given.

Terminology

The genetic code is a way of encoding the amino acid sequence of proteins by means of a nucleotide sequence. This method of recording information is characteristic of all living organisms. Proteins are natural organic substances of high molecular weight, present in all living organisms. They consist of 20 types of amino acids, which are called canonical. The amino acids are arranged in a chain and connected in a strictly established sequence, which determines the structure of the protein and its biological properties. A protein may also consist of several amino acid chains.

DNA and RNA

Deoxyribonucleic acid is a macromolecule. It is responsible for the transmission, storage and implementation of hereditary information. DNA uses four nitrogenous bases: adenine, guanine, cytosine and thymine. RNA consists of the same nucleotides, except that it does not contain thymine; instead, it has a nucleotide containing uracil (U). RNA and DNA molecules are nucleotide chains. Thanks to this structure, sequences are formed - the “genetic alphabet”.

Implementation of information

The synthesis of a protein encoded by a gene begins with the synthesis of mRNA on a DNA template (transcription). The genetic code is then translated into an amino acid sequence; that is, the polypeptide chain is synthesized on mRNA (translation). Three nucleotides are enough to encode all the amino acids and the signal for the end of the protein sequence. Such a group of three is called a triplet.

History of the study

Studies of proteins and nucleic acids were carried out over a long time. In the middle of the 20th century, the first ideas about the nature of the genetic code finally appeared. In 1953, it was discovered that some proteins consist of sequences of amino acids. True, at that time their exact number could not yet be determined, and there were numerous disputes about it. In 1953, two papers by Watson and Crick were published. The first described the secondary structure of DNA; the second discussed how it could be copied by template synthesis. In addition, it was emphasized that a specific sequence of bases is a code that carries hereditary information. The American and Soviet physicist George Gamow put forward a coding hypothesis and found a method for testing it. In 1954, he published a paper in which he proposed establishing a correspondence between amino acid side chains and diamond-shaped “holes” and using this as a coding mechanism. This code was called the diamond (rhombic) code. Explaining his work, Gamow allowed that the genetic code could be a triplet code. The physicist's work was one of the first to be considered close to the truth.

Classification

Over the years, various models of the genetic code were proposed. They fall into two types: overlapping and non-overlapping. Overlapping codes allow one nucleotide to belong to several codons; they include the triangular, sequential and major-minor codes. Non-overlapping codes include the combination code and the comma-free code. In the combination code, an amino acid is encoded by a triplet of nucleotides, and what matters is the composition of the triplet rather than the order of its nucleotides. In the comma-free code ("code without commas"), certain triplets correspond to amino acids and the others do not. It was assumed that when meaningful triplets are arranged one after another, the triplets that arise in the other reading frames are meaningless. Scientists believed that a set of triplets satisfying these requirements could be chosen, and that it would contain exactly 20 triplets.
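
The figure of 20 triplets for the comma-free code can be made plausible with a short counting sketch. It is an illustration under stated assumptions, not the historical derivation: the helper names `rotation_class` and `is_comma_free` are hypothetical, and the code only shows why 20 is an upper bound, not how a concrete 20-triplet set is constructed.

```python
from itertools import product

BASES = "ACGU"

# All 4**3 ordered triplets.
triplets = ["".join(p) for p in product(BASES, repeat=3)]

# AAA, CCC, GGG and UUU cannot be codewords: AAAAAA reads as AAA in every frame.
homopolymers = [t for t in triplets if len(set(t)) == 1]


def rotation_class(triplet):
    """Return the set of cyclic rotations of a triplet, e.g. ACG, CGA, GAC."""
    return frozenset(triplet[i:] + triplet[:i] for i in range(3))


# At most one triplet per rotation class can be a codeword, because
# ACGACG already contains CGA and GAC in the shifted frames.
classes = {rotation_class(t) for t in triplets if len(set(t)) > 1}


def is_comma_free(code):
    """Check that no shifted frame of two adjacent codewords is a codeword."""
    code = set(code)
    return not any(
        (a + b)[shift:shift + 3] in code
        for a in code for b in code for shift in (1, 2)
    )


print(len(triplets), len(homopolymers), len(classes))  # 64 4 20
print(is_comma_free({"ACG"}))         # True
print(is_comma_free({"ACG", "CGA"}))  # False
```

Removing the 4 homopolymers from the 64 triplets leaves 60, which fall into 20 rotation classes of 3, so a comma-free code over four nucleotides can contain at most 20 triplets.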

Although Gamow and his co-authors questioned this model, it was considered the most plausible one for the next five years. At the beginning of the second half of the 20th century, new data revealed shortcomings in the comma-free code. It was found that codons can direct protein synthesis in vitro. By about 1965, the meaning of all 64 triplets had been established. As a result, the redundancy of some codons was discovered: a single amino acid can be encoded by several triplets.

Distinctive Features

The properties of the genetic code include:

Variations

The first deviation from the standard genetic code was discovered in 1979 in a study of human mitochondrial genes. Further variants were identified later, including many alternative mitochondrial codes. One example is the reading of the stop codon UGA as tryptophan in mycoplasmas. In archaea and bacteria, GUG and UUG are often used as start codons. Sometimes a gene encodes a protein with a start codon that differs from the one normally used by the species. In addition, in some proteins the nonstandard amino acids selenocysteine and pyrrolysine are inserted by the ribosome when it reads a stop codon; whether this happens depends on sequences found in the mRNA. Selenocysteine is currently considered the 21st, and pyrrolysine the 22nd, amino acid present in proteins.
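
The effect of such a reassignment can be sketched in a few lines. The example below builds the standard codon table in one-letter amino acid notation (with `*` for stop) and then overrides a single entry, UGA, as described above. The names `STANDARD`, `UGA_AS_TRP` and `translate` and the test sequence are hypothetical; the variant table is an illustration of one reassignment, not a complete description of any organism's code.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon, in UCAG order for each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Variant table: UGA is read as tryptophan (W) instead of stop.
UGA_AS_TRP = {**STANDARD, "UGA": "W"}


def translate(mrna, table):
    """Read codons from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = "AUGUGAUGGUAA"  # illustrative sequence: AUG UGA UGG UAA
print(translate(mrna, STANDARD))    # M
print(translate(mrna, UGA_AS_TRP))  # MWW
```

The same mRNA gives a one-residue product under the standard table and a three-residue product under the variant, which is why the correct table has to be known before a gene is translated.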

General features of the genetic code

However, all these exceptions are rare. In living organisms the genetic code has a number of common features. A codon consists of three nucleotides (the first two are the determining ones), and codons are translated into an amino acid sequence by tRNA and ribosomes.

When a protein has to be synthesized, the cell faces a serious problem: the information in DNA is stored as a sequence written in 4 characters (nucleotides), while proteins are built from 20 different characters (amino acids). If the four symbols are taken two at a time to encode amino acids, only 16 combinations are obtained, while there are 20 proteinogenic amino acids. That is not enough.
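
The arithmetic behind this problem can be checked with a small sketch. The constant names are hypothetical; the only inputs are the 4 nucleotides and 20 amino acids mentioned above.

```python
ALPHABET_SIZE = 4   # nucleotides
AMINO_ACIDS = 20    # proteinogenic amino acids to be encoded

# Number of distinct "words" of each length over a 4-letter alphabet.
for length in (1, 2, 3):
    combos = ALPHABET_SIZE ** length
    print(length, combos, combos >= AMINO_ACIDS)

# Shortest word length that gives at least 20 combinations.
min_length = next(
    n for n in range(1, 10) if ALPHABET_SIZE ** n >= AMINO_ACIDS
)
print(min_length)  # 3
```

Single nucleotides give 4 words and pairs give 16, so three is the shortest codon length that can cover 20 amino acids.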

There is an example of brilliant thinking on this matter:

"Take, for example, a deck of playing cards, in which we pay attention only to the suit of the card. How many triplets of the same kind can you get? Four, of course: three hearts, three diamonds, three spades and three clubs. How many triplets are there with two cards of the same suit and one of a different suit? We have four choices for the suit of the pair and three choices for the third card. Therefore we have 4x3 = 12 possibilities. In addition we have four triplets with all three cards different. So, 4+12+4=20, and this is the exact number of amino acids that we wanted to get" (George Gamow, 1904-1968, Soviet and American theoretical physicist, astrophysicist and popularizer of science).

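The card count in the quotation above can be reproduced by enumerating unordered triplets of suits. This is a minimal sketch: it assumes, as the quotation does, that only the composition of a triplet matters and not the order, and the variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

SUITS = ["hearts", "diamonds", "spades", "clubs"]

# Unordered triplets of suits, repetition allowed.
hands = list(combinations_with_replacement(SUITS, 3))

# Group the triplets by how many distinct suits they contain.
by_pattern = Counter(len(set(hand)) for hand in hands)

print(by_pattern[1])  # 4   (all three cards of one suit)
print(by_pattern[2])  # 12  (a pair plus one card of another suit)
print(by_pattern[3])  # 4   (three different suits)
print(len(hands))     # 20
```

Replacing the four suits with the four nucleotides gives the same total of 20 compositions.
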
Indeed, experiments have shown that for each amino acid the codon has two obligatory nucleotides and a third one that is variable and less specific (the “wobble effect”). If the four symbols are taken three at a time, 4 × 4 × 4 = 64 combinations are obtained, which far exceeds the number of amino acids. Thus, each amino acid is encoded by three nucleotides. This trio is called a codon. As already mentioned, there are 64 variants. Three of them do not code for any amino acid; these are the so-called “nonsense codons” (from the French non-sens, nonsense) or “stop codons”.

Genetic code

The genetic (biological) code is a way of encoding information about the structure of proteins in the form of a nucleotide sequence. It is designed to translate the four-character language of nucleotides (A, G, U, C) into the twenty-character language of amino acids. It has characteristic features:

  • Triplet nature – three nucleotides form a codon that codes for an amino acid. There are a total of 61 sense codons.
  • Specificity (or unambiguity) – each codon corresponds to only one amino acid.
  • Degeneracy – one amino acid can correspond to several codons.
  • Universality – the biological code is the same for all types of organisms on Earth (however, there are exceptions in the mitochondria of mammals).
  • Colinearity – the sequence of codons corresponds to the sequence of amino acids in the encoded protein.
  • Non-overlapping – triplets do not overlap each other, being located next to each other.
  • No punctuation – there are no additional nucleotides or any other signals between the triplets.
  • Unidirectionality – during protein synthesis, codons are read sequentially, without skipping or going back.
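
Several of these features can be read directly off the standard codon table. The sketch below builds the table in one-letter amino acid notation (with `*` for stop) and counts sense codons, stop codons and synonymous codons. The variable names are hypothetical, and the table is the standard code only, without the variants found in mitochondria.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon, in UCAG order for each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Specificity: a dict maps every codon to exactly one amino acid.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]

# Degeneracy: collect all codons that encode the same amino acid.
codons_for = defaultdict(list)
for codon in sense_codons:
    codons_for[CODON_TABLE[codon]].append(codon)

print(len(CODON_TABLE), len(sense_codons))  # 64 61
print(stop_codons)                          # ['UAA', 'UAG', 'UGA']
print(len(codons_for))                      # 20
print(codons_for["L"])  # ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
print(codons_for["W"])  # ['UGG']
```

Leucine (L) has six codons while tryptophan (W) has one, which shows that the degeneracy is unevenly distributed among the amino acids.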

However, it is clear that the biological code cannot express itself without additional molecules that perform a transition function or adapter function.

Adapter role of transfer RNAs

Transfer RNAs are the only intermediary between the 4-letter nucleic acid sequence and the 20-letter protein sequence.

Each transfer RNA has a specific triplet sequence in its anticodon loop (the anticodon) and can attach only the amino acid that corresponds to this anticodon. It is the presence of one or another anticodon in the tRNA that determines which amino acid will be included in the protein molecule, because neither the ribosome nor the mRNA recognizes the amino acid.

Thus, the adapter role of tRNA consists:

  1. in specific binding to amino acids,
  2. in specific, according to codon-anticodon interaction, binding to mRNA,
  3. and, as a result, in the incorporation of amino acids into the protein chain in accordance with the information in the mRNA.
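
The codon-anticodon interaction in step 2 can be sketched as a reverse-complement lookup. This is a simplified illustration: it assumes strict Watson-Crick pairing at all three positions and therefore ignores the less specific pairing at the third codon position; the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the strictly complementary anticodon, written 5'->3'.

    The two strands pair antiparallel, so the codon is reversed
    before each base is complemented.
    """
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def pairs_with(codon: str, anticodon: str) -> bool:
    """True if the anticodon matches the codon at all three positions."""
    return anticodon_for(codon) == anticodon


print(anticodon_for("AUG"))      # CAU
print(pairs_with("AUG", "CAU"))  # True
print(pairs_with("AUG", "UAC"))  # False (not written 5'->3')
```

Both sequences are written 5'→3' here, which is why the anticodon for AUG appears as CAU rather than UAC.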

The addition of an amino acid to tRNA is carried out by the enzyme aminoacyl-tRNA synthetase, which is specific for two compounds simultaneously: a particular amino acid and its corresponding tRNA. The reaction consumes two high-energy bonds of ATP. The amino acid attaches to the 3' end of the tRNA acceptor stem through its α-carboxyl group, and the bond between the amino acid and the tRNA is a high-energy (macroergic) bond. The α-amino group remains free.