
Sequencing two strands of DNA


1. My background is not genetics. 2. I am not interested in knowing how DNA sequencing or genotyping is done. 3. I am interested just in the nature of the results as described here.

Now coming to the question: humans have two sets of chromosomes in each cell. So if I say "sequencing chromosome 1", what does that actually mean? Which of the two chromosomes gets sequenced? In genotyping, does the comparison take place between two + two = four copies of chromosome 1? Or is my entire understanding flawed?


I am interested just in the nature of the results as described here.

By aligning this sequence to all nucleotide sequences of all organisms, we would realize that the given (short) sequence occurs in various organisms besides humans. It might therefore even come from a species with only one chromosome (see NCBI BLAST).

For the sake of the example, let's assume that the species has multiple instances of each chromosome (e.g. one inherited from the mother and one from the father), and look at "the results as described here". As no detail is given on the protocol, let's further assume the most common scenario, namely that there was no experimental distinction or separation of the individual instances of chromosomes.

Looking at the results, we would realize that this is not "next-generation sequencing" but Sanger sequencing.

Now we look at the provided scheme again (note: this is not real experimental data). Clearly, the bands on the left side occur for only one letter/base at a time, and at a given position of the sequence there is only one very clear peak on the right-hand side (as opposed to multiple peaks corresponding to different bases).

We would have to conclude that every instance of the chromosome has an absolutely identical sequence, and therefore that it would not be important to distinguish individual instances.

But wait: aren't maternal and paternal chromosomes different? Absolutely. The given example, however, only looks at 25 bases rather than the full chromosome, and it is quite possible that these bases are absolutely identical. (Note: there are various experimental techniques to select only specific regions of DNA or RNA for analysis.)


I would say your understanding here isn't necessarily flawed - rather, there's a little trick being played on you.

The figure depicts typical results of Sanger sequencing. This type of sequencing always requires a so-called primer, which is a short sequence of around 20 known bases. The sequence following those 20 bases is what gets sequenced. You can't practically sequence a whole chromosome using this technique alone, because it is usually only good for about 700-1,200 bases after the primer.

The bands in the graphic on the left and the peaks on the right side correspond to the base at position "primer + n" (n being the number of bands/peaks starting to count at the bottom of the graphic).

Now consider that, in practice, this technique is performed on a liquid sample that contains a huge number of DNA molecules. In an ideal scenario, every single molecule is identical. In that case, the sequence behind the 20 primer bases is the same in every molecule, and you will get clear bands or peaks as shown in the graphic.

When you take a sample from an organism like a human, there will be a variety of DNA molecules in the mix.

What gets sequenced? Simple: everything that sits behind the primer you're using. If different molecules have different sequences there, the result will be bands in different columns at the same position on the gel (left side of the graphic) or two different-coloured peaks at the same location in the chromatogram (right side of the graphic).

Example: Genotyping

We consider the XYZ gene. There are two variants of this gene in humans, and the only difference is at position 167 of the gene, where one variant has an A and the other a G. Thanks to the Human Genome Project, I know the full sequence of the XYZ gene, so I design my primer to span positions 50-70 of XYZ. My sequencing result should therefore yield the XYZ sequence starting somewhere around position 100, because even the highest-quality sequencing reactions miss a few bases right after the primer. This puts position 167 of the gene at roughly position 67 of my read. Suppose I take a sample from a human who has the same XYZ variant on both homologous chromosomes. I will then get only a single sequencing result, with either an A or a G at position ~67 of my read. Suppose instead that my sample came from a human who has one variant on one chromosome and the other variant on the other. My sequencing will then be ambiguous at position ~67, and on closer inspection I will see that I sequenced two variants from this sample: one with an A and one with a G at this ambiguous position. This way I can determine a human's genotype for this gene: homozygous (XYZA/A or XYZB/B) or heterozygous (XYZA/B).
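The decision described above (one clean peak versus two overlapping peaks at position ~67) can be sketched as a tiny genotype caller. This is a minimal illustration that rests on several assumptions. The peak heights are invented, and `call_genotype` is a hypothetical helper. The 0.3 cutoff for counting a secondary peak as a real allele is an arbitrary choice rather than a standard. Real trace-analysis software uses more careful models.

```python
def call_genotype(peaks, min_ratio=0.3):
    """Call a diploid genotype at one position of a Sanger trace.

    peaks: dict mapping each base ("A", "C", "G", "T") to its peak height.
    min_ratio: assumed cutoff; a second peak at least this fraction of the
    tallest one is treated as a second allele (heterozygous call).
    """
    # Rank bases by signal strength, tallest peak first.
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_height), (next_base, next_height) = ranked[0], ranked[1]
    if top_height > 0 and next_height / top_height >= min_ratio:
        # Two overlapping peaks of similar size: heterozygous.
        return "/".join(sorted((top_base, next_base)))
    # One clean peak: homozygous for the tallest base.
    return f"{top_base}/{top_base}"


# Invented peak heights at position ~67 of the read for three samples.
print(call_genotype({"A": 950, "C": 20, "G": 35, "T": 15}))   # A/A
print(call_genotype({"A": 480, "C": 25, "G": 455, "T": 18}))  # A/G
print(call_genotype({"A": 30, "C": 10, "G": 900, "T": 5}))    # G/G
```

Comparing peak heights as a ratio, rather than as absolute values, keeps the call independent of how strong the overall signal of a given reaction happens to be.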

PS: Don't let the double-stranded nature of DNA confuse your thinking here. Sanger sequencing only ever sequences one strand (the one you designed the primer for)! So two homologous chromosomes = two double strands = four single strands. Two of these will be sequenced, because the other two contain the complementary sequences.

PPS: The way this works can cause huge problems if the primer isn't designed properly. With only 20 bases, the same sequence may appear elsewhere in the genome. The sequences following the primer would then differ not in one or two positions but everywhere.
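One way to see the problem is to count how often a primer matches a sequence on either strand. The sketch below uses exact string matching on a made-up toy "genome". Both `primer_sites` and the sequences are hypothetical. Real primer checks (for example NCBI Primer-BLAST) also consider near-matches with mismatches, which exact matching misses.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def primer_sites(primer, genome):
    """Return (position, strand) for every exact primer match.

    "+" means the primer matches the given strand; "-" means it matches
    the complementary strand (its reverse complement occurs in `genome`).
    """
    rc = reverse_complement(primer)
    k = len(primer)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == primer:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits


toy_genome = "CCGATTACAGGTTTGTAATCCAGATTACAT"
print(primer_sites("GATTACA", toy_genome))  # [(2, '+'), (13, '-'), (22, '+')]
```

Three hits mean that three different templates would be read from the same primer, so the resulting trace would be a mixture.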


That's a picture of Sanger sequencing. Suppose that sequencing had been done in a region where a diploid organism was heterozygous for a simple SNP. Instead of single clean peaks, the trace file would show two half-sized peaks of two colors at the SNP position, one for each of the two letters present at that site. If the difference between the two alleles were more involved than a single-nucleotide polymorphism, the trace file might look quite different.

So if I say "sequencing chromosome 1", what does that actually mean?

People don't often just sequence a whole chromosome like that, and pretty much no one would do it with Sanger technology anymore. But if you were sequencing a diploid non-inbred organism, you would expect at points of heterozygosity to get a mix of signals.

In genotyping, does the comparison take place between two + two = four copies of chromosome 1?

Diploid organisms have two copies of a given chromosome, not four. Each chromosome has two strands, but the information on the complementary strands is redundant, since either strand can be reconstructed from the other.
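Because each base has a fixed partner, either strand can be rebuilt from the other. This is why the second strand adds no new information. A minimal Python sketch follows, using an arbitrary sequence:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the partner strand, also written 5'->3'.

    The strands are antiparallel, so the partner is the complement read
    in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


top = "ATGCGTACCTGA"
bottom = reverse_complement(top)
print(bottom)                              # TCAGGTACGCAT
print(reverse_complement(bottom) == top)   # True
```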


UPDATE


Before coming to sequencing, you have to consider that you are not sequencing a single chromosome from a single cell. You are sequencing a DNA sample derived from a population of cells, and that sample contains all the chromosomes. I do not know of a procedure that targets a single chromosome, although targeted procedures for specific genomic regions surrounding genes do exist.

Or is my entire understanding flawed?

Yes, although I would say lacking rather than flawed.

That said,

So if I say "sequencing chromosome 1", what does that actually mean?

Hypothetically, that would mean sequencing the DNA of chromosome 1 pooled from a population of cells. You retrieve the sequence of chromosome 1 for use in downstream analysis by synthesizing its complementary DNA, which takes advantage of the machinery of DNA replication.

Which of the two chromosomes gets sequenced?

There is no way to differentiate between the two chromosomes in mainstream sequencing experiments. The cell has no idea which chromosome is which, and neither do you, so both chromosomes get sequenced at the same time. Please note that I said there is no way to differentiate between them in mainstream sequencing experiments.

Does the comparison take place between two + two = four copies of chromosome 1?

This has two parts. What do you mean by four? If you are implying that chromosome 1 has two strands and that this makes four, then no, because the two strands together make up one chromosome. Each of your cells is diploid (2n), so it has two copies of chromosome 1. If you are sequencing a population of cells (let's say 1,000), you are sequencing 2,000 copies of chromosome 1, present as pairs across the 1,000 cells.

Do comparisons take place?

In mainstream experiments we do not compare. That said, methods for differentiating between the two copies of chromosome 1 exist. The oldest I could find is an article by M Nagano on allele-specific sequencing. Both parents contribute equally to your DNA, and each of them carries some variants specific to their own DNA. These variants can be used to differentiate between the paternal and maternal chromosome 1.
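The idea of using parent-specific variants can be sketched as follows. Once heterozygous sites have been phased (each allele attributed to the mother or the father), every read that covers such a site can be assigned to one parental copy. This is an illustrative sketch, not the method of the cited article. The helper `assign_parent`, the SNP positions and the reads are all hypothetical, and reads are assumed to align to the reference without insertions or deletions.

```python
def assign_parent(read_start, read_seq, phased_snps):
    """Assign an aligned read to the maternal or paternal chromosome.

    read_start: reference position of the read's first base.
    phased_snps: dict position -> (maternal_base, paternal_base).
    Assumes a gap-free alignment, so reference position = start + offset.
    """
    maternal = paternal = 0
    for pos, (mat_base, pat_base) in phased_snps.items():
        offset = pos - read_start
        if 0 <= offset < len(read_seq):      # read covers this SNP
            if read_seq[offset] == mat_base:
                maternal += 1
            elif read_seq[offset] == pat_base:
                paternal += 1
    if maternal > paternal:
        return "maternal"
    if paternal > maternal:
        return "paternal"
    return "unassigned"   # covers no informative SNP, or evidence is tied


snps = {105: ("A", "G"), 112: ("C", "T")}  # hypothetical phased SNPs
print(assign_parent(100, "TTGCAACGTACGCAT", snps))  # maternal
print(assign_parent(100, "TTGCAGCGTACGTAT", snps))  # paternal
print(assign_parent(200, "ACGTACGT", snps))         # unassigned
```

Reads that cover no heterozygous site stay unassigned, which is why this approach only separates the two copies where they actually differ.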

There is also single-cell sequencing, in which you would sequence the two copies of chromosome 1 from a single cell. In theory, you could then also perform allele-specific sequencing on that cell, which would let you differentiate between its two copies of chromosome 1.

Finally, there is no way to differentiate the two strands during sequencing. You only learn which strand each read comes from after sequencing, when you align your data back to the reference genome.
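The point that strand information only appears at the alignment step can be sketched with a toy aligner. A read is looked up on the reference as given and, if that fails, as its reverse complement. The sketch uses exact matching on an invented reference, and `map_strand` is a hypothetical helper. Real aligners (BWA and Bowtie2, for example) tolerate mismatches and gaps.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def map_strand(read, reference):
    """Return (position, strand) of an exact hit, or None if unmapped.

    "+" : the read matches the reference strand as written.
    "-" : the read came from the opposite strand; position is where its
          reverse complement lies on the reference.
    """
    pos = reference.find(read)
    if pos != -1:
        return pos, "+"
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return pos, "-"
    return None


reference = "ACGTTAGCCGATAGGCTTAC"
print(map_strand("AGCCGA", reference))  # (5, '+')
print(map_strand("AGCCTA", reference))  # (11, '-')
print(map_strand("GGGGGG", reference))  # None
```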

Answer to old question


Nowadays the word sequencing is synonymous with high-throughput sequencing, and the article you linked to in the comment also relates to high-throughput sequencing techniques (judging from a cursory glance).

While sequencing DNA, which of the two strands gets sequenced?

Both get sequenced.

In what order are the results provided?

Random

While sequencing, how are these two strands sequenced?

If it is single-end sequencing, we do not know which strand was sequenced first. (There are ways, but for simplicity's sake let's not go there.)

If it is paired-end sequencing, one read originates from the sense strand and the other read originates from the opposite strand.

You can watch a YouTube video here to find out what the difference is.

Is the result of one strand provided first, followed by the other, or does only one of the two strands get sequenced?

Both are reported.

The question you have asked is very broad, and I could go on for a handful of A4 pages and still not finish. Coming to your first question, I said both get sequenced because during sequencing we have no way of knowing which strand is being sequenced. There are stranded sequencing techniques, but you should first digest what is mentioned and linked here. Later, after some more research, you can come back with a question on stranded sequencing.

After sequencing you get your results in random order, because what you get is not really a result but raw data files called FASTQ files. Read up on FASTQ a bit. Strictly speaking, the sequencer does not produce FASTQ directly but BCL (base call) files, which must be converted to FASTQ. The FASTQ files are then optionally filtered by quality and aligned to a reference genome. This is where you get to know which read came from which strand.
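The FASTQ step can be sketched in a few lines. Each record is four lines: an identifier, the sequence, a "+" separator and the quality characters. In the common Phred+33 encoding, each quality character's ASCII code minus 33 gives the Phred score. The parser below assumes that encoding and unwrapped four-line records. The mean-quality cutoff of 20 is an arbitrary choice for illustration, and real pipelines typically rely on dedicated trimming and filtering tools.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return                          # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()                   # "+" separator line
        quality = handle.readline().rstrip("\n")
        # Phred+33: score = ASCII code - 33 (e.g. "I" -> 40, "#" -> 2)
        yield header[1:], sequence, [ord(ch) - 33 for ch in quality]


def filter_by_mean_quality(records, min_mean=20):
    """Keep reads whose mean Phred score is at least `min_mean`."""
    for read_id, sequence, scores in records:
        if scores and sum(scores) / len(scores) >= min_mean:
            yield read_id, sequence


fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n####\n"
kept = list(filter_by_mean_quality(parse_fastq(io.StringIO(fastq_text))))
print(kept)  # [('read1', 'ACGT')]
```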

You should read up more on single end sequencing and paired end sequencing to better understand how DNA is sequenced. Also check out this video. This is the most simplified version which I could find. It should make you curious without bombarding you with too much information.

Last of all, in single-end sequencing you have no way of knowing which strand got sequenced without alignment. In paired-end sequencing, a DNA fragment is sequenced from both ends to generate two paired reads, and one mate generally aligns to the sense strand while the other aligns to the antisense strand.
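The paired-end orientation can be illustrated with a toy fragment. Mate 1 reads the first bases of the fragment from one end. Mate 2 reads from the other end on the opposite strand, so it only matches the fragment after it has been reverse-complemented. The fragment, the read length and the `paired_reads` helper are illustrative assumptions, and real reads are much longer.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_reads(fragment, read_len):
    """Simulate the two mates of a paired-end read pair."""
    mate1 = fragment[:read_len]                       # one end, given strand
    mate2 = reverse_complement(fragment[-read_len:])  # other end, opposite strand
    return mate1, mate2


fragment = "ATGCGTACCTGAGGCATTCA"
mate1, mate2 = paired_reads(fragment, 5)
print(mate1, mate2)                              # ATGCG TGAAT
print(fragment.find(mate1))                      # 0  (aligns as given)
print(fragment.find(mate2))                      # -1 (not on this strand)
print(fragment.find(reverse_complement(mate2)))  # 15 (opposite strand)
```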


MCAT Biology : DNA and RNA Sequencing

An important part of creating DNA primers when performing a PCR or other quantitative analysis is the melting point of the primer. Which set of primers would most likely work well together as the forward and reverse primers of a PCR?

CGGACATGCTGG and GTTACCGCAGGC

ATCGCTTTGTAC and GTTACCGCAGGC

ATCGCTTTGTAC and CGGACATGCTGG

CACACTATAAAA and ATCGCTTTGTAC

GTGTGATACCCC and CACACTATAAAA

Correct answer: CGGACATGCTGG and GTTACCGCAGGC

The melting point of a strand of DNA can be predicted from the bases that make it up. Cytosine and guanine form three hydrogen bonds with each other, so they bond more strongly than adenine and thymine, which form two. Primers that work well together should have similar melting points. For primers of equal length, this means they should contain the same number of Cs and Gs (and therefore the same number of As and Ts). Only one answer choice has two strands with the same number of Cs and Gs.
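The reasoning can be made quantitative with the Wallace rule, a rough rule of thumb for short oligonucleotides. It estimates Tm as 2 degrees C per A or T plus 4 degrees C per G or C. The rule ignores salt, concentration and nearest-neighbour effects, so the numbers are only useful for comparison and are not real melting temperatures. Applied to the five answer choices:

```python
def wallace_tm(primer):
    """Rough Tm estimate in degrees C: 2 per A/T plus 4 per G/C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


choices = [
    ("CGGACATGCTGG", "GTTACCGCAGGC"),
    ("ATCGCTTTGTAC", "GTTACCGCAGGC"),
    ("ATCGCTTTGTAC", "CGGACATGCTGG"),
    ("CACACTATAAAA", "ATCGCTTTGTAC"),
    ("GTGTGATACCCC", "CACACTATAAAA"),
]
for fwd, rev in choices:
    print(fwd, rev, wallace_tm(fwd), wallace_tm(rev))

# The best pair is the one whose estimated melting points differ least.
best = min(choices, key=lambda pair: abs(wallace_tm(pair[0]) - wallace_tm(pair[1])))
print(best)  # ('CGGACATGCTGG', 'GTTACCGCAGGC')
```

Both primers of the winning pair come out at about 40 degrees C, while every other pair differs by at least 4 degrees.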

Example Question #1391 : Biology

Which piece of DNA has the lowest melting point?

Note: only one strand is shown

Cytosine and guanine bond more strongly to each other than adenine and thymine do, because they form three hydrogen bonds as opposed to two. Therefore, a piece of DNA with a high proportion of Ts and As will have a low melting point. The correct choice has 8 Ts and As, while the rest have fewer.

Example Question #1391 : Biology

Human chromosomes are divided into two arms, a long q arm and a short p arm. A karyotype is the organization of a human cell's total genetic complement. A typical karyotype is generated by ordering the autosomes, chromosome 1 to chromosome 22, roughly in order of decreasing size, followed by the sex chromosomes (X and Y).

When viewing a karyotype, it can often become apparent that changes in chromosome number, arrangement, or structure are present. Among the most common genetic changes are Robertsonian translocations, involving transposition of chromosomal material between long arms of certain chromosomes to form one derivative chromosome. Chromosomes 14 and 21, for example, often undergo a Robertsonian translocation, as below.

A karyotype of this individual for chromosomes 14 and 21 would thus appear as follows:

Though an individual with aberrations such as a Robertsonian translocation may be phenotypically normal, they can generate gametes through meiosis that have atypical organizations of chromosomes, resulting in recurrent fetal abnormalities or miscarriages.

The principal chemical component of chromosomes is nucleic acid, though proteins are also important elements. Which of the following is true of nucleic acids?

DNA contains uracil residues, while RNA contains thymine

DNA is translated directly by tRNA linked to amino acids

Guanine-cytosine rich regions have higher melting points than adenine-thymine rich regions

RNA provides the main storage form of genetic information

Ribosomes are important in the synthesis of RNA molecules

Correct answer: Guanine-cytosine rich regions have higher melting points than adenine-thymine rich regions

Guanine-cytosine pairing forms three hydrogen bonds, instead of the two bonds formed by adenine and thymine. The other choices are all tempting but subtly wrong. RNA contains uracil, DNA is the main storage form of genetic information, mRNA (not DNA) is directly translated with the help of tRNA, and ribosomes are important in the synthesis of proteins. It is worth noting that DNA's deoxyribose lacks the 2' hydroxyl group found on RNA's ribose sugar. This increases DNA's stability and allows it to serve as a stable storage medium.

Example Question #1 : Understanding Nucleic Acids

Pick the reason that is least likely to explain why two purines will never be seen attached to each other in a DNA helix.

The functional groups at the end of one purine would not correctly match with the other purine.

Two purines could cause a bump in the DNA, causing problems with transcription and replication.

Purine bases will never be found on opposite DNA strands, so they do not have the ability to pair with one another.

The bulky two-ring structure of purines would cause too much hindrance in the inside of the helix.

Correct answer: Purine bases will never be found on opposite DNA strands, so they do not have the ability to pair with one another.

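This statement is the least likely explanation because it is false. DNA strands are composed of millions of nucleotides, so it would be virtually impossible to find a strand that did not contain all four nucleotides. Purines therefore do occur on opposite strands; they simply never pair with each other.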

Nucleotides combine in purine-pyrimidine pairs because of the sterically appropriate fit of the bases and the preferred pattern of hydrogen bonds between them. As a result, two purines would not be seen paired. Together they are too large to fit inside the helix, and their functional groups cannot form the correct hydrogen bonds.

Example Question #1391 : Biology

Which segment of DNA would have the highest melting point when paired with its complementary strand?

DNA nucleotide base pairs are held together by hydrogen bonding. Cytosine and guanine are held together by three hydrogen bonds, whereas adenine and thymine are held together by only two. More hydrogen bonding between the two strands of DNA increases the melting point, so the DNA segment with the most guanine-cytosine base pairs will have the highest melting point.

Example Question #1391 : Biology

Which of the following options includes degenerate codons?

The term "degenerate codons" refers to codons with different nucleotide base sequences that specify the same amino acid. In the provided examples, two codons (UCU and UCA) both specify serine, indicating this is the correct answer.

Example Question #1396 : Biology

In 2013, scientists linked a cellular response called the unfolded protein response (UPR) to a series of neurodegenerative diseases, including such major health issues as Parkinson’s and Alzheimer’s Disease. According to their work, the unfolded protein response is a reduction in translation as a result of a series of enzymes that modify a translation initiation factor, eIF2, as below:

In the above sequence, the unfolded protein sensor binds to unfolded protein, such as the pathogenic amyloid-beta found in the brains of Alzheimer’s Disease patients. This sensor then phosphorylates PERK, or protein kinase RNA-like endoplasmic reticulum kinase. This leads to downstream effects on eIF2, inhibition of which represses translation. It is thought that symptoms of neurodegenerative disease may be a result of this reduced translation.

During translation, the genetic code is used to convert a sequence of nitrogenous bases in mRNA to an amino acid sequence. Which of the following is true of the genetic code?

I. More than one codon sequence codes for a single amino acid

II. The most 5' position of the codon on mRNA is the wobble position

III. Each codon sequence only codes for one amino acid

The genetic code is unambiguous, because each codon codes for only one amino acid. It is also degenerate, so most amino acids can be coded for by more than one codon. Statements I and III are therefore true. Statement II is incorrect, because the wobble position is the most 3' position of the codon on the mRNA.

Example Question #1397 : Biology

A short polynucleotide strand with the base sequence of AUCCCUGG must be __________ .

Polynucleotide sequences are nucleic acids, so they must be DNA or RNA. Any sequence containing U (uracil) must be RNA. However, there is no way to determine the type of RNA simply by looking at the sequence. This sequence could be part of an mRNA, rRNA, or tRNA.

mRNA is translated into proteins. rRNA plays a structural and functional role in composing ribosomes. tRNA carries amino acids to the ribosome during translation.

Example Question #1 : Dna And Rna Sequencing

Which of the following correctly arranges the bases on the anti-codon loop of a tRNA carrying tryptophan?

Tryptophan is encoded on mRNA by the codon 5'-UGG-3' and is transported to the ribosome by tryptophan tRNA. The anticodon loop must be complementary and antiparallel to the mRNA codon. Since the codon for tryptophan is 5'-UGG-3', the anticodon of the tRNA must read 3'-ACC-5' (written 5'-CCA-3' in the conventional direction) in order to line up.
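The pairing rule can be checked mechanically. Codon and anticodon pair antiparallel, so the anticodon (written 5'->3') is the reverse complement of the codon (written 5'->3'). This sketch uses strict Watson-Crick pairing and ignores the wobble pairs and modified bases, such as inosine, that real tRNAs use. The `anticodon` helper is hypothetical.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Anticodon (5'->3') for an mRNA codon given 5'->3'.

    Antiparallel pairing: the anticodon is the reverse complement of the
    codon. Strict Watson-Crick pairing only (no wobble, no inosine).
    """
    return "".join(RNA_PAIR[base] for base in reversed(codon))


print(anticodon("UGG"))  # CCA -> 5'-CCA-3', i.e. 3'-ACC-5' (tryptophan)
print(anticodon("AUG"))  # CAU -> methionine / start codon
```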

Example Question #1392 : Biology

The codons GGU, GGA, GGC, and GGG all code for the same amino acid, glycine. What biological term is used to describe this phenomenon?

Degeneracy refers to the fact that more than one codon can code for the same amino acid. These codons generally differ in their third or "wobble" base. Degeneracy explains how there can be a total of sixty-four possible codons corresponding to only twenty amino acids.
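Degeneracy, including the serine case of UCU and UCA, is easy to see by grouping codons by the amino acid they encode. The table below is only a small, hand-copied subset of the standard genetic code, enough to cover glycine, serine, methionine and tryptophan. The helper names are illustrative.

```python
from collections import defaultdict

# Small subset of the standard genetic code (mRNA codons, 5'->3').
CODON_TABLE = {
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "UCU": "Ser", "UCC": "Ser", "UCA": "Ser", "UCG": "Ser",
    "AGU": "Ser", "AGC": "Ser",
    "AUG": "Met",
    "UGG": "Trp",
}


def synonymous_codons(table):
    """Map each amino acid to the list of codons that encode it."""
    groups = defaultdict(list)
    for codon, amino_acid in table.items():
        groups[amino_acid].append(codon)
    return dict(groups)


def are_degenerate(codon1, codon2, table=CODON_TABLE):
    """True if two different codons specify the same amino acid."""
    return codon1 != codon2 and table[codon1] == table[codon2]


groups = synonymous_codons(CODON_TABLE)
print(groups["Gly"])                 # ['GGU', 'GGC', 'GGA', 'GGG']
print(len(groups["Ser"]))            # 6
print(are_degenerate("UCU", "UCA"))  # True  (both serine)
print(are_degenerate("AUG", "UGG"))  # False (methionine vs tryptophan)
```

The four glycine codons differ only in their third base, which is the wobble position mentioned above.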



Sanger Sequencing

The DNA sequencing method developed by Fred Sanger forms the basis of today's automated "cycle" sequencing reactions. In the 1980s, two key developments allowed researchers to believe that sequencing the entire genome could be possible. The first was a technique called polymerase chain reaction (PCR), which enabled many copies of a DNA sequence to be produced quickly and accurately. The second was an automated method of DNA sequencing, built upon the chemistry of PCR and the sequencing process developed by Frederick Sanger in 1977.
(DNAi Location: Genome > The Project > Putting it together > Animations > Sanger sequencing)

The first method of sequencing the genetic code was devised by Fred Sanger. To sequence the DNA, it must first be separated into two strands. The strand to be sequenced is copied using chemically altered bases. These altered bases cause the copying process to stop each time one particular letter is incorporated into the growing DNA chain. This process is carried out for all four bases, and then the fragments are put together like a jigsaw to reveal the sequence of the original piece of DNA.
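The chain-termination logic, and the way the fragments are "put together", can be sketched as a toy simulation. In the reaction for each base, synthesis may stop wherever that base is added. Sorting all terminated fragments by length therefore spells out the new strand one base at a time. This is an idealized sketch under several assumptions: the template is invented and the primer is ignored. Every possible termination is also assumed to occur, whereas in a real reaction this depends on the ratio of chain-terminating to normal nucleotides.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_lanes(template_3to5):
    """Simulate the four chain-termination reactions.

    template_3to5: template strand written 3'->5', so the new strand is
    built left to right (5'->3') as its base-by-base complement.
    Returns {terminating base: lengths of the fragments it produces}.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand):
        # A chain-terminating copy of `base` can be added here,
        # leaving a fragment i + 1 bases long in that base's reaction.
        lanes[base].append(i + 1)
    return lanes


def read_gel(lanes):
    """Order all fragments by length (shortest first) and read the bases."""
    bands = sorted((length, base) for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


lanes = sanger_lanes("TACGGTCA")
print(lanes)            # {'A': [1, 6], 'C': [4, 5], 'G': [3, 7], 'T': [2, 8]}
print(read_gel(lanes))  # ATGCCAGT, the complement of the template
```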



Scientists Say: DNA sequencing

These letters and colors represent a DNA code, the chemicals that make up a strand of our genetic blueprint. Scientists can use swabs to take samples from organisms and “read” their DNA through a process called DNA sequencing.


DNA sequencing (noun, “D. N. A. SEE-kwen-sing”)

Each of us has our own unique DNA — long molecules that carry instructions for how to make and run our bodies. DNA is made up of four chemicals called nucleotides. They pair up with each other to form a sequence. Those nucleotides are adenine, cytosine, guanine and thymine (or A, C, G and T). Adenine pairs up with thymine. Cytosine pairs with guanine. Our cells decode enormously long sequences of those pairings to get directions for what proteins to make.


Now, scientists can take cells from any living organism and perform DNA sequencing — matching each nucleotide up with its pair. This process allows scientists to determine exactly what each DNA strand “says.” Determining the DNA sequence helps scientists answer important questions — from what species the sample came from to what instructions the strand of DNA might contain.

In a sentence

A teen used DNA sequencing to find out which bacteria in a worm’s gut could digest plastic.

Power Words


cell The smallest structural and functional unit of an organism. Typically too small to see with the naked eye, it consists of watery fluid surrounded by a membrane or wall. Animals are made of anywhere from thousands to trillions of cells, depending on their size. Some organisms, such as yeasts, molds, bacteria and some algae, are composed of only one cell.

chemical A substance formed from two or more atoms that unite (become bonded together) in a fixed proportion and structure. For example, water is a chemical made of two hydrogen atoms bonded to one oxygen atom. Its chemical symbol is H2O. Chemical can also be an adjective that describes properties of materials that are the result of various reactions between different compounds.

decode To convert a hidden or secret message into a language that can be understood.

digest (noun: digestion) To break down food into simple compounds that the body can absorb and use for growth. Some sewage-treatment plants harness microbes to digest (or degrade) wastes so that the breakdown products can be recycled for use elsewhere in the environment.

DNA sequencing The process of determining the exact order of the paired building blocks, called nucleotides, that form each rung of a ladder-like strand of DNA. There are only four nucleotides: adenine, cytosine, guanine and thymine (abbreviated A, C, G and T). Adenine always pairs up with thymine; cytosine always pairs with guanine.

gene (adj. genetic) A segment of DNA that codes, or holds instructions, for producing a protein. Offspring inherit genes from their parents. Genes influence how an organism looks and behaves.

guanine One of four substances that organisms need to produce DNA.

molecule An electrically neutral group of atoms that represents the smallest possible amount of a chemical compound. Molecules can be made of single types of atoms or of different types. For example, the oxygen in the air is made of two oxygen atoms (O2), but water is made of two hydrogen atoms and one oxygen atom (H2O).

nucleotides The four chemicals that, like rungs on a ladder, link up the two strands that make up DNA. They are: A (adenine), T (thymine), C (cytosine) and G (guanine). A links with T, and C links with G, to form DNA. In RNA, uracil takes the place of thymine.

organism Any living thing, from elephants and plants to bacteria and other types of single-celled life.

plastic Any of a series of materials that are easily deformable or synthetic materials that have been made from polymers (long strings of some building-block molecule) that tend to be lightweight, inexpensive and resistant to degradation.

proteins Compounds made from one or more long chains of amino acids. Proteins are an essential part of all living organisms. They form the basis of living cells, muscle and tissues; they also do the work inside of cells. The hemoglobin in blood and the antibodies that attempt to fight infections are among the better-known, stand-alone proteins. Medicines frequently work by latching onto proteins.

sequencing Technologies that determine the order of nucleotides or letters in a DNA molecule that spell out an organism's traits.

species A group of similar organisms capable of producing offspring that can survive and reproduce.

unique Something that is unlike anything else; the only one of its kind.

About Bethany Brookshire

Bethany Brookshire was a longtime staff writer at Science News for Students. She has a Ph.D. in physiology and pharmacology and likes to write about neuroscience, biology, climate and more. She thinks Porgs are an invasive species.



Bioinformatic and Biostatistic Methods for DNA Methylome Analysis of Obesity

Sarah Amandine Caroline Voisin , in Computational Epigenetics and Diseases , 2019

Which Software and Data Sets Should I Use to Analyze DNA Methylation Data in the Context of Obesity?

Regardless of the chosen DNA methylation technique (RRBS, Illumina arrays, MeDIP-seq, MeDIP-chip, etc.), recent coordinated efforts by the bioinformatics community have made it possible to preprocess, filter, normalize, and perform all kinds of statistical analyses on DNA methylation data with the R statistical software. In 2003, the open-source, open-development Bioconductor project was launched with the goal of providing tools for the analysis and comprehension of high-throughput genomic data, using the R programming language. It proved extremely successful and is now the leading platform for the analysis of DNA methylation data, whether in the context of obesity or in other disease contexts [15]. Among the 1473 packages that are now on the website [16], 68 include algorithms to preprocess and analyze DNA methylation data. The excellent review by Teschendorff and Relton describes these algorithms and software packages for downstream analyses of DNA methylation data in detail. This includes algorithms for cell type deconvolution and feature selection, as well as pathway, integrative, and system-level analysis [17].

It is fair to say that there is no gold standard for the preprocessing of DNA methylation data from Illumina beadchips, but there are a few important steps that should be implemented to increase the validity of results. First, it is important to perform a logit transformation of β values, which represent the percentage of methylation at a given CpG, into M values. β values range from 0 (no allele is methylated) to 1 (all alleles are methylated) and are notoriously heteroscedastic when they are close to 0 and 1. This is a problem for most statistical tests that assume homoscedasticity. It can be avoided by using M values, defined as $M = \log_2\left(\frac{\beta}{1 - \beta}\right)$, because M values are approximately homoscedastic [18]. Most studies conduct their analyses with M values and report their results with β values, since β values have a more straightforward biological interpretation. Second, it is of prime importance to account for the two different probe designs on the Illumina HumanMethylation 450k and EPIC chips, called type I and type II designs. Specifically, β values from type II probes are less accurate and reproducible than those from type I probes, and they show different distributions [19]. It is possible to account for this difference in probe design by analyzing the two types of probes separately. Alternatively, the methylation values can be normalized directly with peak-based correction (PBC) [19], Beta-MIxture Quantile (BMIQ) normalization [20], or Regression on Correlated Probes (RCP) [21]. There are only minor performance differences between those methods, but RCP seems to outperform them all and to be computationally efficient [21]. Finally, samples are often run on different plates and at different locations on the plates. This introduces known batch effects that could have dramatic consequences on the downstream analysis if the sample distribution on the plates is unequal between groups.
It is possible to adjust for this batch effect by adding both the plate number and the location on the plate as covariates in the statistical analysis. Alternatively, the methylation data can be normalized using empirical Bayes methods (ComBat) [22], surrogate variable analysis (SVA) [23], functional normalization [24], Remove Unwanted Variation (RUVm) [25], and BEclear [26]. ComBat is a very popular method that is easy to implement, but RUVm is particularly beneficial for the analysis of very “messy” data sets, such as those that seek to combine samples from multiple labs/studies [25].
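The β-to-M conversion described above is a base-2 logit, and it can be inverted to report results back on the β scale. A minimal sketch follows. The clipping constant `eps` is an assumption added here so that β values of exactly 0 or 1 do not produce infinite M values. Real packages handle that edge case in their own ways, and some add a small offset instead.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)), with beta clipped to [eps, 1 - eps]."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def m_to_beta(m):
    """Inverse transform: beta = 2**M / (2**M + 1)."""
    return 2 ** m / (2 ** m + 1)


for beta in (0.05, 0.2, 0.5, 0.8, 0.95):
    m = beta_to_m(beta)
    print(f"beta={beta:.2f}  M={m:+.2f}  back={m_to_beta(m):.2f}")
```

Small steps in β near 0 or 1 become large steps in M. This stretching of the extremes is what brings the variance closer to constant across the range.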

Scientists should strive to combine multiple data sets from different studies, or to replicate their results in different cohorts to strengthen their findings. DNA methylation data are not as sensitive as genetic data and are more easily shared in the scientific community. More and more journals are now asking authors to deposit their raw and processed DNA methylation data on open-access repositories before publication is accepted. The Gene Expression Omnibus (GEO) [27] and the ArrayExpress [28] platforms contain several thousand DNA methylation data sets in humans, which constitute a valuable and underexploited treasure for research groups working on DNA methylation data. For instance, a large epigenome-wide association study (EWAS) of body mass index [14] used a data set from the GEO database to perform cross-tissue correlation analyses, which led to the discovery that methylation loci are enriched for functional genomic features in multiple tissues. It can however be challenging to obtain phenotypic data on samples deposited on such open-access platforms, as authors tend to share the minimum amount of phenotypic information when uploading data sets. It then becomes a daunting task to contact every author of every data set, and efforts should be made to make this phenotypic information more accessible.


DNA Sequencing, Without the Fuss

DNA sequencing technology has been improving by leaps and bounds in recent years, with several techniques vying for supremacy. Now an upstart technology, called nanopore sequencing, looks ready to jump to the front of the pack. Researchers have demonstrated for the first time that they can continuously read the chemical letters of DNA as it travels through a tiny pore, paving the way for a new kind of sequencing machine that decodes DNA much like an announcer reading a ticker tape. The advance might drop the cost of sequencing a complete human genome below $1000, which is expected to revolutionize personalized medicine and help usher in a new era of genetics-based diagnostics and medicines.

Most sequencing techniques require days of work. Machines copy DNA strands and modify them with fluorescent labels and other compounds to enable them to read DNA's sequence letters, or bases. Nanopore sequencing promises to do away with these added steps by sequencing single unmodified DNA strands, and thereby possibly becoming the fastest and cheapest sequencing method on the market.

The idea of passing a DNA strand through a small pore and reading out its chemical letters was first suggested by researchers in Massachusetts and California in 1996. Since then, scientists have figured out how to use an applied voltage to drive DNA through proteins with tiny pores embedded in a film. As DNA's bases pass through the pore, they change the electrical current flowing through it. Sensitive electronics detect these changes and identify the bases.
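The last step, turning current changes into bases, can be illustrated with a toy base caller that assigns each averaged current reading to the base with the nearest reference level. Everything here is an assumption for illustration: the reference levels are invented, and the trace is assumed to be already split into one segment per base. Real pores respond to several bases inside the pore at once and produce noisy signals, so practical base callers rely on far more elaborate statistical models.

```python
# Hypothetical mean current level (picoamperes) for each base; invented values.
REFERENCE_LEVELS = {"A": 50.0, "C": 40.0, "G": 30.0, "T": 60.0}

def call_bases(segment_means, levels=REFERENCE_LEVELS):
    """Toy base caller: assign each current segment to the base whose
    reference level is closest to the measured mean current."""
    return "".join(
        min(levels, key=lambda base: abs(levels[base] - current))
        for current in segment_means
    )

# One averaged current reading per base as the strand moves through the pore
print(call_bases([49.1, 31.0, 61.2, 41.5]))
```

Slowing the strand, as the phi29 protein does, matters in this picture because each segment must last long enough for its mean current to be measured reliably.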

One major problem, however, has been that when an electric voltage is applied across the film, DNA tends to move through the nanopore too quickly to read off all the bases in sequence. Two years ago, Mark Akeson and colleagues at the University of California, Santa Cruz, hit upon a possible solution. They added a protein called phi29 to a nanopore setup. The protein loosely grabbed onto a DNA strand as it was moving through the nanopore, slowing its progress.

Now, a team led by Jens Gundlach, a physicist at the University of Washington, Seattle, reports today in Nature Biotechnology that it has incorporated Akeson's phi29 protein into its nanopore setup, which uses a different pore protein that's more adept at quickly identifying all four chemical bases. The phi29 protein slows the DNA down so that only 20 to 30 nucleotide bases move through the pore each second, making it possible to electrically identify each one as it passes. "It's really the holy grail of nanopore sequencing," Gundlach says.

The advance promises to juice up the competition with a nanopore sequencing company called Oxford Nanopore Technologies. In February, officials with that company told attendees at a sequencing technology meeting in Florida that they had already snagged this grail. The company said it could not only electrically read out the full sequence of nucleotides in DNA as they streamed through an individual pore, but that by early 2013 it would be selling machines with thousands of nanopores running in parallel, making it possible to sequence a full genome in as little as 15 minutes, for around $1000.

Most genome researchers agree that would be an impressive feat if true. But they are still waiting for proof. "Oxford Nanopore was the first announcement, but they've been roundly criticized for not showing much data," says chemist Geoffrey Barrall, president of Electronic BioSciences in San Diego, California, which is also developing nanopore sequencing. By contrast, Barrall says, the results by Gundlach and colleagues look convincing. "This is the first paper where somebody has actually sequenced DNA."


The Meselson - Stahl Experiment

The structure of DNA suggested to Watson and Crick the mechanism by which DNA (and hence genes) could be copied faithfully. They proposed that when the time came for DNA to be replicated, the two strands of the molecule

  • separated from each other but
  • remained intact as each served as the template for the synthesis of a complementary strand.

When the replication process is complete, two DNA molecules, identical to each other and to the original, have been produced.

This mode of replication is described as semiconservative: one half of each new DNA molecule is old and one half is new.

While Watson and Crick had suggested that this was the way the DNA would turn out to be replicated, proof of the model came from the experiments of M. S. Meselson and F. W. Stahl.

They grew E. coli in a medium using ammonium ions (NH4+) as the source of nitrogen for DNA (as well as protein) synthesis. 14N is the common isotope of nitrogen, but they could also use ammonium ions that were enriched for a rare heavy isotope of nitrogen, 15N.

After growing E. coli for several generations in a medium containing 15NH4+, they found that the DNA of the cells was heavier than normal because of the 15N atoms in it.

The difference could be detected by extracting DNA from the E. coli cells and spinning it in an ultracentrifuge in a cesium chloride solution, which forms a density gradient. The density of the DNA determines where in the gradient it accumulates.

Then they transferred some of the living cells that had been growing in 15NH4+ to a medium containing ordinary ammonium ions (14NH4+) and allowed them to divide just once.

The DNA in this new generation of cells was exactly intermediate in density between that of the previous (heavy) generation and normal (light) DNA.

This, in itself, is not surprising. It tells us no more than that half the nitrogen atoms in the new DNA are 14N and half are 15N. It tells us nothing about their arrangement in the molecules.

However, when the bacteria were allowed to divide again in normal ammonium ions (14NH4+), two distinct densities of DNA were formed: half of the DNA was of intermediate (hybrid) density and half had the density of normal, light DNA.

As this interpretative figure indicates, their results show that DNA molecules are not degraded and reformed from free nucleotides between cell divisions, but instead, each original strand remains intact as it builds a complementary strand from the nucleotides available to it.

This is called semiconservative replication because each daughter DNA molecule is one-half "old" and one-half "new".
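The density pattern predicted by semiconservative replication can be reproduced by simply bookkeeping strands. The sketch below starts from one fully heavy molecule and, in each round in 14N medium, separates every duplex and pairs each old strand with a new 14N strand; the function names are our own.

```python
from collections import Counter

def classify(duplex):
    """Name a double-stranded molecule by how many of its strands contain 15N."""
    return {2: "heavy", 1: "hybrid", 0: "light"}[duplex.count("15N")]

def semiconservative(generations):
    """Density classes after each round of replication in 14N medium.

    Starts from one fully heavy (15N/15N) molecule. In every round the two
    strands separate and each pairs with a newly made 14N strand.
    """
    duplexes = [("15N", "15N")]
    history = []
    for _ in range(generations):
        duplexes = [(old, "14N") for duplex in duplexes for old in duplex]
        history.append(Counter(classify(d) for d in duplexes))
    return history

for gen, counts in enumerate(semiconservative(3), start=1):
    print(gen, dict(counts))
```

The first round yields only hybrid molecules and the second yields equal numbers of hybrid and light molecules, matching the two bands Meselson and Stahl observed. The hybrid count never rises above two, because those molecules carry the two original heavy strands.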

Immortal strands. Note that the "old" strand (the red one in the top half of the figure) is immortal because, barring mutations or genetic recombination, it will continue to serve as an unchanging template down through the generations.

E. coli is a bacterium, but semiconservative replication of DNA also occurs in eukaryotes. And because each DNA molecule in a eukaryote is incorporated in one chromosome, the replication of entire chromosomes is semiconservative as well. This also means that the eukaryotic chromosome contains one "immortal strand" of DNA.


A DNA sequence is a chain of deoxyribonucleotides, whereas a protein sequence is a chain of amino acids; this is the key difference between them. A second difference lies in the bonds: phosphodiester bonds link the deoxyribonucleotides of a DNA sequence, while peptide bonds link the amino acids of a protein sequence.

The infographic below shows more details on the differences between DNA and protein sequences.


DNA Sequencing: Is Science Fiction Becoming Medical Fact?

In two papers in major scientific journals, researchers today suggested pushing DNA sequencing into more routine use in the clinic, and not just as a research tool.

Dutch researchers are proposing that DNA sequencing replace older forms of genetic tests for diagnosing the cause of severe intellectual disability, the second time in a day that researchers have pushed the emerging technology as a first-choice diagnostic test for severe illness. Those results were published this evening in the New England Journal of Medicine.

"This is the new test for intellectual disability. There is no doubt about it," says Han Brunner of the Radboud University Nijmegen Medical Center, one of the Dutch study's authors. "This is a paradigm shift to genome-first medicine for patients who have complex problems that will not be easy to diagnose by conventional strategies."

Earlier today, a study in Science Translational Medicine proposed that DNA sequencing could become a standard first-choice test for infants in neonatal intensive care units, because a combination of new software and hardware could allow doctors to get results in just 50 hours, answering questions about what is making a baby sick far faster when time is of the essence.

"The bottom line of our research is that it’s now feasible to decode an entire genome and provide interim results back to the physician in two days," Stephen Kingsmore, the director for the Center for Pediatric Genomic Medicine at Children’s Mercy Hospitals and Clinics and a lead author of the paper, told reporters on a conference call yesterday. "We think this is going to transform the world of neonatology, by allowing neonatologists to practice medicine that’s influenced by genomes."

Brunner and his colleagues used a technique called exome sequencing, which captures only the protein-coding portions of known genes (the exome) from the vast expanse of DNA in the human genome as a way to reduce sequencing cost. The Dutch researchers sequenced 100 patients with IQs of less than 50 and their unaffected parents. The technique found genetic mutations known to cause intellectual disability in 16 patients, and genes of unknown function that appear to be the cause in another 22 patients. This is at least as good as the current genetic technology available for identifying mutations in patients with severe intellectual disability.

In only two patients did the diagnosis change the way doctors were treating them. For one, a change in diet may prove helpful; for the other, the genetic diagnosis will direct which types of epilepsy drugs the patient will receive.

But there are several other benefits to diagnosis. Parents often desperately search for one, just because they want to know what went wrong. They also are often terrified of what will happen if they decide to have another child, and the test can tell them whether they are carrying a gene that caused the defect.

In most cases in this study, a bad copy of a gene from a parent was not to blame. One surprise is that most of the mutations that caused these cases are brand new: they didn't exist in the parents' genomes. In three cases, the mutation was inherited from the mother via the X chromosome; because boys have only one X chromosome while women have two, such a bad copy is deleterious in boys.

On average, every child is born with one brand new genetic mutation. Most of them don't matter. But it may be, Brunner says, that there are 1,000 genes in which such a mutation can cause mental disability. So some unlucky kids suffer just because of the background mutation rate.

Brunner says that the cost of doing the exome sequencing was about $2,400 per patient sequenced, not including the cost of analysis. But that part of the cost is likely to come down. The sequencing in this study was done with the SOLiD system made by Life Technologies. Using Life's forthcoming Proton system or the more widely used HiSeq from Illumina, the dominant player in DNA sequencing, should bring costs down further.

"Exome sequencing has already entered the clinical diagnostic realm," says Heather Memford, a pediatrician and geneticist at the University of Washington who wrote an editorial in NEJM about the result. "The question for us as clinicians is for which patients do we use it now versus later."

The researchers in the Science Translational Medicine paper, working directly with Illumina, first developed a diagnostic test that captured 600 genes that were likely to be the reason the babies wound up in the NICU. That panel is being sold as a kit by Illumina and will likely be used by some commercial laboratories that have government certifications to do diagnostic testing. But then they sequenced the whole genomes of two children who had already died and another five who were alive but undiagnosable. In all but one case, they were able to find a gene that either is or is likely to be causing the child's illness. The cost per test, including the cost of analyzing the data, was $13,500. That might sound like a lot, but the cost of a day in the NICU is $8,000.

The biggest question remains how quickly tests such as this will move from being essentially research tools to being real clinical tests that are conducted in commercial laboratories and are paid for by insurers. There are some accounts that this is starting to happen, but Memford says she still does all her DNA sequencing using research, not insurance, funding. Kingsmore says that he's not sure how the test will reach patients, but he thinks that Children's Mercy may eventually be able to offer it as a service to other hospitals.


DNA Sequencing

After it was found that DNA's information is encoded in its sequence of nucleotides (A, C, T, and G), researchers reasoned that this sequence could be determined by analyzing a large number of identical strands of DNA and identifying each nucleotide in the order in which it appears. Such a technique was developed in 1977 by Frederick Sanger. His method, known as Sanger sequencing, relied on two main principles.

First, DNA can be separated by size. This was briefly discussed in the section on Laboratory Techniques with regard to gel electrophoresis. Because DNA is negatively charged, it migrates toward the positive electrode when an electric current is applied. In a polyacrylamide gel, larger strands of DNA migrate more slowly than smaller ones. Polyacrylamide gels also resolve DNA by size much more finely than agarose gels: they can distinguish a strand of DNA 400 base pairs long from one 401 base pairs long. This ability to resolve differences of a single base pair is what makes them so useful for sequencing.

In addition, the chemistry of DNA synthesis allows geneticists to manipulate its normal function and use it to identify the sequence of a strand of DNA. As discussed in the section on DNA replication, cells have enzymes called DNA polymerases that add nucleotides complementary to a template strand, so that one DNA molecule gives rise to two identical ones. Polymerase adds one complementary nucleotide at a time, and it cannot start a new strand from scratch: it requires a DNA primer, a short strand of DNA complementary to the template that provides a free 3' OH group to build onto. The following diagram shows the structure of a normal nucleotide (dNTP):
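Template-directed synthesis from a primer can be sketched as string manipulation. For simplicity this sketch assumes the primer anneals at the very 3' end of the template; the helper names are hypothetical. Because the new strand is antiparallel to the template, writing both 5'→3' makes the new strand the reverse complement of the template.

```python
# Watson-Crick base pairing
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(template_5to3: str) -> str:
    """Strand synthesized on `template_5to3`, also written 5'->3'.

    The new strand is antiparallel, so it is the reverse complement.
    """
    return "".join(PAIR[base] for base in reversed(template_5to3))

def extend_primer(template_5to3: str, primer_5to3: str) -> str:
    """Extend a primer one nucleotide at a time onto its free 3' OH end."""
    target = complementary_strand(template_5to3)
    if not target.startswith(primer_5to3):
        raise ValueError("primer is not complementary to the template's 3' end")
    strand = primer_5to3
    while len(strand) < len(target):
        strand += target[len(strand)]  # add the next complementary nucleotide
    return strand

print(extend_primer("ATGCC", "GG"))
```

Each pass of the loop appends exactly one nucleotide at the 3' end, mirroring how polymerase can only grow a strand that already has a free 3' OH.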

DNA polymerase normally links the 3' OH of the growing strand to the 5' phosphate (labeled alpha) of the incoming nucleotide, releasing phosphates beta and gamma. DNA primers provide the first 3' OH for DNA polymerase to build onto. Without that 3' OH group, the strand can no longer be elongated, and this realization provided the basis for the Sanger sequencing technique. Sanger proposed using a modified nucleotide, a dideoxyribonucleotide triphosphate (ddNTP), which shares the structure of a normal dNTP except that the 3' OH group is replaced by an H:

DNA polymerase can incorporate this modified nucleotide in vitro (in a test tube outside the cell), but doing so immediately terminates the chain being synthesized, because without a 3' OH group no further nucleotides can be added to the growing strand. This chain-termination principle became the basis of Sanger's sequencing technique.

Four separate reaction tubes are required, each containing the template DNA, radioactively labeled DNA primers, DNA polymerase, and an ample amount of all four dNTPs (dATP, dTTP, dGTP, dCTP), which are incorporated into the growing strand as ordinary nucleotides. In addition, each reaction tube contains a different ddNTP, so that each tube identifies a different nucleotide along the strand. For example, the tube containing ddATP identifies all the A's being incorporated into the synthesized strand, and thus all the T's in the complementary template strand (recall that T nucleotides pair with A nucleotides). The dNTPs and the ddNTP are added to each tube in a ratio of around 300:1. Wherever the template base is complementary to the ddNTP in that tube, polymerase randomly incorporates either the normal dNTP or the ddNTP. For example, a reaction containing ddATP incorporates dGTP, dCTP, or dTTP when the template nucleotide is C, G, or A, respectively. If the template nucleotide is T, however, polymerase randomly incorporates either dATP or ddATP. If it incorporates dATP, which generally happens 97% of the time, the strand continues to grow. If it incorporates ddATP, synthesis of that strand stops immediately, and the strand keeps that length for good. The same process takes place in three other tubes containing ddGTP, ddCTP, and ddTTP.
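The four-tube logic can be sketched in two ways: a deterministic list of every fragment length each tube can produce, and a stochastic simulation of one tube. In the simulation the stop probability of 0.03 comes from the 97% continuation figure above; the molecule count, random seed, and function names are illustrative assumptions. Lengths are counted from the end of the primer, since the labeled primer only adds a constant offset.

```python
import random
from collections import Counter

def expected_fragments(new_strand: str) -> dict:
    """Every fragment length each ddNTP tube can produce.

    `new_strand` is the sequence polymerase adds after the primer (5'->3').
    A chain can stop at position i only if the base added there matches
    the tube's ddNTP.
    """
    return {
        f"dd{base}TP": [i + 1 for i, b in enumerate(new_strand) if b == base]
        for base in "ACGT"
    }

def simulate_tube(new_strand: str, dd_base: str, molecules: int = 10_000,
                  p_stop: float = 0.03, seed: int = 0) -> Counter:
    """One tube: at each matching position a chain stops with probability
    p_stop (ddNTP added); otherwise it continues (dNTP added)."""
    rng = random.Random(seed)
    lengths = Counter()
    for _ in range(molecules):
        for i, base in enumerate(new_strand):
            if base == dd_base and rng.random() < p_stop:
                lengths[i + 1] += 1  # terminated fragment of this length
                break
        # chains that never stop run off the end and are not counted here
    return lengths

print(expected_fragments("GATTC"))
print(simulate_tube("GATTC", "T"))
```

With thousands of molecules per tube, every possible stopping point is almost certainly represented, which is why each tube yields a complete ladder of fragments ending in its base.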

Once the reactions have run to completion, the products of the four tubes are loaded into four adjacent lanes of a flat slab of polyacrylamide gel, and an electric current is applied in a process called polyacrylamide gel electrophoresis (PAGE). This separates strands of DNA that differ by a single base pair, so the sequence of the template strand can be read from the gel after a process called autoradiography. Because the primers were radioactive, an image of the gel can be recorded on film, and a band appears wherever strands of DNA are present. The end product is a gel with a banding pattern that looks similar to this:
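Reading the gel amounts to sorting all bands by fragment size and noting which lane each came from. The sketch below uses hypothetical band positions. It assumes one band per fragment length and no gaps, and treats anything else as ambiguous.

```python
def read_gel(lanes: dict) -> str:
    """Read a four-lane Sanger gel from the smallest fragment to the largest.

    `lanes` maps each ddNTP tube (e.g. "ddATP") to the fragment lengths whose
    bands appear in that lane. The shortest fragment runs farthest, so reading
    upward from the bottom gives the synthesized strand 5'->3'.
    """
    bands = sorted((length, tube[2]) for tube, lengths in lanes.items() for length in lengths)
    if not bands:
        raise ValueError("no bands on the gel")
    sizes = [length for length, _ in bands]
    if sizes != list(range(sizes[0], sizes[0] + len(sizes))):
        raise ValueError("missing or duplicated band: sequence is ambiguous")
    return "".join(base for _, base in bands)

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical band positions (lengths in nucleotides beyond the primer)
gel = {"ddATP": [2], "ddCTP": [5], "ddGTP": [1], "ddTTP": [3, 4]}
synthesized = read_gel(gel)
template_3to5 = "".join(COMPLEMENT[b] for b in synthesized)
print(synthesized, template_3to5)
```

The gel gives the synthesized strand directly; complementing it base by base recovers the template strand, read 3' to 5'.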

This technique of sequencing, however, has two major shortcomings. First, the DNA being read cannot be much longer than about 1000 base pairs, or the reading becomes highly inaccurate; for the best accuracy, sequencing is typically done on strands no longer than 850 base pairs. This has severe implications for sequencing large genomes such as the human genome, which is about 3 billion base pairs long. Second, as the image above shows, the sequence must be read from the gel by a person. Manually reading and recording genomes billions of base pairs long would take an impractical amount of time for realistic research.
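A quick calculation shows the scale of the problem. The sketch below gives only a lower bound, assuming reads could be laid end to end with no overlap. Real genome assembly needs overlapping reads and several-fold coverage, which the hypothetical `coverage` parameter represents.

```python
import math

def minimum_reads(genome_bp: int, read_bp: int, coverage: float = 1.0) -> int:
    """Lower bound on the number of reads: genome length x coverage / read length.

    Assumes reads could be laid end to end with no overlap, which real
    assembly cannot do; actual projects need several-fold coverage.
    """
    return math.ceil(genome_bp * coverage / read_bp)

human_genome_bp = 3_000_000_000  # "about 3 billion" base pairs, as in the text
print(minimum_reads(human_genome_bp, read_bp=850))
```

Even this best case comes to roughly 3.5 million separate 850-base readings, each of which would have to be read off a gel by hand.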

The invention of automated sequencers in 1987 by Applied Biosystems was a breakthrough in geneticists' ability to sequence large genomes. While the limit of about 1000 base pairs per read remained, automation solved the problem of needing people to read and record the sequence. In automated sequencing, instead of the primers being labeled radioactively, each ddNTP carries a fluorescent label that fluoresces a different color when a laser is fired through it. Unlike autoradiography, which shows bands of the same color regardless of the ddNTP, this method produces a different color for each of the four nucleotides. This allows the whole sequencing reaction to take place in one tube, since the color of each terminated fragment identifies its final nucleotide. Once the reaction has run to completion, it is placed into a gel tray, and an electric current is applied from the tray into 96 microcapillaries, all of which pass a laser. The DNA migrates toward the positive electrode at the laser end, where each fragment fluoresces at a specific wavelength as it passes and is recorded by a computer detector. Each detected wavelength is automatically associated with the corresponding nucleotide, allowing computers to print out a chromatogram as well as the sequence, similar to this image:
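The computer's job at each chromatogram peak can be sketched as picking the strongest of the four dye channels. This is a simplified illustration, not the algorithm of any particular instrument. It assumes each dye channel has already been mapped to its base. The intensities and the `min_ratio` threshold are hypothetical, and ambiguous peaks get "N", the standard code for an unknown base.

```python
def call_base(peak: dict, min_ratio: float = 2.0) -> str:
    """Call one chromatogram peak from its four dye-channel intensities.

    Returns the base with the strongest signal, or "N" when the runner-up
    channel is too close (a mixed or unclear peak).
    """
    ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if second > 0 and top / second < min_ratio:
        return "N"
    return best

def call_sequence(peaks) -> str:
    return "".join(call_base(p) for p in peaks)

# Hypothetical intensities, one dict per peak in migration order
trace = [
    {"A": 5, "C": 180, "G": 12, "T": 9},
    {"A": 210, "C": 8, "G": 3, "T": 15},
    {"A": 100, "C": 4, "G": 90, "T": 6},
]
print(call_sequence(trace))
```

Flagging the third peak rather than forcing a call reflects a design choice: two similar-height peaks at one position can mean the sample contained two different sequences there, which a single letter cannot represent.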

Thus, with computers, the time it takes to sequence strands of DNA is significantly shortened. Without this computing technology, large-scale sequencing projects would be impossible even to initiate, much less complete. This paved the way for projects such as the Human Genome Project, which was launched in 1990 by the NIH and the US Department of Energy.