Why are there N's after Sanger sequencing?

After sending a DNA sample for sequencing, the resulting sequence had N's in the beginning and end of the sequence. I know the N's mean that the computer can't tell what the base pair is, but why is this?

Sanger sequencing typically runs into a few errors. Sometimes the traces overlap, as shown in red below, and the software calls an N. If you want to work out the correct base, you can look at the trace yourself.

As you correctly stated, N bases in sequence data generally mean that the software was unable to identify the base. N bases may appear at the beginning of the sequence for several reasons. One is poor purification of the amplified product before electrophoresis: salts or excess dye left in the sample can appear as "dye blobs." Another is that the software may have started its analysis too soon, before accurate sequence begins; typically, quality sequence data begins about 30 bases from the primer. N bases at the end of the sequence may simply mark the end of the readable sequence data. Other causes include hairpin loops and poly-base regions that cause early termination. The best way to determine the cause is to look at the trace data.
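As a rough illustration of dealing with those terminal N's once you have the base calls as text, here is a minimal Python sketch. The function names are hypothetical and it only inspects the called sequence, not the trace, so it is no substitute for looking at the chromatogram.

```python
def trim_terminal_n(seq: str) -> tuple[str, int, int]:
    """Strip uncalled bases (N) from both ends of a read.

    Returns the trimmed sequence, the number of N's removed from the
    start, and the number removed from the end.
    """
    upper = seq.upper()
    core = upper.strip("N")
    if not core:
        # The whole read is N's: report everything as removed from the start.
        return "", len(upper), 0
    lead = len(upper) - len(upper.lstrip("N"))
    trail = len(upper) - len(upper.rstrip("N"))
    return core, lead, trail


def internal_n_positions(seq: str) -> list[int]:
    """0-based positions of N calls that remain inside a read."""
    return [i for i, base in enumerate(seq.upper()) if base == "N"]


trimmed, lead, trail = trim_terminal_n("NNNACGTNACNN")
print(trimmed, lead, trail)           # ACGTNAC 3 2
print(internal_n_positions(trimmed))  # [4]
```

Any N that remains inside the trimmed read (position 4 in this example) is the kind of call worth checking by eye against the trace.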

Sanger Sequencing

Sanger sequencing is the process of selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. It is the most widely used method for the detection of SNVs. Because both alleles of an autosomal locus are sequenced concurrently and are displayed as an analogue electropherogram, Sanger sequencing is unable to detect mosaic alleles below a threshold of 15–20% (Rohlin et al., 2009) and can miss a significant proportion of low-level mosaic mutations (Jamuar et al., 2014). In addition, mosaic mutations at higher allele fractions are miscalled “germ line,” which highlights the limitations of Sanger sequencing in detecting mosaicism at both ends of the spectrum (Jamuar et al., 2014).


Frederick Sanger was born on 13 August 1918 in Rendcomb, a small village in Gloucestershire, England, the second son of Frederick Sanger, a general practitioner, and his wife, Cicely Sanger (née Crewdson). [5] He was one of three children. His brother, Theodore, was only a year older, while his sister May (Mary) was five years younger. [6] His father had worked as an Anglican medical missionary in China but returned to England because of ill health. He was 40 in 1916 when he married Cicely who was four years younger. Sanger's father converted to Quakerism soon after his two sons were born and brought up the children as Quakers. Sanger's mother was the daughter of an affluent cotton manufacturer and had a Quaker background, but was not a Quaker. [6]

When Sanger was around five years old the family moved to the small village of Tanworth-in-Arden in Warwickshire. The family was reasonably wealthy and employed a governess to teach the children. In 1927, at the age of nine, he was sent to the Downs School, a residential preparatory school run by Quakers near Malvern. His brother Theo was a year ahead of him at the same school. In 1932, at the age of 14, he was sent to the recently established Bryanston School in Dorset. This used the Dalton system and had a more liberal regime which Sanger much preferred. At the school he liked his teachers and particularly enjoyed scientific subjects. [6] Able to complete his School Certificate a year early, for which he was awarded seven credits, Sanger was able to spend most of his last year of school experimenting in the laboratory alongside his chemistry master, Geoffrey Ordish, who had originally studied at Cambridge University and been a researcher in the Cavendish Laboratory. Working with Ordish made a refreshing change from sitting and studying books and awakened Sanger's desire to pursue a scientific career. [7] In 1935, prior to heading off to college, Sanger was sent to Schule Schloss Salem in southern Germany on an exchange program. The school placed a heavy emphasis on athletics, which caused Sanger to be much further ahead in the course material compared to the other students. He was shocked to learn that each day was started with readings from Hitler's Mein Kampf, followed by a Sieg Heil salute. [8]

In 1936 Sanger went to St John's College, Cambridge to study natural sciences. His father had attended the same college. For Part I of his Tripos he took courses in physics, chemistry, biochemistry and mathematics but struggled with physics and mathematics. Many of the other students had studied more mathematics at school. In his second year he replaced physics with physiology. He took three years to obtain his Part I. For his Part II he studied biochemistry and obtained a 1st Class Honours. Biochemistry was a relatively new department founded by Gowland Hopkins with enthusiastic lecturers who included Malcolm Dixon, Joseph Needham and Ernest Baldwin. [6]

Both his parents died from cancer during his first two years at Cambridge. His father was 60 and his mother was 58. As an undergraduate Sanger's beliefs were strongly influenced by his Quaker upbringing. He was a pacifist and a member of the Peace Pledge Union. It was through his involvement with the Cambridge Scientists Anti-War Group that he met his future wife, Joan Howe, who was studying economics at Newnham College. They courted while he was studying for his Part II exams and married after he had graduated in December 1940. Sanger, although brought up and influenced by his religious upbringing, later began to lose sight of his Quaker related ways. He began to see the world through a more scientific lens, and with the growth of his research and scientific development he slowly drifted farther from the faith he grew up with. He has nothing but respect for the religious and states he took two things from it, truth and respect for all life. [9] Under the Military Training Act 1939 he was provisionally registered as a conscientious objector, and again under the National Service (Armed Forces) Act 1939, before being granted unconditional exemption from military service by a tribunal. In the meantime he undertook training in social relief work at the Quaker centre, Spicelands, Devon and served briefly as a hospital orderly. [6]

Sanger began studying for a PhD in October 1940 under N.W. "Bill" Pirie. His project was to investigate whether edible protein could be obtained from grass. After little more than a month Pirie left the department and Albert Neuberger became his adviser. [6] Sanger changed his research project to study the metabolism of lysine [10] and a more practical problem concerning the nitrogen of potatoes. [11] His thesis had the title, "The metabolism of the amino acid lysine in the animal body". He was examined by Charles Harington and Albert Charles Chibnall and awarded his doctorate in 1943. [6]

Sequencing insulin

Neuberger moved to the National Institute for Medical Research in London, but Sanger stayed in Cambridge and in 1943 joined the group of Charles Chibnall, a protein chemist who had recently taken up the chair in the Department of Biochemistry. [12] Chibnall had already done some work on the amino acid composition of bovine insulin [13] and suggested that Sanger look at the amino groups in the protein. Insulin could be purchased from the pharmacy chain Boots and was one of the very few proteins that were available in a pure form. Up to this time Sanger had been funding himself. In Chibnall's group he was initially supported by the Medical Research Council and then from 1944 until 1951 by a Beit Memorial Fellowship for Medical Research. [5]

Sanger's first triumph was to determine the complete amino acid sequence of the two polypeptide chains of bovine insulin, A and B, in 1952 and 1951, respectively. [14] [15] Prior to this it was widely assumed that proteins were somewhat amorphous. In determining these sequences, Sanger proved that proteins have a defined chemical composition. [6]

To get to this point, Sanger refined a partition chromatography method first developed by Richard Laurence Millington Synge and Archer John Porter Martin to determine the composition of amino acids in wool. Sanger used a chemical reagent 1-fluoro-2,4-dinitrobenzene (now, also known as Sanger's reagent, fluorodinitrobenzene, FDNB or DNFB), sourced from poisonous gas research by Bernhard Charles Saunders at the Chemistry Department at Cambridge University. Sanger's reagent proved effective at labelling the N-terminal amino group at one end of the polypeptide chain. [16] He then partially hydrolysed the insulin into short peptides, either with hydrochloric acid or using an enzyme such as trypsin. The mixture of peptides was fractionated in two dimensions on a sheet of filter paper, first by electrophoresis in one dimension and then, perpendicular to that, by chromatography in the other. The different peptide fragments of insulin, detected with ninhydrin, moved to different positions on the paper, creating a distinct pattern that Sanger called "fingerprints". The peptide from the N-terminus could be recognised by the yellow colour imparted by the FDNB label and the identity of the labelled amino acid at the end of the peptide determined by complete acid hydrolysis and discovering which dinitrophenyl-amino acid was there. [6]

By repeating this type of procedure Sanger was able to determine the sequences of the many peptides generated using different methods for the initial partial hydrolysis. These could then be assembled into the longer sequences to deduce the complete structure of insulin. Finally, because the A and B chains are physiologically inactive without the three linking disulfide bonds (two interchain, one intrachain on A), Sanger and coworkers determined their assignments in 1955. [17] [18] Sanger's principal conclusion was that the two polypeptide chains of the protein insulin had precise amino acid sequences and, by extension, that every protein had a unique sequence. It was this achievement that earned him his first Nobel prize in Chemistry in 1958. [19] This discovery was crucial for the later sequence hypothesis of Crick for developing ideas of how DNA codes for proteins. [20]

Sequencing RNA

From 1951 Sanger was a member of the external staff of the Medical Research Council [5] and when they opened the Laboratory of Molecular Biology in 1962, he moved from his laboratories in the Biochemistry Department of the university to the top floor of the new building. He became head of the Protein Chemistry division. [6]

Prior to his move, Sanger began exploring the possibility of sequencing RNA molecules and began developing methods for separating ribonucleotide fragments generated with specific nucleases. This work he did while trying to refine the sequencing techniques he had developed during his work on insulin. [20]

The key challenge in the work was finding a pure piece of RNA to sequence. In the course of the work he discovered in 1964, with Kjeld Marcker, the formylmethionine tRNA which initiates protein synthesis in bacteria. [21] He was beaten in the race to be the first to sequence a tRNA molecule by a group led by Robert Holley from Cornell University, who published the sequence of the 77 ribonucleotides of alanine tRNA from Saccharomyces cerevisiae in 1965. [22] By 1967 Sanger's group had determined the nucleotide sequence of the 5S ribosomal RNA from Escherichia coli, a small RNA of 120 nucleotides. [23]

Sequencing DNA

Sanger then turned to sequencing DNA, which would require an entirely different approach. He looked at different ways of using DNA polymerase I from E. coli to copy single stranded DNA. [24] In 1975, together with Alan Coulson, he published a sequencing procedure using DNA polymerase with radiolabelled nucleotides that he called the "Plus and Minus" technique. [25] [26] This involved two closely related methods that generated short oligonucleotides with defined 3' termini. These could be fractionated by electrophoresis on a polyacrylamide gel and visualised using autoradiography. The procedure could sequence up to 80 nucleotides in one go and was a big improvement on what had gone before, but was still very laborious. Nevertheless, his group were able to sequence most of the 5,386 nucleotides of the single-stranded bacteriophage φX174. [27] This was the first fully sequenced DNA-based genome. To their surprise they discovered that the coding regions of some of the genes overlapped with one another. [3]

In 1977 Sanger and colleagues introduced the "dideoxy" chain-termination method for sequencing DNA molecules, also known as the "Sanger method". [26] [28] This was a major breakthrough and allowed long stretches of DNA to be rapidly and accurately sequenced. It earned him his second Nobel prize in Chemistry in 1980, which he shared with Walter Gilbert and Paul Berg. [29] The new method was used by Sanger and colleagues to sequence human mitochondrial DNA (16,569 base pairs) [30] and bacteriophage λ (48,502 base pairs). [31] The dideoxy method was eventually used to sequence the entire human genome. [32]

Postgraduate students

During the course of his career Sanger supervised more than ten PhD students, two of whom went on to also win Nobel Prizes. His first graduate student was Rodney Porter who joined the research group in 1947. [3] Porter later shared the 1972 Nobel Prize in Physiology or Medicine with Gerald Edelman for his work on the chemical structure of antibodies. [33] Elizabeth Blackburn studied for a PhD in Sanger's laboratory between 1971 and 1974. [3] [34] She shared the 2009 Nobel Prize in Physiology or Medicine with Carol W. Greider and Jack W. Szostak for her work on telomeres and the action of telomerase. [35]

Awards and honours

As of 2015, Sanger is the only person to have been awarded the Nobel Prize in Chemistry twice, and one of only four two-time Nobel laureates. The other three were Marie Curie (Physics, 1903 and Chemistry, 1911), Linus Pauling (Chemistry, 1954 and Peace, 1962) and John Bardeen (twice Physics, 1956 and 1972). [4]

  • Elected Fellow of the Royal Society (FRS) in 1954[3] – 1963 [3] – 1981 [3] – 1986 [3]
  • Corresponding Fellow of the Australian Academy of Science – 1982 [3] – 1976 [3] – 1958, 1980 [19][29] – 1951 [3] – 1969 [3] – 1971 [3] – 1977 [3]
  • G.W. Wheland Award – 1978 [3] of Columbia University – 1979 [3] – 1979 [3] Award – 1994 [36]
  • Golden Plate Award of the American Academy of Achievement - 2000 [37][38]
  • Citation for Chemical Breakthrough Award from the Division of History of Chemistry of the American Chemical Society – 2016 [39][40][41]

The Wellcome Trust Sanger Institute (formerly the Sanger Centre) is named in his honour. [3]

Marriage and family

Sanger married Margaret Joan Howe (not to be confused with Margaret Sanger) in 1940. She died in 2012. They had three children — Robin, born in 1943, Peter born in 1946 and Sally Joan born in 1960. [5] He said that his wife had "contributed more to his work than anyone else by providing a peaceful and happy home." [42]

Later life

Sanger retired in 1983, aged 65, to his home, "Far Leys", in Swaffham Bulbeck outside Cambridge. [3]

In 1992, the Wellcome Trust and the Medical Research Council founded the Sanger Centre (now the Sanger Institute), named after him. [43] The institute is on the Wellcome Trust Genome Campus near Hinxton, only a few miles from Sanger's home. He agreed to having the Centre named after him when asked by John Sulston, the founding director, but warned, "It had better be good." [43] It was opened by Sanger in person on 4 October 1993, with a staff of fewer than 50 people, and went on to take a leading role in the sequencing of the human genome. [43] The Institute now has over 900 people and is one of the world's largest genomic research centres.

Sanger said he found no evidence for a God so he became an agnostic. [44] In an interview published in the Times newspaper in 2000 Sanger is quoted as saying: "My father was a committed Quaker and I was brought up as a Quaker, and for them truth is very important. I drifted away from those beliefs – one is obviously looking for truth, but one needs some evidence for it. Even if I wanted to believe in God I would find it very difficult. I would need to see proof." [45]

He declined the offer of a knighthood, as he did not wish to be addressed as "Sir". He is quoted as saying, "A knighthood makes you different, doesn't it, and I don't want to be different." In 1986 he accepted admission to the Order of Merit, which can have only 24 living members. [42] [44] [45]

In 2007 the British Biochemical Society was given a grant by the Wellcome Trust to catalogue and preserve the 35 laboratory notebooks in which Sanger recorded his research from 1944 to 1983. In reporting this matter, Science noted that Sanger, "the most self-effacing person you could hope to meet", was spending his time gardening at his Cambridgeshire home. [46]

Sanger died in his sleep at Addenbrooke's Hospital in Cambridge on 19 November 2013. [42] [47] As noted in his obituary, he had described himself as "just a chap who messed about in a lab", [48] and "academically not brilliant". [49]

Authors: Rajani Verma 1 and M.L. Jakhar 2
1 Department of Plant Breeding and Genetics, SKNAU, Jobner 303329 (Raj.) (India)
2 Professor & Head Department of Plant Breeding and Genetics, SKNAU, Jobner 303329 (Raj.) (India)

What is DNA Sequencing

DNA sequencing is the process of determining the exact order of nucleotide bases (As, Ts, Cs, and Gs) in a piece of DNA, such as the segment corresponding to a specific gene. It is also called gene sequencing.

Main points related to DNA Sequencing:

  1. DNA sequencing is useful in both basic and applied research in biological science, especially in molecular biology.
  2. Sequencing can be done for a specific gene as well as for an entire genome.
  3. Knowledge of DNA sequences is very useful in medical science. It can be used for the identification, diagnosis and treatment of genetic diseases.
  4. In plants, knowledge of gene sequences helps in controlling disease and in improving product quality and resistance to biotic and abiotic stresses.
  5. DNA sequencing is also referred to as gene sequencing or nucleotide sequencing.

Ingredients for Sanger Sequencing:

  • The template DNA to be sequenced
  • A DNA polymerase enzyme
  • A primer, which is a short piece of single-stranded DNA that binds to the template DNA and acts as a "starter" for the polymerase
  • The four DNA nucleotides (dATP, dTTP, dCTP, dGTP)
  • Dideoxy, or chain-terminating, versions of all four nucleotides (ddATP, ddTTP, ddCTP, ddGTP), each labeled with a different color of dye

Method of Sanger Sequencing:

The DNA sample to be sequenced is combined in a tube with primer, DNA polymerase, and DNA nucleotides (dATP, dTTP, dGTP, and dCTP). The four dye-labeled, chain-terminating dideoxy nucleotides are added as well, but in much smaller amounts than the ordinary nucleotides.

The mixture is first heated to denature the template DNA (separate the strands), then cooled so that the primer can bind to the single-stranded template. Once the primer has bound, the temperature is raised again, allowing DNA polymerase to synthesize new DNA starting from the primer. DNA polymerase will continue adding nucleotides to the chain until it happens to add a dideoxy nucleotide instead of a normal one. At that point, no further nucleotides can be added, so the strand will end with the dideoxy nucleotide.

This process is repeated in a number of cycles. By the time the cycling is complete, it’s virtually guaranteed that a dideoxy nucleotide will have been incorporated at every single position of the target DNA in at least one reaction. That is, the tube will contain fragments of different lengths, ending at each of the nucleotide positions in the original DNA (see figure below). The ends of the fragments will be labeled with dyes that indicate their final nucleotide.

After the reaction is done, the fragments are run through a long, thin tube containing a gel matrix in a process called capillary gel electrophoresis. Short fragments move quickly through the pores of the gel, while long fragments move more slowly. As each fragment crosses the “finish line” at the end of the tube, it’s illuminated by a laser, allowing the attached dye to be detected. The smallest fragment (ending just one nucleotide after the primer) crosses the finish line first, followed by the next-smallest fragment (ending two nucleotides after the primer), and so forth. Thus, from the colors of dyes registered one after another on the detector, the sequence of the original piece of DNA can be built up one nucleotide at a time. The data recorded by the detector consist of a series of peaks in fluorescence intensity. The DNA sequence is read from the peaks in the chromatogram.
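The read-out logic described above can be sketched in a few lines of Python. This is a toy model, not a simulation of real chemistry: it assumes exactly one terminated fragment per position, the function names are hypothetical, and the dye colours are an illustrative assignment.

```python
import random

# Illustrative dye colour per terminating base (an assumption for this sketch).
DYES = {"A": "green", "C": "blue", "G": "black", "T": "red"}


def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """One fragment per position: (length past the primer, terminal base)."""
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Shortest fragments reach the detector first, so sort by length
    and read off the terminal base of each fragment in turn."""
    return "".join(base for _, base in sorted(fragments))


def dye_order(fragments: list[tuple[int, str]]) -> list[str]:
    """Sequence of dye colours the detector would register."""
    return [DYES[base] for _, base in sorted(fragments)]


strand = "GATTACA"
fragments = terminated_fragments(strand)
random.shuffle(fragments)       # the fragments are mixed together in the tube
print(read_by_size(fragments))  # GATTACA
```

Sorting by fragment length plays the role of electrophoresis here, which is why the shuffled fragments still yield the original order.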

Limitations of Sanger Sequencing:

  1. The primer can also anneal to a second site, which produces two overlapping sequences that must be interpreted at the same time.
  2. Sometimes RNA contaminates the reaction and acts like a primer, leading to bands in all lanes at all possible positions due to non-specific priming.
  3. Secondary structure in the DNA being read by DNA polymerase can cause reading problems.
  4. It can sequence only a short length of DNA at a time, up to a maximum of about 1,000 base pairs.

About Author / Additional Info:
I am currently pursuing Ph.D in PLANT BREEDING AND GENETICS from University of SKNAU, JOBNER, JAIPUR.

1. You can use any of the following programs to view your .ab1 chromatogram file

2. You should see individual, sharp and evenly spaced peaks

3. Expect to get 500-700 bases of clean reliable DNA sequence

Anything less and you might suspect contamination in your sample, or consider asking your sequencing facility to apply a special protocol for a difficult template. Anything more and you’re venturing into uncertain terrain.

4. Never trust the first 20-30 bases of a DNA sequencing read

The peaks here are usually unresolved and small, so I suggest designing your primer at least 50bp upstream of the sequence of interest.

5. Use a silica spin column for purification of the samples you send for DNA sequencing

If your sequencing facility requires you to perform your own Big Dye PCR amplification reaction (as opposed to using the all inclusive service some companies offer), you can purify the product either via the Sodium Acetate/isopropanol precipitation method or using a silica spin column available from several vendors. The precipitation method has an unfortunate side effect of messing up the reaction around base 70-75 of the read (see image below), so I would strongly recommend using a silica spin column. They can be pricey, but well worth it.

6. Edit your DNA sequence

Finally, when you do see a miscalled peak, don’t be shy. Feel free to edit it. Most chromatogram viewing programs (even the free ones) allow you to edit the sequence.
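Tips 3 and 4 can be turned into a simple trimming rule. The sketch below is only an illustration: the function names are hypothetical, and the default cut-offs (skip the first 30 bases, stop at base 700, place the primer at least 50 bp upstream) are taken from the rules of thumb above rather than from any quality scores in the .ab1 file.

```python
def usable_window(read: str, skip_start: int = 30, max_len: int = 700) -> str:
    """Keep only the part of a read that the rules of thumb treat as reliable:
    drop the first `skip_start` bases and everything past base `max_len`."""
    return read[skip_start:max_len]


def primer_far_enough(primer_end: int, target_start: int, min_gap: int = 50) -> bool:
    """True if the region of interest starts at least `min_gap` bases
    downstream of the 3' end of the primer (template coordinates)."""
    return target_start - primer_end >= min_gap


read = "A" * 30 + "C" * 670 + "G" * 100  # toy 800-base read
print(len(usable_window(read)))          # 670
print(primer_far_enough(100, 160))       # True
print(primer_far_enough(100, 120))       # False
```

A real pipeline would trim on per-base quality values instead of fixed positions; fixed cut-offs are just the quickest way to apply the advice by hand.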

I hope these tips will help you get the most out of your DNA sequencing results and to troubleshoot any problems that come up. Good luck analyzing your sequences!

    Thank you BitesizeBio for originally publishing this and allowing us to share it with our readers!

Q&A: Confirming Next-Gen Sequencing Results with Sanger

Tracy Vence
Oct 11, 2016

For clinical purposes, next-generation sequencing (NGS) has all but replaced its methodological predecessor, Sanger sequencing. It is faster. It is cheaper. But is next-gen sequencing alone sensitive and specific enough to catch every difficult-to-detect, disease-associated variant while avoiding false-positives?

“There is significant debate within the diagnostics community regarding the necessity of confirming NGS variant calls by Sanger sequencing, considering that numerous laboratories report having 100% specificity from the NGS data alone,” Ambry Genetics Chief Executive Officer Aaron Elliott and colleagues wrote in a study published last week (October 6) in The Journal of Molecular Diagnostics.

Elliott and colleagues simulated a false-positive rate of zero when comparing the results of 20,000 hereditary cancer NGS panels, including 47 disease-[…] NGS alone, the researchers “missed [the] detection of 176 Sanger-confirmed variants, the majority in complex genomic regions (n = 114) and mosaic mutations (n = 7),” they reported in their paper.

In an interview with The Scientist, Elliott lamented a lack of quality-control guidelines regarding confirmatory sequencing methods among diagnostic labs.

The Scientist: What prompted this particular analysis?

Aaron Elliott: The debate within the diagnostic industry as far as the need to confirm variants. Every lab kind of has their own stance on if something like Sanger confirmation is needed. And the debate is getting very heated as companies offer cheaper and cheaper tests. As you keep dropping the price of testing, it’s very hard to keep these [confirmatory] methods around. . . . So different labs are coming out with different stances on the need to do this, and it’s really that [some] labs are trying to have, basically, rock-bottom pricing.

We wanted to go back and look at our own internal data where we start the test by Sanger-confirming every next-generation sequencing variant that is not benign—your variants of unknown significance, your likely pathogenic, and your pathogenic variants. We did that on 20,000 samples. . . . It’s about a 2 percent of real mutations is what you would miss if you did no Sanger confirmation at all.

Basically, the results [showed] that, number one, if you don’t Sanger-confirm calls at all, you’re . . . going to report out false calls, or, if you set your thresholds based on low sample numbers, you’re going to miss calls.

TS: Your team still uncovered some false-positive results, and noted that these “were not evenly distributed across all genes as would be expected if they were random PCR or sequencing artifacts.”

AE: We looked at 47 genes in those 20,000 samples. And false-positives aren’t in every gene: they were in 20 of the 47 genes that we looked at. And on top of that, they are in specific genomic regions that are difficult to sequence. But those are the genomic regions that also have real mutations in them, as well. So if you were to do a 1,000-sample validation—which is a pretty big validation—and you looked at the same 47 genes that we looked at, you would see about five false-positives.

TS: Why aren’t there guidelines for confirmatory methods?

AE: A lot of labs don’t want to do it. It’s a whole ’nother workflow in the lab, it costs a lot of money . . . and it increases the turnaround time on the test by about two days if you have to confirm something. . . . There’s more and more pressure to get these tests out faster and faster and cheaper and cheaper.

TS: After adjusting your analyses to simulate zero false positives, your team reported missing 2.2 percent of clinically relevant mutations. Was that surprising at all?

AE: I didn’t think it was surprising. Our philosophy is to start with the most-sensitive assay, the most sensitive bioinformatics pipeline you can possibly start with when you begin your testing, which does require more Sanger confirmation. But it does allow you to pick up more mutations that you would have missed or would have been filtered out. A good example of that is mosaic mutations. In the study there were seven mosaic mutations that we would have missed if we were not Sanger-confirming calls.

TS: You and your colleagues propose quality thresholds for next-gen sequencing–based diagnostic screens. For all screens that don’t meet these, are you recommending confirmation by Sanger sequencing or some other method?

AE: Yeah, it doesn’t necessarily have to be Sanger confirmation. There are other methods that you could use. For deletions/duplications, you could use MLPA [multiplex ligation-dependent probe amplification] or array. You can use qPCR for certain tests. . . . For anything that doesn’t meet those specific thresholds does need to be confirmed.

Those thresholds cannot be accurately determined unless you have tens of thousands of samples.

TS: Your group reported spending $1.9 million to include Sanger sequencing confirmation for 20,000 samples. Considering the added cost of confirmatory screening, do you think other diagnostic labs will heed your group’s recommendations?

AE: You know, I don’t know. It’s hard to say. . . . Definitely when you look at diagnostic labs, there’s definitely different tiered-quality testing. There are certain labs that we believe are high-tier that go above and beyond in quality, and then you have your cheaper labs. And it’s your cheaper labs whose business models are to drive down [the cost of] genetic testing and, in order to do that, you need to eliminate these particular assays, which keep the sensitivity very, very high.

This type of testing—especially the type of testing that this paper is based on, hereditary cancer testing—is not really the time to cut corners. People make very big decisions based on these results. . . . I don’t think this is the time to try to save a couple hundred dollars by not confirming your next-generation sequencing calls. The data show that it needs to be done.

TS: What do you hope comes of this study?

AE: I hope people look at studies like this and understand that all next-generation sequencing tests on the market are not the same—they’re not created equal. People need to really understand how companies do their testing. Companies need to be transparent on the quality control that they have in place for testing.
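The figures quoted in the interview support some back-of-the-envelope arithmetic. The Python sketch below uses only numbers stated above ($1.9 million, 20,000 samples, 176 missed variants, 2.2 percent). The derived values are rough inferences and are not results reported by the study; in particular, the variant total assumes the 176 missed variants correspond to the 2.2 percent figure.

```python
SAMPLES = 20_000             # panels in the study
SANGER_COST_USD = 1_900_000  # reported cost of Sanger confirmation
MISSED_VARIANTS = 176        # Sanger-confirmed variants missed by NGS alone
MISSED_FRACTION = 0.022      # 2.2 percent of clinically relevant mutations


def cost_per_sample(total_cost: float, n_samples: int) -> float:
    """Average confirmation cost per sample."""
    return total_cost / n_samples


def implied_total_variants(missed: int, missed_fraction: float) -> int:
    """Approximate number of variants implied if `missed` is `missed_fraction` of them."""
    return round(missed / missed_fraction)


print(cost_per_sample(SANGER_COST_USD, SAMPLES))                 # 95.0
print(implied_total_variants(MISSED_VARIANTS, MISSED_FRACTION))  # 8000
```

On these assumptions, confirmation averaged about $95 per sample, somewhat below the "couple hundred dollars" Elliott mentions.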

What is Sanger Sequencing

Sanger sequencing (SGS) is the first-generation sequencing method developed by Frederick Sanger in 1977. It involves the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. The resulting fragments are then separated by capillary electrophoresis. Generally, Sanger sequencing serves as a fast and cost-effective sequencing method for small-scale projects with fewer than 100 amplicon targets. It is also well suited to sequencing single genes.

Figure 1: Sanger Sequencing

Furthermore, Sanger sequencing is an analogue method that generates a single sequence by combining the signals from all DNA fragments in the sample; it does not allow individual signals to be isolated. The result is therefore a mixed signal, in which variants that occur below about 25% frequency in a sample cannot be identified.

Secondary Structure


Sharp drop or premature termination of sequencing signal during the sequencing run.

How to identify:

Sequencing signal drops off sharply or terminates before the end of the sequencing run (termination of sequencing signal also occurs with PCR products, depending on how long the PCR fragment is).


Cause: Secondary structure in the template can cause various anomalies that result in inefficient chain elongation after the region of secondary structure.

Solution: Many of the same techniques used to eliminate compression can be helpful in obtaining good sequence data past regions of secondary structure. Although not always practical, changing sequencing chemistries can produce better results as well.

At Amplicon Express, we use strict protocols and internal quality controls to ensure our reactions are as consistent as possible. Most often when there is trouble with the sequencing reaction, the cause lies with the DNA template itself, making it very important that the materials we sequence are of the highest quality possible. If you have further questions or need technical support, please contact us.

Module 1 : Synthetic Biology and Central Dogma

We discussed the chain termination method, Sanger’s technique. Now Sanger’s
technique is really beautiful, because it is based on the principle that you can terminate a
growing DNA chain by using a nucleotide which lacks the 3’-OH group.
Now the problem with this type of technique is that it is not very quick or rapid. First
of all, you need to have four lanes in the gel, and then in one go you can determine
only up to about 100 bases. Now, suppose you have millions of base pairs in the genome
of an organism.
So, it will take a huge amount of time. It will take years before you can really determine the
whole sequence of the genome. By the way, the genome is basically the complete DNA that
is present in an organism. Thus determination of the sequence of the entire genome is
very difficult using Sanger’s technique because of several limitations. The first
limitation is that it can go up to only 100 (maximum) bases, and the
second one, which is more important, is that after everything is done, you have to run four
lanes for the electrophoresis. These are the two limitations there.
If you run four lanes, your gel size will be more and your applied voltage has to be more. So,
it may not be very economical. Thus, apart from the time constraint, economy is also
important. So, people started looking at other ways of using the same principle of the
dideoxynucleoside triphosphate, but trying to do something which can work in one lane,
so that you do not have to do the four-lane gels. So, four-lane gels need to be replaced by
a single lane.
(Refer Slide Time: 03:33)

Now initially what was done? Initially the change that was brought in Sanger’s technique
was in the primer. It was realized that radioactivity is one issue that people try to avoid.
Both the Maxam-Gilbert and Sanger’s techniques used radioactive phosphorus. For Sanger’s
method the primer had the radioactivity.
So people thought, let us take a primer which has got a fluorescent label at its end,
and these fluorescent labels have different colors. Suppose one fluorescent label is red,
another is blue, another is green and the fourth one is dark violet. So, these are the four
fluorescent labels that are put at the end of the primer. Now what do you do? You take that
strand for which you want to know the sequence. Earlier there were four test tubes; now
also there are four test tubes. In one test tube, you add the primer with the dark violet
label, and it attaches to the template. And suppose in this first test tube you are adding
ddATP, along with all the dNTPs, the DNA polymerase and magnesium. So, what will happen now?
Since you are adding ddATP, the chain will terminate wherever there is a requirement
of A.
Suppose one chain terminates here and another chain terminates there. The
interesting point is that whatever truncated oligonucleotides are made
when there is a requirement of dATP and ddATP is taken instead, they will
all have the same primer with this dark violet fluorescent label.
So, in the first test tube, you added this primer and you added ddATP. So, all the truncated
oligonucleotides will show this dark violet fluorescence. Now in another test tube what
do you do? You do a very similar experiment. You take the primer which has a green
fluorescent tag. In the second test tube, you are adding ddCTP.
So, all the truncated oligonucleotides, made wherever there is a requirement of C,
will have only the green fluorescent label. So, all those truncated pieces will have
green fluorescence. Then in the third case, you take the blue fluorescent labeled primer
in another eppendorf and now you add ddGTP.
So, you know that for truncated oligonucleotides showing blue
fluorescence, there must be a G required at that point, because you are adding
dideoxy GTP there. And the fourth one is the red fluorescence: you add the red
fluorescent primer and then add the ddTTP.
So what is the outline? After doing all these, wherever there is
truncation at A, that piece will show the dark violet fluorescence; wherever there is
truncation at C, that will show the green fluorescence; wherever there is truncation at G,
that will show blue fluorescence; and finally, wherever there is truncation at T, that will
show the red fluorescence.
So, after doing all these reactions in four test tubes, you mix all four contents
together. Now, the mixture will have pieces which show dark violet fluorescence, green
fluorescence, blue fluorescence and red fluorescence. Suppose you now put everything
together on a gel and run the electrophoresis. This is an electrophoresis gel in a column;
you can load the gel in a glass column and then you apply voltage.

So, all these pieces will now come down, because this will be the positive side. So,
everything will come down, and there is a laser camera which is aimed at this point, and
whatever color comes out is reported to the detector. This is the detector: when the
laser hits a green fluorescent oligonucleotide, it records that colored fluorescent band.
So, now as we apply voltage, they will be slowly separated. How will they be separated?
That will depend on the length of the truncated oligonucleotide: the shorter pieces move
faster and reach the detector first. Suppose you initially get
a green fluorescent band, followed by a blue, then suppose red and then suppose the dark violet.
So, you know the sequence of colors. Now you continue the gel electrophoresis. As
the first band comes here, the detector records the color and the band goes away; the second band
comes here, it records the color; and then the third one comes and it records the color. So,
you do not have to move the detector; only the bands are moving, and finally they come
out. The only thing that you should do is to detect the sequence of these colors and,
depending on the sequence of these colors, you conclude the base sequence.
So, this is the actual picture that you will get; that is the chromatogram that you will see.
And then from the color of these peaks, you can tell the sequence. Whenever you see a
green peak, you know that must be coming from the test tube where you have used
the green primer. So, where have you used the green primer? It has been used where you
have added the ddCTP. So, there must be a C here then.
Then for the blue primer, you have used ddGTP, so that oligonucleotide must be having a G,
and then you have a red. The red one implies ddTTP. So, you have a T, and then for the
dark violet, you have used the ddATP. So, like that, according to the
color sequence, your base sequence will be determined. I hope this is clear.
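The four-tube, dye-primer scheme just described can be sketched in a few lines of Python. This is only an idealized illustration: it assumes exactly one terminated fragment per position, uses the lecture's arbitrary colors, and the names `PRIMER_COLOR`, `run_tube` and `read_single_lane` are hypothetical helpers, not part of any real tool.

```python
import random

# Primer color used in each tube, following the lecture's (arbitrary) dye choice.
PRIMER_COLOR = {"A": "violet", "C": "green", "G": "blue", "T": "red"}
BASE_OF_COLOR = {color: base for base, color in PRIMER_COLOR.items()}


def run_tube(new_strand, dd_base):
    """Fragments made in the tube that holds one ddNTP.

    Each fragment is (length, primer color); chains stop wherever the
    newly synthesized strand requires `dd_base`.
    """
    return [
        (length, PRIMER_COLOR[dd_base])
        for length, base in enumerate(new_strand, start=1)
        if base == dd_base
    ]


def read_single_lane(new_strand):
    """Pool the four tubes, separate by length, and read the colors."""
    pooled = [frag for dd_base in "ACGT" for frag in run_tube(new_strand, dd_base)]
    random.shuffle(pooled)                  # mixing the tubes loses tube identity
    pooled.sort(key=lambda frag: frag[0])   # shortest fragment reaches the detector first
    return "".join(BASE_OF_COLOR[color] for _, color in pooled)


# Green, blue, red, dark violet in order of arrival reads as C, G, T, A.
print(read_single_lane("CGTA"))
```

The point of the sketch is that the color alone carries the tube identity, so the four reactions can share one lane.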
To summarize this technique, you have to use four primers with different fluorescent
colors, you have to do the reactions in different test tubes, and you have to add
the ddNTPs like previously. In the previous case, Sanger used the same primer in all the
tubes, because all were radioactive. Here the primers have different colors: the base
sequence of the primers is the same, only the 5’ end is attached to fluorescent
labels which give different colors. And then the steps are the same: in
addition to whatever else is required all the time, you add one ddNTP per tube, that is
ddATP, ddCTP, ddGTP and ddTTP.

What is the difference between Sanger’s method and this technique? In this method, you
mix all these and you run only a single-lane gel electrophoresis. And then you detect
the color sequence, and depending on the fluorescent color sequence, you predict the
base sequence. In this method, light is required: the laser detects fluorescence as it hits
the band, and the detector records what color is coming as the truncated
oligonucleotides migrate from here to there.
So, this is the next development after Sanger’s method. But then people again questioned
it: can we simplify it even further? Because one drawback of this method is that you
are doing the reactions in four test tubes or four eppendorfs. So, can we do the reaction in
one container and then do the sequencing?
So, the next challenge is to do the reaction in only a single eppendorf and then do a similar
kind of assay. But if you use primers of different colors, then you have to use different
test tubes or different eppendorfs. So, then somebody thought that the best way to do
these reactions in one test tube or one eppendorf is to use ddNTPs
(dideoxynucleoside triphosphates) having fluorescent labels. An NTP has a sugar and a base
with a lot of reactive nitrogens. So, what can you do? You can put the fluorescent label on the
base of the dideoxynucleoside triphosphate, and these fluorescent labels are different for
different bases.
(Refer Slide Time: 15:35)

That means now, instead of the primers being colored differently, which means
fluorescently labeled differently, the primer is the same, but your dideoxy bases have
different colors. Suppose ddATP is colored with a red fluorescence. Then ddGTP has
got a blue fluorescence. I am just arbitrarily giving some colors. Then ddCTP
has the green and ddTTP has the dark violet fluorescence. So, instead of putting the color
on the primer, now you put the fluorescent labels on the dideoxynucleoside
triphosphates; these labels are on the bases.
Fortunately the DNA polymerase does not discriminate. It accepts the fluorescently tagged
ddNTP as a substrate, although the size is bigger, because some of these fluorescent
labels are quite big. The DNA polymerase is not that selective; it accepts the ddNTP even if the
base has a fluorophoric handle. Now you can do all the
reactions in the same test tube.
So, you have the DNA to be sequenced and you do the reaction; that means you are
adding all the dideoxynucleoside triphosphates. So, the tube contains your DNA polymerase,
magnesium, the regular primer without the label, all the dNTPs and finally all
the ddNTPs.
When you do the reaction, what will be the outcome? Now you will again have
truncation: the oligonucleotide, upon picking up a ddATP, will be truncated. If the
truncated piece is red, that means there was the requirement of A at that point. When
the truncated piece is showing a blue fluorescence, that means there was a requirement
of G at that point.
And then the green fluorescence means there was a requirement of C, and the violet
fluorescence indicates that a T is required. So, now, what will happen? You can do all the
reactions in the same test tube and again run the same single-lane gel. Then you just
check the color of the bands that are coming through the column containing your gel.
So, by just checking the color sequence, you can immediately write the base sequence.
Now all these things are computerized. So, the colors are
constantly fed to the computer, and the computer already knows that this color means this
base, so it will immediately write out the base sequence.
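As a rough illustration of how a computer might turn detected colors into bases, here is a minimal sketch. The dye-to-base table follows the lecture's arbitrary colors; the function names, the intensity numbers and the `min_ratio` cutoff are hypothetical assumptions, not the behavior of any real base-calling software. The sketch reports N when no single dye clearly dominates a peak, which is the usual meaning of N in a read.

```python
# Dye on each ddNTP, as assigned (arbitrarily) in the lecture.
DYE_TO_BASE = {"red": "A", "blue": "G", "green": "C", "violet": "T"}


def call_base(peak, min_ratio=2.0):
    """Call one base from the four dye intensities measured at a peak.

    `peak` maps dye color to intensity. `min_ratio` is an assumed cutoff:
    the strongest dye must be at least this many times the runner-up,
    otherwise the position is reported as N.
    """
    ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
    (top_dye, top), (_, runner_up) = ranked[0], ranked[1]
    if top <= 0 or top < min_ratio * runner_up:
        return "N"
    return DYE_TO_BASE[top_dye]


def call_sequence(peaks, min_ratio=2.0):
    """Call every peak in arrival order and join the calls into a read."""
    return "".join(call_base(peak, min_ratio) for peak in peaks)


# Hypothetical intensities: the third peak has overlapping red and green signal.
peaks = [
    {"red": 5, "blue": 3, "green": 90, "violet": 4},
    {"red": 2, "blue": 80, "green": 6, "violet": 1},
    {"red": 40, "blue": 3, "green": 35, "violet": 2},
    {"red": 1, "blue": 2, "green": 3, "violet": 70},
]
print(call_sequence(peaks))  # CGNT
```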

So Sanger used the radioactive primer; then the improvement was using primers
containing the fluorescent labels, but that required the reactions to be done in different test
tubes or eppendorfs; and then the subsequent development was doing the reaction in one
container (either the test tube or eppendorf), but using dideoxynucleoside triphosphates
which are differently labeled with fluorescent markers.
Then what can you do? You can quicken the process. A one-lane gel will be much more
rapid, and a greater number of bases can be sequenced. If you have radioactivity, then you have
to take a photograph: a photographic plate has to be put on top of the gel, and the
next day you have to develop that and take the print.
But when you have these fluorescent labels, you do not care whether something has
passed through. Suppose this color has passed through; that goes away into the solution.
You do not need to know what goes away into the solution; what you need to know is
what comes after what (basically the sequence of colors). If this blue violet band comes
first, and the second one following it was yellow, then that will come here,
the detector records the color, and then all these go away.
So, you can actually sequence a much greater number of bases in one attempt. About 600-700
bases can be sequenced by this technique. Now why did this become important? As
I told you, the genome mapping of different organisms, of different living species, was
taken up. Scientists wanted to know why a cat is different from a dog or why a dog is
different from a man. There must be differences in their genome sequences, because
everything is ultimately dependent on the base sequence of the DNA that is present in
the cells.
So, that is why this became a very important issue, and a big program was taken up which
was called the ‘Human Genome Project’. So, they attempted to map the entire human
genome. Now how many base pairs are there in the human genome? Around 3.2 billion
base pairs.
So, if the original Sanger’s method had been followed to sequence the entire human
genome, it would have taken 10-12 years to complete, or maybe more. However,
after all these changes, this is one project which was finished much earlier than the
predicted date. This could happen because of these new developments that took place in
the late 20th century.
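To get a feel for the numbers quoted above, the following back-of-the-envelope sketch counts the minimum number of reads needed to cover 3.2 billion base pairs once, at roughly 100 bases per read (four-lane radioactive gels) versus roughly 700 bases per read (fluorescent single-lane runs). It deliberately ignores overlap between reads and repeated coverage, so real projects need many more reads; `min_reads` is a hypothetical helper name.

```python
GENOME_BP = 3_200_000_000  # approximate size of the human genome, from the lecture


def min_reads(genome_bp, read_length_bp):
    """Smallest number of non-overlapping reads that covers the genome once."""
    return -(-genome_bp // read_length_bp)  # ceiling division on integers


for read_length in (100, 700):
    print(f"{read_length} bases per read: at least {min_reads(GENOME_BP, read_length):,} reads")
```

With these assumptions the count falls from 32,000,000 reads to 4,571,429 reads, a sevenfold reduction before any gain from faster single-lane runs is counted.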

Why did this become very important? We know that the same drug does not work equally for
all of us. A drug may cause acidity in one person, whereas the same drug works very
well for another person. What does it signify? That means every person has some
differences somewhere in their genes with respect to another person. If you know those
differences, then you can develop what is called personalized medicine.
So, today we are in the era of personalized medicine. That means, if your gene sequence
is known, then the doctor can say that this drug is going to work for you or this drug
may not work for you. So, now, this is person specific. Although we are not right there yet,
in the western world there are cases where personalized medicine is in
use. Especially when they treat cancer, they actually adopt this strategy: they determine
the gene sequence of that person very quickly, and then compare that with a healthy
individual, and then immediately they can find where the problem is.
So, this type of medicine has to be given to that person. This is called personalized
medicine, and that became possible only after the Human Genome Project.
(Refer Slide Time: 24:37)

All these methods that I have told you about so far, the fluorescence-based methods, also
took 3-4 years to complete the process of DNA sequencing. I give you some
statistics here. For human genome sequencing, the goal was to sequence an individual
genome at an affordable price.

If you want to take benefit of this personalized medicine, you have to analyze the whole
gene sequence of your body, but that should be at an affordable price; if it is really very
expensive, then it is very difficult. US dollar 1000 is the figure that is often quoted. This
would permit comparison of many thousands of human genome sequences and hence the
correlation of specific sequences with susceptibility to particular diseases. I said that
different diseases are associated with different gene sequences.
But now they want the human genome sequence to be known very quickly,
because when somebody is suffering from some terminal illness, you need to know the
sequence very rapidly, so that the proper medicine can be given, and also it should be at
an affordable price. Number 1, it cannot cost thousands and thousands of dollars,
and number 2, it cannot take months and months or years to know the sequence,
because somebody who is terminally ill might have only 2 to 3 months.
So, by that time you have to know the gene sequence and start giving the proper
medicine. So, now, it is the era of Next Gen Sequencing (next generation sequencing).
Now, we have more rapid methods of sequencing; even within 4-5 days, the mapping of the
base sequence can be completed. In Next Gen Sequencing, there are different methods. I
will not describe all the methods. I will just take one of the methods of Next Gen Sequencing.
There is a method which is called the Illumina method, which was developed in Cambridge.
The earlier methods, whatever we have discussed, are not real-time determinations of the
base sequence. What is real-time determination of the base sequence? When the
dideoxynucleoside triphosphate is added, or any other correct base is added, at that time
you cannot determine the result. You have to wait till all the reactions are over and then
compile them and do the electrophoresis. That is not real time, because you first do
the reactions, then you do the electrophoresis and then come to the result.
Under real-time analysis, as soon as a base is taken, you get to know which base
has been taken. What I am saying is that under real-time analysis, whenever a base is
added, you immediately know which base is being taken, and whenever the next base
is added you immediately get to know what is added. This is called real-time analysis.

This appeared to be very difficult at the beginning. The base has this fluorescent
marker and you have to amplify the strands. So, suppose you have many strands and
the base has a fluorescent marker. Suppose there is a requirement of A. So, you do
not add the dideoxy NTP here; if there is a requirement of A, you attach a fluorescent label
to A, and you should have a detector which is extremely powerful, with light
constantly shining at this point, so that whenever A is added it is seen. So, A is added and you know
what type of fluorescence you are getting, and from the color of the fluorescence
you can determine the sequence.
But today I am just concentrating on another method which is called pyrosequencing.
Pyrosequencing is a method based on this simple chemistry. I told you that the DNA
polymerase joins a new nucleoside triphosphate to the growing oligonucleotide. So, for
DNA synthesis, what you need is the oligonucleotide, which should have a 3’-OH, and
the incoming nucleotide, which should have a triphosphate.
So, this is the chemistry that is happening. This 3’-OH attacks this phosphate
and pyrophosphate (PPi) goes out. A diphosphate or pyrophosphate (PPi) is an
inorganic phosphate; nothing organic, no carbon, is there. So, an inorganic diphosphate comes
out. If the DNA had n nucleotides earlier, after this reaction you have n plus
1 nucleotides, and one pyrophosphate molecule is generated.
(Refer Slide Time: 31:33)

So, the first reaction is that you have a DNA residue and you add the dNTP. So, a
pyrophosphate, P₂O₇⁴⁻ (PPi), is formed. It has been found that there is a
compound called adenosine-5’-phosphosulfate; that means you have adenosine and you
have a phosphosulfate at the 5’-position.
There is an enzyme called ATP sulfurylase. What does the sulfurylase do? It breaks this P-O
bond and puts the pyrophosphate here. If you put the pyrophosphate there, this
becomes a triphosphate and the sulfate comes out.
So, the reaction is: pyrophosphate plus adenosine phosphosulfate, in presence of the
enzyme ATP sulfurylase, gives ATP plus sulfate. So, a molecule of high energy is
generated, which is known as ATP. Now this ATP reacts with a compound called
luciferin. Luciferin is a compound which is present in fireflies and which gives light;
light comes out of the insect which is called the firefly.
So, this luciferin, in presence of molecular oxygen and in presence of the enzyme which
is called luciferase, reacts with ATP and forms a compound which is called
oxyluciferin (this is its structure), and it generates light. So, in pyrosequencing, you
take your DNA strand and you add the primer; that is a requirement. Suppose you have added
dATP, and then you have added ATP sulfurylase and luciferin, and you
also add luciferase, that is, the enzyme which generates light when luciferin reacts with
the ATP in presence of oxygen. So, light comes out.
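The three reactions described so far can be collected into one scheme. This is a summary sketch in LaTeX (it assumes the amsmath package), where APS stands for adenosine-5’-phosphosulfate and hν for the emitted light. The AMP, PPi and CO2 by-products of the luciferase step are standard firefly-luciferase chemistry rather than something stated in the lecture.

```latex
\begin{align*}
(\mathrm{DNA})_n + \mathrm{dNTP}
  &\xrightarrow{\text{DNA polymerase}} (\mathrm{DNA})_{n+1} + \mathrm{PP_i} \\
\mathrm{PP_i} + \mathrm{APS}
  &\xrightarrow{\text{ATP sulfurylase}} \mathrm{ATP} + \mathrm{SO_4^{2-}} \\
\mathrm{ATP} + \text{luciferin} + \mathrm{O_2}
  &\xrightarrow{\text{luciferase}} \mathrm{AMP} + \mathrm{PP_i} + \text{oxyluciferin} + \mathrm{CO_2} + h\nu
\end{align*}
```

Each incorporated nucleotide releases one PPi, so the amount of light is proportional to the number of nucleotides added in that step.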
There is another enzyme that you have to add, which is known as apyrase. Now I have
told you what the function of ATP sulfurylase is: the function of ATP sulfurylase is
to generate ATP from pyrophosphate. Actually you have to add the
adenosine-5’-phosphosulfate in the same test tube.
So, you have added all this. So, first the P₂O₇⁴⁻ (diphosphate) is generated; then it reacts
with the adenosine-5’-phosphosulfate, in presence of ATP sulfurylase, to generate ATP.
As soon as ATP is generated, luciferin reacts with the ATP. In presence of oxygen and
luciferase, it generates light. And then there is the enzyme apyrase. Now you have added
dATP to start with. Suppose the DNA that you want to sequence does not require dATP
to start with; maybe it requires a dGTP to start with, but you do not know which one is
the first one.

So, you first added dATP. Suppose it does not react; if it does not react, that means there
will be no generation of pyrophosphate. Now before you add the dGTP, you have to
break down this dATP. So, apyrase is an enzyme which has the ability to break
down these dNTPs. That means, if unused, the dATP or dGTP or dCTP or dTTP
will be broken down into nucleoside monophosphates (harmless products) by apyrase.
So, basically what happens now? As you have added dATP, if it reacts then you will get
some light. If it does not react, then what will happen? It will be broken down by apyrase
into something harmless. So, after some time, you add the dGTP; if that reacts, then you
will get light. If that does not react, it will be broken down by the apyrase. And then
you add dCTP; if it does not react, that will be broken down.
So, you will get no light in the case of C, and in the case of the next one, say dTTP, either you will
get light or you will not. So, basically the whole instrument measures the
light. It actually measures how much light is generated when you are adding deoxy
ATP, deoxy GTP, deoxy CTP or deoxy TTP.
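The add-one-dNTP-at-a-time cycle described above can be simulated with a short sketch. It is an idealization under stated assumptions: one unit of light per incorporated nucleotide, complete removal of unused dNTP by apyrase between additions, and a fixed cyclic dispensation order (`"GTCA"` here is an arbitrary choice). The function name `simulate_pyrogram` is hypothetical.

```python
from itertools import cycle


def simulate_pyrogram(new_strand, order="GTCA"):
    """Light units seen for each dNTP addition while `new_strand` is synthesized.

    Returns a list of (base added, number of incorporations). A zero means
    the dNTP was not required and would simply be degraded by apyrase.
    """
    if not order or set(new_strand) - set(order):
        raise ValueError("every base in new_strand must appear in order")
    position = 0
    signal = []
    for base in cycle(order):
        if position >= len(new_strand):
            break  # strand complete, stop dispensing
        incorporated = 0
        # A run of identical bases is incorporated in one go, giving more light.
        while position < len(new_strand) and new_strand[position] == base:
            incorporated += 1
            position += 1
        signal.append((base, incorporated))
    return signal


for base, light in simulate_pyrogram("GTTAAGCCC"):
    print(base, light)
```

For the strand GTTAAGCCC this yields G 1, T 2, C 0, A 2, G 1, T 0, C 3: no light where the added base is not required, and proportionally more light for runs of the same base.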
(Refer Slide Time: 39:13)

So, if there is no light, that means that base is not required at that moment. If there is
light, that means there is a requirement of that base at that moment. If there are two
consecutive Gs which are required, as you have added your dGTP, both will be
incorporated one after another, and you will get light with twice the
intensity of that obtained for the requirement of only one G. Ultimately, you have this
kind of a graph. In this direction (along the X-axis), you have the bases that you are adding
one after another, and on the Y-axis, you have the intensity of the light.
So, how do you know what the sequence is? You have added G here and that is the light
intensity that you got. Remember the light comes from the generation of ATP, which
reacts with luciferin to give oxyluciferin, and that happens in presence of the luciferase
enzyme to give light. The intensity of the light will depend on how many Gs or how
many Cs are required. So, now, after G you add dTTP and you see that the light
intensity is double with respect to G; this means two Ts are there. Then you have added
dCTP with practically no light intensity. That means we do not have a C after T. Then
addition of dATP yielded double-intensity light; that means two As are there.
Later, the light intensity upon addition of dCTP is three times your base value.
The base value is basically the intensity obtained when a single base is incorporated. So,
now, it is easy to read the sequence of the DNA. This is G; then the T has twice the
intensity of the light that was emitted when you added the dGTP.
So, that means there are two Ts, because you have double the
intensity; then there is no C, because you have not got any light; then there are two As;
then there is one G, because the light is at the base value;
then there is no T; then there are 3 Cs. So you may have the following sequence: G T T A
A G C C C A T A A A C C G C C A.
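Reading the graph back into a sequence is the reverse of the measurement: each added base is repeated as many times as its peak height divided by the single-base intensity. The sketch below does this for the first seven additions walked through above. The intensity values are made-up illustrations, and `read_pyrogram` and its `unit` parameter are hypothetical names.

```python
def read_pyrogram(signal, unit=1.0):
    """Convert (base added, light intensity) pairs into a base sequence.

    `unit` is the intensity of a single incorporation (the base value);
    each intensity is rounded to the nearest whole number of units.
    """
    return "".join(base * round(intensity / unit) for base, intensity in signal)


# Hypothetical peak heights for the additions G, T, C, A, G, T, C.
signal = [
    ("G", 1.0), ("T", 2.1), ("C", 0.05), ("A", 1.9),
    ("G", 1.0), ("T", 0.0), ("C", 3.0),
]
print(read_pyrogram(signal))  # GTTAAGCCC
```

The result, GTTAAGCCC, is the start of the sequence quoted in the lecture. Rounding is the weak point of this approach in practice, since long runs of one base give peaks whose heights are harder to tell apart.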
So, that is the way to read this diagram. This is what is called pyrosequencing and, as I say,
there are other methods in Next Gen Sequencing. All are direct methods; that means at
the time when a base is added, you are analyzing the product, and your computer will
ultimately tell you what is coming out and finally what the sequence of the bases is.
Thank you.
