How does heredity work with Endogenous retrovirus?

Q1: If someone in a species gets infected with an endogenous retrovirus, does that mean that absolutely all of their children will have that endogenous retrovirus? Or is it based on chance?

Q2: Is there such a thing as an allele for the virus being there, and for it not being there? How does it work?

Q3: How can an endogenous retrovirus disappear from a species? And when can it not disappear?

For a provirus to be heritable, the infection must occur in the germ line. The initial integration would occur at one site in the genome and thus the locus could be described as hemizygous. Inheritance would indeed be probabilistic, depending on whether an offspring inherits the provirus-containing chromosome or its uninfected homolog.
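The probabilistic inheritance described above can be illustrated with a small simulation. This is only a sketch under simple assumptions: a single integration site, Mendelian segregation, and a second parent that carries no provirus at that locus. The function names are made up for illustration.

```python
import random

def offspring_carries_provirus(parent_copies, rng):
    """Return True if a gamete from this parent carries the provirus.

    parent_copies is the number of homologs (0, 1 or 2) that carry the
    provirus at the locus; 1 is the hemizygous case described above.
    Each gamete receives one of the two homologs at random.
    """
    homologs = [True] * parent_copies + [False] * (2 - parent_copies)
    return rng.choice(homologs)

def fraction_of_carrier_offspring(parent_copies, n_offspring=10_000, seed=0):
    """Simulate many offspring; the other parent is assumed provirus-free."""
    rng = random.Random(seed)
    carriers = sum(
        offspring_carries_provirus(parent_copies, rng) for _ in range(n_offspring)
    )
    return carriers / n_offspring

if __name__ == "__main__":
    for copies in (0, 1, 2):
        print(copies, fraction_of_carrier_offspring(copies))
```

With one copy in the parent, roughly half of the simulated offspring inherit the provirus; only a parent carrying it on both homologs passes it to every child.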

Eventually, the endogenous retrovirus (ERV) can become fixed or lost in a population by genetic drift and natural selection. Additionally, retrotransposition and, more importantly, reinfection can greatly increase copy number and allow ERV persistence despite a lack of fixation at specific loci. It should also be noted that more recently acquired ERVs exist that have not reached fixation in humans.
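Fixation or loss by drift can be sketched with a neutral Wright-Fisher model. This is an illustrative toy, not a model of any real ERV: it assumes a constant population size, no selection, no reinfection, and a single locus, and all names and parameter values are chosen only for the example.

```python
import random

def wright_fisher(n_individuals, start_copies, max_generations=100_000, seed=0):
    """Follow one provirus-bearing allele until it is lost or fixed.

    The population has 2 * n_individuals chromosomes. Each generation,
    every chromosome is drawn at random from the previous generation.
    """
    rng = random.Random(seed)
    total = 2 * n_individuals
    copies = start_copies
    generations = 0
    while 0 < copies < total and generations < max_generations:
        p = copies / total
        copies = sum(rng.random() < p for _ in range(total))
        generations += 1
    if copies == 0:
        return "lost", generations
    if copies == total:
        return "fixed", generations
    return "segregating", generations

def fixation_fraction(n_individuals=20, replicates=2000):
    """Fraction of replicate populations in which a single new copy fixes."""
    fixed = sum(
        wright_fisher(n_individuals, 1, seed=s)[0] == "fixed"
        for s in range(replicates)
    )
    return fixed / replicates

if __name__ == "__main__":
    print(wright_fisher(20, 1, seed=1))
    print(fixation_fraction())
```

Under neutrality, a single new insertion is expected to fix with probability 1/(2N), so most new insertions are lost. Selection, retrotransposition, or reinfection would change these odds, and the sketch does not model them.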

References and Further Reading:

Boeke JD, Stoye JP. Retrotransposons, Endogenous Retroviruses, and the Evolution of Retroelements. In: Coffin JM, Hughes SH, Varmus HE, editors. Retroviruses. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; 1997.

Belshaw R, Pereira V, Katzourakis A, Talbot G, Pačes J, Burt A, Tristem M. 2004. Long-term reinfection of the human genome by endogenous retroviruses. Proc Nat Acad Sci 101(14):4894-4899.

Researchers Discover That Sheep Need Retroviruses For Reproduction

A team of scientists from Texas A&M University and The University of Glasgow Veterinary School in Scotland has discovered that naturally occurring endogenous retroviruses are required for pregnancy in sheep.

In particular, a class of endogenous retroviruses, known as endogenous retroviruses related to Jaagsiekte sheep retrovirus or enJSRVs, are critical during the early phase of pregnancy when the placenta begins to develop.

Retroviruses, such as human immunodeficiency virus or HIV, are one class of viruses. They are best known for their ability to cause diseases, said Dr. Thomas Spencer, a reproductive biologist with the Texas Agricultural Experiment Station and Texas A&M University.

Findings published Sept. 11 in the Proceedings of the National Academy of Sciences demonstrate enJSRVs are essential for the development of the placenta in sheep.

Retroviruses are unique for their ability to permanently insert their genetic material into the DNA of host cells, he said. During the evolution of mammals, some retroviruses infected the germline of the host (the cells of the ovary and testis, whose genetic material is passed to offspring), and the inserted viral DNA was then inherited by the host's descendants. These retroviruses, known as endogenous retroviruses, are present in the genome of all mammals, including humans. Consequently, endogenous retroviruses can be considered remnants of ancient retroviral infections, Spencer said.

Many scientists believed these endogenous retroviruses were junk DNA, he said.

"Indeed, these endogenous retroviruses are usually harmless and generally contain mutations that prevent them from producing infectious retroviruses," he said.

However, several endogenous retroviruses appear to provide protection from infection and are involved in reproduction. For instance, the exogenous Jaagsiekte Sheep Retrovirus or JSRV causes lung tumors in sheep and led to the death of Dolly, the world's first mammal cloned from an adult cell.

The idea that endogenous retroviruses are important for reproduction in mammals has been around for about 30 years, Spencer said. Studies in cultured cells have shown that a protein of a human endogenous retrovirus might have a role in development of the human placenta.

The team blocked expression of the envelope of the enJSRVs using morpholino antisense oligonucleotides, which inhibit translation of specific messenger RNA. When production of the envelope protein was blocked in the early placenta, the growth of the placenta was reduced and a certain cell type, termed giant binucleate cells, did not develop.

The result was that embryos could not implant and the sheep miscarried, Spencer said.

Miscarriage is a serious medical problem for all mammals, including humans.

"Our research supports the idea that endogenous retroviruses shaped the evolution of the placenta in mammals and then became indispensable for pregnancy, and thus may be why they are expressed in the placenta of many mammals," he said.

"The enJSRVs arose from ancient infections of small ruminants during their evolution," said Dr. Massimo Palmarini, a virologist at The University of Glasgow Veterinary School. "This infection was beneficial to the host and was then positively selected for during evolution. In other words, animals with enJSRVs were better equipped than those without. Therefore, enJSRVs became a permanent part of the sheep genome and, in these days, sheep can't do without them."

The research team is trying to determine exactly how enJSRVs function in development of the sheep placenta. Their results should have implications for both human health and animal production.

The team was led by Spencer and Palmarini. Team members are Kathrin Dunlap, Robert Burghardt, Kanako Hayashi and Jennifer Farmer at Texas A&M, and Mariana Varela at The University of Glasgow Veterinary School.

Introductory Virology Without An Origin

Textbook Definition

Modern textbooks define viruses as obligate intracellular parasites: they have an obligation (i.e., need) to be inside a cell and hijack normal cellular functions to make more viruses, killing the cell.3 The exact details of what a virus does are irreducibly complex.4 Irreducible complexity is something that ceases to effectively function when one part of the whole is removed. Since viruses require every part, they meet the definition of being irreducibly complex. As a result of their irreducible complexity, viruses are said to travel light because they carry only what is needed without the extra baggage of what the host cell provides. All viruses are made of genetic material (DNA or RNA) and a protein coat. Their size is also particularly relevant since they were originally defined as infectious filtrate capable of causing disease in tobacco plants. Viruses range in size from a few nanometers (nm) to several micrometers across. To give you a better idea of just how small a virus is, our red blood cells are 10,000 nm across and a simple bacterium like E. coli is 1,000 nm by 3,000 nm in size. As a result, viruses cannot be seen using traditional microscopy and require sophisticated electron microscopes.

Host Range and Genetic Makeup

The variation of viruses is astounding. Viruses infect every form of life, including humans, insects, plants, and even bacteria. The previously mentioned virus genetic information targets its host(s) and is used as a modern classification scheme. Viruses causing cold sores are called DNA viruses. The flu is an RNA virus. There are even viruses that carry RNA and turn the RNA into DNA once inside the cell. HIV belongs to a special family of viruses called retroviruses because of how they make the RNA to DNA switch.

Retroviruses (e.g., HIV) often exist outside cells to move between hosts. Other retroviruses move between hosts embedded in the host’s DNA. These retroviruses are transmitted through the host genome and are, therefore, called endogenous retroviruses (ERVs). ERVs make up approximately 5–10% of all mammalian genomes (including that of humans).5

Phylogenetic Tree of Life. Image by Maulucioni, via Wikimedia Commons.

Origins of Viruses

Considering the diversity of viruses, evolutionists have been severely challenged to explain their origin. The traditional tree of life now shows three domains: eubacteria, archaea, and eukarya. Typical evolutionary trees in textbooks show the three domains of life but often overlook viruses. Viruses are often overlooked because secular scientists debate whether viruses are alive by definition.6 Let’s exclude viruses from the typical evolutionary trees for now. Standard microbiology textbook chapters about viruses have simpler representations of the three domains of life with one addition: a line extending from the three domains called viruses. In making these evolutionary trees, textbooks communicate that there is no explanation for virus origins. The virus basic definition is a Catch-22: viruses need cells, but viruses kill cells. When I was studying molecular virology at the graduate level, I asked my professor about virus origins, and he said, “We don’t know.”

Arshan Nasir and Gustavo Caetano-Anollés. “A Phylogenomic Data-driven Exploration of Viral Origins and Evolution,” Science Advances 1, no. 8 (September 2015): 1–24,

In this figure, pay attention to the pink areas located on these schematics to represent viruses. For a standard tree of life from an evolutionary perspective, all of the living things are supposed to share a common ancestor marked by a singular dot. Notice how the singular dot in the bottom panel (B) has all the pink on one end, indicating that the viruses are altogether separate from all other living things in terms of common ancestry.

A recent paper in the journal Science Advances tried to address this evolutionary enigma.7 These authors empirically analyzed various DNA sequences (i.e., testable and repeatable) but actually reported historical science (i.e., extrapolation of current rates and processes back into the past).8 No actual experiment lasted millions of years; they interpreted their empirical data to fit their presuppositions about our origins.


Discovery of K222 proviruses

In the present study, we report the discovery of a new lineage of pericentromeric endogenous retroviruses that is found in several human chromosomes. These novel pericentromeric sequences were first identified through the study of three cutaneous T-cell lymphoma (CTCL) cell lines derived from one patient and one B-cell lymphoma line. They were then confirmed to exist in the genome of healthy humans. These cell lines at first appeared to lack the known centromeric endogenous retrovirus K111, which was surprising, as centromeric K111 proviruses are detected by PCR in the DNA of almost all human subjects who have been tested in our laboratory [10]. When we screened for integration of K111 proviruses in the DNA of 19 human cell lines using primers that bind the 5′ flanking sequence in a CER:D22Z3 element and the gag proviral sequence of K111 ([10]; Figure 1A, primers P1 and P4), we detected K111 in all cell lines except one B-cell line (IRA) and three CTCL cell lines (HUT78, H9, and H9/HTLVIII; Figure 1B). We considered the possibility that the absence of K111 detection was caused by the deterioration of DNA and so we checked for genomic integrity by amplifying another gene, GAPDH. Detection of GAPDH was seen in the DNA of all cell lines tested, suggesting the true absence of K111, or at least the 5′ end of K111, in some cell lines (Figure 1B). Next we screened for K111 by real-time PCR using a set of primers and a custom probe that specifically targets the K111 env gene, but no other HERV-K (HML-2) proviruses known at the time (Figure 1A) [10]. (We later found that this probe detects the K222 provirus as well; see below.) K111 env amplification signal was detected in the DNA of all cell lines tested (data not shown). Taken together, these results indicate that in the genome of some human cell lines, though we were not able to detect the K111 5′ end, we still detect K111 env.
Lack of detection of the 5′ end of K111 could be explained by deletion of the 5′ portion of the K111 genome in some cell lines and/or deletion/mutation of the sequence that primers P1 and/or P4 target. The persistent detection of K111 env signal could otherwise be explained by the presence of an unknown HERV-K (HML-2) sequence, closely related to K111, which could be detected with this primer/probe combination.

Absence of K111 5′ end in the genome of some cell lines. (A) Genomic structure of the K111 provirus. Arrows indicate the position of the primers P1 and P4, which amplify the 5′ integration of K111, and the primer/probe combination K111F, K111R, and K111P that specifically discriminates the K111 and K222 env gene from other HERV-K (HML-2) env sequences due to a 6 bp mutation [10]. (B) Detection of K111 5′ end insertions in human cell lines. The 5′ flanking K111 insertions were detected in all human cell lines tested in this study by PCR using the primers P1 and P4 [10], except for the DNA of cell lines H9, HUT78, H9/HTLVIII, and the IRA B-cell line. Arrows indicate individual K111 insertional polymorphisms. Integrity of the DNA was assessed by amplification of GAPDH (see lower gel). The molecular size of the DNA ladder is shown on the left of the gel. On top of each lane is the name of each cell line subjected to study. The weak bands observed in H9 and H9/HTLVIII were shown by sequencing to be the result of non-specific PCR amplification.

To test the above possibilities, we designed a PCR strategy to examine whether incomplete K111, truncated at the 5′ end, or a novel provirus closely related to K111, exists in the genome. Initially, we designed four forward primers that bind the 5′ sequence flanking K111. These primers, in combination with the reverse primer (P4), which binds to K111 gag, did not detect the K111 5′ end in the DNA of two CTCL cell lines derived from the same individual, whereas all of these primers detected the K111 5′ end in most of the normal human DNA (data not shown). This result again suggests the deletion of K111, or at least the 5′ end, in some human cell lines.

We next designed a PCR strategy to amplify the centromeric provirus that might exist in the DNA of cells lacking the K111 5′ end. We used primers (forward and reverse) that bind several sites spanning a HERV-K (HML-2) genome in combination with primers (P1 and P2) that bind centromeric regions (Figure 2) [10]. In cells lacking the 5′ end of K111, these sets of primers were able to amplify the genome of a novel provirus, which we term K222. In most normal human DNA, these primers amplify K111 (Figure 2). K222 amplification products were seen only when the complementary primers sit on HERV-K (HML-2) pro, pol, and env but not the gag gene (Figure 2). Cloning and sequencing of full-length K222 revealed two distinct features making K222 different from K111. First, in contrast to K111, K222 lacks the 5′LTR and the gag gene. Second, the K222 5′ flanking sequence is only 78% similar to the K111 5′ flanking sequence, known as CER:D22Z3 [10]. The sequence differences in the K222 5′ flanking region and the likely positioning of K222 in the pericentromere domain (see below) led us to designate these repetitive regions pCER:D22Z8. At the 3′ end of K222, however, we identified the target site duplication of K111 (GAATTC) flanked by a CER:D22Z3 element.
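The mapping logic in this paragraph can be summarized as a simple presence/absence rule. The sketch below is a deliberately simplified model, not the authors' protocol: it assumes that a flanking primer paired with an internal primer yields a product only if the provirus retains the region the internal primer binds, and it ignores primer specificity, product size, and non-specific amplification. The region lists and function name are illustrative.

```python
# Regions retained by each provirus, as described in the text:
# K222 lacks the 5' LTR and the gag gene that K111 carries.
K111_REGIONS = ("5'LTR", "gag", "pro", "pol", "env", "3'LTR")
K222_REGIONS = ("pro", "pol", "env", "3'LTR")

def expects_product(provirus_regions, internal_primer_region):
    """Predict whether a flank + internal primer pair can amplify.

    Simplifying assumption: a product is possible only when the region
    bound by the internal primer is present in the provirus.
    """
    return internal_primer_region in provirus_regions

if __name__ == "__main__":
    for region in ("gag", "pro", "pol", "env"):
        print(
            region,
            "K111:", expects_product(K111_REGIONS, region),
            "K222:", expects_product(K222_REGIONS, region),
        )
```

Under this rule, internal primers in gag distinguish the two proviruses, while primers in pro, pol, or env are predicted to amplify both. This matches the pattern reported above for cells that lack the K111 5′ end.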

Mapping of K222 proviruses in the human genome. (A) Schematic representation of the primer sets used to isolate K222 by PCR. The genomic structure of a centromeric provirus K111 is shown: the viral genes gag, pro, pol, env, and np9, surrounded by LTRs, are integrated into centromeric repeats (CER:D22Z3). The target site duplication of K111 GAATTC is indicated. The primers P1 and P2 bind CER:D22Z3. These primers were used in combination with primers that span the provirus genome. Arrows indicate the position and orientation of the primers; the number above each arrow indicates the nucleotide position it binds in reference to K111. Mapping to the 5′ end of the provirus was performed using the primer P1 and a set of HERV-K (HML-2) reverse primers. Mapping to the 3′ end of the provirus was performed with the reverse primer P2 and a set of HERV-K (HML-2) forward primers. (B, C) Isolation of K222 provirus. The sequence of K222 was detected by PCR from DNA of the cell lines H9 and HUT78, which lack K111 5′ end. Normal human DNA, containing K111, was used as a control for the PCR reaction. The number shown for each lane represents the primers. The gels show the amplification products of the 5′ mapping (B) or 3′ mapping (C) of centromeric proviruses in H9, HUT78, and normal human DNA using different combinations of primers. A molecular size ladder is indicated at the left. No amplification products were detected in H9 and HUT78 cell lines, in contrast to normal human DNA, when using the primer sets P1-982R, P1-2499R (B), or primer sets P2-1965F, and P2-2641F (C). An asterisk indicates a band that was shown by sequencing to be the result of non-specific amplification. Sequencing of the mapping products obtained from DNA of H9 and HUT78 cells reveals the sequence of K222.

After we identified the putative complete genome of K222, we looked for similar sequences in human genome databases. We did not find sequences similar to K222 in the most recent human genome assembly GRCh38/hg38, nor in human Sequence Read Archive libraries. However, we found a K222 provirus in the Whole-Genome Shotgun (WGS) Contigs library (Acc. No. AADC01167561.1). This sequence is from genomic DNA from a presumably healthy person, which suggests that K222 is not only present in the DNA of some cell lines but also in the genomic DNA of healthy modern humans. This K222 sequence was also devoid of the 5′LTR and the gag gene, with the deletion occurring at exactly the same position that our PCR and sequencing studies revealed. Interestingly, this K222 sequence is flanked by pCER:D22Z8 elements at both sides and does not have the K111 target site duplication GAATTC, which we identified at the 3′ integration site of K222 from a human cell line. This may indicate that the complete K222 sequence we amplified from a human cell line is a recombinant K222/K111 sequence. The recombinant K222/K111 sequence we amplified is deposited in the NCBI database (Acc. No. KF651980).

As we have found distinct features in the 5′ and the 3′ integration sites of K222 that made it different from K111, we looked for other possible K222 sequences in WGS libraries. We found five more K222 sequences (Acc. Nos. ABSL01025452.1, ABBA01170497.1, AADB02159125.1, ABSL01190241.1, and ABBA01169090). They have a deletion in the LTR and gag gene, and therefore show a pCER:D22Z8-pro junction. We also identified three more K222 sequences (Acc. Nos. AADB02144450.1.1, ABSL01242357.1, and ABBS01119704.1) that have sequences similar to K222 but not K111 at the 3′ LTR integration site. These sequences were detected in DNA samples from presumably healthy individuals, again suggesting the occurrence of K222 in the human population at large.

To confirm the 5′ end deletion of K111 and the existence of K222 observed by PCR, we performed slot and Southern blotting in DNA samples of cell lines, one that appeared by PCR to contain K222 but not the K111 provirus, and another cell line that should have both proviruses. We created specific biotinylated probes for K111 and K222 detection as described in Materials and methods. The K111-specific probe is a 422 bp product that spans the 5′ flanking sequence of the K111 provirus and the immediate 116 bp of its 5′ LTR. The K222-specific probe is a 464 bp product that spans the 5′ flanking sequence of K222 provirus and the immediate 396 bp of its pro gene (Figure 3A). In the slot blot, we observed that the K111 probe does not recognize DNA from IRA cells, mouse DNA, or a plasmid containing the K222 sequence (Figure 3B). The K111 probe, however, recognized DNA from BJAB cells. These observations verify our previous findings by PCR (Figure 1A). The K222 probe recognized DNA from both human B-cell lines as well as a plasmid containing a complete K222 sequence, but not a plasmid with K111 or mouse DNA. These observations demonstrate the specificity of the K222 probe and confirm the existence of K222 in human DNA. Using Southern blot analyses, we further verified the detection of K222 in the DNA of human B-cell lines that either do or do not have the K111 5′ end (data not shown). The K222 probe recognized DNA from a plasmid containing K222 but not K111, further confirming our observations. These results suggest that K222 indeed is distinct from K111 and may exist in much of the human population.

Detection of the K222 provirus in the genome of human cell lines by slot blot analysis. The DNA of human cell lines that were found to have or lack the 5′ end of K111 by PCR, and presumably contain the truncated K222 provirus, were screened for K111 and K222 by slot blot analyses. (A) Generation of K111 and K222-specific biotinylated probes. Probes were generated by PCR incorporation of biotin-labeled dCTP. The K111 probe is 422 bp long and spans the CER:D22Z3 flanking sequence and the beginning of the LTR of K111. The K222 probe is 464 bp long and covers the pCER:D22Z8 flanking sequence and pro gene of K222. (B) DNA from the B-cell lines BJAB (having the 5′ end of K111) and IRA (lacking the 5′ end) as observed by PCR, were screened for K111 and K222 virus by slot blotting. DNA was cross-linked to PVDF membranes and screened for K111 and K222 using biotinylated probes. The probes were detected by chemiluminescence with HRP-conjugated streptavidin. The K111 probe, which targets the 5′ end of genomic K111, reacted with the DNA of BJAB cells but not IRA cells, confirming the lack of the 5′ end of the viral genome in IRA cells. The K222 probe reacted with the DNA of both BJAB and IRA cells, confirming that both cell lines have provirus K222, which is truncated at the 5′ end. Mouse DNA served as a negative control, and plasmids containing either K111 or K222 genomes were used as positive controls. The K111 probe did not react with the K222 plasmid and vice versa.

We created an alignment of the full-length K111 and K222 and the recombinant K222/K111 detected in a human cell line. Nucleotide differences between K111 and K222 can be seen using Highlighter plots (Figure 4A). It is noticeable that besides the deletion of the 5′ LTR and gag gene in K222, there are differences in the nucleotide sequences flanking either end of both proviruses: K111 is flanked by CER:D22Z3 and K222 by pCER:D22Z8. Not apparent in the figure is that the target site duplication GAATTC found at the 5′ and 3′ end of K111, as well as the last 9 bp of the 3′LTR (ACCCCTTCA), are not present in K222. The difference in flanking sequence and premature deletions in K222 suggest that the two proviruses arose from two independent infections. As we noted previously, the K222 sequence found in the genome of the H9 cell line has features indicating that it is a K222/K111 recombinant sequence. This K222/K111 sequence has a deletion of the 5′ LTR and the gag gene similar to K222 and is flanked at the 5′ end by a pCER:D22Z8 repeat. However, at the 3′ end this sequence has the K111 target site duplication GAATTC and is flanked by a CER:D22Z3 repeat, similar to the K111 provirus (Figure 4A). We performed a recombination test (RIP 3.0) to address whether this K222/K111 sequence arose by recombination. The recombination analysis indicated that this sequence originated for the most part from K222 parental provirus. The 3′ LTR and the flanking sequence next to the integration site, however, clearly resemble the K111 provirus, further suggesting that this sequence is a recombinant K222/K111 provirus (Figure 4B). In addition to this recombination assay, we performed a phylogenetic analysis of several 3′LTRs plus flanking sequence of K111, K222, and recombinant K222/K111 proviruses found in human databases and in our laboratory [10].
The phylogenetic tree shows that K222 LTRs and 5′ flanking sequence found in the CTCL cell lines H9 and HUT78 cluster at a midpoint between K111 and K222 sequences, further indicating that these sequences are recombinant (Figure 4C).
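The idea behind a sliding-window recombination test can be sketched as follows. This is not the RIP 3.0 implementation. It is a minimal illustration that assumes the query and both parental sequences are already aligned to the same length, and it uses toy sequences rather than K111 or K222 data.

```python
def match_fraction(query, parent, start, window):
    """Fraction of identical positions between query and parent in one window."""
    q = query[start:start + window]
    p = parent[start:start + window]
    return sum(a == b for a, b in zip(q, p)) / len(q)

def window_profile(query, parent_a, parent_b, window, step=1):
    """Return (start, match to parent_a, match to parent_b) for each window.

    All three sequences are assumed to be pre-aligned and of equal length.
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        profile.append((
            start,
            match_fraction(query, parent_a, start, window),
            match_fraction(query, parent_b, start, window),
        ))
    return profile

if __name__ == "__main__":
    # Toy example: the query matches parent_a first, then parent_b.
    parent_a = "A" * 20
    parent_b = "C" * 20
    query = "A" * 12 + "C" * 8
    for row in window_profile(query, parent_a, parent_b, window=4, step=4):
        print(row)
```

A switch in which parent gives the higher match fraction marks a candidate recombination breakpoint, which is the kind of pattern described for the 3′ LTR region of the K222/K111 sequence.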

Genomic structure and nucleotide differences of full-length K111, K222, and K222/K111 recombinant proviruses. (A) Highlighter plot showing the nucleotide differences between K111 along with K222 provirus found in a WGS database (Acc. No. AADC01167561.1) and K222/K111 recombinant provirus isolated from the genome of the H9 cell line indicated by tick marks (green ticks: A; red ticks: T; orange ticks: G; light blue ticks: C). Gray boxes denote areas deleted in K222. (B) Recombination plot of K222/K111 provirus. The similarity between the query K222/K111 recombinant sequence and each parental K222 and K111 provirus is plotted for each position of an approximately 10 Kb bp sliding window. The Y axis represents the match fraction of the query sequence to each parental sequence (red and blue lines). A match fraction of 1 means 100% identity. The recombinant query sequence is illustrated on the X axis (upper red/blue line at the top). Arrows indicate recombination spots. (C) A phylogenetic dendrogram displays three major clades: the 3′ LTR K111 (sometimes called K105) sequences previously reported ([10]; black), 3′ LTR K222 sequences found in human databases (blue), and the 3′ LTR of K222 sequences found in H9 and HUT78 cell lines (yellow). Previous sequences assigned by us as K105J and K105K were indeed K222 sequences and were flanked by pCER:D22Z8 repeat. (D) K222 and K111 proviruses arose by independent infections. A Bayesian inference tree shows the clustering of the 5′ and 3′ LTRs from various HERV-K (HML-2) proviruses. The K111 5′ LTR (red) and the 3′ LTRs of K111 (blue) and K222 (gray) proviruses cluster in three independent clades with a common ancestor. Posterior probability values > 70 are shown.

Phylogenetic reconstruction of several HERV-K (HML-2) LTRs, including K111s (sometimes called K105) and K222s, showed that K222 3′ LTRs clustered in the ancestral K111 LTR clade (Figure 4D). At the time of provirus integration, both 5′ and 3′ LTR sequences are similar, but over time they accumulate mutations. The LTRs of individual HERV-K (HML-2) proviruses therefore cluster in a specific clade (for example the LTRs of proviruses K101, K102, and so on cluster together). Likewise, Solo LTRs, which are generated by the recombination of the 5′ and 3′ LTR of full-length proviruses, deleting the internal genes, also cluster to the original 5′ and 3′ LTRs (for example, K109 and K111 Solo LTRs). In this tree (Figure 4D), we observe one clade (shown in black) represented by several distinct HERV-K (HML-2) proviral LTRs indicating independent infections. On the other hand, we observe that K111 LTRs and K222 LTRs have a common ancestral sequence. We further observe that this evolutionary line splits into three well denoted clades: a clade corresponding to the K111 5′ LTRs (red), a clade corresponding to the K111 3′ LTR (blue), and another one derived from K222 3′ LTRs (gray). Of note is that some sequences we amplified in our previous publication [10] and that we labeled K105K and K105J 3′ LTR were actually K222 sequences and do not have the last 9 bp of the 3′ LTR, the GAATTC target site duplication of K111, nor the flanking CER:D22Z3 element. Instead, they are flanked by pCER:D22Z8 element. All these K222 3′ LTR sequences clustered together in an independent branch from the K111 3′ LTR sequences, again suggesting two independent infections. The K222 3′ LTR sequences amplified from H9 and HUT78 lines as well as the K111 solo LTR sequences also cluster close to the K222 3′ LTR sequences but in independent clades (Figure 4D). 
These lines of evidence suggest that although K111 and K222 appear to represent two different integrations in the germline, at some point during human evolution recombination/gene conversion events between K111 and K222 occurred, which would generate in the phylogenetic tree a common ancestral sequence.

Analysis of the integration sequences led us to discover that K111 and K222 inserted into two different centromeric repeats. The K111 flanking sequence, CER:D22Z3, and the K222 flanking sequence, which we call pCER:D22Z8, are both CER-type repeats of 384 bp. Each repeat is organized as a 48 bp segment repeated eight times. The nucleotide similarity of CER:D22Z3 and pCER:D22Z8 is about 71.8% (Figure 5). The similarity between these sequences thus could have allowed us to amplify K222 integration using a set of primers, P1 and P2, which we usually use to amplify the K111 integration.
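The repeat arithmetic and the similarity measure can be made concrete with a short sketch. The sequences below are toy strings, not CER:D22Z3 or pCER:D22Z8, and the helper names are hypothetical. Percent identity is computed column by column on sequences assumed to be already aligned.

```python
def split_into_units(sequence, unit_length=48):
    """Split a repeat into consecutive units of unit_length bases."""
    if len(sequence) % unit_length != 0:
        raise ValueError("sequence length is not a multiple of the unit length")
    return [
        sequence[i:i + unit_length]
        for i in range(0, len(sequence), unit_length)
    ]

def percent_identity(aligned_a, aligned_b):
    """Percentage of identical columns between two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)

if __name__ == "__main__":
    toy_unit = "ACGT" * 12            # 48 bp toy unit
    toy_repeat = toy_unit * 8         # eight units of 48 bp
    print(len(toy_repeat), len(split_into_units(toy_repeat)))
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))
```

Eight units of 48 bp give the 384 bp repeat length stated in the text. A real comparison of the two repeats would first need a proper alignment, which this sketch does not perform.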

Sequence alignment of CER:D22Z3 and pCER:D22Z8 repeats. Sequences flanking K222 were analyzed. The new sequence repeat we called pCER:D22Z8 shows 71.8% similarity to CER:D22Z3. The repeat pCER:D22Z8 is a centromeric repeat (CER), which we have named pCER according to its likely position in the pericentromere (see the text). The organization of pCER:D22Z8 consists of eight repeats of 48 bp each. According to the chromosomal location of K222, pCER:D22Z8 is located in chromosome 22 and eight additional chromosomes.

We found evidence of the occurrence of K222 using the DNA of human cell lines, which lack the K111 5′ integration, and we corroborated the existence of K222 in human WGS sequence databases and by Southern blotting. To determine if the absence of the K111 5′ end seen in some human cell lines is also seen in the healthy human population, we attempted to detect K111 using the primers P1 and P4 (Figure 6A). We tested the DNA of 96 human individuals by PCR and found that the K111 5′ integration was not detected in 11 out of 96 individuals (Figure 6B). This suggests the deletion of the K111 5′ site is found at a frequency of about 11.5% (11/96) in humans (at least in the population of the United Kingdom studied) and is not a genotype exclusively found in cell lines. In certain individuals that presumably lack the K111 5′ end, we sometimes observe a faint amplification product of the right size. The possibility thus exists that in the genome of these individuals there are a few copies of the K111 5′ end; however, the concentration of this product was too low for cloning and/or sequencing confirmation. We further analyzed the K111 5′ deletion genotype with primers that bind along the gag gene at positions 982, 1968, and 2499 in reference to the K101 genome (Acc. No. AF164609.1), and a primer that binds to position 3460 at the pro gene, which is present in both K111 and K222 (Figure 6A). We mapped the K111 5′ integration in the DNA of five individuals who tested positive and five more individuals who tested negative for the K111 5′ end. In this K111 5′ mapping, we were able to detect amplification products of K111 in the five individuals who tested positive for the K111 5′ end, confirming they have an intact K111 5′ end (Figure 6C). In contrast, we were not able to amplify K111 in five individuals with a negative K111 5′ end (Figure 6C), confirming the absence of the K111 5′ site in this fraction of the human population.
Using the primer P1 and the primer 3460R, which binds to the pro gene, we detected K111 and K222 in individuals positive for the K111 5′ integration. With this set of primers, we also detected K222 in the DNA of individuals negative for the 5′ integration of K111. Sequence analysis confirmed that in contrast to K111, the K222 sequences amplified have a deletion in the 5′ LTR and gag gene.
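The frequency estimate from the screen of 96 individuals can be reproduced with a short calculation. The Wilson score interval below is an addition for illustration only, since the text reports no confidence interval. It gives a rough sense of the sampling uncertainty around 11 of 96.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

# Counts taken from the text: 11 of 96 individuals lacked the K111 5' end.
negative, screened = 11, 96
frequency = negative / screened
low, high = wilson_interval(negative, screened)

if __name__ == "__main__":
    print(f"{frequency:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

With a sample of this size the interval is wide, so the population frequency of the K111 5′ deletion is only loosely constrained by these data.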

Detection of K111 and K222 in the human population. (A) Genomic organization of K111 and K222 proviruses. The location of the primers to map K111 and K222 is shown. (B) Detection of K111 5′ end in the human population. The 5′ end of K111 was detected using the primers P1 and P4. The black arrow A indicates the K111 5′ end. The gray arrow indicates non-specific PCR products. On top of each lane is a number signifying each individual subjected to study. (C, D) Mapping of K111 (C) and K222 (D) in five individuals, who are positive or negative for the K111 5′ end, respectively. K111 mapping (C) was carried out with primer P1 and reverse primers that bind at positions 982, 2499, and 3460 bp of a K111 provirus. Black arrows indicate specific K111 insertions A (product P1-982R), C (product P1-2499R), and D (product P1-3460R). The gray arrow indicates non-specific PCR amplifications. K111 detection was observed in the individuals labeled with the numbers 1, 2, 3, 5, and 6, which are positive for the 5′ K111 end. Non-specific PCR product was detected in individuals labeled with the numbers 4, 68, 86, 90, and 95, which are negative for the 5′ K111 end as shown in B. The primers P1 and 3460R also detect K222 in individuals either negative or positive for the 5′ K111 integration (see stars). K222 mapping was carried out with the primer K222F and reverse primers that bind at positions 982, 1968, 2499, and 3460 bp in reference to K111. PCR products A, B, and C (black arrows) seen in the DNA of K111 positive individuals were shown to be the amplification of K111. No amplification products were seen in individuals lacking the 5′ end of K111. D represents the amplification product of K222.

Finally, we assessed the DNA of individuals having or lacking the 5′ integration of K111 for the presence of K222. We mapped K222 using the primer K222F that binds to the pCER:D22Z8 element and reverse primers targeting either the gag gene (missing in K222) or the pro gene (present in K222). In the set of individuals with the 5′ K111 end, these sets of primers amplified K111 (Figure 6D). The primer K222F likely sits on the K111 flanking CER:D22Z3 element, which is 71.8% similar to pCER:D22Z8, producing the amplification of K111. These sets of primers, however, did not amplify a provirus in the DNA of individuals with a negative K111 genotype. The combination of primers K222F and the reverse primer that binds to the pro gene (3460R) did amplify K222 both in individuals who have K111 and those who do not. Even though the primer K222F could bind to the CER:D22Z3 flanking sequence of K111, in the group of individuals positive for the K111 5′ end these primers instead favor the amplification of K222, and not K111, as targeting K222 will produce the smallest amplification product (Figure 6D). These data thus indicate the existence of K222 in all the humans screened thus far.

Detection of both K222 and recombinant K222/K111 in individuals missing the K111 5′ integration

The evidence so far suggested that K222 sequences are found in all humans. The K222 sequences we detected in some human cell lines, however, have features of K222 and K111 recombination. We evaluated whether a full K222 sequence (non-recombinant) can be detected in the cell line HUT78 and DNA samples from healthy humans, as well as whether a K222/K111 recombinant sequence can be detected in the human population and not just in cell lines. To accomplish this, we first evaluated possible K222/K111 recombinant sequences using primers that bind to the 3′ integration site of K111 in a set of DNA samples missing or having the K111 5′ end. The primers for this test targeted the env gene (7972F) and CER:D22Z3 flanking the 3′ site (P2) (Figure 2A). In samples lacking the K111 5′ end we were able to detect K222/K111 recombinant sequences with these primers (Figure 7A). Confirming previous findings, a K222/K111 recombinant product was also detected in the cell line HUT78. Analysis of these products revealed that these K222/K111 recombinant sequences have the K111 3′ LTR, the K111 target site duplication GAATTC, and the CER:D22Z3 flanking sequence. We next attempted to detect K222 3′ end by designing a primer that binds specifically to the K222 3′LTR-pCER:D22Z8 junction (an area that in the case of K111 would be disrupted by the 3′ LTR end ACCCCTTCA and the K111 target site duplication GAATTC). We were able to detect K222 sequences in individuals positive or negative for the K111 5′ integration as well as in the DNA of the cell line HUT78 (Figure 7B).

Detection of K222 and recombinant K222/K111 sequences in individuals lacking the K111 5′ end. (A) Amplification of K222/K111 recombinant sequences. K222/K111 sequences were amplified with the primer 7972F and the primer P2, which binds to the K111 3′ flanking sequence (see Figure 2) in the DNA from individuals who lack the K111 5′ end (68, 90, and 95) and the cell line HUT78, which also lacks the K111 integration. As a positive control we used the DNA of individual 96, who is positive for K111 5′ end. (B) Amplification of K222 3′ integration. K222 was amplified with the primer 7972F and K222LTR-pCER:D22Z8R, the latter primer binding to the LTR-pCER:D22Z8 junction sequence present in K222, but not in K111. K111 3′ integration instead has a 5 bp sequence from the LTR and the target site duplication GAATTC not present in K222. Amplification of K222 3′ integration was seen in individuals having (96) or lacking (68, 90, and HUT78) the K111 5′ end. (C) Evolution of K222 and K222/K111 recombinant sequences in humans. A Bayesian inference tree of K222 and K222/K111 LTR sequences obtained by PCR in individuals lacking the K111 5′ end. The K222 sequences amplified are indicated with a K222 label. The tree reveals two different K222 LTR clades: sequences similar to the K222 provirus (blue) and sequences that cluster with the K111 provirus (red). K222 sequences that cluster with K111 in individuals lacking the K111 5′ end indicate the likely existence of K111 in the ancestral human lineage of those individuals. The K222/K111 recombinant clade (red) also suggests that K222 and K111 likely underwent recombination/gene conversion during human evolution before K111 was lost from the lineage. Posterior probability values >85 are shown for the best tree.

Because we did not detect the 5′ end of K111 in some individuals, yet these individuals have recombinant K222/K111 sequences, we asked whether K111 was present in the ancestral genome before it was deleted. To address this issue, we created a Bayesian phylogenetic tree to determine whether K222 and recombinant K222/K111 sequences split into two different clades. The K222 sequences amplified in individuals with a genotype negative for the K111 5′ end (these sequences are labeled K222) clustered in the clade represented by K222 sequences (shown in blue in Figure 7C). Recombinant K222/K111 LTR sequences in such individuals clustered to either the K111 clade (red) or the K222 clade (blue) (Figure 7C). This suggests that K111 sequences existed in the genome of individuals missing the K111 5′ end. The separation of the K111 and K222 clades also indicates that the integration of K222 and the recombination of K222 and K111 occurred as two separate events over the evolution of humans. It is likely that the deletion of K222 at the 5′ end served as a template to delete this area of K111 and produced the K111 5′ end deletion. As many copies of K111 exist in most modern humans, this deletion event mediated by recombination likely happened early in human evolution, before the expansion of K111.

Integration time of K222 in primate evolution

Despite the sequence similarities between K111 and K222, the differences in the proviral integration sequences, the premature deletions in the genome of K222, and the phylogenetic analysis (see above) suggest that they are distinct proviruses. We attempted to calculate the time of integration of K222 in the germline, and compare this to the time of integration of K111, to elucidate whether these viruses had arisen from independent infections or from a common ancestral infection. Comparison between mutations that differentiate the 5′ and 3′ LTRs has been used to calculate the integration time of a provirus. The sequences of the 5′ and 3′ LTRs are considered identical at the time of integration, but accumulate mutations over time. Thus, by comparing the sequence differences of the 5′ and 3′ LTRs we can estimate the age of viral integration [11]. By comparing the LTRs of K111 we calculated K111 to have entered the germline 2.6 to 6.3 million years ago [10]. However, the 5′ LTR of K222 is missing, and so molecular clock analysis for K222 LTRs is unreliable. Therefore, we searched for K222 integration in the DNA of both New and Old World monkeys and primates using primers specific for K222 (primers K222F and K222bR; Figure 8A), and estimated the integration time of K222 in the primate evolutionary line. We detected K222 in the genome of the baboon (an Old World monkey), orangutan, gorilla, chimpanzee, and human (Figure 8B). Of note, K222 was detected in the genome of all 112 human DNA samples (data not shown). This set of primers did not detect K222 in macaques and African green monkeys (Old World monkeys), New World monkeys, or non-primate species (Figure 8B). We designed other sets of primers that could amplify K222 if present and again failed to detect K222 in macaques, African green monkeys, and New World monkeys, confirming the experiment described above.
These data generally suggested that K222 integrated after the divergence of New and Old World monkeys, an event calculated to have happened approximately 25 million years ago (Figure 8B [29]). While K222 was not detected in macaques and African green monkeys, both Old World monkeys, it was found in the baboon, another Old World monkey that diverged from macaques and African green monkeys about 6 to 10 million years ago (Figure 8B [30]). These data suggest that K222 was deleted or mutated in the genome of some Old World monkeys. Of course, we might also postulate that K222 was unfixed within the common ancestral Old World monkey population and then fixed only in the baboon but not the macaque and African green monkey genera [31]. In contrast, K111 is only detected in chimpanzee and human DNA [10], suggesting that K111 entered the germline only before the divergence of humans and chimps, an event calculated to have happened approximately 6 million years ago, confirming the molecular clock analysis of K111 LTRs [10]. Therefore, these results indicate that K222 entered the primate line about 25 million years ago and K111 did so about 6 million years ago, indicating that these proviruses arose from independent infections.
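The LTR-based molecular clock described above can be sketched in a few lines. This is only an illustration of the general idea (the two LTRs start identical and each accumulates substitutions independently, so the age is the pairwise divergence divided by twice the substitution rate); the function name, the example counts, and the substitution rate are assumptions for illustration and are not taken from the study.

```python
def ltr_integration_age(differences: int, aligned_length: int,
                        rate_per_site_per_year: float) -> float:
    """Estimate provirus age (years) from 5'/3' LTR divergence.

    differences: nucleotide differences between the two LTRs
    aligned_length: number of aligned positions compared
    rate_per_site_per_year: assumed neutral substitution rate (hypothetical input)
    """
    if aligned_length <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("aligned_length and rate must be positive")
    divergence = differences / aligned_length
    # Both LTRs mutate independently after integration, hence the factor of 2.
    return divergence / (2 * rate_per_site_per_year)


# Hypothetical numbers: 10 differences over 1,000 aligned bp, assumed rate 1e-9.
age_years = ltr_integration_age(10, 1000, 1e-9)
print(f"Estimated integration age: {age_years / 1e6:.1f} million years")
```

Because the estimate scales inversely with the assumed rate, a range of plausible rates yields a range of ages, which is one reason such estimates are usually reported as intervals.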

K222 integrated into the primate germline after the divergence of New and Old World monkeys and expanded in copy number during the evolution of humans. (A) Genomic organization of centromeric K111 and K222 proviruses. The positions of the primers used to amplify K222 insertions by PCR and qPCR are indicated by arrows. (B) Detection of K222 from DNA of New and Old-World primates. K222 was detected by PCR with the primers K222F and K222bR in the baboon, orangutan, gorilla, chimpanzee, and human, but not in macaques, African green monkeys, and New World monkeys. Other bands (for example, the PCR products detected in mouse, hamster, and rhesus macaque) were shown by sequencing to be the result of non-specific PCR amplification. A phylogeny of New World monkeys, Old World monkeys, and hominoids (humans and apes) is shown. Estimated times of divergence are shown. MYA: million years ago. (C) Quantitation of K222 copies by qPCR in the genomes of Old World monkeys, humans, and a number of other primates. K222 is likely present as a single copy in the genomes of baboon, orangutan, gorilla and chimpanzee, while present in multiple copies in the human genome. The label of each species in (B) matches to the bars.

When amplifying K222 integration, three additional findings were observed. First, K222 amplification in the orangutan produced a PCR product of higher molecular weight than in baboon, gorilla, chimpanzee, and human DNA. Second, sequencing and phylogenetic analysis of K222 amplification products showed that K222 in the gorilla and orangutan diverged from the baboon, chimpanzee, and human cousins (Figure 9A). In the orangutan, K222 incorporated 37 bp nucleotide insertions, which explained the longer PCR product that we observed. K222 sequences accumulated 11 nucleotide substitutions in the gorilla (Figure 9B). Thus, K222 acquired more mutations during the evolution of modern orangutans and gorillas than in the evolution of modern chimpanzees and humans. The third finding, stronger intensity of K222 bands in humans, may indicate expansion of K222 in humans. We performed real-time PCR using equal concentrations of DNA of each species with primers/probe specific for the K222 insertion to estimate an approximate number of K222 copies (Figure 9A). Real-time PCR quantitation estimated that K222 likely exists as a single copy in the genomes of baboon, orangutan, gorilla, and chimpanzee, but multiple copies are found in the human genome (Figure 9C). These data indicate that K222 expanded in copy number during the evolution of humans, sometimes by recombination.

K222 provirus in the genomes of Old World monkeys, primates and humans. (A) Phylogenetic neighbor-joining tree of K222 integration sequences amplified from the DNA of baboon, orangutan, gorilla, chimpanzee, and human. The tree is unrooted, with taxa arranged for a balanced shape. The tree was constructed using the Kimura 2-parameter model. The stability of branches was evaluated by bootstrap tests with 10,000 replications. The scale bars represent the nucleotide substitutions per sequence. (B) Nucleotide sequence alignment of K222 insertion sequences amplified from the genomes of Old World monkeys, primates, and humans. The sequences are compared to the olive baboon sequence, which is the oldest germline sequence. Dots indicate nucleotide similarities to the master sequence. Nucleotide substitutions are indicated in letters. Several nucleotide insertions can be seen in the sequence of K222 in the orangutan, but not other primates or humans (B), which cause the divergence of the orangutan K222 in the phylogenetic tree (A), suggesting that these insertions arose only during the evolution of modern orangutans.
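The caption above names the Kimura 2-parameter model as the distance measure behind the neighbor-joining tree. As a rough sketch of what that model computes for one pair of aligned sequences, the function below counts transitions and transversions and applies the standard K2P formula. The function name and the gap-handling choice (skipping any non-ACGT position) are assumptions made here, not details from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (a simplifying assumption)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

A full tree would require this distance for every pair of sequences, followed by neighbor joining and bootstrap resampling, which dedicated phylogenetics software handles.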

Determination of K222 copy number in humans

When we quantitated K222 in the DNA of primates and humans we observed that K222 seemed to exist as a multiple-copy provirus. We next asked what the approximate copy number of K111 and K222 in humans is. We have seen that the quantitation assay we implemented to detect K111 [10] also detects K222. This could be explained by the identical sequence in the env region of K111 and K222 that we target in that assay (Figure 1A). The assay developed to quantitate K222, by contrast, is specific for K222 and does not detect K111 (Figure 8A). We therefore quantitated the K111 + K222 copy number as well as the K222 copy number in 16 individuals using equal amounts of DNA. We further developed a qPCR assay to quantitate the single-copy gene TOP3A (topoisomerase III A) in the same amounts of DNA as a control. We then normalized the number of copies of K111 and K222 to the number of copies detected for TOP3A. The number of copies of K111 was estimated by subtracting the number of copies detected for K222 from the copies detected with the assay that detects both proviruses. Our results indicate that K111 exists in human diploid genomes in approximately 207 to 968 copies and K222 in approximately eight to 61 copies (Figure 10).
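The arithmetic behind this estimate (normalize each assay to the single-copy gene, then subtract) can be sketched as follows. The function names, the input quantities, and the assumption that TOP3A contributes two copies per diploid genome are illustrative choices made here; the text does not spell out the exact normalization used.

```python
def normalized_copies(target_qty: float, reference_qty: float,
                      reference_copies_per_genome: float = 2.0) -> float:
    """Copies of a target per diploid genome, scaled to a single-copy reference.

    target_qty / reference_qty: copy estimates from qPCR standard curves
    reference_copies_per_genome: assumed 2 for a single-copy gene in a diploid genome
    """
    if reference_qty <= 0:
        raise ValueError("reference quantity must be positive")
    return target_qty / reference_qty * reference_copies_per_genome


def k111_and_k222_copies(combined_qty: float, k222_qty: float, top3a_qty: float):
    """Return (K111, K222) copies per genome.

    combined_qty comes from the assay that detects both proviruses;
    K111 is obtained by subtracting the K222-specific estimate.
    """
    combined = normalized_copies(combined_qty, top3a_qty)
    k222 = normalized_copies(k222_qty, top3a_qty)
    return combined - k222, k222


# Hypothetical qPCR quantities for one individual.
k111, k222 = k111_and_k222_copies(combined_qty=5000, k222_qty=200, top3a_qty=20)
print(k111, k222)
```

Any error in the K222-specific estimate propagates directly into the K111 value, so the subtraction is best read as an approximation.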

Estimated copy number of K111 and K222 in modern humans. K111 and K222 copy numbers were calculated by qPCR. K111 plus K222 copies were calculated using the primers K111F and K111R and the probe K111P, which detect both K111 and K222. K222 copy number was calculated with the primers K222F and K222R, and the probe K222P, which binds specifically to K222. K111 copies were calculated by subtracting the K222 copies from the K111 + K222 copies. K111 and K222 copies were normalized to the level of the single copy gene TOP3A. The plot indicates the relative copy number per genome of K111 and K222 in 16 healthy individuals.

Integration of K222 in human chromosomes

K222 exists in the human genome in multiple copies, similar to K111 [10]. In this light, we looked for the presence of K222 in human chromosomes using DNA from human/rodent cell hybrids (Figure 11A), each one harboring one human chromosome. K222 integration was detected by PCR in human chromosomes 1, 7, 12, 13, 14, 15, 18, 21, and 22, but not in any other human chromosome. Interestingly, all centromeres found to harbor K222 also have K111. These DNA were prepared in an outside laboratory, and the possibility of DNA contamination was ruled out by detection of chromosome-specific genes [10]. We next studied whether K222 exists in one or multiple copies in each human chromosome by real-time PCR quantitation. The quantitative assay corroborated the detection of K222 in nine human chromosomes (see above). The quantitative assay also revealed that K222 is likely present in approximately a single copy in chromosome 1, 18, 21, 22, and possibly chromosome 11. K222 exists, however, as several copies in chromosomes 7, 12, 13, 14, and 15 (Figure 11B). Phylogenetic analysis shows K222 sequences clustered in three separate groups: K222 residing in chromosome 13, another K222 group residing in chromosomes 12, 14, and 22, and the K222s found in chromosomes 1, 7, 18, 15, and 21 (Figure 12A). Sequencing of K222 integration (with the primers K222F and K222bR) shows nucleotide differences between K222 proviral sequences found in specific human chromosomes (Figure 12B).

Detection of K222 in human chromosomes. (A) K222 was detected by PCR using the set of primers K222F and K222bR in DNA from human/rodent hybrid cell lines, which carry only one specific human chromosome. K222 was found in chromosomes 1, 7, 12, 13, 14, 15, 18, 21, and 22. Other bands (for example the PCR products detected in chromosomes 17, 19, 20, X, and Y) were shown by sequencing to be the result of non-specific PCR amplification. (B) Quantitation of K222 copies by qPCR in human chromosomes. The number of K222 copies was calculated from 250 ng of DNA from human/rodent cell lines. Assuming that human cells have between 8 and 61 K222 copies, we could estimate that about one copy of K222 is present in chromosomes 1, 18, 21, 22, and perhaps more than one in chromosome 12. Several copies of K222, however, exist in chromosomes 7, 13, 14, and 15.

Evolution of K222 in human chromosomes. (A) Phylogenetic neighbor-joining tree of K222 integration sequences amplified from the single human chromosomes grown in human/rodent cell hybrids. The tree was constructed in the same way as was the tree in Figure 9. (B) Nucleotide sequence alignment of K222 insertion sequences amplified from single human chromosomes. The sequences are compared to the K222 insertion sequence amplified from the genome of the H9 cell line. Dots indicate nucleotide similarities to the H9 sequence. Nucleotide substitutions and insertions are indicated in letters.

We next investigated whether K222 sequences detected by PCR can be detected by another methodology: deep sequencing and bioinformatics analysis of human DNA samples. We screened for K222 in HERV-K (HML-2)-enriched DNA libraries prepared from splenic fibroblasts and adjacent malignant lymphocytes from a patient with large B-cell lymphoma [10]. We searched for sequence similarities to K222 5′ integration: this is defined as having at least 20 bp of 5′ flanking sequence pCER:D22Z8, the junction sequence ACATATACCCAGT, and 20 bp of the adjacent K222 provirus. We screened for all K222 insertion sequences amplified in nine human chromosomes. Using this independent approach, we detected hundreds of reads of identical K222 integrations in these human DNA libraries (data not shown), confirming the existence of multiple K222 in humans. We detected sequence reads identical to K222 sequences that clustered in three distinct phylogenetic K222 groups, confirming the observations made above using PCR. We further detected several K222 insertion sequences when screening the raw data of sequence read archive (SRA) libraries generated by deep-sequencing studies of human DNA (data not shown). As we noted earlier, we also detected several K222-related integrations in WGS libraries.
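The read-level criterion described above (junction sequence with at least 20 bp on either side) can be approximated with a simple filter. This is a minimal sketch: it only checks that the junction ACATATACCCAGT occurs with enough sequence on each side, and it does not verify that the flanks actually match pCER:D22Z8 or the K222 provirus, since those reference sequences are not given here. The function name is hypothetical, and reverse-complement reads are not handled.

```python
JUNCTION = "ACATATACCCAGT"  # junction sequence quoted in the text
MIN_FLANK = 20              # minimum bases required on each side


def has_junction_with_flanks(read: str, junction: str = JUNCTION,
                             min_flank: int = MIN_FLANK) -> bool:
    """True if the read contains the junction with >= min_flank bases on both sides."""
    read = read.upper()
    start = read.find(junction)
    while start != -1:
        upstream = start                                  # bases before the junction
        downstream = len(read) - (start + len(junction))  # bases after the junction
        if upstream >= min_flank and downstream >= min_flank:
            return True
        start = read.find(junction, start + 1)            # look for later occurrences
    return False


# Toy reads with placeholder flanks (not real pCER:D22Z8 or K222 sequence).
reads = ["G" * 25 + JUNCTION + "T" * 30, "G" * 5 + JUNCTION + "T" * 30]
candidates = [r for r in reads if has_junction_with_flanks(r)]
print(len(candidates))
```

In practice the surviving reads would still need to be aligned against the flanking element and the provirus to confirm identity.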

Location of K222 in the chromosome

We next addressed whether K222 resides in the core or the periphery of the centromere by using chromatin immunoprecipitation (ChIP) studies. The histone 3 variant centromere protein A (CENPA) [32] and the centromere protein B (CENPB) [33] are both binding proteins specific to the centromere core; the histone post-translational modification mark H3K9Me3 is a hallmark of the pericentromeric domain [34]. We immunoprecipitated CENPA, CENPB, and the H3K9Me3 mark in chromatin extracts from HeLa cells using specific antibodies, and K222 integration linked to these centromere marks was then quantitated by qPCR. We did not find enrichment of K222 in CENPA and CENPB immunoprecipitated fractions (Figure 13A), while an enrichment of the positive control for centromeric DNA, the 11-mer alphoid repeat of chromosome 21 (alphoid Chr.21), was found as previously reported ([10,35]; Figure 13B). As expected, antibodies specific to CENPA and CENPB did not enrich the 5S ribosomal DNA gene (used as a negative control), which is found in the q arm of chromosome 1 (Figure 13C). Immunoprecipitation of the H3K9Me3 histone mark, however, which is found abundantly in pericentromeric regions [34], yielded a marked enrichment of K222 (approximately 50-fold change) and the endogenous alphoid Chr.21 repeat (approximately 650-fold change), but did not significantly enrich the 5S ribosomal DNA (Figure 13A to C). These results strongly suggest that K222 sequences reside in the pericentromeric domains of the centromere, and not in the centromeric core. Although we cannot rule out the possibility that K222 exists in other areas of the genome, it appears that at least the vast majority of K222 reside in the pericentromere.
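Fold enrichment over a control IgG fraction is commonly derived from qPCR threshold cycles. The sketch below shows one conventional way to do this, assuming equal input, a ΔCt calculation, and near-perfect amplification efficiency; the text does not state which calculation was actually used, and the Ct values are hypothetical.

```python
def fold_enrichment(ct_ip: float, ct_igg: float, efficiency: float = 2.0) -> float:
    """Fold enrichment of a target in an IP fraction relative to control IgG.

    ct_ip / ct_igg: qPCR threshold cycles for the specific antibody and IgG fractions
    efficiency: amplification factor per cycle (2.0 assumes 100% efficiency)
    """
    # A lower Ct in the IP fraction means more template, hence higher enrichment.
    return efficiency ** (ct_igg - ct_ip)


# Hypothetical Ct values: the IP fraction crosses threshold 3 cycles earlier than IgG.
print(fold_enrichment(ct_ip=25.0, ct_igg=28.0))
```

Each additional cycle of difference doubles the estimated enrichment under this assumption, so small Ct errors translate into large fold-change differences.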

ChIP analysis shows that K222 proviruses are found in pericentromeric regions. Quantitative PCR of K222 DNA, the centromeric 11-mer alphoid repeat of chromosome 21 (alphoidChr.21) DNA, and 5S ribosomal DNA immunoprecipitated by antibodies to CENPA, CENPB, H3K9Me3, or control IgG. (A) Compared to the control IgG fraction, K222 is enriched 50-fold in the H3K9Me3 fraction, but not in the centromeric CENPA and CENPB protein fractions. (B) The positive control, the alphoid Chr.21 , is enriched approximately 8-fold in each of the CENPA and CENPB fractions, and approximately 650-fold in the H3K9Me3 fraction. (C) The negative control, 5S ribosomal DNA present in the q arm of chromosome 1, shows no significant enrichment with antibodies to CENPA, CENPB, or H3K9Me3. Graphs show the relative enrichment normalized to control IgG-precipitated fractions from three independent experiments. Asterisks indicate statistical significance: *** = P <0.001, ** = P <0.01, * = P <0.05, n.s = not significant.

ERVs in Germ Cells and Pre-Implantation Embryos

Certain stages of mammalian pre-implantation embryo and germ cell development characterized by multiple waves of epigenetic reprogramming pose a unique challenge for the control of endogenous retroviral activity. During the two waves of epigenetic reprogramming that occur in primordial germ cells (PGCs) and fertilized oocytes, a considerable amount of DNA demethylation occurs. Examination of global DNA methylation at these stages has shown that levels within human and mouse pre-implantation embryos decrease beginning at the 1- to 2-cell stage, depending on the species, and up to or soon after the blastocyst stage (Kobayashi et al., 2012; Guo et al., 2014; Lee et al., 2014; Okae et al., 2014; Wang L. et al., 2014). Since DNA methylation is largely responsible for repression of many transposable elements, including ERVs (Walsh et al., 1998), the activity of ERVs and the alternative mechanisms repressing ERV activation during these periods of global hypomethylation have been the focus of a number of recent investigations.

Given that some ERV families have expanded substantially in the number of genomic integrations in animals (Tristem, 2000; Bénit et al., 2001), it has been hypothesized that widespread reactivation of ERVs during the waves of global reprogramming within germ cell and pre-implantation development is largely responsible for this expansion. On the other hand, it is also known that additional ERV repressive mechanisms must be in place in order to maintain genomic stability throughout epigenetic reprogramming and the highly choreographed molecular processes required for normal germ cell development, fertilization, and embryonic development. These ideas are not mutually exclusive, as there is substantial evidence supporting both reactivation (Fuchs et al., 2013; Wang J. et al., 2014; Grow et al., 2015) and alternative repression (Thomas and Schneider, 2011; Manghera and Douville, 2013; Leung et al., 2014; Liu et al., 2014; Schlesinger and Goff, 2015; Wolf et al., 2015; Thompson et al., 2016) across the vast number and variety of ERVs within the genome during germ cell development and embryogenesis.

Despite the existence of elaborate mechanisms that mediate ERV inactivation within the genome, there is extensive evidence that some ERVs are still active and play an important role during gametogenesis and pre-implantation development. Upregulation of ERV proviral transcription and protein expression has been well documented in early human embryos and embryonic stem cells (hESCs). For example, elevated expression of the ERV-H family has been observed within both naïve-like and primed hESC sub-populations (Wang J. et al., 2014; Theunissen et al., 2016; Supplementary Table 1). Additional transcripts from the ERV-K (HML-2) family are also observed at high levels within hESCs and rapidly decrease upon differentiation (Fuchs et al., 2013). Expression of ERV-K begins at the 8-cell stage, concurrent with embryonic genome activation (EGA), and continues throughout pre-implantation development into the blastocyst stage. A majority of actively transcribed ERV-K loci during this time are associated with LTR5HS, a specific subclass of LTR, which is confined to human and chimpanzee and contains an OCT4 binding motif. The LTR5HS subclass requires both hypomethylation and OCT4 binding for transcriptional activation, which together facilitate ERV-K expression (Grow et al., 2015; Supplementary Table 1). Based on the elevated activity of these ERVs within hESCs and pre-implantation embryos, as well as their known interactions with other cellular factors during this time, it is thought that these ERVs have been functionally incorporated into roles important for defining and maintaining pluripotent specific states.

The role of LTRs as regulatory regions for proviral DNA represents an additional function that can be utilized by or incorporated into host genomes. In particular, LTRs are known to be co-opted as promoters or enhancer elements of nearby genes important during embryonic development and maintenance of pluripotency (Friedli and Trono, 2015). Approximately 33% of all transcripts in human embryonic tissues are associated with repetitive elements, suggesting a clear pattern of embryonic cell specificity for viral promoters (Fort et al., 2014). Many transcripts detected in the totipotent blastomeres of mouse 2-cell embryos are initiated from LTRs upon EGA as well, indicating that these repeat sequences may help drive cell-fate regulation in mammals (MacFarlan et al., 2012). Regulatory activities of certain LTRs have also been shown to provide important functions not only in embryonic cells, but also within germ cells during gametogenesis. For example, germline-specific transactivating p63 (GTAp63), a member of the p53 family and a transcript important for maintaining genetic fidelity in the human male germline, is under the transcriptional control of ERV9 LTR (Ling et al., 2002; Beyer et al., 2011; Liu and Eiden, 2011; Supplementary Table 1). Transcriptionally active GTAp63 suppresses proliferation and induces apoptosis upon DNA damage in healthy testis and is frequently lost in human testicular cancers. Restoration of GTAp63 expression levels in cancer cells was observed upon treatment with a histone deacetylase (HDAC) inhibitor, indicating possible epigenetic control of ERV9-mediated GTAp63 expression via activating histone acetylation marks. Thus, the ability of ERV9 regulatory regions to contribute to the maintenance of male germline stability is yet another example of how ERVs have evolved to serve an important function in their human hosts (Liu and Eiden, 2011).

Retrovirus Receptor Interactions and Entry

Lorraine M. Albritton, in Retrovirus-Cell Interactions, 2018

Envelope Protein–Driven Neoplasia in Betaretrovirus Infection

The ovine betaretroviruses JSRV and ENTV are the causative agents of ovine pulmonary adenocarcinoma (York et al., 1992; Palmarini et al., 1999; DeMartini et al., 2001) and enzootic nasal adenocarcinoma (Walsh et al., 2013), respectively. These neoplasias result from Env-induced cellular transformation (Allen et al., 2002; Chow et al., 2003; Hull and Fan, 2006; Dirks et al., 2002; Alberti et al., 2002). The transformation process is best understood for JSRV. Env interactions with HyaL2, the shared receptor for JSRV and ENTV, have no apparent role in this neoplasia (Chow et al., 2003). Instead it results from interactions of TM, and specifically of its cytoplasmic tail, with nonreceptor cell proteins.

Studies using the yeast two-hybrid system identified two cellular proteins that coimmunoprecipitate with JSRV Env. The cellular RalA-binding protein-1 (RALBP1), a transcription suppressor in Ras family signaling, was identified using the CT domain of Env as bait and a human HeLa cell library as prey (Monot et al., 2015). Independently, the cellular Kruppel family zinc finger protein 111 (Zfp111) was identified in a screen using full-length Env as bait and mouse liver cDNAs as prey (Hsu et al., 2015). RALBP1 expression is repressed in JSRV-induced lung tumor cells, but in untransformed MDCK cells RALBP1 colocalized with Env in the ER and at the plasma membrane, and RALBP1–Env complexes coimmunoprecipitated (Monot et al., 2015). In contrast, the Env bound to Zfp111 was not glycosylated and the complexes localized to the nucleus (Monot et al., 2015). It is possible that both interactions occur in infected lung cells and cooperate in transformation, with reduced RALBP1 expression providing opportunities for Zfp111–glycosylation-negative Env complexes to form early in Env synthesis, which could divert some of the Env mRNAs from the normal route of membrane translation in the ER to wholly cytoplasmic translation.

Materials and Methods

Generation of Cas9-GFP mouse NPC cultures

All animal-related procedures were approved by and conducted in accordance with the committee for use of laboratory animals at Lund University.

The forebrain was dissected on embryonic day 13.5 from embryos obtained by breeding homozygous Cas9-GFP knock-in mice (Platt et al, 2014). The tissue was mechanically dissociated, plated in gelatin-coated flasks, and maintained as a monolayer culture (Conti et al, 2005) in NSA medium (Euromed, Euroclone) supplemented with N2 hormone mix, EGF (20 ng/ml; Gibco), bFGF (20 ng/ml; Gibco), 2 nM L-glutamine and 100 µg/ml Pen/Strep. Cells were then passaged 1:3–1:6 every 2–3 days using Accutase (Gibco).

Targeting Trim28 in vitro

Guides were designed at and are listed in the Appendix. Lentiviruses were produced according to Zufferey et al (1997), and titers were 10^9 TU/ml, as determined by qRT–PCR. Cas9 mouse NPCs were transduced at an MOI of 40 and allowed to expand for 10 days prior to FACS (FACSAria, BD Biosciences). Cells were detached, resuspended in basic culture media (media excluding growth factors) with propidium iodide (BD Biosciences), and strained (70 µm filters, BD Biosciences). RFP cells were FACS isolated at 4°C (reanalysis showed > 99% purity), pelleted at 400 g for 5 min, snap frozen on dry ice and stored at −80°C until RNA/DNA were isolated. All groups were performed in biological triplicates.
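The relationship between titer, MOI, and the volume of vector to add is simple arithmetic, sketched below. The MOI of 40 and the titer of 10^9 TU/ml are the values stated above; the cell number is a hypothetical example, and the function name is introduced only for illustration.

```python
def vector_volume_ul(cell_count: float, moi: float, titer_tu_per_ml: float) -> float:
    """Volume of vector stock (in µl) needed to reach a given MOI.

    MOI = transducing units added / number of cells,
    so TU needed = cell_count * moi, and volume = TU needed / titer.
    """
    if titer_tu_per_ml <= 0:
        raise ValueError("titer must be positive")
    volume_ml = cell_count * moi / titer_tu_per_ml
    return volume_ml * 1000.0  # convert ml to µl


# Hypothetical plating density of 100,000 cells; MOI and titer as stated in the text.
print(vector_volume_ul(cell_count=100_000, moi=40, titer_tu_per_ml=1e9))
```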

Targeting Trim28 in vivo using CRISPR/Cas9 in the adult brain

All animal-related procedures were approved by and conducted in accordance with the committee for use of laboratory animals at Lund University.

The production of AAV5 vectors has been described in detail elsewhere (Ulusoy et al, 2009), and titers were in the order of 10^13 TU/ml, as determined by qRT–PCR using TaqMan primers toward the ITR. Prior to injection, the vectors were diluted in PBS; the vectors containing the guide RNAs were diluted to 30%, except upon co-injection of guides 3, 4, and 13, where the vectors were diluted to 10% each. Rosa26 Cas9 knock-in mice were anesthetized by isoflurane prior to the intra-striatal injections (coordinates from bregma: AP + 0.9 mm, ML + 1.8 mm, DV −2.7 mm) of 1 μl virus solution (0.1 µl / 15 s). The needle was kept in place for an additional 2 min post-injection to avoid backflow. Animals were sacrificed after 2 months and analyzed either by IHC or nuclei isolation (see details below) followed by DNA- or RNA-seq.

Targeting Trim28 in vivo during neural development

Male Emx1-Cre (+/−) Trim28-flox (+/−) gtRosa (+/−) mice were bred with Trim28-flox (+/+) females to generate animals in which one (Emx1-Cre +/− Trim28-flox +/−) or both (Emx1-Cre +/− Trim28-flox +/+) Trim28 alleles had been excised, used as control and Trim28-KO, respectively. Animals used for IHC were additionally heterozygous for gtRosa, in order to visualize the cells in which Cre had been expressed. Animals were genotyped from tail biopsies according to previous protocols (Cammas et al, 2000) and sacrificed at 3 months of age for either IHC or RNA-sequencing.


Mice were given a lethal dose of phenobarbiturate and transcardially perfused with 4% paraformaldehyde (PFA, Sigma). The brains were post-fixed for 2 h and transferred to phosphate buffered saline (PBS) with 25% sucrose. Brains were coronally sectioned on a microtome (30 µm) and put in KPBS. IHC was performed as described in detail elsewhere (Sachdeva et al, 2010). Antibodies: Trim28 (Millipore, MAB3662, 1:500), Trim28 (Abcam, ab10484, 1:1,000), NeuN (Sigma-Aldrich, MAB377, 1:1,000), IAP-Gag (a kind gift from Bryan Cullen, described in Dewannieux et al, 2004; 1:2,000), Iba1 (WAKO, no. 019-19741, 1:1,000). All sections were counterstained with 4',6-diamidino-2-phenylindole (DAPI, Sigma-Aldrich, 1:1,000). Secondary antibodies from Jackson Laboratories were used at 1:400.

Nuclei isolation

Animals were sacrificed by cervical dislocation and brains quickly removed. The desired regions were dissected, snap frozen on dry ice, and stored at −80°C. The nuclei isolation was performed according to Sodersten et al (2018). In brief, the tissue was thawed and dissociated in ice-cold lysis buffer (0.32 M sucrose, 5 mM CaCl2, 3 mM MgAc, 0.1 mM Na2EDTA, 10 mM Tris–HCl, pH 8.0, 1 mM DTT) using a 1 ml tissue douncer (Wheaton). The homogenate was carefully layered on top of a sucrose cushion (1.8 M sucrose, 3 mM MgAc, 10 mM Tris–HCl, pH 8.0, and 1 mM DTT) before centrifugation at 30,000 ×g for 2 h and 15 min. Pelleted nuclei were softened for 10 min in 100 μl of nuclear storage buffer (15% sucrose, 10 mM Tris–HCl, pH 7.2, 70 mM KCl, and 2 mM MgCl2) before being resuspended in 300 μl of dilution buffer (10 mM Tris–HCl, pH 7.2, 70 mM KCl, and 2 mM MgCl2) and run through a cell strainer (70 μm). Nuclei were run through the FACS (FACS Aria, BD Biosciences) at 4°C with a low flow rate using a 100 μm nozzle (reanalysis showed > 99% purity). Sorted nuclei intended for either DNA or RNA-sequencing were pelleted at 1,300 ×g for 15 min and snap frozen, while nuclei intended for single-nuclei RNA-sequencing were directly loaded onto the 10× Genomics Single Cell 3′ Chip (see Single-nuclei RNA-sequencing).

Analysis of CRISPR/Cas9-mediated Trim28-indels

Total genomic DNA was extracted from all Trim28-KO and control groups using the DNeasy Blood and Tissue kit (Qiagen), and a 1.5 kb fragment surrounding the different target sequences was amplified by PCR (see Table EV3 and EV4 for target and primer sequences, respectively) before being subjected to NexteraXT fragmentation, according to the manufacturer's recommendations. Indexed tagmentation libraries were sequenced with 2 × 150 bp PE reads and analyzed using an in-house TIGERq pipeline to evaluate CRISPR/Cas9 editing efficiency.


RNA-sequencing

Total RNA was isolated from frozen cell/nuclei pellets and brain tissue using the RNeasy Mini Kit (Qiagen) and used for RNA-seq (tissue pieces were run in the tissue lyser for 2 min, 30 Hz, prior to RNA isolation). Libraries were generated using the Illumina TruSeq Stranded mRNA library prep kit (poly-A selection) and sequenced on a NextSeq500 (PE 2 × 150 bp).

The reads were mapped with STAR (2.6.0c) (Dobin et al, 2013), using gencode mouse annotation GRCm38.p6 vM19 as a guide. Reads were allowed to map to 100 loci with 200 anchors, as recommended by Jin et al (2015) for running TEtranscripts.

Read quantification was performed with TEtranscripts version 2.0.3 in multimode using gencode annotation GRCm38.p6 vM19 for gene annotation, as well as the curated GTF file of TEs provided by the TEtranscripts authors (Jin et al, 2015). This file differs from the RepeatMasker annotation in that it excludes simple repeats, rRNAs, scRNAs, snRNAs, srpRNAs, and tRNAs. The output matrix was then divided between TE subfamilies and genes to perform differential expression analysis (DEA) with DESeq2 (version 1.22.2) (Love et al, 2014), contrasting Trim28-KO against control samples. DESeq2 fits a generalized linear model assuming a negative binomial distribution, using the condition of a sample and the normalized values of a gene. The resulting coefficients are tested between conditions using a Wald test. P values are then adjusted using the Benjamini and Hochberg correction. For more information about the package methods, see Love et al (2014).
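The Benjamini and Hochberg adjustment mentioned above can be illustrated with a minimal sketch. This is a plain reimplementation of the textbook procedure, not DESeq2's own code (DESeq2 additionally applies independent filtering, which is omitted here), and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five TE subfamilies.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(raw)
```

Note that the third and fourth adjusted values end up identical, because the procedure never lets a smaller raw p-value receive a larger adjusted value than the one ranked after it.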

We report TE subfamilies as significantly different if their adjusted P value is below 0.05 and the absolute value of their log2 fold change is higher than 0.5.
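As a small sketch of this significance rule, the filter below applies both thresholds to a results table. The subfamily names and values are hypothetical placeholders, not results from the study, and the function name is introduced here for illustration only.

```python
def is_significant(padj, log2fc, alpha=0.05, min_abs_lfc=0.5):
    """Apply the reporting rule: padj < alpha and |log2FC| > min_abs_lfc."""
    if padj is None:  # DESeq2 can report NA for filtered features
        return False
    return padj < alpha and abs(log2fc) > min_abs_lfc


# Hypothetical (padj, log2 fold change) pairs per TE subfamily.
results = {
    "TE_A": (1e-6, 2.3),   # passes both thresholds
    "TE_B": (0.20, 0.9),   # fails on adjusted P value
    "TE_C": (0.01, 0.3),   # fails on effect size
    "TE_D": (0.03, -0.8),  # passes; direction does not matter
    "TE_E": (None, 1.5),   # no adjusted P value available
}
significant = sorted(name for name, (p, lfc) in results.items()
                     if is_significant(p, lfc))
```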

To show the expression levels per condition, samples from the different guides targeting Trim28 were pooled together and tested against the LacZ controls. The data were normalized using sizeFactors from the DESeq2 object (median ratio method described in Anders & Huber, 2010) to account for any differences in sequencing depth.
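The median ratio method can be sketched in a few lines. This is an illustrative reimplementation of the idea in Anders & Huber (2010), not the DESeq2 code itself: each sample's size factor is the median, over genes, of its count divided by the gene's geometric mean across samples. The toy count matrix is an assumption for demonstration.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: list of gene rows, each a list of raw counts per sample."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Genes with a zero count have no finite geometric mean; skip them.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Toy matrix: sample 2 was sequenced exactly twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
sf = size_factors(toy_counts)
normalized = [[c / sf[j] for j, c in enumerate(row)] for row in toy_counts]
```

Dividing each raw count by its sample's size factor puts the two samples on a common scale, so the first three genes become equal across samples after normalization.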

In order to define differentially expressed elements and study their effects on gene expression, reads were uniquely mapped with STAR (2.6.0c). Full length mouse ERV predictions were done using the RetroTector software (Sperber et al, 2007), and their read quantification was performed using featureCounts (Subread 1.6.3) (Liao et al, 2014). Differential expression analysis (DEA) was done with DESeq2. An intersection of the gencode annotation GRCm38.p6 vM19 with windows of 10, 20, and 50 kbp up- and downstream of the upregulated elements was made with BEDtools intersect (Quinlan & Hall, 2010). The same intersection was done for non-upregulated elements to compare their nearby gene dysregulation.
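The window-based intersection can be pictured with the following sketch, which extends an element by a flanking distance and lists the genes that overlap the extended interval. It is a simplified stand-in for the BEDtools step, assuming half-open BED-style coordinates; the coordinates and gene names are hypothetical.

```python
def genes_near(element, genes, window):
    """Names of genes overlapping the element extended by `window` bp per side."""
    chrom, start, end = element
    lo, hi = max(0, start - window), end + window
    return [name for (g_chrom, g_start, g_end, name) in genes
            if g_chrom == chrom and g_start < hi and g_end > lo]


# Hypothetical upregulated ERV and nearby gene annotations.
erv = ("chr1", 100_000, 106_000)
annotation = [
    ("chr1", 90_000, 95_000, "geneA"),    # 5 kbp upstream
    ("chr1", 130_000, 140_000, "geneB"),  # 24 kbp downstream
    ("chr2", 100_000, 101_000, "geneC"),  # different chromosome
]
nearby = {w: genes_near(erv, annotation, w) for w in (10_000, 20_000, 50_000)}
```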

Bigwig files were normalized by RPKM using bamCoverage from deepTools and uploaded to the UCSC Genome Browser (release GCF_000001635.25_GRCm38.p5 (2017-08-04)).
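For reference, RPKM scales a read count by both sequencing depth and region length. The sketch below uses the standard definition (reads per kilobase per million mapped reads); the bin size and read numbers are arbitrary illustrative values, not settings taken from the study.

```python
def rpkm(reads_in_bin, total_mapped_reads, bin_length_bp):
    """Reads per kilobase of bin per million mapped reads."""
    return reads_in_bin * 1e9 / (total_mapped_reads * bin_length_bp)


# A 50 bp bin holding 50 reads in a library of 20 million mapped reads.
shallow = rpkm(50, 20_000_000, 50)
# Doubling both the depth and the reads in the bin leaves RPKM unchanged.
deep = rpkm(100, 40_000_000, 50)
```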

Differential gene expression analysis was performed using DESeq2. Up- and downregulated genes (P-adj < 0.05, log2FC > 0.5) were used to test for GO terms overrepresentation using the web-based tool PANTHER (Mi et al, 2019). We tested for overrepresentation of terms in their GO-Slim biological process dataset using Fisher’s exact test with false discovery rates. Terms shown in main figures were those with more than four genes among the group of genes we were testing (up or downregulated), with an absolute log2 fold change value higher than 0.5 and a false discovery rate less than 0.05.

Single-nuclei RNA-sequencing

Nuclei were isolated from the cortex of Emx1-Cre(+/−)/Trim28-flox(+/−, +/+) animals (Ctl n = 2, KO n = 2) as described above. 8,500 nuclei per sample were sorted via FACS and loaded onto 10× Genomics Single Cell 3′ Chip along with the Reverse Transcription Master Mix following the manufacturer’s protocol for the Chromium Single Cell 3′ Library (10× Genomics, PN-120233) to generate single cell gel beads in emulsion. cDNA amplification was done as per the guidelines from 10× Genomics, and sequencing libraries were generated with unique sample indices (SI) for each sample. Libraries for samples were multiplexed and sequenced on a Novaseq using a 150-cycle kit.

The raw base calls were demultiplexed and converted to sample-specific fastq files using cellranger mkfastq, which uses the bcl2fastq program provided by Illumina. The default settings for bcl2fastq were used, allowing 1 mismatch in the index, and the raw quality of reads was checked using the FastQC and MultiQC tools. For each sample, fastq files were processed independently using the cellranger count version 3.0 pipeline (default settings). This pipeline uses the splice-aware program STAR to map cDNA reads to the transcriptome (mm10). Since a higher fraction of pre-mRNA is expected in nuclei samples, a pre-mRNA reference was generated following the cellranger guidelines.

Mapped reads were classified as exonic, intronic, or intergenic: a read is exonic if at least 50% of it intersects with an exon, intronic if it is non-exonic and intersects with an intron, and intergenic otherwise. Only exonic reads that uniquely mapped to the transcriptome (and on the same strand) were used for the downstream analysis.
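The classification rule can be written out as a short sketch. It treats each read as a single unspliced interval and assumes non-overlapping exon intervals in half-open coordinates, which is a simplification of what cellranger does internally; the function names and coordinates are illustrative.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def classify_read(read, exons, introns):
    """Label a read as 'exonic', 'intronic', or 'intergenic'."""
    start, end = read
    exonic_bases = sum(overlap(start, end, s, e) for s, e in exons)
    if exonic_bases >= 0.5 * (end - start):
        return "exonic"
    if any(overlap(start, end, s, e) > 0 for s, e in introns):
        return "intronic"
    return "intergenic"


# Hypothetical gene model: two exons separated by one intron.
exons = [(100, 200), (300, 400)]
introns = [(200, 300)]
labels = [classify_read(r, exons, introns)
          for r in [(150, 250), (180, 280), (500, 600)]]
```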

Low-quality cells and genes were filtered out based on fraction of total number of genes detected in each cell (±3 nmads). From the remaining 16,671 nuclei, 6,472 came from control samples (Ctl) and 7,199 from knockout (KO).
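A filter of the "±3 nmads" kind flags values that lie more than three median absolute deviations from the median. The sketch below is one common way to do this; the scaling constant 1.4826 (which makes the MAD comparable to a standard deviation, as in R's `mad`) is an assumption, since the text does not state which convention was used, and the gene counts are invented.

```python
from statistics import median


def mad_outliers(values, nmads=3.0, scale=1.4826):
    """Return a list of booleans marking values outside median ± nmads * MAD."""
    med = median(values)
    mad = scale * median([abs(v - med) for v in values])
    lo, hi = med - nmads * mad, med + nmads * mad
    return [not (lo <= v <= hi) for v in values]


# Hypothetical number of detected genes per nucleus.
genes_detected = [2000, 2100, 1900, 2050, 1950, 200, 6000]
flags = mad_outliers(genes_detected)
kept = [g for g, bad in zip(genes_detected, flags) if not bad]
```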

For downstream analysis, samples were merged together using the Seurat (version 3) R package (Dobin et al, 2013). Clusters were defined with the Seurat function FindClusters using resolution 0.1 and visualized with UMAP plots. Cell type annotation was performed using both known marker-based expression per cluster and a comparison with the expression profiles of a mouse brain atlas (Zeisel et al, 2018). A marker gene set consisting of upregulated genes per cluster among the cells, combined with marker genes for all the 256 cell types in the atlas, was used in the comparison. The 256 atlas cell types were grouped into main clusters at Taxonomy rank 4 (39 groups), and mean expression per group was calculated using the marker gene set. These were compared to the mean expression in our clusters using Spearman correlation. Based on cluster annotation, clusters 0, 1, 2, and 5 were manually merged as excitatory neurons, and clusters 6 and 7 as inhibitory neurons. For each cell type, differential expression between knockout and control samples was carried out using the Seurat function FindMarkers (Wilcox test, P adjusted < 0.01).
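The atlas comparison step boils down to correlating a cluster's mean expression profile with each atlas group's profile over the same marker genes and picking the best match. Below is a minimal sketch of that idea with a hand-written Spearman correlation (average ranks for ties); the profiles and group names are hypothetical, and the actual analysis was done in R.

```python
import math


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical mean expression over four marker genes.
cluster_profile = [5.0, 0.1, 3.0, 0.0]
atlas_groups = {"groupA": [50, 2, 20, 1], "groupB": [1, 40, 2, 30]}
best = max(atlas_groups, key=lambda g: spearman(cluster_profile, atlas_groups[g]))
```

Because Spearman correlation only uses ranks, it is insensitive to the different expression scales of the two datasets, which is presumably why it suits a cross-dataset comparison like this one.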

The expression of transposable elements was analyzed by extracting cell barcodes for all clusters using the Seurat function WhichCells, and the original .bam files obtained from the cellranger pipeline were used to subset aligned files for each cluster (subset-bam tool provided by 10×). Each .bam file was then converted back to per-cluster fastq files using the bamtofastq tool from 10×.

The resulting fastq files were mapped using default parameters in STAR using gencode mouse annotation GRCm38.p6 vM19 as a guide. The resulting bam files were used to quantify reads mapping to genes with featureCounts (forward strandness). The output matrix was then used to calculate sizeFactors with DESeq2 that would later be used to normalize TE counts.

The cluster fastq files were also mapped allowing for 100 loci and 200 anchors, as recommended by TEtranscripts authors. Read quantification was then performed with TEtranscripts in multimode (forward strand) using GRCm38.p6 vM19 for the gene annotation, and a curated GTF file of TEs given by TEtranscripts’ authors. For further details, see the RNA-sequencing paragraph.

For the data presented in Fig 5B, the fold change bar plots were made from a DEA performed with DESeq2 of TE subfamilies of each cell type comparing control and knockout samples, for further details see the RNA-sequencing paragraph. The mean plots in the same figure were normalized using the sizeFactors resulting from the gene quantification with the default parameters’ mapping.


CUT&RUN was performed according to Skene & Henikoff (2017). In brief, 200,000 mouse NPCs were washed, permeabilized, and attached to ConA-coated magnetic beads (Bangs Laboratories) before being incubated with the H3K9me3 antibody (1:50, ab8898, Abcam) at 4°C overnight. Cells were washed and incubated with pA-MNase fusion protein, and digestion was activated by adding CaCl2 at 0°C. The digestion was stopped after 30 min, and the target chromatin was released from the insoluble nuclear chromatin before DNA extraction. Experimental success was evaluated by capillary electrophoresis (Agilent), based on the presence of nucleosome ladders for H3K9me3 but not for IgG controls.

The library preparation was performed using the Hyper prep kit (KAPA biosystems) and sequenced on NextSeq500 2 × 75 bp. Mapping of the reads to mm10 was performed with Bowtie2 (Langmead & Salzberg, 2012 ) using default settings for local alignment. Multi-mapper reads were filtered by SAMtools view version 1.4 (Li et al, 2009 ).

Using the ERVK prediction described in the section RNA-sequencing, we retrieved full length MMERVK10Cs. An ERVK was considered to be a full length MMERVK10C when an annotated MMERVK10C-int of mm10 RepeatMasker annotation (open-4.0.5—Repeat Library 20140131) would overlap more than 50% into the full length ERVK prediction. The intersection was performed with BEDtools intersect 2.26.0 (-f 0.5) (Quinlan & Hall, 2010 ). Heatmaps and profile plots were produced using deepTools’ plotHeatmap (Ramirez et al, 2016 ) and sorted using maximum expression of the Trim28-KO samples or guide 3 for the in vitro and in vivo CRISPRs. Tracks for genome browser were normalized using RPKM using deepTools’ bamCoverage (version 2.4.3) (Ramirez et al, 2016 ).

The H3K9me3 ChIP-seq data from adult mouse cortex were retrieved from Jiang et al (2017), mapped, and analyzed in the same way as the in-house CUT&RUN samples described above.


Cortical brain pieces were disrupted in a tissue lyser (2 min, 30 Hz, 4°C) prior to RNA isolation using an RNeasy Mini Kit (Qiagen). cDNA was synthesized by the Maxima First-Strand cDNA Synthesis Kit (Invitrogen) and analyzed with SYBR Green I master (Roche) on a LightCycler 480 (Roche). Data are represented with the ΔΔCt method normalized to the housekeeping genes Gapdh and β-actin. Primers are listed in Table EV4.
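The ΔΔCt calculation can be sketched as follows. The sketch assumes that the two housekeeping genes are combined by averaging their Ct values and that amplification efficiency is 100% (a doubling per cycle); neither detail is stated in the text, and the Ct values are invented for illustration.

```python
def delta_ct(target_ct, reference_cts):
    """Ct of the target minus the mean Ct of the housekeeping genes."""
    return target_ct - sum(reference_cts) / len(reference_cts)


def fold_change(sample, control):
    """Relative expression 2^(-ddCt) of sample versus control."""
    ddct = delta_ct(*sample) - delta_ct(*control)
    return 2 ** (-ddct)


# Hypothetical Ct values: (target Ct, [Gapdh Ct, beta-actin Ct]).
control = (28.0, [18.0, 20.0])
knockout = (25.0, [18.0, 20.0])
relative_expression = fold_change(knockout, control)
```

Here the target crosses the threshold three cycles earlier in the knockout at unchanged reference levels, which corresponds to an eightfold increase under the stated assumptions.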

Western blot

Dissected cortical pieces from the Emx1-Cre (+/−) Trim28-flox (+/− and +/+) animals (Ctl n = 5, KO n = 5) were put in RIPA buffer (Sigma-Aldrich) containing protease inhibitor cocktail (PIC, Complete, 1:25) and lysed at 4°C using a TissueLyser LT (Qiagen) at 50 Hz for 2 min, twice, and then kept on ice for 30 min before being spun at 10,000 ×g for 10 min at 4°C. Supernatants were collected, transferred to a new tube, and stored at −20°C. Each sample was mixed 1:1 (10 μl + 10 μl) with Laemmli buffer (Bio-Rad) and boiled at 95°C for 5 min before being loaded onto a 4–12% SDS–PAGE gel, run at 200 V, and electrotransferred to a membrane using the Transblot-Turbo Transfer system (Bio-Rad). The membrane was then washed 2 × 15 min in TBS with 0.1% Tween20 (TBST), blocked for 1 h in TBST containing 5% non-fat dry milk, and then incubated at 4°C overnight with the primary antibody diluted in TBST with 5% non-fat dry milk (rabbit anti-Trim28, Abcam ab10484, 1:1,000; rabbit anti-CD68, Abcam ab125212, 1:1,000; rabbit anti-IAP-Gag, 1:10,000, a kind gift from Bryan Cullen and described in Dewannieux et al, 2004). The membrane was washed in TBST 2 × 15 min and incubated for 1 h at room temperature with HRP-conjugated anti-rabbit antibody (Sigma-Aldrich, NA9043, 1:2,500) diluted in TBST with 5% non-fat dry milk. The membrane was washed 2 × 15 min in TBST again and 1 × 15 min in TBS, before the protein expression was revealed by chemiluminescence using Immobilon Western (Millipore) and the signal detected using a ChemiDoc MP system (Bio-Rad). The membrane was stripped by treating it with methanol for 15 s followed by 15 min in TBST before incubating it in stripping buffer (100 mM 2-mercaptoethanol, 2% (w/v) SDS, 62.4 mM Tris–HCl, pH 6.8) for 30 min at 50°C. The membrane was washed in running water for 15 min, followed by 3 × 15 min in TBST, before being blocked for 1 h in TBST containing 5% non-fat dry milk. The membrane was then stained and visualized for β-actin (mouse anti-β-actin HRP, Sigma-Aldrich, A3854, 1:50,000) as described above.

Morphological analysis

The morphology of Iba1+ cells in the Emx1-Cre/Trim28-flox animals (Ctl n = 3, KO n = 2) was analyzed in 2D through an unbiased, automated process using the Cellomics Array Scan (Array Scan VTI, Thermo Fisher). The scanner took a high number of photographs (using a 20× objective) throughout cortex (Ctl n = 361, KO n = 215) and striatum (Ctl n = 104, KO n = 63), and the program “Neuronal profiling” allowed analysis of process length, process area, and branchpoints per cell. Ten photographs of cortex from each animal were randomly selected, and Iba1+ cells were manually counted in a blinded manner and presented as Iba1+ cells per mm^2.

Code availability

The pipeline, configuration files, and downstream analyses are available in the src folder on GitHub. All downstream analysis and visualization were performed in R 3.5.1.

There are no restrictions in data availability. All file names are described in Table EV5, and the accession code for the RNA and DNA sequencing data presented in this study is GSE154196.

HERV proteins in neurodegenerative disease

The discovery of viral proteins in the eroded brains of MS and ALS patients has prompted researchers to investigate the role of HERVs in these diseases. Although this research is becoming more widespread, the mechanisms are still unclear and remain hypothetical.

Multiple sclerosis

The HERV-W envelope protein binds toll-like receptor 4 on microglia, triggering the cells to secrete proinflammatory cytokines (1). At the same time, the protein also inhibits these cells from scavenging myelin debris (2), a mechanism important for rebuilding myelin sheaths that are damaged in MS, and prevents oligodendrocyte precursor cells (OPCs)—which normally help remyelinate damaged axons—from maturing (3). Combined, these two pathways create an inflammatory environment that contributes to the development of lesions in the brain, while also impairing the ability of local cells to repair the damage. Researchers haven’t yet discovered what triggers the production of HERV-W in the first place.

Experiments in mice have shown that activation of the most recently integrated HERV in the human genome, known as HERV-K, in specific regions of the nervous system causes motor neuron deterioration. This could explain the neurodegeneration seen in ALS, although it is still unclear exactly how HERV-K is involved. Researchers speculate that the envelope protein of HERV-K disrupts the machinery in the nucleolus responsible for producing ribosomes, and this in turn results in cell death (1). This process is thought to spread from cell to cell, in accordance with the progressive deterioration seen in ALS, through factors that stimulate the production of the viral envelope protein, through the secretion of the protein (2), or possibly through the spread of HERV-K itself, though there is no evidence that the endogenous virus can behave in this way.

The envelope protein is not the only way that HERVs can mess with the immune system. Even in the protein’s absence, excess HERV RNA and other HERV-derived nucleic acids can trigger the body’s immune response, alerting cellular sensors that detect cytoplasmic DNA, explains Feschotte, who receives funding from GeNeuro. And “when these sensors get overwhelmed, it triggers autoimmune reactions,” he says. “That’s very well characterized.”

A key question that remains unanswered is why these disease processes would occur in only some people. The vast majority of HERV sequences are present in everyone. But people can carry variable numbers of particular HERV fragments, and a few HERV snippets are found only in some human genomes—both factors that can in theory contribute to individual susceptibility to HERV-driven pathologies.

In addition, work by Christensen’s team at Aarhus University suggests that the host’s individual genetic makeup may come into play. In 2011, the researchers demonstrated a preponderance of certain single-nucleotide polymorphisms around a particular locus of HERV-Fc1, a member of the HERV-F group, on the X chromosome in MS patients, compared with healthy individuals [10]. Other research suggests that environmental factors (such as Epstein-Barr virus, a common infection thought to play a role in MS) can activate the expression of HERV-W. Hammell and colleagues have found that the aggregation of the TDP-43 protein, an RNA- and DNA-binding protein known to accumulate in the vast majority of ALS patients, induces the expression of an endogenous retrovirus in Drosophila [11].

Some surprising HERV links

There is also a growing body of research literature suggestive of a link between HERVs and a variety of idiopathic conditions where genetics and biochemistry have not been fully elucidated. De Meirleir and colleagues [13] reported preliminary results of immunoreactivity to HERV proteins in duodenal biopsies taken from patients diagnosed with myalgic encephalomyelitis (ME). They speculated that HERV expression may also have some connection to the expression of pro-inflammatory cytokines noted in cases of ME and some involvement with, or as a consequence of, the appearance of chronic inflammation.

HERVs have also been implicated in cases of schizophrenia and bipolar disorder. Perron and colleagues [14] suggested that a specific HERV, HERV-W, may lie at an important intersection “between environmental, genetic and immunological factors” in relation to symptom onset. They suggested that activation of HERV-W by means of specific infections may have subsequent knock-on effects, again with regard to inflammation and immune activation.

Other studies have also reported over-expression of other HERVs, such as HERV-H, in relation to conditions such as attention deficit-hyperactivity disorder [15] and autism spectrum disorder [16]. Again, there are still gaps in our knowledge of how HERVs may be related to these conditions and, indeed, to any other comorbid conditions potentially present. Even so, it is interesting to note the results from Shuvarikov and colleagues [17] pointing to a possible role for HERVs in mediating a genetic deletion which coincided with the appearance of autistic behaviours and other cognitive and developmental features.


This article has provided an overview of a complex topic that may have ramifications in host protection, cancer, and autoimmunity. Ultimately, are HERVs friends or foes? In conferring a biological advantage, HERVs (and solitary LTRs) may indeed be beneficial. Their role in immunological homeostasis and perhaps protection against exogenous retroviruses is intriguing. Alternatively, HERV insertion mutation, molecular mimicry, superantigen motifs, and recombination with other viruses could be responsible for the development and pathology of disease. An additional aspect is whether the presence of HERV peptides during ontogeny culminates with a hole in the immune repertoire. As a result, peptides with similarity to HIV-CTL sequences could be more dangerous to a given individual.

Take home messages

Human endogenous retroviruses (HERVs) make up part of our genome and represent footprints of previous retroviral infection

HERVs possess a similar genomic organisation (gag–pol–env) to present day exogenous retroviruses but are not infectious

The HERV-K superfamily represents one of the most active HERVs and is capable of producing retroviral particles

HERVs may be of benefit to the host but could also be harmful, and may be involved in certain autoimmune diseases and cancers


Clearly, there is a need for multicentre studies to ascertain firm associations between HERV(s) and autoimmune disease states and certain cancers. In particular, “gene chip” technologies will no doubt relate HERV expression with disease and pathological progression. Transcription of individual HERVs or the coordinated expression of HERVs, although important, must be balanced against expression found in normal tissues. Consequently, studies of HERV/LTR polymorphisms, transactivation by helper viruses (or other triggers), and the role of full length or spliced transcripts may provide further knowledge of these viruses. In addition, there is a requirement for a panel of readily available antibody reagents (for example, monoclonal antibodies, recombinant phage antibodies) to determine retroviral products at the site of disease. No doubt the field of HERV research will continue to accelerate so that we can fully ascertain the consequences of renegade endogenous retroviruses and their transfer in xenotransplantation [85, 86].

The data underlying this article are available in the NCBI database. The accession numbers of genomes used, the consensus sequences of lokiretroviruses, and the alignments are available in supplementary table S2, supplementary data set 1, and supplementary data sets 2–8, Supplementary Material online.

This study was supported by National Natural Science Foundation of China (31922001 and 31701091) and Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.
