Virus isolation from a child with acute respiratory disease
In January 2003, a 7-month-old child was admitted to the hospital with coryza, conjunctivitis and fever. Chest radiography revealed typical features of bronchiolitis. A nasopharyngeal aspirate specimen was collected 5 d after the onset of disease (sample NL63). Diagnostic tests for respiratory syncytial virus, adenovirus, influenza viruses A and B, parainfluenza virus types 1, 2 and 3, rhinovirus, enterovirus, HCoV-229E and HCoV-OC43 yielded negative results. The clinical sample was subsequently inoculated onto human fetal lung fibroblasts, tertiary monkey kidney cells (
Cynomolgus
monkey) and HeLa cells. CPE was detected exclusively on tertiary monkey kidney cells, and was first noted 8 d after inoculation. The CPE was diffuse, with a refractive appearance in the affected cells followed by cell detachment. More pronounced CPE was observed upon passage onto the monkey kidney cell line LLC-MK2, with overall cell rounding and moderate cell enlargement (
Supplementary Fig. 1
online). Additional subcultures on human fetal lung fibroblasts, rhabdomyosarcoma cells and Vero cells remained negative for CPE. Immunofluorescence assays to detect respiratory syncytial virus, adenovirus, influenza viruses A and B, and parainfluenza virus types 1, 2 and 3 remained negative. Acid lability and chloroform sensitivity tests indicated that the virus was most likely enveloped, and did not belong to the picornavirus group
23
.
Virus discovery by the VIDISCA method
Identification of unknown pathogens using molecular biology tools is difficult because the target sequence is not known, so genome-specific PCR primers cannot be designed. To overcome this problem, we developed the VIDISCA method based on the cDNA-AFLP technique
4
. The advantage of VIDISCA is that prior knowledge of the sequence is not required, as the presence of restriction enzyme sites is sufficient to guarantee PCR amplification. The input sample can be either blood plasma or serum, or culture supernatant. Whereas cDNA-AFLP starts with isolated mRNA, VIDISCA begins with a treatment to selectively enrich for viral nucleic acid, including a centrifugation step to remove residual cells and mitochondria (
Fig. 1a
). A DNase treatment is also used to remove interfering chromosomal and mitochondrial DNA from degraded cells (viral nucleic acid is protected within the viral particle). Finally, by choosing frequently cutting restriction enzymes, the method can be fine-tuned such that most viruses will be amplified. We were able to amplify viral nucleic acids in EDTA-treated plasma from a person with hepatitis B viral infection, and from a person with an acute parvovirus B19 infection (
Fig. 1b
). The technique can also detect HIV-1 in cell culture, demonstrating its capacity to identify both RNA and DNA viruses (
Fig. 1b
).
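The role of frequently cutting restriction enzymes can be illustrated with a small in-silico digest. The sketch below is a minimal illustration, not the protocol itself: the two four-base recognition sites (TTAA and GCGC), the cut offset, the function names and the toy sequence are all assumptions, since the text above does not name the enzymes. A four-base site is expected about once every 4^4 = 256 nucleotides in random sequence, which is why most viral genomes should yield amplifiable fragments.

```python
def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by 1 so overlapping sites are found
    return positions


def digest(seq, sites, cut_offset=1):
    """Cut `seq` at each recognition site and return the fragment lengths.

    `cut_offset` is the number of bases into the site at which the cut is
    made (an assumed value, chosen only for illustration).
    """
    cuts = sorted({p + cut_offset for s in sites for p in find_sites(seq, s)})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical toy sequence, not taken from any virus.
toy = "ACGTTAAGGCGCATTTTAACCGGA"
fragments = digest(toy, ["TTAA", "GCGC"])
print(fragments)
```

Only fragments flanked by restriction sites at both ends can receive adaptors on both sides, so in practice the two terminal fragments of a linear molecule would not be amplified.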
The supernatant of the CPE-positive LLC-MK2 culture NL63 was analyzed by VIDISCA. The supernatant of uninfected cells was used as a negative control. After the second PCR amplification step, unique and prominent DNA fragments were present in the test sample but not in the control (1 of 16 selective PCR reactions is shown in
Fig. 1c
). These fragments were cloned and sequenced. Thirteen of 16 fragments showed sequence similarity to members of the coronavirus family, but significant sequence divergence with known coronaviruses was apparent in all fragments, indicating that we had identified a new coronavirus. The sequences of the 13 VIDISCA fragments are provided in
Supplementary Figure 2
online.
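The figure of 16 selective PCR reactions is what one would expect if each of the two second-round primers carries a single extra selective nucleotide at its 3′ end, as is common in AFLP-style protocols. This section does not describe the primer design, so the following enumeration is a sketch under that assumption, with hypothetical primer labels.

```python
from itertools import product

BASES = "ACGT"

# Assumed design: one selective nucleotide per primer, two primers per reaction.
combinations = [(fwd, rev) for fwd, rev in product(BASES, repeat=2)]

for fwd, rev in combinations:
    # "primerA" and "primerB" are placeholder names, not the published primers.
    print(f"primerA+{fwd} / primerB+{rev}")

print(len(combinations), "selective reactions")
```

Each reaction amplifies only the subset of fragments whose first base inside each restriction site matches the selective nucleotides, which reduces the complexity of every individual gel lane.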
Detection of HCoV-NL63 in patient specimens
To show that HCoV-NL63 originated from the nasopharyngeal aspirate of the child, we designed a diagnostic RT-PCR that specifically detects HCoV-NL63. This test confirmed the presence of HCoV-NL63 in the clinical sample. The sequence of the RT-PCR product of the 1b gene was identical to that of the virus identified upon
in vitro
passage in LLC-MK2 cells (data not shown).
Having confirmed that the cultured coronavirus originated from the child, the question remained as to whether this was an isolated clinical case, or whether HCoV-NL63 is circulating in humans. To address this question, we used two diagnostic RT-PCR assays to examine respiratory specimens of hospitalized individuals and those visiting the outpatient clinic between December 2002 and August 2003 (
Fig. 2
). We identified seven additional individuals carrying HCoV-NL63 (
Table 1
). Sequence analysis of the PCR products indicated the presence of a few characteristic point mutations in several samples, suggesting that several viruses with different molecular markers may be cocirculating (
Fig. 3
and
Supplementary Fig. 3
online). At least five of the HCoV-NL63-positive individuals suffered from respiratory tract illness; clinical data for the other two individuals were not available. Including the index case, five of the patients were children less than 1 year old, and three patients were adults. Two adults were likely to be immunosuppressed, as one of them was a bone marrow transplant recipient and the other an HIV-positive patient suffering from AIDS, with very low CD4
+
cell counts (
Table 1
). No clinical data were available for the third adult. One patient was coinfected with respiratory syncytial virus (no. 72), and the HIV-infected patient (no. 466) carried
Pneumocystis carinii
. No other respiratory agent was found in the other patients, suggesting that the respiratory symptoms were caused by HCoV-NL63. All positive samples were collected during the 2002–2003 winter season, with a detection frequency of 7% in January 2003. None of the 306 samples collected in the spring and summer of 2003 contained HCoV-NL63 (
P
< 0.01 by two-tailed
t
test).
Complete genome analysis of HCoV-NL63
The genomes of coronaviruses have a characteristic organization. The 5′ two-thirds contain the 1a and 1b genes that encode the nonstructural polyproteins, followed by the genes encoding four structural proteins: spike (S), envelope (E), membrane (M) and nucleocapsid (N). The genomes of known coronaviruses contain a variable number of unique characteristic open reading frames (ORFs) encoding nonstructural proteins either between the 1b and S genes, between the S and E genes, between the M and N genes, or downstream of the N gene.
To determine whether the HCoV-NL63 genome organization shares these characteristics, we constructed a cDNA library with purified virus stock as input material. A total of 475 genome fragments were analyzed, with an average coverage of seven sequences per nucleotide. Specific PCR reactions were designed to fill in gaps and to sequence regions with low-quality sequence data. We combined this with 5′ and 3′ rapid amplification of cDNA ends to resolve the complete HCoV-NL63 genome sequence.
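As a rough consistency check on the sequencing figures, the average fragment length implied by 475 fragments at sevenfold coverage can be estimated. This is a back-of-the-envelope sketch that assumes coverage is defined as total sequenced bases divided by genome length, and it uses the genome length of 27,553 nucleotides reported in this section; the variable names are illustrative.

```python
genome_length = 27_553   # nucleotides, as reported for HCoV-NL63
n_fragments = 475        # cDNA library fragments analyzed
mean_coverage = 7        # average sequences per nucleotide

# Assumed definition: coverage = total sequenced bases / genome length.
total_bases = mean_coverage * genome_length
mean_fragment_length = total_bases / n_fragments

print(f"total sequenced bases ~ {total_bases}")
print(f"implied mean fragment length ~ {mean_fragment_length:.0f} nt")
```

Under these assumptions the implied mean read length is roughly 400 nucleotides. This is an estimate only, because the reported coverage is itself a rounded average.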
The RNA genome of HCoV-NL63 consists of 27,553 nucleotides and a poly-A tail. With a GC content of 34%, HCoV-NL63 has the lowest GC content among the
Coronaviridae
, which range from 37–42% (ref.
24
). ZCurve software was used to identify the ORFs
25
, and the genome configuration was portrayed using the similarity with known coronaviruses as a guide (
Fig. 4a
and
Supplementary Table 1
online). Short untranslated regions (UTRs) of 286 and 287 nucleotides are present at the 5′ and 3′ termini, respectively. The 1a and 1b genes encode the RNA polymerase and proteases that are essential for virus replication. A potential pseudoknot structure is present at position 12,439 (data not shown), which may provide the −1 frameshift signal to translate the 1b polyprotein. Genes predicted to encode the S, E, M and N proteins are found in the 3′ part of the genome. The hemagglutinin-esterase gene, which is present in some group 2 coronaviruses, is not present in HCoV-NL63. ORF3, located between the S and E genes, probably encodes a single accessory nonstructural protein; this gene showed only limited similarity to ORF4A and ORF4B of HCoV-229E and ORF3 of porcine epidemic diarrhea virus (PEDV).
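The GC content quoted above is a simple base-composition statistic. The following sketch shows one way it could be computed from a sequence string; the function name and the toy sequence are illustrative assumptions, and ambiguous bases such as N are simply ignored.

```python
def gc_content(seq):
    """Return the fraction of G and C among unambiguous bases in `seq`.

    Accepts DNA or RNA letters; characters other than A, C, G, T and U
    (for example N or gap symbols) are excluded from the denominator.
    """
    counted = [base for base in seq.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(base in "GC" for base in counted)
    return gc / len(counted)


# Hypothetical toy RNA sequence, not part of the HCoV-NL63 genome.
toy_rna = "GGCCAAUUAU"
print(f"GC content: {gc_content(toy_rna):.0%}")
```

Applied to the full 27,553-nucleotide genome sequence, a function of this kind would be expected to return a value near 0.34.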
The 1a and 1ab polyproteins are translated from the genomic RNA, but the remaining viral proteins are translated from subgenomic mRNAs made by discontinuous transcription during negative strand synthesis
26
. Each subgenomic mRNA has a common 5′ end, derived from the 5′ portion of the genome (the 5′ leader sequence), and common 3′ coterminal parts. Discontinuous transcription requires base-pairing between
cis
-acting transcription regulatory sequences (TRSs), one located near the 5′ part of the viral genome (the leader TRS) and others located upstream of each of the respective ORFs (the body TRSs)
27
. The cDNA bank that we sequenced contained copies of the subgenomic mRNA for the N protein, thus providing the opportunity to exactly map the leader sequence that is fused to all subgenomic mRNAs. A leader of 72 nucleotides was identified at the 5′ UTR. Eleven of twelve nucleotides of the leader TRS (5′-UCUCAACUAAAC-3′) showed similarity with the body TRS upstream of the N gene. Putative TRSs were also identified upstream of the S, ORF3, E and M genes (
Supplementary Table 2
online).
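The comparison of leader and body TRSs amounts to finding the window in a sequence that best matches the 12-nucleotide leader TRS. The sketch below is a minimal illustration using a position-by-position match count. The leader TRS is taken from the text, whereas the function name and the toy region (constructed to contain an 11-of-12 match) are hypothetical and do not reproduce the actual sequence upstream of the N gene.

```python
LEADER_TRS = "UCUCAACUAAAC"  # leader TRS as given in the text


def best_trs_match(region, motif=LEADER_TRS):
    """Slide `motif` along `region`; return (start, matches) of the best window.

    `matches` counts identical positions (motif length minus Hamming distance).
    If several windows tie, the first one is returned.
    """
    region, motif = region.upper(), motif.upper()
    if len(region) < len(motif):
        raise ValueError("region is shorter than the motif")
    best_start, best_matches = 0, -1
    for start in range(len(region) - len(motif) + 1):
        window = region[start:start + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        if matches > best_matches:
            best_start, best_matches = start, matches
    return best_start, best_matches


# Hypothetical upstream region containing a one-mismatch copy of the motif.
toy_region = "GGAUUACG" + "UCUAAACUAAAC" + "GAUGGC"
start, matches = best_trs_match(toy_region)
print(f"best window at {start}: {matches}/{len(LEADER_TRS)} nucleotides match")
```

A simple match count ignores gaps, so a real analysis of putative body TRSs would more likely use an alignment tool; the sketch only shows what "eleven of twelve nucleotides" means operationally.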
We next aligned the sequence of HCoV-NL63 with the complete genomes of other coronaviruses. The percentage nucleotide identity was determined for each gene and is listed in
Table 2
. All genes except the M gene shared the highest identity with HCoV-229E. To confirm that HCoV-NL63 is a new member of the group 1 coronaviruses, we conducted phylogenetic analysis using the nucleotide sequence of the 1a, 1b, S, M and N genes (
Fig. 4b
). For each gene analyzed, HCoV-NL63 clustered with the group 1 coronaviruses. The 1a, 1b and S genes of HCoV-NL63 are most closely related to those of HCoV-229E. However, further inspection revealed a subcluster of HCoV-NL63, HCoV-229E and PEDV. Phylogenetic analysis could not be performed for the ORF3 and E genes because the regions were too variable or too small for analysis, respectively. Bootscan analysis by the Simplot software version 2.5 (ref.
28
) found no signs of recombination (data not shown).
The presence of a single nonstructural gene between the S and E genes is noteworthy because almost all coronaviruses have two or more ORFs in this region, with the exception of PEDV and HCoV-OC43 (refs.
29
,
30
). Perhaps most notable is a large insert of 537 nucleotides in the 5′ portion of the S gene of HCoV-NL63, as compared with that of HCoV-229E. A BLAST search found no similarity between this additional 179-amino acid domain of the S protein and any coronavirus or other sequence deposited in GenBank. An alignment of the HCoV-NL63 S protein sequence with those of other group 1 coronaviruses is shown in
Supplementary Figure 4
online.
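The nucleotide and amino acid lengths quoted for the S gene insert are consistent with each other, because each codon spans three nucleotides:

```latex
\[
  \frac{537\ \text{nucleotides}}{3\ \text{nucleotides per codon}} = 179\ \text{codons},
  \qquad 537 \bmod 3 = 0 .
\]
```

Because the insert length is an exact multiple of three, it adds 179 residues without shifting the reading frame of the downstream S sequence.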