Highly conserved genes expressed in aphid saliva are candidates for host plant adaptation in Aphis gossypii


 BackgroundAphids are major crop pests, most species attacking crops specialize on a narrow range of plant species from a single family. By contrast, Aphis gossypii is a highly polyphagous species, for which host races specializing on particular crops have been clearly described. Salivary components, which aphids inject into the phloem via their stylets, play a key role in establishing compatible interactions between plants and aphids, and are probably involved in specialization.ResultsWe used the extensive resources available for Myzus persicae and Acyrthosiphon pisum to identify putative salivary proteins expressed in Aphis gossypii, despite the lack of genomic resources for this species. In silico, we identified 51 putative salivary proteins; we focused on 17 genes with orthologs in at least one aphid species, assuming that some of the conserved genes expressed in salivary glands are involved in host specialization. We amplified and sequenced 10 coding sequences in full, from 17 clones of Aphis gossypii specialising on plants from Malvaceae, Cucurbitaceae or Solanaceae. We reconstructed the phylogenetic tree for these genes, on which we identified a clade corresponding to all clones specializing on cucurbits. Three of these genes were under positive selection.ConclusionsFull adaptation to a particular host plant may require a combination of alleles at quantitative trait loci in aphids. The three genes we identified could potentially be part of a cocktail of effectors manipulating the immune system of cucurbits and therefore responsible for A. gossypii specialization on that plant family.

Flavie Vanlerberghe interpreted the data, substantively revised the manuscript and approved the submitted version.

Abstract
Background Aphids are major crop pests, most species attacking crops specialize on a narrow range of plant species from a single family. By contrast, Aphis gossypii is a highly polyphagous species, for which host races specializing on particular crops have been clearly described. Salivary components, which aphids inject into the phloem via their stylets, play a key role in establishing compatible interactions between plants and aphids, and are probably involved in specialization.

Results
We used the extensive resources available for Myzus persicae and Acyrthosiphon pisum to identify putative salivary proteins expressed in Aphis gossypii, despite the lack of genomic resources for this species. In silico, we identified 51 putative salivary proteins; we focused on 17 genes with orthologs in at least one aphid species, assuming that some of the conserved genes expressed in salivary glands are involved in host specialization. We amplified and sequenced 10 coding sequences in full, from 17 clones of Aphis gossypii specialising on plants from Malvaceae, Cucurbitaceae or Solanaceae. We reconstructed the phylogenetic tree for these genes, on which we identified a clade corresponding to all clones specializing on cucurbits. Three of these genes were under positive selection.

Conclusions
Full adaptation to a particular host plant may require a combination of alleles at quantitative trait loci in aphids. The three genes we identified could potentially be part of a cocktail of effectors manipulating the immune system of cucurbits and therefore responsible for A. gossypii specialization on that plant family. 4

Background
The genetic basis of adaptation is a key topic in evolutionary biology. Specialization on different hosts is a very common adaptation of plant pests and pathogens. Such specialization has been studied in detail for a number of pathogens, including fungi, bacteria, viruses and protists [2]. Several genes have been shown to be evolving rapidly in Microbotryum fungal pathogens specializing on different hosts [3]. By contrast, the genetic basis of insect pest adaptation to host plants has received much less attention, possibly due to the difficulties involved in developing genomic resources for insects [4].
Hemipterans and especially aphids are plant pests that can be considered to resemble pathogens in several ways. Indeed, they penetrate the plant tissues discreetly, until they reach the phloem the phloem, and the plant response to this penetration overlaps substantially with the responses mounted against microbial pathogens [5]. Four thousand aphid species have been identified, mostly in temperate regions, where they colonize 25% of existing plant species [6]. Many aphid species are restricted to one or a few host plants. Others are polyphagous, but differentiated host races within apparently polyphagous species are frequently identified, suggesting an ongoing process of speciation by adaptation to different host plants in several groups [7]. In the pea aphid Acyrthosiphon pisum, which specializes on several plant species from the Fabaceae, a genome scan survey highlighted 11 loci displaying genetic differentiation not consistent with a hypothesis of neutral evolution that might be responsible for host plant specialization in this aphid species [8].
Aphids have highly modified mouthparts, enclosing two flexible needle-like canals (stylets), which they use to feed. The alimentary canal is used to ingest plant fluids, whereas the salivary canal is used to deliver saliva along the path of the stylet, up to the feeding site. The aphid secretes a small amount of gelling saliva on the plant surface before stylet insertion. The stylets penetrate the plant epidermis at the border between two cells and they follow a pathway passing between the fibers of the secondary cell wall of one of these cells [9]. The stylets continue to produce gelling saliva whilst tracing an intercellular pathway through the mesophyll to the phloem. Along the way, the stylets very briefly puncture numerous cells, into which they inject a watery saliva [9]. The role of aphid saliva has been studied in detail, particularly in terms of its capacity to modulate plant defense responses [10][11][12].
Aphid saliva contains a number of proteins, including oxidases, proteases and pectinases (see, for example, [13][14][15][16][17][18][19][20]). Some of the proteins present in aphid saliva prevent the clogging of vessels with plant phloem proteins, a naturally occurring calcium-triggered response to injury [9,21]. The C002 protein, which has been shown to play an important role in the survival of the pea aphid on its host plant, bears no similarity to any protein outside the family Aphididae [22]. By contrast, a protein known to be conserved among parasites, a macrophage migration inhibitory factor secreted in aphid saliva, inhibits the plant immune response [23].
Aphis gossypii, the cotton-melon aphid, has a worldwide distribution and has been observed on more than a hundred different types of host plant [24]. Aphis gossypii populations display a host plant-based genetic structure, whatever their geographic origin [25]. Thomas et al., in their 2012 study, highlighted host plant selection in natura by demonstrating a large decrease in the genetic diversity in aphid populations during melon infestation. The spring-migrant populations visiting melon crops consist of alate aphids from several genetic clusters, whereas the populations infesting melon, which consist of their apterous offspring, belong exclusively to the cucurbit genetic clusters. This pattern strongly suggests that these highly specialized aphids can counteract cucurbit defenses.
We investigated the genetic basis of this host adaptation, with the following strategy: (i) we established a list of putative salivary candidate proteins in A. gossypii, making use of the salivary cDNA sequences available for two aphid species, Acyrthosiphon pisum and Myzus persicae ; (ii) we focused on orthologous candidates, due to the high degree of similarity in salivary function between members of the aphid family [13]; (iii) we studied the variability of these candidates in a collection of 17 A. gossypii clones for which we characterized host-plant adaptation.

Bioinformatic identification of salivary candidate genes in Aphis gossypii
Identification of candidate salivary proteins in M. persicae We used sequences from salivary gland libraries or that had been identified as corresponding to salivary proteins [26,27]. Expressed sequence tag (EST) data were retrieved from the NCBI EST database, with the accession numbers reported in previous studies [20,26,27]. We excluded redundant accessions, and then removed identical ESTs from further analyses by local reciprocal blastn searches (Score≥100, e-value≤1 x 10 -5 , coverage≥50% of the length of at least one sequence) [28]. The remaining ESTs were locally blasted against M. persicae coding sequences (CDS) available from Aphidbase 2.1 (blastn, Score≥100, e-value≤1 x 10 -5 , https://bipaa.genouest.org/is/aphidbase/). We then removed redundant CDS accessions from the list of CDSs identified.

Search for salivary candidate proteins in A. pisum
We used sequences from salivary gland libraries or that were reported to encode salivary proteins [16,18,27]. Sequences associated with the NCBI EST accession numbers reported in a previous study [27] were retrieved and locally blasted against A. pisum CDSs version 2, downloaded from AphidBase 2.1.
The ACYPI accessions reported in various studies correspond to the first released version of the A. pisum genome annotation. As some ACYPI accessions were absent from the second version of this annotation, the corresponding sequences from the first annotation were locally blasted against the second (tblastx, Score≥100, e-value≤1 x 10 -5 [28]). Additional sequences were then retrieved with new ACYPI accession numbers. Sequences corresponding to identical accessions were removed.
Putative candidate sequences common to M. persicae and A. pisum The M. persicae and A. pisum sequences identified as described above were locally blasted against each other (tblastx, Score≥100, e-value≤1 x 10 -5 , coverage≥50% of the length of at least one sequence) [28] to retrieve candidates common to both species.

Searches for similarity among A. gossypii CDSs
The putative candidate sequences common to M. persicae and A. pisum were used in local reciprocal tblastx approaches (Score≥100, e-value≤1 x 10 -5 , coverage≥50% of the length of at least one sequence) with 15810 CDSs from A. gossypii [1,28]. The A. gossypii candidates identified by screening with sequences from the other two aphid species were compared and candidate common to the different species were retrieved for further analysis after the removal of redundant CDSs.

Identification of orthologs
The shared A. gossypii candidates were used as queries in tblastx approaches (Score≥100, e-value≤1 x 10 -5 , coverage≥50% of the length of at least one sequence) [28] against non-redundant aphid sequences from 10 species with the blast tool of NCBI, or locally against data downloaded from AphidBase 2.1 or from a previous study [1]. Phylogenies were constructed for each candidate and sequences similar to it, by applying two methods to the protein data. Sequences were aligned with ClustalW [29], with gaps managed by "pairwise-deletion", and phylogenies were reconstructed with pdistance for the substitution model and by neighbor-joining for the reconstruction method, with MEGA5 [30]. The second reconstruction was performed on the phylogeny.fr website [31]. Sequences were aligned with MUSCLE [32]. No curation of the alignment was permitted. Phylogenies were then reconstructed with PhyML as the reconstruction method, with the other options set as "default parameters" [33]. Reconstruction methods were optimized if the tree topologies were discordant.
Maximum-likelihood analyses were performed in MEGA5 to identify the best-fitting substitution model [30]. Two new phylogenies were reconstructed as described above, but with the best-fit model for protein substitution. Orthologous relationships were identified on the basis of comparisons between the candidate tree topologies and the putative "aphid tree" (Figure 1). Putative candidates with at least one ortholog previously identified in a salivary gland cDNA library and presenting an excretion signal, or identified as encoding a salivary protein were chosen for further analyses.

Candidate sequence corrections, annotations and signal peptide predictions
A. gossypii sequences were corrected and extended based on available A. gossypii ESTs identified and downloaded from AphidBase 2.1 by blastn approaches [28]. When extension at the 3' or 5' end of the sequence was not possible, alignments with orthologs were generated with ClustalW2 [29]. Candidate sequences were extended on the basis of an orthologous sequence if the percentage identity between the two sequences exceeded 99%.
The corrected and extended sequences were annotated with Blast2GO software version 2.6.5 [34].
Briefly, after blastx on the nr database (e-value≤1.10 -3 ), mapping was performed to link BLAST hits to functional information (Gene Ontology -GO-terms). Annotation was performed and extended with the Annotation Expander tool. GO InterProScan terms were then merged with the annotations.
We investigated the presence and location of signal peptide cleavage sites in the candidate sequences with SignalP 4.1 Server [35]. The subcellular distribution of the candidate proteins and the presence of transmembrane domains were predicted with TargetP1.1 and TMHMM1.0, respectively [36,37].

Rearing and characterization of aphid clones
Seventeen A. gossypii clones, collected from various species from the Cucurbitaceae, Malvaceae or Solanaceae, in many areas of the world, between 1978 and 2012, were maintained on melon (Cucumis melo) for clones collected from cucurbits, okra (Abelmoschus esculentus) for clones collected on Malvaceae, eggplant (Solanum melongena) for clones collected on Solanum sp., and Capsicum annuum for clones collected from this species ( Table 1). The clones were reared in insectproof chambers, with an eight-hour night at 18°C and a 16-hour day at 24°C.

Adaptation of aphid clones to plant families
For each clone, we collected seven five-to seven-day-old apterous aphids from this rearing, which we transferred onto a young plantlet of an alternative host. The presence of nymphs and new adults was scored 12 to 15 days later (no, few, yes). At least three replicates were performed for each transfer.
Characterization of aphid clones on the basis of mitochondrial and nuclear SSR sequences The DNA sequences for the mitochondrial gene encoding cytochrome b and the barcode region of cytochrome oxidase I (COI) distinguished unambiguously between A. gossypii and its sister species A.
frangulae [38]. We extracted DNA from 17 aphid clones with a 5% (w/v) Chelex resin solution, as previously described [39]. The sequences of cytochrome b and COI (749 and 661 bases, respectively) were obtained for the 17 clones, as described by , and were aligned with an Aphis frangulae sequence [38]. The two sequences were concatenated for each clone, to build a phylogenetic tree. The best model for invertebrate mitochondrial DNA was identified for construction of the phylogeny tree with the Mega X platform [40].
We amplified eight microsatellite loci specific to the A. gossypii genome [41] in two PCRs, as previously described [38]. We determined the size of the allele at each locus by comparison with a molecular size standard, with GeneMapper v3.7 software (Applied Biosystems, Foster City, California, USA), and a multilocus genotype (MLG) was assigned to each aphid. The 17 MLGs were assigned to genetic clusters by comparison with 44 MLGs known to represent host race structuring in A. gossypii [38], nine of which were common. Mitochondrial DNA analysis suggested that one clone belonged to the sister species were subjected to Bayesian clustering [42], using an admixture model with a burn-in of 500,000 iterations and a subsequent Markov Chain of 250,000 iterations. For each putative number of clusters (K, ranging from 1 to 12), we compared 10 replicate runs, to assess the consistency of the estimated values. We used the Evanno method to determine the most likely number of genetic clusters [43]. For this number of clusters, we then performed one run of the admixture model with a burn-in of 500,000 iterations and a Markov Chain of 1,000,000 iterations. We checked the quality of the cDNA produced by subjecting 1 µL of the product of the reverse transcription reaction to electrophoresis in a 1% agarose gel stained with ethidium bromide.

Heterozygosity and detection of polymorphisms
Chromatograms were analyzed with Mixed Sequence Reader software (MSR) [45], which created nucleotide sequences with heterozygous sites coded according to the IUPAC nomenclature to avoid bias when building the phylogenetic tree [46]. The log ratio of intensity value was set to 1. The protein sequences corresponding to each gene were aligned with MUSCLE [32].

Phylogeny reconstruction and positive selection fingerprint
Sequences beginning with an ATG and containing equal numbers of base triplets were concatenated to construct a megaCDS for each A. gossypii clone. Phylogenetic trees were reconstructed from this megaCDS with the GTR + Γ model and parameters specific for each gene calculated with RAXML [47].
The action of positive selection on the megaCDS was investigated with the Datamonkey suite of web tools [48], by fast unconstrained Bayesian approximation (FUBAR) [49], to ensure robustness against model misspecification by averaging over a large number of predefined site classes.

Identification and annotation of conserved salivary genes in A. gossypii
The complete strategy for establishing a list of candidate genes putatively expressed in the salivary The CDS was complete for only seven of these 17 candidates. For five of the remaining 10 sequences, we were able to complete the CDS sequence using EST sequences from A. gossypii and orthologous sequences (Additional file 6).
The 12 full-length sequences from A. gossypii were annotated with Blast2GO software, using the accession number of their A. pisum ortholog ( Table 2, Additional file 7). Six sequences (CL1251, TCL2_2311, TCL5_3354, TCL6_1068, TCL6_147 and TCL6_4857) were unknown. CL373 was annotated as "γ-interferon-inducible lysosomal thiol reductase" and TCL5_430 was annotated as a "CD63 antigen". Two other sequences, TCL2_1813 and TCL5_4105, were identified as corresponding to cuticle proteins, both involved in molecular functions, particularly in structural molecule activity. TCL5_3248 and TCL2_1247 were identified as RAS-related proteins, and both sequences were implicated in biological processes (level 2), including cellular processes, response to stimulus, localization, signaling, and metabolic (TCL5_3248) and developmental processes (TCL2_1247). Both were also involved in molecular function (level 2), including binding and catalytic activity.

Biological and genetic characterization of a set of 17 A. gossypii clones
The host plant and country from which each clone was collected are listed in Table 1 for the 17 A.
gossypii clones studied here. Fifteen clones survived the entire study: seven collected on cucurbits and reared on melon, five collected on cotton or Hibiscus syriacus and reared on okra, two collected on potato and eggplant and reared on eggplant and one collected and reared on sweet pepper. All the clones were able to colonize okra, whatever the original host plant and the host plant used for rearing in the laboratory. By contrast, the clones collected/reared on cucurbits were unable to colonize eggplant or sweet pepper). The clones collected on plants from the Malvaceae were unable to colonize melon, but some were able to colonize eggplant, in an erratic manner. The clones collected/reared on Solanum sp. and the clone collected on Capsicum were unable to colonize melon. Thus, okra could serve as the host plant for any clone, whereas melon could be used only for clones collected on cucurbits.
According to mitochondrial DNA sequences encoding cytochrome b and COI, all clones except GEL6 belonged to the A. gossypii species (Figure 3). GEL6 was assigned to the A. frangulae clade. For two other CDSs, CL373 and TCL_4857, polymorphisms were observed in several clones, but a putatively specific allele was identified in GEL6.
The phylogenetic tree reconstructed from the megaCDS had a remarkable topology, with one clade grouping together specifically the clones collected on cucurbits, including GEL6, and supported by a high bootstrap value (> 0.85) ( Figure 5).
We then investigated the positive (diversifying) selection and negative (conservative) selection acting on each site of the megaCDS (Table 3). Positive selection was detected for sites in three CDSs, TCL5_3354, TCL6_147 and TCL6_4857. Posterior probabilities exceeded 0.9 for one site in TCL5_3354 and two sites in TCL6_4857. Five more sites in TCL6_4857 and one site in TCL6_147 had probabilities of positive selection exceeding 0.8. All other had a probability of positive selection below 0.5. Negative selection was detected for sites in five CDSs, CL1251, CL363, TCL2_2311, TCL5_3354 and TCL6_4857. .
Posterior probabilities exceeded 0.9 for one site in CL363, TCL2_2311 and TCL5_3354, three in CL1251, and four in TCL6_4857. All other sites had probabilities of negative selection below 0.5.

Discussion
Host-plant specialization is related to the ability of pathogens to manipulate host-plant responses to facilitate their attack. Pathogens encounter an active immune system in the host plant. Two main types of immunity have been described in plants, both locking out the microorganisms trying to invade the tissues [50]. The first type of immunity in plants is PAMPS-triggered immunity (PTI), which is induced by pathogen-associated molecular patterns (PAMPs) detected in the extracellular space. Organisms able to infect host plants deploying this type of immunity release a core effector into the intracellular space to block PTI. However, some of these effectors are recognized by plant resistance genes, triggering a second type of immunity known as effector-triggered immunity (ETI). In 2011, hemipterans were shown to be among the organisms triggering PTI, and aphid saliva has since been considered to contain the equivalent of PAMPs and effectors [10,51]. Even aphid extracts can trigger PTI [52]. In 2013, hemipterans were proposed to trigger ETI [53], and it was suggested that the saliva introduced into plant cells during furtive puncturing contains effectors interacting with plant resistance genes. The aphid Aphis gossypii triggers ETI on melon plants harboring a specific resistance gene [54]. These findings suggest that the PTI/ETI framework is relevant for A. gossypii.
A. gossypii is a highly polyphagous species with clones specializing on particular plant species [25]. We made use of the various A. gossypii clones sampled on various host plants and belonging to different MLGs available to confirm, in the laboratory, that okra is an universal host and to demonstrate that the observed specialization on cucurbits is biologically clearcut. None of the clones collected from cucurbits could colonize any of the solanaceous hosts tested. None of the clones collected from plants from Malvaceae or Solanaceae could colonize melon. The GEL6 clone, in particular, highlights this point. The GEL6 clone had an A. frangulae mitochondrial genome and a 'chimeric' A. frangulae/A. gossypii nuclear genome, as shown by its alleles for the SSR markers and CDSs characterized here. GEL6 was collected from cucurbits and observed on melon crops at different sites in France [55]. It had the biological features of clones specializing on cucurbits.
Many winged clones of Aphis gossypii visit cucurbit fields during spring, but few are able to colonize cucurbits [56]. These aphids clustered mostly in a single genetic group (dark gray in Figure 5), but this group also contained clones collected on Hibiscus syriacus. Some clones specializing on cucurbits displayed admixture (e.g. GWD or GEL6), or belonged to another genetic group (e.g. NM1). Aphis gossypii clones able to infect cucurbits may deploy a core effector that attacks PTI during exploratory puncture. We therefore investigated the diversity of salivary proteins in a set of 17 Aphis gossypii clones. There is currently no reference genome or transcriptome available for A. gossypii. We therefore adopted an in silico approach. By using salivary data for A. pisum and M. persicae, we established a list of 51 CDSs putatively encoding proteins present in A. gossypii saliva. Most of these loci encode odorant and gustatory receptor genes, suggesting a role for such receptors in the host plant specialization and speciation of the pea aphid [57].
We focused on CDSs encoded by orthologous genes in aphid species that had been confirmed to encode a salivary protein in at least one aphid species. This reduced the list to 17 CDSs, only 10 of which could be fully sequenced in all 17 aphid clones. These CDSs were short, consistent with findings for most of the effectors fully characterized in fungi and oomycetes [58]. We reconstructed a phylogenetic tree from the ten concatenated CDSs. The resolved tree highlighted differences in evolutionary history between the megaCDS for clones collected from cucurbits and that of those for clones collected from other plants, such as Capsicum, Solanum sp. or members of the Malvaceae.
Inferences regarding selection may provide important functional information [59] and findings for selection at megagene level are consistent with those for the positive selection of each of the genes considered separately [2]. We therefore investigated whether positive or negative selection acted on the 10 CDSs, by analyzing the megaCDS. Positive selection was detected at several sites of TCL5_3354, TCL6_4857 and TCL6_147. Signal peptide cleavage sites were detected in these three genes. We reconstructed a phylogenetic tree from the three concatenated CDSs, which again contained a separate clade corresponding to all the clones collected on cucurbits.
This suggests that the genes expressed in Aphis gossypii gland salivary may encode proteins with an effector function for attacking PTI in cucurbits. A role for TCL6_4857 in host plant family selection by aphids is supported by the characterization of its ortholog in A. pisum, ACYPI000472, a candidate protein responsible for host-plant specialization in pea aphid. Indeed, the gene encoding this protein was located in the environment of an outlier locus displaying a higher level of genetic differentiation between races than would be expected under a neutral evolution model [8].
Our knowledge of the molecular functions of fungal and oomycete effectors is increasing, but we still know very little about these aspects in aphids and, unfortunately, no function has yet been described for the three genes highlighted here. In a recent large genomics study investigating effectors in A.
pisum [60], only 10 single-copy genes of the 3603 genes expressed in saliva had already been characterized in terms of their role in the plant/insect interaction. Salivary proteins, or other effectors triggering plant defenses, such as GroEL, only slightly reduced fecundity or survival when expressed one-by-one in planta [27,51]. This suggests that a cocktail of effectors is required for the full expression of immunity. Conversely, a macrophage migration inhibitory factor (MIF) highly conserved among parasites and secreted in aphid saliva has been shown to manipulate the plant immune system [23] and its downregulation decreases the survival and fecundity of aphids. Full adaptation to a particular host plant may require a combination of alleles at quantitative trait loci in aphids, as suggested by mapping analyses of host acceptance and performance in two North American A. pisum host races, one specializing on alfalfa and the other on clover [61]. The three genes identified here may be part of this cocktail in A. gossypii.

Abbreviations
EST expressed sequence tag CDS CoDing Sequence cDNA Complementary DNA nr database no redundant database SSR Simple sequence repeats COI cytochrome oxidase I MLG multilocus genotype IUPAC International Union of Pure and Applied Chemistry PCR Polymerase chain reaction Bp base pair SNP single-nucleotide polymorphism PAMPs pathogen-associated molecular patterns ETI effector-triggered immunity  The cladogram was reconstructed based on [1,[62][63][64].  The tree was reconstructed by the maximum likelihood method with the Hasegawa-Kishino-Yano model [65]. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates (500) were collapsed. Evolutionary analyses were conducted on the MEGA Platform [40].  The tree was reconstructed after the concatenation of 10 CDSs with RAXML [47], using the GTR + Γ model with parameters specific for each gene. One clade contained all clones collected on cucurbits (C4, C6, GWD2, GWD, NM1, CUC1, GEL6 and C9).