Genome-wide identification of candidate genes associated with β-glucan trait in a hulled and hulless barley ( Hordeum vulgare L.) population

Background : To date, excavating the genomic sites based on the whole barley assembled genome sequences associated with β-glucan trait remains blank, resulting in omission of several genetic loci or existence of false positive. Results : Here, of 87 barley resources (39 hulled and 48 hulless) collected from China, the β-glucan content ranged from 2.07% to 6.56%. GWAS (genome-wide association study) was carried out using 191,098 SNP markers by RRGS (Reduced-Representation Genome Sequencing) platform in FarmCPU model excuted in rMVP R package. Four subpopulations (POP1POP4) were identified. GWAS showed one SNP, in chromosome 7 (chr7), negatively mapped and for two genes HORVU7Hr1G000320 , HORVU7Hr1G000040 , which are from nucleotide triphosphates hydrolases superfamily; previous study had proved the superfamily could inhibit β-glucan synthase activities. GWAS also showed that five SNPs for β-glucan trait were separately mapped in chr2, chr3, chr4, chr5 and chr6 non-coding regions and one SNP mapped for HORVU1Hr1G000010 in chr1, which is for plant sugar response. Conclusions : Through the highest density of SNP markers to date, this study identifys several SNP genetic loci with high degree of confidence for β-glucan trait, and has a certain value for β-glucan regulatory mechanism research in furture.


Introduction
Barley is the world's fourth largest planted crop and is one of the model crop plants in genetics and molecular biology research (Purugganan and Fuller, 2009;Dawson et al., 2015). The consumption with those barley genotypes that are rich in β-glucan component could be benefit for the lower plasma cholesterol, blood sugar control, betweer immunomodulatory (Skau-Mikkelsen et al., 2014); the β-glucan was consistent by 3 continuous β-(1,4)-glycosidic bonds and is the grain cell wall's main component (Li et al., 2008). In barley seeds, the β-glucan content distributes from 2-7%, which was far higher than that of rice, corn, wheat and oat (Collins et al., 2010;Andersson et al., 2008).
Considering the barley β-glucan has a special broad prospect for meeting the requirements of optimizing the people's dietary structure (Skau-Mikkelsen et al., 2014;Agostini et al., 2015), the molecular regulation mechanism of higher β-glucan accumulation in barley needs to keep groping, such as the genomic regulatory sites identification.
Previous research has verified the classical QTL (Quantitative Trait Locus) mapping strategy is suitablt for identifying genomic loci for quantitative traits in several model cereals (Paterson et al., 2005). Of the β-glucan trait, Han et al. (2004) identified several major QTL loci that contributed to β-glucan trait assigned to chr2 and chr1 using a 150 lines double haploid (DH) population by a hybid combination of 'Steptoe' × 'Morex'. A similar work was carried out also based on the DH population by Molina et al. (2007).
What's more, Igartua et al. (2014) mapped a major QTL for β-glucan trait in chr7 near centromere region in a recombinant inbred line population produced by 'Derkado' × 'B83' hybridization. Kim et al. (2011) identified several QTLs for β-glucan trait on chr7 in an entire single-grain population obtained from 'Yonezawa Mochi' × 'Neulssalbori' hybridization. Above all, these β-glucan related genes/QTLs mapped through the study of barley or highland barley populations/mutants could be benefit for new determining the βglucan-associated candidate genes.
Also, the candidate genes controlling β-glucan metabolism have been investigated. Burton et al. (2006) speculated that the expression of cellulase (HvCsl) gene family is the main factor affecting the seeds β-glucan content, especially for the HvCslF6 gene. Considering the importance of HvCslF6 in regulating seeds β-glucan content, many researchers tried to develop linked molecular markers (i.e. functional markers) by focusing on the the HvCslF6 gene aimed to accelerate the breeding, utilization of target β-glucan trait. Of these studies, Taketa et al. (2012) used full-length primers to analyze the HvCslF6 genome sequences in 29 barley varieties, and sequences blast results identified 30 polymorphic sites based on temple cDNA sequence of HvCslF6 gene (cv. Morex, GenBank accession: EU267181). In addition, Cory et al. (2012) cloned a series of HvCslF6 gene sequences from barley varieties with different β-glucan content by chromosome-walking method, with identifying four SNPs located in HvCslF6 exon regions.
Although QTL mapping assists to identify of genomic loci for quantitative traits, there was still existing several inherent disadvantages, which would resulte in interfering the contributive effects. First of all, QTL mapping requires a specific mapping population constructed by two parents, which can only calculate the different allelic diversity from the parent hybrids (Yang et al., 2012). However, GWAS (Genome-wide association analysis) based on nature populations is an alternative strategy comparing to QTL mapping. Through exploring the relationship between marker(s) and trait(s), GWAS could investigate the degree of LD (linkage disequilibrium) between markers associated with the functional polymorphisms across diverse germplasm resources, with high-resolution gentic mapping, reduced research time, and greater allele numbers (Lee et al., 2017). Many researches have confirmed that GWAS helps breeders use potentially powerful and rapid breeding programs to screen large germplasms and complete crop improvement (Zhou et al., 2016;Maniatis et al., 2007;Shi et al., 2017;Godoy et al., 2018).
So far, genome-wide association analysis based on SNPs associated with β-glucan trait has been studied in several studies (Shu and Rasmussen., 2014;Chutimanitsakun et al., 2013;Mohammadi et al., 2014;Houston et al., 2014;Gyawali et al., 2017). The GWAS analysis method provided a new idea for the discovery and utilization of β-glucan trait functional 5 candidate genes and their associated SNPs in barley. However, the latest whole genome sequences of barley was available in 2017 (Mascher et al., 2017), to date, the β-glucan, an important agronomic trait in barley, has not been reported by genome-wide association analysis using a large number of SNPs based on the whole genome sequences, which could restrict the confidence interval of QTL position, it is inevitable that some important functional genes cannot be identified.
To further detailedly understand the genomic sites underlying contributes to β-glucan trait for breeding of excellent barley varieties, in this study, we firstly calculated the β-glucan content in 87 barley natural resources; Secondly, the whole genome sequences of the barley population was sequenced based on RRGS platform with obtaing a large number of SNPs distributed in the whole genome, as well as investigating the evolution event, genetic structure of the population; Finally, GWAS for β-glucan trait was carried out in Agricultural Academy, and professors of academy are allowed to use the land for research work. Furthermore, we confirmed that the field studies did not involve endangered or 6 protected species. The seeds from each genotype were collected for the analysis of βglucan.
β-glucan assessment β-glucan measured was according to the manual of Megazyme kit produced by BIOSTEST Co., Ltd (China, cat. no. K-BGLU). Firstly, the barley seeds were crushed by a grinder and followed by weighing 0.5 g samples in a 1.5 ml PE tube. Then, add 1.0 ml aqueous ethanol (50% v/v) and 5.0 ml sodium phosphate buffer (20 mM, pH 6.5) to each tube, with stiring in a whirlpool mixer and incubating in boiling water bath for 2 min, followed by cooling the tubes to 40 ℃. Secondly, each tube was then added 0.2 mL lichenase (10 U) followed by incubating at 40 ℃ for 1 hour followed by adding distilled water and adjusting the solution volume of each tube to 30.0 mL, with centrifugating for 1000 g and 10 min.
Thirdly, the 0.1 mL supernatant liquid was carefully transferred into three tubes followed by adding 0.1 mL sodium acetate buffer (50 mm, pH 4.0) to one of the test tubes (reaction blank), and adding 0.1 mL β-glucosidase (0.2 U) to the left two tubes (reaction) followed by incubating at 40 ℃ for 15 min. Lastly, the 3.0 mL GOPOD reagent was added to each tube followed by incubating at 40 ℃ for 20 min. The absorbance values of each reaction tube (EA) and reaction blank tube (EBI) were determined at 510 nm by Ultraviolet spectrophotometer (UV-2401PC).

SNP molecular markers acquisition
The genomic DNA extraction kit produced by Shanghai Shenggong Bioengineering Co., Ltd.
(China, ca. no. B518261) was used to extract DNA from the fresh leaves of 87 barley varieties. The DNA purity and integrity of DNA were tested by 1.5% agarose gel and the concentration and quality were tested by Nanodrop @ 2000c Spectrophotometer (Thermo Scientific Co., Ltd, Barrington, IL, USA). The RRGS method procedures (Qi et al., 2018) were used to develop SNP molecular markers for 87 qualified barley genomic DNA, with using the recent published barley genome sequence as the reference genome (Mascher et al., 2017). For the step of restriction endonuclease digestion, it was prediction by the combination of MseI/TaqI restriction endonuclease, and the fragment sizes obtained rang from 500 to 600 bp.
Followed by the digestion step, the ends of the enzyme-digested fragments were modified by T4 ligase (Takara Biotechnology [Dalian] Co., Ltd) for adding joint and barcode. After using the magnetic beads (MBs) (Thermo Scientific Co., Ltd) to recover fragments, the recovered fragments were amplified by PCR using PrimeSTAR HS DNA Polymerase (Takara Biotechnology [Dalian] Co., Ltd) and using Qubit (Thermo Scientific Co., Ltd) to determine the concentration of PCR products, with the final concentration should be more than 5 ng/ul. Finally, the samples were sequenced using the Illumina Hiseq 2500 plantform with paired-end reads.
The clean reads was obtained by data quality control from raw reads referred to the analysis procedures (Qi et al., 2018). Those clean reads with the highest depth in each restriction fragment were selected as the temple sequence for analyzing by using the BWA software (Li et al., 2009), followed by aligning the sequenced clean reads against the reference barley genome (Maher et al., 2017) for obtaining the location attribution information. Finally, SNPs were indentified through using the GATK software (Mckenna et al., 2010) with high evidence at the 2 X depth, 0 deletion rate and 5% minor allele frequencies (MAF).

Population evolution, structure and PCA analysis
The NJ (Neighbor-joining) tree was constructed according to the Nei-Gojobori method by using the MEGA 2.0 program (Bootstrap value was set as 1,000) based on the SNP markers identified above. The population sub-structure of the 87 barley varieties was analyzed by 8 using the Admixture mapping software, and the cluster number (K value) of the population was assumed to 1 10. The clustering results are cross-verified, and the optimal number of groups is determined according to the valley value of the cross-validation (CV) error rate (Alexander et al., 2009). The PCA (principal components analysis) was carried out by rMVP R package (https://github.com/XiaoleiLiuBio/rMVP).

Association analysis and functional annotation of trait-associated SNPs
The SNP density in seven chromosomes was constructed by the rMVP R package. The Bonferroni Multiple Correction method based on FarmCPU model excuted in rMVP R package was applied for mapping the trait-associated SNPs by the screening threshold at 0.05 level. The Quantile-Quantile scatter map was constructed by GGplot2 and QQman software, and the threshold set for Manhattan (Manhattan) mapping was at the -Log (0.01/marker numbers) level for significant SNP loci screened. 'R' represents the effects of SNP associated with target trait and the greater the absolute value, the more effects represented; '-' means negative regulation while '+' means positive regulation. Gene models annotation of SNPs associated were set at the range of 5 Mb in upstream/downstream of the trait-associated SNPs, which was referred to the EnsemblPlants website (https://plants.ensembl.org/index.html).

Data statistic and analysis
Data statistics and charting were used by Windows. Excel 2010. Univariate significant ANOVA and the box plot were constructed by SPSS 24.0 software. '*' represents the significant level (P < 0.05), and '**' representes the high significant level (P < 0.01).

β-glucan trait and SNP density determination
In the collected population, the content of β-glucan was from 2.07-6.56%, the average content was 4.09% and the standard deviation was 0.77. The Shapiro-Wilk value used to detect normal distribution was 0.097, which was over 0.05 and indicated the content of βglucan trait in this population was in accordance with the normal distribution (Fig. 1)

Association mapping analysis
In Manhattan-plot, each point on the graph represents one SNP locus, the (-log 10 P − value ) for all 191,098 SNP loci formed the Y-coordinate. Through the FarmCPU model in GWAS analysis, totally 7 SNP loci were significantly associated with β-glucan trait; one chromosome obtained one SNP locus and all of them were located on the distal end of each chromosome (Fig. 5). In addition, the QQ diagram of GWAS analysis also showed the same seven genomic sites (i.e., SNPs) are eligible (Fig. 6). The points in the circle of the QQ-plot diagram were potential candidate genomic sites significantly related to β-glucan trait, which were divided into the 'Observation P-value' and the 'Expected P-value'. Those genomic sites (i.e. loci) with the 'Observed P-value' exceeded the 'Expected P-value' means they were significantly associated with β-glucan trait (Fig. 6).

Discussion
In this study, genome-wide association analysis (GWAS) was used to identified the barley whole genome-wide genetic sites associated with β-glucan trait by using 191,098 SNPs markers (i.e., the largest amount of SNP markers by far). Through using FarmCPU model excuted in rMVP R package, with the PCA parameters as the covariable, which could effectively control false positives (Stich et al., 2009;Park et al., 2019), seven highly significant SNPs (P < 4.33 × 10 − 7 ) were clearly and intuitively identified in manhattan-plot diagram, also, the association results were meanwhile confirmed by QQ-plot diagram.
The high content of β-glucan in barley is one of the important standards for its edible value, and it is meaningful to apply the strategy of molecular marker-assisted breeding in the traditional breeding. Here, we focused on the rapid identification of highly accurate variation markers (SNPs) in a population of barley natural germplasm resources by using the RRGS method. First of all, for GWAS applied in a species, the genome LD levels determines the density of markers needed. Crop barley was suitable for GWAS, as comparing to the outbred species, the LD levels in barley (inbred species) are higher and attenuate slower (Plagnol et al., 2006). For example, as a cross-pollinating plant, the LD attenuation of maize was significantly faster than that of Arabidopsis thaliana, rice and other crops (Tenaillon et al., 2001). In our study, the seven SNPs identified separately in seven chromosomes didn't have any significant LD relationships.
The GWAS technology based on RRGS had been applied in several crop traits analysis (Ma et al., 2016;Xu et al., 2014). However, as far as we know, in the important crop barley, the GWAS study by using SNPs developed from the whole genome sequences related to the β-glucan trait remains unknown, even through β-glucan has been confirmed to be recognized as a very important food health ingredient. Only some genetic map-based association analysis based on parental narrow resources (Han et al., 2004;Molina et al., 2007;Igartua et al., 2014), as well as some GWAS analysis based on limited number of SNPs markers (Shu and Rasmussen., 2014;Chutimanitsakun et al., 2013;Mohammadi et al., 2014) were carried out. In this study, through using the RRGS technology, we identified 7 high significantly SNP markers from 191,098 SNP markers, each has an effect on the β-glucan trait. In addition, of the barley population analyzed, a mix of hulled and hulless barley accessions was selected. It is generally believed that the quality of functional nutritional components of naked(i.e., hulless) barley is higher than that of hulled barley, including β-glucan trait (Oscarsson et al., 1997), our results confirmed that the content of β-glucan in naked barley of POP2 (hulless) was higher than of POP1 (hulled) and POP3 (hulless and hulled). Also, we found that, of the POP2 (hulless) barley (from northwest China), β-glucan content was higher than that of POP4 (from southeast of China) and the content of β-glucan in POP4 was also the least in four POPs, we concluded that the β-glucan trait was also affected by environmental effects, as several previous studies showed (Zhang et al., 2001;Yalcin et al., 2007).
Based on the SNPs developed from whole barley genome sequences from GWAS analysis, here, we finally quickly identify functional genes associated with β-glucan trait.
Previously, the genome-wide association analysis technique was used for the first time in plant sugarbeet (Beta vulgais L.), with the help of this technique, the bolting gene B was identified, which determines the bolting purification of sugarbeet (Hansen et al., 2001).
The method of GWAS analysis has the advantages of high throughput, high efficiency, no 13 candidate genes involved, no need to construct any assumptions, no dependence on mapping populations, etc, which have helped identify many loci/or alleles associated with traits in the face of a wide range of germplasm resources. In this study, the gene SIS3(AT3G47990) we identified, which is the homologous gene involved in plant sugar response in Arabidopsis (Huang et al., 2010), and it's a good candidate gene for further transgene confirmation; What'more, BillonGrand et al. (1997) had proved the nucleotide triphosphates could inhibit β-glucan synthase activities, considering the SNP7 had a high negative association with β-glucan trait (i.e., negative to HORVU7Hr1G000320, HORVU7Hr1G000040), and we concluded that SNP7 might be a positively active genomic site for β-glucan synthesis. In addition, the SNPs in the upstream of genes HORVU1Hr1G000020, HORVU5Hr1G000060 and HORVU7Hr1G000330 may be through the way of cis-for regulating those genes, for example, through the open chromosome sites, which was a hot research topic in these years (Lu et al., 2017).
Here, GWAS shows promising features that could help elucidate the genetic basis of the qualitative and economic β-glucan trait in barley. This is the first study to investigate the candidates genes behind the H. vulgare seed β-glucan trait by whole genome association mapping analysis since the latest barley genome sequences was available in 2017. Our results could broaden the scope of functional genes identification on seed β-glucan in H.
vulgare.  Manhattan plot of genome-wide association mapping of β-glucan using FarmCPU model excuted in rMVP R package. The horizontal line is the P < 4.33×10-7 cutoff.

Abbreviations
25 Figure 6 QQ-plot diagram of candidate genomic sites of β-glucan using 'Observation Pvalue' and the 'Expected P-value'.