A Large‐Scale Full GBA1 Gene Screening in Parkinson's Disease in the Netherlands

Abstract Background: The most common known genetic risk factor for Parkinson's disease is a damaging variant in the GBA1 gene. The entire GBA1 gene has rarely been studied in a large cohort from a single population. The objective of this study was to assess the entire GBA1 gene in Parkinson's disease in a single large population. Methods: The GBA1 gene was assessed in 3402 Dutch Parkinson's disease patients using next‐generation sequencing. Frequencies were compared with Dutch controls (n = 655). Family history of Parkinson's disease was compared in carriers and noncarriers. Results: Fifteen percent of patients had a GBA1 nonsynonymous variant (including missense, frameshift, and recombinant alleles), compared with 6.4% of controls (OR, 2.6; P < 0.001). Eighteen novel variants were detected. Variants previously associated with Gaucher's disease were identified in 5.0% of patients compared with 1.5% of controls (OR, 3.4; P < 0.001). The rarely reported complex allele p.D140H + p.E326K appears likely to be a Dutch founder variant, found in 2.4% of patients and 0.9% of controls (OR, 2.7; P = 0.012). The proportion of first‐degree relatives (excluding children) with Parkinson's disease was higher in p.D140H + p.E326K carriers (5.6%, 21 of 376) than in p.E326K carriers (2.9%, 29 of 1014; OR, 2.0; P = 0.022), suggestive of a dose effect for different GBA1 variants. Conclusions: Dutch Parkinson's disease patients display one of the highest frequencies of GBA1 variants reported so far, consisting in large part of the mild p.E326K variant and the more severe Dutch p.D140H + p.E326K founder allele. © 2020 The Authors. Movement Disorders published by Wiley Periodicals LLC. on behalf of International Parkinson and Movement Disorder Society.

gene is also referred to as GBA1. In most populations, 4%-12% of PD patients carry a heterozygous GBA1 variant, and in Ashkenazi Jewish PD patients this is approximately 20%. 2,3 The risk of PD in GBA1 variant carriers is increased by an estimated overall 2- to 7-fold (odds ratios [ORs]). 2-5 Rare homozygous or compound heterozygous GBA1 variants can cause the autosomal-recessive lysosomal storage disorder Gaucher's disease (GD). More than 400 variants have been reported to be associated with GD, 6,7 and all these alleles are potential risk factors for developing PD.
Full GBA1 gene sequencing is essential to unambiguously identify gene variants, given the long tail of rare and even population-specific variants. 3,4,8 Nevertheless, the entire GBA1 gene has rarely been sequenced in a large cohort from a single population. Here, we report such a large-scale GBA1 screening performed in the Netherlands in the framework of a large program aimed at identifying patients with GBA1 variants for a clinical trial targeting the GBA1 mechanism. We sequenced the entire GBA1 open reading frame (ORF) in 3402 people with PD living in the Netherlands. Variant frequency was compared with an existing Dutch control cohort (n = 655). Family history of PD was assessed in a subset of patients with the most common variants to compare familial aggregation.

Materials and Methods
Participants
PD patients were included in the Netherlands between April 2017 and March 2018 (see supplementary data for details). Age at diagnosis of ≤50 years was considered early-onset PD, and >50 years was considered late-onset PD.
This study was approved by an independent ethics committee. Written informed consent was obtained from all participants according to the Declaration of Helsinki.
An independent Dutch study of 655 patients with abdominal aortic aneurysms was used for comparison (see supplementary data), using whole-exome sequencing (WES) data (average GBA1 coverage was 101 times). Data regarding the presence of neurological disease were unavailable.

Genotyping
Saliva was obtained from patients using Oragene DNA OG-500 tubes (DNA Genotek). DNA isolation, next-generation sequencing (NGS), and data analysis were performed by GenomeScan B.V., Leiden, the Netherlands. Primers were selected to unambiguously sequence the functional GBA1 gene and not the pseudogene, using long-range polymerase chain reaction (PCR). In a post hoc experimental setup using long-read sequencing with the PacBio Sequel system, phasing was assessed in 3 samples. See supplementary material for methodological details, including validation of a subset using Sanger sequencing.
Historically, GBA1 variants have been described based on the amino acid position excluding the 39-residue signal sequence at the start (also known as "allelic nomenclature"). Both the nomenclature recommended by the Human Genome Variation Society and the allelic nomenclature are given (NCBI Reference Sequence: NM_000157.3). An allele containing more than 1 exonic variant is referred to as a complex allele.
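The relation between the two numbering schemes can be illustrated with a short sketch. It assumes single-residue substitutions written with one-letter amino acid codes (for example, p.E326K) and simply adds the 39-residue signal sequence described above; the function name `allelic_to_hgvs` is hypothetical and is not part of the study's pipeline.

```python
import re

SIGNAL_PEPTIDE_LENGTH = 39  # residues excluded by the allelic nomenclature


def allelic_to_hgvs(variant: str) -> str:
    """Shift an allelic-nomenclature substitution (e.g. 'p.E326K') to
    numbering that counts the signal sequence (e.g. 'p.E365K')."""
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unsupported variant format: {variant!r}")
    ref, pos, alt = match.groups()
    return f"p.{ref}{int(pos) + SIGNAL_PEPTIDE_LENGTH}{alt}"


# The four variants used in the family history questionnaire.
for name in ("p.D140H", "p.E326K", "p.N370S", "p.L444P"):
    print(name, "->", allelic_to_hgvs(name))
```

Frameshift, recombinant, and complex alleles would need separate handling, which this sketch does not attempt.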
Genotypes were classified into 4 categories based on clinical associations using the Human Gene Mutation Database 7 : (1) Gaucher's disease associated (GD), (2) Parkinson's disease associated (PD), (3) synonymous, or (4) novel. If a subject had both a known and a novel variant, the genotype was considered novel. See supplementary data for details.
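The genotype rule above can be sketched as a small function. Only the "novel overrides known" rule comes from the text; the precedence among the remaining categories (GD, then PD, then synonymous) and the "wild type" label for an empty variant list are assumptions made for illustration, and `classify_genotype` is a hypothetical name.

```python
def classify_genotype(variant_classes):
    """Return one genotype category from per-variant categories.

    variant_classes: iterable of 'GD', 'PD', 'synonymous', or 'novel'.
    """
    classes = set(variant_classes)
    if not classes:
        return "wild type"  # assumed label for subjects without variants
    # Rule stated in the text: a novel variant overrides any known one.
    if "novel" in classes:
        return "novel"
    # Assumed precedence for the remaining categories (not stated in the text).
    for category in ("GD", "PD", "synonymous"):
        if category in classes:
            return category
    raise ValueError(f"unknown categories: {sorted(classes)}")


print(classify_genotype(["GD", "novel"]))  # known + novel variant
print(classify_genotype(["PD"]))
```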
All variants that were 6 nucleotides or closer to a splice site were assessed with 4 in silico splicing programs implemented in Alamut (Alamut Visual version 2.13; see supplementary data).
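As a rough illustration of the selection step, the sketch below flags positions that lie within 6 nucleotides of an exon boundary. The exon coordinates are hypothetical placeholders, not real GBA1 positions, and measuring distance from the first or last exonic base is an assumption; the splicing predictions themselves were made in Alamut and are not reproduced here.

```python
def near_splice_site(position, exons, max_distance=6):
    """True if a position lies within max_distance nucleotides of any exon boundary.

    exons: list of (start, end) coordinates, inclusive, on the same scale as position.
    """
    return any(
        min(abs(position - start), abs(position - end)) <= max_distance
        for start, end in exons
    )


# Hypothetical coordinates for illustration only (not real GBA1 exon positions).
exons = [(100, 250), (400, 520)]
print(near_splice_site(255, exons))  # 5 nt past an exon end
print(near_splice_site(300, exons))  # deep intronic
```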
A 2-step cross-validation was performed to assess risk of both false-positive and false-negative results when using WES (see supplementary data).

Family History
All patients with the GBA1 p.D140H + p.E326K, p.E326K, p.N370S, or p.L444P variants and a random subset of patients who did not carry GBA1 variants as per our methods and variant selection criteria (henceforth referred to as GBA1 wild type) were given a questionnaire to assess familial aggregation of PD and to assess a possible founder location of the p.D140H + p.E326K complex allele. See supplementary material for details.

Statistical Analysis
Fisher's exact test was used for categorical variables and the Mann-Whitney U test for continuous variables. Significance was flagged at P < 0.05. ORs were calculated with a 95% CI. IBM SPSS Statistics 25 software was used.
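For readers who want to reproduce this kind of comparison without SPSS, the following sketch computes an odds ratio, a 95% CI, and a two-sided Fisher's exact P value using only the Python standard library. The counts are the first-degree relative counts given in the abstract; the Woolf (log-based) interval is an assumption, because the text does not state which CI method was used, and the function names are hypothetical.

```python
from math import comb, exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) 95% CI for the table [[a, b], [c, d]]."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value: sum over all tables with the same
    margins that are no more probable than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Relatives with / without PD, taken from the abstract:
# p.D140H + p.E326K carriers: 21 of 376; p.E326K carriers: 29 of 1014.
a, b = 21, 376 - 21
c, d = 29, 1014 - 29
or_, lower, upper = odds_ratio_ci(a, b, c, d)
p = fisher_exact_two_sided(a, b, c, d)
print(f"OR = {or_:.1f} (95% CI {lower:.1f}-{upper:.1f}), P = {p:.3f}")
```

With these counts the odds ratio rounds to the reported 2.0; the interval depends on the CI method chosen and may differ slightly from SPSS output.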

Results
In total, 3638 PD patient samples were included, of which 3402 could be genotyped. For the remaining 236 samples, either no DNA could be extracted or PCR failed. Demographics can be found in Supplementary Table 1. Eighty-one percent of patients were recruited through referral by a neurologist.

Sequencing
Average coverage was 2703 times (Supplementary Fig. 1). All samples in the subset used for Sanger sequencing validation were confirmed (see supplementary data).
In total, 19 GD variants, 5 PD variants, 12 synonymous variants, and 18 novel variants were identified. In 1 sample with p.D140H + p.E326K, phasing was confirmed using PacBio sequencing. See supplementary data for a further description of variants found. Supplementary Table 3 contains a variant frequency comparison with data from GoNL 9 and gnomAD 10,11 for reference; however, the methodology in these cohorts was not dedicated to GBA1 sequencing.
No intronic variants were predicted to have a possible effect on splicing (Supplementary Table 4).

Control Cohorts Cross-Validation
In the control cohort, 42 samples had a nonsynonymous GBA1 variant detected using WES that could be tested with our NGS protocol. Using NGS, 4 control samples were found to be false-positive, and 3 samples were partially false-negative (for p.D140H in a p.D140H + p.E326K complex allele). Conversely, after rerunning 48 GBA-PD samples with WES, 1 false-negative was detected. See supplementary data for details.

Demographics Based on GBA1 Status
Demographics are given in Supplementary Table 1, stratified by whether subjects carried a nonsynonymous variant. A larger proportion of carriers had early-onset PD (27.2%) compared with noncarriers (18.2%; P < 0.001). Conversely, of all subjects with early onset, 20.1% had a GBA1 variant, compared with 13.1% of those with late onset (P < 0.001).

GBA Variants and Familial Aggregation of PD
A questionnaire was completed by 180 carriers of p.E326K, 24 carriers of p.N370S, 28 carriers of p.L444P (including 4 complex and 3 recombinant alleles), 73 carriers of p.D140H + p.E326K, and 135 GBA1 wild types. Combining all carriers, 3.6% of all siblings and parents had PD compared with 2.0% of siblings and parents of noncarriers (OR, 1.8; 95% CI, 1.0-3.2; P = 0.043). None of the children had developed PD, probably because of their younger age, so children were excluded from the analysis of first-degree relatives (Supplementary Table 2). Supplementary Figure 2 depicts the total number of first-degree relatives (excluding children) per variant type and the percentage of these relatives with PD. A variant dose effect was seen (see supplementary data for details).

Founder Location p.D140H + p.E326K
Supplementary data and Supplementary Figure 3 show a heat map of the descent of grandparents of p.D140H + p.E326K carriers, which visually suggests (without formal statistical testing) the northern Netherlands as a possible founder location for this complex allele.

Discussion
To our knowledge, this is the largest single-country cohort of PD patients to date to have undergone full GBA1 gene sequencing. A total of 15.0% of all patients had nonsynonymous GBA1 variants, which is the highest prevalence reported to date in a non-Ashkenazi Jewish population. The relatively high prevalence of the population-specific p.D140H + p.E326K complex allele and the long tail of rare variants, including 18 novel variants, highlight the importance of sequencing the full GBA1 ORF. Identifying all these variants will strengthen our understanding of the effect of GBA1 variants, and it facilitates recruitment for the upcoming GBA1-targeted trials, hopefully resulting in a first disease-modifying drug for PD. 12
Comparing different countries, 3,4,8,13-26 the p.E326K variant is reported most frequently in the Netherlands (present study) and Scandinavian countries. 20,24 Table 2 compares the most common GBA1 variants and the p.D140H + p.E326K complex allele in large PD cohorts from single countries that performed full GBA1 ORF sequencing. Swedish 24 and Russian 15 cohorts were included despite selective sequencing because their size allows comparison of the p.E326K variant. This overview shows the near-exclusive appearance of p.D140H + p.E326K in the Netherlands. The p.D140H + p.E326K complex allele has only sporadically been reported: once in GD, 27,28 sporadically in PD, 4,29 and once in Lewy body dementia. 30 Intronic splice-site variants have rarely been systematically assessed previously; 17,23 however, these do not seem to play a role in GBA-PD pathology in our Dutch cohort.
The importance of adequate genotyping methodology when sequencing GBA1 was once more confirmed. In the control cohort, the GBA1 variants were reassessed with NGS, which identified 4 false-positive p.L444P variants in WES. Also, 3 p.D140H variants were falsely not identified in 3 samples that also carried the p.E326K variant. The performance of the hybridization capture panel was lower over the p.D140H region, reflected in lower local coverage. Combined with a possible allelic imbalance for this specific variant, in which the amplification prefers the wild-type allele over the p.D140H allele, this could explain the false-negative output. Therefore, caution is advised when using GBA1 data generated using a methodology not specifically designed for GBA1 sequencing (including databases like ExAC or gnomAD).
Because the p.E326K and p.T369M variants do not cause Gaucher's disease, these have long been termed polymorphisms. However, meta-analyses have shown that these variants do confer an increased risk of developing PD (OR, 1.99 for p.E326K and 1.74 for p.T369M) 31-33 and therefore, despite not causing GD, should not be considered neutral polymorphisms.
Table footnote: GD, Gaucher's disease; PD, Parkinson's disease; syn, synonymous; NA, not applicable; Intr., intronic. The sixth column "allelic name" contains the annotation historically used in Gaucher's disease literature, excluding the 39-amino acid signaling peptide. All genotype frequencies are compared with the abdominal aortic aneurysm control cohort; ORs are given with the 95% CIs and a P value. A P < 0.05 is given in boldface, and the rows of these genotypes are filled gray. OR could not be calculated if frequency was 0 in either group. If 6 cases or fewer were affected in patients and zero in controls, …
Of all participants diagnosed with PD at 50 years of age or younger, 20.1% had a GBA1 variant. In clinical practice, when genetic testing is performed in early-onset PD, GBA1 is not always included. Because of the high prevalence of GBA1 variants in early-onset PD, including GBA1 in such screening deserves consideration, although the predictive value of a GBA1 variant for offspring is still limited.
GBA1 variant carriers more frequently have a positive family history of Parkinson's disease 4,5,34 than noncarriers. In the current study, carriers of p.D140H + p.E326K had significantly more first-degree relatives with PD compared with p.E326K carriers. This suggests a dose effect of variant severity in familial aggregation. However, the difference did not reach statistical significance for other variant types, likely because of the rarity of these variants.
The current study has some limitations. Because our NGS method used short-read sequencing, phasing of multiple variants could not be determined, unless these were within approximately 500 base pairs of each other. However, for a single p.D140H + p.E326K sample phasing was confirmed using PacBio, and p.D140H was never seen without p.E326K. A recombinant gene could be identified if the long-range PCR resulted in 2 distinct peaks on the Fragment Analyzer. See supplementary data for a further discussion of possible limitations.
In conclusion, this study is a successful example of how to ascertain and genotype a large cohort of patients with PD within a short time frame, which is relevant for progressing clinical trials aimed at developing personalized treatments.
variants were detected. GBA1 variant carriers had a younger age at onset and a higher chance of a positive family history for PD, with a trend toward a dose effect based on clinical association of the variant.