[ad_1]
The common and rare variant genetic architecture of circulating retinol
We integrated common and rare variant data from up to 22,274 individuals of European ancestry in our discovery meta-analyses to estimate genetic effects on circulating retinol (Fig. 2A, B, Online Methods). Firstly, variants from the INTERVAL and METSIM studies were meta-analysed (NMeta = 17,268 – termed METSIM + INTERVAL, Fig. 1, Online Methods). After harmonisation, there were 8,173,975 common and 5,091,050 rare [minor allele frequency (MAF) < 1%] overlapping variants between INTERVAL and METSIM, respectively. As retinol effect sizes were estimated in the same units in both studies [plasma standard deviation (SD) units, quantified by the same instrument], we conducted both a fixed-effects inverse-variance weighted (IVW) meta-analysis and a sample-size weighted meta-analysis of Z scores (Stouffer’s method). Secondly, we conducted an additional meta-analysis (NMeta = 22,274) which also included data from two other studies (ATBC and PLCO), termed the METSIM + INTERVAL + ATBC + PLCO meta-analysis. However, there were markedly fewer variants available in this meta-analysis after imputing the ATBC and PLCO summary statistics and harmonising with METSIM + INTERVAL (NVar = 3,896,351, Online Methods). As a result, we focused on the METSIM + INTERVAL meta-analysis as the primary discovery dataset.

a Manhattan plot of the meta-analysis of common variants shared between the INTERVAL and METSIM cohorts (Stouffer’s sample size weighted meta-analysis). Variant-wise -log10 P-values for association are plotted, with the dotted red line denoting genome-wide significance. The closest genic transcription start site is labelled for each lead SNP. Constituent GWAS performed using multiple-linear regression. b Manhattan plot, as above, for the larger sample size meta-analysis that includes the ATBC and PLCO cohorts, but with fewer variants available for meta-analysis. Constituent GWAS performed using multiple-linear regression. c Estimates of SNP heritability of retinol (h2), with the error bars denoting the standard errors of the estimates. The first two panels denote estimates using the BLD-LDAK model and the LDAK-thin model, respectively, both using LD tagging files from the Great British ancestry participants in the UK Biobank. The last panel estimates heritability using the LDSR model with LD from the 1000 genomes European participants. Estimates were for the METSIM + INTERVAL meta-analyses (Stouffer’s and IVW), as well as the larger meta-analysis including ATBC/PLCO. d Empirical Bayes’ estimation of non-null effects on retinol genome-wide, stratified by bins of ascendingly sorted LD score by magnitude. The LD score bins were different for each panel – 1000 bins, 1000 genomes European LD scores (top left); 5000 bins, 1000 genomes European LD scores (top right); 1000 bins, UKBB white British LD scores (bottom left); 5000 bins, white British LD scores (bottom right). Each point represents the proportion of non-null effect sizes for that bin, with the trendline estimated using a generalised additive model for the relationship between ascending LD score bin and the proportion of non-null effects.
Considering common variants from the HapMap3 panel (MAF > 0.05, outside of the major histocompatibility complex (MHC) region), we observed a relatively subtle inflation of retinol signals across the genome as indexed by the mean \({\chi }^{2}\) statistic, with mean \({\chi }^{2}\) values around 1.03-1.04, regardless of the meta-analysis considered (Supplementary Data 1). Common variant SNP heritability (h2SNP) was firstly estimated using the linkage disequilibrium (LD) score regression (LDSR) approach and the 1000 genomes European reference panel. SNP heritability point estimates from this approach were between 6%-7% and nominally statistically significant (P < 0.05), although with somewhat large standard errors (2.6%-3.2%) (Fig. 2C, Supplementary Data 2). SNP heritability estimates of circulating retinol increased to between 10%-13%, but were still noisy, using two alternate models to estimate h2SNP and the UK Biobank (UKBB) as the LD reference (Fig. 2C, Online Methods, Supplementary Data 2). Partitioned h2SNP across tissues and cell-types demonstrated nominal enrichment in logical contexts given the known biology of retinol homoeostasis such as liver, adipose, pancreas, and blood (Supplementary Fig. 1). The somewhat large h2SNP standard errors are likely a product of sample size; however, we then conducted further analysis to explore the extent of the polygenic signal associated with retinol across the genome using an Empirical Bayes’ method (Online Methods). This method was utilised to model the number of non-null effects on retinol genome-wide stratified by bins of LD scores that index the extent of LD any given variant exhibits with other variants (Fig. 2D). Across all LD score bins, we estimated that the mean fraction of common variants across the genome with non-null effects on retinol was between 1.4% to 2.4%, depending on the modelling parameters used. In line with expectation, the proportion of non-null retinol effects was very high (> 50%) when considering the variants that display the most extensive LD (highest LD score bins). The application of these analyses to two GWAS of another vitamin (25-hydroxyvitamin D3) with either comparable or much larger sample sizes23,25, suggested that the less-polygenic architecture of retinol observed in this study may become more diffuse across the genome with greater sample sizes, in line with many other quantitative traits (Supplementary Fig. 2). However, we caution that further investigation will be required as these data become available to confirm this.
Next, we processed the common variant results of both meta-analyses to identify genome-wide significant loci associated with circulating retinol (PGWAS < 5 \(\times\) 10−8). In the primary discovery meta-analysis (METSIM + INTERVAL, Stouffer’s method), we uncovered eight genome-wide significant loci, six of which were not reported in the previous Mondul et al. retinol GWAS (Table 1). The absolute value of the effect sizes (\(\beta\)) of these lead SNPs were between 0.066 and 0.172 SD in circulating retinol per effect allele, as derived from the IVW meta-analysis. We observed minimal heterogeneity between the two cohorts for these lead SNPs, although heterogeneity was slightly more marked for rs6601299 (Supplementary Fig. 3,4). Replication was then attempted in the TwinsUK cohort (N up to 1635, Online methods). Considering the mean association across all timepoints retinol was measured, as well as the twin pairs separately, 7 out of the 8 lead SNPs in the loci had effect sizes in the same direction, which was greater than expected by chance alone (Binomial P = 0.035, Supplementary Data 3). Of these, the TTR, GCKR, FOXP2, and PPP1R3B lead SNPs, denoted as such based on their closest TSS, were at least nominally significant at one or more of the measured timepoints.
We also investigated the effect of retinol-associated lead SNPs on factors associated with dietary intake of retinoids. Using a GWAS of retinol intake derived from a self-reported 24-hour dietary recall in the UK Biobank (UKBB, N = 62,991)26, we found no evidence to suggest that any of the effect of these genetic signals on circulating retinol is mediated through influencing dietary intake behaviours (Supplementary Data 4, Supplementary Fig. 5).
In the larger sample-size meta-analysis with fewer variants available across all input datasets (METSIM + INTERVAL + ATBC + PLCO), six of the eight genome-wide significant loci from the smaller meta-analysis were available to test for association. Five of the six loci available became more statistically significant in this larger meta-analysis, whilst the chromosome 8 locus obtained a very similar level of statistical significance in both meta-analyses (Table 2). It should be noted that there was relatively large heterogeneity for the lead SNP at the TTR locus on chromosome 18 locus between cohorts (Supplementary Fig. 6). This was due to a more significant association in the ATBC + PLCO GWAS, although this region was still very strongly associated in both METSIM and INTERVAL in the same direction (Supplementary Fig. 6).
We also estimated rare variant (frequency <1%) effects on circulating retinol using variants available in the METSIM + INTERVAL meta-analysis. Despite relatively low power to detect rare variant association, we identified a genome-wide significant rare variant signal on chromosome five (chr5:86765041:T:C, dbSNP ID: rs138675130) associated with reduced circulating plasma retinol per C allele (−0.441 SD, SE = 0.0709, P = 6.37 \(\times\) 10−9). There was no significant heterogeneity between the contributing studies for this signal. The frequency of this C allele in Europeans (gnomAD v.3.1.2) ranges from 0.5% in non-Finnish Europeans to 0.8% in Finns, and whilst it is marginally rarer in Africans and South Asians, it is entirely absent in the East Asian and Middle Eastern populations in the gnomAD database. The variant is intergenic and in FinnGen release 8, the C allele was associated at phenome-wide significance (P < 1 \(\times\) 10−5) with increased odds of benign neoplasm of the eye and adnexa, as well as whooping cough. The closest canonical transcription start site to this variant is that of COX7C, which encodes a subunit of a terminal component of the mitochondrial respiratory chain. We also uncovered several suggestively significant rare variant signals (P < 1 \(\times\) 10−5), including three non-synonymous variants in the genes FREM2, NAXD, and CHD1 (Supplementary Data 5). Of these, the rare NAXD non-synonymous allele suggestively associated with lower retinol (rs3742192) had some in silico evidence to suggest deleteriousness (Supplementary Data 5), although it is classed as benign in ClinVar. Finally, to boost power, we also statistically aggregated rare variants to genes (Online Methods). There were no significant retinol genes after Bonferroni correction of these gene-level association results (P < 3.02 \(\times\) 10−7); however, there were two novel suggestively associated genes (P < 3.02 \(\times\) 10−5), GALM and ZDHHC18 (Supplementary Data 6).
Prioritisation of retinol-associated genes reveals mechanisms of genetic influence on circulating retinol
In common variant loci, prioritising causal genes can be difficult due to confounding factors like linkage disequilibrium. For circulating retinol, we employed a multi-faceted approach to prioritise genes that are confidently associated. Firstly, we sought to interrogate the eight genome-wide significant loci uncovered in the main discovery meta-analysis (METSIM + INTERVAL) to uncover plausible causal genes. This was achieved by adapting an integrative pipeline developed in previous work that considers annotation, probabilistic finemapping, integrative scoring, and in silico prediction (Online Methods, Supplementary Data 7). We describe these results further for each locus in supplementary note 1 but summarise the prioritisation evidence forthwith. There was quite consistent evidence in four loci for a likely causal gene (GCKR, FOXP2, RBP4, and TTR). RBP4 (chromosome 10 locus) and TTR (chromosome 18 locus) were previously associated with retinol in the only other dedicated GWAS of this trait and form the complex that transports retinol in serum22, thereby, having a direct biological link to retinol abundance. GCKR, a gene encoding a protein that binds to and regulates the key metabolic enzyme glucokinase, was confidently the causal gene for the locus on chromsome 2 with the rs1260326 lead SNP. This gene is known to have a large and varied metabolic association profile due to the role of glucokinase in glycaemic and lipid-related processes, amongst others27,28,29. Interestingly, the lead SNP (rs1260326), and most likely causal variant derived from probabilistic finemapping, was a common missense allele, whereby the retinol increasing T allele corresponds to a substitution of leucine for proline in the GCKR protein. Previous experimental work suggests that this variant impacts GCKR affinity for glucokinase27,30.
The transcription factor gene FOXP2 was also strongly supported by multiple lines of evidence as a causal gene for the locus on chromosome seven. The role of this gene in the brain and in relation to neurological phenotypes like language has been extensively studied31, but less so in the periphery despite relatively high expression across many different systemic tissues. Therefore, we analysed RNA-sequencing data of FOXP2 overexpression in a cell-line not derived from the central nervous system (human osteosarcoma epithelial cell line) and revealed the transcriptional correlates of FOXP2 overexpression were enriched for a broad range of pathways related to factors including extracellular matrix biology, glycosylation, and interleukin signalling, amongst many others (Supplementary Data 8-11, Online Methods)32.
The remaining four loci exhibited less clear evidence of which gene to prioritise, although all point to potentially interesting functional mechanisms. On chromosome eight, there is some evidence to support PPP1R3B as a gene that influences retinol, which encodes a catalytic subunit of the phosphatase PP1 that is implicated in relevant metabolic processes like glycogen synthesis33. However, other lines of evidence point to the role of long-noncoding RNA in this locus. The loci on chromosomes 16 and 20 are noteworthy as the closest transcription start sites to the respective lead SNPs are two transcription factors (TF) from the Maf family (MAF and MAFB). Interestingly, MAFB has been shown to regulate both TTR and RBP4 expression in various tissue contexts from human or murine studies34,35. As only some lines of evidence support these two TF, further functional characterisation of these two loci is warranted. Interestingly, in the locus on chromosome 16, two of the other genes with some evidence for a retinol-related function (MAFTRR and LINC01229) have been shown experimentally to regulate MAF expression and are also associated with other biochemical traits like urate, further supporting the role of the Maf family on retinol abundance in serum36. Finally, the remaining locus on chromosome 2 (2:122078406-122084285) had the least interpretable functional prioritisation results. The closest transcription start site to the lead SNP was another TF (TFCP2L1) that has broad physiological roles, including in the kidney37.
We then sought to expand our scope for gene discovery beyond genome-wide significant retinol-associated loci through further integration of genetics with transcriptomics and proteomics (Online Methods, Supplementary Data 12-13). Firstly, we leveraged multivariate models of genetically regulated expression (GReX) to perform a transcriptome-wide (liver, whole blood, adipose, small intestine, pancreas, and breast mammary tissue) and proteome-wide (plasma) association study (TWAS/PWAS) of circulating retinol. Tissues for the TWAS were selected based on prior knowledge of retinol biology and the results of the partitioned heritability analyses (Online Methods), whilst plasma was the only tissue available for PWAS. After applying multiple-testing correction to the TWAS and PWAS individually (FDR < 0.05) and testing whether there was a shared causal variant via colocalisation [Posterior probability (PP) of a shared causal variant (H4), PPH4 > 0.8], we identified strong evidence of four additional retinol-associated genes outside of genome-wide significant loci (at least +/− 1 megabase away). These were as follows: MLXIPL, which binds to carbohydrate response elements to regulate triglycerides38,39,40; GSK3B, a gene that encodes a member of the glycogen synthase kinase family involved in metabolism and glycaemic homoeostasis41; the tankyrase gene (TNKS) implicated in processes like Wnt signalling42; and INHBC, part of the inhibin family of proteins with important endocrine functionality43. Genetically predicted mRNA expression of MLXIPL in adipose, pancreas, and breast mammary tissue was inversely associated with circulating retinol levels. Conversely, TWAS analyses revealed that genetically predicted expression of GSK3B and TNKS was positively associated with circulating retinol levels. Finally, genetically predicted plasma protein expression of INHBC showed a positive correlation with retinol levels (ZPWAS = 4.72). We then used a more conservative approach whereby finemapped variants that putatively influence protein expression (pQTLs) were used as instrumental variables (IV) to estimate the causal effect of plasma proteins on retinol using Mendelian randomisation (MR, Online Methods, Supplementary Data 14). We applied the same filters to the results (FDR < 0.05 and PPH4 > 0.8). The colocalisation tests assist to detect proteomic effects on retinol that have higher confidence; however, as we are mostly using a single finemapped IV per protein, we do not have access to the same suite of sensitivity analyses that are usually deployed in a polygenic MR approach which uses multiple IVs. These pQTL-MR analyses further supported that upregulated INHBC likely increases serum retinol, with each SD increase in plasma protein expression associated with a small but highly statistically significant impact on circulating retinol [0.05 SD increase in retinol per SD in INHBC expression, 95% CI: 0.03, 0.07]. In line with expectation, pQTL-MR, and subsequent colocalisation, further genetically validates that elevated RBP4 protein abundance correspondingly increases serum retinol with somewhat large effect (0.6 [95% CI: 0.48, 0.72] SD in retinol per SD in plasma RBP4 expression). There was also evidence that RBP4 protein expression and retinol colocalise under the hypothesis of a single causal variant (PPH4 = 1). Considering the eight genes prioritised in this and the previous section (RBP4, GCKR, FOXP2, TTR, MLXIPL, GSK3B, TNKS, INHBC), we found that these genes exhibited upregulated expression in the liver (PAdjusted < 0.05) upon analysing data from 54 GTEx tissues. This further consolidates the salience of hepatic processing to genetic influences on circulating retinol abundance. Pathway analyses of these genes demonstrated that they were enriched amongst factors such as those involved in carbohydrate metabolism (Supplementary Data 15).
Evidence of causal effects of retinol across the human clinical phenome using retinol binding protein 4 variation as a genetic instrument
The role of retinol in human health and disease has been of long-standing interest. However, most evidence has been observational, limiting the ability for causal inference. Moreover, randomised controlled trials (RCT) of interventions like retinol supplementation and endogenous/synthetic retinoids have only been performed for a small fraction of the traits implicated through observational studies. We sought to increase our understanding of causal effects of genetically predicted retinol on human health by leveraging genetic variants associated with retinol uncovered in this study as IVs. Given certain assumptions are met, these genetic proxies of retinol can be utilised to estimate causal effects of circulating retinol at scale using Mendelian randomisation (MR) (Online Methods). Firstly, we utilised a single IV in RBP4 (rs10882283), as this gene has a clear and unambiguous association with circulating retinol levels, and therefore, is less likely to be prone to horizontal pleiotropy than other retinol-associated loci. We do caution, however, that RBP4 does exert some other functionality that may not be directly related to retinol transport44, although this gene is still likely the best available single IV associated with circulating retinol at genome-wide significance. We utilised this RBP4 IV (METSIM + INTERVAL meta-analysis effect size in SD units) to estimate the effect of retinol on over 19,500 outcomes in the IEUGWAS database (IEUGWASdb, Online Methods, Supplementary Data 16). IEUGWASdb collates and harmonises GWAS summary data from GWAS catalog, consortium based GWAS, UK Biobank GWAS, and several other sources. After multiple-testing correction (FDR < 0.01), genetically predicted retinol was found to putatively exert causal effects on outcomes including several lipid traits, leucocyte counts (total leucocytes and neutrophils), reticulocytes, optic disc area, and resting-state connectivity of a functional MRI (fMRI) derived network edge. For instance, each SD in circulating retinol was associated with a −0.1 SD [95% CI: −0.14, −0.06] decrease in leucocyte count, whilst this unit retinol increase was estimated to increase optic disc area by 0.22 SD [95% CI: 0.12, 0.32]. We further interrogated a representative subset of these associations using colocalisation to test if the signals are driven by a shared causal variant in RBP4 (Fig. 3, Supplementary Data 17). Colocalisation strongly supported that the effect of circulating retinol on leucocyte count, optic disc area and the edge in the fMRI network arises from a shared causal variant in RBP4. In contrast, the effect of retinol on lipids through RBP4 was shown to likely arise from linkage due to the proximal gene FFAR4 that encodes a free-fatty acid receptor. Therefore, the lipid findings may not represent a causal impact of circulating retinol, at least through RBP4, but rather the influence of FFAR4, although further investigation is warranted, particularly using analyses that consider multiple causal variants at this locus.

The effect size units are the beta per standard deviation (SD) increase in circulating retinol. The error bars denote 95% confidence intervals. Traits highlighted orange demonstrate strong evidence of a single shared causal variant with circulating retinol in the RBP4 region (PPH4 > 0.8).
A limitation of the above phenome-wide analyses is that the multiple-testing burden that arises from the inclusion of over 17,000 traits may obscure retinol effects on binary disease phenotypes, as these are usually less powered than GWAS of continuous traits. To overcome this, we also estimated the effect of retinol using the RBP4 IV on 1141 binary endpoints with at least 1000 cases in FinnGen release 8 (not featured in IEUGWASdb), allowing a phenome-wide analysis of electronic health record derived binary outcomes. However, there were no traits that survived multiple-testing correction. The most statistically significant result was some nominal evidence of a protective effect of circulating retinol on the odds of carpal tunnel syndrome – OR = 0.682 [95% CI: 0.558, 0.860], P = 2.2 \(\times\) 10−4, q = 0.25.
Exploratory analysis of phenome-wide effects of retinol using multiple instrumental variables
To boost power for discovery of causal retinol effects, we then utilised non-palindromic, independent (LD r2 < 0.001) genome-wide significant SNPs as IVs (Online Methods). We do caution that due to the immense pleiotropy of some IVs, such as the SNP in GCKR, these results must be interpreted carefully. As a result, we aimed to implement a pipeline that refines the trait pairings with the strongest evidence. Firstly, we used the inverse-variance weighted estimator with multiplicative random effects (IVW-MRE) to estimate the effect of retinol on IEUGWASdb outcome GWAS for which at least six of the IVs were available (> 17,000 outcome phenotypes). There was moderate positive correlation between the IVW-MRE and single RBP4 estimates across all traits tested (\(r\) = 0.43, Supplementary Fig. 7). Whilst well powered, the IVW-MRE assumes all IVs are valid, which is unlikely in practice. Therefore, we developed a pipeline to prioritise the most confident causal relationships that survive multiple-testing correction considering the IVW-MRE estimates (FDR < 0.01, Online Methods, Fig. 4A, Supplementary Data 18–20). This was comprised of three tiers, with Tier #1 being the highest level of evidence. Retinol/outcome pairings that were assigned a tier had to exhibit no significant heterogeneity between IV-outcome effects, a non-significant MR-Egger intercept (which screens for unbalanced pleiotropy), and not be driven by a single IV. Four additional MR methods with differing assumptions were then applied in this study (Online Methods). From the trait pairings that passed the above heterogeneity and pleiotropy filters, Tier #1 traits were those for which all five methods were nominally statistically significant (P < 0.05), whilst Tier #2 traits had 4/5 significant methods, and Tier #3 traits 3/5 methods significant. There were no Tier #1 retinol/outcome trait pairings, but several Tier #2 and Tier #3 trait pairings (Fig. 4B, C), with all of them directionally consistent with the estimates from the single RBP4 IV, supporting their validity. Before formally reporting trait pairings as either Tier #2 or Tier #3 in the manuscript, we excluded traits that would have been assigned a tier with non-European ancestry (Age hay fever, rhinitis or eczema diagnosed, cereal type, and sitting height ratio), and ensured that the GWAS reported were large and well powered relative to the outcome trait. Broadly, we found evidence that genetically predicted retinol may increase body fat-related measures, resting-state fMRI connectivity of several network edges, as well as food consumption phenotypes related to carbohydrates. Retinol also exhibited evidence of a relationship with the cortical thickness and surface area of several brain regions, as well as microbiome composition and keratometry measurements. These results were dominated by continuous traits, for which we are better powered, however, there were some binary traits assigned Tier #2 or Tier #3 evidence. Specifically, genetically proxied retinol was associated with decreased odds of coxarthrosis (arthrosis of the hip), whilst it was associated with adverse asthma/COPD medication effects and dental problems. We caution that all these inferred associations require further investigation, and should be treated with requisite caution, as reviewed elsewhere19,45,46. GWAS of these outcome traits with greater sample size will also no doubt emerge in the future, and these findings should be assessed for consistency using those larger studies. Moreover, the power and assumptions of each MR test is also a key consideration when further interrogating these reported trait pairings – for instance, the MR-Egger approach is relatively low-powered compared to other methods used here but is unbiased in the presence of pleiotropy under the InSIDE (Instrument Strength Independent of Direct Effect) assumption. Another consideration of this approach, which leverages multiplicative random effects with relatively few IVs (<10), is the potential influence of residual standard errors below 1 on the estimation of the IVW standard errors. We see some evidence of this impact on the IVW-MRE estimates for Tier #2 and Tier #3 traits – as these traits exhibit no significant heterogeneity between IV estimates. Specifically, whilst all fixed effect IVW results are still highly statistically significant for these traits, they have larger standard errors than the IVW-MRE, indicative of residual standard errors <1. This is a function of the MRE not scaling the standard error of the IVW by the model residual standard error like in the fixed effects model. We discuss this further in Supplementary note 2 and in Supplementary Fig. 8. However, these issues only impact the P-value of the IVW-MRE relative to that of the fixed effects, and Tier #2/Tier #3 traits still exhibit non-zero evidence across multiple methods and no indication of a single IV driving the association. Further, using instead a fixed effects IVW estimator as the primary test for trait pairings with no heterogeneity (Cochran’s Q P < 0.05) yields similar outcomes being prioritised as most statistically significant after FDR correction (Supplementary Note 2). It is also important to consider when interpreting these estimates that MR approaches are only valid under the assumptions they make, and as a result, any potential causal relationship reported here requires further validation, ideally using a randomised control trial design.

a Prioritisation pipeline overview for retinol causal estimates [inverse-variance weighted estimator with multiplicative random effects (IVW-MRE)] that survive multiple-testing correction (FDR < 0.01). These estimates are then subjected to tests for heterogeneity and pleiotropy (Online Methods), with a tier then assigned based on how many of the five Mendelian randomisation (MR) methods applied are at least nominally statistically significant. In panels b and c, the left-hand plot denotes the Z-score (beta/SE) from the MR IVW-MRE estimates. Positive Z scores denote a positive IVW-MRE estimate of the effect of circulating retinol on that trait, and vice vera. The traits are coloured by their broad phenotypic category. The right-hand plot visualises the Z score using the IVW-MRE model verses that of the MR estimate using the RBP4 IV alone (Wald Ratio). The dotted lines approximately represent nominal statistical significance (P < 0.05). In panel b, just tier #2 traits are plotted (Online Methods), whilst panel c plots both tier #2 and tier #3 traits.
We hypothesised that the putative effect of retinol on body fat could be one explanation for its relationship in this study to brain phenotypes beyond a direct effect of retinoid signaling, particularly given that obesity and adiposity have been linked with brain indices captured by MRI47. However, using IVs for body fat percentage, we did not find any strong evidence that it is causally related to any of the retinol-associated brain regions (Supplementary Data 21), suggesting a direct effect of retinol on these regions/networks or a relationship induced through some other unobserved confounder.
We then applied the above pipeline to the FinnGen r8 traits. There were eight disease phenotypes that retinol was associated with after multiple testing corrections (Supplementary Fig. 9, FDR < 0.01), which increased to 19 with an exploratory FDR < 0.1 threshold (Supplementary Data 22). There was a moderate degree of concordance between the FinnGen r8 phenome-wide results using the single RBP4 IV and the IVW-MRE method (\(r\) = 0.48). We found three disease endpoints with Tier #3 evidence (Fig. 4B, Supplementary Data 23). Specifically, genetically predicted retinol was estimated to increase the odds of congenital malformations of the heart and great arteries, whilst it was protective for type 2 diabetes with coma and inflammatory liver disease. As these traits had tier #3 evidence, there was some inconsistency in the strength of the results across different MR methods, and therefore, these relationships should be interpreted cautiously. One of the most active areas of research in retinol epidemiology is the relationship between retinol and cancer risk48. The estimated effect of circulating retinol on the odds of any malignant neoplasm was not significantly different than one – OR = 0.97 [95% CI: 0.91, 1.04], P = 0.423 (Supplementary Fig. 10). However, there was some indication of a protective effect of retinol on squamous non-small cell lung cancer, which approached the threshold for statistical significance after FDR correction – OR = 0.64 [95% CI: 0.51, 0.80], P = 8.19 × 10−5, q = 0.01. There was also some data to support retinol having effects on other respiratory neoplasm endpoints (Supplementary Fig. 10). Using the single RBP4 IV, there was additionally some weak, nominal evidence for a protective effect of retinol on all non-small cell lung cancers (P = 2.3 × 10−3, q = 0.45) Given previous observational evidence that retinol is protective for lung cancer48, as well as some data supporting the use of synthetic retinoids like bexarotene in lung neoplasms49, this relationship warrants further exploration.
Genetic evidence that lipids and kidney function influence circulating retinol
It is also clinically valuable to understand exposures and diseases that impact circulating retinol abundance. To explore this in greater detail, we leveraged retinol as an outcome trait in MR analyses. We utilised a diverse range of thousands of continuous and ordinal phenotypes from IEUGWASdb as exposures in a similar pipeline described above (Online Methods). Several lipid species were demonstrated to putatively influence retinol abundance after multiple-testing correction (FDR < 0.01, Fig. 5A, Supplementary Data 24); for example, triglycerides were implicated to increase circulating retinol whilst cholesteryl ester-related traits decreased circulating retinol. We also saw some evidence for a positive effect of the frequency of solarium and sun lamp use on retinol, which may arise from behavioural-related mechanisms. Furthermore, our findings suggest that increased retinol levels are associated with a susceptibility-weighted MRI measure in the left putamen, called T2*. T2*, reflecting magnetic susceptibility relative to tissue water, can be influenced by factors such as iron and calcium content. This association may also be influenced by behavioural or other pathways that warrant further investigation.

a Exposure traits that demonstrated a significant causal estimate (IVW-MRE) on circulating retinol after multiple-testing correction (FDR < 0.01). Traits are coloured relative to their broad phenotypic category. b Multivariable MR (MVMR) models investigating the effect of creatinine and major lipid species on circulating retinol. Each panel represents the results from a different MVMR model (each with different underlying assumptions (Online Methods). The exposure – retinol relationship plotted is conditional on the three other traits in the model.
After applying the same tiering system used for retinol as an exposure, we observed Tier #1 evidence strongly supporting a causal effect of creatinine on circulating retinol. This relationship is biologically plausible and may represent an association with kidney function50,51, although circulating creatinine is biologically heterogenous and is also influenced by unrelated factors like muscle mass. Due to the biological complexity of lipid traits, we observed significant heterogeneity between IV effects, and as a result, they were not assigned a tier in our analysis. However, the effect of triglycerides on increasing circulating retinol levels is consistent with established knowledge of retinol biology, as well as this study implicating two genes that are mechanistically confirmed to impact triglycerides (GCKR and MLXIPL). We then leveraged the CAUSE model to distinguish causal effects of creatinine on circulating retinol from correlated pleiotropy that may arise between these two traits due to the extensive polygenicity of creatinine52. We found that a model that includes a causal effect of creatinine on retinol was more parsimonious than a model of pleiotropy alone (sharing model) through comparison of these models using the Bayesian expected log pointwise posterior density (ELPD) method (Supplementary Fig. 7, \(\triangle\)ELPDSharing vs Causal = −4.34, P = 8.9 ×10−3). Given that IVs for creatinine could plausibly act through lipid species like triglycerides to influence circulating retinol, we then constructed multivariable MR (MVMR) models that estimated the creatinine to retinol relationship conditioned on high density lipoprotein (HDL), low density lipoprotein (LDL), and triglycerides (Online Methods). While there was some evidence that the effect of creatinine on retinol could arise due to triglycerides, there was also evidence to suggest an independent effect of both triglycerides and creatinine on increasing circulating retinol, depending on the modelling parameters used (Fig. 5B, Supplementary Note 3).
Finally, we explored pharmacological agents and molecular perturbagens that may influence circulating retinol (Online Methods). Considering the novel genes prioritised in this study with an assigned direction of expression (TWAS/PWAS, pQTL MR), it was found that GSK3B is a drug-target known to be inhibited by lithium and related compounds. This may be of clinical interest as it suggests that lithium, utilised as a therapy in mood disorders, may decrease circulating retinol via its inhibition of GSK3B given that genetically predicted expression of this gene was positively associated with retinol. We then employed computational signature mapping to further characterise pharmacological agents related to retinol (Online Methods). However, these analyses did not yield any compounds for which the in vitro transcriptomic signature significantly matched or opposed genetically predicted expression associated with retinol after multiple-testing correction (Supplementary Data 25). We then considered perturbagen signatures aggregated to biological pathways or compounds grouped by overall mechanisms of action (MOA) (Online Methods). After multiple-testing correction (FDR < 0.05), there were 13 gene-set based perturbagen signatures that were significantly similar to the directionality of genetically predicted expression associated with retinol (Supplementary Data 26). For example, the expression signature of compounds in the HDAC inhibitor MOA opposed expression genetically predicted to increase serum retinol. This can be interpreted as while no single HDAC inhibitor was significantly associated with retinol, there was at least some evidence for the overall relationship with this MOA. This accords with the suggested effect of HDAC inhibitors like valproic acid on downregulating the expression of RBP453,54,55, a gene not included in the signature mapping analyses given the large effect of its encoded protein on retinol.
No strong evidence for traits which exhibit bidirectional relationships with circulating retinol
We then considered whether any of the tested traits may exhibit a bidirectional causal relationship with circulating retinol. Reverse causality for Tier #2 and Tier #3 traits from the IEUGWASdb pipeline was first assessed, although for binary traits this should be treated purely as a test of the null hypothesis given the difficulties in using IVs for binary traits56. There was no strong evidence for reverse causality of any of these traits (Supplementary Data 27). One exception to this was in relation to expression of the protein PEAR1, for which there was very nominal evidence of bidirectional effects. We also investigated evidence for bidirectional effects involving FinnGen Tier #3 traits that retinol is genetically predicted to influence. The use of either genome-wide significant or suggestively significant SNPs (P < 1 \(\times\) 10−5) as IVs did not indicate evidence of reverse causality of these diseases to retinol (Supplementary Data 28). However, given some of the statistical limitations of these analyses, such effects warrant further consideration. Lastly, we considered whether creatinine, as the most confident trait pairing with retinol as the outcome, exhibited a bidirectional relationship. There was only very weak evidence that circulating retinol has a negative effect on circulating creatinine (P = 0.023).
Genetically proxied retinol can identify individuals outside of the normative range of circulating retinol for a given age
We were also interested in evaluating the performance of a genetically proxied index of circulating retinol, that is, a circulating retinol polygenic score (PGS). The independent TwinsUK cohort was utilised to tune and evaluate retinol PGS (Online Methods). We used several methods to evaluate the performance of a retinol PGS in this cohort. Firstly, we test different retinol PGS configurations in a random selected training subset of the model (70% of cohort), using a linear mixed model to account for relatedness between the twin pairs. The best performing retinol PGS configuration in the training subset explained approximately 2.12% of the phenotypic variance of retinol when applied to the remaining 30% of the cohort (mean variance explained across three retinol measurement timepoints). We also found similar performance when applied to each subset of twin pairs (Supplementary Data 29). A limitation of this approach for using the same cohort for tuning and testing the retinol PGS is that the estimated effect sizes may not be representative. As a result, we employed an approach to tune weights for the PGS using the summary statistics alone. This was achieved through leveraging the principles of probabilistic finemapping to update variant weights by their posterior probability of association (Online Methods). Due to the modest polygenicity of retinol, this method upweights a small number of variants. Despite this, these scores were still significantly associated with circulating retinol in TwinsUK (Supplementary Data 29).
Like most micronutrients, circulating retinol has been shown previously to have a complex relationship with age57. However, population-level approaches investigating these effects do not account for inter-individual variability. We hypothesised that normative modelling could be used to characterise individual patterns of circulating retinol, and to evaluate the contribution of genetics to these individualised effects. Normative modelling, derived from the application of growth charts in paediatric medicine, aims to estimate normative reference ranges of variation in the population (e.g., of circulating retinol) based on age and/or other relevant variables. Here, we established reference ranges for circulating retinol as a function of age using generalised-additive models for location, scale, and shape (GAMLSS) frameworks in the TwinsUK dataset (Online Methods, Supplementary Note 4). These models were constructed in the full cohort, as well as in one twin subset only for comparison. Briefly, this involved identifying the optimal distribution for the GAMLSS model using all samples, followed by splitting the data into two partitions. One partition was utilised to estimate centiles (5th, 25th, 50th, 75th and 95th), and the left-out subjects were benchmarked against this reference chart to determine the position of their individual retinol measurement (Online Methods). This process was then repeated using the opposite subject allocation. An individual’s retinol level was classified as infra-normal if their measured retinol fell below the 5th percentile for their age, and as supra-normal if their retinol value exceeded the 95th percentile for their age.
We then investigated the extent to which retinol PGS was associated with these individual profiles of circulating retinol with respect to age (Supplementary Data 30). By way of example, we report results forthwith from the more conservative modelling approach which only used one half of the twins. Retinol PGS was at least nominally significantly associated with supra and infra-normal deviations except for supra-normal deviations at the first (youngest) visit for which there was only a trend observed (Supplementary Figs. 1213). For example, at the second visit each SD in retinol PGS was associated with an approximately 70% [95% CI: 23%, 136%] increase in the odds of exhibiting supra-normal retinol levels for a given age relative to all remaining participants, whilst conversely reducing the odds of displaying infra-normal levels by approximately 37% [95% CI: 12%, 55%]. These data suggest that genetics is a non-zero contributor to circulating retinol levels that fall outside the normative range for a given age. In future, a normative modelling approach could be utilised to examine additional factors, including dietary intake, as well as the interplay between genetics and other influences, that might contribute to individualised deviations in retinol levels relative to the population benchmarks.
[ad_2]
Source link