Skip to main content

Transcripts and genomic intervals associated with variation in metabolite abundance in maize leaves under field conditions

Abstract

Plants exhibit extensive environment-dependent intraspecific metabolic variation, which likely plays a role in determining variation in whole plant phenotypes. However, much of the work seeking to use natural variation to link genes and transcript’s impacts on plant metabolism has employed data from controlled environments. Here, we generated and analyzed data on the variation in the abundance of 26 metabolites across 660 maize inbred lines under field conditions. We employ these data and previously published transcript and whole plant phenotype data reported for the same field experiment to identify both genomic intervals (through genome-wide association studies (GWAS)) and transcripts (using both transcriptome-wide association studies (TWAS) and an explainable artificial intelligence (AI) approach based on random forest (RF)) associated with variation in metabolite abundance. Both genome-wide association and random forest-based methods identified substantial numbers of significant associations including genes with plausible links to the metabolites they are associated with. In contrast, the transcriptome-wide association identified only six significant associations. In three cases, genetic markers associated with metabolic variation in our study colocalized with markers linked to variation in non-metabolic traits scored in the same experiment. We speculate that the poor performance of transcriptome-wide association studies in identifying transcript-metabolite associations may reflect a high prevalence of non-linear interactions between transcripts and metabolites and/or a bias towards rare transcripts playing a large role in determining intraspecific metabolic variation.

Peer Review reports

Introduction

Plants can produce a wide array of metabolites with diverse structures that perform essential roles in growth, cellular regeneration, resource allocation, development, and responses to biotic and abiotic stresses. While the total number of metabolites produced by plants likely exceeds one million, each plant species typically synthesizes between several thousand and several tens of thousands [1, 2]. In addition to interspecies metabolic diversity, substantial metabolic diversity exists between members of the same species (intraspecies diversity). Investigating the genetic determinants of intraspecies variation in plant metabolism can provide insight into both the enzymes responsible for specific steps in metabolic pathways and also the role variation in plant metabolism plays in determining whole plant phenotypes [2, 3]. Variation in the abundance of lignin precursors is correlated with variation in biomass production in Arabidopsis thaliana and maize [4, 5]. Many metabolic and morphological trait pairs exhibited significant correlations in an analysis of 64 metabolite traits and 35 morphological traits scored across a tomato mapping population [6]. QTL for nine of twenty-six whole plant phenotypes evaluated in potato colocalized with QTL for variation in the abundance of one or more of 85 metabolites profiled in the same population [7].

Plant metabolic traits tend to be moderately heritable within species. More than half of metabolic features profiled in a rice diversity panel exhibited heritability coefficients (H2) generally greater than 0.5 and for nearly one-quarter, H2 exceeded 0.7 [8]. Metabolic quantification of a population of 289 maize genotypes identified 26 metabolites where one or more genetic markers were significantly associated with variance in abundance in the leaves of maize seedlings grown in controlled environment conditions including a locus on chromosome 9 associated with variation in lignin precursors [5]. More than 1,400 genetic loci were significantly associated with variation in the abundance of 983 metabolites measured in mature (dry) maize kernels [9]. Comparative GWAS for the abundance of metabolites in dried seeds conducted in rice and maize identified 420 and 292 loci associated with 123 metabolites in the two species, respectively. These hits included 42 associated with homologous loci in both species [10].

A combination of genome-wide association studies and transcriptome-wide association studies was able to link thirteen genes to variation in the abundance of tocochromanol (vitamin E) including five genes not previously linked to this metabolite [11].Transcriptome-wide association studies can provide advantages narrowing association signals down to single candidate genes for several reasons. TWAS appear to be less influenced by linkage disequilibrium than genome wide association studies and more frequently identify a single candidate gene per genomic interval as experimentally demonstrated by Li et al. (2021) [12]. This difference likely arises because gene expression reflects both cis- and trans-regulatory effects with trans-regulatory variation occurring independently even among closely linked genes Torres-Rodriguez (2025) [13]. As a result, the correlation in expression between neighboring genes is lower than the correlation of genotype status between nearby genetic markers. Furthermore, TWAS in plants typically uses the expression of genes as input, such value is describing a single gene ID. Thus, when the association test is performed the ID is linked to changes in the phenotype and no down-stream analysis is required to name a candidate gene. Combined genome-wide and transcriptome-wide association studies have also been used to identify loci associated with variation in seed oil content in Brassica napus [14]. This frequent use of seeds for population-level metabolic profiling [9,10,11, 14] is likely due to the significant economic and food security importance of crop seeds, as well as the practical challenges associated with collecting and sampling equivalent vegetative tissues from large populations grown under field-relevant conditions

Here we seek to identify loci associated with variation in metabolite abundance under field conditions in maize. We employ data generated from the Wisconsin Diversity panel, a population including maize lines developed in 15 countries across North and South America, Asia, Africa, and Europe, and includes representatives of dent corn, sweet corn, and popcorn [15].This panel has been previously resequenced, providing an extremely high marker density for genome-wide association studies [16], and we leveraged a parallel RNA-seq experiment with profiled gene expression using leaf samples collected from the same plants on the same day as those employed for quantification of metabolites [17].

Results

A total of 795 maize leaf tissue samples were initially quantified for 26 unique metabolites via Gas Chromatography-Mass Spectrometry (GC-MS), consisting of 88 repeated check samples of B97 and 47 biological replicates collected from different plants of the same genotype within the same experiment. In total, the dataset included 660 unique sets of measurements representing distinct maize genotypes. After excluding the repeated check samples, the dataset included 660 unique sets of measurements representing distinct maize genotypes. Several metabolites exhibited high biological repeatability across these replicates. Threonine showed the highest repeatability (r = 0.87), suggesting strong genetic control. Other metabolites such as raffinose (r= 0.82), chlorogenic acid (r= 0.81), malic acid (r= 0.80), and galactonic acid (r= 0.77) also displayed higher repeatability relative to other metabolites analyzed in this study. The average biological repeatability was 0.57 (Fig. 1A, Supplementary Figure S1). These values were compared to average biological repeatability of 0.75 for 28 whole plant traits [18], 0.42 for 10 hyperspectral traits [19], and 0.25 for 3 photosynthetic traits scored for the same maize genotypes within the same experiment (Supplementary Figure S2, Supplementary Figure S3, and Supplementary Figure S4).

Fig. 1
figure 1

Factors explaining variation in metabolite abundance across a maize diversity panel. A Estimated repeatability of measured metabolite abundance for each metabolite quantified in this study. Repeatability is defined as the proportion of total variance in metabolite abundance which can be explained by genotype in a dataset of 47 maize genotypes sampled and analyzed twice independently from different plants in the same field. B Proportion of total variance in each metabolite’s abundance explained by the factors genotype, batch, run order, and the remaining residual variance. Genetic variance in panel B is not equivalent to repeatability in panel A as a model with more factors was fit to explain variance in panel B

We did not observe strong evidence for similarities in metabolite abundance profiles among maize genotypes belonging to the same subpopulations (Supplementary Figure S5), However, a significant proportion of the non-genetic variance for some individual metabolites could be explained by variation between different batches of samples, which refers to groups of samples processed at different times or differences in data generated and analyzed earlier or later within a given batch during the GC-MS run (Fig. 1B). Several expected correlations were observed between different metabolites profiled, such as the positive correlation between glucose and fructose, both of which are sugars commonly involved in similar metabolic pathways, and between quinic acid and shikimic acid, which are both intermediates in the same biosynthetic pathway (Supplementary Figure S6). The abundance of a number of metabolites was also significantly correlated with whole plant phenotypes scored in the same field (Supplementary Figure S7). Shikimic acid abundance measured in the field at the late vegetative/early flowering stage showed a significant positive correlation with plant height (r = 0.23; p < 0.0001) and a significant negative correlation with percent ear fill (r = − 0.25; p < 0.0001) at harvest. Similarly, Beta-alanine abundance showed a significant positive correlation with 100 kernel mass (r = 0.22; p < 0.0001) at harvest.

After controlling for batch effects and order of quantification effects, 150 genetic markers were significantly associated with one or more of 26 metabolites at a resampling model inclusion probability (RMIP) threshold \(\ge\) 0.1 (Fig. 2). Among these 150 genetic markers, five were found to be associated with two different metabolites.The 17 most supported metabolite-genetic marker associations all exceeded RMIP \(\ge\) 0.3 (Table 1). These 17 metabolite-marker associations included two markers each associated with variation in the abundance of phosphoric acid, chlorogenic acid, galactonic acid, and trans-aconitic acid (eight marker-metabolite associations total). The remaining nine metabolite-marker associations consisted of one genetic marker each associated with variation in the abundance of glyceric acid, shikimic acid, L-serine, quinic acid, raffinose, sucrose, tyrosine, D-glucose, and fructose. The three most strongly supported metabolite-genetic marker associations exceeded RMIP \(\ge\) 0.5. These included two markers linked to variation in the abundance of phosphoric acid and one linked to variation in the abundance of galactonic acid. In seven cases a gene known to play a specific role in plant metabolism was located within 50 kilobases of a genetic marker linked to metabolic variation at RMIP \(\ge\) 0.3 and markers within the gene appeared to be in significant linkage disequilibrium with the genetic marker identified via GWAS (Supplementary Figure S8;Table 1). However, in many cases, the windows around individual metabolite abundance-associated genetic markers, defined by linkage disequilibrium, included multiple gene models, with a median of 4 gene models and a mean of approximately 5 gene models per interval.

Fig. 2
figure 2

Genetic markers associated with metabolite variation via resampling model inclusion probability genome-wide association. Each circle’s position on the x-axis represents the genomic position of a genetic marker in the maize genome, while its position on the y-axis indicates the proportion of resampling runs in which the marker was significantly associated with variation in the metabolite of interest via FarmCPU GWAS. For metabolites where at least one marker was associated with RMIP \(\ge\) 0.3, color indicates the specific metabolite a given marker is associated with, while markers associated with all other tested metabolites are shown in gray. The three horizontal dashed lines indicate RMIP = 0.1, RMIP = 0.3, and RMIP = 0.5. Text labels identify genes discussed in the next section that are near metabolite-associated markers. Alternating color horizontal lines along the x-axis indicate the start and end of each maize chromosome

Table 1 Location, support, and closest gene model for each RMIP GWAS hit \(\ge\)0.3 shown in Fig. 2

Unlike genome-wide association studies, transcriptome-wide studies can frequently provide single gene resolution [20], ameliorating the challenge of translating metabolite abundance associations to individual candidate genes. A previously published gene expression dataset generated from leaf tissue collected from the same plants at the same time as the leaf tissue samples employed for quantifying the abundance of metabolites [17] was utilized to conduct transcriptome-wide association studies (TWAS) for metabolite abundance. The abundance of only three of the 26 metabolites was significantly associated with the expression of individual genes in the same leaves at a Bonferroni-corrected significance threshold of 0.05 (Fig. 3; Supplementary Figure S9). These included four genes whose transcript abundance was significantly associated with variation in the abundance of glycerol 1-phosphate, and one gene each associated with variation in L-glutamic acid and quinic acid. Variation in the expression of the same gene Zm00001eb431150, which encodes a Cu(2+)-exporting ATPase (CUEA), was linked to both variations in the abundance of glycerol 1-phosphate and L-glutamic acid. The sole gene whose transcript abundance was associated with quinic acid was Zm00001eb147850, which is a multi-copper oxidase (MCO).

Fig. 3
figure 3

Transcripts associated with variation in metabolite abundance via transcriptome-wide association. Each circle’s position on the x-axis indicates the annotated location of a gene model on the maize genome and its position indicates the statistical significance of the link between variation in the expression of the primary transcript of that gene model and variation in the abundance of a specific metabolite indicated by the color of the dot. Results are shown only for the three metabolites where the significance of at least one transcript exceeded \(-\log _{10}(2.03 \times 10^{-6})\), corresponding to the Bonferroni-corrected p-value of 0.05 after correcting for the 24,585 transcripts tested. This threshold p-value is indicated via a horizontal dashed red line. The names of either the proteins encoded by genes above the threshold, or gene model IDs are labeled. Two separate genes, both encoding Cu(2+)-exporting ATPase (Cu(2+)-exporting ATPase) were associated with variation in the abundance of glycerol 1-phosphate, one of which was also associated with variation of L-glutamic acid. Alternating color horizontal lines along the x-axis indicate the start and end of each maize chromosome

We speculated that the limited number of transcripts associated with variation in the abundance of metabolites could reflect non-linear associations between transcript abundance and metabolite levels. As a complement to conventional TWAS which assumes linear relationships we adopted an explainable AI/random forest-based method [21], combined with a permutation-based estimate of expected background associations (Supplementary Figure S10) to identify those transcripts with the most predictive power to explain the abundance of the three metabolites for which at least one significant transcript association was identified via TWAS. A total of 26, 29, and 24 transcripts were identified which exceeded (FDR \(\le\) 0.05) for L-glutamic acid, quinic acid, and glycerol 1-phosphate respectively (Supplementary Figure S11). None of the transcripts identified via this method overlapped with genes located near trait-associated markers identified via GWAS, however, one of the six genes identified via TWAS, CUEA, was also identified in the random forest-based analysis. In addition, one of the genes associated with variation in the abundance of L-glutamic acid via random forest N-acetyl-gamma-glutamyl-phosphate reductase (argC) is known to play a critical role in the arginine biosynthesis pathway using glutamate as a precursor [22,23,24].

In three cases the same region of the maize genome was linked to variation in metabolite abundance and one of a set of 41 non-metabolic traits, including 28 whole plant phenotypes, 10 traits extracted from hyperspectral leaf reflectance, and three traits related to photosynthetic parameters. Out of 223 genetic markers associated with variation in these non-metabolite traits at a threshold of RMIP \(\ge\) 0.1 (Supplementary Figure S12), three were located within 100 kilobases of genetic markers associated with variation in the abundance of specific metabolites (RMIP \(\ge\) 0.1). The first of these three cases was an interval of less than 32 kilobases on chromosome 6 containing markers significantly associated with both variation in the abundance of L-serine in mature leaves and variation in the percent grain fill of ears at harvest. This interval contained a gene (Zm00001eb277100) encoding an aldehyde dehydrogenase (ALDH) an enzyme involved in the detoxification of aldehydes by catalyzing their conversion to carboxylic acids, which plays a role in various metabolic pathways, as well as the response to oxidative stress (Fig. 4). The second case involved an interval of approximately 64 kilobases on chromosome 1, where markers were significantly associated with both the variation in the abundance of mucic acid and the variation in the LV9 – latent variable 9 – hyperspectral reflectance derive trait. This interval contained a gene (Zm00001eb009750) encoding a transcriptional regulatory protein carrying a Myb/SANT-Like DNA-Binding Domain (Fig. 4). The third case was an interval of about 93 kilobases on chromosome 1 that contained markers significantly associated with both variation in the abundance of chlorogenic acid and variation in the number of branches per tassel. This interval included a gene (Zm00001eb035350) encoding a cyclin-dependent kinase (CDK), which is involved in cell cycle regulation (Fig. 4).

Fig. 4
figure 4

Genomic intervals in maize associated with variation in both metabolic and non-metabolic traits. Genomic intervals that include at least one genetic marker associated with a metabolic trait and at least one genetic marker associated with a nonmetabolic trait (RMIP \(\ge\) 0.1). For each panel, colored points indicate the positions and significance of GWAS hits as shown in Fig. 1, black lines indicate the positions of annotated genetic markers within the genomic interval shown, coloration in the triangles at the bottom of the figure indicates the decrease of correlation between different genetic markers in the region, black arrows indicate the positions and strands of annotated genes within the genomic interval shown and green and blue boxes indicate the positions of protein-coding exons and untranslated regions respectively. Candidate genes discussed in the main text or figure legend are labeled in red. A A region (Chromosome 1, 30.35 MB- 30.90 MB) containing one marker (chr1:30,626,701) significantly associated with mucic acid abundance and another (chr1:30,562,591) associated with LV9–latent variable 9–a hyperspectral reflectance data derived non-metabolite trait. B A region (Chromosome 1, 193.32 MB- 193.95 MB) containing one marker significantly associated with chlorogenic acid abundance and another marker (chr1:193,546,683) associated with the number of branches per tassel. C A region (Chromosome 6, 113.50 MB- 113.94 MB) containing one marker (chr6:113,734,505) significantly associated with L-serine abundance and another marker (chr6:113,703,113) associated with the proportion of the total length of maize ears which develop filled kernels (percent fill)

Discussion

Understanding the genetic determinants of intraspecies metabolic variation can help to understand causal relationships in plant metabolism and how plant metabolism determines whole plant phenotypes.However, plant metabolism, like transcript abundance, is dynamic and varies over time and in response to a wide range of environmental signals and perturbations, making it challenging to profile metabolite abundance in comparable conditions from large diversity panels under field-relevant conditions. Here we sought to mitigate the issues of environmental variation and diurnal cycling by employing a set of samples collected in a two-hour period from a large maize diversity panel grown in the field. Metabolite abundance is shaped by both genetic variation and environmental factors with G\(\times\)E interactions potentially playing a major role in influencing the observed metabolic differences. While we minimized environmental variation by collecting samples within a short time window, the findings of this study may not fully generalize to different genotypes or environmental conditions. Future studies should incorporate multi-location field trials to better account for G\(\times\)E interactions and enhance the transferability of results. However, this study represents the first large-scale metabolic analysis of maize under field conditions, establishing a foundation for further research into metabolic variation across diverse environments. The patterns of estimated relative abundance for a number of metabolites were reasonably repeatable between independently collected biological samples from the same genotypes (Fig. 1A) even before correcting for a number of experimental factors that influenced estimated abundance (Fig. 1B). The repeatability for metabolite traits was, in general, lower than equivalent values for whole plant phenotypes scored in the same population (Supplementary Figure S2, Supplementary Figure S3, and Supplementary Figure S4).

Some of the lower repeatability of metabolite abundance estimates may be explained by quantification error, patterns of diurnal changes even within the two-hour window of collection, and the high plasticity of plant primary metabolism. A previous analysis of the transcript abundance data collected as part of the same experiment identified substantial variation in the expression of transcripts encoding core portions of the circadian clock between the first and last samples collected, but overall less than 5% of all expressed genes exhibited a correlation with order of collection which exceeded R2 = 0.10 [17]. However, it should also be kept in mind that typical protocols for scoring many whole plant traits represent averages or aggregate assessments across several (plant height) to dozens (flowering time) of genetically identical plants and these aggregate assessments will tend to reduce residual variance relative to measurements collected from only a single plant. While it remains cost and labor-prohibitive to quantify metabolite abundance in multiple replicated samples from each genotype, as the sampling procedure improves, it may become feasible to collect and pool samples from larger numbers of plants within a single plot, increasing the repeatability of field-measured metabolite abundance.

In any case, genome-wide association studies conducted using dense resequencing-based marker data and the metabolite abundance data generated in this study identified a substantial number of genetic markers that were significantly associated with variation in a number of the metabolites profiled, including 17 marker-trait associations with the highest RMIP scores (all \(\ge\) 0.3) (Fig. 2). These marker-trait associations were each associated with variation in the abundance of 13 different metabolites (Table 1). Notably, four of these associations exceeded RMIP scores (all \(\ge\) 0.5), providing additional support for their reliability. While RMIP (all \(\ge\) 0.3) is sometimes considered moderate, previous studies [25, 26] have successfully validated candidate genes identified at even lower thresholds (RMIP \(\ge\) 0.2) through loss-of-function analysis in related plant species. Given the complexity of metabolic traits and the genetic architecture underlying metabolite abundance, even associations at these thresholds can provide meaningful insights. In seven cases, these markers were located within 50 kilobases of a gene known to play a specific role in plant metabolism, although not necessarily within the expected metabolic pathway. Often, windows defined by linkage disequilibrium around individual trait-associated markers included multiple annotated genes, including genes of unknown function or possessing only extremely general functional annotations. The ability to link trait-associated genetic markers with a single high-confidence candidate gene remains a major challenge and limitation of genome-wide association studies, even in species such as maize where linkage disequilibrium decays rapidly [27].

Transcriptome-wide association studies can frequently link specific candidate genes to roles in determining plant phenotypes [17], even in species with elevated linkage disequilibrium [20]. In our case, we had access to gene expression data generated using tissue samples collected from the same leaves of the same plants at the same time as the samples employed for metabolite profiling. In principle, this design should further increase the power for linking transcripts and metabolites as even variation in transcripts induced by non-genetic factors (e.g. diurnal changes, local environmental variation within the field,differences in the developmental stage of the fourth leaf from the top of different maize plants and/or differences micro environments these leaves experienced in the minutes or hours prior to sampling) could improve the degree of correlation between transcript abundance and downstream metabolic consequences of those same abundance changes. However, despite these advantages, we identified only six significant transcript-metabolite associations via TWAS (Fig. 3) and no associations at all for twenty-three of the twenty-six metabolites evaluated. The successful results from GWAS suggest the relatively poor performance of TWAS on the same dataset cannot simply be attributed to the quality/repeatability of the metabolite abundance data. Current best practices for TWAS emphasize focusing on transcripts expressed in more than 50% of the samples examined. This may bias against the discovery of transcripts that exhibit the presence or absence of variation among maize genotypes [28] and have the potential to play important roles in plant metabolism [29, 30]. In addition, tests for transcript-metabolite associations via transcriptome-wide association studies assume linear relationships between transcript abundance and metabolite abundance. In many causal relationships between transcripts and metabolites, this assumption may not hold true.

We implemented an explainable AI approach based on the random forest algorithm to search for transcripts that exhibit variation in the abundance of specific metabolites in either linear or non-linear fashion [21]. A total of 79 transcripts were linked to the abundance of three metabolites at an FDR threshold of \(\le\) 0.05, defined based on permutation (Supplementary Figure S10). Notably, in this analysis, N-acetyl-gamma-glutamyl-phosphate reductase (argC) was linked to variation in the abundance of L-glutamic acid. ArgC catalyzes the reduction of N-acetylglutamate 5-phosphate to N-acetylglutamate 5-semialdehyde in the arginine biosynthesis pathway. This step is critical as it represents a committed stage in the production of arginine. Glutamic acid serves as a precursor in this pathway, and thus its accumulation can be tightly linked to ArgC activity. The proper functioning of ArgC is crucial not only for arginine synthesis but also for overall nitrogen metabolism in plants, impacting growth and stress responses [22,23,24]. However, the random forest method was also unable to identify transcripts significantly associated with the majority of measured metabolites. This could be explained by measurement error as a result of the limited replication of both metabolite and transcript abundance datasets, limited statistical power as a result of the limited number of total genotypes present in our study, and/or a greater role for post-transcriptional regulation or peripheral rather than core genes in determining genetically controlled variation in metabolite abundance in maize. Therefore, the results obtained from this random forest method must be interpreted with some caution. While the method allows the identification of transcripts with nonlinear relationships to metabolites, our current implementation does not rigorously control for the confounding influences of population structure and kinship, which can produce false positives in genome-wide association studies, although the potential for similar effects in transcriptome-wide studies remains less clear. We applied a Bonferroni corrected significance threshold to correct for the effects of multiple testing in both the GWAS and TWAS analyses. This represents a conservative approach and the Bonferroni method likely overestimates the number of effective independent tests in cases like GWAS and TWAS where tested variables are correlated with each other as a result of population structure and linkage disequilibrium (GWAS) or coexpression (TWAS). However, Bonferroni correction remains a widely accepted standard in GWAS studies and we adopted this method for both GWAS and TWAS methods to maintain consistency and comparability. Alternative approaches to controlling for false discovery and multiple testing such as the Benjamini-Hochberg procedure would likely identify larger sets of associations as significant. The RF analysis did not generate p-values and, as a result, it was not possible to apply the Bonferroni correction. Instead, for RF, we estimated false discovery rates using a permutation-based method. Specifically, we generated 20 control datasets by shuffling the pairing of expression data and metabolite abundance across genotypes while keeping other variables constant. The distribution of feature importance scores for models trained on the shuffled data was used to establish a significance threshold, corresponding to an approximate FDR of 0.05 (Supplementary Figure S11).

Of the three methods we employed-GWAS, TWAS, and random forest-GWAS is by far the most generally accepted and widely used. However, we were particularly interested in whether genetic variants that altered plant metabolism would also exhibit impacts on whole plant phenotypes. When a set of 41 non-metabolic traits, including whole plant phenotypes, hyperspectral leaf reflectance, and photosynthetic parameters were analyzed using the same GWAS approach employed for metabolite analysis, three GWAS hits were identified in reasonable proximity to GWAS hits for variation in metabolite abundance (Fig. 4). The first case, on chromosome 6, involved a 32-kilobase interval associated with L-serine abundance and percent grain fill, containing a gene encoding aldehyde dehydrogenase. The second case, on chromosome 1, involved a 64-kilobase interval linked to mucic acid abundance and the LV9 trait, containing a gene encoding a Myb/SANT-like DNA-binding domain. The third case, also on chromosome 1, involved a 93-kilobase interval associated with chlorogenic acid abundance and tassel branching, containing a gene encoding cyclin-dependent kinase. The small total number of common genomic intervals identified between metabolite and non-metabolite traits was somewhat unexpected. While some metabolite-trait associations, such as those between Shikimic acid and plant height or Beta-alanine and 100 kernel mass, exhibit relatively weak but statistically significant correlations and such results are expected in complex metabolic networks where multiple genetic and environmental factors contribute to trait variation. Weak correlations are expected because metabolic pathways operate as highly interconnected networks, where individual metabolites are regulated by multiple upstream and downstream processes. Even small associations may indicate meaningful biological relationships, providing a basis for further investigation. These findings highlight the need for larger datasets and additional validation and could be further explored in broader studies across multiple environments to better understand the metabolic-genetic links underlying whole-plant phenotypes.The use of expanded populations, increased replication, improved protocols for collecting tissue samples, and improved methods for dealing with non-linear interactions may be necessary, either individually or jointly to improve the detection of genetic variants which impact both plant metabolism and non-metabolic traits.

Materials and methods

Field experiments and trait scoring

The maize field experiment from which the plant phenotypes, gene expression data, and metabolite abundance data employed in this study were collected was conducted in the summer of 2020 at the Havelock farm of the University of Nebraska-Lincoln (40.852\(^{\circ }\)N, 96.616\(^{\circ }\)W). The field was laid out in a randomized complete block design on May 6, 2020, consisting of two replications of each genotype. A total of 1680 plots, with each block consisting of 840 plots including 660 entries from the Wisconsin Diversity panel [15], and the remaining plots consisting of a repeated check line (B97). The layout for each plot consisted of two rows, each 7.5 (about 2.3 meters) feet long, with rows spaced 30 (roughly 0.76 meters) inches apart. Plants within the rows were placed 4.5 (approximately 11.5 centimeters) inches apart from each other, and the plots were separated by 30-inch (around 0.76 meters) alleyways. The experimental design and trait evaluation methodology conducted in the Lincoln, Nebraska field trial has also been previously described [17, 18, 25, 31].

Quantification of metabolite abundance

On July 8th, 2020, when the majority of plots were at the late vegetative or tasseling (VT) stage, duplicate leaf tissue samples were collected from one representative plant per plot in block 1 which consisted of the 840 blocks on the western side of the overall field experiment. Each sample consisted of five leaf disks sampled from the pre-ante-penultimate leaf (the fourth leaf down from the top) of the chosen plant. The leaf tissue was immediately subjected to flash freezing in liquid nitrogen and subsequently stored on dry ice until it could be transferred to a freezer at − 80 °C. This collection was performed by seven researchers in parallel with all samples collected in two hours of a single day, with collection ending before noon.

One sample per plot was employed for quantification of metabolite abundance. Frozen leaf samples were ground to a fine powder using TissueLyser II (Qiagen). Samples of approximately 25 mg of ground tissue were extracted from each set of ground tissue, precisely weighed, and mixed with 700 \(\mu\)L of methanol and 30 \(\mu\)L of 20 mg/mL ribitol in a 2 ml Eppendorf microfuge by vortexing and stored on ice. Sample tubes were shaken for 15 minutes at 950 rpm on thermomixer at 70 °C. Samples were then spun at 17,000 g for 10 minutes and the resulting supernatant was transferred to a new tube and mixed with 325 \(\mu\)L chloroform and 750 \(\mu\)L water by vortexing for 30 seconds. Samples were spun at 1500 g for 15 minutes. Finally, an aliquot of 50 \(\mu\)L from the upper polar phase was transferred into a fresh 2 mL tube and dried with a centrifugal vacuum concentrator. After vacuum drying, each tube was filled with argon gas and tightly closed to prevent the oxidation of metabolites.

Dried metabolite extracts were derivatized by methoxyamination in 20 mg/mL methoxyamine hydrochloride in pyridine for two hours at 37 °C. The samples were further trimethylsylilated for 30 minutes at 37 °C with 70 \(\mu\)L N-Methyl-N-(trimethylsilyl) trifluoroacetamide (Millipore Sigma). A fatty acid methyl ester mixture was added to the trimethylsylilation solution for retention time calibration. One microliter of each sample was injected into a GC-MS (7200 GC-QTOF system, Agilent) equipped with a HP5 msUI (30 m length, 0.25 mm diameter, 0.25 \(\mu\)m thickness) column. GC and MS parameters are exactly as described [32].

Chromatographic peaks were annotated using MassHunterUnknowns (Agilent) to match retention time and mass spectrum with data in the Fiehn Metabolomics Library (Agilent). Manual curation was used to subset peaks to those which could be confidently identified across all samples in all runs, resulting in a final set of 26 metabolites matched to peaks: aspartic acid, \(\beta\)-alanine, caffeic acid, chlorogenic acid, citric acid, glucose, glucose- 6-phosphate, fructose, galactonic acid, glyceric acid, glycerol- 1-phosphate, glutamic acid, loganin, alanine, threonine, malic acid, mucic acid, myo-inositol, phosphoric acid, quinic acid, raffinose, serine, shikimic acid, sucrose, trans-aconitic acid, and tyrosine. The analysis was conducted over 12 unique batches, with the number of samples per batch varying from 26 to 147, across a total of 27 runs. After subtracting the background noise, the abundance of metabolite was estimated based on the peak height of the representative ion for each metabolite, normalized against the internal standard ribitol, and adjusted for the exact fresh weight of the samples used for extraction. These initial estimates of relative metabolite content were log-transformed prior to downstream analysis.

Initial estimates of metabolite abundance were generated for 795 samples, including 88 observations of B97, the repeated check, and duplicate biological samples collected from different plants in the same plot for 47 additional genotypes. After applying quality control (QC) procedures, which involved removing samples with incomplete data, outliers, or those with inconsistencies in metabolite measurements, data for at least one sample of 660 unique maize genotypes were retained, including 47 genotypes where metabolite abundance was quantified for two duplicate samples collected from separate plants. Although B97 was used as a repeated check, it was not explicitly employed to correct for batch effects. Instead, its inclusion allowed for the assessment of consistency in metabolite measurements across different batches

Non-metabolite datasets employed in this study

The 28 whole plant phenotypes employed in this study were collected from the same field experiment and the procedure used to measure them, as well as the specific trait values employed are described previously [18]. The ten latent variables employed in this study were also generated from the same field experiment, with hyperspectral leaf reflectance data collected from one leaf per plot using a spectroradiometer [33] and the resulting data summarized into ten variables through the use of an autoencoder [19]. Fv_P/Fm_P (maximum efficiency of PSII in the light), relative chlorophyll, and leaf temperature were measured in the same field experiment [26] using MultiSpeq v2 instruments [34]. Gene expression was quantified via RNA-seq from the second set of leaf tissue samples collected in parallel with those employed for metabolite quantification [17].

Quantitative Genetic Analyses

Best Linear Unbiased Estimates (BLUES) for each metabolite were estimated using a mixed linear model, generated using the lme4 package [35] implemented in R v4.2.1 [36] with the equation:

$$\begin{aligned} y_{ijk} = \mu + \text {Genotype}_{i} + (1|\text {Batch}_{j}) + (1|\text {Run Order}_{k}) + \text {error}_{ijk} \end{aligned}$$

where \(y_{ijk}\) is the mean value for the metabolite of interest in the \(i^{\text {th}}\) genotype, run in the \(j^{\text {th}}\) batch and \(k^{\text {th}}\) run order during the GC-MS pipeline. The variance explained by each factor included in the model was extracted. Repeatability for the estimated abundance of each metabolite was determined using data from 47 genotypes where two independently collected samples were separately processed and quantified. The repeatability was calculated using the following simplified model:

$$\begin{aligned} R = \frac{\sigma _{G}^2}{\sigma _{G}^2 + \frac{\sigma _{E}^2}{N}} \end{aligned}$$

where:

  • \(\sigma _{G}^{2}\) is the genotypic variance, representing the variance explained by genotype.

  • \(\sigma _{E}^2\) is the residual variance, which includes environmental factors and measurement errors.

  • \(N\) is the number of replicates per genotype (two in this case), adjusting the residual variance accordingly.

Principal component analysis of metabolic abundance data was performed using the FactoMineR R package [37]. Before conducting GWAS and TWAS, we manually examined distributions of BLUEs for each metabolite and set cutoffs for each trait to remove a subset of extreme values (Supplementary Figure S10) to reduce violations of the normality of residuals assumed by the FarmCPU approach to conducting GWAS. While it is unclear whether the removed values represent technical errors or biologically valid extreme trait values, it is important to note that their removal will not introduce false positive associations into our dataset although it could in some cases increase the number of false negatives. On average, 3 values were removed per metabolite, with a maximum of 6 values removed for any single metabolite. The specific cutoffs applied for each metabolite are shown in Supplementary Figure S13.

Resampling model inclusion probability genome-wide association

Genome-wide association studies were performed on both metabolite abundance and non-metabolite traits using a set of 660 maize genotypes that had undergone metabolite abundance measurement and passed quality control. A set of 9,794,508 segregating biallelic SNP markers was generated by filtering a larger set of 46 million markers genotyped via resequencing of the Wisconsin Diversity panel [16] to retain only those with a minor allele frequency greater than 0.05 and heterozygosity less than 0.05 among the 660 maize genotypes included in this study using plink2 (v2.0a1) [38]. The Fixed and Random model Circulating Probability Unification (FarmCPU) algorithm [39], implemented in the rMVP package [40], was run 100 times for each phenotype with a different random subset of 10% of phenotypic records masked in each iteration [41]. Four principal components (PCs) calculated from genetic marker data were included in the analysis to control for the confounding effects of population structure. Genetic markers were considered significantly associated with a trait of interest in a given iteration when they exceeded a p-value threshold of \(5 \times 10^{-9}\) set using a Bonferroni correction, calculated as 0.05 divided by the total number of SNPs used in the analysis. A given marker-trait association was considered significant if it was identified in at least ten out of the one hundred total GWAS runs conducted, corresponding to a resample model inclusion probability (RMIP) of 0.1.

Transcriptome-wide association study

TWAS analyses described in this study were conducted using the expression levels measured via RNA-seq [17] and the compressed mixed linear model [42] as implemented in the Genomic Association and Prediction Integrated Tool (GAPIT) [43]. For improved comparability, we utilized the same set of 24,585 genes used in the study by Torres et al. (2024) [17], which were selected based on the following criteria: each gene or transcript that passed the quality filtering process (24,585 gene models with expression \(\ge 0.1\) TPM in at least 347 of the remaining 693 genotypes) was converted to a range from 0 to 2 using the methodology described by Li et al. (2021) [12]. Briefly, the 5% of samples with the lowest transcripts per million (TPM) values for each gene were scored as 0, the 5% of samples with the highest TPM values for each gene were scored as 2, and the remaining 90% of samples were re-scaled between 0 and 2 using the formula:

$$\begin{aligned} \frac{2 \times (\text {sampleTPM} - \text {5th percentile TPM})}{\text {95th percentile TPM} - \text {5th percentile TPM}} \end{aligned}$$

These data were generated using 693 maize genotypes constituting a superset of the 660 maize genotypes employed in this study. The first three principal components of variation calculated by GAPIT from the expression data were included as covariates. Additionally, a kinship matrix was calculated using the VanRaden method [44], which was used to control for the relatedness among genotypes. A gene was considered significantly associated with the trait of interest when the associated p-value was less than \(2.03 \times 10^{-6}\), corresponding to a Bonferroni corrected p-value of 0.05, considering the number of expressed genes employed in the TWAS analysis.

Random forest-based explainable AI

The random forest algorithm [45] was employed to predict the genes associated with the metabolites in 660 genotypes using the same gene expression data used for transcriptome-wide association studies. The randomForest package in R [46] was used to build the random forest models with five different tree counts (100, 200, 300, 400, and 500). The caret package in R [47] was then utilized to facilitate 5-fold cross-validation, evaluate model performance using root mean square error (RMSE), and calculate feature importance based on the increase in mean squared error (IncMSE) when a gene was excluded from the model. Twenty control sets were created by shuffling the taxa order while keeping other variables constant. Models were trained, and feature importance scores were calculated using both the original and shuffled datasets. For each metabolite, a threshold corresponding to a false discovery rate of approximately 0.05 was selected based on a comparison of the feature importance scores reported for shuffled datasets.

Data availability

The data utilized in this study, including metabolite data for 26 metabolites (Supplementary Data File 1), markers associated with 26 metabolite traits and 41 non-metabolite traits identified by GWAS (Supplementary Data File 2), genes associated with three specific metabolites identified by Random Forest (Supplementary Data File 3), and the peaks, retention times, and quantitative ion masses of the 26 metabolites (Supplementary Data File 4), is publicly available at https://doiorg.publicaciones.saludcastillayleon.es/10.6084/m9.figshare.26543479.

References

  1. Rai A, Saito K, Yamazaki M. Integrated omics analysis of specialized metabolism in medicinal plants. Plant J. 2017;90(4):764–87.

    Article  CAS  PubMed  Google Scholar 

  2. Fang C, Fernie AR, Luo J. Exploring the diversity of plant metabolism. Trends Plant Sci. 2019;24(1):83–98.

    Article  CAS  PubMed  Google Scholar 

  3. Medeiros DB, Brotman Y, Fernie AR. The utility of metabolomics as a tool to inform maize biology. Plant Commun. 2021;2(4):100187.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Lisec J, Meyer RC, Steinfath M, Redestig H, Becher M, Witucka-Wall H, et al. Identification of metabolic and biomass QTL in Arabidopsis thaliana in a parallel analysis of RIL and IL populations. Plant J. 2008;53(6):960–72.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Riedelsheimer C, Lisec J, Czedik-Eysenberg A, Sulpice R, Flis A, Grieder C, et al. Genome-wide association mapping of leaf metabolic profiles for dissecting complex traits in maize. Proc Natl Acad Sci. 2012;109(23):8872–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Toubiana D, Semel Y, Tohge T, Beleggia R, Cattivelli L, Rosental L, et al. Metabolic profiling of a mapping population exposes new insights in the regulation of seed metabolism and seed, fruit, and plant relations. PLoS Genet. 2012;8(3):e1002612.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Carreno-Quintero N, Acharjee A, Maliepaard C, Bachem CW, Mumm R, Bouwmeester H, et al. Untargeted metabolic quantitative trait loci analyses reveal a relationship between primary metabolism and potato tuber quality. Plant Physiol. 2012;158(3):1306–18.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Chen W, Gao Y, Xie W, Gong L, Lu K, Wang W, et al. Genome-wide association analyses provide genetic and biochemical insights into natural variation in rice metabolism. Nat Genet. 2014;46(7):714–21.

    Article  CAS  PubMed  Google Scholar 

  9. Wen W, Li D, Li X, Gao Y, Li W, Li H, et al. Metabolome-based genome-wide association study of maize kernel leads to novel biochemical insights. Nat Commun. 2014;5(1):3438.

    Article  PubMed  Google Scholar 

  10. Chen W, Wang W, Peng M, Gong L, Gao Y, Wan J, et al. Comparative and parallel genome-wide association studies for metabolic and agronomic traits in cereals. Nat Commun. 2016;7(1):12767.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Wu D, Li X, Tanaka R, Wood JC, Tibbs-Cortes LE, Magallanes-Lundback M, et al. Combining GWAS and TWAS to identify candidate causal genes for tocochromanol levels in maize grain. Genetics. 2022;221(4):iyac091.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Li D, Liu Q, Schnable PS. TWAS results are complementary to and less affected by linkage disequilibrium than GWAS. Plant Physiol. 2021;186(4):1800–11.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Torres-Rodríguez JV, Li D, Schnable JC. Evolving best practices for transcriptome-wide association studies accelerate discovery of gene-phenotype links. Curr Opin Plant Biol. 2025;83:102670.

    Article  PubMed  Google Scholar 

  14. Tang S, Zhao H, Lu S, Yu L, Zhang G, Zhang Y, et al. Genome-and transcriptome-wide association studies provide insights into the genetic basis of natural variation of seed oil content in Brassica napus. Mol Plant. 2021;14(3):470–87.

    Article  CAS  PubMed  Google Scholar 

  15. Mazaheri M, Heckwolf M, Vaillancourt B, Gage JL, Burdo B, Heckwolf S, et al. Genome-wide association analysis of stalk biomass and anatomical traits in maize. BMC Plant Biol. 2019;19:1–17.

    Article  Google Scholar 

  16. Grzybowski MW, Mural RV, Xu G, Turkus J, Yang J, Schnable JC. A common resequencing-based genetic marker data set for global maize diversity. Plant J. 2023;113(6):1109–21.

    Article  CAS  PubMed  Google Scholar 

  17. Torres-Rodríguez JV, Li D, Turkus J, Newton L, Davis J, Lopez-Corona L, et al. Population-level gene expression can repeatedly link genes to functions in maize. Plant J. 2024;119(2):844–60.

  18. Mural RV, Sun G, Grzybowski M, Tross MC, Jin H, Smith C, et al. Association mapping across a multitude of traits collected in diverse environments in maize. GigaScience. 2022;11:giac080.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Tross MC, Grzybowski MW, Jubery TZ, Grove RJ, Nishimwe AV, Torres-Rodriguez JV, et al. Data driven discovery and quantification of hyperspectral leaf reflectance phenotypes across a maize diversity panel. Plant Phenome J. 2024;7(1):e20106.

    Article  Google Scholar 

  20. Li D, Wang Q, Tian Y, Lyu X, Zhang H, Sun Y, et al. TWAS facilitates gene-scale trait genetic dissection through gene expression, structural variations, and alternative splicing in soybean. Plant Commun. 2024;5(10):101010.

  21. Shah SH, Angel Y, Houborg R, Ali S, McCabe MF. A random forest machine learning approach for the retrieval of leaf chlorophyll content in wheat. Remote Sens. 2019;11(8):920.

    Article  Google Scholar 

  22. Baich A, Vogel HJ. N-acetyl-\(\gamma\)-glutamokinase and N-acetylglutamic \(\gamma\)-semialdehyde dehydrogenase: Repressible enzymes of arginine synthesis in Escherichia coli. Biochem Biophys Res Commun. 1962;7(6):491–6.

  23. Majumdar R, Barchi B, Turlapati SA, Gagne M, Minocha R, Long S, et al. Glutamate, ornithine, arginine, proline, and polyamine metabolic interactions: the pathway is regulated at the post-transcriptional level. Front Plant Sci. 2016;7:78.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Charlier D, Bervoets I. Regulation of arginine biosynthesis, catabolism and transport in Escherichia coli. Amino Acids. 2019;51:1103–27.

    Article  CAS  PubMed  Google Scholar 

  25. Sahay S, Grzybowski M, Schnable JC, Glowacka K. Genetic control of photoprotection and photosystem II operating efficiency in plants. New Phytol. 2023;239:1068–82.

    Article  CAS  PubMed  Google Scholar 

  26. Ali W, Grzybowski M, Torres-Rodriguez JV, Li F, Shrestha N, Mathivanan RK, et al. Quantitative genetics analysis of photosynthesis-related traits of maize in the field. BioRxiv. 2024; https://doiorg.publicaciones.saludcastillayleon.es/10.1101/2024.11.25.625283.

  27. Tibbs Cortes L, Zhang Z, Yu J. Status and prospects of genome-wide association studies in plants. Plant Genome. 2021;14(1):e20077.

    Article  CAS  PubMed  Google Scholar 

  28. Swanson-Wagner RA, Jia Y, DeCook R, Borsuk LA, Nettleton D, Schnable PS. All possible modes of gene action are observed in a global comparison of gene expression in a maize F1 hybrid and its inbred parents. Proc Natl Acad Sci. 2006;103(18):6805–10.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Paschold A, Marcon C, Hoecker N, Hochholdinger F. Molecular dissection of heterosis manifestation during early maize root development. Theor Appl Genet. 2010;120:383–8.

    Article  PubMed  Google Scholar 

  30. Paschold A, Jia Y, Marcon C, Lund S, Larson NB, Yeh CT, et al. Complementation contributes to transcriptome complexity in maize (Zea mays L.) hybrids relative to their inbred parents. Genome Res. 2012;22(12):2445–54.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Sun G, Mural RV, Turkus JD, Schnable JC. Quantitative resistance loci to southern rust mapped in a temperate maize diversity panel. Phytopathology. 2022;112(3):579–87.

    Article  CAS  PubMed  Google Scholar 

  32. Wase N, Abshire N, Obata T. High-throughput profiling of metabolic phenotypes using high-resolution GC-MS. In: High Throughput Plant Phenotyping: Methods and Protocols. New York: Springer; 2022. pp. 235–60.

  33. Malvern Panalytical Ltd . FieldSpec4 Spectroradiometer. Previously Analytical Spectral Devices. 2023. https://www.malvernpanalytical.com. Accessed 30 Apr 2025.

  34. Kuhlgert S, Austic G, Zegarac R, Osei-Bonsu I, Hoh D, Chilvers MI, et al. MultispeQ Beta: a tool for large-scale plant phenotyping connected to the open PhotosynQ network. R Soc Open Sci. 2016;3(10):160592.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Bates DM. lme4: Mixed-effects modeling with R. New York: Springer; 2010.

  36. R Core Team. R: A language and environment for statistical computing. https://www.R-project.org/. Accessed 30 Apr 2025.

  37. Lê S, Josse J, Husson F. FactoMineR: an R package for multivariate analysis. J Stat Softw. 2008;25:1–18.

    Article  Google Scholar 

  38. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4(1):s13742-015.

    Article  Google Scholar 

  39. Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 2016;12(2):e1005767.

    Article  PubMed  PubMed Central  Google Scholar 

  40. Yin L, Zhang H, Tang Z, Xu J, Yin D, Zhang Z, et al. rMVP: a memory-efficient, visualization-enhanced, and parallel-accelerated tool for genome-wide association study. Genomics Proteomics Bioinformatics. 2021;19(4):619–28.

    Article  PubMed  PubMed Central  Google Scholar 

  41. Valdar W, Holmes CC, Mott R, Flint J. Mapping in structured populations by resample model averaging. Genetics. 2009;182(4):1263–77.

    Article  PubMed  PubMed Central  Google Scholar 

  42. Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010;42(4):355–60.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Wang J, Zhang Z. GAPIT version 3: boosting power and accuracy for genomic association and prediction. Genomics Proteomics Bioinforma. 2021;19(4):629–40.

    Article  Google Scholar 

  44. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.

    Article  CAS  PubMed  Google Scholar 

  45. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

    Article  Google Scholar 

  46. Liaw A, Wiener M, et al. Classification and regression by randomForest. R News. 2002;2(3):18–22.

    Google Scholar 

  47. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28:1–26.

    Article  Google Scholar 

Download references

Funding

This project was supported by the Department of Energy under Award Nos. DE-AR0001367 and DE- SC0023138, the Foundation for Food and Agriculture Research under Grant No. 602757, and the National Science Foundation under Grant No. IOS- 2332611, and US Department of Agriculture - National Institute of Food and Agriculture under the AI Institute: for Resilient Agriculture, Award No. 2021 - 67021 - 35329.

Author information

Authors and Affiliations

Authors

Contributions

J.C.S., T.O., and R.V.M. conceived the experiments, and C.P. and J.T. conducted experiments and generated data. R.K.M., J.V.T.R.,W.A, and N.S. analyzed the data and interpreted the results. R.K.M. generated the figures. R.K.M. drafted the manuscript with input from J.V.T.R., T.O., and J.C.S. All authors contributed to the editing of the final manuscript.

Corresponding author

Correspondence to James C. Schnable.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors approve of submission and publication.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mathivanan, R., Pedersen, C., Turkus, J. et al. Transcripts and genomic intervals associated with variation in metabolite abundance in maize leaves under field conditions. BMC Genomics 26, 434 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12864-025-11580-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12864-025-11580-3