# Integration of absolute multi-omics reveals dynamic protein-to-RNA ratios and metabolic interplay within mixed-domain microbiomes

Sep 18, 2020

### Taxonomic and functional resolution of the omics

In order to explore the RNA/protein dynamics in a microbiome setting, we first needed to characterize our test community over time at the molecular level. We previously genomically reconstructed and resolved the SEM1b community, retrieving 11 metagenome-assembled genomes (MAGs) as well as two isolate genomes (see “Methods” section)10, covering the taxonomic and functional niches that are required to convert cellulosic material to methane/CO2 in an anaerobic biogas reactor15. Taxonomic analysis of both 16S rRNA genes and the MAGs of SEM1b inferred population-level affiliations to Rumini(Clostridium) thermocellum (RCLO1), Clostridium sp. (CLOS1), Coprothermobacter proteolyticus (COPR1, BWF2A, SW3C), Tepidanaerobacter (TEPI1-2), Synergistales (SYNG1-2), Tissierellales (TISS1), and the methanogen Methanothermobacter (METH1)10, as depicted in Fig. 1a. Herein we estimated that the total genomic potential of SEM1b includes 39,144 open reading frames (ORFs) (Supplementary Data 1). Since ORFs with very high sequence similarity may produce RNAs and proteins that are indistinguishable in MT and MP data, all the ORFs were gathered into ORF-groups (ORFGs) during the MT and MP data processing (see “Methods” section), where a singleton ORFG is defined as a group with a single ORF, and thus a single gene. Using this approach, our MT and MP data identified 12552 (96% singleton) and 3235 (78% singletons) highly transcribed and translated ORFGs, respectively. The discrepancy between the singleton percentages was as expected, due to the fact that variations in DNA/RNA sequences are greater than in proteins since different codons can code for the same amino acid (codon degeneracy). Degeneracy also implies that the chance to distinguish between homologous genes using MT is greater than using MP. Previous MG analyses using assembly algorithms have shown that genomic regions difficult to assemble in a given environmental contig can harbor variants from multiple, closely related strains, which can be further linked to normal strain-level variability within a population and species divergence16,17,18. Within SEM1b, the ORFGs that contained multiple homologous ORFs predominantly originated from several strains of a single species. For example, in the MT, 444 non-singleton ORFGs (88% of the total) contained ORFs from different strains of the same species, whilst this was the case for 294 ORFGs (32%) in the MP.

All ORFs were annotated using KEGG Ontology (KO), and at least one term was found for 19070 (49%) representatives from our complete data set (Supplementary Data 2). The predominant ORF annotations included Membrane transport, Carbohydrate metabolism, Translation, Amino acid metabolism, and Replication and repair (Supplementary Fig. 1). As expected, these functional categories were also among the top five most abundant for the MT, and top six in MP (plus Energy metabolism), although in a different order. The Membrane transport category is poorly represented in the MP (2% of the terms), which is likely explained by well-known technical issues with the gel-based sample preparation method that we used, which limits the extraction of transmembrane proteins19. The most abundant annotation categories mentioned above are all in line with the community function of cellulose degradation. The abundance ranking of the KO categories was assessed using the Kendall τ, which takes values from −1 (opposite direction of the ranking) to +1 (total agreement in ranking). Its score is interpreted as a correlation measure; however, it is more conservative. The ranking is largely preserved from MG to MT (Kendall τ: 0.77, p < 10−8) and from MT to MP (τ 0.74, p < 10−6) while less so from MG to MP (τ 0.68, p < 10−5). The results show that the functional potential observed in the genomes is more preserved in the diversity of produced transcripts than in the produced proteins and thus hints to post-transcriptional regulation having an important role in addition to transcriptional regulation in prokaryotes.

### Protein-to-RNA ratios within a mixed-domain microbiome

To determine whether or not microbial RNA/protein dynamics vary between ecological status (isolate vs community), metabolic state and/or taxonomic phylogeny, we quantified and resolved the numbers of transcript and protein molecules per sample (i.e., the total SEM1b consortia within each 60 ml flask, see “Methods” section), which averaged 3.8 × 1012 (range 3.45 × 1011–1.10 × 1013) and 2.2 × 1015 (range 2.88 × 1014–3.46 × 1015), respectively (Supplementary Data 3 and 4). Microbial cell volume and associated transcriptome size has been shown to change in yeast according to cell status (proliferation vs. quiescence), while the proteome is merely reshaped in its composition between these states20. In our case, the number of total transcripts per sample increased more than three-fold during the first 15 h (from ~1.2 × 1012 in t1 to ~4.0 × 1013 in t4) in the SEM1b consortium’s life cycle and then decreased sharply, whereas the number of proteins per sample reached a plateau after 18 h post-inoculation at ~2.7 × 1015 molecules. SEM1b approximated the exponential growth phase in t3 (18 h), therefore we used the protein-to-RNA ratio from this time point for comparison against previously reported axenic estimates2,21,22,23,24. The replicate-averaged protein-to-RNA ratio for the bacteria in SEM1b ranges from ~102 to 104 (median = 949, Fig. 2a), which fits the estimated range reported for E. coli2. This means that for every RNA molecule one can expect from 100 to 10000 protein molecules with a median value of 949. Our results showed a population-specific variation in the protein-to-RNA ratio within bacteria (Fig. 2a), with the median ratios for the bacteria in SEM1b at 18h ranging from 658 in CLOS1 to 1137 in RCLO1. Although the limited number of published studies and data that enable estimation of protein-to-RNA ratios prevented our assessment of higher-resolution taxon-specific distributions within Bacteria, clear patterns were observed at a broader Domain level and are presented below (Fig. 2a).

In contrast to bacterial protein-to-RNA ratios that were relatively comparable to one another, the median protein-to-RNA ratio for an archaeal organism was ~10× higher at 12,035 protein molecules per detected RNA (Fig. 2a: METH1). The reported values for eukaryotes are 4200–5600 for yeast21,22 and 2800–9800 for Homo sapiens23,24; therefore, we find that archaeal translation dynamics are closer to that observed within the eukaryotic domain than that of Bacteria. Structurally, the translation system of archaea more closely resembles eukaryotes14. Correspondingly, the RNA of Eukarya and Archaea have been shown to exhibit longer half-lives than Bacteria25,26, with Archaea found to possess a cap complex similar to those in eukaryotes at the 5′-triphosphate end of the RNA molecule that is involved with mRNA stability27. Like eukaryotes, it has also been shown that archaeal RNA is regulated by post-transcriptional modification of the RNA molecule in order to upregulate protein expression28,29. Findings that show transcripts are present in archaeal cells for longer than bacterial cells can be used to hypothesize that this feature could have a greater role in optimizing the efficient production of protein molecules. In a microbiome setting, the greater turnover of RNA molecules and lower protein–RNA ratio in bacteria could potentially facilitate their faster adaption to changes in metabolic state and substrate availabilities in their environment, at higher rates than their archaeal counterparts. However, in many complex microbiomes archaea occupy highly specialized niches such as the biological production of methane via methanogenesis, which is the energy-yielding metabolism of methanogens and is unique to the Archaea. In this context, proteins involved with hydrogenotrophic methanogenesis have been shown to be the most highly detectable in methanogens grown in co-culture with syntrophic acetate oxidizing bacteria, when compared to the same methanogen grown in axenic culture with higher concentrations of supplemented H230. This discrepancy between H2 supply and protein levels suggests there is a requirement for methanogens to maintain highly active protein expression levels in order to keep H2 at levels that are low enough to keep syntrophic acetate oxidation energetically favorable31. We, therefore, speculate that methanogens, via their molecular mechanisms of maintaining high protein levels, are at an advantage to stably and efficiently maintain low H2-levels, a process that is critical to the metabolic equilibrium of many microbial ecosystems32.

In axenic culture, a microorganism is considered to be in steady state during the log phase of its growth cycle9,33,34, specifically when the changes in proteome size are believed to be mainly dictated by a change in the transcriptome35. In contrary to these assumptions, comparisons of RNA and protein levels between single cells of E. coli grown at steady state have not been shown to correlate, however patterns do emerge when the individual cells are collectively considered at the population level2. In SEM1b, we wanted to see if correlations between RNA and protein levels exist across a larger microbial community and if they are affected by changes in time and life stages. We calculated gene-wise Pearson correlation coefficients (PCCs) of protein and transcripts over time for the entire SEM1b consortium and showed that the PCC value varied greatly (Fig. 2b) with a median of 0.41. A high average R2 value (0.85 for both MT and MP) was also determined between replicates indicating the stability of our results and the lack of influence from random noise. This suggested that no direct correlations between RNA and protein levels exist at any stage at a community level and that it is nearly impossible to predict the level of the given protein based on the level of the corresponding transcript.

Looking at relationships between proteome and transcriptome for individual populations within SEM1b (examples from four populations in Fig. 2c) was observed to follow a more predictable relationship, which can be described by the monomial function:

$${mathrm{Protein}} = a cdot {mathrm{RNA}}^k,$$

(1)

The formula for log10-transformed RNA and protein levels takes the form of a linear model (see “Methods” section) that was fitted to protein and RNA distributions per time point from MAGs with the highest quality (RCLO1, CLOS1, COPR1, TISS1, TEPI1, TEPI2, and METH1) (Fig. 2d). The linearity parameter k can be interpreted as the rate of which a change in RNA level is reflected in a change in the corresponding protein level. For example, if k = 1, a doubling in RNA level means a doubling in protein level, whereas if k = 0.5 a doubling in RNA level means a ~40% increase in protein level. Ranging from 0 to 1, it implies that, in the “perfect” condition where k = 1, the number of proteins is linked to the number of RNAs by the scalar constant a, whilst if k approaches 0, there will be much lower expected protein levels for the same number of RNAs. With the exception of TEPI2, the linearity (k) between protein and RNA levels was observed to start at values above 0.5 at 13 h (t2) (Fig. 2d). The evolution of the MAGs’ k values over time is then divided into three groups: one where the k values decrease rapidly (TISS1 and COPR1); one where they slowly decline (RCLO1, CLOS1, and METH1) and one where they stay constant if not increase (TEPI1 and TEPI2) (Fig. 2d). Notably, CLOS1, METH1, and TEPI1 are converging toward the same k values, while RCLO1 has a parallel trend to them. If these trends can be used to retro-fit the steady-state definition, we can hypothesize that these four populations possess a metabolic equilibrium and that this equilibrium is approximately reached within the 10 h window between 33h and 43h (t6 and t8 respectively, Fig. 2d).

### Interplay between microbiome function and RNA/protein dynamics

Using multi-omic data and the above-described RNA/protein dynamics, we were able to visualize that at least four populations within SEM1b converge upon a dominant metabolic state that we speculate to strongly shape the overall SEM1b community phenotype and suggest a functional co-dependence between the individual populations. To determine if this was the case, we annotated the genes and metabolic pathways for SEM1b MAGs (Fig. 3) and reconstructed their temporal expression patterns (Fig. 4). The SEM1b consortium is able to convert cellulose (and hemicellulose) to methane via the combined metabolism of its seven major constituent populations (Fig. 4a). Based on the previous analysis that showed that RCLO1 is closely related to R. thermocellum10, we predict that it senses36 its growth substrate (cellulose) and moves toward it (Fig. 4d). RCLOS1 then transcribes, translates, and secretes the components of the cellulosome, such as scaffoldins, dockerins, and carbohydrate-active enzymes (CAZymes)37, which assemble into a dynamic multi-protein complex that degrades the substrate to smaller carbohydrates. Via the MG, we predicted that non-cellulosomal CAZymes were also employed by the Clostridium-affiliated CLOS1, which acted upon the hemicellulose fraction (mainly xylan) trapped in the spruce cellulose, which was supported by the observed release of its main monomer xylose (Fig. 4a). Sugars generated via the actions of RCLO1 and CLOS1 are subsequently consumed by RCLO1, CLOS1, and Coprothermobacter-affiliated populations (COPR1, BWF2A, and SW3C), which were all observed to express sugar transporters (Fig. 3). Notably, CLOS1 has the most diversified transporters, making it a flexible consumer, and for the most part demonstrated highest levels of hydrolytic and fermentative gene expression after RCLO1, which again is likely tied to xylose release at later stages of the SEM1b life cycle (Fig. 4a). However, some of the transporters, such as those for oligogalacturonide, raffinose/stachyose/melibiose, and rhamnose, were not expressed, likely due to the absence of their substrates in the largely cellulose and xylan dominated spruce wood used in this study. CLOS1 was also the only population to possess the aldouronate transporter with 20 copies of gene lplA, 20 of lplB, and 16 of lplC (20/20/16) and expressing 0.4/0.7/3.8 × 1010 and 92.8/3.5/7.0/ × 1011 combined median transcripts and proteins per sample; making it one of the few transporters detectable at the protein level. Similarly, the C. proteolyticus strains (BWF2A and SW3C) possess and express unique sugar transporters, likely gaining access to an undisputed pool of arabinogalactan or maltooligosaccharide. The transporter for pentamers ribose/xylose was the most common and possessed by RCLO1, C. proteolyticus populations and Tepidanaerobacter populations (TEPI1 and TEPI2). Notably from Fig. 3, it is clear that the proteins from the transporters are almost never found in the samples, even if the respective RNAs are present in the data set. This is likely due to the difficulties in extracting transmembrane proteins19 with the gel-based method that we employed.

The process of degrading cellulose and simple saccharides via hydrolysis and fermentation ultimately results in the production of short-chain fatty acids (SCFAs) such as propionate, butyrate, and acetate, which are subsequently metabolized by the predicted SCFA-oxidizing populations in SEM1b (TISS1, TEPI1, TEPI2) (Fig. 4a). The only metabolically active SCFA-oxidizing population in SEM1b was predicted to be TEPI1, as it demonstrated high k values that increased over time (Fig. 2d) and harbored a complete Wood–Ljungdahl pathway that was detectable in both MT and MP (Fig. 3). It has been shown that oxidizers can improve the oxidation of SCFAs (up to double speed) when superior NADPH and ATP generators (e.g., glucose) are consumed in small amounts to complement the stoichiometry through the pentose phosphate pathway (PPP) without triggering the shift of the entire cell’s metabolism toward another substrate38. In this context, it is interesting to note that TEPI1 was the only MAG that encoded and expressed a hexose (allose) transporter (Fig. 3). Aldohexoses (such as d-allose, d-glucose, d-mannose, etc.) are imported and transformed into fructose-6P in two reactions (both expressed in TEPI1), which can then be fed into either the PPP or the Glycolysis pathways. Xylose is a product of the degradation of hemicellulose present in our system (Fig. 4a) and can be converted to ribulose-5P and fed to the PPP in three reactions. This data, in combination with a highly expressed and detectable Wood–Ljungdahl pathway over time (Fig. 4a), points to the establishment of TEPI1 as the only syntrophic acetate oxidizing bacteria in the SEM1b consortium. We speculate that TEPI1’s syntrophic metabolism is helped by the other SEM1b populations that generate acetate as a fermentation end-product and the supply of sugars released by the cellulosomal complex such as glucose and xylose. Interestingly the closely related MAG TEPI2 was observed to lack the Wood–Ljungdahl pathway and to express ~10 times more transcripts for the ribose/xylose transporter than TEPI1; relegating it to the role of mere sugar degrader, and probably scavenger in the community.

Although TISS1 seems mostly to phase out of the community and its k value associated with its protein to transcript relationship (Fig. 2d), TEPI2 implements an exit strategy in the form of sporulation. All the Gram-positive populations from the SEM1b consortium (RCLO1, CLOS1, TISS1, TEPI1, and TEPI2) were able to produce spores and express the spore marker spoIV, an ATPase associated to the surface of the neospore that promotes the assembly of the coating and is common to all the spore-forming bacteria39 (Fig. 4b). TEPI2 however increased the level of transcripts for spoIV by 1000 times within the 13 h and the 18 h time points, reaching the maximum at 23 h, and having a production ten times higher than the phylogenetically related TEPI1. All SEM1b populations, except the C. proteolyticus isolates and TEPI1, have the genetic potential for flagellar synthesis but the respective transcripts were only observed for RCLO1, CLOS1, TISS1, and TEPI2. The filament protein of RCLO1 is by far the most abundant protein in the samples with an average of 2.8 × 1013 molecules per sample, which matches the idea of RCLO1 investing in motility to reach the cellulose fibers and starting with the highest level of marker flgD in the community (Fig. 4d).

In microbial ecosystems, acetate is oxidized by secondary fermenters to CO2/H2 or formate, a process that is mediated by the Wood–Ljungdahl pathway in reverse. The oxidation of acetate associated with the reverse WLP is coupled with the transition between NADH/NAD+, and translocates Na+ to create an electrochemical gradient, which is then used by the type-V ATPase to synthesize ATP40. Indeed the NAD+-Fdred-dependent Na+ translocation system rnf is expressed in both the fermenting and syntrophic acetate oxidizing bacteria of SEM1b, while type-V ATPase, which produces energy by exploiting the Na+/H+ gradient, were detected in all the SEM1b populations aside from METH1 and C. proteolyticus-affiliated populations (Fig. 3). Moreover, the TEPI1 MAG expresses the NAD+ (NADP+)-reducing hydrogenases complex, which reduces hydrogen ions to H2 using NAD(P)H as the electron donor. The molecular hydrogen generated here would then be used by the syntrophic partner METH1 to form methane (Fig. 4a). However, acetate oxidation that is mediated by the reverse Wood–Ljungdahl pathway is thermodynamically unfavorable unless coupled with syntrophic hydrogenotrophs. Within SEM1b, the METH1 population is a hydrogenotrophic methanogen and the methanogenesis pathway, which is observed in the METH1 MG and MP, is the largest pathway in SEM1b according to the number of genes involved (n = 112). In METH1, we also observed transporters for nickel, the metal ion found in the F430 prosthetic group in the methyl-coenzyme M reductase complex (McrABG), which is responsible for the terminal step in anaerobic methanogenesis41. Transporters for another key metal, cobalt, which is utilized by cobalamin-requiring enzymes such as the energy-conserving methyl-H4MPT:CoM-SH methyltransferase complex (MtrABCDEFGH), were also detected in the MG and MP of METH1. Within hydrogenotrophic methanogens, electrochemical gradients generated Na+ ion exclusion by the Mtr complex allows for the inflow of Na+ through ATP synthases to generate energy. Surprisingly, no H+/Na+nha ion transporter, which is commonly observed in methanogens were observed in the METH1 MG, MT, or MP. The Na+/H+ antiporter nha was encoded and expressed in populations TEPI1, TEPI2 and TISS1 (Fig. 2), which does point to an important role of these ions in the bacterial component of the SEM1b consortium. Overall, our more classical pathway-wise exploration of the SEM1b populations supported that RCLO1, CLOS1, TEPI1, and METH1 indeed share functional co-dependencies and supported our predictions via RNA/protein dynamics that they converge upon a dominant metabolic state.

### Translation control impacts cell status and resource usage

A change in protein regulation can be causally linked to a change in the population status (steady state or transition). Within the cell, proteins are predominately the performers of cellular functions thus the change in cell status can be achieved by actively altering the protein level. In addition to protein-to-RNA ratio assessments, our absolute multi-omics analysis allowed us to explore the crucial aspect of protein-level regulation, which is poorly understood in microbiomes. The control of protein levels in bacteria is believed to occur via transcription control, “control by dilution”42 (dispersal of proteins via subsequent cell divisions), sRNA activity43, and rarely by protein degradation44. Similar to transcription control, translation can also be controlled by a dynamic pool of translational factors, such as initiation, elongation, and ribosome components45. The processes targeted by these systems require a rapid change in the number of proteins in the cell that cannot wait for a change in RNA levels or a dilution effect. We used our absolute quantifications of SEM1b transcripts and proteins as well as PECA-R46 to predict the “change-point”, which takes into account estimates of protein translation and degradation rates (Supplementary Data 5). The analysis found 305 significant rate changes, accounting for 302 ORFs. Of the rate changes’, 94% were downregulated and 71% of the ORF were functionally annotated. RCLO1 has 28 downregulated ORFs between 13 and 18 h (t2–t3), mostly from complexes involved in chemotaxis (cheY, cheW, mcp), flagellum assembly (flgG, flgK, fliD) and shape determination (mreB). In the following 5 h several systems concerning carbon fixation are affected, such as phosphoglycerate kinase (PGK), triosephosphate isomerase (TPI), phosphate acetyltransferase (EC 2.3.1.8), isocitrate dehydrogenase (IDH1), and pyruvate orthophosphate dikinase (PPDK). In the next 5 h, it downregulates the translation of the cell division protein ZapA as well. The reduction in protein production for chemotaxis, mobility, and then cell division matches the idea that within 13 h of the inoculation, RCLO1 sensed, reached, and colonized the cellulose fibers. Contextually the release of medium length oligosaccharides enables RCLO1 to engage in the more energetically favorable fermentation metabolism. TISS1 has a decrease in translation rates of ORFs related to metabolic processes between 13 and 18 h, mostly involving cofactors (fhs, folC, folD, lplA, metH, pdu0, and nadE) and amino acids (aorQ, hutI, LDH, metH, mtaD, and pip). TEPI1 downregulated 60 ORFs, accounting for part of its carbohydrate metabolism (e.g., PGK, TPI), the amino acid transporters, and the NADH dehydrogenase complex (NDH). TEPI2 has 19 ORFs subject to downregulation in the 13–18h interval, such as pyruvate ferredoxin oxidoreductase (PFOR), GK, fructose-bisphosphate aldolase (FBA), tansaldolase EC 2.2.1.2, and the ribose/xylose transporter subunit rbsB. In the last interval (33–38 h), RCLO1 upregulated the translation of 10 ORFs, among which the flagellar FlbD and shape determination (mreB); seemingly starting to restore the functions downregulated in the 13h-18h interval.