INTRODUCTION
Almost all metazoans experience aging, a process of progressive decline in functionality and resilience that results in death (Kenyon, 2010). Germline development is the only known biological process capable of reversing or avoiding the effects of aging, and the discovery that somatic cells can be reprogrammed to a pluripotent state demonstrated that the stages of this developmental process can be reversed (Gurdon, 1962). Seminal work further demonstrated that cells could be reprogrammed to an induced pluripotent stem cell (iPSC) state through induction of four transcription factors—Sox2, Pou5f1 (Oct4), Klf4, and Myc (SOKM) (Takahashi and Yamanaka, 2006; Yu et al., 2007).
Evidence from several groups further suggests that pluripotent reprogramming also reverses features of aging. iPSCs generated from young and aged donors are largely indistinguishable, and this similarity persists after differentiation into multiple fates (Lapasset et al., 2011; Nishimura et al., 2013; Mertens et al., 2015). This erasure of aging features does not occur in direct reprogramming protocols (Mertens et al., 2015; Kim et al., 2018), suggesting that dedifferentiation during pluripotent reprogramming is intimately related to this phenomenon.
Transient induction of SOKM in mice using an inducible germline allele was reported to improve multiple physiological functions in aged animals and to extend lifespan in progeroid mice (Ocampo et al., 2016). This partial reprogramming strategy does not generate iPSCs but rather alters gene expression without inducing a pluripotent state in most cells. Similarly, partial reprogramming was reported to reduce transcriptional features of aging in multiple human and mouse cell types (Sarkar et al., 2020; Lu et al., 2020; Gill et al., 2021; Shahini et al., 2021) and to improve regenerative capacity in the heart, muscle, and eye (Chen et al., 2021; Sarkar et al., 2020; Lu et al., 2020; Rodrı´guez-Matella´n et al., 2020).
These striking results suggest that even transient induction of pluripotency-associated gene regulatory networks (GRNs) is sufficient to ameliorate features of aging. However, it remains unclear if these effects occur uniformly in all cells and at which stage of the reprogramming process features of aging are lost. It is also unknown if partial reprogramming suppresses the GRNs that establish somatic cell identity. Early reports have suggested that cell identity programs are unaffected by partial reprogramming, but this stands in contrast to observations from iPSC reprogramming experiments (Polo et al., 2012; Maza et al., 2015; Guo et al., 2019b; Tran et al., 2019) and the known neoplastic effects of in vivo reprogramming (Abad et al., 2013; Ocampo et al., 2016). Likewise, it has been suggested that some of the beneficial effects of partial reprogramming stem from the re-engagement of somatic cell identity programs, but there is currently little evidence to support this hypothesis.
The necessity or sufficiency of individual pluripotency factors to ameliorate features of aging is also unclear. The widely used SOKM factors were identified based on their ability to induce a pluripotent state. Given that partial reprogramming explicitly avoids this outcome, it is plausible that only a subset of the SOKM GRNs are necessary to reverse age-related changes. Recent experiments suggest that Myc is not necessary at a minimum (Lu et al., 2020). It is also unknown if alternative reprogramming strategies (e.g., multipotent reprogramming) might also restore youthful gene expression or if activation of the full pluripotency program is required.
The goal of this work is to address these unanswered questions about the effects of partial reprogramming on features of aging and somatic identity and the necessity of individual partial reprogramming factors to restore youthful expression. We hypothesized that single-cell expression profiling of multiple cell types during partial reprogramming would capture a series of cell states induced by the perturbation, helping to reveal any suppression of somatic identities that might exist. We further reasoned that a reverse genetics screen measuring the gene expression effects of partial reprogramming with all combinations of the Yamanaka Factors would allow us to determine which factors in the set are necessary to restore youthful expression.
To investigate our hypothesis that somatic identity might be suppressed by partial reprogramming, we performed partial reprogramming in primary young and aged adipogenic and mesenchymal stem cells (MSCs) and profiled single-cell gene expression at multiple time points after reprogramming factor withdrawal. These experiments revealed that partial reprogramming restored youthful gene expression but transiently repressed somatic cell identity in both cell types, further supported through a re-analysis of public expression data for partial reprogramming in diverse cell types. To perform a reverse genetics screen with combinations of the Yamanaka Factors, we developed a pooled screening system for transient transcription factor overexpression and tested all combinations of the Yamanaka Factors in young and aged primary cells. We found in our screens that no single Yamanaka Factor was necessary to restore youthful expression or suppress somatic identity, and additional experiments revealed that even a multipotent reprogramming strategy with no Yamanaka Factors could restore youthful expression. Our primary contributions are therefore as follows: (1) the discovery that partial reprogramming transiently suppresses somatic cell identity in diverse cell types, orthogonal to the effects on aging genes; (2) the discovery that no single Yamanaka Factor is necessary to restore youthful expression; and (3) a demonstration that alternative, nonpluripotent reprogramming can exhibit similar effects.
RESULTS
Partial reprogramming restores youthful gene expression in aged adipogenic cells
To evaluate the effects of partial reprogramming using the Yamanaka Factors, we designed a polycistronic tetracycline-inducible SOKM lentivirus with a fluorescent reporter (LTV-Y4TF; Figure 1A) (Carey et al., 2009). We used LTV-Y4TF and a tetracycline-transactivator lentivirus (LTV-Tet3G) to transduce primary adipogenic cells from young (2–4 months old) and aged (20– 24 months old) C57BL/6 mice in vitro. Following transduction, we performed a pulse/chase by adding doxycycline (Dox) to the cell culture media for 3 days and chasing for 3 days (Figure 1B). Based on half-life references and measured transgene abundance, we estimate that there is minimal activity of the Yamanaka Factor programs by the end of the chase period (Figure S1). We used a multiplicity of infection (MOI) sufficient to transduce < 50% of cells in each condition, such that a fraction of cells in each well are exposed to Dox but do not contain both the transgenes required for reprogramming. These cells therefore serve as an in situ control. Following the chase period, we sorted dual transgene-positive (Tg+) and single transgene or transgene-negative (Tg ) cells of each age by cytometry and profiled cellular transcriptomes by single-cell RNA-seq (STAR Methods).
We captured 30,000+ high-quality adipogenic cell mRNA abundance profiles. After denoising and dimensionality reduction with a variational autoencoder (Lopez et al., 2018), we clustered cell profiles and found that reprogramming induced a set of novel gene expression states unseen in control cells (Figures 1C and 1D). Control cells occupied an adipogenic state marked by Lpl and a secretory state marked by Rspo1, each predicted to match in vivo mesenchymal cell states by a cell type classifier (Figures S2A–S2C). The vast majority of treated cells occupied reprogramming-specific cell states, suggesting that partial reprogramming effects are highly penetrant. Young and aged cells occupied distinct regions of the latent space in both control and reprogrammed populations, similar to a previous study of control cells in vivo, suggesting that some features of aging persist after reprogramming (Figures 1D and S3) (The Tabula Muris Consortium, 2020). Animal-to-animal differences were a small source of variation (< 1%; Figure S4, ANOVA, STAR Methods). We note that some cells in the transgene-negative (Tg- ) samples co-embed with the reprogrammed populations. This is most likely the result of imperfect cell sorting prior to singlecell analysis and may lead us to underestimate the magnitude of partial reprogramming effects.
We first wanted to determine if the transcriptional distance between young and aged control cells—the magnitude of aging— was decreased by partial reprogramming. We measured the magnitude of aging using the maximum mean discrepancy (MMD), a statistic from representation learning that measures the similarity of two populations and provides a test for statistical significance (Gretton et al., 2012; Kimmel et al., 2020a). Here, we computed the MMD using autoencoder latent variables to capture changes across the transcriptome. We compared both aged control and aged reprogrammed cells with young control cells by MMD and found that the magnitude of aging decreased significantly after reprogramming (Figure 1E; p < 0:001, Wilcoxon rank sum test; STAR Methods). Isochronic comparisons between animals of the same age were smaller than the magnitude of aging as expected. We found consistent results using alternative methods of computing an aging magnitude (Figures S2E–S2G). This decrease in the distinction of young and aged cells demonstrates that partial reprogramming can restore youthful gene expression across the transcriptome.
To determine which genes drive this change, we performed differential expression analysis and found that reprogramming induced significant changes toward the youthful expression level in 3,485 genes of a total of 5,984 genes changed with age (Figure 1F). Youthful changes occurred significantly more often than would be expected by random chance (binomial test, p < 0:0001). We also found that reprogramming altered many genes that are not changed with age, indicating that reprogramming induces some orthogonal effects (Figures S2E and S2H). We used gene set enrichment analysis (GSEA) (Subramanian et al., 2005) to compare the effects of aging and reprogramming and found that reprogramming counter-acted age-related changes in many gene sets (Figure 1G). Two of the strongest examples were the adipogenic ‘‘fatty acid metabolism’’ gene set, downregulated with age and upregulated by reprogramming, and the ‘‘inflammatory response’’ gene set, upregulated with age and downregulated by reprogramming (Figure 1H). We found a similar downregulation of adipogenic genes with age in singlecell data collected in vivo (Figure S3) (The Tabula Muris Consortium, 2020). Partial reprogramming also shifted the expression of some gene sets further in the aged direction. For example, we found that the ‘‘oxidation phosphorylation’’ and ‘‘mTOR signaling’’ gene sets were elevated by aging and partial reprogramming, whereas ‘‘epithelial-to-mesenchymal transition’’ (EMT) was suppressed by both (Figures 1H and S2D). Therefore, not all gene expression effects of partial reprogramming are counter to the axis of aging.
Cell identity dictates partial reprogramming effects
We next wondered if restoration of youthful gene expression was consistent across cell types and performed partial reprogramming in muscle-derived MSCs (Joe et al., 2010) to investigate. We captured 20,000+ MSC profiles and found that young and aged control cells were less distinct than in the adipogenic case (Figures S5A and S5B), but cell age could still be readily distinguished by a classification model (93% accurate; Figure S6; STAR Methods). Similar to our experiments in adipogenic cells, partial reprogramming induced a novel set of reprogrammingspecific gene expression states unseen in control MSCs (Figure S5C).
Reprogramming states were characterized by a downregulation of the EMT program (Hallmark gene set; Fischer’s exact test, q < 0:0001). We also found that youthful gene expression was restored in 712 genes (significant change in youthful direction), an aging-induced fibrotic gene program was suppressed, and an aging score derived from bulk RNA-seq data decreased (Figures S5D, S5E, and S7). We compared the genes restored with more youthful levels in MSCs and adipogenic cells and found 261 genes enriched for inflammatory gene sets (e.g., ‘‘macrophage activation,’’ q < 0:1) were shared across cell types.
However, when we computed the magnitude of aging in MSCs, we found that aged cells were more distinct from young controls after reprogramming (Figure S5F). This result was likely due to the fact that more than 4000 genes were significantly changed with reprogramming, but not with age. We hypothesized that the magnitude of aging might be smaller in MSCs than adipogenic cells, such that the orthogonal effects of reprogramming dominate in MSCs. Consistent with this hypothesis, we found that the magnitude of aging was significantly larger in control adipogenic cells than MSCs in a shared transcriptional space (Figures S8A and S8C; STAR Methods). We explored these orthogonal effects using gene set enrichments and linear discriminant analysis (LDA) and found that suppression of mesenchymal identity and activation of cell cycle programs were the dominant effects of reprogramming unrelated to MSC aging (Figures S5G–S5I). These results suggest that partial reprogramming effects unrelated to aging may dominate in some cell types.
To confirm that partial reprogramming effects in our data were consistent with previous reports, we compared data from both cell types with three previous studies (Samavarchi-Tehrani et al., 2010; Buganim et al., 2012; Tran et al., 2019). These studies employed three distinct gene expression analysis techniques (single-cell qPCR, microarrays, and single-cell RNAseq) and used different reprogramming strategies, such that any shared effects are unlikely to be artifactual. We found that both marker gene dynamics (e.g., EMT markers) and reprogramming-specific gene programs extracted from previous studies show similar dynamics in our data (Figure S9).
To test if reprogramming had cell type-specific effects, we performed differential expression analysis to identify reprogramming-by-cell type interactions (Methods). This analysis revealed more than 10,000 genes for which reprogramming induced significantly different effects (q < 0:05, likelihood-ratio test). We also found that 11% of nonresidual variation was explained by cell type-specific effects of reprogramming in a shared transcriptional latent space (Figure S8B, ANOVA).
To unravel the influence of cell type on partial reprogramming outcomes, we extracted the top genes with significant cell type: reprogramming interactions and performed gene ontology (GO) analysis. Extracellular matrix and EMT programs were suppressed by reprogramming in both cell types but significantly less so in adipogenic cells. By contrast, hypoxia and glycolysis gene sets were more upregulated by reprogramming in MSCs than adipogenic cells (Figures S8D–S8F; q < 0:05, Fischer’s exact test). It is possible that some of these cell type-specific effects are the result of different induced transgene levels across cell types. Our data therefore suggest that cell identity dictates the effects of partial reprogramming across many genes.
Polycomb repressive complex-2 targets are recalcitrant to reprogramming
We have thus far focused on features of youthful expression that are restored by partial reprogramming. However, we found that in both adipogenic cells and MSCs, many aging genes are recalcitrant to partial reprogramming and retain their aged expression state (Figure S10A). In MSCs in particular, the majority of aging genes (1990) were not restored to a youthful level. We found that 462 of these recalcitrant genes were shared across cell types, suggesting there may be a core set of aging features not amenable to partial reprogramming.
To determine if recalcitrant genes shared common regulatory features, we performed gene set analysis using a database of DNA binding protein gene sets (ChEA3) (Keenan et al., 2019). This analysis revealed that both shared and cell type-specific recalcitrant genes were strongly enriched for targets of polycomb repressive complex-2 (PRC2; Figure S10B). Many PRC2 target genes were downregulated with age in both cell types and further downregulated by partial reprogramming (Figures S10C and S10D). Suppression of somatic cell identity genes by PRC2 is an essential step in pluripotent reprogramming (Onder et al., 2012; Fragola et al., 2013), and PRC2 targets are known to be hypermethylated with age (Teschendorff et al., 2010). These results therefore suggest that aging genes important for somatic cell identity and targeted by PRC2 may be refractory to partial pluripotent reprogramming.
Somatic cell identity networks are repressed by partial reprogramming
Based on our observation of novel cell states in reprogrammed cells, we hypothesized that the gene networks enforcing somatic cell identity may be repressed, although reprogramming factors were only expressed transiently. To enable analysis of gene expression changes across the continuous reprogramming trajectories, we used pseudotime analysis to assign a one-dimensional coordinate to each cell, ordering cells from the most somatic to the most reprogrammed. For intuitive interpretation, we oriented our inferred pseudotime coordinate system so that baseline cell states occupy lower coordinate values and reprogrammed cells occupy higher values (no effect on quantitative analysis; STAR Methods).
We first investigated the effects of partial reprogramming on individual marker genes for somatic cell identity across the pseudotime trajectories. In adipogenic cells, we found that adipogenic genes Lpl and Fabp4 were significantly downregulated in the most distal reprogrammed cells, whereas pluripotency-associated genes Nanog, Snca, and Fgf13 were upregulated (Figure 2B; STAR Methods). We found a similar pattern in MSCs where mesenchymal genes Acta2, Thy1, and Col1a1 were downregulated, whereas Nanog, Snca, and Fgf13 were upregulated (Figure 2D). Activation of Nanog and other pluripotency genes after a 2–4 day SOKM induction is consistent with previous single-cell timecourse studies of iPSC reprogramming (Figure S11) (Schiebinger et al., 2019). We next summarized the activity of cell identity GRNs using regulatory network inference methods (Aibar et al., 2017; Han et al., 2018), a cell identity classification model (scNym) trained on a mouse cell atlas and latent gene programs learned from previous pluripotent reprogramming time series experiments (Kimmel and Kelley, 2021; Tabula Muris Consortium, 2018). We found that somatic cell identity programswere suppressed in distal reprogramming states for both cell types using all analysis approaches (Figures 2E, S9, and S12). We also performed pseudotime analysis using two alternative pseudotime methods (Palantir, scVelo latent time) and found that our results were robust across algorithms (Figure S13) (Setty et al., 2019; Bergen et al., 2020).
Our results stand in contrast to previous reports that partial reprogramming did not suppress somatic cell identity or activate pluripotency genes (Sarkar et al., 2020; Lu et al., 2020; Olova et al., 2019). What could explain this difference? To make a more direct comparison, we interrogated gene sets from these reports in our data, re-analyzed raw RNA-seq data from contrasting studies where available (Lu et al., 2020), and re-analyzed three additional public partial reprogramming RNA-seq datasets in three unique cell types (Francesconi et al., 2019; Chen et al., 2021; Wang et al., 2021). Three of the public datasets are from in vivo partial reprogramming experiments, helping generalize our conclusions beyond the in vitro setting explored in this study. Using gene sets from previous reports, we confirmed that most pluripotency genes were upregulated and somatic identity genes were downregulated in our data (Figure S14). We also found that our cell identity classifiers learn to use somatic marker genes not included in previous gene sets, providing a more complete representation of cell identity (Figure S14D).
Across four cell types available in public data (B cells, cardiomyocytes, skeletal muscle, and retinal neurons), we found that pluripotency genes were upregulated and somatic identity gene programs were downregulated in each case (Figure 2H; q < 0:1, likelihood-ratio test). We likewise found that scNym identity classifiers leveraging the ARCHS4 RNA-seq compendium as training data (Lachmann et al., 2018) detected significant upregulation of ESC-like pluripotency programs and suppression of somatic identity programs in the three in vivo partial reprogramming experiments (Figure 2I; p < 0:01, Wald test). We found that pluripotency genes were upregulated and somatic identities were suppressed in RNA-seq data from one contradictory report using SOK in vivo (Lu et al., 2020).
We note that it is possible the biology of dermal fibroblasts used in some previous studies (Ocampo et al., 2016; Lu et al., 2020; Sarkar et al., 2020) may be uniquely resistant to identity suppression, therefore explaining our contrasting results. However, this seems unlikely, as our single-cell measurements are consistent with lineage-tracing studies in fibroblasts showing that partial reprogramming induces pluripotency programs (Maza et al., 2015), and our re-analysis of public data is consistent across diverse cell identities. Our results therefore suggest that partial reprogramming represses somatic cell identity GRNs and activates late-stage pluripotency GRNs in a subpopulation of cells across diverse cell types in vitro and in vivo, potentially posing a neoplastic risk.
Partially reprogrammed cells re-acquire somatic identities through secondary differentiation
It has been proposed that transient expression of reprogramming factors induces a transient de-differentiation and redifferentiation process, such that reprogrammed cells temporarily adopt early phase reprogramming features but then re-acquire their original state (Samavarchi-Tehrani et al., 2010; Nagy and Nagy, 2010; Sarkar et al., 2020). To date, there has been little evidence for this model in the context of restoring youthful gene expression. In addition to rich profiles of the current cell state, single-cell RNA-seq provides information on future gene expression states through RNA velocity (La Manno et al., 2018).
We inferred RNA velocity for both our adipogenic cell and MSC experiments and found that cells in reprogrammed states were moving toward baseline states based on velocity maps (arrows point from high pseudotime values toward low pseudotime values). This result suggests that in both cell types, reprogrammed cells were re-differentiating toward their baseline state (Figures 2A and 2C) (Bergen et al., 2020). To identify genes that drive predicted changes in cell state, we performed differential velocity analysis. We found that EMT and extracellular matrix gene sets were enriched in the velocity markers for reprogrammed cells, suggesting that partially reprogrammed cells reacquire their original mesenchymal identities (Figure 2F).
We further quantified RNA velocity maps using phase simulations, a technique from dynamical systems to measure properties of vector fields by simulating the motion of a point moving through the field (Kimmel et al., 2020b, 2020a; Strogatz, 2015). Here, we simulated reprogrammed cells in each velocity map and evolved their positions based on the inferred velocities of neighboring cells (Methods). We found that simulated reprogrammed cells evolved toward somatic cell states in both cell types (Figure 2G).
From these analyses, we hypothesized that partially reprogrammed cells would re-acquire their full somatic identity if reprogramming factors were withdrawn for a longer period. To test this hypothesis, we repeated our partial reprogramming experiment in adipogenic cells and extended the period between reprogramming factor withdrawal and profiling (the chase period) from 3 to 10 days. We integrated the results of three independent 10 day chase experiments with our original 3 day chase experiment, allowing us to ask where the 10 day chase cells are located in the original reprogramming pseudotime trajectory (Figure 2J; STAR Methods).
We found that reprogrammed cells had pseudotime coordinates significantly closer to control cells after the 10 day chase period, suggesting that cells move ‘‘backward’’ in pseudotime after reprogrammingfactorsarewithdrawn(Figure2J).Investigating marker genes, we found that pluripotency marker genes had significantly lower expression after 10 days than at 3 days, and somatic identity genes had significantly higher expression. Further analysis of differentially expressed reprogramming genes and gene set enrichments revealed that most reprogramming effects were attenuated after the 10 day chase period (Figures 2J and S15D–S15F). To test the possibility that somatic cell identity was restored by selective cell death and expansion, rather than by state transitions, we performed a timecourse partial reprogramming experiment in a cell line model and found that observed cell death and proliferation rates were incompatible with a selectionexpansion mechanism (Figure S16;STAR Methods). Collectively, our results suggest that partial reprogramming involves the transient suppression of somatic identity GRNs and subsequent redifferentiation and reactivation of these networks.
We next wondered if youthful gene expression persisted in aged cells after the 10 day chase period. We found that hundreds of significantly rejuvenated genes from the 3 day chase experiment were still rejuvenated after 10 days, with a strong correlation between the 3 day and 10 day fold-changes (Figures S15G and S15H; Pearson’s r = 0:67; p < 0:001). We were particularly curious if the rejuvenation of adipogenic gene expression we observed at 3 days likewise persisted at 10 days. To investigate, we analyzed the distribution of cell states in the 10d chase experiments and found that reprogramming induced a significant shift from a fibrotic state toward an adipogenic state relative to control cells (Figures S15I and S15J; p < 0:001; c2 test). We also found that the GO ‘‘fatty acid metabolism’’ program was still elevated in reprogrammed cells after 10 days (Figure S15K). Our results indicate that some but not all features of youthful gene expression induced by partial reprogramming persist after longer-term withdrawal of reprogramming factors.
Subsets of the Yamanaka Factors are sufficient to elicit partial reprogramming effects
Partial reprogramming studies in aged cells have largely investigated the effects of the canonical Yamanaka Factors (SOKM) along with additional pluripotency regulators known to induce pluripotency (Nanog, Lin28) (Sarkar et al., 2020; Yu et al., 2007). The pluripotency programs activated by the Yamanaka Factors contain positive feedback; hence, it is possible that a subset of factors is sufficient to restore youthful expression, as suggested by a recent report using only SOK (Lu et al., 2020). Several efforts have also demonstrated the sufficiency of Yamanaka Factor subsets to generate iPSCs (Wernig et al., 2008; Nakagawa et al., 2008; Utikal et al., 2009; Velychko et al., 2019), but it remains unknown if subsets exert similar transcriptional effects after only a short, transient induction. Both the full SOKM and reduced SOK factor combinations are oncogenic, motivating a search for alternative reprogramming strategies to restore youthful gene expression (Abad et al., 2013; Senı´ s et al., 2018).
Here, we investigated the effects of all possible sets of the Yamanaka Factors using a custom pooled screening system (Figure 3A). Our system is inspired by other pooled screening and lineage tracing systems (Dixit et al., 2016; Hill et al., 2018; Norman et al., 2019; Guo et al., 2019a) and encodes a tetracycline-inducible reprogramming factor along with a constitutively expressed barcode on lentivirus. We introduced these vectors in two independent experiments into young and aged MSCs and performed a 3-day pulse/3-day chase. The expressed lentiviral barcodes allowed us to demultiplex the unique combination of lentiviruses infecting each cell in silico (Figures 3A and 3B; STAR Methods). We confirmed the accuracy of demultiplexing by comparing with an orthogonal demultiplexing approach (Figure S17; STAR Methods).
We recovered more than 100 transduced cells for all combinations of the Yamanaka Factors (Figures 3B and 3C). To confirm the efficacy of our pooled screening system, we investigated the effects of each combination on reprogramming marker genes. We found that mesenchymal marker genes decreased as a function of combinatorial complexity (number of unique factors), whereas reprogramming marker genes increased (Figure 3D). This result is consistent with previous gene-interaction studies of differentiation factors (Norman et al., 2019) and the known biology of the Yamanaka Factors, where higher order combinations are more effective at reprogramming (Takahashi and Yamanaka, 2006).
To confirm that our pooled reprogramming method induced effects similar to our polycistronic vector (Figure 1A), we performed data integration for our MSC polycistronic and pooled screening experiments. Higher order combinations clustered more readily with polycistronic-SOKM reprogrammed cells (Figures S18A–S18E). We also found that reprogramming marker genes from the polycistronic experiment had highly correlated fold-changes in the pooled experiment (r > 0:63; p < 0:001; Figures S18F and S18I). We did observe that polycistronic reprogramming more strongly suppressed the EMT program, possibly due to differences in transcription factor stoichiometry between the pooled and polycistronic approach as previously reported (Carey et al., 2009, 2011) (Figures S18G and S18H).
We next wanted to determine if different Yamanaka Factor combinations elicit unique effects or if each combination induced similar gene expression programs, varying more in magnitude than in direction. We investigated this question by training a cell classification model to discriminate cells perturbed with each combination of Yamanaka Factors (Methods) (Kimmel and Kelley, 2021). Combinations with similar effects might befrequently confused by the model, whereas combinations with unique effects might be readily discriminated. To analyze our trained classifier, we extracted prediction probabilities across combinations for each cell and then computed the correlation of these probabilities for each pair of combinations as a similarity metric. We found that combinations of similar combinatorial complexity were highly similar, whereas combinations of different complexities were readily distinguished. For instance, most triplet combinations were frequently confused with the full Yamanaka Factor set (Figure 3E). We also compared differentially expressed genes for each combination using the Jaccard index and mean expression vectors using the cosine similarity, yielding similar results (Figures S19A and S19B).
Most combinations of Yamanaka Factors with similar complexity therefore have similar transcriptional effects when induced for a short period of time in MSCs. The similarity among higher order combinations may be due to the activation of shared effectors within the pluripotency network, consistent with previous reports (Wernig et al., 2008; Nakagawa et al., 2008; Utikal et al., 2009; Velychko et al., 2019). It is also plausible that the endogenous copy of the missing factor is activated by the remaining factor, as the Yamanaka Factors can activate one another (Han et al., 2018) (Figure S19C). We did not observe significant upregulation of missing factors in our data (q > 0:1, Monte Carlo differential expression), but we cannot rule out the possibility of transient activation. Taken together, our results suggest that no single Yamanaka Factor is required for restoration of youthful gene expression.
Yamanaka Factor subsets decouple rejuvenation and cell identity suppression
We wondered if suppression of cell identity and restoration of youthful gene expression were strongly associated across factor combinations, such that stronger restoration of youthful gene expression was predictive of a more suppressed identity program. To investigate, we scored a mesenchymal cell identity program using a cell identity classifier and the activity of an aging gene set extracted from our MSC experiments across cells in our screening experiment (Kimmel and Kelley, 2021; STAR Methods). We found that all combinations of Yamanaka Factors suppressed the cell identity program and reduced the aging score (Figure 3). Aging score reduction was significant (Wald tests, p < 0:05) for all but two programs with lower cell numbers where our analysis may be underpowered (SOK, OKM). We likewise found that 791 aging genes shifted toward a youthful level in at least one reprogramming condition (Figure 3G). Gene set enrichment analysis confirmed that coherent biological processes were restored to a youthful level, including p53 signaling and G2M checkpoint activity, whereas other programs suppressed with age were further suppressed by reprogramming (EMT, IFN signaling; Figure 3H).
However, suppression of the mesenchymal identity score and reduction in the aging score were not meaningfully correlated (Spearman r =- 0:28; p > 0:31). This result was replicated when we considered each of the two independent experimental batches from our screen separately (Spearman r < 0 for both). Some combinations reduced the aging score to roughly the same degree as the full Yamanaka Factor set, although suppressing mesenchymal identity significantly less (Wald test, p < 0:05). As two examples, the oncogene-free combination SO had a similar reduction in the aging score to SOKM but suppressed mesenchymal cell identity 54% less than the full set. Similarly, we compared a pro-youthful outlier combination (OM) and a highly suppressive combination (SOK) and found that OM had stronger pro-youthful changes across more than 100 genes. The combinations also significantly differed across multiple gene sets, including higher unfolded protein response activity in SOK and lower interferon signaling in OM (Figure 3I). We repeated these experiments at a smaller scale in adipogenic cells and found similar qualitative results (Figure S20). Our results suggest that suppression of cell identity and restoration of youthful gene expression are not tightly coupled, but none of the reprogramming combinations we tested here entirely disentangle the two phenotypes. Future work may therefore attempt to reduce the suppression of somatic identity while retaining pro-youthful benefits.
Partial multipotent reprogramming improves myogenesis in aged cells
Our results with the Yamanaka Factors suggest that alternative reprogramming strategies might also restore youthful expression. We wondered if partial reprogramming with multipotent reprogramming factors could also be effective. To test this hypothesis, we turned to a multipotent reprogramming system (as opposed to a pluripotent system) in myogenic cells inspired by limb regeneration in urodele amphibians using the transcription factor Msx1. Msx1/2 are known to dedifferentiate myocytes to a multipotent state and have documented roles in digit and limb regeneration (Odelberg et al., 2000; Yilmaz et al., 2015; Taghiyar et al., 2017). We performed a 3-day pulse/3–4-day chase of Msx1 expression using a doxycycline-inducible lentiviral system in aged myogenic cells in vitro, then profiled cells by singlecell RNA-seq in two independent experiments (Figure 4A). As in our pluripotent reprogramming experiments, we sorted transgene positive and negative cells based on reporter expression to capture control and reprogrammed mRNA profiles (Figure 4B; STAR Methods).
We captured myogenic cells in both progenitor (Pax7+) and differentiating states (Myog+). This is expected because our in vitro culture system provides a differentiation stimulus (Gilbert et al., 2010) (Figure 4B). It has previously been reported that aged myogenic cells do not differentiate into mature myocytes as effectively as young cells (Ulintz et al., 2020; Kimmel et al., 2020a). We found that partially reprogrammed aged cells were more differentiated than aged control cells based on pseudotemporal distributions, expression of marker genes, and gene set analysis (Figures 4C, 4D, and S21D). This result was robust to independent analysis of each experiment (Figures S21A– S21C). We also measured an aging transcriptional signature derived from a previous study of myogenic differentiation and found that partial reprogramming significantly reduced the aging score (Figure 4C; STAR Methods, p < 0:01, t test) (Kimmel et al., 2020a). The myogenic aging score itself is strongly influenced by the differentiation program, such that pseudotime explains a significant amount of variation in the aging score (r2 = 0:58; p < 0:0001 Wald test). These results suggest that multipotent reprogramming with Msx1 can partially restore youthful gene expression in myogenic cells, similar to the Yamanaka Factors in adipogenic cells.
DISCUSSION
Aging induces broad gene expression changes across diverse mammalian cell types, and these changes have been linked to many of the prominent hallmarks of aging (Kimmel et al., 2019; The Tabula Muris Consortium, 2020; Lo´pez-Otı´n et al., 2013). Cell reprogramming experiments have shown that young animals can develop from adult cells and aging features can be erased through complete reprogramming to pluripotency (Gurdon, 1962; Takahashi and Yamanaka, 2006; Lapasset et al., 2011). Recent reports have further suggested that transient expression of the Yamanaka Factors is sufficient to reverse features of aging and improve cell function (Ocampo et al., 2016; Sarkar et al., 2020; Lu et al., 2020; Gill et al., 2021). However, it was unclear whether these partial reprogramming interventions suppress somatic cell identities, if all of the Yamanaka Factors are required, or whether alternative reprogramming strategies could restore youthful gene expression.
Here, we investigated these questions using single-cell measurements of gene expression to capture the phenotypic trajectory of partial reprogramming and evaluate the impact of alternative reprogramming methods. Partial reprogramming restored youthful expression in many aging genes, but the degree of restoration was cell identity dependent, and some gene programs (e.g., PRC2 targets) appear resistant to restoration (Figures 1, S5, and S10). We also found that a subset of youthful expression features persist even after longer-term withdrawal of reprogramming factors (Figure S15), adding nuance to previous reports that the pro-youthful effects of partial reprogramming were reversed on short timescales (Ocampo et al., 2016). We cannot determine from our data which of these expression changes may confer youthful function or potentially have other, undesired effects. Decomposing the relevant effectors from other effects of reprogramming remains an important area for future research.
We also found that partial reprogramming suppressed somatic cell identities and upregulated hallmark pluripotency programs in diverse cell types both in vitro and in vivo (Figure 2), contrary to some previous reports (Ocampo et al., 2016; Sarkar et al., 2020; Lu et al., 2020) but consistent with timecourse iPSC reprogramming experiments (Figure S11) and lineage-tracing studies of transient SOKM expression (Maza et al., 2015). We predicted based on RNA velocity and dynamical systems analysis (La Manno et al., 2018; Bergen et al., 2020; Strogatz, 2015) that this suppression of somatic identity would be transient, and we observed that somatic identity is indeed restored after longer-term withdrawal of reprogramming factors (Figure S15). We further found that somatic identities are most likely regained through state transitions, rather than cell selection effects, using cell line and mathematical models (Figure S16, Note S1). Our experiments and analyses therefore support a model in which partial reprogramming transiently suppresses diverse somatic identity programs that are later reacquired through differentiation. Future lineage-tracing experiments that indelibly mark partially reprogrammed cell states will provide further supporting evidence for this model.
Restoring youthful gene expression can improve tissue function, implying that partial reprogramming may be therapeutic. However, pluripotent reprogramming is well-known to be an pro-neoplastic process (Abad et al., 2013; Mosteiro et al., 2016; Chen et al., 2021), even when Myc is excluded from the reprogramming set (Senı´s et al., 2018). Neoplastic doses of reprogramming factors in in vivo studies are often very close to doses that confer youthful benefit. For example, two in vivo partial reprogramming studies found that increasing the therapeutic induction time or SOKM copy number only two-fold led to 100% mortality (Ocampo et al., 2016; Chen et al., 2021). Given this context, our results demonstrating somatic identity suppression across diverse cell types raise the hypothesis that even partial reprogramming with SOKM may promote neoplasia. We propose that these neoplastic risks may be masked in previous in vivo studies due to the low rate of such risks, small sample sizes employed, short timescales of the studies, and specific features of gene induction systems used (e.g., local delivery of reprogramming factors).
Prior to this work, it was unknown which of the Yamanaka Factors were required to restore youthful gene expression or which subsets might exhibit distinct effects during partial reprogramming. Previous studies have explored only one set of factors at a time, preventing accurate comparisons to address these questions (Ocampo et al., 2016; Sarkar et al., 2020; Lu et al., 2020; Gill et al., 2021). Our pooled screens of all possible Yamanaka Factorsubsets revealed that combinations of 3–4 Yamanaka Factors have remarkably similar effects, suggesting that no single factor is required to restore youthful gene expression. Combinations of two Yamanaka Factors were also more similar to the full SOKM set than to control or single factor perturbations, and nearly all reprogramming factor combinations reduced an aging gene expression score (Figure 3). Our screen demonstrates that no single pluripotency factor is required to mask features of aging and suggest that oncogene-free reprogramming strategies may also restore youthful gene expression. These results are consistent with previous reports that subsets of the Yamanaka Factors are in fact sufficient to induce pluripotency (Wernig et al., 2008; Nakagawa et al., 2008; Utikal et al., 2009; Velychko et al., 2019). Similarly, recent work suggests that other pluripotency factors can restore youthful cell function (Shahini et al., 2021). Our multipotent reprogramming experiments in myogenic cells further support this hypothesis, indicating that youthful gene expression may be restored even without activating the pluripotency factors (Figure 4).
Given the proposed therapeutic applications of SOKM reprogramming, we believe even a modest risk of neoplasia (e.g., 0:1% of animals) as suggested by our data and previous reports is highly undesirable. Identifying alternative reprogramming strategies to restore youthful gene expression with lower neoplastic risk is therefore a worthwhile goal. Toward this aim, we have shown that partial reprogramming with multiple subsets of the Yamanaka Factors induces highly similar transcriptional effects to the full set and that a distinct multipotent reprogramming system can confer youthful expression. These results suggest the feasibility of disentangling the rejuvenative and pluripotency inducing effects of partial reprogramming in future work and serve as a resource for further interrogation of partial reprogramming effects in aged cells.
We have not directly measured whether partial reprogramming can improve physiological cell functions. Future work is required to determine if the restoration of youthful expression we observe is sufficient to improve cell and tissue function in the cell types we interrogated. We likewise have not directly measured the neoplastic risks that may be posed by partial reprogramming in vivo. In future work, in vivo partial reprogramming or transplant experiments with large numbers of animals are required to quantify these risks (e.g., 162 animals are required to detect a 1% risk of neoplasia). For our comparisons of partial SOKM reprogramming effects across cell types, we do not have the ability to finely tune transgene levels in our experimental system. Some differences between cell types may therefore be due to differences in the level of transgene activation achieved in each cell population.
STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:
● KEY RESOURCES TABLE
● RESOURCE AVAILABILITY
Lead contact
Materials availability
Data and code availability
● EXPERIMENTAL MODEL AND SUBJECT DETAILS
Animals
Cell culture
● METHOD DETAILS
Cell isolation
Lentiviral cloning and production
Lentiviral transduction
Partial reprogramming with polycistronic vectors
Screening Yamanaka Factor subsets
Partial multipotent reprogramming in myogenic cells
Single cell RNA-seq
Read alignment and cellular demultiplexing
mRNA profile denoising, latent variable inference, and data integration
MULTI-seq demultiplexing
Bulk RNA-seq of young and aged MSCs
Estimating the contribution of covariates to total variation
Differential expression
Gene set enrichment analysis
Generalized additive modeling of gene expression over pseudotime
Estimating aging magnitudes in transcriptional space
RNA velocity analysis
Pseudotemporal trajectory inference
Phase simulations in RNA velocity fields
Cellular age classification
Embedding reprogramming factor perturbations with scNym
Scoring cell identity programs with scNym
Interpreting scNym models with expected gradient attributions
Scoring aging gene expression in pooled screening experiments
Scoring aging gene expression in myogenic multipotent reprogramming experiments
Analysis of orthogonal reprogramming effects
Data integration and comparison of polycistronic and pooled MSC reprogramming
Analysis of 10 day chase experiments in adipogenic cells
Re-analysis of iPSC reprogramming timecourse
Re-analysis of Tabula Muris Senis adipose tissue aging
Re-analysis of reprogramming in murine B cells
Re-analysis of in vivo partial reprogramming in the murine retina
Re-analysis of in vivo partial reprogramming in the murine heart
Re-analysis of in vivo partial reprogramming in the murine skeletal muscle
Somatic identity and pluripotency program scoring in public partial reprogramming data
Timecourse partial reprogramming and population analysis in C2C12 myoblasts
Mathematical modeling of population dynamics in partial reprogrammed cells
mRNA and protein half-life analysis
● QUANTIFICATION AND STATISTICAL ANALYSIS
SUPPLEMENTAL INFORMATION
Supplemental information can be found online at https://doi.org/10.1016/j.
cels.2022.05.002.
ACKNOWLEDGMENTS
We would like to thank David Botstein, Maria Ingaramo, Calvin Jan, David G. Hendrickson, David R. Kelley, Margaret Roy, and Andrew G. York for helpful discussions. All funding for this study was provided by Calico Life Sciences, LLC.
AUTHOR CONTRIBUTIONS
A.E.R. designed and performed molecular biology experiments and contributed intellectually. C.Z. and J.Z.-S. performed animal experiments. J.P. performed cytometry experiments. E.M. and T.V. performed bulk RNA-seq experiments. G.K. supervised research. C.K. contributed intellectually and edited the paper. J.C.K. conceived the study; designed experiments; performed cell culture, molecular biology, reprogramming, and single-cell RNA-seq experiments; analyzed data, supervised research, and wrote the paper.
DECLARATION OF INTERESTS
All authors were employees of Calico Life Sciences, LLC at the time of this
study. J.C.K. is an employee of NewLimit, Inc.
Received: June 6, 2021
Revised: February 15, 2022
Accepted: May 10, 2022
Published: June 10, 2022
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources, data, and software should be directed to and will be fulfilled by the lead contact, Jacob C. Kimmel (该Email地址已收到反垃圾邮件插件保护。要显示它您需要在浏览器中启用JavaScript。).
Materials availability
Plasmids generated in this study are available by request.
Data and code availability
Our data have been submitted to the NCBI Gene Expression Omnibus under accession numbers GEO: GSE176206 and GSE197437.
Our data have also been made available at https://reprog.research.calicolabs.com/. No original code is reported with this paper. Any
additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Animals
We isolated primary cells from the subcutaneous adipose tissue (inguinal pad) and limb muscles of young (2-4 months old) and aged (20-30 months old) C57Bl/6 male mice. Mice were ordered from the Jackson labs and aged at Calico Life Sciences, LLC. Young animals were allowed to acclimate for at least 3 weeks prior to experimentation. All animal experiments were approved by the Calico Institutional Animal Care and Use Committee, an AAALAC-accredited body.
We isolated cells from two animals of each age for adipogenic SOKM reprogramming experiments (aged animals, 28 months old) and three animals of each age for mesenchymal stem cell SOKM experiments (aged animals, 23 months old). We isolated MSCs from three animals of each age for one independent MSC pooled screening experiment, and from two animals of each age in a second independent experiment (aged animals, 28 months old). We isolated adipogenic cells from two animals of each age for one adipogenic pooled screening experiment (aged animals, 28 months old), and from three animals of each age in a second experiment (aged animals, 30 months old). For myogenic cell experiments, we isolated cells from two aged animals (28 months old) for the first experiment and three aged animals (30 months old) for the second experiment.
Cell culture
Adipogenic cells were passaged prior to experiments. Cells were dissociated with TrypLE reagent (Gibco) and seeded at 100,000 cells/well in a 6 well plate, using separate wells for each animal and transduced with lentivrus after overnight incubation. Musclederived MSCs were passaged once prior to transduction using TrypLE. MSCs were seeded at 50,000 cells/well in a 6 well plate for polycistronic reprogramming experiments and 25,000 cells/well for pooled screening experiments. For arrayed screening experiments, MSCs were seeded at 7,500 cells/well in a 24 well plate, one plate per animal. For pooled screening in adipogenic cells, cells were seeded at 100,000 cells/well in 6 well plates. Myogenic cells were expanded for 5-7 days prior to lentiviral transduction in myogenic growth media and were passaged for seeding at 100,000 cells/well in 6 well plates for transduction. Myogenic cells were passaged with Cell Dissociation Buffer (Gibco). We cultured C2C12 murine myoblasts in C2-growth media (DMEM, 20% FBS, 1% Pen/Strp; Gibco) for expansion and switched to C2-plating media (DMEM, 10% FBS, 1% Pen/Strep) for reprogramming experiments.
METHOD DETAILS
Cell isolation
We isolated adipogenic cells from subcutaneous adipose tissue by dissecting tissue into small (1-2 mm) pieces with scissors and performing a 0.1% (w/v) collagenase II (Gibco) digestion in DMEM at 37o C for 1 hour. Adipogenic cell suspensions were washed with complete media (10% FBS, DMEM) and incubated in a 6-well plate overnight with one well per animal of origin.
We isolated mesenchymal stem cells from limb muscle tissue by dissecting muscle into small pieces with scissors and performing two enzymatic digestions. We first digested in 0.2% (w/v) collagenase II in DMEM for 90 minutes, then washed cell suspensions in digestion wash buffer (F10, 10% horse serum) and performed a second incubation in 0.4% (w/v) dispase II (Gibco) for one hour. Digested muscle cell suspensions were filtered through a 70 mm cell strainer, then a 40 mm cell strainer and stained with fluorophoreconjugated antibodies. We sorted mesenchymal stem cells as CD31 , CD45 , PI , Sca1+ by FACS. MSCs were plated in one well each of a 6-well plate for each animal and incubated for 24 hours.
We isolated myogenic cells by performing the same digestion described for MSCs on limb muscle tissue. Once single cell suspensions were obtained, we performed a pre-plating procedure by incubating cells in a sarcoma-derived ECM coated well plate for 10 minutes, then immediately transferring cell suspensions to new ECM coated wells. This pre-plating step captures the majority of adherent, non-myogenic cells in the suspension and allows for capture of a high purity myogenic cell population in the final wells. Cells were incubated overnight in myogenic growth media (F10, 20% FBS, 1% Pen/Strep; Gibco, supplemented with 5 ng/mL rFGF2; R&D Systems).
Lentiviral cloning and production
For polycistronic reprogramming experiments, we synthesized murine cDNAs for Pou5f1, Klf4, Myc, and Sox2. We removed stop codons from all cDNAs except Sox2 and concatenated cDNAs into a polycistron using 2A-peptide sequences for a final open reading frame O-P2A-K-T2A-M-E2A-S. We inserted this OKMS polycistron into a 3rd generation lentiviral transfer vector flanked by a 5’ TRE3G tetracycline-inducible promoter and a 3’ woodchuck promoter response element. We also included a cytomeglovirus (CMV) promoter driven mCherry reporter transcript 3’ from the WPRE prior to the 5’ lentiviral terminal repeat (LTR). We refer to this vector as LTV-Y4TF.
For pooled screening experiments, we synthesized cDNAs for the Yamanaka Factors followed by an EF-1 alpha short (EFS) promoter driven eGFP with a unique 8-mer barcode in the 3’ UTR, followed by an internal SV40 polyA signal within the lentiviral genome. We cloned each Yamanaka Factor transgene into lentiviral transfer vectors with a 5’ TRE3G promoter to drive expression of the Yamanaka Factor. These designs were inspired by the CellTag lineage-tracing system (Guo et al., 2019a) and allow us to detect the complement of Yamanaka Factors that transduced a given cell based on recovery of unique barcodes, even though the Yamanaka Factor transgenes are only transiently expressed. We refer to these vectors as LTV-S-BC, LTV-O-BC, LTV-K-BC, and LTV-M-BC where BC indicates a 3’ expressed barcode.
For multipotent reprogramming experiments in myogenic cells, we cloned the Msx1 cDNA with a separable mCherry tag Msx1-T2A-mCherry into TRE3G driven lentiviral vector with a 3’ CMV driven eGFP reporter. We refer to this vector as LTV-Msx1.
We used four different tetracycline-transactivator lentiviral vectors, all harboring the same Tet3G tetracycline transactivator allele flanked by different reporters to allow compatibility with different reprogramming vectors. For polycistronic reprogramming in muscle MSCs, we used a CMV driven Tet3G-T2A-mCerulean vector (LTV-Tet3G-mCerulean; adapted from VectorBuilder cat. VB180123- 1018bxq by Gibson assembly). For polycistronic reprogramming in adipogenic cells, we used a vector harboring an EF1a driven Tet3G ORF followed by a CMV driven eGFP-T2A-PuromycinR reporter (LTV-Tet3G-eGFP; VectorBuilder cat. VB900088- 2774nkq). For pooled screening, we used a vector harboring a CMV driven Tet3G followed by a 3’ WPRE and CMV driven mCherry reporter (LTV-Tet3G-mCherry; VectorBuilder cat. VB900088-2776tfj). For generation of an inducible reprogramming C2C12 cell line, we used a CMV driven Tet3G-T2A-Hydro vector (LTV-Tet3G-Hygro; VectorBuilder cat. VB180123-1018bxq).
We packaged lentivirus by transfecting HEK293T cells with LV-MAX lentiviral packaging plasmids (Gibco) and the LTR-containing transfer vector of interest using Lipofectamine 3000 (Gibco). We collected supernatant from the viral packaging cells 24 and 48 hours after transfection, cleared supernatant by centrifugation, filtered supernatant with 0.45 mm filters, and concentrated the cleared supernatant using PEG-it precipitation reagent (SystemBio). Lentiviral constructs were titered by transducing HEK293T cells with serial dilutions of virus and measuring fluorescent reporter expression frequency after 72 hours. We also prepared lentiviral particles through commercial vendors that use similar protocols.
Lentiviral transduction
We performed lentiviral transductions using a standard spinfection method for all cell types. For adipogenic cells, we mixed the appropriate viral titer with complete growth media supplemented with [8 mg/mL] polybrene and replaced growth media with the viral suspension. We centrifuged cells in well plates at 2000 x g for 1 hour and incubated overnight before exchanging growth media. For MSCs, we similarly added viral suspension, centrifuged cells at 2000 x g for 1 hour, and incubated cells overnight before exchanging media. For primary myogenic cells, we added viral suspension and transduced by spinfection for one hour at 1,500 x g with a polybrene adjuvant ([8 mg/mL] in media). We transduced C2C12 myoblasts using the same protocol used for primary myogenic cells.
Partial reprogramming with polycistronic vectors
We performed partial reprogramming in adipogenic cells using LTV-Y4TF and LTV-Tet3G-eGFP. We transduced at MOI 30 in growth media containing polybrene ([8 mg/mL]) by spinfection using a 1 hour centrifugation at 2000 3 g. We replaced growth media after an overnight incubation and began a three day pulse of Dox ([4 mg/mL]), exchanging media every 24 hours, followed by a three day chase. We performed the same protocol for muscle-derived MSCs using LTV-Y4Tf and LTV-Tet3G-mCerulean each at MOI 10. For muscle-derived MSCs, we also included a control group transduced with virus, but never exposed to Dox. Following the chase period, cells were dissociated and we sorted cells expressing both LTV reporters (LTV-Y4TF : mCherry, LTV-Tet3G : eGFP or mCerulean) against all other cells by FACS. This sorting allowed us to enrich for transduced cells and profile treated (dual transgene positive) and untreated (missing at least one transgene) conditions to individual cells from within the same well, serving as an in situ control. To evaluate the duration of partial reprogramming effects, we performed three independent long-term chase experiments using a 3 day Dox pulse/10 day chase schedule. After cell sorting, cells from these longer duration experiments were fixed by incubation in ice-cold 80% methanol for 30 minutes at -20o C and stored at -80o C prior to library preparation.
Screening Yamanaka Factor subsets
We performed a screen of Yamanaka Factor subsets in two distinct experiments in MSCs. In the first experiment, we seeded cells in 24 well plates and delivered each combination of Yamanaka Factors to one well of the plate at high MOI (MOI = 8) for each virus. We delivered a tetracycline-transactivator virus (LTV-Tet3G-mCherry) with all combinations (MOI = 8). Before sequencing, we labeled the complexity of each perturbation (e.g. number of unique factors) with a unique cholesterol modified oligo to compare to our in silico demultiplexing. We also included a well of doxycycline untreated and six wells of doxycycline treated, untransduced cells as additional sources of non-treated controls. In the second experiment, we seeded cells in 6 well plates and delivered all Yamanaka Factors in a pool to each well. We transduced one well per animal at both low and moderate MOIs (MOI = 3, 6) and pooled cells prior to sequencing. Two MOIs were used to increase representation of more complex perturbations. In both experiments, we pulsed Yamanaka factors by introducing doxycycline [4 mg/mL] for three days and chased for 3 days. We likewise performed pooled screening in two distinct experiments for adipogenic cells. In the first experiment, we isolated cells from two young and two aged animals and transduced with MOI 8. In the second, we isolated cells from three young and three aged (30 months old) animals and trasduced with MOI 8. Adipogenic screens were performed with 100,000 cells/well in 6 well format.
Partial multipotent reprogramming in myogenic cells
We performed partial reprogramming with Msx1 in myogenic cells in two independent experiments. In both experiments, we seeded 100,000 myogenic cells/well in a 6 well plate in myogenic growth media. We transduced cells at MOI 15 with LTV-Msx1 and LTVTet3G-mCherry. We incubated cells for 12 hours then exchanged viral suspension with fresh growth media containing [4 mg/mL] Dox. We replaced media daily for three days with Dox, then began a chase period where Dox-free media was used. We used a three day chase for the first experiment and a four day chase for the second. We sorted for cells expressing fluorescent reporters for both transgenes against other cells by FACS. This sorting allowed us to enrich for transduced cells and profile treated (dual transgene positive) and untreated (missing at least one transgene) conditions to individual cells from within the same well, serving as an in situ control.
Single cell RNA-seq
We performed cell hashing with cholesterol-modified oligos (CMOs) to label cells from individual animals with unique barcodes. We used a unique group of 1-2 CMOs (Integrated DNA Technologies) for cells from each animal and labeled cells following the MULTIseq protocol (McGinnis et al., 2019). Single cell RNA-seq libraries were prepared using the Single Cell Gene Expression v3.1 chemistry (10x Genomics, Pleasanton, CA). Single Cell Gene Expression v3 chemistry was used for 10d chase experiments in adipogenic cells. Cells were emulsified using the Chromium controller (10x Genomics) and libraries were subsequently prepared following the library kit protocol. We ran young and aged cells in separate lanes of the 10x instrument for all experiments to provide the highest fidelity sample demultiplexing. For all experiments except the pooled screens, we additionally ran transgene-positive cells carrying both the reprogramming vector and a tetracycline-transactivator and transgene-negative cells (lacking at least one vector) in separate lanes. We added the MULTI-seq additive primer during the cDNA amplification step to allow for CMO barcode amplification, as described in the MULTI-seq protocol. For myogenic reprogramming experiments, we combined myogenic cells with highly distinct cell types prepared for unrelated experiments in the same lane and extracted myogenic cells in silico for analysis. We also pooled myogenic cells derived from different animals without CMO barcoding to avoid cell loss for this rare cell type. Fixed cells from long-term chase experiments were first resuspended in rehydration buffer (0.4% bovine serum albumin, 1 mM DTT, 0.2 U/mL NxGen RNase inhibitor in 3X SSC buffer) and pooled across animals by age and treatment condition for library preparation.
Read alignment and cellular demultiplexing
We pseudoaligned reads to the mm10 reference genome using ‘‘kallisto’’ and performed cell barcode demultiplexing and UMI read aggregation using the ‘‘kallisto — bustools’’ workflow (Bray et al., 2016; Melsted et al., 2019). For experiments with transgenic constructs, we modified the mm10 reference to include lentiviral genomes as additional chromosomes in the reference genome. This allowed us to detect the presence of transgenic transcripts in our sequencing data.
We assigned MULTI-seq library reads to cell barcodes using ‘‘kite’’ (Gehring, 2021). For pooled screening experiments, we additionally quantified the number of lentiviral transgene barcodes in each cell using ‘‘kite’’.
mRNA profile denoising, latent variable inference, and data integration
We denoised mRNA abundance profiles using scVI (Lopez et al., 2018) models with 64 latent variables. We fit scVI models for 1000 epochs using early stopping to minimize the evidence lower-bound (ELBO). For MSC pooled screening experiments, we integrated our arrayed and pooled transduction experiments by injecting batch covariates into the scVI model. Similarly, we used batch covariate injection to integrate across independent adipogenic pooled screening experiments and myogenic multipotent reprogramming experiments. For the latter application, we reduced the latent space size to 32 variables to avoid overfitting to a smaller dataset. For myogenic experiments, we additionally employed ‘‘harmony’’ integration on a PCA decomposition of the scVI latent space to account for batch effects across independent experiments (Korsunsky et al., 2019). To construct a shared latent space between adipogenic cells and MSCs, we fit an scVI model to both cell populations simultaneously. To integrate our adipogenic cell experiments with different chase periods, we fit an scVI model with batch covariate injection and a 16-dimensional latent space, using only the 3,000 most highly variable genes as input to improve integration performance (Luecken et al., 2022). We separately fit a model using all genes as input for differential expression analysis. For all experiments, we constructed a nearest neighbor graph in the scVI latent space and projected this graph into two dimensions for visualization with UMAP.
MULTI-seq demultiplexing
We used the ‘‘hashsolo‘‘ approach to demultiplex MULTI-seq read counts in both adipogenic and MSC experiments with a prior distribution of ½0:02; 0:94; 0:04 for negative, singlets, and doublets respectively (Bernstein et al., 2020). We manually verified ‘‘hashsolo’’ classifications by clustering cell profiles using CMO counts and inspecting read distributions. We adjusted cell labels based on manual inspection and thresholding where appropriate. We only used confident classification calls for animal-specific analysis.
Bulk RNA-seq of young and aged MSCs
We performed bulk RNA-seq on young and aged MSCs in three independent experiments. Cells were collected from three young (3-4 months) and three aged (20-24 months) animals in each experiment. In the first and second experiment, freshly-isolated cells were cultured for 11-14 days before RNA collection In the third experiment, cells from the second fresh cell experiment were cryopreserved in Recovery Cell Culture Freezing Media (Gibco), then thawed and cultured for 7 days before RNA collection. RNA was isolated from all cells using a Zymo Quick RNA kit (Zymo Research) and RNA-seq libraries were prepared with NEBNext Ultra II Directional RNA Library Prep Kit (New England Biolabs). Libraries were sequenced on an Illumina NovaSeq. We pseudoaligned reads with kallisto (Bray et al., 2016) and quantified differential expression with sleuth (Pimentel et al., 2017). We extracted an aging gene signature by selecting genes that changed in the same direction with age under a loose FDR threshold (q < 0:4) across both freshly isolated cell experiments and cryogenically preserved cell experiments. This procedure yielded a signature of 198 aging genes which we used to score aging gene expression in our MSC SOKM reprogramming experiment.
Estimating the contribution of covariates to total variation
We estimated the contribution of experimental covariates (age, transgene treatment) to the total variation observed in our experiments using ANOVA. We treated the scVI latent encoding as a set of response variables and fit linear models for each variable of the form zj age + treatment + age : treatment where zj is a latent variable. We also assessed the contribution of the animal-of-origin for each cell using animal-specific labels derived from MULTI-seq demultiplexing. We performed ANOVA as above with animal-specific labels using models of the form zj age treatment + animal, where ‘‘animal’’ is a label specific to each ‘‘age:treatment’’ combination because we ran ‘‘age:treatment’’ combinations in separate library preparation reactions. We included ‘‘negative’’ classifications from ‘‘hashsolo‘‘ in this regression in case a CMO hashing failure captured structured variation.
Differential expression
We performed differential expression testing across binary contrasts by estimating log-fold change distributions for each gene using Monte Carlo approximations from the scVI posterior distribution (Boyeau et al., 2019). We estimated the false discovery rate (FDR) of differential expression as the fraction of Monte Carlo samples that did not show a log-fold change above a minimum threshold (jlog2a=bjR0:5). To test continuous covariates and cell type:reprogramming interaction effects, we used logistic/Gaussian hurdle models inspired by MAST (Finak et al., 2015) for individual genes and logistic models for gene program scores on the unit interval [0,1], as previously described (Kimmel et al., 2020a). We performed FDR control for MAST models with the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). We performed differential expression testing on pseudotime covariates using models of the form‘‘Gene ^ Pseudotime’’. We tested cell type:reprogramming interactions with models of the form ‘‘Gene Age + Cell Type + Treatment + Cell Type : Treatment’’.
Gene set enrichment analysis
We performed enrichment analysis for Gene Ontology terms using Enrichr (Kuleshov et al., 2016). We performed rank-based Gene Set Enrichment Analysis (GSEA) using the ‘‘fast GSEA’’ (Kuleshov et al., 2016) implementation of the GSEA algorithm (Subramanian et al., 2005) and gene sets extracted from MSigDB (Liberzon et al., 2015).
Generalized additive modeling of gene expression over pseudotime
We fit generalized additive models (GAMs) to express relationships between genes or gene programs and continuous pseudotime coordinates. We fit Gaussian GAMs of the form ‘‘Gene Pseudotime’’ using six cubic splines through the PyGAM framework (Serve´net al., 2018). We represent expression trends over pseudotime using the mean value of GAM predictions along with the 95% confidence interval.
Estimating aging magnitudes in transcriptional space
We estimated an aging magnitude as the difference between young and aged cell populations. We estimated this difference between two populations using the maximum mean discrepancy (MMD) computed on scVI latent variables. scVI latent variables capture an expressive, low-dimensional representation of gene expression and are therefore useful for this task. We computed the MMD our previously described ‘‘scmmd’’ package over a series of bootstrap random samples to ensure robustness (Kimmel et al., 2020a). Briefly, we sampled n = 300 cells from each population and computed an MMD at each of 500 iterations. We performed these heterochronic MMD comparisons to compare young, untreated cells to both aged, untreated cells and aged, reprogrammed cells. We assessed the significance of changes in the MMD using the Wilcoxon Rank Sum test across bootstrap iterations. We also performed isochronic comparisons across animals of the age (i.e. young cells to young cells, aged cells to aged cells) as a control. For these comparisons, we subset to the samples from each age that had the highest confidence MULTI-seq labels as a conservative measure. For young cells, we used Tg- cells, and for aged cells we used Tg+ cells, as these had the highest confidence MULTI-seq labels.
RNA velocity analysis
We counted estimated the fractions of spliced and unspliced reads for each transcript based on the alignment of reads to exons and introns using ‘‘kallisto — bustools’’ to an mm10 reference genome from Ensembl (Bray et al., 2016; Melsted et al., 2019). We estimated RNA velocity (La Manno et al., 2018) using the stochastic model implemented in ‘‘scvelo’’ (Bergen et al., 2020).
Pseudotemporal trajectory inference
We inferred pseudotime trajectories for adipogenic cells and MSCs using ‘‘scvelo’’. Briefly, we constructed a nearest neighbor graph in the embedding space and weighted edges in the graph based on the directionality of RNA velocity vectors such that neighbors in the path of the RNA velocity vector received higher transition probabilities. We then paired this matrix with the diffusion pseudotime inference method (Haghverdi et al., 2016). We inferred pseudotime trajectories for myogenic cells using diffusion pseudotime analysis on a nearest neighbor graph constructed in diffusion component space. We selected root cells from within the top decile of Snai2 (primitive marker gene) expression.
To confirm the robustness of our trajectory inference results to the choice of pseudotime algorithm, we also performed pseudotime analysis with ‘‘Palantir’’ (Setty et al., 2019). We chose the Palantir method based on reports of strong performance and because it does not use any RNA velocity information during inference, providing an orthogonal approach to scvelo to confirm the robustness of our results. We inferred pseudotime trajectories for adipogenic cells and MSCs using Palantir in a manner similar to scvelo. Palantir applies a diffusion analysis to measure the distance from a root node to cells along the nearest neighbor graph, assigning each cell both a coordinate of distance from the root and a probability of falling within one of an inferred number of distinct fate trajectories. We chose root cells in each cell type from a cluster at the end of the reprogramming trajectory as projected with UMAP, then reversed the inferred coordinate assignments so that higher numbers indicated more reprogrammed cells (note: the direction of the coordinate numbers is arbitrary and does not effect analysis). Palantir inferred two trajectories in adipogenic cells, mapping to the adipogenic and secretory basal states. We used the adipogenic cell trajectory for marker gene analysis by selecting cells with ¿50% probability of falling into the adipogenic basal fate. Palantir inferred only a single trajectory for MSCs, matching our observation of a single basal state. We also confirmed robustness using the ‘‘latent pseudotime’’ inferred purely from RNA velocity measurements by scvelo.
Phase simulations in RNA velocity fields
We performed phase simulations in RNA velocity fields using our previously described ‘‘velodyn’’ package (Kimmel et al., 2020a). For both adipogenic cells and MSCs, we initialized n = 1000 phase points within the transiently reprogrammed cell population at positions x0 and evolved their positions based on the RNA velocity vectors of their neighbors for t = 500 timesteps with a step size of h = 0:5. We used an update rule
Cellular age classification
We trained semi-supervised variational autoencoder models using the scANVI framework to discriminate cell age (Xu et al., 2019). Prior to model training, we split our data into a training set (80%) and held-out test set (20%) with stratification. We trained scANVI models for 100 unsupervised epochs (reconstruction loss only) and 300 semi-supervised epochs (reconstruction and classification losses) using 10% of the training set as validation data for early stopping. We evaluated model performance based on the accuracy of classifications in the held-out test set, unseen during model training.
This article is excerpted from the Cell Systems 13, 574–587.e1–e11, July 20, 2022 by Wound World.