Abstarct

This tutorial will walk you to perform a complete analysis with MOSClip R package.

MOSClip is a method to combines survival analysis and graphical model theory to test the survival association of pathways or of their connected components that we called modules in a multi-omic framework. Multi-omic gene measurements are tested as covariates of a Cox proportional hazard model after dimensionality reduction of data. The final goal is to find the biological processes impacting the patient’s survival.

MOSClip has a modular structure, allowing the use of one or multiple different omics, as well as different data reduction strategies and tests.

In this tutorial we will focus on the integration of four omics: methylome, transcriptome, genomic mutations and genomic copy number variations, testing if these omics can be sinergically involved in pathways with survival prognostication power.

Furthermore, in MOSClip multiple efforts have been dedicated to the implementation of specific graphical tools to browse, manage and provide help in the interpretation of results. In this tutorial we will also exploit these tools to represent analysis results.

Multi-omics TCGA ovarian cancer data analyses

We used some of the MOSClip functions to identify pathways (pathway analysis) and modules (module analysis) describing the pathway associated with Progression Free Survival (PFS)

Prepare the multi-omic dataset and pathways

First of all, we load the necessary libraries.

library(org.Hs.eg.db)
library(pheatmap)
library(RColorBrewer)
library(EDASeq)
library(graphite)
library(houseOfClipUtility)
library(MOSClip)

Data retrieving and data analysis will depend on your computational resources but generally need time and disk space. To speed up this tutorial, we preprocessed the data for you. Details on data preparation are available in Material and Method session of the paper.

Thus, download “TCGA-OV-hg38-preprocessed-externalSurv.RData” available at github.com/cavei/example-datasets/.

The provided dataset includes matrices, genes per patients, of methylation status, somatic mutations, CNVs, and transcript expression levels of the TCGA ovarian cancer samples.

Move the downloaded file inside a directory called downloadTCGA in your working directory.

Now, we can load the preprocessed data.

load("downloadTCGA/TCGA-OV-hg38-preprocessed-externalSurv.RData")

We create a directory to store the data results, checking if the directory already exists.

dirname = "MOSresults"
if (!file.exists(dirname)) {
    dir.create(dirname)
}

The TCGA dataset came along with the survival annotations from the table by Liu et al, Cell, 2018. Among the data loaded we can find the object ‘fup’ (short of ‘followup’) that represents our survival annotations. We extract the slot for the ‘progression free survival’ (pfs) and we save it in survAnnotations.

Next step is modifiy all the multi-omics matrices assigning the type of gene identifier. Here, we will work with Entrez Gene ID, indicated with the prefix tag “ENTREZID:” compliant to Bioconductor org.dbi as used in the latest graphite R package version. This will allow us to be able to match the omics data to graphite pathways.

survAnnotations <- fup$pfs

expression <- rnaSeq
row.names(expression) <- paste0("ENTREZID:", row.names(expression))
mutation <- mutations$data
row.names(mutation) <- paste0("ENTREZID:", row.names(mutation))
names(metClustValues$eMap) <- paste0("ENTREZID:", row.names(metClustValues$eMap))
row.names(cnv) <- paste0("ENTREZID:", row.names(cnv))

Now, we are about to intersect different datasets and annotations to select those patient samples having all the four omics profiled.

We also filter genes in expression matrix keeping only those genes with at least 100 counts in at least one patients, to avoid data sparsity.

After the selection of patients and genes, we perform expression normalization and log2 of the counts+1 transformation. This will ensure us to work with expression data aproximately close to a normal distribution, the most suitable distribution for the subsequent MOSClip tests.

survAnnot <- na.omit(survAnnotations)
patients <- row.names(survAnnot)
patients <- intersect(patients, colnames(expression))
patients <- intersect(patients, colnames(metClustValues$MET_Cancer_Clustered))
patients <- intersect(patients, colnames(mutation))
patients <- intersect(patients, colnames(cnv))

survAnnot <- survAnnot[patients, , drop = F]

if (!file.exists("survAnnot.RData")) {
    file = paste0(dirname, "/survAnnot-", as.character(Sys.Date()), ".RData")
    link = "survAnnot.RData"
    save(survAnnot, file = file)
    file.symlink(file, link)
}

expression <- expression[, patients, drop = F]
keep = apply(expression >= 100, 1, any)
expNorm <- betweenLaneNormalization(expression[keep, , drop = F], which = "upper")
pseudoExpNorm <- log2(expNorm + 1)

methylation <- metClustValues
methylation$MET_Cancer_Clustered <- methylation$MET_Cancer_Clustered[, patients, 
    drop = F]

mutation <- mutation[, patients, drop = F]
cnv <- cnv[, patients, drop = F]

At this point, we need the pathway knowledge-base. We will use the Reactome pathway graphs available at graphite R package. Reactome is natively distributed in Uniprot, our analysis will be in Entrez gene ID, thus we need to convert the pathway identifiers.

Since the identifiers convertion of all the Reactome pathways takes time, once we have the converted pathways we will save them.

if (file.exists("reactome-entrez.RData")) {
    load("reactome-entrez.RData")
} else {
    reactome <- pathways("hsapiens", "reactome")
    reactome <- convertIdentifiers(reactome, "entrez")
    file = paste0(dirname, "/reactome-entrez-", as.character(Sys.Date()), ".RData")
    link = "reactome-entrez.RData"
    save(reactome, file = file)
    file.symlink(file, link)
}

Since Reactome database is not flat but it has a hierarchical structure, in which pathways are hierarchically organized, we are going to analyze only a subsets of them cutting the hierarchy on the basis of the pathway size.

For the pathway analysis, we are going to use reactHuge, a subset that have all the pathways with more (or equal) than 10 nodes; for module analysis, we are going to use reactSmall, so the pathways that are bigger than 20 nodes but smaller that 100 nodes. This will ensure a sufficient level of specificity of the pathway/modules and will avoid unusefull reaction redundancies.

nodesLength <- sapply(reactome, function(g) {
    length(intersect(graphite::nodes(g), row.names(pseudoExpNorm)))
})
reactSmall <- reactome[nodesLength >= 20 & nodesLength <= 100]
reactHuge <- reactome[nodesLength >= 10]

We can also download the pathway hierarchy of Reactome from reactome website (reactome.org), that can be used for summary plots at the end of the analysis.

pathHierarchy <- houseOfClipUtility::downloadPathwayRelationFromReactome()
pathHierarchyGraph <- igraph::graph.data.frame(d = pathHierarchy, directed = TRUE)

Prepare the input data: the Omics object

At this step we are almost ready to run MOSClip analysis, we just need to help MOSClip to organize the data in input. We do that creating the Omics object to wrap together expression, methylation, mutation and CNV matrices.

Given the nature of the methylation data, we expect to have more than a CpG cluster associated to a gene. For this reason, MOSClip provide the possibility to include a dictionary to associate the methylation level of multiple CpG clusters to a single gene. Thus, in the methylation specific arguments you need to provide the dictionary to convert cluster names into genes.

For each matrix we need to indicate also the data reduction strategy we want to apply.

In this tutorial we chose to use PCA for expression data, cluster analysis for methylation data, vote counting for mutations and CNVs (for detail see MOSClip paper). This data transformations are easily applied calling MOSClip functions, thus here we need only to provide the name of the needed function.

multiOmics <- Omics(data = list(expr = pseudoExpNorm, met = methylation$MET_Cancer_Clustered, 
    mut = mutation, cnv = cnv), methods = c("summarizeWithPca", "summarizeInCluster", 
    "summarizeToNumberOfEvents", "summarizeToNumberOfDirectionalEvents"), specificArgs = list(pcaArgs = list(name = "exp", 
    shrink = "FALSE", method = "sparse", maxPCs = 3), clusterArgs = list(name = "met", 
    dict = methylation$eMap, max_cluster_number = 3), countEvent = list(name = "mut", 
    min_prop = 0.05), cnvAgv = list(name = "cnv", min_prop = 0.05)))

save(multiOmics, file = "multiOmics.RData")

As you can see, in this object we specified the four omic data, we chose four methods for data reduction, one for each omic, and four lists of additional parameters (as described in the help of each reduction function).

MOSClip analysis can be performed at the pathway or at the module level, where modules are subpart of pathways. A pathway analysis give a general overview of the involved processes, the module analysis can highlights more precisely the mechanism involved.

MOSClip module analysis

We are going to perform the multi-omics survival analysis on pathway modules.

The analysis is quite long so we also save the analysis results for future usage. Here, we perform an a priori filter on those genes that have at least expression data, but it’s not mandatory.

genesToConsider <- row.names(pseudoExpNorm)

if (file.exists("multiOmicsReactome.RData")) {
    load("multiOmicsReactome.RData")
} else {
    multiOmicsReactome <- lapply(reactSmall, function(g) {
        print(g@title)  # uncomment this to see the progression along pathways
        set.seed(1234)
        fcl = multiOmicsSurvivalModuleTest(multiOmics, g, survAnnot, useThisGenes = genesToConsider)
        fcl
    })
    file = paste0(dirname, "/multiOmicsReactome-", as.character(Sys.Date()), ".RData")
    link = "multiOmicsReactome.RData"
    save(multiOmicsReactome, file = file)
    file.symlink(file, link)
}

The analysis is done now. MOSClip has plenty of functions to visually explore the results. In the following paragraphs we will show you some examples.

Module summary

Using the function multiPathwayModuleReport() we can plot the tabular summary of the top 10 modules selected by p-value of the Cox proportional hazard model.

moduleSummary <- multiPathwayModuleReport(multiOmicsReactome)

Top 10 modules by overall pvalue
	module	cnvNEG	cnvPOS	expPC1	expPC2	expPC3	met2k2	met3k2	met3k3	mut
Metabolism of polyamines	3	NA	NA	0.04	NA	NA	0	NA	NA	NA
Interleukin-12 family signaling	1	0.01	0.22	0.02	0	0.11	0.75	NA	NA	0.83
Netrin-1 signaling	10	NA	0.02	0.72	NA	NA	0	NA	NA	NA
Glycosphingolipid metabolism	22	0.12	NA	0	NA	NA	0.61	NA	NA	NA
Sphingolipid metabolism	20	0.12	NA	0	NA	NA	0.61	NA	NA	NA
Interferon alpha/beta signaling	3	NA	0.01	0	0.18	NA	NA	0.15	0.42	NA
Interleukin-12 signaling	10	NA	0.11	0.14	0	0.03	0.09	NA	NA	0.7
Gamma carboxylation, hypusine formation and arylsulfatase activation	3	NA	0.17	0	NA	NA	0	NA	NA	NA
Interleukin-12 family signaling	2	0.03	0.2	0.05	0	0.72	0.13	NA	NA	0.61
Collagen degradation	9	NA	0.02	0.01	NA	NA	NA	0.32	0	NA

Resampling strategy on modules test to prioritize modules

So far, we have identified some modules that are significant. We set up a resampling procedure, modules are prioritized if they show a resampling success score greater than 80 up to 100 resamplings. We applied the resampling procedure to the modules with p-value \(<= 0.05\).

useThisPathways <- unique(moduleSummary$pathway[moduleSummary$pvalue <= 0.05])
sModule <- moduleSummary[moduleSummary$pathway %in% useThisPathways, , drop = T]

if (file.exists("perms.RData")) {
    load("perms.RData")
} else {
    perms <- resampling(fullMultiOmics = multiOmics, survAnnot, reactSmall, nperm = 100, 
        pathwaySubset = useThisPathways)
    file = paste0(dirname, "/perms-", as.character(Sys.Date()), ".RData")
    link = "perms.RData"
    save(perms, file = file)
    file.symlink(file, link)
}

Once the resampling analyses are completed, we can explore the results.

stableModulesSummary <- selectStablePathwaysModules(perms = perms, moduleSummary = sModule, 
    success = 80)

resampligSuccessCount <- getPathwaysModulesSuccess(perms = perms, moduleSummary = sModule)

moduleSummary <- multiPathwayModuleReport(multiOmicsReactome)
moduleSummary <- addResamplingCounts(moduleSummary, resampligSuccessCount)

Graphical exploration of MOSClip module results

If we are interested in the pathway called “Activation of Matrix Metalloproteinases” we can visualize the table of module test results.

plotModuleReport(multiOmicsReactome[["Activation of Matrix Metalloproteinases"]], 
    fontsize_row = 14, MOcolors = c(exp = "red", met = "green", cnv = "yellow", mut = "blue"))

As you can see, among others, the module #2 of this pathway is significant. We can infer the omics that drives this survival behaviour from the pvalues of the model covariates: “PC1” (Expression) and “met2k2” (Methylation).

We can have a look at the pathway graph and the module position in the pathway using plotModuleInGraph()

plotModuleInGraph(multiOmicsReactome[["Activation of Matrix Metalloproteinases"]], 
    moduleNumber = 2, legendLabels = c("expr", "methylation", "cnv"), paletteNames = c(exp = "red", 
        met = "green", mut = "blue", cnv = "yellow"))

## 'select()' returned 1:1 mapping between keys and columns

Then, we can check the differences in survival using Kaplan Meier curves dividing patients in groups with different omics patterns.

plotModuleKM(multiOmicsReactome[["Activation of Matrix Metalloproteinases"]], 2, 
    formula = "Surv(days, status) ~ met2k + expPC1", paletteNames = "Paired", risk.table = T, 
    size = 2, inYears = T)

Finally, we can look at the predictive genes using heatmap and patient additional annotations.

additionalA <- survAnnot
additionalA$status[additionalA$status == 1] <- "event"
additionalA$status[additionalA$status == 0] <- "no_event"
additionalA$PFS <- as.factor(additionalA$status)
additionalA$status <- NULL
additionalA$years <- round(additionalA$days/365.24, 0)
additionalA$days <- NULL

plotModuleHeat(multiOmicsReactome[["Activation of Matrix Metalloproteinases"]], 2, 
    paletteNames = c(exp = "red", met = "green", cnv = "yellow"), additionalAnnotations = additionalA, 
    additionalPaletteNames = list(PFS = "yellow", years = "teal"), sortBy = c("met2k", 
        "expPC1", "PFS", "years"), withSampleNames = F)

## 'select()' returned 1:1 mapping between keys and columns
## 'select()' returned 1:1 mapping between keys and columns

Activation of Matrix Metalloproteinases Pathway

Super Exact test

To provide efficient computation of the statistical distributions of multi-omic pathway/module set intersections, we used the theoretical framework implemented in R package . A circle plot is returned with the frequency of all significant omic combinations and their significance levels.

mi <- runSupertest(moduleSummary, pvalueThr = 0.05, zscoreThr = 0.05, resampligThr = 80, 
    excludeColumns = c("pathway", "module", "resamplingCount"))

supertest plot

As you can see, the combination of expression and methylation is significant, this means that they co-regulate the same pathway modules more often than what expected by chance.

Frequency distribution with radial plot.

This plot shows the frequency distribution of pathways aggregated into macro categories, using Reactome or KEGG hierarchical structure, separately for each omic combinations. This plot suggests prognostic biological processes that may be impacted by the omics and their cross-talk.

omicsClasses2pathways <- computeOmicsIntersections(moduleSummary, pvalueThr = 0.05, 
    zscoreThr = 0.05, resampligThr = 80, excludeColumns = c("pathway", "module", 
        "resamplingCount"))

omicsClasses2pathways <- lapply(omicsClasses2pathways, stripModulesFromPathways)

omicsClasses2fathers <- lapply(omicsClasses2pathways, annotePathwayToFather, graphiteDB = reactome, 
    hierarchy = pathHierarchyGraph)

correspondence <- lapply(names(omicsClasses2pathways), function(omicClass) {
    data.frame(path = omicsClasses2pathways[[omicClass]], father = omicsClasses2fathers[[omicClass]], 
        stringsAsFactors = F)
})
MOMfreqDataframe <- houseOfClipUtility::computeFreqs(omicsClasses2fathers)

At this point, we have computed the frequencies. In the table of frequencies, we find pathways grouped together. As a fist example, we check where the combination of two or more omics points.

combiClass <- grep(";", MOMfreqDataframe$class)
MOMfreqDataframe.multi <- MOMfreqDataframe[combiClass, , drop = F]

colors <- c(`cnv;exp` = "#ff5722", `cnv;exp;met` = "#FFAB00", `cnv;met` = "#A5F037", 
    `cnv;mut` = "#30B30D", `exp;met` = "#6c5b07", `exp;mut` = "#A200AD", `met;mut` = "#01B7BF")

houseOfClipUtility::plotFrequencies(MOMfreqDataframe.multi, minSize = 6, maxSize = 12, 
    width = 9, manualColors = colors, lineSize = 2)

We confirmed that signaling pathways are mainstream mechanisms in our dataset, even at module level. But now, we can see that also cell cycle, extracellular matrix organization and metabolism are involved. We can also take a look at the single omics.

MOMfreqDataframe.single <- MOMfreqDataframe[-combiClass, , drop = F]
houseOfClipUtility::plotFrequencies(MOMfreqDataframe.single, manualColors = c(exp = "#E33402", 
    met = "#04B880", mut = "#3871CC", cnv = "#E5C002"), minSize = 5, maxSize = 12, 
    width = 9)

plot frequencies of pathways with only one omic

And we can look also to all the classes together.

houseOfClipUtility::plotFrequencies(MOMfreqDataframe)

plot frequencies of pathways with all omics

MOSClip pathway analysis

In the following commands the analysis of pathways.

In pathway test the pathway topology (the in and out connections of the pathway genes) can be exploited to guide the data reduction step. To do that we suggest to use the topological PCA instead of the sparse PCA changing the settings in the omics object.

Than we can run the analysis using the function multiOmicsSurvivalPathwayTest().

multiOmics@specificArgs$pcaArgs$method = "topological"
multiOmics@specificArgs$pcaArgs$shrink = TRUE
genesToConsider <- row.names(pseudoExpNorm)

if (file.exists("multiOmicsFullHuge.RData")) {
    load("multiOmicsFullHuge.RData")
} else {
    
    multiOmicsFullHuge <- lapply(reactHuge, function(g) {
        print(g@title)
        set.seed(1234)
        fcl = multiOmicsSurvivalPathwayTest(multiOmics, g, survAnnot, useThisGenes = genesToConsider)
        fcl
    })
    file = paste0(dirname, "/multiOmicsFullHuge-", as.character(Sys.Date()), ".RData")
    link = "multiOmicsFullHuge.RData"
    save(multiOmicsFullHuge, file = file)
    file.symlink(file, link)
}

Graphical exploration of MOSClip pathway results

MOSClip has multiple functions to explore the pathway data. We provide here some examples.

Pathway summary

We can plot a report of the first 10 pathways, sorted by pvalue of the Cox proportional hazard model.

plotMultiPathwayReport(multiOmicsFullHuge, 10, MOcolors = c(exp = "red", mut = "blue", 
    cnv = "yellow", met = "green"))

The heatmap summarized the \(p-value\) of the Cox proportional hazard model using all the covariates. The first column is the \(p-value\) of the model, the others columns contain the \(p-value\) of the single covariates. Colorscale is used to graphically visulize the \(p-values\). The \(p-value\) heatmap helps to figure out the involvement of different omics.

Pathway Heatmap

We can explore also the most important genes of the pathway and their original pattens using an heatmap plot of the original values. The most important genes are selected as described in Martini et al.

plotPathwayHeat(multiOmicsFullHuge[["VEGFA-VEGFR2 Pathway"]], sortBy = c("expPC2", 
    "met3k"), paletteNames = c(exp = "red", met = "green", mut = "blue", cnv = "yellow"), 
    nrowsHeatmaps = 2, withSampleNames = F, fontsize_row = 6)

## 'select()' returned 1:1 mapping between keys and columns
## 'select()' returned 1:1 mapping between keys and columns
## 'select()' returned 1:1 mapping between keys and columns

Pathway Kaplan Meier

Then we can also check differences in survival using Kaplan Meier curves dividing patients in groups with different omics patterns (e.g. patients with different methylation pattern and high and low levels of PC2 in expression).

plotPathwayKM(multiOmicsFullHuge[["VEGFA-VEGFR2 Pathway"]], formula = "Surv(days, status) ~ met3k + expPC2", 
    paletteNames = "Paired")

Resampling strategy on pathway test to prioritize pathways

As we did for the analysis of modules, MOSClip gives the possibility to prioritize most important and stable pathway results, running the resampling procedure as in the following commands.

pathwaySummaryHuge <- multiPathwayReport(multiOmicsFullHuge)
useThisPathwaysFM <- row.names(pathwaySummaryHuge[pathwaySummaryHuge$pvalue <= 0.05, 
    ])
sPathwayM <- pathwaySummaryHuge[useThisPathwaysFM, , drop = T]

if (file.exists("permsPathwaysHuge.RData")) {
    load("permsPathwaysHuge.RData")
} else {
    
    permsPathwaysHuge <- MOSClip:::resamplingPathway(fullMultiOmics = multiOmics, 
        survAnnot, reactHuge, nperm = 100, pathwaySubset = useThisPathwaysFM)
    
    file = paste0(dirname, "/permsPathwaysHuge-", as.character(Sys.Date()), ".RData")
    link = "permsPathwaysHuge.RData"
    
    save(permsPathwaysHuge, file = file)
    file.symlink(file, link)
}

resampligSuccessCount <- getPathwaysModulesSuccess(perms = permsPathwaysHuge, moduleSummary = sPathwayM)
pathwaySummaryHuge <- addResamplingCounts(pathwaySummaryHuge, resampligSuccessCount)

Top 10 pathways by overall pvalue
	pvalue	cnvNEG	cnvPOS	expPC1	expPC2	expPC3	met2k2	met3k2	met3k3	mut	resamplingCount
Interleukin-35 Signalling	0.00	0.07	0.59	0.14	0.00	0.06	0.15	NA	NA	0.25	100
Constitutive Signaling by AKT1 E17K in Cancer	0.00	0.17	0.38	0.22	0.05	0.06	0.20	NA	NA	0.00	100
Nucleobase catabolism	0.00	0.04	0.19	0.64	0.56	0.02	0.28	NA	NA	0.00	100
Ca2+ pathway	0.00	0.64	0.36	0.59	0.11	0.00	0.16	NA	NA	0.04	100
AKT phosphorylates targets in the cytosol	0.00	NA	0.55	0.74	0.01	0.42	0.09	NA	NA	0.00	100
DSCAM interactions	0.00	0.10	0.16	0.74	0.51	0.25	0.00	NA	NA	0.50	100
Binding and Uptake of Ligands by Scavenger Receptors	0.00	0.65	0.13	0.01	0.15	0.00	0.88	NA	NA	0.28	73
Erythrocytes take up carbon dioxide and release oxygen	0.01	NA	0.11	0.45	0.01	0.01	NA	0.78	0.50	NA	100
Cobalamin (Cbl, vitamin B12) transport and metabolism	0.01	NA	0.12	0.88	0.14	0.30	NA	0.61	0.03	0.07	100
Purine catabolism	0.01	0.48	0.03	0.49	0.01	0.16	0.24	NA	NA	0.02	100

Plot Pathway Summary of resampling results

Alternatively, we can visualize the table as an heatmap of \(p-value\). In this case, we plot only those pathways with a resampling success score greater than 80 (or equal to 80).

resamplingTrue <- names(which(resampligSuccessCount >= 80))
plotMultiPathwayReport(multiOmicsFullHuge[resamplingTrue], MOcolors = c(exp = "red", 
    met = "green", mut = "blue", cnv = "yellow"), top = 40, fontsize = 10, fontsize_number = 0, 
    fontsize_row = 8, fontsize_col = 8)

Super test

In second instance, we can ask if two or more omics are significant in the same pathway simultaneously and if this omic interaction is more frequent than those expected by chance. To perform this test we use the runSupertest (SuperExactTest R package).

moInt <- MOSClip::runSupertest(pathwaySummaryHuge, pvalueThr = 0.05, zscoreThr = 0.05, 
    resampligThr = 80, excludeColumns = c("resamplingCount"))

As you can see from the -log10 pvalues, we found that there are 6 pathways shearing expression and methylation omics that are both significant as coefficients.

Frequency plot

As, we did for modules we can group pathway by hierarchy and representing the contribution of different omics to survival prediction in different biological processes.

omicsClasses2pathways <- computeOmicsIntersections(pathwaySummaryHuge, resampligThr = 80, 
    excludeColumns = c("resamplingCount"))
omicsClasses2fathers <- lapply(omicsClasses2pathways, annotePathwayToFather, graphiteDB = reactome, 
    hierarchy = pathHierarchyGraph)
freqDataframe <- houseOfClipUtility::computeFreqs(omicsClasses2fathers)

Now we have a table of frequencies, class and category to use to have some hits of the combinations of the differents omics tested. As a first example, we check where the combination of two or more omics points.

colors = c(cnv = "#E5C002", `cnv;exp` = "#ff5722", `cnv;exp;mut` = "#6B213A", `cnv;met` = "#A5F037", 
    exp = "#E33402", `exp;met` = "#6c5b07", `exp;mut` = "#A200AD", met = "#04B880", 
    mut = "#3871CC")

plotFrequencies(freqDataframe, minSize = 6, maxSize = 12, width = 10, manualColors = colors, 
    lineSize = 3)

We have discovered that our pathway analysis tells us that “Cell cycle” is the most involved category for mutation and methylation.

Kaplan Meier curves

Maybe now we are interested to see the interaction in relation to the survival. We peak the pathway “FGFR1 mutant receptor activation” and we plot the Kaplan Meier curves exploriong the interaction between mutation profile and expression (PC2).

plotPathwayKM(multiOmicsFullHuge[["FGFR1 mutant receptor activation"]], formula = "Surv(days, status) ~ mut + expPC2", 
    paletteNames = "Paired", risk.table = T, size = 2, inYears = T)

Kaplain-Meier of FGFR1 mutant receptor activation

Predictive genes heatmap

Finally, we can plot the heatmap of the most predictive gene in the pathway. When looking to predictive gene it is extremely useful to correlate survival behavior (death event or survival time) to predictive genes as well as other annotation such as tumor grade. That’s why, we allows the user to plot heatmap with additional custom annotations. The data frame that we produced can be used in the heatmap of the predictive genes.

plotPathwayHeat(multiOmicsFullHuge[["FGFR1 mutant receptor activation"]], sortBy = c("expPC2", 
    "mut", "PFS"), paletteNames = c(exp = "red", met = "green", mut = "blue", cnv = "yellow"), 
    additionalAnnotations = additionalA, additionalPaletteNames = list(PFS = "yellow", 
        years = "teal"), nrowsHeatmaps = 2, withSampleNames = F)

## 'select()' returned 1:1 mapping between keys and columns
## 'select()' returned 1:1 mapping between keys and columns
## 'select()' returned 1:1 mapping between keys and columns

Heatmap of Negative regulation of the PI3K/AKT network

Conclusion

In last ten years we have witnessed a dramatic change in the clinical treatment of patients thanks to molecular and personalized medicine. In fact, many medical institutes are starting to adopt routine genome wide screening to complement and help diagnosis and treatment choices. As the number of datasets grows, we need to adapt and improve the methods to cope with the complexity, amount and multi-level structure of available information. That is why we need analytical methods that effectively integrate multi-omic dimensions of this issue.

MOSClip can deal with this complexity, allowing multi-omic data integration through survival pathway analyses. In brief, MOSClip comprises three main components: pathway topology, multi-omic data and survival analysis.

MOSClip is freely available as an R package, with tutorials, guidelines and best practice provided to fully exploit the potential of MOSClip and to guide the user in data analysis and interpretation of results. Effort was dedicated to visualization tools, thus giving the user the opportunity to dissect every single aspect of the results.

MOSClip Tutorial