This tutorial provide a short example about performing module survival analysis using additional clinical covariates with MOSClip R package.

Multi-omics TCGA ovarian cancer data analyses

The basilar steps are identical to the full analysis. Here, for sake of semplicity and time, we are going to move fast through the preliminary steps and focus into the correct format to add supplementary variables.

Prepare the multi-omic dataset and pathways

First of all, we load the necessary libraries.

library(org.Hs.eg.db)
library(pheatmap)
library(RColorBrewer)
library(EDASeq)
library(graphite)
library(houseOfClipUtility)
library(MOSClip)

As seen in the main tutorial download “TCGA-OV-hg38-preprocessed-externalSurv.RData” available at the example datasets page. Move the downloaded file inside a directory called downloadTCGA in your working directory.

Now, we can load the preprocessed data.

load("downloadTCGA/TCGA-OV-hg38-preprocessed-externalSurv.RData")

We create a directory to store the data results, checking if the directory already exists.

dirname = "MOSresults"
if (!file.exists(dirname)) {
    dir.create(dirname)
}

The object ‘fup’ (short of ‘followup’) contains the survival annotations from the table by Liu et al, Cell, 2018. We extract the slot for the ‘progression free survival’ (pfs) and we save it in survAnnotations.

Next step is modifiy all the multi-omics matrices assigning the type of gene identifier. Here, we will work with Entrez Gene ID, indicated with the prefix tag “ENTREZID:” compliant to Bioconductor org.dbi as used in the latest graphite R package version. This will allow us to be able to match the omics data to graphite pathways.

survAnnotations <- fup$pfs

expression <- rnaSeq
row.names(expression) <- paste0("ENTREZID:", row.names(expression))
mutation <- mutations$data
row.names(mutation) <- paste0("ENTREZID:", row.names(mutation))
names(metClustValues$eMap) <- paste0("ENTREZID:", row.names(metClustValues$eMap))
row.names(cnv) <- paste0("ENTREZID:", row.names(cnv))

We are going to load some additional annotation from the table by Liu et al, Cell, 2018. You can download panCancerAnnotation.RData here. This RData once loaded contains all the suplementary clinical variables for all cancer types. Please select OV.

load("external-resources/panCancerAnnotation.RData")
additional_sample_annotations_OV <- panCancerAnnotation$OV

We designed MOSClip to accept additional variables. Additional variables should be included in the survival annotation data frame. As for events and days in the survival data frame, all the patients must have a meaningfull annotation in the corresponding additional variable.

So we perform serial intersection to identify the patients that have all the desired information.

survAnnot <- na.omit(survAnnotations)
patients <- row.names(survAnnot)
patients <- intersect(patients, colnames(expression))
patients <- intersect(patients, colnames(metClustValues$MET_Cancer_Clustered))
patients <- intersect(patients, colnames(mutation))
patients <- intersect(patients, colnames(cnv))

We have selected all the patiens with all the omics and survival annotation. No we need to extract those with the additional clinical variable of interest. We create this example including both discrete and continous variables: “TCGA Subtype” as the discrete example; Macrophages M0, M1, M2 as continous examples.

valid <- additional_sample_annotations_OV$`TCGA Participant Barcode` %in% patients
asa_OV <- additional_sample_annotations_OV[valid, , drop = F]
add_me <- data.frame(subtype = asa_OV$`TCGA Subtype`, m0 = as.numeric(gsub(",", 
    ".", asa_OV$`Macrophages M0`)), m1 = as.numeric(gsub(",", ".", asa_OV$`Macrophages M1`)), 
    m2 = as.numeric(gsub(",", ".", asa_OV$`Macrophages M2`)), row.names = asa_OV$`TCGA Participant Barcode`, 
    stringsAsFactors = F)

add_me <- add_me[patients, , drop = F]
add.omit <- na.omit(add_me)

patients <- intersect(patients, row.names(add.omit))
survAnnot <- survAnnot[patients, , drop = F]

Since not all the patients have an annotation in this variables the list of valid patients shrinked. Now we select patients and genes from the datasets.

expression <- expression[, patients, drop = F]
keep = apply(expression >= 100, 1, any)
expNorm <- betweenLaneNormalization(expression[keep, , drop = F], which = "upper")
pseudoExpNorm <- log2(expNorm + 1)

methylation <- metClustValues
methylation$MET_Cancer_Clustered <- methylation$MET_Cancer_Clustered[, patients, 
    drop = F]

mutation <- mutation[, patients, drop = F]
cnv <- cnv[, patients, drop = F]

At this point, we need the pathway knowledge-base. We will use the Reactome pathway graphs available at graphite R package.

Since the identifiers convertion of all the Reactome pathways takes time, once we have the converted pathways we will save them (you should have the saved version from the main tutorial).

if (file.exists("reactome-entrez.RData")) {
    load("reactome-entrez.RData")
} else {
    reactome <- pathways("hsapiens", "reactome")
    reactome <- convertIdentifiers(reactome, "entrez")
    file = paste0(dirname, "/reactome-entrez-", as.character(Sys.Date()), ".RData")
    link = "reactome-entrez.RData"
    save(reactome, file = file)
    file.symlink(file, link)
}

Now we select the pathway with more than 19 nodes and less than 101 to perform module analysis.

nodesLength <- sapply(reactome, function(g) {
    length(intersect(graphite::nodes(g), row.names(pseudoExpNorm)))
})
reactSmall <- reactome[nodesLength >= 20 & nodesLength <= 100]

Prepare the input data: the Omics object

At this step we are almost ready to run MOSClip analysis, we just need to create the Omics object to wrap together expression, methylation, mutation and CNV matrices.

We followed the same rules applied to the main tutorial.

multiOmics_ac <- Omics(data = list(expr = pseudoExpNorm, met = methylation$MET_Cancer_Clustered, 
    mut = mutation, cnv = cnv), methods = c("summarizeWithPca", "summarizeInCluster", 
    "summarizeToNumberOfEvents", "summarizeToNumberOfDirectionalEvents"), specificArgs = list(pcaArgs = list(name = "exp", 
    shrink = "FALSE", method = "sparse", maxPCs = 3), clusterArgs = list(name = "met", 
    dict = methylation$eMap, max_cluster_number = 3), countEvent = list(name = "mut", 
    min_prop = 0.05), cnvAgv = list(name = "cnv", min_prop = 0.05)))

save(multiOmics_ac, file = "multiOmics_ac.RData")

MOSClip module analysis with additional covariates

We are going to perform the multi-omics survival analysis on pathway modules using additional clinical variable of interest.

This analysis is quite long.

As you can see from the code below, we created an addicted representation of the survAnnot data.frame that contains the additional annotations. To run the analysis usign additional annotation the include_from_annot parameter must be set as TRUE.

genesToConsider <- row.names(pseudoExpNorm)
survAnnot_addicted <- cbind(survAnnot, add.omit)

if (file.exists("multiOmicsReactome_additional.RData")) {
    load("multiOmicsReactome_additional.RData")
} else {
    multiOmicsReactome <- lapply(reactSmall, function(g) {
        # print(g@title) # uncomment this to see the progression along pathways
        set.seed(1234)
        fcl = multiOmicsSurvivalModuleTest(multiOmics_ac, g, survAnnot_addicted, 
            useThisGenes = genesToConsider, include_from_annot = T)
        fcl
    })
    
    save(multiOmicsReactome, file = "multiOmicsReactome_additional.RData")
}

Module summary

Using the function multiPathwayModuleReport() we can plot the tabular summary of the modules selected by p-value of the Cox proportional hazard model. Since it can be difficult to isolate desired covariates we set up a parameter that gives the priority order of the covariates. We give priority to omic derived covariates.

moduleSummary <- multiPathwayModuleReport(multiOmicsReactome, priority_to = c("exp", 
    "met", "cnv", "mut"))
head(moduleSummary)

Graphical exploration of MOSClip module results

Now, as demonstration, we are going to plot the module report for the pathway called “PI3K/AKT Signaling in Cancer”. To control the color of the omics, we set a grey scale for the additional clinical variables and from the MOSClip color palette we pick the omic colors.

addColors <- RColorBrewer::brewer.pal(9, "Greys")[1:6]
names(addColors) <- c("m0", "m1", "m2", "subtypeOVCA.Immunoreactive", "subtypeOVCA.Mesenchymal", 
    "subtypeOVCA.Proliferative")

plotModuleReport(multiOmicsReactome[["PI3K/AKT Signaling in Cancer"]], fontsize_row = 14, 
    MOcolors = c(exp = "red", met = "green", cnv = "yellow", mut = "blue", addColors))
PI3K/AKT Signaling in Cancer module 10

PI3K/AKT Signaling in Cancer module 10

As a further demonstration we take a look to “Regulation of TNFR1 signaling” and using the … argument we pass to the heatmap an additional gap through gaps_col argument.

plotModuleReport(multiOmicsReactome[["Regulation of TNFR1 signaling"]], fontsize_row = 14, 
    MOcolors = c(exp = "red", met = "green", cnv = "yellow", mut = "blue", addColors), 
    gaps_col = c(1, 10))
Regulation of TNFR1 signaling

Regulation of TNFR1 signaling

Good, now we can move to plot KM curves for the interesting covariates of a modules. To handle the additional covariates we set up two parameters: ‘additional_continous’ that can handle the continous variables passed through surv annot data frame and ‘additional_discrete’ that can handle discrete variables. The parameters can be set up in the same call to the function.

Lets see an example for continous variables.

plotModuleKM(multiOmicsReactome[["PI3K/AKT Signaling in Cancer"]], 10, formula = "Surv(days, status) ~ cnvPOS + m2", 
    additional_continous = c("m2"))
PI3K/AKT Signaling in Cancer module 10

PI3K/AKT Signaling in Cancer module 10

And now a discrete variable.

plotModuleKM(multiOmicsReactome[["Regulation of TNFR1 signaling"]], 1, formula = "Surv(days, status) ~ expPC1 + subtype", 
    additional_discrete = c("subtype"), risk.table = F)
Regulation of TNFR1 signaling module 1

Regulation of TNFR1 signaling module 1

To conclude we are going to add the additional variables to enrich the heatmap of the gene that were selected as the most important in the determionation of the survival inside the module.

additionalA <- survAnnot_addicted
additionalA$status[additionalA$status == 1] <- "event"
additionalA$status[additionalA$status == 0] <- "no_event"
additionalA$PFS <- as.factor(additionalA$status)
additionalA$status <- NULL
additionalA$years <- round(additionalA$days/365.24, 0)
additionalA$days <- NULL
additionalA$m0 <- NULL
additionalA$m1 <- NULL
additionalA$subtype <- as.factor(additionalA$subtype)

Once created the annotation we can plot.

plotModuleHeat(multiOmicsReactome[["PI3K/AKT Signaling in Cancer"]], 10, paletteNames = c(exp = "red", 
    met = "green", cnv = "yellow"), additionalAnnotations = additionalA, additionalPaletteNames = list(PFS = "yellow", 
    years = "teal", subtype = "teal", m2 = "violet"), sortBy = c("cnvPOS", "subtype", 
    "PFS", "years"), withSampleNames = F)
## 'select()' returned 1:1 mapping between keys and columns
## 'select()' returned 1:1 mapping between keys and columns
PI3K/AKT Signaling in Cancer module 10

PI3K/AKT Signaling in Cancer module 10