Let us load the libraries required for the various analyses described in this document.

##libraries for "tidy" manipulation of data 
suppressMessages(library(tidyverse))

##library providing the pipe operators, including the assignment pipe %<>% used below
suppressMessages(library(magrittr))

##library used to normalize the gene expression data and to test for differential expression between the tumor and normal bladder cancer samples
suppressMessages(library(DESeq2))

##library used for generating a Volcano Plot
suppressMessages(library(EnhancedVolcano))

##library to illustrate the use of Over Representation Analyses (ORA) and Gene Set Enrichment Analyses (GSEA) with gene permutation
suppressMessages(library(clusterProfiler))

##library to illustrate the use of Simultaneous Enrichment Analyses (SEA)
suppressMessages(library(rSEA))

##library to illustrate the use of Significance Analysis of Function and Expression (SAFE), Pathway Analysis with Down-weighting of Overlapping Genes (PADOG) and Gene Set Enrichment Analyses (GSEA) with sample permutation
suppressMessages(library(GSEABenchmarkeR))

Scientific Question

What are the biological pathways/gene sets differentially regulated between the tumor and normal tissues in bladder cancer patients?

Data

The gene expression data we will work with were assayed using RNA-seq in tumor and normal tissues drawn from 19 subjects with bladder cancer. These data are derived from The Cancer Genome Atlas (TCGA).

Methods

The methods we will use to answer the scientific question are described below:

Load the gene expression data and understand the study design
Perform differential expression analyses
Run six different enrichment analysis methods.

Note: In practice we would typically run only one or at most two methods to answer our question. Here, however, our purpose is to illustrate the use of different methods and to highlight and interpret their results in the context of each method's assumptions. The choice of method depends on …

… the nature of our hypothesis, i.e., are we interested in a very specific biochemical pathway, or
… are we agnostic about the nature of the biochemical pathways we discover to be associated with what we are studying?
… whether we want to interpret the resulting p-values as measures of how reproducible our enriched pathways would be for other research groups using data derived from new bladder cancer patient samples
… whether the assay we are using is a genome-wide assay or a targeted assay focusing on a specific group of genes or proteins

Analyses

Load the data and understand the experimental design

The gene expression data are stored as a SummarizedExperiment object in an RDS file.

tcga <- readRDS("bladder_cancer_tcga_summarized_experiment.rds")

##short summary of tcga. Note that the 12,264 row names are the genes' Entrez IDs
tcga

## class: SummarizedExperiment 
## dim: 12264 38 
## metadata(3): annotation dataId dataType
## assays(1): exprs
## rownames(12264): 2 144568 ... 23140 26009
## rowData names(0):
## colnames(38): TCGA-K4-A3WV-01A-11R-A22U-07 TCGA-BT-A20W-01A-21R-A14Y-07
##   ... TCGA-GC-A6I3-11A-11R-A31N-07 TCGA-GD-A2C5-11A-11R-A180-07
## colData names(4): sample type GROUP BLOCK

print("Short summary of the RNA-seq samples")

## [1] "Short summary of the RNA-seq samples"

##quick summary of the 38 samples. The variable GROUP encodes tumor (1) vs normal (0) status, while the variable BLOCK identifies the patient. Tumor and normal tissue were drawn from each of the 19 patients and assayed for gene expression
colData(tcga)

## DataFrame with 38 rows and 4 columns
##                                                    sample     type     GROUP
##                                               <character> <factor> <numeric>
## TCGA-K4-A3WV-01A-11R-A22U-07 TCGA-K4-A3WV-01A-11R-A22U-07     BLCA         1
## TCGA-BT-A20W-01A-21R-A14Y-07 TCGA-BT-A20W-01A-21R-A14Y-07     BLCA         1
## TCGA-K4-A5RI-01A-11R-A28M-07 TCGA-K4-A5RI-01A-11R-A28M-07     BLCA         1
## TCGA-BT-A20N-01A-11R-A14Y-07 TCGA-BT-A20N-01A-11R-A14Y-07     BLCA         1
## TCGA-BL-A13J-01A-11R-A277-07 TCGA-BL-A13J-01A-11R-A277-07     BLCA         1
## ...                                                   ...      ...       ...
## TCGA-BT-A2LB-11A-11R-A18C-07 TCGA-BT-A2LB-11A-11R-A18C-07     BLCA         0
## TCGA-K4-A54R-11A-11R-A26T-07 TCGA-K4-A54R-11A-11R-A26T-07     BLCA         0
## TCGA-GC-A3WC-11A-11R-A22U-07 TCGA-GC-A3WC-11A-11R-A22U-07     BLCA         0
## TCGA-GC-A6I3-11A-11R-A31N-07 TCGA-GC-A6I3-11A-11R-A31N-07     BLCA         0
## TCGA-GD-A2C5-11A-11R-A180-07 TCGA-GD-A2C5-11A-11R-A180-07     BLCA         0
##                                     BLOCK
##                               <character>
## TCGA-K4-A3WV-01A-11R-A22U-07 TCGA-K4-A3WV
## TCGA-BT-A20W-01A-21R-A14Y-07 TCGA-BT-A20W
## TCGA-K4-A5RI-01A-11R-A28M-07 TCGA-K4-A5RI
## TCGA-BT-A20N-01A-11R-A14Y-07 TCGA-BT-A20N
## TCGA-BL-A13J-01A-11R-A277-07 TCGA-BL-A13J
## ...                                   ...
## TCGA-BT-A2LB-11A-11R-A18C-07 TCGA-BT-A2LB
## TCGA-K4-A54R-11A-11R-A26T-07 TCGA-K4-A54R
## TCGA-GC-A3WC-11A-11R-A22U-07 TCGA-GC-A3WC
## TCGA-GC-A6I3-11A-11R-A31N-07 TCGA-GC-A6I3
## TCGA-GD-A2C5-11A-11R-A180-07 TCGA-GD-A2C5

##convert the GROUP and BLOCK variables into factors (categorical variables)
tcga$GROUP <- as.factor(tcga$GROUP)
tcga$BLOCK <- as.factor(tcga$BLOCK)

print("Look at the read counts of 4 genes for 5 samples")

## [1] "Look at the read counts of 4 genes for 5 samples"

(assays(tcga))$exprs[1:4,1:5]

##        TCGA-K4-A3WV-01A-11R-A22U-07 TCGA-BT-A20W-01A-21R-A14Y-07
## 2                              2133                        26508
## 144568                         1124                           60
## 53947                          2619                          769
## 8086                           3621                         1914
##        TCGA-K4-A5RI-01A-11R-A28M-07 TCGA-BT-A20N-01A-11R-A14Y-07
## 2                             18641                         3828
## 144568                          264                         1241
## 53947                          2723                          424
## 8086                           2910                         1239
##        TCGA-BL-A13J-01A-11R-A277-07
## 2                             23443
## 144568                         1444
## 53947                           544
## 8086                           1217

Differential expression analyses

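##create a DESeqDataSet object. BLOCK (patient) accounts for the pairing of tumor and normal samples; GROUP is listed last, as DESeq2 recommends for the variable of interest
dds.bc <- DESeqDataSet(tcga, design = ~ BLOCK + GROUP)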
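To see what including the patient variable in the design formula does, here is a minimal sketch using base R's `model.matrix` on a made-up table of three patients (the patient IDs `p1` to `p3` are hypothetical, not TCGA data). With treatment coding, each patient gets its own baseline, so the `GROUP1` coefficient measures the tumor vs normal difference within patients. DESeq2 fits a negative binomial GLM with a design matrix of this kind; the toy table is only an illustration.

```r
## Toy paired design: 3 hypothetical patients, each with a normal (0) and a tumor (1) sample
toy <- data.frame(
  BLOCK = factor(c("p1", "p1", "p2", "p2", "p3", "p3")),
  GROUP = factor(c(0, 1, 0, 1, 0, 1))
)

## Treatment-coded design matrix: intercept, one column per additional patient, one for tumor
X <- model.matrix(~ BLOCK + GROUP, data = toy)
X
colnames(X)  # "(Intercept)" "BLOCKp2" "BLOCKp3" "GROUP1"
```

With the 19 patients in this study, the corresponding design matrix has 38 rows and 20 columns (intercept, 18 patient columns and one tumor column).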

##estimate size factors (normalization) and dispersions, then fit the negative binomial model and run the Wald tests
dds.bc %<>% DESeq(.)


##variance stabilizing transformation to view the normalized data
vsd.bc <- dds.bc %>%
  vst(., blind=TRUE)

##generate the PCA plot using the normalized data. Note that the samples cluster by tumor versus normal status
vsd.bc %>%
  plotPCA(., intgroup=c("GROUP"))

##test for tumor versus normal differences while controlling for patient-specific differences; p-values are adjusted with the Bonferroni method
diff.res <- dds.bc %>% 
  results(., contrast = c("GROUP", "1", "0"), pAdjustMethod="bonferroni")
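The call above adjusts p-values with the Bonferroni method, whereas the ORA step later in this document uses Benjamini-Hochberg (BH). To show how strongly this choice affects the size of the gene list, here is a small sketch with five made-up p-values using base R's `p.adjust`; we assume DESeq2 passes `pAdjustMethod` on to `p.adjust`.

```r
## Five made-up raw p-values
p <- c(0.001, 0.008, 0.02, 0.03, 0.2)

## Bonferroni (controls the family-wise error rate): multiply by the number of tests, cap at 1
p_bonf <- p.adjust(p, method = "bonferroni")

## Benjamini-Hochberg (controls the false discovery rate): rank-based step-up adjustment
p_bh <- p.adjust(p, method = "BH")

p_bonf
p_bh
```

At an adjusted threshold of 0.05, two of the five p-values pass with Bonferroni and four pass with BH, so a Bonferroni-based gene list is shorter and more conservative.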

##visualize the results using a Volcano Plot
diff.res %>%
  as.data.frame() %>%
  EnhancedVolcano(.,
    lab = rownames(.),
    x = 'log2FoldChange',
    y = 'padj',
    xlim = c(-5, 8))

##output the results
diff.res %>%
  as.data.frame() %>%
  rownames_to_column('Gene') %>%
  write.csv(., "bladder_cancer_diff_exp_results.csv", row.names = FALSE)

Load the gene set/pathway databases of interest

We will load the Gene Ontology and WikiPathways databases. Note, an additional database called PFOCR is also loaded. We will ignore this database during this workshop.

##load the pathway gene set data-bases
database_lists <- load("databases.RData") # load() restores the wp, pfocr and go objects and returns their names

##WikiPathways annotation is a data frame that links genes (in terms of their Entrez IDs) to each of the WikiPathways (annotated by their names and IDs)
head(wp_annotation)

##                                name set_id   gene
## 1           FABP4 in ovarian cancer WP4400 574413
## 2           FABP4 in ovarian cancer WP4400   2167
## 3 B Cell Receptor Signaling Pathway   WP23   4690
## 4 B Cell Receptor Signaling Pathway   WP23   5781
## 5 B Cell Receptor Signaling Pathway   WP23  11184
## 6 B Cell Receptor Signaling Pathway   WP23   6195

##WikiPathways list is a list of character vectors of Entrez IDs representing genes associated with each pathway
head(wp_list)

## $WP100
##  [1] "728441"    "91227"     "290"       "26873"     "221357"    "92086"    
##  [7] "3417"      "2878"      "2944"      "2877"      "2678"      "2876"     
## [13] "2953"      "2687"      "102724197" "2730"      "2938"      "2729"     
## [19] "2937"      "2936"      "2946"      "2539"      "2879"     
## 
## $WP106
##  [1] "189"  "443"  "2875" "445"  "435"  "18"   "2572" "2571" "2806" "5091"
## [11] "2805" "1615"
## 
## $WP107
##  [1] "8893"      "102466854" "8894"      "8891"      "101930123" "8892"     
##  [7] "8890"      "29904"     "1975"      "2107"      "1974"      "1973"     
## [13] "5610"      "1938"      "1937"      "1936"      "1979"      "1978"     
## [19] "1977"      "1933"      "8662"      "8663"      "8661"      "8666"     
## [25] "3692"      "8667"      "8664"      "8665"      "100302143" "1984"     
## [31] "1983"      "1981"      "9669"      "3646"      "8672"      "23708"    
## [37] "9086"      "27102"     "7458"      "1917"      "8668"      "8669"     
## [43] "1915"      "26986"     "23277"     "9451"      "728689"    "1965"     
## [49] "1964"      "10209"     "10605"     "8637"      "1968"      "1967"     
## 
## $WP111
##   [1] "4694"      "4695"      "4696"      "1340"      "1337"      "4728"     
##   [7] "4729"      "27089"     "513"       "514"       "55967"     "4720"     
##  [13] "515"       "516"       "517"       "4722"      "4723"      "518"      
##  [19] "4724"      "4725"      "4726"      "1339"      "1351"      "1350"     
##  [25] "1349"      "1347"      "1346"      "521"       "1345"      "522"      
##  [31] "29796"     "4697"      "4698"      "4731"      "93974"     "9481"     
##  [37] "102465669" "27109"     "4508"      "4509"      "498"       "1355"     
##  [43] "1353"      "539"       "374291"    "10476"     "10632"     "9377"     
##  [49] "7352"      "7351"      "9016"      "6389"      "7350"      "4519"     
##  [55] "4512"      "4513"      "4514"      "6391"      "6390"      "6392"     
##  [61] "10975"     "9551"      "4540"      "4541"      "10063"     "6834"     
##  [67] "4535"      "4536"      "4537"      "4538"      "4539"      "7385"     
##  [73] "7384"      "7386"      "9167"      "100616403" "7388"      "291"      
##  [79] "292"       "293"       "7381"      "4705"      "4706"      "4707"     
##  [85] "4708"      "4709"      "4700"      "4701"      "4702"      "4704"     
##  [91] "6341"      "1327"      "4716"      "4717"      "100500805" "4718"     
##  [97] "4719"      "4710"      "506"       "4711"      "4712"      "4713"     
## [103] "1329"      "509"       "4714"      "4715"     
## 
## $WP117
##  [1] "26716"     "9620"      "128674"    "30817"     "6752"      "3363"     
##  [7] "54112"     "154"       "56413"     "2149"      "341416"    "83873"    
## [13] "27202"     "23284"     "51289"     "3356"      "3355"      "138883"   
## [19] "1815"      "1814"      "4923"      "81050"     "5737"      "8387"     
## [25] "5032"      "57191"     "53829"     "9038"      "1909"      "26648"    
## [31] "144124"    "2911"      "2833"      "2798"      "1268"      "26245"    
## [37] "887"       "2918"      "53831"     "2837"      "1901"      "4935"     
## [43] "9289"      "9287"      "29929"     "9288"      "4992"      "393046"   
## [49] "8390"      "1880"      "23266"     "2692"      "2492"      "59340"    
## [55] "8392"      "254786"    "6608"      "26494"     "401428"    "1952"     
## [61] "1951"      "135"       "3579"      "2841"      "1234"      "3577"     
## [67] "2840"      "26333"     "26212"     "221395"    "26211"     "11245"    
## [73] "341276"    "2925"      "283383"    "64582"     "100616112" "59352"    
## [79] "29933"     "1131"      "27239"     "9290"      "140"       "59350"    
## [85] "118442"    "1129"      "10888"     "84658"     "79541"     "84539"    
## [91] "146"       "2532"      "84059"     "4994"     
## 
## $WP12
##  [1] "8792"  "3690"  "8111"  "5155"  "8600"  "6696"  "4982"  "9550"  "56302"
## [10] "1513"  "3456"  "3454"  "7965"  "5599"  "6548"  "54"

Illustration of different enrichment analysis methods

Over Representation Analyses (ORA)

The inputs to this analysis are a list of genes of interest (here, the genes deemed differentially expressed between the tumor and normal samples) and the universe of genes from which that list was drawn.

We will use a function in the clusterProfiler library to perform this analysis.

##pick the differentially expressed genes using a 0.05 threshold on the Bonferroni-adjusted p-value
diff_genes <- diff.res %>%
    as.data.frame() %>%
    rownames_to_column('gene') %>%
    filter(padj < 0.05) %>%
    .$gene 
##choosing the gene universe is important: we use all genes for which we have counts
universe_genes <- diff.res %>%
  as.data.frame() %>%
  rownames_to_column('gene') %>%
  .$gene

##run the ORA analyses
res_ora <- enricher(
  gene = diff_genes,
  universe = universe_genes,
  pAdjustMethod = "BH",
  pvalueCutoff = 1, #p.adjust cutoff
  qvalueCutoff = 1,
  minGSSize = 1,
  maxGSSize = 100000,
  TERM2GENE = wp_annotation[,c("set_id","gene")],
  TERM2NAME = wp_annotation[,c("set_id","name")])

res_ora <- res_ora@result

## view the first few rows of the results
head(res_ora)

##            ID                                    Description GeneRatio  BgRatio
## WP2446 WP2446                  Retinoblastoma Gene in Cancer   47/1022  86/4839
## WP466   WP466                                DNA Replication   26/1022  41/4839
## WP2361 WP2361                       Gastric Cancer Network 1   16/1022  22/4839
## WP179   WP179                                     Cell Cycle   48/1022 115/4839
## WP45     WP45                     G1 to S cell cycle control   30/1022  61/4839
## WP289   WP289 Myometrial Relaxation and Contraction Pathways   46/1022 120/4839
##              pvalue     p.adjust       qvalue
## WP2446 6.082885e-12 3.011028e-09 2.817336e-09
## WP466  4.881223e-09 1.208103e-06 1.130388e-06
## WP2361 2.877882e-07 4.208137e-05 3.937438e-05
## WP179  3.400515e-07 4.208137e-05 3.937438e-05
## WP45   9.280214e-07 9.187412e-05 8.596409e-05
## WP289  9.770180e-06 8.060399e-04 7.541894e-04
##                                                                                                                                                                                                                                          geneID
## WP2446 25/54443/890/891/9133/898/9134/993/8318/8317/983/1017/1019/81620/1111/1786/1869/1870/2189/24137/4173/4175/4176/2956/4609/4998/5111/10733/5426/5427/5557/5591/5928/5947/5983/5984/5985/6119/6241/6502/10592/3925/7027/7153/7272/7298/7465
## WP466                                                                                                     8318/990/8317/1017/81620/10926/55388/4171/4173/4174/4175/4176/4998/23594/5111/23649/5424/5426/5427/5557/5558/5982/5983/5984/5985/6119
## WP2361                                                                                                                                                   86/6790/1063/144455/1894/56992/9585/286826/4173/4605/57122/8607/64094/7153/22974/11065
## WP179   25/699/9184/890/891/9133/894/898/9134/991/993/995/8318/990/8317/983/1017/1019/1028/1111/11200/10926/1869/1870/9700/4616/2932/3066/10459/4171/4173/4174/4175/4176/4609/4998/23594/5111/9088/5347/5591/9232/5933/6502/7027/7043/7272/7465
## WP45                                                                                        891/894/898/9134/993/8318/983/1017/1019/1028/90993/1869/1870/4171/4173/4174/4175/4176/4609/4998/23594/5111/23649/5426/5427/5557/5558/6119/7027/7465
## WP289         58/59/70/108/196883/111/115/408/467/489/800/817/1264/2353/2791/55970/54331/2788/2869/3488/3489/3569/3708/1902/23764/4846/5142/5144/11142/5331/5336/5577/5579/5590/10267/10266/10268/5996/8786/10287/5997/8490/8787/6262/6263/6546
##        Count
## WP2446    47
## WP466     26
## WP2361    16
## WP179     48
## WP45      30
## WP289     46

#GeneRatio (k/n): number of differentially expressed genes in the WikiPathway over the number of differentially expressed genes annotated to at least one WikiPathway
#BgRatio (M/N): number of genes in the WikiPathway over the number of universe genes annotated to at least one WikiPathway

##Estimate the odds ratio
#k: total number of differentially expressed genes annotated to at least one WikiPathway that are also part of each gene set 
k <- sapply(res_ora$GeneRatio, function(x) as.numeric(strsplit(x, "/")[[1]][1]))
#n: total number of differentially expressed genes annotated to at least one WikiPathway
n <- sapply(res_ora$GeneRatio, function(x) as.numeric(strsplit(x, "/")[[1]][2]))
#M: total number of genes in each gene set
M <- sapply(res_ora$BgRatio, function(x) as.numeric(strsplit(x, "/")[[1]][1]))
#N: total number of genes assigned to at least one WikiPathway. Note, this number will be less than or equal to the total number of genes for which you have count data in the RNA-seq (gene expression) data set
N <- sapply(res_ora$BgRatio, function(x) as.numeric(strsplit(x, "/")[[1]][2]))
odds_ratio <- (k*(N-M-n+k))/((M-k)*(n-k))

res_ora %<>% mutate(odds_ratio=odds_ratio)

## view the first few rows of the results
head(res_ora)

##       ID                                    Description GeneRatio  BgRatio
## 1 WP2446                  Retinoblastoma Gene in Cancer   47/1022  86/4839
## 2  WP466                                DNA Replication   26/1022  41/4839
## 3 WP2361                       Gastric Cancer Network 1   16/1022  22/4839
## 4  WP179                                     Cell Cycle   48/1022 115/4839
## 5   WP45                     G1 to S cell cycle control   30/1022  61/4839
## 6  WP289 Myometrial Relaxation and Contraction Pathways   46/1022 120/4839
##         pvalue     p.adjust       qvalue
## 1 6.082885e-12 3.011028e-09 2.817336e-09
## 2 4.881223e-09 1.208103e-06 1.130388e-06
## 3 2.877882e-07 4.208137e-05 3.937438e-05
## 4 3.400515e-07 4.208137e-05 3.937438e-05
## 5 9.280214e-07 9.187412e-05 8.596409e-05
## 6 9.770180e-06 8.060399e-04 7.541894e-04
##                                                                                                                                                                                                                                     geneID
## 1 25/54443/890/891/9133/898/9134/993/8318/8317/983/1017/1019/81620/1111/1786/1869/1870/2189/24137/4173/4175/4176/2956/4609/4998/5111/10733/5426/5427/5557/5591/5928/5947/5983/5984/5985/6119/6241/6502/10592/3925/7027/7153/7272/7298/7465
## 2                                                                                                    8318/990/8317/1017/81620/10926/55388/4171/4173/4174/4175/4176/4998/23594/5111/23649/5424/5426/5427/5557/5558/5982/5983/5984/5985/6119
## 3                                                                                                                                                   86/6790/1063/144455/1894/56992/9585/286826/4173/4605/57122/8607/64094/7153/22974/11065
## 4  25/699/9184/890/891/9133/894/898/9134/991/993/995/8318/990/8317/983/1017/1019/1028/1111/11200/10926/1869/1870/9700/4616/2932/3066/10459/4171/4173/4174/4175/4176/4609/4998/23594/5111/9088/5347/5591/9232/5933/6502/7027/7043/7272/7465
## 5                                                                                      891/894/898/9134/993/8318/983/1017/1019/1028/90993/1869/1870/4171/4173/4174/4175/4176/4609/4998/23594/5111/23649/5426/5427/5557/5558/6119/7027/7465
## 6        58/59/70/108/196883/111/115/408/467/489/800/817/1264/2353/2791/55970/54331/2788/2869/3488/3489/3569/3708/1902/23764/4846/5142/5144/11142/5331/5336/5577/5579/5590/10267/10266/10268/5996/8786/10287/5997/8490/8787/6262/6263/6546
##   Count odds_ratio
## 1    47   4.669717
## 2    26   6.616600
## 3    16  10.102054
## 4    48   2.758283
## 5    30   3.693418
## 6    46   2.383944

res_ora %>%
  write.csv(., "bladder_cancer_WikiPathways_ora.csv", row.names = FALSE)
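ORA asks whether a pathway contains more differentially expressed genes than expected by chance, which amounts to a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test) on a 2x2 table. The sketch below rebuilds that table for WP2446 from the GeneRatio (47/1022) and BgRatio (86/4839) shown above. We assume `enricher()` uses this one-sided hypergeometric test, in which case `p_hyper` should reproduce the `pvalue` column. Note that `fisher.test()` reports a conditional maximum-likelihood odds ratio, which differs slightly from the sample odds ratio in the `odds_ratio` column.

```r
## Counts for WP2446 (Retinoblastoma Gene in Cancer), read off GeneRatio and BgRatio
k <- 47    # DE genes in the pathway
n <- 1022  # DE genes annotated to at least one WikiPathway
M <- 86    # genes in the pathway
N <- 4839  # universe genes annotated to at least one WikiPathway

## 2x2 table: rows = in pathway / not in pathway, columns = DE / not DE
tab <- matrix(c(k, n - k, M - k, N - M - n + k), nrow = 2,
              dimnames = list(c("in_set", "not_in_set"), c("DE", "not_DE")))

## One-sided hypergeometric p-value: probability of k or more DE genes in the pathway
p_hyper <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

## The same one-sided test expressed as Fisher's exact test on the table
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

## Sample odds ratio, the same formula used for odds_ratio above
or_sample <- (tab["in_set", "DE"] * tab["not_in_set", "not_DE"]) /
             (tab["in_set", "not_DE"] * tab["not_in_set", "DE"])

c(p_hyper = p_hyper, p_fisher = p_fisher, odds_ratio = or_sample)
```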

Simultaneous Enrichment Analyses (SEA)

These analyses require as input the (unadjusted) p-values associated with differential expression for each gene.

##estimate the overall proportion of genes associated with the tumor vs normal comparison (true discovery proportion, TDP)
TDPestimate_full <- setTDP(diff.res$pvalue, universe_genes, alpha = 0.05)

TDPestimate_full

## $TDP.bound
## [1] 0.3419765
## 
## $TDP.estimate
## [1] 0.5371005
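`setTDP` reports a lower confidence bound (`TDP.bound`) and a point estimate (`TDP.estimate`) of the proportion of genes associated with tumor vs normal status. To build intuition for that quantity, here is a sketch of a different and simpler estimator: Storey's pi0, the estimated proportion of null p-values, whose complement estimates the proportion of associated genes. It runs on simulated p-values in which 30% of genes carry a real effect, and the tuning value `lambda = 0.5` is our assumption. This is not the closed-testing procedure rSEA uses and it comes with no confidence bound.

```r
## Simulated p-values: 7000 null genes and 3000 genes with a real effect
set.seed(123)
p_null   <- runif(7000)                                     # null p-values are uniform
p_signal <- pnorm(rnorm(3000, mean = 3), lower.tail = FALSE) # one-sided p-values from shifted z-scores
p_all    <- c(p_null, p_signal)

## Storey's estimator: p-values above lambda come mostly from null genes
lambda <- 0.5
pi0 <- min(1, mean(p_all > lambda) / (1 - lambda))  # estimated proportion of null genes
prop_associated <- 1 - pi0                          # estimated proportion of associated genes
prop_associated
```

The estimate should land close to the simulated 30%; applied to the non-missing values of `diff.res$pvalue`, the same lines would give a rough point estimate to set alongside `TDP.estimate`.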

##run rSEA method
res_rSEA <- SEA(diff.res$pvalue, universe_genes, pathlist = wp_list)


##add a column named set_id (a copy of Name) so that these results can be merged with the wp_annotation data frame
res_rSEA %<>% mutate(set_id=Name)
##get pathway names
wp_id_2_names <- wp_annotation %>%
  select(1,2) %>%
  unique()

res_rSEA %<>% merge(wp_id_2_names,.)

##View the first few rows of the results. SC.adjP is the adjusted p-value for the self-contained null hypothesis (no gene in the set is associated with tumor vs normal status), while Comp.adjP is the adjusted p-value for the competitive null hypothesis (the proportion of associated genes in the set is no larger than the proportion across all genes)
res_rSEA %>%
  dplyr::slice(order(Comp.adjP)) %>%
  head()

##   set_id                                                            name  ID
## 1 WP1600                                             Nicotine Metabolism  38
## 2 WP2276                                      Glial Cell Differentiation  88
## 3 WP4030                   SCFA and skeletal muscle substrate metabolism 332
## 4  WP334                                    GPCRs, Class B Secretin-like 210
## 5 WP1991 SRF and miRs in Smooth Muscle Differentiation and Proliferation  60
## 6  WP206                                      Fatty Acid Omega Oxidation  76
##     Name Size Coverage TDP.bound TDP.estimate      SC.adjP    Comp.adjP
## 1 WP1600    6     0.17 1.0000000          1.0 3.395060e-24 3.395060e-24
## 2 WP2276    8     0.62 0.4000000          0.4 7.833862e-26 5.704479e-23
## 3 WP4030    6     0.33 0.5000000          0.5 7.990408e-20 7.990408e-20
## 4  WP334   24     0.17 0.5000000          0.5 3.065793e-21 1.609559e-18
## 5 WP1991   13     0.69 0.8888889          1.0 5.372582e-24 1.169009e-14
## 6  WP206   15     0.33 0.6000000          0.8 2.940210e-38 1.319281e-14

res_rSEA %>% dplyr::slice(order(Comp.adjP)) %>%
  write.csv(., "bladder_cancer_WikiPathways_rSEA.csv", row.names = FALSE)

Significance Analysis of Function and Expression (SAFE)

These analyses require as input the normalized gene expression matrix for all genes across all 38 samples. The significance of the association of a given gene set with the tumor vs normal comparison is estimated by permuting the sample (tumor or normal) labels within each subject.
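Permuting the tumor and normal labels within a patient is the same as flipping the sign of that patient's tumor-minus-normal difference, and applying the same flip to every gene preserves gene-gene correlation. The sketch below shows such a within-patient permutation test on simulated data for one made-up gene set, using the mean of gene-wise paired t-statistics as the set statistic. It is a simplified stand-in for illustration; SAFE uses its own local and global statistics, which we do not reproduce here.

```r
set.seed(1)
n_pat  <- 19   # patients, each contributing one tumor and one normal sample
n_gene <- 50   # genes in a made-up gene set

## Simulated log-expression differences (tumor minus normal): genes in rows, patients in columns
diffs <- matrix(rnorm(n_gene * n_pat, mean = 0.8), nrow = n_gene)

## Set statistic: mean of the gene-wise paired t-statistics
set_stat <- function(d) {
  t_gene <- rowMeans(d) / (apply(d, 1, sd) / sqrt(ncol(d)))
  mean(t_gene)
}
obs_stat <- set_stat(diffs)

## Swapping a patient's tumor/normal labels flips the sign of that patient's column;
## the same flip is applied to every gene in the set
B <- 1000
perm_stat <- replicate(B, {
  flips <- sample(c(-1, 1), n_pat, replace = TRUE)
  set_stat(sweep(diffs, 2, flips, "*"))
})

## Two-sided permutation p-value (the +1 avoids reporting exactly zero)
p_perm <- (1 + sum(abs(perm_stat) >= abs(obs_stat))) / (B + 1)
p_perm
```

With 1000 permutations the smallest attainable p-value is 1/1001, which is one reason the p-values in the sample-permutation results below are so coarse.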

##We will use the GSEABenchmarkeR package to run this analysis. Its runEA function requires as input a list of SummarizedExperiment objects whose rowData include the differential expression results

tcga.de <- readRDS("bladder_cancer_tcga_summarized_experiment_w_de_results.rds")

##Note that runEA takes the raw counts, normalizes them with the DESeq2 vst function, and passes the resulting variance-stabilized data to the SAFE method
##We will not run the analyses here because the 1000 permutations will take some time to complete
# res_safe_sample_perm <- runEA(tcga.de, method="safe", gs=wp_list, perm=1000)
# res_safe <- res_safe_sample_perm$safe[[1]]$ranking %>% as.data.frame()
# res_safe %<>% mutate(set_id=GENE.SET)
# res_safe %<>% merge(wp_id_2_names,.) %>% slice(order(PVAL))
# res_safe %>%
#   write.csv(., "bladder_cancer_WikiPathways_safe_sample_perm.csv", row.names = FALSE)

##let us just read-in the results
res_safe <- read.csv("bladder_cancer_WikiPathways_safe_sample_perm.csv", header = TRUE)
##View the first few rows of the results
head(res_safe)

##   set_id                                                            name
## 1 WP1991 SRF and miRs in Smooth Muscle Differentiation and Proliferation
## 2 WP2023                           Cell Differentiation - Index expanded
## 3 WP1602                       Nicotine Activity on Dopaminergic Neurons
## 4 WP2029                                    Cell Differentiation - Index
## 5 WP3996                        Ethanol effects on histone modifications
## 6  WP497                       Urea cycle and metabolism of amino groups
##   GENE.SET GLOB.STAT NGLOB.STAT  PVAL
## 1   WP1991     39300       4370 0.001
## 2   WP2023     43100       3920 0.001
## 3   WP1602     40000       3630 0.002
## 4   WP2029     25100       4180 0.007
## 5   WP3996     86700       3100 0.008
## 6    WP497     52400       3280 0.008

Pathway Analysis with Down-weighting of Overlapping Genes (PADOG)

tcga.de <- readRDS("bladder_cancer_tcga_summarized_experiment_w_de_results.rds")

##Note that runEA takes the raw counts, normalizes them with the DESeq2 vst function, and passes the resulting variance-stabilized data to the PADOG method
##We will not run the analyses here because the 1000 permutations will take some time to complete

# res_padog_sample_perm <- runEA(tcga.de, method="padog", gs=wp_list, perm=1000)
# res_padog <- res_padog_sample_perm$padog[[1]]$ranking %>% as.data.frame()
# res_padog %<>% mutate(set_id=GENE.SET)
# res_padog %<>% merge(wp_id_2_names,.) %>% slice(order(PVAL))
# res_padog %>%
#   write.csv(., "bladder_cancer_WikiPathways_padog_sample_perm.csv", row.names = FALSE)

##let us just read-in the results
res_padog <- read.csv("bladder_cancer_WikiPathways_padog_sample_perm.csv", header = TRUE)
##View the first few rows of the results
head(res_padog)

##   set_id                                                            name
## 1 WP2023                           Cell Differentiation - Index expanded
## 2 WP2029                                    Cell Differentiation - Index
## 3 WP1991 SRF and miRs in Smooth Muscle Differentiation and Proliferation
## 4 WP2355               Corticotropin-releasing hormone signaling pathway
## 5 WP2361                                        Gastric Cancer Network 1
## 6 WP4300        Extracellular vesicles in the crosstalk of cardiac cells
##   GENE.SET MEAN.ABS.T0 PADOG0 P.MEAN.ABS.T    PVAL
## 1   WP2023        4.65  3.970      0.00200 0.00001
## 2   WP2029        4.49  3.780      0.00001 0.00001
## 3   WP1991        5.77  4.980      0.00200 0.00100
## 4   WP2355        2.65  0.539      0.01600 0.00500
## 5   WP2361        5.83  5.940      0.02000 0.01200
## 6   WP4300        2.98  1.790      0.02500 0.01300
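PADOG scores a gene set by the average absolute gene-level t-statistic after down-weighting genes that appear in many gene sets, so that genes shared across pathways do not drive every pathway's score. The sketch below computes such weights for three toy gene sets with hypothetical gene IDs and made-up t-statistics. The weight formula, which gives 1 to the most widely shared gene and 2 to genes found in a single set, is our assumption modeled on the PADOG package and should be checked against its source. The two scores appear to correspond to the MEAN.ABS.T0 (unweighted) and PADOG0 (weighted) columns above.

```r
## Toy gene sets with hypothetical gene IDs; g1 is shared by all three sets
gene_sets <- list(
  setA = c("g1", "g2", "g3"),
  setB = c("g1", "g2", "g4"),
  setC = c("g1", "g5")
)

## f: number of gene sets each gene belongs to (named numeric vector)
f_tab <- table(unlist(gene_sets))
f <- setNames(as.numeric(f_tab), names(f_tab))

## Assumed PADOG-style weights: 1 for the most shared gene, 2 for genes in a single set
w <- 1 + sqrt((max(f) / f - 1) / (max(f) - 1))

## Made-up gene-level t-statistics
t_stat <- c(g1 = 5, g2 = -1, g3 = 2, g4 = 0.5, g5 = -3)

## Unweighted vs down-weighted score for setA
genes <- gene_sets$setA
mean_abs_t  <- mean(abs(t_stat[genes]))
padog_score <- mean(abs(t_stat[genes]) * w[genes])
c(mean_abs_t = mean_abs_t, padog_score = padog_score)
```

Here g1 has the largest |t| but appears in every set, so its contribution is weighted half as much as that of g3, which belongs to setA only.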

Gene Set Enrichment Analyses (GSEA) with sample permutation

tcga.de <- readRDS("bladder_cancer_tcga_summarized_experiment_w_de_results.rds")

##Note that runEA takes the raw counts, normalizes them with the DESeq2 vst function, and passes the resulting variance-stabilized data to the GSEA method
##We will not run the analyses here because the 1000 permutations will take some time to complete

# res_gsea_sample_perm <- runEA(tcga.de, method="gsea", gs=wp_list, perm=1000)
# res_gsea <- res_gsea_sample_perm$gsea[[1]]$ranking %>% as.data.frame()
# res_gsea %<>% mutate(set_id=GENE.SET)
# res_gsea %<>% merge(wp_id_2_names,.) %>% slice(order(PVAL))
# res_gsea %>%
#   write.csv(., "bladder_cancer_WikiPathways_gsea_sample_perm.csv", row.names = FALSE)

##let us just read-in the results
res_gsea <- read.csv("bladder_cancer_WikiPathways_gsea_sample_perm.csv", header = TRUE)
##View the first few rows of the results
head(res_gsea)

##   set_id
## 1  WP289
## 2 WP3414
## 3  WP706
## 4   WP98
## 5 WP3981
## 6  WP536
##                                                                      name
## 1                          Myometrial Relaxation and Contraction Pathways
## 2 Initiation of transcription and translation elongation at the HIV-1 LTR
## 3             Sudden Infant Death Syndrome (SIDS) Susceptibility Pathways
## 4                                  Prostaglandin Synthesis and Regulation
## 5                  miRNA regulation of prostate cancer signaling pathways
## 6                                  Calcium Regulation in the Cardiac Cell
##   GENE.SET     ES   NES    PVAL
## 1    WP289 -0.593 -1.73 0.00000
## 2   WP3414 -0.584 -1.75 0.00000
## 3    WP706 -0.521 -1.74 0.00000
## 4     WP98 -0.667 -1.65 0.00000
## 5   WP3981 -0.528 -1.76 0.00191
## 6    WP536 -0.561 -1.70 0.00192

Gene Set Enrichment Analyses (GSEA) with gene permutation

These analyses require as input a score for each gene. The larger the absolute value of a gene's score, the stronger the evidence that the gene's expression is associated with the tumor vs normal comparison. The significance of the association of a given gene set with the tumor vs normal comparison is estimated by permuting the gene labels.
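GSEA walks down the ranked gene list, stepping up at genes in the set (weighted by the absolute score) and down at genes outside it; the enrichment score (ES) is the largest deviation of this running sum from zero. With gene permutation, the observed ES is compared with the ES of random gene sets of the same size. The sketch below implements this in the style of Subramanian et al. (2005) with weight exponent 1, on simulated scores for 200 hypothetical genes. It is only an illustration: as far as we know, `clusterProfiler::GSEA` computes p-values through the fgsea package by default and reports normalized scores (NES), which this sketch does not.

```r
## Weighted running-sum enrichment score (weight exponent p = 1)
enrichment_score <- function(scores, gene_set, p = 1) {
  scores <- sort(scores, decreasing = TRUE)          # rank genes from most up to most down
  in_set <- names(scores) %in% gene_set
  hit_w  <- abs(scores)^p * in_set                   # hits weighted by |score|^p, misses get 0
  step   <- hit_w / sum(hit_w) - (!in_set) / sum(!in_set)
  running <- cumsum(step)                            # running-sum statistic along the ranking
  unname(running[which.max(abs(running))])           # signed maximum deviation from zero
}

## Simulated scores for 200 hypothetical genes; the first 15 are shifted upwards
set.seed(42)
scores <- setNames(rnorm(200), paste0("g", 1:200))
scores[1:15] <- scores[1:15] + 3
my_set <- paste0("g", 1:15)

obs_es <- enrichment_score(scores, my_set)

## Gene-label permutation: ES of random gene sets of the same size (one-sided, positive ES)
null_es <- replicate(1000, enrichment_score(scores, sample(names(scores), length(my_set))))
p_gene_perm <- (1 + sum(null_es >= obs_es)) / (1 + length(null_es))
c(ES = obs_es, p = p_gene_perm)
```

Because random gene sets ignore gene-gene correlation, gene permutation tends to give smaller p-values than sample permutation for sets of co-expressed genes, which is one reason the two GSEA variants in this document rank pathways differently.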

##generate a score for each gene equal in absolute value to -log10(pvalue), with the sign of the log2 fold change: positive for up-regulated genes and negative for down-regulated genes
gene_list <- diff.res %>%
  as.data.frame() %>%
  rownames_to_column('Gene') %>%
  mutate(Score = sign(log2FoldChange) * -log10(pvalue))

##named vector of scores sorted in decreasing order; sort() also drops genes with NA scores
gene_list <- sort(setNames(gene_list$Score, gene_list$Gene), decreasing = TRUE)

head(gene_list)

##     4320    54058    55083     7516     1301     6493 
## 31.86522 25.39709 25.11739 24.17312 23.57965 22.07034

tail(gene_list)

##      5348    286133      7123      1675    146556    221476 
## -49.63643 -51.47116 -52.89205 -57.51413 -67.19160 -86.68870

##run the gene perm version of gsea
res_gsea_gene_perm <- clusterProfiler::GSEA(
  gene_list,
  pAdjustMethod="BH",
  TERM2GENE = wp_annotation[,c("set_id","gene")],
  TERM2NAME = wp_annotation[,c("set_id","name")],
  minGSSize = 1,
  maxGSSize = 100000,
  pvalueCutoff = 1,
  verbose=FALSE)

res_gsea_gene_perm <- res_gsea_gene_perm@result

##view the first few rows of the results
head(res_gsea_gene_perm)

##            ID                                    Description setSize
## WP3888 WP3888                 VEGFA-VEGFR2 Signaling Pathway     403
## WP4172 WP4172                     PI3K-Akt Signaling Pathway     252
## WP3932 WP3932 Focal Adhesion-PI3K-Akt-mTOR-signaling pathway     243
## WP2882 WP2882                 Nuclear Receptors Meta-Pathway     222
## WP382   WP382                         MAPK Signaling Pathway     195
## WP306   WP306                                 Focal Adhesion     173
##        enrichmentScore       NES      pvalue   p.adjust    qvalues rank
## WP3888      -0.4627917 -1.492921 0.001041667 0.03638752 0.02744793 2506
## WP4172      -0.5049345 -1.588244 0.001088139 0.03638752 0.02744793 2187
## WP3932      -0.5152809 -1.616262 0.001094092 0.03638752 0.02744793 2187
## WP2882      -0.5290426 -1.648267 0.001116071 0.03638752 0.02744793 1655
## WP382       -0.5817051 -1.798170 0.001129944 0.03638752 0.02744793 1531
## WP306       -0.5635541 -1.724448 0.001144165 0.03638752 0.02744793 2687
##                          leading_edge
## WP3888 tags=29%, list=20%, signal=24%
## WP4172 tags=31%, list=18%, signal=26%
## WP3932 tags=29%, list=18%, signal=24%
## WP2882 tags=22%, list=13%, signal=19%
## WP382  tags=23%, list=12%, signal=21%
## WP306  tags=40%, list=22%, signal=32%
##        core_enrichment
## WP3888 6154/9209/2185/5867/301/468/6129/5970/2746/4673/2887/6595/1466/2022/6093/4641/23291/9444/3688/57758/9261/5784/4303/3142/9734/26058/6778/4855/5170/3791/8828/5578/6461/4773/10628/154796/4792/6648/4790/1432/25759/5803/2152/355/80031/3490/5587/8665/27289/5743/23189/4736/3725/51574/5908/152273/2549/6125/7414/5906/1397/2078/9252/4846/5563/11080/6886/6386/4772/22943/2308/56999/25/2534/6546/9365/781/3690/57326/154/1847/7220/4629/1839/9759/6401/5592/51309/1003/326624/6347/596/1465/1901/1827/32/10014/22899/114789/857/91624/84952/274/6722/4208/9079/1958/5579/7111/1960/9510/4929/57381/7148/10231/8013/81575/3164
## WP4172 5728/9180/3716/55012/5170/57521/7424/3791/5578/3672/4254/1299/10161/3566/1436/7057/4790/1435/3574/3910/2323/10681/1026/7099/6446/4170/55970/3717/1975/7450/54331/4846/5563/284/7010/1291/4609/1288/5156/10000/3570/3690/4915/3913/3678/3563/5521/3680/90993/9586/5516/3815/1292/3082/3479/5525/80310/2260/3679/1286/627/596/1902/894/2690/10319/2258/3908/1440/2252/4804/2247/5649/2788/2791/8516/3569/7148
## WP3932 5728/9180/3716/55012/5170/57521/7424/3791/4254/2034/10161/3566/8660/1436/7057/51719/23216/1435/3910/81617/1026/55970/3717/1975/7450/54331/4846/5563/2308/284/7010/1288/5156/10000/3570/3690/3913/3678/3563/5521/3680/6515/90993/9586/5516/3815/1292/3082/3479/5525/80310/2260/3679/1286/1902/2690/10319/2258/10891/3908/1440/2252/4804/64344/2247/5649/2788/6517/2791/8516/7148
## WP2882 4240/8824/4780/60482/6594/5743/3725/5465/11214/34/2040/2308/4609/8850/5552/2908/2099/80315/3726/5244/6515/4616/1028/1066/1839/5243/330/3082/10486/23764/89795/6347/2258/10891/2289/1831/2042/2949/7049/5142/1958/6649/7048/3727/6517/10252/2878/5997/5166
## WP382  355/1845/1844/4137/3725/5908/5602/55970/5906/9252/4772/775/7043/4609/10000/781/4915/6237/408/785/783/2260/1850/1326/5532/627/3306/9020/2316/8912/2258/2252/5533/6722/4208/120892/2247/10235/2353/8605/7048/2318/3727/1843/3164
## WP306  5159/2268/858/387/2889/7409/9475/7423/3915/7408/6093/3688/1793/5728/394/5170/7424/83660/3791/5578/5500/3672/2909/7057/60/4659/25759/3910/23396/54776/87/3725/5908/5602/7414/5906/7450/7791/7094/1288/2534/3611/55742/5156/10000/3690/3913/3678/3680/1292/330/3082/3479/80310/3679/1286/596/894/2316/10319/3908/857/4660/10398/4638/5649/2318/5579/8516/7148
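The `leading_edge` column packs three statistics into one string, while `core_enrichment` lists the Entrez IDs of the leading-edge genes. As a quick check on how to read them, the sketch below (not part of the original workflow) counts the IDs in `core_enrichment` for two of the printed pathways, WP2882 and WP382, with the values copied by hand from the output above, and compares the count with `tags`. The assumption being tested is that `tags` is the percentage of the gene set's genes that fall in the leading edge; the data frame `gsea_rows` is a hypothetical stand-in for the real results table.

```r
## Sketch (not part of the original workflow): relate the "tags" figure in
## leading_edge to the number of Entrez IDs listed in core_enrichment.
## Values are copied by hand from the head() output for WP2882 and WP382;
## gsea_rows is a hypothetical stand-in for the full results table.
gsea_rows <- data.frame(
  ID = c("WP2882", "WP382"),
  setSize = c(222, 195),
  leading_edge = c("tags=22%, list=13%, signal=19%",
                   "tags=23%, list=12%, signal=21%"),
  core_enrichment = c(
    paste0("4240/8824/4780/60482/6594/5743/3725/5465/11214/34/2040/2308/4609/8850/",
           "5552/2908/2099/80315/3726/5244/6515/4616/1028/1066/1839/5243/330/3082/",
           "10486/23764/89795/6347/2258/10891/2289/1831/2042/2949/7049/5142/1958/",
           "6649/7048/3727/6517/10252/2878/5997/5166"),
    paste0("355/1845/1844/4137/3725/5908/5602/55970/5906/9252/4772/775/7043/4609/",
           "10000/781/4915/6237/408/785/783/2260/1850/1326/5532/627/3306/9020/",
           "2316/8912/2258/2252/5533/6722/4208/120892/2247/10235/2353/8605/7048/",
           "2318/3727/1843/3164")
  ),
  stringsAsFactors = FALSE
)

## number of genes in the leading-edge subset (core_enrichment is "/"-separated)
core_size <- lengths(strsplit(gsea_rows$core_enrichment, "/", fixed = TRUE))

## percentage reported after "tags=" in the leading_edge string
tags_pct <- as.numeric(sub(".*tags=([0-9]+)%.*", "\\1", gsea_rows$leading_edge))

## leading-edge genes as a percentage of the gene set size
recomputed_pct <- round(100 * core_size / gsea_rows$setSize)

print(data.frame(ID = gsea_rows$ID, core_size, tags_pct, recomputed_pct))
```

For these two rows the counts (49 and 45 genes) reproduce the printed `tags` values (22% and 23%). The other two figures appear to follow the usual GSEA leading-edge definitions: `list` is the percentage of the ranked gene list lying beyond the extreme point of the running enrichment score (after its minimum here, since every ES is negative), and `signal` combines the two as tags x (1 - list) x N / (N - setSize), where N is the number of ranked genes.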

##save the full table of GSEA (gene permutation) results to a CSV file
res_gsea_gene_perm %>%
  as.data.frame() %>%
  write.csv("bladder_cancer_WikiPathways_gsea_gene_perm.csv", row.names = FALSE)
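All six p-values printed above are just over 0.001, and all six share the same adjusted p-value. One possible explanation: if the p-values come from the common add-one permutation estimator p = (b + 1)/(m + 1), where b is the number of permuted scores at least as extreme as the observed one and m is the size of the permutation reference set (an assumption; we have not checked exactly how this version of clusterProfiler computes them), then a gene set whose observed score was never matched by any permuted score gets exactly 1/(m + 1). The sketch below inverts the printed p-values to check whether they have that form; `pvals` holds values copied by hand from the output.

```r
## Sketch (not part of the original workflow): are the printed p-values at the
## smallest value a permutation test can report?
## p-values copied by hand from the head(res_gsea_gene_perm) output.
pvals <- c(WP3888 = 0.001041667, WP4172 = 0.001088139, WP3932 = 0.001094092,
           WP2882 = 0.001116071, WP382 = 0.001129944, WP306 = 0.001144165)

## Assumed estimator: p = (b + 1) / (m + 1). With b = 0 (no permuted score as
## extreme as the observed one) this reduces to p = 1 / (m + 1).
implied_m_plus_1 <- 1 / pvals

## TRUE where 1 / p is an integer up to the printed rounding, i.e. b = 0
at_floor <- abs(implied_m_plus_1 - round(implied_m_plus_1)) < 0.01

print(data.frame(pvalue = pvals, m_plus_1 = round(implied_m_plus_1), at_floor))
```

Each printed p-value inverts to an integer (960, 919, 914, 896, 885 and 874). That is what we would expect if no permuted score was as extreme as the observed one and the reference set differed slightly between gene sets, for example if only permutations whose enrichment score has the same sign as the observed one are counted (again an assumption about the implementation). If so, these p-values are limited by the number of permutations rather than by the strength of the signal, so their ordering says little about which of the six pathways is most strongly enriched; more permutations, or a method that estimates small p-values directly, would be needed to separate them.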

Statistics of Enrichment Analyses

Reuben Thomas

8/12/2021