finish up part 1 slides
1
intro-r-data-analysis/.Rprofile
Normal file
|
|
@ -0,0 +1 @@
|
|||
source("renv/activate.R")
|
||||
412
intro-r-data-analysis/Intro_to_R_data_analysis_part_1.Rmd
Normal file
|
|
@ -0,0 +1,412 @@
|
|||
---
|
||||
title: "Introduction to R Data Analysis - Part 1"
|
||||
author: "Natalie Elphick"
|
||||
date: "January 22nd"
|
||||
knit: (function(input, ...) {
|
||||
rmarkdown::render(
|
||||
input,
|
||||
output_dir = "../docs"
|
||||
)
|
||||
})
|
||||
output:
|
||||
revealjs::revealjs_presentation:
|
||||
theme: simple
|
||||
css: style.css
|
||||
---
|
||||
|
||||
```{r, setup, include=FALSE}
|
||||
library(tidyverse)
|
||||
```
|
||||
|
||||
##
|
||||
|
||||
<center>*Press the ? key for tips on navigating these slides*</center>
|
||||
|
||||
## Introductions
|
||||
|
||||
**Natalie Elphick**
|
||||
Bioinformatician I
|
||||
*Bioinformatics Core*
|
||||
|
||||
## Poll 1
|
||||
|
||||
**What is your level of experience with coding/data analysis?**
|
||||
|
||||
1. I am fluent in another data analysis programming language (Python, Matlab etc.)
|
||||
2. I am use Excel to do linear regression
|
||||
3. I know some R
|
||||
4. All of the above
|
||||
5. None of the above
|
||||
|
||||
## Part 1:
|
||||
|
||||
1. What is R and why should you use it?
|
||||
2. The RStudio interface
|
||||
3. File types
|
||||
4. Error messages
|
||||
5. Variables
|
||||
6. Types & data structures
|
||||
|
||||
*10 min break*
|
||||
|
||||
7. Math and logic operations
|
||||
8. Functions and libraries
|
||||
9. Reading data into R
|
||||
|
||||
# What is R?
|
||||
|
||||
## R
|
||||
|
||||
- An open source language developed for statistical computing by **R**oss Ihaka and **R**obert Gentleman
|
||||
- Inspired by the **S** language developed at Bell labs in 1976 to make interactive data analysis easier
|
||||
- The first official version was released in 2000
|
||||
|
||||
## Why use R for data analysis?
|
||||
|
||||
- R is and will always be free
|
||||
- Can easily implement any statistical analysis
|
||||
- Code serves as a record which enables reproducibility
|
||||
with minimal effort
|
||||
- As of March 2023, there were over 19,000 open source packages to extend its
|
||||
functionality
|
||||
- Highly customizable graphics ([ggplot2](https://ggplot2-book.org/))
|
||||
- Analysis reports ([knitr](https://cran.r-project.org/web/packages/knitr/index.html))
|
||||
- RNA-seq analysis ([DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html))
|
||||
|
||||
## How does it work?
|
||||
|
||||
<section class="shrink">
|
||||

|
||||
</section>
|
||||
|
||||
# RStudio
|
||||
|
||||
## RStudio
|
||||
- RStudio is an integrated development
|
||||
environment (IDE)
|
||||
- It makes R code easier to write by providing a
|
||||
feature rich graphical user interface (GUI)
|
||||
|
||||
<br>
|
||||
</br>
|
||||
|
||||
<section class="shrink">
|
||||

|
||||
</section>
|
||||
|
||||
|
||||
## Layout
|
||||
|
||||

|
||||
|
||||
|
||||
## File types
|
||||
|
||||
- **Rscript** files that end in `.R`
|
||||
- The most basic, a file that contains R code
|
||||
- **RMarkdown** files that end in `.Rmd`
|
||||
- Let's create a blank Rscript to see how they work, open RStudio and click:
|
||||
- File -\> New File -\> R Script
|
||||
|
||||
## R Markdown
|
||||
|
||||
- A file format combining `R` code with [Markdown](https://www.markdownguide.org/basic-syntax/) for text formatting.
|
||||
- Designed for creating reproducible research reports in various formats (HTML, PDF, Word).
|
||||
- Let's create an `Rmd` file in `RStudio` to explore the basics of how they work:
|
||||
- File -\> New File -\> R Markdown
|
||||
|
||||
## R Markdown Advanced Usage
|
||||
|
||||
- **Presentations:** Creating slides (like these) with [revealjs](https://github.com/rstudio/revealjs).
|
||||
- **Publications:** Authoring online books that combine narrative, code, and output with [bookdown](https://bookdown.org/).
|
||||
- **Interactive Documents:** Developing interactive tutorials or dashboards with [learnR](https://rstudio.github.io/learnr/) and other embedded applications.
|
||||
|
||||
# Variables
|
||||
|
||||
## Variable definition
|
||||
- Variables store information that is referenced and manipulated
|
||||
in a computer program
|
||||
- In contrast to the mathematical definition of a variable,
|
||||
variables in computer science are _mutable_
|
||||
- There are 3 ways to define variables in R, but one is preferred:
|
||||
```{r}
|
||||
x <- 1 # Preferred way
|
||||
x = 1
|
||||
1 -> x
|
||||
print(x)
|
||||
```
|
||||
|
||||
## Variable naming
|
||||
|
||||
- Variables names must start with a letter and can contain
|
||||
underscores and periods
|
||||
- It is best practice to use descriptive variable names and stick
|
||||
to one style of names
|
||||
|
||||
```{r}
|
||||
# Snake case
|
||||
dog_breeds <- c("Labrador Retriever","Akita", "Bulldog")
|
||||
|
||||
# Period separated
|
||||
dog.breeds <- c("Labrador Retriever","Akita", "Bulldog")
|
||||
|
||||
# Camel case
|
||||
DogBreeds <- c("Labrador Retriever","Akita", "Bulldog")
|
||||
```
|
||||
|
||||
## Poll 2
|
||||
|
||||
**Which variable name is not valid in R?**
|
||||
|
||||
1. cat_dog
|
||||
2. CatDOG
|
||||
3. cat.dog
|
||||
4. catD0g
|
||||
|
||||
|
||||
## Excercise 1
|
||||
- Open Rscript file part_1.R in Rstudio
|
||||
|
||||
# Data Types and Structures
|
||||
|
||||
## Data Types
|
||||
- Integer
|
||||
- Whole numbers
|
||||
- Numeric
|
||||
- Decimal numbers
|
||||
- Logical
|
||||
- Boolean (TRUE, FALSE)
|
||||
- NA
|
||||
- Character
|
||||
- Letters and strings of letters
|
||||
- "A", "Labrador Retriever"
|
||||
|
||||
## Data Structures
|
||||
- Vectors
|
||||
- Atomic vectors - one dimensional lists that store values of
|
||||
the **same type**
|
||||
- Lists - can be multidimensional and contain **different types/structures** (ex. nested lists)
|
||||
- Factors
|
||||
- Ordered list with assigned levels
|
||||
- Matrix
|
||||
- Columns and rows of the **same type**
|
||||
- Data frames
|
||||
- Columns and rows of **mixed types**
|
||||
|
||||
##
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
## Exercise 2: Data Types and Structures
|
||||
- Reopen Rscript file part_1.R in Rstudio
|
||||
|
||||
|
||||
# 10 min break
|
||||
|
||||
<center>
|
||||
|
||||
```{r, echo=FALSE}
|
||||
|
||||
countdown::countdown(minutes = 10,
|
||||
seconds = 0,
|
||||
color_border = "black",
|
||||
padding = "50px",
|
||||
margin = "5%",
|
||||
font_size = "5em",
|
||||
style = "position: relative; width: min-content;")
|
||||
```
|
||||
|
||||
</center>
|
||||
|
||||
# Math and Logic Operations
|
||||
|
||||
## Math & Logic
|
||||
|
||||
- Built in functions to get common mathematical summaries of data (eg. mean( ), median( ), mode( ) )
|
||||
- Relational comparison operators to compare values
|
||||
|
||||
```{r, eval=FALSE}
|
||||
x == y # Equal to
|
||||
x != y # Not equal to
|
||||
x < y # Less than
|
||||
x > y # Greater than
|
||||
x <= y # Less than or equal to
|
||||
x >= y # Greater than or equal to
|
||||
|
||||
x %in% y # Is x in this vector y?
|
||||
```
|
||||
|
||||
## Logical Operators
|
||||
- Logical operators can compare TRUE or FALSE values
|
||||
```{r, eval=FALSE}
|
||||
x <- TRUE
|
||||
y <- FALSE
|
||||
|
||||
!x # Not x
|
||||
x | y # x or y
|
||||
x & y # x and y
|
||||
|
||||
```
|
||||
|
||||
## Conditional execution
|
||||
|
||||
- Relational and logical operations allow for conditional
|
||||
execution of code
|
||||
|
||||
```{r}
|
||||
if ("Akita" %in% dog_breeds) {
|
||||
print("dog_breeds already contains Akita")
|
||||
} else {
|
||||
dog_breeds <- c("Akita", dog_breeds)
|
||||
}
|
||||
|
||||
```
|
||||
|
||||
# Functions
|
||||
|
||||
## Functions
|
||||
|
||||
- A function is block of organized, reusable code that is used to
|
||||
perform a single action
|
||||
- R has many built in functions, these are called **base R** functions
|
||||
- Not all arguments are required and some have default values
|
||||
|
||||

|
||||
|
||||
## Defining a function
|
||||
- To define a function we use the function keyword, the output is specified with the **return** keyword:
|
||||
```{r}
|
||||
add_dog <- function(dog_to_add,
|
||||
input_vector) {
|
||||
if (dog_to_add %in% input_vector) {
|
||||
|
||||
message("Already contains this dog")
|
||||
|
||||
} else {
|
||||
|
||||
output <- c(dog_to_add, input_vector)
|
||||
return(output)
|
||||
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Example
|
||||
```{r}
|
||||
add_dog(dog_to_add = "Akita",
|
||||
input_vector = dog_breeds)
|
||||
```
|
||||
|
||||
```{r}
|
||||
add_dog(dog_to_add = "German Shepard",
|
||||
input_vector = dog_breeds)
|
||||
```
|
||||
|
||||
# Packages
|
||||
|
||||
## Packages
|
||||
- Packages are collections of functions that are specialized to a specific task (plotting, data manipulation etc.)
|
||||
- The tidyverse is a collection of commonly used data analysis
|
||||
packages
|
||||
- Learning curve is less steep
|
||||
- Lots of useful packages for data analysis
|
||||
|
||||
##
|
||||
|
||||

|
||||
|
||||
# End of Part 1
|
||||
|
||||
## Workshop survey
|
||||
- Please fill out our [workshop survey](https://www.surveymonkey.com/r/F75J6VZ) so we can continue to improve these workshops
|
||||
|
||||
## Upcoming Workshops
|
||||
|
||||
1. [Introduction to Statistics, Experimental Design, and Hypothesis Testing](https://gladstone.org/index.php/events/introduction-statistics-experimental-design-and-hypothesis-testing-0)
|
||||
- Jan 25, 2024 (Session 1 - 10am–12pm) (Session 2 - 1pm–3pm)
|
||||
- Jan 26, 2024 (Session 3 - 10am–12pm)
|
||||
|
||||
2. [Intermediate RNA-Seq Analysis Using R](https://gladstone.org/index.php/events/intermediate-rna-seq-analysis-using-r-4)
|
||||
- Feb 1, 2024 (9:30am-12:00pm)
|
||||
|
||||
|
||||
# ChatGPT Tips for R
|
||||
|
||||
## General Tips
|
||||
|
||||
- Always confirm ChatGPT's outputs are correct
|
||||
- Provide as much detail as possible about the problem in the 1st prompt
|
||||
- Use separate chats for separate tasks/projects
|
||||
- Try the 'Custom Instructions' function that adds additional information to every prompt
|
||||
- Can visit webpages (GPT 4 only), which can help get more specific answers
|
||||
|
||||
## Code Tips
|
||||
|
||||
- Commented R code yields better responses in my experience
|
||||
- Provide the code and error message in the same prompt
|
||||
- ChatGPT can work well to convert syntax and improve your code:
|
||||
- "Turn this loop into a function : [your code]"
|
||||
- "Is there a better way to do this : [your code]"
|
||||
- Check out the file: `example_code/1_convert_syntax_example.R` for an example use case
|
||||
|
||||
# Finding R Packages
|
||||
|
||||
## Key Questions
|
||||
|
||||
- What assay was the package designed for?
|
||||
- When was the last release?
|
||||
- Is it maintained (frequent updates)?
|
||||
- Does it work on all operating systems?
|
||||
- Are other people using it? (citations)
|
||||
- Do they respond to github issues?
|
||||
- Is there a benchmarking paper?
|
||||
|
||||
## BioConductor and CRAN
|
||||
|
||||
- Both of these have stringent requirements for packages they host (eg. for BioConductor they have to run on all major operating systems)
|
||||
|
||||
- Prefer BioConductor packages if available over CRAN
|
||||
|
||||
- Prefer CRAN packages over ones only hosted on GitHub
|
||||
|
||||
## Start with the Assay
|
||||
|
||||
- Click [here](https://www.bioconductor.org/packages/release/BiocViews.html#___Sequencing) to go to BioC views
|
||||
- Pick the assay you want to analyse
|
||||
- Pick the type of analysis you want to do
|
||||
- Find a package that does it
|
||||
- Find benchmarking papers to narrow the list of packages down
|
||||
- Find the vignette on the package page and refer to the manual for any questions not covered by it
|
||||
|
||||
|
||||
# Additional Resources
|
||||
|
||||
## R
|
||||
|
||||
- [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/how-to-read-this-book.html) : Excellent R markdown reference
|
||||
|
||||
- [R for Data Science](https://r4ds.hadley.nz/)
|
||||
|
||||
- [ggplot2: elegant graphics for data analysis](https://ggplot2-book.org/)
|
||||
|
||||
- [Advanced R](https://adv-r.hadley.nz/)
|
||||
|
||||
## Statistics
|
||||
|
||||
- [Data Analysis in R](https://bookdown.org/steve_midway/DAR) : This book has more statistics details than *R for Data Science*
|
||||
- [Generalized Linear Models](https://bookdown.org/steve_midway/DAR/glms-generalized-linear-models.html)\
|
||||
- [Random Effects](https://bookdown.org/steve_midway/DAR/random-effects.html)
|
||||
|
||||
## RNA-seq Analysis
|
||||
|
||||
- [RNA-seqlopedia](https://rnaseq.uoregon.edu/) : Everything you need to know about RNA-seq experiments
|
||||
- [RNA-seq Expression Units](https://luisvalesilva.com/datasimple/rna-seq_units.html) : Blog post on understanding common units
|
||||
- [Introduction to Single-Cell Analysis with Bioconductor](https://bioconductor.org/books/3.17/OSCA.intro/index.html) : Covers the basics of scRNA-seq analysis in R
|
||||
|
||||
## Dimensional Reduction
|
||||
|
||||
- [Tutorial on PCA](https://uw.pressbooks.pub/appliedmultivariatestatistics/chapter/pca/) : PCA explained with R code examples
|
||||
- [Understanding UMAP](https://pair-code.github.io/understanding-umap/) : Short explanation with great visualizations, mainly useful for scRNA-seq analysis
|
||||
|
||||
|
|
@ -0,0 +1,118 @@
|
|||
# Converting loops to map with ChatGPT ------------------------------------
|
||||
# Author: Natalie Elphick
|
||||
|
||||
# ** Important: **
|
||||
# Before running this code, set your working directory to the same location as this file:
|
||||
# Click Session -> Set Working Directory -> To Source File Location
|
||||
|
||||
# Run the code line by line with Ctrl+Enter or Cmd+Enter on Mac
|
||||
|
||||
# First we will create a simulated example and then use ChatGPT to convert the
|
||||
# loop into tidyverse style code that uses map()
|
||||
|
||||
library(tidyverse)
|
||||
|
||||
# Gene names
|
||||
gene_names <- paste("Gene", 1:10)
|
||||
|
||||
# Simulate some data
|
||||
data <- data.frame(
|
||||
Gene = rep(gene_names, each = 20),
|
||||
Treatment = rep(rep(c("Treatment", "Control"), each = 10), times = length(gene_names)),
|
||||
Sample = rep(1:20, times = length(gene_names)),
|
||||
Log_CPM = c(
|
||||
rnorm(10, mean = 5, sd = 1),
|
||||
rnorm(10, mean = 6, sd = 1)
|
||||
) + rnorm(20 * length(gene_names), mean = 0, sd = 1) # Add sample-specific variation
|
||||
)
|
||||
|
||||
# Create a directory to save plots
|
||||
dir.create("gene_boxplots", showWarnings = FALSE)
|
||||
|
||||
# Iterate through gene names and create boxplots
|
||||
for (gene_name in gene_names) {
|
||||
# Subset data for the current gene
|
||||
gene_data <- data[data$Gene == gene_name, ]
|
||||
|
||||
# Create a boxplot comparing treatment vs. control for Log CPM using ggplot2
|
||||
plot_filename <- paste("gene_boxplots/", gene_name, "_boxplot.png", sep = "")
|
||||
p <- ggplot(data = gene_data, aes(x = Treatment, y = Log_CPM, fill = Treatment)) +
|
||||
geom_boxplot() +
|
||||
labs(
|
||||
title = paste("Log CPM for", gene_name),
|
||||
x = "Treatment",
|
||||
y = "Log CPM"
|
||||
) +
|
||||
scale_fill_manual(values = c("#009E73", "#D55E00")) +
|
||||
theme_minimal()
|
||||
ggsave(plot = p,plot_filename, width = 6, height = 4)
|
||||
|
||||
# Print the plot filename
|
||||
cat("Boxplot saved as:", plot_filename, "\n")
|
||||
}
|
||||
|
||||
|
||||
|
||||
# Converting the syntax ---------------------------------------------------
|
||||
|
||||
# Paste the following prompt into ChatGPT:
|
||||
# Convert the following code into tidyverse style code that uses purr::map() :
|
||||
# <paste in the code above>
|
||||
|
||||
# Put the result below and run it
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# Here is what I got ------------------------------------------------------
|
||||
|
||||
library(purrr) # *This part was incorrect, it should still be library(tidyverse)*
|
||||
|
||||
# Gene names
|
||||
gene_names <- paste("Gene", 1:10)
|
||||
|
||||
# Simulate some data
|
||||
data <- tibble(
|
||||
Gene = rep(gene_names, each = 20),
|
||||
Treatment = rep(rep(c("Treatment", "Control"), each = 10), times = length(gene_names)),
|
||||
Sample = rep(1:20, times = length(gene_names)),
|
||||
Log_CPM = c(
|
||||
rnorm(10, mean = 5, sd = 1),
|
||||
rnorm(10, mean = 6, sd = 1)
|
||||
) + rnorm(20 * length(gene_names), mean = 0, sd = 1) # Add sample-specific variation
|
||||
)
|
||||
|
||||
# Create a directory to save plots
|
||||
dir.create("gene_boxplots", showWarnings = FALSE)
|
||||
|
||||
# Function to create and save a boxplot for a given gene
|
||||
save_gene_plot <- function(gene_name) {
|
||||
gene_data <- data %>% filter(Gene == gene_name)
|
||||
plot_filename <- paste("gene_boxplots/", gene_name, "_boxplot.png", sep = "")
|
||||
p <- ggplot(data = gene_data, aes(x = Treatment, y = Log_CPM, fill = Treatment)) +
|
||||
geom_boxplot() +
|
||||
labs(
|
||||
title = paste("Log CPM for", gene_name),
|
||||
x = "Treatment",
|
||||
y = "Log CPM"
|
||||
) +
|
||||
scale_fill_manual(values = c("#009E73", "#D55E00")) +
|
||||
theme_minimal()
|
||||
ggsave(plot = p, filename = plot_filename, width = 6, height = 4)
|
||||
|
||||
# Print the plot filename
|
||||
cat("Boxplot saved as:", plot_filename, "\n")
|
||||
}
|
||||
|
||||
# Iterate through gene names and create boxplots using map
|
||||
map(gene_names, save_gene_plot)
|
||||
|
||||
|
||||
|
||||
|
||||
169
intro-r-data-analysis/Intro_to_R_workshop_materials/part_1.R
Executable file
|
|
@ -0,0 +1,169 @@
|
|||
# Exercise 1: Rscript Files -----------------------------------------------
|
||||
|
||||
# This is an Rscript file
|
||||
# - It contains lines of code that can be executed using Ctrl+Enter
|
||||
#
|
||||
# - Lines that begin with # are called comments and are ignored by the compiler,
|
||||
# these can be used to document what the code does
|
||||
#
|
||||
# - The whole file can be executed by pressing Source above
|
||||
# *This will error unless you comment out line 49*
|
||||
#
|
||||
# - Refer back to the slides at the end of each Exercise
|
||||
|
||||
# Use Ctrl+Enter to execute each of these lines
|
||||
|
||||
1 + 5 # Addition
|
||||
|
||||
70 - 23 # Subtraction
|
||||
|
||||
64 / 8 # Division
|
||||
|
||||
8 * 8 # Multiplication
|
||||
|
||||
sqrt(25) # Square root
|
||||
|
||||
5 ^ 2 # Exponent
|
||||
|
||||
pi # Built in variable for pi
|
||||
|
||||
# Rstudio's tab completion feature is very useful, type "sq" and press tab
|
||||
|
||||
|
||||
# **Check your understanding**
|
||||
# Calculate the area of a circle with radius = 5cm
|
||||
|
||||
|
||||
|
||||
# End of Exercise 1 -------------------------------------------------------
|
||||
|
||||
|
||||
# Exercise 2: Data Types and Structures -----------------------------------
|
||||
|
||||
##### 2.1 Data Types #####
|
||||
|
||||
# Different types and structures have can have different operations and
|
||||
# functions done to them. Use class() to check the type of a variable
|
||||
|
||||
# Integer
|
||||
x <- 4L
|
||||
class(x)
|
||||
|
||||
# Numeric
|
||||
y <- 1.5
|
||||
class(y)
|
||||
|
||||
# When possible R will coerce types, for example:
|
||||
class(x + y)
|
||||
|
||||
# It will not coerce all types though, this will throw an error
|
||||
1 + "2"
|
||||
|
||||
# You can change the type of a variable using "as" functions
|
||||
1 + as.numeric("2")
|
||||
|
||||
# What do you think the output of this line will be?
|
||||
4 + TRUE
|
||||
|
||||
# What about this line?
|
||||
class(c(TRUE, FALSE, TRUE, 4))
|
||||
|
||||
# NAs are a special case of logical, used most often to represent missing data
|
||||
class(NA)
|
||||
|
||||
# Math operations with NAs return NA
|
||||
2 + NA
|
||||
mean(c(1,2,3,NA))
|
||||
|
||||
##### 2.2 Data Structures #####
|
||||
|
||||
# Defining an atomic vector
|
||||
abc <- c("a", "b", "c")
|
||||
|
||||
# Accessing elements of a vector
|
||||
abc[2]
|
||||
|
||||
# Range of elements
|
||||
abc[1:2]
|
||||
|
||||
# Position of an element
|
||||
which(abc == "b")
|
||||
|
||||
# Easily specify a vector of numbers
|
||||
num <- 1:10
|
||||
num
|
||||
|
||||
# Defining an unnamed list
|
||||
abc <- list("a", "b", 3)
|
||||
|
||||
# Accessing elements of unnamed list
|
||||
abc[[1]]
|
||||
|
||||
# Defining a named list
|
||||
abc <- list(a = 1, b = "2", c = 3)
|
||||
|
||||
# Elements of a named list can be accessed 3 ways
|
||||
abc$b # Name with dollar sign
|
||||
abc[["b"]] # Name with brackets
|
||||
abc[[2]] # Numeric index with brackets
|
||||
|
||||
# Range of elements
|
||||
abc[2:3]
|
||||
|
||||
|
||||
# Defining a matrix
|
||||
numeric_matrix <- matrix(1:9, nrow = 3, ncol = 3)
|
||||
numeric_matrix
|
||||
|
||||
# Elements are accessed using row,column indexes
|
||||
numeric_matrix[2, 3]
|
||||
numeric_matrix[2, ] # Whole 2nd row
|
||||
numeric_matrix[1:2, ] # First 2 rows
|
||||
|
||||
# Rows and columns of a matrix can have names
|
||||
numeric_matrix <- matrix(data = 1:9,
|
||||
nrow = 3,
|
||||
dimnames = list(
|
||||
c("X", "Y", "Z"),
|
||||
c("A", "B", "C")
|
||||
)
|
||||
)
|
||||
numeric_matrix
|
||||
|
||||
# Elements can be accessed both by numeric index and column and row names
|
||||
numeric_matrix[2, 3]
|
||||
numeric_matrix["Y", "C"]
|
||||
numeric_matrix[1:2, "C"]
|
||||
|
||||
# Defining a data frame
|
||||
mixed_df <- data.frame(
|
||||
numbers = 1:3,
|
||||
letters = c("A", "B", "C"),
|
||||
bool = c(TRUE, FALSE, NA)
|
||||
)
|
||||
mixed_df
|
||||
|
||||
# Accessing elements of a data frame
|
||||
|
||||
mixed_df$numbers # Returns the vector
|
||||
mixed_df[["numbers"]]
|
||||
|
||||
mixed_df["letters"] # Returns the column as a single column data frame
|
||||
|
||||
# Count rows and columns in data frames or matrices
|
||||
nrow(mixed_df)
|
||||
ncol(mixed_df)
|
||||
|
||||
# Similar to matrices, elements in a data frame can also be accessed by
|
||||
# combinations of numeric index and column and row names.
|
||||
|
||||
# **Check your understanding**
|
||||
# Write 1 line of code that accesses only elements B, C, FALSE and NA of
|
||||
# mixed_df
|
||||
#
|
||||
# Hint: line 134
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
281
intro-r-data-analysis/Intro_to_R_workshop_materials/part_2.Rmd
Executable file
|
|
@ -0,0 +1,281 @@
|
|||
---
|
||||
title: "Intro to R Data Analysis: Part 2"
|
||||
output: html_document # knitr report document type
|
||||
date: "`r Sys.Date()`" # This will update the date everytime you knit the doc
|
||||
---
|
||||
|
||||
```{r setup, include=FALSE}
|
||||
knitr::opts_chunk$set(echo = TRUE)
|
||||
|
||||
# Load packages
|
||||
library(dplyr) # tidyverse data frame manipulation package
|
||||
library(tidyr) # functions to help clean data
|
||||
library(magrittr) # this package provides the pipe operator %>%
|
||||
library(readxl) # read excel files
|
||||
library(ggplot2) # highly customizable plots
|
||||
```
|
||||
|
||||
## R Markdown
|
||||
|
||||
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>. Guide to markdown syntax <https://www.markdownguide.org/basic-syntax/>.
|
||||
|
||||
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
|
||||
|
||||
```{r}
|
||||
# Simulates 100 observations from a normal distribution
|
||||
# and plots a histogram
|
||||
val <- rnorm(n = 100)
|
||||
hist(val, breaks = 20)
|
||||
```
|
||||
|
||||
```{r}
|
||||
# The ggplot version of the same plot
|
||||
|
||||
ggplot(data = tibble(values = val),
|
||||
mapping = aes(x = values))+
|
||||
geom_histogram(bins = 20)
|
||||
|
||||
```
|
||||
|
||||
|
||||
|
||||
**Important: before running the code below, click Session -> Set Working Directory -> To Source File Location**
|
||||
|
||||
|
||||
## Exercise 3: Reading in Data
|
||||
|
||||
The data we will be analyzing is from the PanTHERIA database which is "a global species-level data set of key life-history, ecological and geographical traits of all known extant and recently extinct mammals (PanTHERIA) developed for a number of macroecological and macroevolutionary research projects."
|
||||
|
||||
```{r}
|
||||
# The data is spread across 3 sheets in an excel file. We need to
|
||||
# combine these data into one table/data frame.
|
||||
|
||||
# na = "NA" tells read_xlsx how missing values appear in the data
|
||||
# the default is empty cells. Run "?read_xlsx" for more info
|
||||
|
||||
sheet1 <- read_xlsx(path = "PanTHERIA.xlsx", sheet = 1, na = "NA")
|
||||
sheet2 <- read_xlsx(path = "PanTHERIA.xlsx", sheet = 2, na = "NA")
|
||||
sheet3 <- read_xlsx(path = "PanTHERIA.xlsx", sheet = 3, na = "NA")
|
||||
|
||||
|
||||
colnames(sheet1) == colnames(sheet2) & colnames(sheet2) == colnames(sheet3)
|
||||
|
||||
# rbind (row-bind) combines data frames by row
|
||||
|
||||
pantheria <- rbind(sheet1, sheet2, sheet3)
|
||||
```
|
||||
|
||||
|
||||
```{r}
|
||||
# How many rows and columns are there?
|
||||
nrow(pantheria)
|
||||
ncol(pantheria)
|
||||
```
|
||||
|
||||
|
||||
```{r}
|
||||
# What does the data look like?
|
||||
head(pantheria)
|
||||
```
|
||||
|
||||
|
||||
## Exercise 4: Filtering and Reformatting Data
|
||||
|
||||
We will exploring adult body mass from these mammals as it relates to their trophic level using `dpylr` and `ggplot2`. Download the cheatsheets for these packages at the following links:
|
||||
|
||||
* [dplyr cheatsheet](https://posit.co/wp-content/uploads/2022/10/data-transformation-1.pdf)
|
||||
* [ggplot2 cheatsheet](https://posit.co/wp-content/uploads/2022/10/data-visualization-1.pdf)
|
||||
|
||||
Let's start by subsetting the data with `select()`
|
||||
```{r}
|
||||
# Pipes (%>%) work by passing the data in front of the pipe to the first argument
|
||||
# of the function after it, this prevents a lot of nested function calls and makes
|
||||
# code easier to read.
|
||||
|
||||
pantheria <- pantheria %>%
|
||||
select(Order,
|
||||
Family,
|
||||
Genus,
|
||||
Species,
|
||||
TrophicLevel,
|
||||
AdultBodyMass_g) %>%
|
||||
drop_na() %>%
|
||||
distinct()
|
||||
|
||||
pantheria <- select(pantheria,
|
||||
Order,
|
||||
Family,
|
||||
Genus,
|
||||
Species,
|
||||
TrophicLevel,
|
||||
AdultBodyMass_g)
|
||||
pantheria <- drop_na(pantheria)
|
||||
pantheria <- distinct(pantheria)
|
||||
|
||||
|
||||
|
||||
|
||||
```
|
||||
|
||||
Data is almost never clean, for example there should be only 3 trophic levels:
|
||||
```{r}
|
||||
# unique elements of a vector
|
||||
unique(pantheria$TrophicLevel)
|
||||
```
|
||||
|
||||
Let's fix the TrophicLevel column using `mutate()`
|
||||
|
||||
```{r}
|
||||
# mutate allows us to add columns or modify existing ones
|
||||
pantheria <- mutate(pantheria,TrophicLevel = tolower(TrophicLevel))
|
||||
|
||||
pantheria %>%
|
||||
mutate(AdultBodyMass_g = gsub("\\.", "-",AdultBodyMass_g))
|
||||
```
|
||||
|
||||
## Exercise 5: Summarizing data
|
||||
|
||||
Now we can summarize the adult body mass by trophic level by computing standard metrics like mean and standard deviation.
|
||||
|
||||
```{r}
|
||||
pantheria %>%
|
||||
group_by(TrophicLevel) %>%
|
||||
summarize(Mean = mean(AdultBodyMass_g),
|
||||
Median = median(AdultBodyMass_g),
|
||||
`Standard Deviation` = sd(AdultBodyMass_g),
|
||||
Min = min(AdultBodyMass_g),
|
||||
Max = max(AdultBodyMass_g))
|
||||
```
|
||||
|
||||
|
||||
## Exercise 6: Plotting Data
|
||||
|
||||
According to the table above, body masses have a really wide range across trophic levels. Let's visualize the distribution of adult body masses.
|
||||
|
||||
```{r}
|
||||
# ggplot2 constructs graphics in layers, each layer is separated by "+"
|
||||
# x and y values are supplied in aes(), the type of plot is specified using
|
||||
# the "geom" functions
|
||||
ggplot(data = pantheria, mapping = aes(x = log10(AdultBodyMass_g))) +
|
||||
geom_histogram(bins = 50, fill ="#CE3274") +
|
||||
xlab(label = "log10 Adult Body Mass") +
|
||||
ylab(label = "Frequency") +
|
||||
ggtitle(label = "Histogram of Adult Body Mass") +
|
||||
theme_classic() +
|
||||
theme(plot.title = element_text(hjust = .5, face = "bold"),
|
||||
axis.title.x = element_text(face = "bold"))
|
||||
|
||||
```
|
||||
|
||||
The data looks skewed even after log10 transformation. Let's view the distribution by trophic level.
|
||||
|
||||
```{r}
|
||||
pantheria %>%
|
||||
ggplot(mapping = aes(x = log10(AdultBodyMass_g),fill = TrophicLevel)) +
|
||||
geom_histogram(bins = 50) +
|
||||
facet_grid(rows = vars(TrophicLevel)) +
|
||||
xlab(label = "log10 Adult Body Mass") +
|
||||
ylab(label = "Frequency") +
|
||||
ggtitle(label = "Histogram of Adult Body Mass") +
|
||||
theme_classic() +
|
||||
theme(plot.title = element_text(hjust = .5))
|
||||
```
|
||||
|
||||
It is clear that trophic level does have an impact on the distribution of adult body mass, carnivores tend to be smaller which makes sense because carnivores have higher metabolic demands and so there might be a selection pressure towards smaller carnivores. If we wanted to confirm this by fitting a model, we could use the `lm()` function to fit a linear model.
|
||||
|
||||
## Exercise 7.1: Hands on coding
|
||||
|
||||
An important caveat of the data is that some Orders of mammals are more biodiverse than others and are therefore over represented in the dataset. Using the `dplyr` cheatsheet, write code generates a table of Orders and what percentage of the data they are. Scroll down to see the hint if you are having trouble.
|
||||
|
||||
```{r}
|
||||
# Your code
|
||||
pantheria %>%
|
||||
group_by(Order) %>%
|
||||
summarise(n = (n()/nrow(pantheria)) * 100) %>%
|
||||
arrange(desc(n))
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
*Exercise 7.1 hint: group_by + summarize(n = n()) + arrange*
|
||||
|
||||
## Exercise 7.2: Hands on coding
|
||||
|
||||
Now that we see what the over represented Orders are, we can plot their body masses by trophic level to see if they are skewing the overall distributions.
|
||||
|
||||
|
||||
```{r}
|
||||
|
||||
# Character vector of the top 2 Orders from above
|
||||
top_orders <- c("Rodentia", "Chiroptera")
|
||||
# filter uses a conditional to select rows from the data
|
||||
|
||||
pantheria %>%
|
||||
filter(Order %in% top_orders) %>%
|
||||
ggplot(mapping = aes(x = log10(AdultBodyMass_g), fill = Order)) +
|
||||
geom_histogram(bins = ) +
|
||||
facet_grid(rows = vars(TrophicLevel),
|
||||
cols = vars(Order))
|
||||
|
||||
```
|
||||
|
||||
It looks like one of them is mostly made up of small carnivores. Let's remove it and redo the plot of body mass distribution by trophic level.
|
||||
|
||||
```{r}
|
||||
pantheria %>%
|
||||
filter(Order != "Chiroptera") %>%
|
||||
ggplot(mapping = aes(x = log10(AdultBodyMass_g),fill = TrophicLevel)) +
|
||||
geom_histogram(bins = 40 ) +
|
||||
facet_grid(rows = vars(TrophicLevel))
|
||||
|
||||
pantheria %>%
|
||||
ggplot(mapping = aes(x = log10(AdultBodyMass_g),fill = TrophicLevel))
|
||||
```
|
||||
|
||||
We can see now that body mass of carnivorous mammals is much less skewed than the initial plots show. There is still an effect of trophic level on body mass, but the effect size is likely much smaller than we would estimate by including all `r sum(pantheria$Order == "Chiroptera")` *Chiroptera*. Now that we have generated these plots, we can generate a full report that contains all of the text and code, click `Knit` to render the HTML report.
|
||||
|
||||
## End of workshop exercises
|
||||
|
||||
Hopefully this workshop has provided a good foundation for you to learn R. If you would like some additional practice, check out the resources on the [workshop wiki](https://github.com/gladstone-institutes/Bioinformatics-Workshops/wiki/Introduction-to-R-for-Data-Analysis). R also contains many built in datasets you can use for practice:
|
||||
```{r, echo=FALSE}
|
||||
available_datasets <- data()
|
||||
available_datasets$results %>%
|
||||
as_tibble() %>%
|
||||
select(-LibPath) %>% knitr::kable()
|
||||
```
|
||||
|
||||
|
||||
|
||||
BIN
intro-r-data-analysis/assets/RStudio_layout.png
Normal file
|
After Width: | Height: | Size: 1.8 MiB |
BIN
intro-r-data-analysis/assets/R_and_RStudio.png
Normal file
|
After Width: | Height: | Size: 81 KiB |
BIN
intro-r-data-analysis/assets/R_lang_hierarchy.png
Normal file
|
After Width: | Height: | Size: 281 KiB |
BIN
intro-r-data-analysis/assets/data structures.png
Normal file
|
After Width: | Height: | Size: 111 KiB |
BIN
intro-r-data-analysis/assets/functions.png
Normal file
|
After Width: | Height: | Size: 737 KiB |
BIN
intro-r-data-analysis/assets/tidyverse.png
Normal file
|
After Width: | Height: | Size: 223 KiB |
13
intro-r-data-analysis/intro-r-data-analysis.Rproj
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
Version: 1.0
|
||||
|
||||
RestoreWorkspace: Default
|
||||
SaveWorkspace: Default
|
||||
AlwaysSaveHistory: Default
|
||||
|
||||
EnableCodeIndexing: Yes
|
||||
UseSpacesForTab: Yes
|
||||
NumSpacesForTab: 2
|
||||
Encoding: UTF-8
|
||||
|
||||
RnwWeave: Sweave
|
||||
LaTeX: pdfLaTeX
|
||||
1
intro-r-data-analysis/lesson_0/.gitignore
vendored
Normal file
|
|
@ -0,0 +1 @@
|
|||
*.html
|
||||
BIN
intro-r-data-analysis/lesson_0/images/ggplot_output.png
Normal file
|
After Width: | Height: | Size: 41 KiB |
BIN
intro-r-data-analysis/lesson_0/images/rstudio_screenshot.png
Normal file
|
After Width: | Height: | Size: 38 KiB |
86
intro-r-data-analysis/lesson_0/lesson_0.Rmd
Normal file
|
|
@ -0,0 +1,86 @@
|
|||
---
|
||||
title: "Pre-Workshop Setup"
|
||||
tutorial:
|
||||
id: "intro-r-data-analysis_lesson_0"
|
||||
version: 1.0
|
||||
output:
|
||||
learnr::tutorial:
|
||||
theme: "lumen"
|
||||
progressive: true
|
||||
allow_skip: true
|
||||
runtime: shiny_prerendered
|
||||
description: >
|
||||
Learn how to set up R and RStudio on your machine. We will also demonstrate
|
||||
how to install R packages from CRAN, and install the tidyverse package.
|
||||
---
|
||||
|
||||
```{r setup, include=FALSE}
|
||||
library(learnr)
|
||||
learnr::tutorial_options(exercise.timelimit = 10)
|
||||
```
|
||||
|
||||
## Introduction
|
||||
|
||||
This guide will help you get set up for <ins>Intro to R Data Analysis</ins>. There are just a few steps to make sure you'll have the necessary software installed and ready to go on day 1. **Please ensure that you've completed each step by running the validation test prior to the start of the workshop**.
|
||||
|
||||
This guide will walk you through how to install R, RStudio, and some additional tools that we’ll be using in the course. By rough analogy to a car, R is like the car’s engine and RStudio is like the dashboard. More precisely, R is a programming language and Rstudio is an ‘integrated development environment’ (IDE), which is basically a nice software interface for interacting with R. For our purposes, you will only ever interact directly with RStudio, but it needs to have R installed to work (like a car needing its engine).
|
||||
|
||||
Please complete the following steps (must be done in this order). If you already have R and Rstudio installed you can skip ahead. Make sure you complete step 5 though!
|
||||
|
||||
1. [Install R](#install-r)
|
||||
2. [Install RStudio](#install-rstudio)
|
||||
3. [Check you have recent versions of R](#check-you-have-a-recent-version-of-r)
|
||||
4. [Install required packages](#install-required-packages)
|
||||
5. [Run verification test](#run-verification-test)
|
||||
|
||||
Please consult the links provided for additional tips, and feel free to reach out for help by email [me](mailto:natalie.elphick@gladstone.ucsf.edu) if you get stuck.
|
||||
|
||||
## Install R
|
||||
|
||||
Please watch this quick video guide on how to install R, and then use the link below.
|
||||
|
||||
**NOTE MacOS users: With the new MacOS updates, updating R might require you to re-install your packages. While not in issue for many people, you have been warned**
|
||||

|
||||
https://cloud.r-project.org/
|
||||
|
||||
## Install RStudio
|
||||
Please watch this quick video on how to install Rstudio, and then use the link below.
|
||||

|
||||
https://www.rstudio.com/products/rstudio/download/
|
||||
|
||||
|
||||
## Install Required Packages
|
||||
Many of the tools we will want to use do not come prepackaged with R, but rather need to be installed as ‘packages’. There are a few key packages we will be using. Watch the following video on how to install packages in Rstudio:
|
||||

|
||||
You can also refer to [this site](https://moderndive.netlify.app/1-getting-started.html#packages) for more info on what packages are and how to install them (including a ‘GUI’ installation method if you prefer that).
|
||||
Please install the [`tidyverse`](https://www.tidyverse.org/) R package, which we’ll be relying on extensively throughout the course. No need to worry about what exactly this is yet, but you can read more [here](https://www.tidyverse.org/) if you like.
|
||||
|
||||
## Run verification test
|
||||
Now it’s time to make sure you have everything installed properly!
|
||||
First, open Rstudio (remember, you want to open Rstudio, not R). You should see a window that looks something like this:
|
||||
|
||||
<center>
|
||||
<img src="images/rstudio_screenshot.png" width="40%">
|
||||
</center>
|
||||
|
||||
The pane with the ‘>’ symbol is the Console. This is where you enter R commands.
|
||||
Copy and paste the following code into your Rstudio console and hit return.
|
||||
|
||||
```{r, eval=F}
|
||||
R_version <- as.numeric(R.version['major']$major)
|
||||
if (R_version >= 4) {
|
||||
library(ggplot2)
|
||||
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
|
||||
geom_point()
|
||||
} else {
|
||||
print('R version is too old')
|
||||
}
|
||||
```
|
||||
You should see a plot that looks like this appear:
|
||||
|
||||
<center>
|
||||
<img src="images/ggplot_output.png" width="40%">
|
||||
</center>
|
||||
|
||||
|
||||
If you see an error that says “R version is too old” that means you need to update your R version. The update process is the same as the installation process. It will update your R installation. If you see an error that says “There is no package called ggplot2” that means you need to install the tidyverse package (see above).
|
||||
|
|
@ -0,0 +1,10 @@
|
|||
name: lesson_0
|
||||
title:
|
||||
username: natalie-elphick
|
||||
account: natalie-elphick
|
||||
server: shinyapps.io
|
||||
hostUrl: https://api.shinyapps.io/v1
|
||||
appId: 10952169
|
||||
bundleId: 8154915
|
||||
url: https://natalie-elphick.shinyapps.io/lesson_0/
|
||||
version: 1
|
||||
1805
intro-r-data-analysis/renv.lock
Normal file
7
intro-r-data-analysis/renv/.gitignore
vendored
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
library/
|
||||
local/
|
||||
cellar/
|
||||
lock/
|
||||
python/
|
||||
sandbox/
|
||||
staging/
|
||||
1180
intro-r-data-analysis/renv/activate.R
Normal file
19
intro-r-data-analysis/renv/settings.json
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
{
|
||||
"bioconductor.version": null,
|
||||
"external.libraries": [],
|
||||
"ignored.packages": [],
|
||||
"package.dependency.fields": [
|
||||
"Imports",
|
||||
"Depends",
|
||||
"LinkingTo"
|
||||
],
|
||||
"ppm.enabled": null,
|
||||
"ppm.ignored.urls": [],
|
||||
"r.version": null,
|
||||
"snapshot.type": "implicit",
|
||||
"use.cache": true,
|
||||
"vcs.ignore.cellar": true,
|
||||
"vcs.ignore.library": true,
|
||||
"vcs.ignore.local": true,
|
||||
"vcs.manage.ignores": true
|
||||
}
|
||||
|
|
@ -1,34 +0,0 @@
|
|||
#This script perform some basic calculations in R.
|
||||
#To run this script you may select all and hit the Run button on top right of this pane ...
|
||||
#... or go Cmd+A followed by Cmd+Enter on Mac. If you use Windows, ...
|
||||
#... you can also go Ctrl+A followed by Ctrl+Enter.
|
||||
|
||||
#The following command will add 2 to 3 and store the value in variable named 'a'.
|
||||
a <- sum(2,3)
|
||||
|
||||
#The following command will get the product of 2 and 3 and store it in 'b'.
|
||||
b <- prod(2,3)
|
||||
|
||||
#Next, we check if a and b have equal values.
|
||||
a == b
|
||||
|
||||
#Next, we check if a and b are not equal.
|
||||
a != b
|
||||
#Conclusion: Summing numbers is not the same as multiplying them!
|
||||
#Time to write a paper? I think we can go to Nature or Science with this discovery.
|
||||
|
||||
|
||||
#Check if two conditions are simultaneously true.
|
||||
(a == b) & (sqrt(3) == 5)
|
||||
|
||||
#Check if any one of given two conditions are true.
|
||||
#Vertical line is how we say 'or' in R.
|
||||
(a == b) | (sqrt(3) == 5)
|
||||
|
||||
#Variables can also store characters.
|
||||
name <- "Homo sapiens"
|
||||
|
||||
#Is Homo sapiens equal to human being?
|
||||
name == "Human being"
|
||||
|
||||
#Apparently not?
|
||||
|
|
@ -1,50 +0,0 @@
|
|||
#Vectors.
|
||||
#This is a numeric vector.
|
||||
a <- c(3, 5, 2.5, 0, 9)
|
||||
mean(a)
|
||||
|
||||
#Vectors (as well as other data objects) may contain NA.
|
||||
#NA stands for Not Available.
|
||||
a <- c(3, 5, 6.8, NA)
|
||||
mean(a)
|
||||
|
||||
#To ignore NA while taking mean.
|
||||
mean(a, na.rm = TRUE)
|
||||
|
||||
#Vectors are ordered sequences.
|
||||
#This is a character vector.
|
||||
b <- c("Homo sapiens", "Martians", "Blue.Whales", "Homo sapiens")
|
||||
table(b)
|
||||
|
||||
#----------
|
||||
#Data frame
|
||||
#Tabular data possibly with mix of numeric, character, factor or logical type entries.
|
||||
#Example: Iris setosa data that we worked with.
|
||||
df <- data.frame(a = 1:4,
|
||||
name = b)
|
||||
|
||||
#---------
|
||||
#Matrix
|
||||
#Matrices are tabular like data frames but store only one type of entries.
|
||||
df_matrix <- as.matrix(df)
|
||||
|
||||
#May convert between data structures.
|
||||
df_numeric <- as.numeric(df_matrix)
|
||||
df_character <- as.character(df_numeric)
|
||||
|
||||
#----------
|
||||
#List
|
||||
#Lists are flexible data structures.
|
||||
#Lists have named fields which can contain arbitrary data—vectors, other lists, strings, functions, and anything else.
|
||||
#We may wish to keep related data together in one place.
|
||||
#Say you search for a sequence in genome using an R library.
|
||||
#Your ideal output might have mixed entries. For example,
|
||||
#if sequence is found on a chromosome, you want table containing start and end loci on the
|
||||
#chromosome and matched length.
|
||||
#If not found, you want a string that says "Not found."
|
||||
result <- list(chr1 = data.frame(Start = c(5, 100, 200),
|
||||
End = c(70, 150, 230),
|
||||
Length = c(66, 50, 30)
|
||||
),
|
||||
chr2 = "Not found."
|
||||
)
|
||||
|
|
@ -1,24 +0,0 @@
|
|||
>J01859.1 Escherichia coli 16S ribosomal RNA, complete sequence
|
||||
AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGT
|
||||
AACAGGAAGAAGCTTGCTCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATG
|
||||
GAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCG
|
||||
GGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACG
|
||||
ATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGG
|
||||
CAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTT
|
||||
CGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCG
|
||||
CAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAAT
|
||||
TACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAAC
|
||||
TGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGT
|
||||
AGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCG
|
||||
TGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCC
|
||||
TTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACT
|
||||
CAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCT
|
||||
TACCTGGTCTTGACATCCACGGAAGTTTTCAGAGATGAGAATGTGCCTTCGGGAACCGTGAGACAGGTGC
|
||||
TGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCT
|
||||
TTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGA
|
||||
CGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGA
|
||||
CCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATG
|
||||
AAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCG
|
||||
CCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTT
|
||||
TGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTT
|
||||
A
|
||||
|
|
@ -1,151 +0,0 @@
|
|||
"Sepal.Length";"Sepal.Width";"Petal.Length";"Petal.Width";"Species"
|
||||
"1";5.1;3.5;1.4;0.2;"setosa"
|
||||
"2";4.9;3;1.4;0.2;"setosa"
|
||||
"3";4.7;3.2;1.3;0.2;"setosa"
|
||||
"4";4.6;3.1;1.5;0.2;"setosa"
|
||||
"5";5;3.6;1.4;0.2;"setosa"
|
||||
"6";5.4;3.9;1.7;0.4;"setosa"
|
||||
"7";4.6;3.4;1.4;0.3;"setosa"
|
||||
"8";5;3.4;1.5;0.2;"setosa"
|
||||
"9";4.4;2.9;1.4;0.2;"setosa"
|
||||
"10";4.9;3.1;1.5;0.1;"setosa"
|
||||
"11";5.4;3.7;1.5;0.2;"setosa"
|
||||
"12";4.8;3.4;1.6;0.2;"setosa"
|
||||
"13";4.8;3;1.4;0.1;"setosa"
|
||||
"14";4.3;3;1.1;0.1;"setosa"
|
||||
"15";5.8;4;1.2;0.2;"setosa"
|
||||
"16";5.7;4.4;1.5;0.4;"setosa"
|
||||
"17";5.4;3.9;1.3;0.4;"setosa"
|
||||
"18";5.1;3.5;1.4;0.3;"setosa"
|
||||
"19";5.7;3.8;1.7;0.3;"setosa"
|
||||
"20";5.1;3.8;1.5;0.3;"setosa"
|
||||
"21";5.4;3.4;1.7;0.2;"setosa"
|
||||
"22";5.1;3.7;1.5;0.4;"setosa"
|
||||
"23";4.6;3.6;1;0.2;"setosa"
|
||||
"24";5.1;3.3;1.7;0.5;"setosa"
|
||||
"25";4.8;3.4;1.9;0.2;"setosa"
|
||||
"26";5;3;1.6;0.2;"setosa"
|
||||
"27";5;3.4;1.6;0.4;"setosa"
|
||||
"28";5.2;3.5;1.5;0.2;"setosa"
|
||||
"29";5.2;3.4;1.4;0.2;"setosa"
|
||||
"30";4.7;3.2;1.6;0.2;"setosa"
|
||||
"31";4.8;3.1;1.6;0.2;"setosa"
|
||||
"32";5.4;3.4;1.5;0.4;"setosa"
|
||||
"33";5.2;4.1;1.5;0.1;"setosa"
|
||||
"34";5.5;4.2;1.4;0.2;"setosa"
|
||||
"35";4.9;3.1;1.5;0.2;"setosa"
|
||||
"36";5;3.2;1.2;0.2;"setosa"
|
||||
"37";5.5;3.5;1.3;0.2;"setosa"
|
||||
"38";4.9;3.6;1.4;0.1;"setosa"
|
||||
"39";4.4;3;1.3;0.2;"setosa"
|
||||
"40";5.1;3.4;1.5;0.2;"setosa"
|
||||
"41";5;3.5;1.3;0.3;"setosa"
|
||||
"42";4.5;2.3;1.3;0.3;"setosa"
|
||||
"43";4.4;3.2;1.3;0.2;"setosa"
|
||||
"44";5;3.5;1.6;0.6;"setosa"
|
||||
"45";5.1;3.8;1.9;0.4;"setosa"
|
||||
"46";4.8;3;1.4;0.3;"setosa"
|
||||
"47";5.1;3.8;1.6;0.2;"setosa"
|
||||
"48";4.6;3.2;1.4;0.2;"setosa"
|
||||
"49";5.3;3.7;1.5;0.2;"setosa"
|
||||
"50";5;3.3;1.4;0.2;"setosa"
|
||||
"51";7;3.2;4.7;1.4;"versicolor"
|
||||
"52";6.4;3.2;4.5;1.5;"versicolor"
|
||||
"53";6.9;3.1;4.9;1.5;"versicolor"
|
||||
"54";5.5;2.3;4;1.3;"versicolor"
|
||||
"55";6.5;2.8;4.6;1.5;"versicolor"
|
||||
"56";5.7;2.8;4.5;1.3;"versicolor"
|
||||
"57";6.3;3.3;4.7;1.6;"versicolor"
|
||||
"58";4.9;2.4;3.3;1;"versicolor"
|
||||
"59";6.6;2.9;4.6;1.3;"versicolor"
|
||||
"60";5.2;2.7;3.9;1.4;"versicolor"
|
||||
"61";5;2;3.5;1;"versicolor"
|
||||
"62";5.9;3;4.2;1.5;"versicolor"
|
||||
"63";6;2.2;4;1;"versicolor"
|
||||
"64";6.1;2.9;4.7;1.4;"versicolor"
|
||||
"65";5.6;2.9;3.6;1.3;"versicolor"
|
||||
"66";6.7;3.1;4.4;1.4;"versicolor"
|
||||
"67";5.6;3;4.5;1.5;"versicolor"
|
||||
"68";5.8;2.7;4.1;1;"versicolor"
|
||||
"69";6.2;2.2;4.5;1.5;"versicolor"
|
||||
"70";5.6;2.5;3.9;1.1;"versicolor"
|
||||
"71";5.9;3.2;4.8;1.8;"versicolor"
|
||||
"72";6.1;2.8;4;1.3;"versicolor"
|
||||
"73";6.3;2.5;4.9;1.5;"versicolor"
|
||||
"74";6.1;2.8;4.7;1.2;"versicolor"
|
||||
"75";6.4;2.9;4.3;1.3;"versicolor"
|
||||
"76";6.6;3;4.4;1.4;"versicolor"
|
||||
"77";6.8;2.8;4.8;1.4;"versicolor"
|
||||
"78";6.7;3;5;1.7;"versicolor"
|
||||
"79";6;2.9;4.5;1.5;"versicolor"
|
||||
"80";5.7;2.6;3.5;1;"versicolor"
|
||||
"81";5.5;2.4;3.8;1.1;"versicolor"
|
||||
"82";5.5;2.4;3.7;1;"versicolor"
|
||||
"83";5.8;2.7;3.9;1.2;"versicolor"
|
||||
"84";6;2.7;5.1;1.6;"versicolor"
|
||||
"85";5.4;3;4.5;1.5;"versicolor"
|
||||
"86";6;3.4;4.5;1.6;"versicolor"
|
||||
"87";6.7;3.1;4.7;1.5;"versicolor"
|
||||
"88";6.3;2.3;4.4;1.3;"versicolor"
|
||||
"89";5.6;3;4.1;1.3;"versicolor"
|
||||
"90";5.5;2.5;4;1.3;"versicolor"
|
||||
"91";5.5;2.6;4.4;1.2;"versicolor"
|
||||
"92";6.1;3;4.6;1.4;"versicolor"
|
||||
"93";5.8;2.6;4;1.2;"versicolor"
|
||||
"94";5;2.3;3.3;1;"versicolor"
|
||||
"95";5.6;2.7;4.2;1.3;"versicolor"
|
||||
"96";5.7;3;4.2;1.2;"versicolor"
|
||||
"97";5.7;2.9;4.2;1.3;"versicolor"
|
||||
"98";6.2;2.9;4.3;1.3;"versicolor"
|
||||
"99";5.1;2.5;3;1.1;"versicolor"
|
||||
"100";5.7;2.8;4.1;1.3;"versicolor"
|
||||
"101";6.3;3.3;6;2.5;"virginica"
|
||||
"102";5.8;2.7;5.1;1.9;"virginica"
|
||||
"103";7.1;3;5.9;2.1;"virginica"
|
||||
"104";6.3;2.9;5.6;1.8;"virginica"
|
||||
"105";6.5;3;5.8;2.2;"virginica"
|
||||
"106";7.6;3;6.6;2.1;"virginica"
|
||||
"107";4.9;2.5;4.5;1.7;"virginica"
|
||||
"108";7.3;2.9;6.3;1.8;"virginica"
|
||||
"109";6.7;2.5;5.8;1.8;"virginica"
|
||||
"110";7.2;3.6;6.1;2.5;"virginica"
|
||||
"111";6.5;3.2;5.1;2;"virginica"
|
||||
"112";6.4;2.7;5.3;1.9;"virginica"
|
||||
"113";6.8;3;5.5;2.1;"virginica"
|
||||
"114";5.7;2.5;5;2;"virginica"
|
||||
"115";5.8;2.8;5.1;2.4;"virginica"
|
||||
"116";6.4;3.2;5.3;2.3;"virginica"
|
||||
"117";6.5;3;5.5;1.8;"virginica"
|
||||
"118";7.7;3.8;6.7;2.2;"virginica"
|
||||
"119";7.7;2.6;6.9;2.3;"virginica"
|
||||
"120";6;2.2;5;1.5;"virginica"
|
||||
"121";6.9;3.2;5.7;2.3;"virginica"
|
||||
"122";5.6;2.8;4.9;2;"virginica"
|
||||
"123";7.7;2.8;6.7;2;"virginica"
|
||||
"124";6.3;2.7;4.9;1.8;"virginica"
|
||||
"125";6.7;3.3;5.7;2.1;"virginica"
|
||||
"126";7.2;3.2;6;1.8;"virginica"
|
||||
"127";6.2;2.8;4.8;1.8;"virginica"
|
||||
"128";6.1;3;4.9;1.8;"virginica"
|
||||
"129";6.4;2.8;5.6;2.1;"virginica"
|
||||
"130";7.2;3;5.8;1.6;"virginica"
|
||||
"131";7.4;2.8;6.1;1.9;"virginica"
|
||||
"132";7.9;3.8;6.4;2;"virginica"
|
||||
"133";6.4;2.8;5.6;2.2;"virginica"
|
||||
"134";6.3;2.8;5.1;1.5;"virginica"
|
||||
"135";6.1;2.6;5.6;1.4;"virginica"
|
||||
"136";7.7;3;6.1;2.3;"virginica"
|
||||
"137";6.3;3.4;5.6;2.4;"virginica"
|
||||
"138";6.4;3.1;5.5;1.8;"virginica"
|
||||
"139";6;3;4.8;1.8;"virginica"
|
||||
"140";6.9;3.1;5.4;2.1;"virginica"
|
||||
"141";6.7;3.1;5.6;2.4;"virginica"
|
||||
"142";6.9;3.1;5.1;2.3;"virginica"
|
||||
"143";5.8;2.7;5.1;1.9;"virginica"
|
||||
"144";6.8;3.2;5.9;2.3;"virginica"
|
||||
"145";6.7;3.3;5.7;2.5;"virginica"
|
||||
"146";6.7;3;5.2;2.3;"virginica"
|
||||
"147";6.3;2.5;5;1.9;"virginica"
|
||||
"148";6.5;3;5.2;2;"virginica"
|
||||
"149";6.2;3.4;5.4;2.3;"virginica"
|
||||
"150";5.9;3;5.1;1.8;"virginica"
|
||||
|
|
@ -1,172 +0,0 @@
|
|||
#Reading data file.
|
||||
dat <- read.table("GSE60450_Lactation-GenewiseCounts.txt")
|
||||
|
||||
#Let us examine how our data looks.
|
||||
View(dat)
|
||||
|
||||
#Seems like all data points are there. Can we improve appearance?
|
||||
#Examine the details of read.table command.
|
||||
?read.table
|
||||
|
||||
#Looks like we can inform read.table about separator type and presence of header.
|
||||
dat <- read.table("GSE60450_Lactation-GenewiseCounts.txt",
|
||||
header= TRUE, sep = "\t")
|
||||
|
||||
#Let us examine how data looks now.
|
||||
View(dat)
|
||||
|
||||
#What are the observations represented in our data?
|
||||
#colnames gives the names of columns of data.
|
||||
colnames(dat)
|
||||
|
||||
#To check the number of rows and columns in table.
|
||||
dim(dat)
|
||||
|
||||
#To check first few rows of table.
|
||||
head(dat)
|
||||
|
||||
#To check last few rows of table.
|
||||
tail(dat)
|
||||
|
||||
#To check basic stats for each column.
|
||||
summary(dat)
|
||||
|
||||
#Let us extract a column of data.
|
||||
#For example, EntrezGeneID.
|
||||
geneIds <- dat$EntrezGeneID
|
||||
|
||||
#Check what kind of variable geneIds is.
|
||||
class(geneIds)
|
||||
|
||||
#geneIds should be string type.
|
||||
dat$EntrezGeneID <- as.character(dat$EntrezGeneID)
|
||||
|
||||
#Check the class of gene ids again
|
||||
class(dat$EntrezGeneID)
|
||||
|
||||
#Information about samples is in another file.
|
||||
phenotype_info <- read.table("targets.csv",
|
||||
header = TRUE,
|
||||
sep = ",")
|
||||
|
||||
#The column named GEO in the table represents sample id on GEO.
|
||||
#Status and CellType are factor levels for statistical analysis.
|
||||
phenotype_info$CellType <- as.factor(phenotype_info$CellType)
|
||||
|
||||
#Check class of CellType. It is a factor variable now.
|
||||
class(phenotype_info$CellType)
|
||||
|
||||
#Currently Status and CellType has repetition of the same values.
|
||||
#Get unique values.
|
||||
celltypes <- unique(phenotype_info$CellType)
|
||||
|
||||
#Perhaps, no point in keeping spcs as factor in the above object.
|
||||
#Convert celltypes to character variable.
|
||||
celltypes <- as.character(celltypes)
|
||||
|
||||
#Checking if a text is present in a character variable?
|
||||
"cardiomyocyte" %in% celltypes
|
||||
|
||||
#Let us say we want subset of data corresponding to B cells.
|
||||
which_B <- phenotype_info$CellType == "B"
|
||||
phenotype_info_B <- phenotype_info[which_B, ]
|
||||
|
||||
#Class of which_B? Logical
|
||||
class(which_B)
|
||||
|
||||
#Alternative way to subset data.
|
||||
phenotype_info_B <- subset(phenotype_info,
|
||||
CellType == "B")
|
||||
|
||||
#Check mean counts for a sample.
|
||||
mean(dat$MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1)
|
||||
|
||||
#Check mean counts for all genes in the B cell samples.
|
||||
clnames_dat <- colnames(dat)[-2:-1]
|
||||
clnames_dat <- strsplit(clnames_dat, split = "_")
|
||||
clnames_dat <- data.frame(clnames_dat)
|
||||
clnames_dat <- t(clnames_dat)
|
||||
rownames(clnames_dat) <- NULL
|
||||
clnames_dat <- clnames_dat[, 1]
|
||||
which_B <- which(clnames_dat %in% phenotype_info_B$X)
|
||||
cnts_B <- dat[, which_B + 2]
|
||||
|
||||
#Estimate median counts for casein protein.
|
||||
median(unlist(dat[dat$EntrezGeneID == "12992", -2:-1]))
|
||||
median(unlist(dat[dat$EntrezGeneID == "12992", which_B + 2]))
|
||||
|
||||
#Estimate standard deviation.
|
||||
sd(unlist(dat[dat$EntrezGeneID == "12992", -2:-1]))
|
||||
sd(unlist(dat[dat$EntrezGeneID == "12992", which_B + 2]))
|
||||
|
||||
#Check histograms of data.
|
||||
hist(log2(1 + dat$MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1))
|
||||
hist(log2(1 + dat$MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1))
|
||||
|
||||
#These histograms are not easy to compare.
|
||||
#Perhaps, we can fix the axis limits.
|
||||
hist(log2(1 + dat$MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1),
|
||||
xlim = c(0, 20), ylim = c(0, 15000))
|
||||
hist(log2(1 + dat$MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1),
|
||||
xlim = c(0, 20), ylim = c(0, 15000))
|
||||
|
||||
#Let us look at boxplots.
|
||||
boxplot(log2(1 + dat$MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1),
|
||||
ylim = c(0, 25))
|
||||
boxplot(log2(1 + dat$MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1),
|
||||
ylim = c(0, 25))
|
||||
|
||||
#Scatter plots.
|
||||
plot(x= log2(1 + dat$MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1),
|
||||
y= log2(1 + dat$MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1))
|
||||
#Are higher counts associated with longer genes?
|
||||
#Can we color data points based on length?
|
||||
|
||||
#install.packages("ggplot2")
|
||||
library(ggplot2)
|
||||
qplot(x = log2(1 + MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1),
|
||||
y = log2(1 + MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1),
|
||||
data = dat[1:1000, ], color = log10(Length))
|
||||
|
||||
#Color scale does not give clear insight.
|
||||
#Can we use discrete color scale of points instead?
|
||||
dat_subset <- dat[1:1000, ]
|
||||
dat_subset$genetype <- "smallGene"
|
||||
dat_subset$genetype[dat_subset$Length > median(dat_subset$Length)] <- "longGene"
|
||||
qplot(x = log2(1 + MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1),
|
||||
y = log2(1 + MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1),
|
||||
data = dat_subset, color = genetype)
|
||||
|
||||
#Check the boxplots for long and small genes simultaneously.
|
||||
qplot(x = genetype,
|
||||
y = log2(1 + MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1),
|
||||
data = dat_subset, geom = "boxplot")
|
||||
|
||||
#Additional stuff.
|
||||
#Hypothesis test.
|
||||
#Are counts for long genes significantly different from the small genes?
|
||||
dat_small <- subset(dat_subset, genetype == "smallGene")
|
||||
dat_long <- subset(dat_subset, genetype == "longGene")
|
||||
test <- t.test(dat_small$MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1,
|
||||
dat_long$MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1)
|
||||
test$p.value
|
||||
|
||||
#Conditional statements take a condition and perform steps depending on validitiy of the statement.
|
||||
if (test$p.value < 0.05) {
|
||||
print("Counts depend on gene length.")
|
||||
} else {
|
||||
print("Counts don't depend on gene length.")
|
||||
}
|
||||
|
||||
#Looping for repeating the same task but on different data.
|
||||
for (item in c("smallGene", "longGene")) {
|
||||
y <- subset(dat_subset, genetype == item)
|
||||
smry <- summary(y)
|
||||
print(item)
|
||||
print(smry)
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -1,13 +0,0 @@
|
|||
,GEO,SRA,CellType,Status
|
||||
MCL1.DG,GSM1480297,SRR1552450,B,virgin
|
||||
MCL1.DH,GSM1480298,SRR1552451,B,virgin
|
||||
MCL1.DI,GSM1480299,SRR1552452,B,pregnant
|
||||
MCL1.DJ,GSM1480300,SRR1552453,B,pregnant
|
||||
MCL1.DK,GSM1480301,SRR1552454,B,lactating
|
||||
MCL1.DL,GSM1480302,SRR1552455,B,lactating
|
||||
MCL1.LA,GSM1480291,SRR1552444,L,virgin
|
||||
MCL1.LB,GSM1480292,SRR1552445,L,virgin
|
||||
MCL1.LC,GSM1480293,SRR1552446,L,pregnant
|
||||
MCL1.LD,GSM1480294,SRR1552447,L,pregnant
|
||||
MCL1.LE,GSM1480295,SRR1552448,L,lactating
|
||||
MCL1.LF,GSM1480296,SRR1552449,L,lactating
|
||||
|
132
intro-r-data-analysis/style.css
Normal file
|
|
@ -0,0 +1,132 @@
|
|||
|
||||
/* Set font size and alignment for the message */
|
||||
.bottom-message {
|
||||
font-size: 0.8em !important;
|
||||
font-style: italic !important;
|
||||
text-align: center;
|
||||
position: relative;
|
||||
bottom: 0 !important;
|
||||
left: 0;
|
||||
right: 0;
|
||||
}
|
||||
|
||||
/* Background color for code chunks */
|
||||
.reveal pre code {
|
||||
background-color: #d5d5d5 !important;
|
||||
color: #333 !important;
|
||||
font-size: 1.5em !important;
|
||||
}
|
||||
/* Left-align all code outputs */
|
||||
.reveal pre code {
|
||||
text-align: left !important;
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
/* Add horizontal scrolling to all code outputs */
|
||||
.reveal pre code.output {
|
||||
white-space: pre !important;
|
||||
overflow-x: auto !important;
|
||||
}
|
||||
/* Change the font family used for code blocks */
|
||||
pre, code, kbd, samp {
|
||||
font-family: "Courier New", Courier, monospace;
|
||||
}
|
||||
|
||||
|
||||
/* Add horizontal scrolling to all code chunks */
|
||||
.reveal pre code {
|
||||
white-space: pre !important;
|
||||
overflow-x: auto !important;
|
||||
}
|
||||
/* Bold slide titles and change color */
|
||||
.reveal h2 {
|
||||
font-weight: bold !important;
|
||||
color: #9c0366;
|
||||
}
|
||||
/* Bold slide titles and change color */
|
||||
.reveal h1 {
|
||||
font-weight: bold !important;
|
||||
color: #9c0366;
|
||||
}
|
||||
.reveal .slides>section:first-child h2 {
|
||||
color: #333;
|
||||
font-weight: normal !important;
|
||||
}
|
||||
/* Custom slide title */
|
||||
.my-title-slide h1 {
|
||||
font-weight: bold;
|
||||
color: #9c0366;
|
||||
}
|
||||
.my-title-slide h2 {
|
||||
color: #333;
|
||||
font-weight: normal !important;
|
||||
}
|
||||
|
||||
|
||||
|
||||
.reveal .slides>section:first-child h1 {
|
||||
font-weight: bold !important;
|
||||
color: #9c0366;
|
||||
}
|
||||
|
||||
/* Increase the spacing between list items */
|
||||
.reveal p {
|
||||
text-align: left;
|
||||
margin-left: 20px !important;
|
||||
}
|
||||
.reveal ul {
|
||||
display: block;
|
||||
margin-left: 75px !important;
|
||||
margin-right: 50px !important;
|
||||
}
|
||||
.reveal ol {
|
||||
display: block;
|
||||
margin-left: 75px !important;
|
||||
margin-right: 50px !important;
|
||||
}
|
||||
/* Increase the spacing between unordered list items */
|
||||
.reveal ul li {
|
||||
margin-bottom: 15px;
|
||||
}
|
||||
/* Increase the spacing between ordered list items */
|
||||
.reveal ol li {
|
||||
margin-bottom: 15px;
|
||||
}
|
||||
|
||||
|
||||
/* Make images fit in the slides, remove border, shadow and center align*/
|
||||
.reveal img {
|
||||
max-width: 100% !important;
|
||||
max-height: 100% !important;
|
||||
border: none !important;
|
||||
box-shadow: none !important;
|
||||
display: block !important;
|
||||
margin: 0 auto !important;
|
||||
margin-bottom: 10px !important;
|
||||
}
|
||||
|
||||
/* Make a class for images that should be shrunk a little*/
|
||||
.shrink {
|
||||
max-width: 70% !important;
|
||||
max-height: 70% !important;
|
||||
margin: 0 auto !important;
|
||||
}
|
||||
|
||||
small {
|
||||
font-size: 70%;
|
||||
}
|
||||
|
||||
.small-bullets ul {
|
||||
font-size: 70%;
|
||||
}
|
||||
|
||||
.less-small-bullets ul {
|
||||
font-size: 80%;
|
||||
}
|
||||
|
||||
.big-picture img{
|
||||
max-width: 70%;
|
||||
border: 1px solid black !important;
|
||||
}
|
||||
273
intro-r-data-analysis/xaringan-themer.css
Normal file
|
|
@ -0,0 +1,273 @@
|
|||
/* -------------------------------------------------------
|
||||
*
|
||||
* !! This file was generated by xaringanthemer !!
|
||||
*
|
||||
* Changes made to this file directly will be overwritten
|
||||
* if you used xaringanthemer in your xaringan slides Rmd
|
||||
*
|
||||
* Issues or likes?
|
||||
* - https://github.com/gadenbuie/xaringanthemer
|
||||
* - https://www.garrickadenbuie.com
|
||||
*
|
||||
* Need help? Try:
|
||||
* - vignette(package = "xaringanthemer")
|
||||
* - ?xaringanthemer::style_xaringan
|
||||
* - xaringan wiki: https://github.com/yihui/xaringan/wiki
|
||||
* - remarkjs wiki: https://github.com/gnab/remark/wiki
|
||||
*
|
||||
* Version: 0.4.2
|
||||
*
|
||||
* ------------------------------------------------------- */
|
||||
@import url(https://fonts.googleapis.com/css?family=Noto+Sans:400,400i,700,700i&display=swap);
|
||||
@import url(https://fonts.googleapis.com/css?family=Cabin:600,600i&display=swap);
|
||||
@import url(https://fonts.googleapis.com/css?family=Source+Code+Pro:400,700&display=swap);
|
||||
|
||||
|
||||
:root {
|
||||
/* Fonts */
|
||||
--text-font-family: 'Noto Sans';
|
||||
--text-font-is-google: 1;
|
||||
--text-font-family-fallback: -apple-system, BlinkMacSystemFont, avenir next, avenir, helvetica neue, helvetica, Ubuntu, roboto, noto, segoe ui, arial;
|
||||
--text-font-base: sans-serif;
|
||||
--header-font-family: Cabin;
|
||||
--header-font-is-google: 1;
|
||||
--header-font-family-fallback: Georgia, serif;
|
||||
--code-font-family: 'Source Code Pro';
|
||||
--code-font-is-google: 1;
|
||||
--base-font-size: 20px;
|
||||
--text-font-size: 1rem;
|
||||
--code-font-size: 0.9rem;
|
||||
--code-inline-font-size: 1em;
|
||||
--header-h1-font-size: 2.75rem;
|
||||
--header-h2-font-size: 2.25rem;
|
||||
--header-h3-font-size: 1.75rem;
|
||||
|
||||
/* Colors */
|
||||
--text-color: #839496;
|
||||
--header-color: #dc322f;
|
||||
--background-color: #002b36;
|
||||
--link-color: #b58900;
|
||||
--text-bold-color: #d33682;
|
||||
--code-highlight-color: #268bd240;
|
||||
--inverse-text-color: #002b36;
|
||||
--inverse-background-color: #fdf6e3;
|
||||
--inverse-header-color: #002b36;
|
||||
--inverse-link-color: #b58900;
|
||||
--title-slide-background-color: #fdf6e3;
|
||||
--title-slide-text-color: #002b36;
|
||||
--header-background-color: #dc322f;
|
||||
--header-background-text-color: #002b36;
|
||||
}
|
||||
|
||||
html {
|
||||
font-size: var(--base-font-size);
|
||||
}
|
||||
|
||||
body {
|
||||
font-family: var(--text-font-family), var(--text-font-family-fallback), var(--text-font-base);
|
||||
font-weight: normal;
|
||||
color: var(--text-color);
|
||||
}
|
||||
h1, h2, h3 {
|
||||
font-family: var(--header-font-family), var(--header-font-family-fallback);
|
||||
font-weight: 600;
|
||||
color: var(--header-color);
|
||||
}
|
||||
.remark-slide-content {
|
||||
background-color: var(--background-color);
|
||||
font-size: 1rem;
|
||||
padding: 16px 64px 16px 64px;
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
}
|
||||
.remark-slide-content h1 {
|
||||
font-size: var(--header-h1-font-size);
|
||||
}
|
||||
.remark-slide-content h2 {
|
||||
font-size: var(--header-h2-font-size);
|
||||
}
|
||||
.remark-slide-content h3 {
|
||||
font-size: var(--header-h3-font-size);
|
||||
}
|
||||
.remark-code, .remark-inline-code {
|
||||
font-family: var(--code-font-family), Menlo, Consolas, Monaco, Liberation Mono, Lucida Console, monospace;
|
||||
}
|
||||
.remark-code {
|
||||
font-size: var(--code-font-size);
|
||||
}
|
||||
.remark-inline-code {
|
||||
font-size: var(--code-inline-font-size);
|
||||
color: #6c71c4;
|
||||
}
|
||||
.remark-slide-number {
|
||||
color: #586e75;
|
||||
opacity: 1;
|
||||
font-size: 0.9rem;
|
||||
}
|
||||
strong {
|
||||
font-weight: bold;
|
||||
color: var(--text-bold-color);
|
||||
}
|
||||
a, a > code {
|
||||
color: var(--link-color);
|
||||
text-decoration: none;
|
||||
}
|
||||
.footnote {
|
||||
position: absolute;
|
||||
bottom: 60px;
|
||||
padding-right: 4em;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
.remark-code-line-highlighted {
|
||||
background-color: var(--code-highlight-color);
|
||||
}
|
||||
.inverse {
|
||||
background-color: var(--inverse-background-color);
|
||||
color: var(--inverse-text-color);
|
||||
|
||||
}
|
||||
.inverse h1, .inverse h2, .inverse h3 {
|
||||
color: var(--inverse-header-color);
|
||||
}
|
||||
.inverse a, .inverse a > code {
|
||||
color: var(--inverse-link-color);
|
||||
}
|
||||
.title-slide, .title-slide h1, .title-slide h2, .title-slide h3 {
|
||||
color: var(--title-slide-text-color);
|
||||
}
|
||||
.title-slide {
|
||||
background-color: var(--title-slide-background-color);
|
||||
}
|
||||
.title-slide .remark-slide-number {
|
||||
display: none;
|
||||
}
|
||||
/* Two-column layout */
|
||||
.left-column {
|
||||
width: 20%;
|
||||
height: 92%;
|
||||
float: left;
|
||||
}
|
||||
.left-column h2, .left-column h3 {
|
||||
color: #586e75;
|
||||
}
|
||||
.left-column h2:last-of-type, .left-column h3:last-child {
|
||||
color: #93a1a1;
|
||||
}
|
||||
.right-column {
|
||||
width: 75%;
|
||||
float: right;
|
||||
padding-top: 1em;
|
||||
}
|
||||
.pull-left {
|
||||
float: left;
|
||||
width: 47%;
|
||||
}
|
||||
.pull-right {
|
||||
float: right;
|
||||
width: 47%;
|
||||
}
|
||||
.pull-right + * {
|
||||
clear: both;
|
||||
}
|
||||
img, video, iframe {
|
||||
max-width: 100%;
|
||||
}
|
||||
blockquote {
|
||||
border-left: solid 5px #cb4b16;
|
||||
padding-left: 1em;
|
||||
}
|
||||
.remark-slide table {
|
||||
margin: auto;
|
||||
border-top: 1px solid #657b83;
|
||||
border-bottom: 1px solid #657b83;
|
||||
}
|
||||
.remark-slide table thead th {
|
||||
border-bottom: 1px solid #657b83;
|
||||
}
|
||||
th, td {
|
||||
padding: 5px;
|
||||
}
|
||||
.remark-slide table:not(.table-unshaded) thead,
|
||||
.remark-slide table:not(.table-unshaded) tfoot,
|
||||
.remark-slide table:not(.table-unshaded) tr:nth-child(even) {
|
||||
background: #073642;
|
||||
}
|
||||
table.dataTable tbody {
|
||||
background-color: var(--background-color);
|
||||
color: var(--text-color);
|
||||
}
|
||||
table.dataTable.display tbody tr.odd {
|
||||
background-color: var(--background-color);
|
||||
}
|
||||
table.dataTable.display tbody tr.even {
|
||||
background-color: #073642;
|
||||
}
|
||||
table.dataTable.hover tbody tr:hover, table.dataTable.display tbody tr:hover {
|
||||
background-color: rgba(255, 255, 255, 0.5);
|
||||
}
|
||||
.dataTables_wrapper .dataTables_length, .dataTables_wrapper .dataTables_filter, .dataTables_wrapper .dataTables_info, .dataTables_wrapper .dataTables_processing, .dataTables_wrapper .dataTables_paginate {
|
||||
color: var(--text-color);
|
||||
}
|
||||
.dataTables_wrapper .dataTables_paginate .paginate_button {
|
||||
color: var(--text-color) !important;
|
||||
}
|
||||
|
||||
/* Horizontal alignment of code blocks */
|
||||
.remark-slide-content.left pre,
|
||||
.remark-slide-content.center pre,
|
||||
.remark-slide-content.right pre {
|
||||
text-align: start;
|
||||
width: max-content;
|
||||
max-width: 100%;
|
||||
}
|
||||
.remark-slide-content.left pre,
|
||||
.remark-slide-content.right pre {
|
||||
min-width: 50%;
|
||||
min-width: min(40ch, 100%);
|
||||
}
|
||||
.remark-slide-content.center pre {
|
||||
min-width: 66%;
|
||||
min-width: min(50ch, 100%);
|
||||
}
|
||||
.remark-slide-content.left pre {
|
||||
margin-left: unset;
|
||||
margin-right: auto;
|
||||
}
|
||||
.remark-slide-content.center pre {
|
||||
margin-left: auto;
|
||||
margin-right: auto;
|
||||
}
|
||||
.remark-slide-content.right pre {
|
||||
margin-left: auto;
|
||||
margin-right: unset;
|
||||
}
|
||||
|
||||
/* Slide Header Background for h1 elements */
|
||||
.remark-slide-content.header_background > h1 {
|
||||
display: block;
|
||||
position: absolute;
|
||||
top: 0;
|
||||
left: 0;
|
||||
width: 100%;
|
||||
background: var(--header-background-color);
|
||||
color: var(--header-background-text-color);
|
||||
padding: 2rem 64px 1.5rem 64px;
|
||||
margin-top: 0;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
.remark-slide-content.header_background {
|
||||
padding-top: 7rem;
|
||||
}
|
||||
|
||||
@page { margin: 0; }
|
||||
@media print {
|
||||
.remark-slide-scaler {
|
||||
width: 100% !important;
|
||||
height: 100% !important;
|
||||
transform: scale(1) !important;
|
||||
top: 0 !important;
|
||||
left: 0 !important;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||