diff --git a/docs/Intro_to_R_data_analysis_part_1.html b/docs/Intro_to_R_data_analysis_part_1.html
index a1a4c8f..67bdaa8 100644
--- a/docs/Intro_to_R_data_analysis_part_1.html
+++ b/docs/Intro_to_R_data_analysis_part_1.html
@@ -2572,7 +2572,7 @@ class CountdownTimer {

Introduction to R Data Analysis

Part 1

Natalie Elphick

-August 26th, 2024
+November 11th, 2024

@@ -2585,8 +2585,6 @@ class CountdownTimer {

Introductions

Natalie Elphick
Bioinformatician I

-Min-Gyoung Shin
-Bioinformatician III

Poll 1

@@ -2858,7 +2856,6 @@ one style of names
  • Logical
  • Character
  • +
    +

    Missing Values

    + +
    NA + 1
    +
    [1] NA
    +
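
As a hedged illustration of how missing values behave in practice, the sketch below uses a made-up vector `x` (not part of the workshop material) to show that `NA` propagates through arithmetic and that most summary functions need `na.rm = TRUE` to skip it:

```r
# Hypothetical example vector with one missing value
x <- c(4, NA, 10)

x + 1                   # arithmetic keeps the NA in place
is.na(x)                # flags which elements are missing
mean(x)                 # NA, because one value is unknown
mean(x, na.rm = TRUE)   # drops the NA before averaging

class(NA)               # "logical" by default
class(NA_character_)    # typed NA constants also exist
```

Testing for missing values with `x == NA` does not work, because any comparison with `NA` is itself `NA`; `is.na()` is the usual tool.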

    Poll 3

    Which of these is not the correct data type for the value?

    -1. 1.5 - Numeric
    -2. “1” - Character
    -3. NA - Logical
    -4. 1 - Integer
    +1. “1.5” - Numeric
    +2. “A” - Character
    +3. 1L - Integer
    +4. TRUE - Boolean
    @@ -2915,7 +2921,7 @@ types/structures (ex. nested lists)

    10 min break

    -
    +
    10:00
    @@ -2934,44 +2940,44 @@ types/structures (ex. nested lists) mean( ), median( ), mode( ) )
  • Relational comparison operators to compare values
  • -
    x == y  # Equal to
    -x != y  # Not equal to
    -x <  y  # Less than
    -x > y   # Greater than
    -x <= y  # Less than or equal to
    -x >= y  # Greater than or equal to
    -
    -x %in% y # Is x in this vector y?
    -
    -
    -

    Poll 4

    -

    What is the output of the following code?

    -
    4 %in% c(1, 2, 3, 4)
    -
    -1. TRUE
    -2. FALSE
    -3. NA
    +
    x == y  # Equal to
    +x != y  # Not equal to
    +x <  y  # Less than
    +x > y   # Greater than
    +x <= y  # Less than or equal to
    +x >= y  # Greater than or equal to
    +
    +x %in% y # Is x in this vector y?
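
A small sketch of these operators applied to a vector may help. The `ages` vector is an assumption made up for illustration; the point is that comparisons are applied element by element and return logical vectors:

```r
# Hypothetical example data
ages <- c(2, 7, 11)

ages > 5             # one TRUE/FALSE per element
ages == 7            # exact equality, element by element
7 %in% ages          # is the single value 7 anywhere in ages?
c(7, 20) %in% ages   # one answer per value on the left-hand side
```

Note that `==` compares position by position, while `%in%` asks about membership regardless of position.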

    Logical Operators

    -
    x <- TRUE
    -y <- FALSE
    -
    -!x     # Not x
    -x | y  # x or y
    -x & y  # x and y
    +
    x <- TRUE
    +y <- FALSE
    +
    +!x     # Not x
    +x | y  # x or y
    +x & y  # x and y
    +
    +
    +
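
Logical operators are most often combined with the relational operators to build filters. The following is a minimal sketch with an invented `temps` vector (an assumption, not workshop data):

```r
# Hypothetical temperature readings
temps <- c(12, 25, 31)

in_range  <- temps > 15 & temps < 30   # both conditions must hold
out_range <- temps < 15 | temps > 30   # either condition may hold
not_warm  <- !(temps > 15)             # negates each element

temps[in_range]                        # keep only the values in range
```

Comparisons are evaluated before `&` and `|`, so the parentheses are only needed around the expression being negated.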

    Poll 4

    +

    What is the output of the following code?

    +
    2 + 2 == 4 & 8 + 10 < 20
    +
    +1. TRUE
    +2. FALSE
    +3. NA

    Poll 5

    What is the output of the following code?

    -
    x <- TRUE
    -y <- FALSE
    -
    -x & !y
    +
    x <- TRUE
    +y <- FALSE
    +
    +y | (y | x)
    1. TRUE
    2. FALSE
    @@ -2984,17 +2990,17 @@ mean( ), median( ), mode( ) )
  • Relational and logical operations allow for conditional execution of code
      -
      dog_breeds <- c("Labrador Retriever", "Akita", "Bulldog")
      -
      -if ("Akita" %in% dog_breeds) {
      -  
      -  print("dog_breeds already contains Akita")
      -  
      -} else {
      -  
      -  dog_breeds <- c("Akita", dog_breeds)
      -  
      -}
      +
      dog_breeds <- c("Labrador Retriever", "Akita", "Bulldog")
      +
      +if ("Akita" %in% dog_breeds) {
      +  
      +  print("dog_breeds already contains Akita")
      +  
      +} else {
      +  
      +  dog_breeds <- c("Akita", dog_breeds)
      +  
      +}
      [1] "dog_breeds already contains Akita"
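
The example above only exercises the first branch. As a sketch of the `else` branch, the same pattern can be run with a breed that is not yet in the vector; `new_breed` is a hypothetical variable introduced here for illustration:

```r
dog_breeds <- c("Labrador Retriever", "Akita", "Bulldog")
new_breed <- "Poodle"   # hypothetical value, not in dog_breeds

if (new_breed %in% dog_breeds) {
  print(paste("dog_breeds already contains", new_breed))
} else {
  # Condition was FALSE, so the new breed is added to the front
  dog_breeds <- c(new_breed, dog_breeds)
}

dog_breeds
```

Nothing is printed by the `if` statement this time; the final line shows the vector with the new breed added at the front.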
    @@ -3019,39 +3025,39 @@ R functions
  • To define a function we use the function keyword, the output is specified with the return function:
  • -
    add_dog <- function(dog_to_add, input_vector) {
    -  if (dog_to_add %in% input_vector) {
    -    
    -    print("Already contains this dog")
    -    
    -  } else {
    -    
    -    output <- c(dog_to_add, input_vector)
    -    return(output)
    -    
    -  }
    -}
    +
    add_dog <- function(dog_to_add, input_vector) {
    +  if (dog_to_add %in% input_vector) {
    +    
    +    print("Already contains this dog")
    +    
    +  } else {
    +    
    +    output <- c(dog_to_add, input_vector)
    +    return(output)
    +    
    +  }
    +}

    Example

    -
    add_dog(dog_to_add = "Akita",
    -        input_vector = dog_breeds)
    -
    [1] "Already contains this dog"
    -
    add_dog(dog_to_add = "German Shepard",
    +
    add_dog(dog_to_add = "Akita",
             input_vector = dog_breeds)
    +
    [1] "Already contains this dog"
    +
    add_dog(dog_to_add = "German Shepard",
    +        input_vector = dog_breeds)
    [1] "German Shepard"     "Labrador Retriever" "Akita"             
     [4] "Bulldog"           
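
One thing to be aware of: when the dog is already present, `add_dog` returns the printed message rather than the vector, so `dog_breeds <- add_dog("Akita", dog_breeds)` would overwrite the vector with a string. A possible variant, sketched here under the hypothetical name `add_dog_safe` (not part of the workshop code), always returns a vector and reports through `message()` instead:

```r
# Hypothetical variant of add_dog that always returns a character vector
add_dog_safe <- function(dog_to_add, input_vector) {
  if (dog_to_add %in% input_vector) {
    message("Already contains this dog")
    return(input_vector)          # unchanged vector
  }
  c(dog_to_add, input_vector)     # new dog added to the front
}

dog_breeds <- c("Labrador Retriever", "Akita", "Bulldog")
dog_breeds <- add_dog_safe("Akita", dog_breeds)    # vector is unchanged
dog_breeds <- add_dog_safe("Poodle", dog_breeds)   # vector gains one element
length(dog_breeds)
```

Whether this is preferable depends on how the function is meant to be used; the original version is fine for interactive demonstration.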

    Poll 6

    What does this function do?

    -
    mystery_function <- function(x) {
    -  if (x > 0) {
    -    return(x)
    -  } else {
    -    return(-x)
    -  }
    -}
    +
    mystery_function <- function(x) {
    +  if (x > 0) {
    +    return(x)
    +  } else {
    +    return(-x)
    +  }
    +}
    1. Returns the absolute value of x
    2. Returns x
    @@ -3070,7 +3076,7 @@ specified with the return function:
  • Packages are collections of functions that are specialized to a specific task (plotting, data manipulation etc.)
      -
      library(ggplot2) # Makes all of the ggplot2 functions available
      +
      library(ggplot2) # Makes all of the ggplot2 functions available
      • The tidyverse is a collection of commonly used data analysis packages
    @@ -3110,15 +3116,15 @@ can continue to improve these workshops

    Upcoming Workshops

    -Intermediate RNA-Seq Analysis Using R
    -September 10, 2024 9am-12pm PDT
    -Introduction to Statistics, Experimental Design, and Hypothesis Testing
    -September 10 - September 12, 2024 1-3pm PDT
    -Single Cell RNA-Seq Data Analysis
    -September 16-September 17, 2024 9am-4pm PDT
    +Introduction to scATAC-seq Data Analysis
    +November 14 - November 15, 2024 1:00-4:00pm PST
    +Introduction to Linear Mixed Effects Models
    +November 18-November 19, 2024 1:00-3:00pm PST
    +scATAC-seq and scRNA-seq Data Integration
    +November 22, 2024 1:00-4:00pm PST

    @@ -3048,7 +3046,7 @@ modified by adding layers

    10 min break

    -
    +
    10:00
    @@ -4624,8 +4622,23 @@ reach out to the authors using their preferred method

    Additional Resources

    -
    -

    R

    +
    +

    Coding Templates

    +Code templates can be used to avoid typing the same code over and over again.

    + +
    +
    +

    R Resources

    -
    -
    -

    Statistics

    - -
    -
    -

    RNA-seq Analysis

    - -
    -
    -

    Dimensional Reduction

    -
    @@ -4686,15 +4664,15 @@ can continue to improve these workshops

    Upcoming Workshops

    -Intermediate RNA-Seq Analysis Using R
    -September 10, 2024 9am-12pm PDT
    -Introduction to Statistics, Experimental Design, and Hypothesis Testing
    -September 10 - September 12, 2024 1-3pm PDT
    -Single Cell RNA-Seq Data Analysis
    -September 16-September 17, 2024 9am-4pm PDT
    +Introduction to scATAC-seq Data Analysis
    +November 14 - November 15, 2024 1:00-4:00pm PST
    +Introduction to Linear Mixed Effects Models
    +November 18-November 19, 2024 1:00-3:00pm PST
    +scATAC-seq and scRNA-seq Data Integration
    +November 22, 2024 1:00-4:00pm PST

    • Check this link at for the full schedule
diff --git a/intro-r-data-analysis/Intro_to_R_data_analysis_part_1.Rmd b/intro-r-data-analysis/Intro_to_R_data_analysis_part_1.Rmd
index 5072fe8..cb8f4f6 100644
--- a/intro-r-data-analysis/Intro_to_R_data_analysis_part_1.Rmd
+++ b/intro-r-data-analysis/Intro_to_R_data_analysis_part_1.Rmd
@@ -2,7 +2,7 @@
 title: "Introduction to R Data Analysis"
 subtitle: "Part 1"
 author: "Natalie Elphick"
-date: "August 26th, 2024"
+date: "November 11th, 2024"
 knit: (function(input, ...) {
     rmarkdown::render(
       input,
@@ -29,8 +29,7 @@ knitr::opts_chunk$set(comment = "")
 **Natalie Elphick**
 Bioinformatician I
 
-**Min-Gyoung Shin**
-Bioinformatician III
+
 
 ## Poll 1
 
@@ -246,19 +245,27 @@ DogBreeds <- c("Labrador Retriever", "Akita", "Bulldog")
     - Decimal numbers
 - Logical
     - Boolean (TRUE, FALSE)
-    - NA (missing data)
 - Character
     - Letters and strings of letters
     - "A", "Labrador Retriever"
 
+## Missing Values
+- R has a special data type - NA which represents missing data
+- NAs can take the place of any type but by default are logical
+```{r}
+NA + 1
+```
+
+
 ## Poll 3
 
 **Which of these is not the correct data type for the value?**
 
-1. 1.5 - Numeric
-2. "1" - Character
-3. NA - Logical
-4. 1 - Integer
+1. "1.5" - Numeric
+2. "A" - Character
+3. 1L - Integer
+4. TRUE - Boolean
+
 
 ## Data Structures
 
@@ -320,17 +327,7 @@ x >= y # Greater than or equal to
 
 x %in% y # Is x in this vector y?
 ```
 
-## Poll 4
-**What is the output of the following code?**
-
-```{r, eval = FALSE}
-4 %in% c(1, 2, 3, 4)
-```
-
-1. TRUE
-2. FALSE
-3. NA
 
 ## Logical Operators
 
@@ -344,6 +341,18 @@ x | y # x or y
 x & y # x and y
 ```
 
+## Poll 4
+
+**What is the output of the following code?**
+
+```{r, eval = FALSE}
+2 + 2 == 4 & 8 + 10 < 20
+```
+
+1. TRUE
+2. FALSE
+3. NA
+
 ## Poll 5
 
 **What is the output of the following code?**
@@ -351,7 +360,7 @@ x & y # x and y
 x <- TRUE
 y <- FALSE
 
-x & !y
+y | (y | x)
 ```
 
 1. TRUE
@@ -470,16 +479,17 @@ packages
 
 ## Upcoming Workshops
 
-[Intermediate RNA-Seq Analysis Using R](https://gladstone.org/events/intermediate-rna-seq-analysis-using-r-5)
-September 10, 2024 9am-12pm PDT
+[Introduction to scATAC-seq Data Analysis](https://gladstone.org/events/introduction-scatac-seq-data-analysis)
+November 14 - November 15, 2024 1:00-4:00pm PST
 
-[Introduction to Statistics, Experimental Design, and Hypothesis Testing](https://gladstone.org/events/introduction-statistics-experimental-design-and-hypothesis-testing-1)
-September 10 - September 12, 2024 1-3pm PDT
+[Introduction to Linear Mixed Effects Models](https://gladstone.org/events/introduction-linear-mixed-effects-models-0)
+November 18-November 19, 2024 1:00-3:00pm PST
 
-[Single Cell RNA-Seq Data Analysis](https://gladstone.org/events/single-cell-rna-seq-data-analysis-0)
-September 16-September 17, 2024 9am-4pm PDT
+[scATAC-seq and scRNA-seq Data Integration](https://gladstone.org/events/scatac-seq-and-scrna-seq-data-integration)
+November 22, 2024 1:00-4:00pm PST
 
 - Check [this link](https://gladstone.org/events?series=data-science-training-program) at for the full schedule
 
+
diff --git a/intro-r-data-analysis/Intro_to_R_data_analysis_part_2.Rmd b/intro-r-data-analysis/Intro_to_R_data_analysis_part_2.Rmd
index 8c963c9..014d3f5 100644
--- a/intro-r-data-analysis/Intro_to_R_data_analysis_part_2.Rmd
+++ b/intro-r-data-analysis/Intro_to_R_data_analysis_part_2.Rmd
@@ -31,11 +31,8 @@ knitr::opts_chunk$set(comment = "")
 **Natalie Elphick**
 Bioinformatician I
 
-**Michela Traglia**
-Senior Statistician
-
-**Ayushi Agrawal**
-Bioinformatician III
+**Reuben Thomas**
+Associate Core Director
 
 # Schedule
 
@@ -281,30 +278,23 @@ For any bioinformatics specific questions feel free to reach out to the Gladston
 
 # Additional Resources
 
+## Coding Templates
 
-## R
+Code templates can be used to avoid typing the same code over and over again.
+
+- These are templates that we are using to automate things like plot appearance and documentation:
+  - [.Rmd Template](https://www.dropbox.com/scl/fi/a9cnyqdajgabbfcxbmm6y/RMD_template.Rmd?rlkey=yntfpo6aptw9b4pgjyzpe5ubi&dl=1)
+  - [.R Script Template](https://www.dropbox.com/scl/fi/cy43b8b1x3nzn17esnmmt/Rscript_template.R?rlkey=zn7b0g8nn0s9213blh70fjjsx&dl=1)
+
+
+
+## R Resources
 
 - [R for Data Science](https://r4ds.hadley.nz/)
 - [Top 10 R Errors and How to Fix them](https://statsandr.com/blog/top-10-errors-in-r/)
 - [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/how-to-read-this-book.html)
 - [ggplot2: elegant graphics for data analysis](https://ggplot2-book.org/)
 - [Advanced R](https://adv-r.hadley.nz/)
 
-## Statistics
-
-- [Data Analysis in R](https://bookdown.org/steve_midway/DAR) : This book has more statistics details than *R for Data Science*
-- [Generalized Linear Models](https://bookdown.org/steve_midway/DAR/glms-generalized-linear-models.html)\
-- [Random Effects](https://bookdown.org/steve_midway/DAR/random-effects.html)
-
-## RNA-seq Analysis
-
-- [RNA-seqlopedia](https://rnaseq.uoregon.edu/) : Everything you need to know about RNA-seq experiments
-- [RNA-seq Expression Units](https://luisvalesilva.com/datasimple/rna-seq_units.html) : Blog post on understanding common units
-- [Introduction to Single-Cell Analysis with Bioconductor](https://bioconductor.org/books/3.17/OSCA.intro/index.html) : Covers the basics of scRNA-seq analysis in R
-
-## Dimensional Reduction
-
-- [Tutorial on PCA](https://uw.pressbooks.pub/appliedmultivariatestatistics/chapter/pca/) : PCA explained with R code examples
-- [Understanding UMAP](https://pair-code.github.io/understanding-umap/) : Short explanation with great visualizations, mainly useful for scRNA-seq analysis
 
 
@@ -315,14 +305,14 @@ For any bioinformatics specific questions feel free to reach out to the Gladston
 
 ## Upcoming Workshops
 
-[Intermediate RNA-Seq Analysis Using R](https://gladstone.org/events/intermediate-rna-seq-analysis-using-r-5)
-September 10, 2024 9am-12pm PDT
+[Introduction to scATAC-seq Data Analysis](https://gladstone.org/events/introduction-scatac-seq-data-analysis)
+November 14 - November 15, 2024 1:00-4:00pm PST
 
-[Introduction to Statistics, Experimental Design, and Hypothesis Testing](https://gladstone.org/events/introduction-statistics-experimental-design-and-hypothesis-testing-1)
-September 10 - September 12, 2024 1-3pm PDT
+[Introduction to Linear Mixed Effects Models](https://gladstone.org/events/introduction-linear-mixed-effects-models-0)
+November 18-November 19, 2024 1:00-3:00pm PST
 
-[Single Cell RNA-Seq Data Analysis](https://gladstone.org/events/single-cell-rna-seq-data-analysis-0)
-September 16-September 17, 2024 9am-4pm PDT
+[scATAC-seq and scRNA-seq Data Integration](https://gladstone.org/events/scatac-seq-and-scrna-seq-data-integration)
+November 22, 2024 1:00-4:00pm PST
 
 - Check [this link](https://gladstone.org/events?series=data-science-training-program) at for the full schedule
 
diff --git a/intro-r-data-analysis/lesson_0/lesson_0.Rmd b/intro-r-data-analysis/lesson_0/lesson_0.Rmd
index 8e2b9cf..e9822e4 100644
--- a/intro-r-data-analysis/lesson_0/lesson_0.Rmd
+++ b/intro-r-data-analysis/lesson_0/lesson_0.Rmd
@@ -23,7 +23,7 @@ learnr::tutorial_options(exercise.timelimit = 10)
 
 This guide will help you get set up for Intro to R Data Analysis. There are just a few steps to make sure you'll have the necessary software installed and ready to go on day 1. **Please ensure that you've completed each step by running the validation test prior to the start of the workshop**.
 
-This guide will walk you through how to install R, RStudio, and some additional tools that we’ll be using in the course. By rough analogy to a car, R is like the car’s engine and RStudio is like the dashboard. More precisely, R is a programming language and Rstudio is an ‘integrated development environment’ (IDE), which is basically a nice software interface for interacting with R. For our purposes, you will only ever interact directly with RStudio, but it needs to have R installed to work (like a car needing its engine).
+This guide will help you set up R, RStudio, and a few extra tools we'll use in this course. You can think of R as the engine that powers everything, while RStudio is like the dashboard that makes it easy to control. R is a programming language, and RStudio is a tool that helps you work with it. Even though you'll mainly use RStudio, it needs R to be installed to work, just like a car needs an engine to run.
 
 Please complete the following steps (must be done in this order). If you already have R and Rstudio installed you can skip ahead. Make sure you complete step 5 though!
 
@@ -83,4 +83,4 @@ You should see a plot that looks like this appear:
-If you see an error that says “R version is too old” that means you need to update your R version. The update process is the same as the installation process. It will update your R installation. If you see an error that says “There is no package called ggplot2” that means you need to install the tidyverse package (see above).
+If you see an error that says “R version is too old” that means you need to update your R version. The update process is the same as the installation process. It will update your R installation. If you see an error that says “There is no package called ggplot2” that means you need to install the tidyverse package (see the *Install Required Packages* section).
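
The two error conditions described in this last change can also be checked by hand from the R console. The sketch below is an illustration only: the minimum version `"4.0.0"` is an assumed placeholder, and the workshop's own validation test defines the real requirement.

```r
# Assumed placeholder; the actual minimum version is set by the validation test
min_version <- "4.0.0"

r_ok <- getRversion() >= min_version
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (!r_ok) {
  message("R ", getRversion(), " is older than ", min_version,
          "; reinstall R to update it.")
}
if (!has_ggplot2) {
  message('ggplot2 is not installed; run install.packages("tidyverse").')
}
```

If neither message appears, both conditions are satisfied for the assumed version threshold.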