Merge pull request #16 from gladstone-institutes/wynton_2024

Wynton 2024
2025-11-30 09:45:43 -08:00 · 2024-03-27 13:49:32 -07:00 · 2024-03-27 13:49:32 -07:00 · b0d59a2d98
commit b0d59a2d98
parent 4add4e266e 86a7de1077
17 changed files with 6495 additions and 1 deletions
--- a/docs/Working_on_Wynton_Part_1.html
+++ b/docs/Working_on_Wynton_Part_1.html
--- a/docs/Working_on_Wynton_Part_2.html
+++ b/docs/Working_on_Wynton_Part_2.html
--- a/working-on-wynton-hpc/.Rprofile
+++ b/working-on-wynton-hpc/.Rprofile
@@ -0,0 +1 @@
+source("renv/activate.R")
--- a/working-on-wynton-hpc/README.md
+++ b/working-on-wynton-hpc/README.md
@@ -2,4 +2,14 @@

 [Link to wiki](https://github.com/gladstone-institutes/Bioinformatics-Workshops/wiki/Working-on-Wynton-HPC)

-### Description of files
+## Structure
+
+- `working-on-wynton-hpc.Rproj` : Used to manage the `renv` for the workshop slides  
+  - To install the project environment, open this file and run `renv::restore()` in the console   
+
+- `./slide_materials` : Contains any images/assets needed for the slides.   
+- `./workshop_materials` : Contains any materials needed by the registrants (switch to using Dropbox if the size of this becomes too large)
+- `./renv` : Used by `renv` to manage the project library; do not modify these files directly  
+- `./Working_on_Wynton_Part_1.Rmd` : revealjs-based slides for part 1
+- `./Working_on_Wynton_Part_2.Rmd` : revealjs-based slides for part 2
+- `./style.css` : CSS style sheet for both sets of revealjs slides
--- a/working-on-wynton-hpc/Working_on_Wynton_Part_1.Rmd
+++ b/working-on-wynton-hpc/Working_on_Wynton_Part_1.Rmd
@@ -0,0 +1,479 @@
+---
+title: "Working on Wynton - Part 1"
+author: "Natalie Elphick"
+date: "April 15th, 2024"
+knit: (function(input, ...) {
+    rmarkdown::render(
+      input,
+      output_dir = "../docs"
+    )
+  })
+output:
+  revealjs::revealjs_presentation:
+    theme: simple
+    highlight: default
+    css: style.css
+---
+
+```{r, setup, include=FALSE}
+
+```
+
+## 
+
+<center>*Press the ? key for tips on navigating these slides*</center>
+
+## Introductions
+
+**Natalie Elphick**    
+Bioinformatician I   
+
+**Alex Pico**     
+Bioinformatics Core Director
+
+
+## Target Audience
+-   Prior experience with UNIX command-line 
+
+
+
+## Part 1:
+
+1. What is an HPC cluster? 
+2. Node Types and Logging in
+3. Storage
+4. Data Transfer
+5. Installing Software
+6. Containers
+
+
+
+# What is Wynton HPC?
+
+## High-performance Computing Cluster
+
+-   A collection of specialized computers (nodes) connected together on a fast local network   
+
+![HPC Diagram](slide_materials/HPC_diagram.png)
+
+
+
+## Wynton {.small-bullets}
+
+-   An HPC Linux environment available to all UCSF researchers for free   
+-   Uses the Rocky Linux 8 operating system
+-   Includes several hundred compute nodes and a large shared storage system ([Cluster specifications](https://wynton.ucsf.edu/hpc/about/specs.html))    
+-   Funded and administered cooperatively by UCSF campus IT and key research groups   
+
+[https://wynton.ucsf.edu](https://wynton.ucsf.edu)
+
+
+
+# Node Types and Logging in
+
+## Node Types {.small-bullets}
+
+-   **Login:** Submit and query jobs. SSH to development nodes. File management.
+-   **Development:** Compile and install software. Test job scripts. Submit and query jobs. Version control. File management.
+-   **Compute:** Run short- and long-running job scripts.
+-   **Transfer:** Fast in- & outbound file transfers. File management.
+
+## The Login Nodes {.small-bullets}
+
+-   Only capable of basic tasks (file management, submitting and checking on jobs)
+-   Lacks access to pre-installed software tools that the development nodes have
+-   The primary method to log in is to use an SSH client application
+-   The Wynton HPC website has up-to-date information on logging in: [Access Cluster](https://wynton.ucsf.edu/hpc/get-started/access-cluster.html)
+
+<u>Names</u>:   
+    
+    
+log1, log2 and plog (for PHI users)
+
+## Login {.small-bullets}
+
+-   Make sure you are on the UCSF or Gladstone WiFi networks (or the respective VPN)
+-   **ssh [your-username]@[node-name].wynton.ucsf.edu**
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo "{local}$ ssh alice@log1.wynton.ucsf.edu
+alice@log1.wynton.ucsf.edu's password: 
+[alice@log1 ~]$" 
+```
+
+-   There will not be any visual feedback when typing your password
+
+
+## The Development Nodes {.small-bullets}
+
+-   Has a set of [core software](https://wynton.ucsf.edu/hpc/software/core-software.html) installed
+    - e.g. git, vim, nano, make and python
+-   Also has access to [software repositories](https://wynton.ucsf.edu/hpc/software/software-repositories.html), some of which are maintained by other users or research groups
+    - e.g. matlab, R and openjdk
+-   Cannot be logged in to directly, only from a login node
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo "ssh dev1" 
+```
+
+<u>Names</u>:   
+    
+dev[1-3], gpudev1, pdev1 (PHI) and pgpudev1 (PHI)
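Because development nodes can only be reached by hopping through a login node, an OpenSSH client config with `ProxyJump` can collapse the two `ssh` steps into one. This is a minimal sketch assuming the OpenSSH client; the aliases `wynton-log1` and `wynton-dev1` and the username `alice` are hypothetical placeholders, and the entries are written to a temporary file so you can review them before copying them into `~/.ssh/config`.

```shell
#!/bin/sh
# Sketch: generate OpenSSH client entries for Wynton.
# Hypothetical names: the aliases wynton-log1 / wynton-dev1 and user "alice".
write_wynton_ssh_config() {
  user="$1"
  config_file="$2"
  cat >> "$config_file" <<EOF
Host wynton-log1
    HostName log1.wynton.ucsf.edu
    User ${user}

# Development nodes are only reachable from a login node, so jump through log1
Host wynton-dev1
    HostName dev1
    User ${user}
    ProxyJump wynton-log1
EOF
}

# Write to a temporary file first and review it before merging into ~/.ssh/config
demo_config="$(mktemp)"
write_wynton_ssh_config alice "$demo_config"
cat "$demo_config"
```

With entries like these, `ssh wynton-dev1` run on your local machine should land on dev1 in one command, although you may be prompted for your password on each hop.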
+
+## Data Transfer Nodes {.small-bullets}
+
+-   Can be logged in to directly
+-   Fast network speed
+-   Limited software
+-   Use for transferring files to and from Wynton
+
+<u>Example</u>: 
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo "{local}$ scp local_file.tsv alice@dt1.wynton.ucsf.edu:~/" 
+```
+
+<u>Names</u>:   
+    
+dt1 and dt2
+
+
+
+## Compute Nodes {.small-bullets .big-picture}
+
+-   Can **not** be logged in to directly
+-   Used to run non-interactive compute job scripts
+-   The software to run the job script is provided using a container
+
+
+![Compute Jobs](slide_materials/compute_job_workflow.svg)
+
+
+
+# Storage
+
+
+## The File System {.small-bullets}
+
+-   A file system is how information is stored and retrieved on a computer
+      -   Consists of files and directories
+-   A local file system is a function of the operating system and is only accessible from a single computer
+-   A shared file system is accessible from multiple computers
+
+
+
+
+
+
+## BeeGFS {.small-bullets}
+
+-   Wynton uses a *parallel* shared file system called BeeGFS
+    - The files are stored as "chunks" spread across many different servers
+-   BeeGFS has multiple services that work together to manage the file system
+    - Storage (stores the chunks)
+    - Metadata (tracks the chunks and information about their file)
+    - Management (tracks all of the services)
+    - Client (provides linux access to the file system)
+    
+
+## BeeGFS - Advantages 
+-   High throughput
+-   Redundancy can be built in by mirroring services
+-   Adding new storage is fast and does not require downtime
+
+## BeeGFS - Caveats 
+-   For any client node, performance is limited by the network bandwidth of that node
+-   Network latency becomes extremely important for all metadata requests
+-   Certain input/output patterns can be problematic
+
+## BeeGFS - I/O patterns {.small-bullets}
+-   Anything that requires many metadata operations can feel slow 
+    -   e.g. many writes to the same directory, or many file lookups and directory searches (**conda**)
+-   Keep the number of reads and writes to a single directory to a reasonable number
+-   If using conda, putting the conda application inside an Apptainer (formerly Singularity) container will result in better performance
+
+## BeeGFS - Tips 
+
+-   Prefer fewer, large files over many small ones
+-   Distribute reading and writing over several directories
+    -   Including compute job output and error files
+-   Use local scratch (/scratch) when possible
+-   Don't include anything in **/wynton** in your default LD_LIBRARY_PATH
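Following the tip to prefer fewer, larger files, one common approach is to bundle a directory of many small outputs into a single compressed archive before storing or transferring it. This is a minimal sketch using standard `tar`; the names `results_dir` and `results.tar.gz` are hypothetical.

```shell
#!/bin/sh
# Sketch: bundle a directory of many small files into one compressed archive.
# Hypothetical names: results_dir and results.tar.gz.
bundle_dir() {
  src_dir="$1"
  archive="$2"
  # -C switches to the parent directory so the archive stores relative paths
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

count_archive_files() {
  # List archive members and count entries that are not directories (no trailing /)
  tar -tzf "$1" | grep -vc '/$'
}

# Demo: create 50 small files, then bundle them
work="$(mktemp -d)"
mkdir -p "$work/results_dir"
i=1
while [ "$i" -le 50 ]; do
  echo "value $i" > "$work/results_dir/part_$i.txt"
  i=$((i + 1))
done
bundle_dir "$work/results_dir" "$work/results.tar.gz"
echo "files in archive: $(count_archive_files "$work/results.tar.gz")"
```

You can unpack the archive again with `tar -xzf results.tar.gz` when you need the individual files.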
+
+
+
+
+## Storage {.small-bullets}
+
+-   **Wynton storage is not backed up**
+-   /wynton/home/**[group_name]**/**[user]**
+    - User home directory - limited to 500 GiB
+-   /wynton/group/**[group_name]**
+    - User group directory - disk quota varies by group
+    - Use this directory for any analysis you want to share with your lab
+-   [More information on disk quotas](https://wynton.ucsf.edu/hpc/howto/storage-size.html#file-sizes-and-disk-quotas)
+
+To check your group disk quota run:
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo 'beegfs-ctl --getquota --storagepoolid=12 --gid "$(id --group)"'
+```
+
+
+
+## Scratch - Temporary Storage
+-   Local **/scratch** - 0.1-1.8 TiB/node storage unique to each compute node
+    - Can only be accessed from the specific compute node
+-   **/wynton/scratch** and **/wynton/protected/scratch** (for PHI users)
+    -   703 TiB storage accessible from everywhere
+-   No quotas
+    
+<br></br>    
+    
+**Files not used for 2 weeks are automatically deleted**
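To see which of your scratch files are at risk of the two-week cleanup, you can list files that have not changed recently. This sketch assumes that modification time is a reasonable stand-in for "not used"; the cluster's actual purge rule may use a different criterion, so treat the output as a rough guide.

```shell
#!/bin/sh
# Sketch: list files under a directory not modified for at least 14 days.
# Assumption: modification time is used as a stand-in for "not used".
stale_files() {
  # -mtime +13 matches files whose age, in whole days, is greater than 13
  find "$1" -type f -mtime +13
}

# Demo: one freshly created file and one with an old modification time
scratch_dir="$(mktemp -d)"
touch "$scratch_dir/fresh.txt"
touch -m -t 202001010000 "$scratch_dir/old.txt"
stale_files "$scratch_dir"
```

On Wynton you might point it at a personal directory you created under **/wynton/scratch**, e.g. `stale_files "/wynton/scratch/$USER"` (the `$USER` subdirectory is an assumption about how you organized your files).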
+
+
+
+## Gladstone HIVE
+-   Gladstone's HIVE storage server is mounted directly to Wynton under **/gladstone**
+    -   Only certain HIVE folders are accessible directly on Wynton
+    -   Files under **/gladstone** are backed up
+-   Naming: **/gladstone/[lab]**
+    -   Directories that are shared between multiple labs can be set up by contacting Gladstone IT   
+-   For more information visit the [IT knowledge base page](https://help.gladstone.org/support/solutions/articles/14000033963)
+
+
+## Storage Advice
+
+-   Always back up anything you store under **/wynton**
+-   If you have access to **/gladstone**, use it for all of your work since it is automatically backed up
+-   Use the scratch directories to store temporary files so they do not count against your group or user quotas
+
+
+# Data Transfer
+
+
+
+## Secure Copy - scp
+
+-   Local file to Wynton
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo "{local}$ scp /path/to/local_file.tsv alice@dt1.wynton.ucsf.edu:/destination/path"
+```
+
+-   Copy a directory to a folder on Wynton
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo "{local}$ scp -r local_folder/ alice@dt1.wynton.ucsf.edu:/destination/path"
+```
+
+-   Copy a single file from Wynton to your local machine
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo "{local}$ scp alice@dt1.wynton.ucsf.edu:/path/to/remote_file.tsv /local/destination/path"
+```
+
+
+## GUI SFTP Clients {.small-bullets}
+
+-   These let you transfer files to and from Wynton using a GUI
+-   [Two-factor authentication](https://wynton.ucsf.edu/hpc/get-started/duo-signup.html) may be required
+-   [Cyberduck](https://cyberduck.io/) 
+    -   Navigate to Preferences -> Transfers -> General
+    -   Change the Transfer Files setting to "Use browser connection" instead of "Open multiple connections"
+
+-   [FileZilla](https://filezilla-project.org/) 
+    -   In the General tab, select ‘SFTP’ as the Protocol instead of ‘FTP’
+    -   For Logon Type, select ‘Interactive’ instead of ‘Ask for Password’
+    -   Under the Transfer Settings tab, you might need to click the ‘Limit number of simultaneous connections’ and make sure the ‘Maximum number of connections’ is set to 1
+
+
+## Globus
+
+-   [Globus](https://wynton.ucsf.edu/hpc/transfers/globus.html) is a service for moving, syncing, and sharing large amounts of data
+-   Wynton Accounts are not required to transfer data with Globus
+-   Useful for transferring data between institutions
+
+## Rclone
+
+-   Rclone is a command-line program to manage files on remote storage
+-   Can be used to transfer data from Wynton directly to [Dropbox](https://rclone.org/dropbox/) or other storage systems (AWS, Azure, Google Drive, etc.)
+    -   Do this from a data transfer node using screen/tmux
+-   Do not use rclone for transfers to Box, follow the [Wynton to UCSF Box](https://wynton.ucsf.edu/hpc/transfers/ucsf-box.html) instructions
+
+
+
+
+## Poll 1 
+
+Which of these can you **not** log in to from your computer?
+
+1.    Login Nodes
+2.    Development Nodes
+3.    Data Transfer Nodes
+4.    Compute Nodes
+
+## Poll 2
+
+The **/wynton** directory is backed up on a nightly basis, so you do not need to back up the data you store there.
+
+1.    True
+2.    False
+
+
+
+
+# Installing Software
+
+## Basics
+
+-   Check if the tool is already available in a [module](https://wynton.ucsf.edu/hpc/software/software-repositories.html#software-repositories)
+-   Ensure the software you are trying to install is compatible with Rocky Linux 8 (use a container if not)
+-   <u>Always install software in a development node</u>
+-   Download a precompiled binary or [install from source](https://wynton.ucsf.edu/hpc/howto/install-from-source.html)
+
+## Install Samtools from Source  {.small-list}
+
+1.    Download and extract source code
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ mkdir -p "/scratch/$USER"
+[alice@dev1 ~]$ cd "/scratch/$USER"
+[alice@dev1 alice]$ wget https://github.com/samtools/samtools/releases/download/1.19.2/samtools-1.19.2.tar.bz2
+[alice@dev1 alice]$ tar -x -f samtools-1.19.2.tar.bz2'
+```
+
+2.    Enter the source directory, create the install location and configure

+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 alice]$ cd samtools-1.19.2'
+echo '[alice@dev1 samtools-1.19.2]$ mkdir -p "$HOME/software/samtools-1.19.2"'
+echo '[alice@dev1 samtools-1.19.2]$ ./configure --prefix="$HOME/software/samtools-1.19.2"'
+```
+3.    Build and install
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 samtools-1.19.2]$ make'
+echo '[alice@dev1 samtools-1.19.2]$ make install'
+```
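After `make install`, the samtools binary lives under the install prefix's `bin` directory, which is not on your `PATH` by default. This sketch prepends a prefix's `bin` directory only if it is not already present; it is demonstrated with a throwaway directory and a hypothetical stand-in `samtools` script rather than a real install.

```shell
#!/bin/sh
# Sketch: put "<prefix>/bin" at the front of PATH unless it is already there.
add_prefix_to_path() {
  prefix_bin="$1/bin"
  case ":$PATH:" in
    *":$prefix_bin:"*) ;;                            # already present: do nothing
    *) PATH="$prefix_bin:$PATH"; export PATH ;;      # prepend so it wins over system copies
  esac
}

# Demo with a throwaway prefix and a stand-in "samtools" script (hypothetical)
fake_prefix="$(mktemp -d)"
mkdir -p "$fake_prefix/bin"
printf '#!/bin/sh\necho "samtools (stand-in)"\n' > "$fake_prefix/bin/samtools"
chmod +x "$fake_prefix/bin/samtools"
add_prefix_to_path "$fake_prefix"
command -v samtools
```

To make the change permanent for a real install, you could add a line such as `export PATH="$HOME/software/samtools-1.19.2/bin:$PATH"` to your `~/.bashrc`.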
+
+## Install Nextflow
+
+-   Scientific workflow system with a community-maintained set of [core bioinformatics analysis](https://nf-co.re/) pipelines
+    -   We will cover an example RNA-seq pipeline in part 2   
+-   These can be configured to use the Wynton compute job submission system 
+
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ curl -s "https://get.sdkman.io" | bash'
+echo '[alice@dev1 ~]$ exit'
+echo '[alice@log1 ~]$ ssh dev1'
+echo '[alice@dev1 ~]$ sdk install java 17.0.6-tem'
+echo '[alice@dev1 ~]$ mkdir -p ~/software && cd ~/software'
+echo '[alice@dev1 software]$ wget -qO- https://get.nextflow.io | bash'
+echo '[alice@dev1 software]$ ./nextflow -v'
+```
+
+
+
+
+
+# Containers
+
+
+## Motivation {.small-bullets}
+
+-   Compute heavy jobs (high RAM, multiple cores) should be run on compute nodes
+-   Containers allow us to make additional software available to the compute nodes
+    -   Also allows the use of software that might be hard to install on Rocky 8 Linux
+    -   Improves reproducibility
+
+![Compute Jobs](slide_materials/compute_job_workflow.svg)
+
+
+## Definitions {.small-bullets}
+
+-   **Virtualization:** When software mimics the functions of physical hardware to run virtual machines
+-   **Containers:** Implement virtualization at the operating-system level, using an *image* as their base
+-   **Images:**  An ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime
+
+
+## Apptainer {.small-bullets}
+
+-   Wynton supports [Apptainer](https://wynton.ucsf.edu/hpc/software/apptainer.html) (formerly Singularity) containers 
+-   [Docker](https://docs.docker.com/) is a commonly used tool for creating images; Docker images can easily be converted into Apptainer image files (.sif)
+
+-   `apptainer run <image_file>`
+    -   Run the predefined runscript within the container
+-   `apptainer exec <image_file> <command>`
+    -   Execute any command within the container
+-   `apptainer shell <image_file>`
+    -   Start an interactive shell within the container
+
+
+## Example Container - Hello World
+
+-   Run this command to convert the public Docker image to an Apptainer image file
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ apptainer pull docker://natalie23gill/hello-world:1.0'
+```
+
+- Execute the "hi" command in the container
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ apptainer exec hello-world_1.0.sif hi'
+```
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '    __  __     ____         _       __           __    __   __
+   / / / /__  / / /___     | |     / /___  _____/ /___/ /  / /
+  / /_/ / _ \/ / / __ \    | | /| / / __ \/ ___/ / __  /  / / 
+ / __  /  __/ / / /_/ /    | |/ |/ / /_/ / /  / / /_/ /  /_/  
+/_/ /_/\___/_/_/\____/     |__/|__/\____/_/  /_/\__,_/  (_) '
+```
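When no output file name is given, the pulled image file is named after the repository and tag, as with `hello-world_1.0.sif` above. The helper below reproduces that `repo_tag.sif` pattern so a script can predict the file name; it is a sketch based on the examples in these slides, not Apptainer's own logic, and the `latest` default for untagged references is an assumption.

```shell
#!/bin/sh
# Sketch: predict the .sif file name produced by "apptainer pull docker://..."
# using the repo_tag.sif pattern seen in the examples.
sif_name_for() {
  ref="${1#docker://}"              # drop the docker:// transport prefix
  case "$ref" in
    *:*) repo="${ref%:*}"; tag="${ref##*:}" ;;
    *)   repo="$ref";      tag="latest" ;;   # assumed default for untagged refs
  esac
  printf '%s_%s.sif\n' "${repo##*/}" "$tag"   # keep only the last path component
}

sif_name_for docker://natalie23gill/hello-world:1.0
sif_name_for docker://docker_hub_user/seurat-harmony:1.0
```

This simple split on `:` does not handle a registry host with a port number (e.g. `host:5000/image`) when no tag is given.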
+
+
+## Example Container
+
+-   This container has **figlet** installed which creates ASCII art from text input
+-   Try running this command to create your own using *exec*
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ apptainer exec hello-world_1.0.sif figlet your_text'
+```
+
+## Docker {.small-bullets}
+
+-   Docker uses Dockerfiles to specify image creation
+-   Preferred by the Gladstone Bioinformatics Core to create new images
+-   In part 2, we will go over how to build custom container images from Dockerfiles
+    -   If you want to follow along, [install the docker engine](https://docs.docker.com/engine/install/) following the instructions for your OS
+    -   Set up a free [DockerHub](https://hub.docker.com/) account to store your images
+-   To see the Dockerfile used to create the hello-world image, run:
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ apptainer exec hello-world_1.0.sif cat /Dockerfile'
+```
+
+
+# End of Part 1
+
+## Thank You!
+
+-   Please take some time to fill out the workshop survey if you are not attending part 2:   
+[https://www.surveymonkey.com/r/F75J6VZ](https://www.surveymonkey.com/r/F75J6VZ)
+
+
+## Upcoming Data Science Training Program Workshops
+
+
+[Introduction to Linear Mixed Effects Models](https://gladstone.org/events/introduction-linear-mixed-effects-models)   
+April 25-April 26, 2024 1-3pm PDT
+
+[Single Cell RNA-Seq Data Analysis](https://gladstone.org/events/single-cell-rna-seq-data-analysis)     
+April 29-April 30, 2024 9am-4pm PDT
+
+[Single Cell ATAC-Seq Data Analysis Part 1](https://gladstone.org/events/single-cell-atac-seq-data-analysis-part-1-1)    
+May 6-May 7, 2024 1-4pm PDT
+
+
+[Complete Schedule](https://gladstone.org/events?series=189)     
+
+
--- a/working-on-wynton-hpc/Working_on_Wynton_Part_2.Rmd
+++ b/working-on-wynton-hpc/Working_on_Wynton_Part_2.Rmd
@@ -0,0 +1,449 @@
+---
+title: "Working on Wynton - Part 2"
+author: "Natalie Elphick"
+date: "April 16th, 2024"
+knit: (function(input, ...) {
+    rmarkdown::render(
+      input,
+      output_dir = "../docs"
+    )
+  })
+output:
+  revealjs::revealjs_presentation:
+    theme: simple
+    highlight: default
+    css: style.css
+---
+
+```{r, setup, include=FALSE}
+
+```
+
+## 
+
+<center>*Press the ? key for tips on navigating these slides*</center>
+
+## Introductions
+
+**Natalie Elphick**    
+Bioinformatician I   
+
+**Alex Pico**     
+Bioinformatics Core Director
+
+
+## Target Audience
+-   Prior experience with UNIX command-line 
+
+
+
+## Part 2:
+
+1.    Custom Containers
+2.    Submitting Compute Jobs
+3.    Array Jobs
+4.    GPU Jobs
+5.    Running Pipelines
+6.    Jupyter Notebooks 
+7.    RStudio Server
+8.    How to get help
+
+
+
+
+# Custom Containers
+
+## Motivation {.small-bullets}
+
+-   Compute heavy jobs (high RAM, multiple cores) should be run on compute nodes
+-   Containers allow us to make additional software available to the compute nodes
+    -   Also allows the use of software that might be hard to install on Rocky 8 Linux
+    -   Improves reproducibility
+
+![Compute Jobs](slide_materials/compute_job_workflow.svg)
+
+
+
+
+## Dockerfile Basics
+
+-   Dockerfiles contain instructions to build an image in **layers**
+-   Layers are added using Dockerfile instruction syntax
+-   Images are built by navigating to the directory that contains the Dockerfile and running:
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo 'docker build .'
+```
+
+## Dockerfile Instructions {.small-bullets}
+-   First instruction is always **FROM** which specifies the base image
+    -   Base images are a starting point with some basics already installed like the OS and build tools, find them on [DockerHub](https://hub.docker.com/)
+-   **RUN** : Run shell commands while building the image
+-   **SHELL** : Set the default shell used by subsequent shell-form instructions
+-   **USER** : Set the user (within the image)
+-   **CMD** : Set the default command run when a container starts from the image
+-   **COPY** : Copy files from the build context into the image
+
+
+See the [Dockerfile documentation](https://docs.docker.com/reference/dockerfile/) for a full list of instructions
+
+## Example Dockerfile {.code-alt}
+
+-   Click [here](https://www.dropbox.com/scl/fi/mdbefp3h8ahdvxtgjypqo/Dockerfile?rlkey=7d4zd9ge1m3wwszlfy78712ky&dl=1) to download the example Dockerfile
+-   Open it in your preferred text editor
+
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+curl -s -L -o Dockerfile 'https://www.dropbox.com/scl/fi/mdbefp3h8ahdvxtgjypqo/Dockerfile?rlkey=7d4zd9ge1m3wwszlfy78712ky&dl=1'
+cat Dockerfile
+rm Dockerfile
+```
+
+## Building Example Image
+
+-   Do not run this during the workshop
+    -   It requires a lot of RAM
+-   On macOS, make sure you have the Docker Desktop App running
+-   Add the `-t` option to the **build** command to set the name of the Docker image
+      -   We can add version tags after the name using ":" 
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo "docker build -t docker_hub_user/seurat-harmony:1.0 ."
+```
+
+
+## Pushing Images to DockerHub  {.small-bullets}
+
+-   Make sure you are signed in to your DockerHub account locally (Docker Desktop for macOS)
+-   The image name must start with your DockerHub username
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo "docker push docker_hub_user/seurat-harmony:1.0"
+```
+
+-   These can then be "pulled" onto Wynton as Apptainer image files (the image must be public)
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo "[alice@dev1 ~]$ apptainer pull docker://docker_hub_user/seurat-harmony:1.0"
+```
+
+## Notes on Building Custom Images {.code-small}
+
+-   Time consuming process and uses a lot of RAM on your local machine
+-   A good base image can save you a lot of time
+-   Run `apt-get update` and `apt-get install` in the same `RUN` instruction
+    -   Otherwise Docker may reuse a cached, outdated package index and the install can fail
+-   Remember to use `apt-get install -y`
+    -   The build is non-interactive, so you cannot answer installation prompts
+
+# Compute Jobs
+
+
+## Submission Script - Basics {.small-bullets .code-alt}
+
+-   [Download](https://www.dropbox.com/scl/fi/fzp33y1ojslw005q8epuz/simple_submission_script.sh?rlkey=xmg3lqec962y3i57a1bkriosx&dl=1) this example job submission script
+-   Read the full Wynton [job submission guide](https://wynton.ucsf.edu/hpc/scheduler/submit-jobs.html)
+-   Wynton uses the [Sun Grid Engine](https://web.archive.org/web/20210826212738/https://arc.liv.ac.uk/SGE/howto/howto.html) job scheduler
+
+
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+curl -s -L -o submission.sh 'https://www.dropbox.com/scl/fi/fzp33y1ojslw005q8epuz/simple_submission_script.sh?rlkey=xmg3lqec962y3i57a1bkriosx&dl=1'
+cat submission.sh
+rm submission.sh
+```
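Building on the local-scratch tip from part 1, a job script can stage its input into node-local scratch, do the heavy I/O there, and copy only the result back to shared storage. This sketch assumes SGE sets `TMPDIR` to a per-job directory on the compute node (falling back to `/tmp` outside a job); the `sort` step and the resource values in the `#$` directives are placeholders.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -l mem_free=1G
#$ -l h_rt=00:10:00

# Copy input to a private scratch directory, process it there, copy the result back.
process_in_scratch() {
  input="$1"
  output="$2"
  workdir="$(mktemp -d "${TMPDIR:-/tmp}/job_${JOB_ID:-local}_XXXXXX")" || return 1
  cp "$input" "$workdir/input.txt"
  # Placeholder analysis step: replace with your real command
  sort "$workdir/input.txt" > "$workdir/result.txt"
  cp "$workdir/result.txt" "$output"
  # Clean up so node-local scratch does not fill up
  rm -rf "$workdir"
}

# Demo with a small temporary input file
input_file="$(mktemp)"
printf 'c\na\nb\n' > "$input_file"
process_in_scratch "$input_file" "${input_file}.sorted"
cat "${input_file}.sorted"
```

The `#$` lines are ordinary comments to the shell, so the same file can be tested with `bash` on a development node before it is submitted with `qsub`.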
+
+## Submission Script - Apptainer {.small-bullets .code-alt}
+
+-   [Download](https://www.dropbox.com/scl/fi/zzl9fnfcoxu3pyrx5ffd1/apptainer_submission_script.sh?rlkey=w05e18ahw4hvbvaucac379za9&dl=1) this example job submission script that uses a container
+-   Paths that the container needs read/write access to must be bound into it using the APPTAINER_BINDPATH environment variable
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+curl -s -L -o submission.sh 'https://www.dropbox.com/scl/fi/zzl9fnfcoxu3pyrx5ffd1/apptainer_submission_script.sh?rlkey=w05e18ahw4hvbvaucac379za9&dl=1'
+cat submission.sh
+rm submission.sh
+```
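Since `APPTAINER_BINDPATH` takes a comma-separated list of paths, a small helper can assemble it from the directories a job needs and skip any that do not exist (binding a missing host path would typically cause an error). The directories in the demo are temporary stand-ins for your real data and output paths.

```shell
#!/bin/sh
# Join existing directories into a comma-separated bind list; warn about missing ones.
build_bindpath() {
  joined=""
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      joined="${joined:+$joined,}$dir"
    else
      echo "warning: skipping missing directory: $dir" >&2
    fi
  done
  printf '%s\n' "$joined"
}

# Demo with temporary stand-ins for data and output directories
data_dir="$(mktemp -d)"
out_dir="$(mktemp -d)"
APPTAINER_BINDPATH="$(build_bindpath "$data_dir" "$out_dir")"
export APPTAINER_BINDPATH
echo "APPTAINER_BINDPATH=$APPTAINER_BINDPATH"
```

In a real submission script you would set the variable this way before calling `apptainer exec`.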
+
+## Parallel Processing Jobs {.small-bullets}
+
+-   By default jobs run on a single core
+-   Multicore jobs must run in an SGE parallel environment (PE) and tell SGE how many cores the job will use
+-   **Do not use more cores than requested**
+
+
+-   There are four parallel environments on Wynton:
+    -   **smp**: for single-host parallel jobs using [Symmetric multiprocessing](https://en.wikipedia.org/wiki/Symmetric_multiprocessing) (SMP)
+    -   **mpi**: for multiple-host parallel jobs based on [MPI parallelization](https://en.wikipedia.org/wiki/Message_Passing_Interface)
+    -   **mpi_onehost**: for single-host parallel jobs based on MPI parallelization
+    -   **mpi-8**: for multi-threaded multi-host jobs based on MPI parallelization
+
+## Example Parallel Job {.small-bullets .code-alt}
+
+-   The simplest parallel environment on Wynton is **smp**, a single node with *n* cores
+-   [Download](https://www.dropbox.com/scl/fi/71xo0cioh266pj3uwcdps/smp_submission_script.sh?rlkey=kw7qaz8pip6jveqv317b5swqr&dl=1) this example smp job submission script
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+curl -s -L -o submission.sh 'https://www.dropbox.com/scl/fi/71xo0cioh266pj3uwcdps/smp_submission_script.sh?rlkey=kw7qaz8pip6jveqv317b5swqr&dl=1'
+cat submission.sh
+rm submission.sh
+```
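Inside an **smp** job, SGE reports the number of granted cores in the `NSLOTS` environment variable, so a script can pass that value to a tool's thread option instead of hard-coding it and risking using more cores than requested. The `-pe smp 4` request and the `samtools sort -@` line are illustrative; the fallback of 1 thread covers running the script outside SGE.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe smp 4

# SGE sets NSLOTS to the number of granted slots; default to 1 outside a job
thread_count() {
  echo "${NSLOTS:-1}"
}

threads="$(thread_count)"
echo "Using ${threads} thread(s)"
# Pass the value to your tool, e.g.: samtools sort -@ "$threads" -o sorted.bam input.bam
```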
+
+
+## Array Jobs {.small-bullets .code-alt}
+
+-   This is a good option if the script you want to run operates on discrete sets of data
+    - e.g. sample or chromosome
+-   [Download](https://www.dropbox.com/scl/fi/upl71jeny62fxfzkxao1f/array_job_submission_script.sh?rlkey=ggkyjxx8nz400e1t96mif5t34&dl=1) this example array job submission script
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+curl -s -L -o submission.sh 'https://www.dropbox.com/scl/fi/upl71jeny62fxfzkxao1f/array_job_submission_script.sh?rlkey=ggkyjxx8nz400e1t96mif5t34&dl=1'
+cat submission.sh
+rm submission.sh
+```
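A common array-job pattern is to keep one sample name per line in a text file and let each task pick its own line using `SGE_TASK_ID`, which SGE sets to the task's index (1 to 3 with `-t 1-3`). The sample sheet here is a temporary, hypothetical file; outside SGE the script defaults to task 1.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -t 1-3

# Print line number $1 of file $2
pick_line() {
  sed -n "${1}p" "$2"
}

# Demo sample sheet (hypothetical): one sample name per line
samples_file="$(mktemp)"
printf 'sampleA\nsampleB\nsampleC\n' > "$samples_file"

# SGE sets SGE_TASK_ID for each array task; default to 1 outside the scheduler
task_id="${SGE_TASK_ID:-1}"
sample="$(pick_line "$task_id" "$samples_file")"
echo "Task ${task_id} processes ${sample}"
```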
+
+## GPU Jobs  {.small-bullets}
+
+-   To run a [GPU job](https://wynton.ucsf.edu/hpc/scheduler/gpu.html), submit it to a GPU queue with **-q gpu.q**
+    -   Other GPU queues may be available to you depending on your lab
+-   Use the **SGE_GPU** environment variable so that your job only uses the GPU it was assigned
+    -   For CUDA-based tools, add **export CUDA_VISIBLE_DEVICES=$SGE_GPU** to your submission script
+-   GPU jobs must include a runtime request or they will be removed from the queue
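The GPU points above can be combined near the top of a submission script: request the GPU queue and a runtime, then fail early if `SGE_GPU` is missing rather than letting a CUDA tool pick an arbitrary device. The queue name comes from the slide; the 2-hour `h_rt` runtime is a placeholder.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q gpu.q
#$ -l h_rt=02:00:00

# Restrict CUDA tools to the GPU(s) SGE assigned to this job
set_cuda_devices() {
  if [ -z "${SGE_GPU:-}" ]; then
    echo "SGE_GPU is not set; is this running inside a GPU job?" >&2
    return 1
  fi
  CUDA_VISIBLE_DEVICES="$SGE_GPU"
  export CUDA_VISIBLE_DEVICES
}

# Only configure the GPU when running under the scheduler
if [ -n "${SGE_GPU:-}" ]; then
  set_cuda_devices
  echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
fi
```

In a real GPU job you might instead call `set_cuda_devices || exit 1` unconditionally, so a misconfigured job stops immediately.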
+
+
+## Submitting and Querying jobs
+
+-   Use **qsub** to submit jobs
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ qsub job1.sh
+Your job 714888 ("job1.sh") has been submitted'
+```
+
+
+
+
+-   Use **qstat** to check the status of your jobs
+
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ qstat
+job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
+-----------------------------------------------------------------------------------------------------------------
+ 714888 0.06532 job1 alice     r     03/25/2024 19:54:18 member.q@msg-hmio1                 1        
+ 714889 0.06532 job2 alice     r     03/25/2024 19:54:19 member.q@msg-hmio1                 1        
+'
+```
+
+
+Read the [querying jobs](https://wynton.ucsf.edu/hpc/scheduler/list-jobs.html) Wynton documentation for more information.
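When you have many jobs queued, a short `awk` filter over `qstat` output can summarize how many are in each state (e.g. `r` for running, `qw` for queued and waiting). This sketch assumes the default `qstat` layout shown above, with two header lines and the state in the fifth column; it is demonstrated on the example output from the slide.

```shell
#!/bin/sh
# Tally jobs per state from qstat output read on stdin.
count_jobs_by_state() {
  # Skip the header and dashed separator lines; column 5 holds the state
  awk 'NR > 2 && NF >= 5 { counts[$5]++ } END { for (s in counts) print s, counts[s] }'
}

# Demo on the example output above (on Wynton: qstat | count_jobs_by_state)
sample_qstat='job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 714888 0.06532 job1 alice     r     03/25/2024 19:54:18 member.q@msg-hmio1                 1
 714889 0.06532 job2 alice     r     03/25/2024 19:54:19 member.q@msg-hmio1                 1'
printf '%s\n' "$sample_qstat" | count_jobs_by_state
```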
+
+
+## Estimating Job Resources
+
+-   Try to estimate the amount of RAM needed using a small test dataset
+-   Request a little more RAM than you need to avoid having your job cancelled
+-   Check on jobs you are running for the first time with `qstat -j <job-id>` to make sure they are not using more memory than requested
+
+
+## Poll 3
+
+Anything that you can run on a compute node can be run on a development node.
+
+1.    True
+2.    False
+
+
+
+# Running Pipelines
+
+
+## Nextflow RNA-seq {.small-bullets .big-picture}
+
+-   Scientific workflow system with a community-maintained set of core bioinformatics [analysis pipelines](https://nf-co.re/)
+-   The most commonly used one is the [RNA-seq pipeline](https://nf-co.re/rnaseq/3.14.0)
+
+![RNA-seq](slide_materials/nf-core-rnaseq_metro_map_grey.png)
+
+
+## Example - RNA-seq Pipeline {.small-bullets}
+
+**Do not run this during the workshop as it will fill up the Wynton SGE queue**
+
+-   Download the [testing script](https://www.dropbox.com/scl/fi/3c9qdmnwg8vw9x517mo05/nextflow_example.sh?rlkey=e9nxbvpcdtdyi5w0y16z9k7bq&dl=0)
+    -   Runs a minimal test on the RNA-seq pipeline
+-   Download the [config file](https://www.dropbox.com/scl/fi/befhl3z6nipn1fqcxpqth/nextflow.config?rlkey=pd8d9vup6pnvb7bbrmekayn2j&dl=0)
+    -   Configures nextflow to use the SGE job scheduler and sets limits on compute job resources for each process
+-   Put these in the same directory (do not use your user home directory for this) and run the script in a screen/tmux session
+-   For real (non-test) runs, set **-profile** to apptainer
+
+
+
+
+
+
+
+# Jupyter Notebooks 
+
+## Installing Jupyter Notebooks
+-   The preferred way to install and use [Jupyter notebooks](https://wynton.ucsf.edu/hpc/howto/jupyter.html) on Wynton is through pip, not conda
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo 'python3 -m pip install --user notebook'
+```
+-   Jupyter notebooks can only be run on development nodes
+-   See the Wynton [python documentation](https://wynton.ucsf.edu/hpc/howto/python.html) for more info on managing python environments on Wynton
+
+
+## Running Jupyter Notebooks - Step 1
+
+-   You cannot connect from outside Wynton HPC directly to a development node
+    -   Instead we need to use SSH port forwarding to establish the connection with a local web browser
+-   Find an available TCP port:
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ module load CBI port4me
+[alice@dev1 ~]$ port4me --tool=jupyter
+47467'
+```
+
+Note the port number returned by port4me; you will need it later.
+
+
+## Running Jupyter Notebooks - Step 2 {.code-small}
+-   Launch Jupyter notebook using the port number from step 1
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1]$ jupyter notebook --no-browser --port 47467
+[I 2024-03-20 14:48:45.693 ServerApp] jupyter_lsp | extension was successfully linked.
+[I 2024-03-20 14:48:45.698 ServerApp] jupyter_server_terminals | extension was successfully linked.
+[I 2024-03-20 14:48:45.703 ServerApp] jupyterlab | extension was successfully linked.
+[I 2024-03-20 14:48:45.708 ServerApp] notebook | extension was successfully linked.
+[I 2024-03-20 14:48:46.577 ServerApp] notebook_shim | extension was successfully linked.
+[I 2024-03-20 14:48:46.666 ServerApp] notebook_shim | extension was successfully loaded.
+[I 2024-03-20 14:48:46.668 ServerApp] jupyter_lsp | extension was successfully loaded.
+[I 2024-03-20 14:48:46.669 ServerApp] jupyter_server_terminals | extension was successfully loaded.
+[I 2024-03-20 14:48:46.675 LabApp] JupyterLab extension loaded from /wynton/home/boblab/alice/.local/lib/python3.11/site-packages/jupyterlab
+[I 2024-03-20 14:48:46.675 LabApp] JupyterLab application directory is /wynton/home/boblab/alice/.local/share/jupyter/lab
+[I 2024-03-20 14:48:46.677 LabApp] Extension Manager is 'pypi'.
+[I 2024-03-20 14:48:46.707 ServerApp] jupyterlab | extension was successfully loaded.
+[I 2024-03-20 14:48:46.711 ServerApp] notebook | extension was successfully loaded.
+[I 2024-03-20 14:48:46.712 ServerApp] Serving notebooks from local directory: /wynton/home/boblab/alice
+[I 2024-03-20 14:48:46.712 ServerApp] Jupyter Server 2.13.0 is running at:
+[I 2024-03-20 14:48:46.712 ServerApp] http://localhost:47467/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
+[I 2024-03-20 14:48:46.712 ServerApp]     http://127.0.0.1:47467/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
+[I 2024-03-20 14:48:46.712 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
+[C 2024-03-20 14:48:46.725 ServerApp]
+
+    To access the server, open this file in a browser:
+        file:///wynton/home/boblab/alice/.local/share/jupyter/runtime/jpserver-2853162-open.html
+    Or copy and paste one of these URLs:
+        http://localhost:47467/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
+        http://127.0.0.1:47467/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2'
+```
+
+## Running Jupyter Notebooks - Step 3
+
+-   In a separate terminal on your local machine, set up SSH port forwarding and leave both terminals open
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '{local}$ ssh -J alice@log1.wynton.ucsf.edu -L 47467:localhost:47467 alice@dev1
+...
+[alice@dev1 ~]$ '
+```
+
+The notebook should now be available in your local web browser at the `localhost` URL from step 2.
+
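+## Running Jupyter Notebooks - Tunnel Command Sketch {.code-small}
+
+All three steps must use the same port number. As a minimal sketch, assuming a bash shell on your local machine, the hypothetical `tunnel_cmd` function below (not part of port4me or any Wynton tool) prints the step 3 SSH command for a given username, port, and development node:
+
+```bash
+# Hypothetical helper (not a Wynton or port4me command).
+# On dev1, after `module load CBI port4me`, the port could be captured with:
+#   PORT=$(port4me --tool=jupyter)
+# Prints the local SSH port-forwarding command for a Jupyter server on <node>:<port>
+tunnel_cmd() {
+  local user="$1" port="$2" node="$3"
+  printf 'ssh -J %s@log1.wynton.ucsf.edu -L %s:localhost:%s %s@%s\n' \
+    "$user" "$port" "$port" "$user" "$node"
+}
+
+tunnel_cmd alice 47467 dev1
+```
+
+Run the printed command in a terminal on your local machine; it matches the command shown in step 3.
+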
+
+# RStudio Server
+
+## RStudio Server
+-   [RStudio Server](https://wynton.ucsf.edu/hpc/howto/rstudio.html) is available through the CBI software repository via the `rstudio-server-controller` module
+-   This lets you run a personal RStudio Server instance that only you can access
+-   Requires two separate SSH connections to the cluster:
+    -   One to launch RStudio Server
+    -   One to connect to it
+
+
+## RStudio Server - Step 1 {.code-small}
+
+-   Launch your own RStudio Server instance
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '[alice@dev1 ~]$ module load CBI rstudio-server-controller
+[alice@dev1 ~]$ rsc start
+alice, your personal RStudio Server 2023.09.1-494 running R 4.3.2 is available on:
+
+  <http://127.0.0.1:20612>
+
+Importantly, if you are running from a remote machine without direct access
+to dev1, you need to set up SSH port forwarding first, which you can do by
+running:
+
+  ssh -L 20612:dev1:20612 alice@log1.wynton.ucsf.edu
+
+in a second terminal from your local computer.
+
+Any R session started times out after being idle for 120 minutes.
+WARNING: You now have 10 minutes, until 2023-11-15 17:06:50-08:00, to
+connect and log in to the RStudio Server before everything times out.
+Your one-time random password for RStudio Server is: y+IWo7rfl7Z7MRCPI3Z4'
+```
+
+Note the password and URL; you will need them to log in to your RStudio Server instance within the next 10 minutes.
+
+
+## RStudio Server - Step 2
+
+-   In a separate terminal on your local machine, set up SSH port forwarding to your RStudio Server instance
+```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
+echo '{local}$ ssh -L 20612:dev1:20612 alice@log1.wynton.ucsf.edu
+alice@log1.wynton.ucsf.edu'\''s password: XXXXXXXXXXXXXXXXXXX
+[alice@log1 ~]$ '
+```
+
+## RStudio Server - Step 3
+-   Open the URL from step 1 (`http://127.0.0.1:20612` in this example) in your local web browser
+-   Enter your Wynton username
+-   Enter the one-time password from step 1
+
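+## RStudio Server - Tunnel Command Sketch {.code-small}
+
+The port to forward is the one in the URL printed by `rsc start`. As a minimal sketch, assuming a bash shell, the hypothetical `rsc_tunnel_cmd` function below (not part of rstudio-server-controller) extracts that port and prints the step 2 SSH command:
+
+```bash
+# Hypothetical helper (not an rstudio-server-controller command).
+# Prints the local SSH port-forwarding command for an RStudio Server URL on <node>
+rsc_tunnel_cmd() {
+  local url="$1" user="$2" node="$3"
+  local port="${url##*:}"  # text after the last ':' (e.g. 20612 or 20612/)
+  port="${port%%/*}"       # drop any trailing path
+  printf 'ssh -L %s:%s:%s %s@log1.wynton.ucsf.edu\n' \
+    "$port" "$node" "$port" "$user"
+}
+
+rsc_tunnel_cmd "http://127.0.0.1:20612" alice dev1
+```
+
+The output matches the command suggested by `rsc start` in step 1.
+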
+# How to Get Help
+
+## Wynton Questions
+
+-   Follow the Wynton [question checklist](https://wynton.ucsf.edu/hpc/support/index.html)
+-   Email
+    -   [support@wynton.ucsf.edu](mailto:support@wynton.ucsf.edu)
+-   Slack
+    -   [ucsf-wynton](https://join.slack.com/t/ucsf-wynton/signup)
+    -   Sign up using a UCSF email address
+    -   Email support if that does not work
+-   Zoom office hours every **Tuesday, 11am-12pm**
+    -   The Zoom URL is in the message of the day (MOTD) shown when you log in to Wynton
+
+
+
+## Bioinformatics Questions
+
+-   Email
+    -   [bioinformatics@gladstone.ucsf.edu](mailto:bioinformatics@gladstone.ucsf.edu)
+-   Slack channel #questions-about-bioinformatics
+    -   Contact us at the email above to be added to the channel
+
+
+
+# End of Part 2
+
+## Thank You!
+
+-   Please take some time to fill out the workshop survey:   
+[https://www.surveymonkey.com/r/F75J6VZ](https://www.surveymonkey.com/r/F75J6VZ)
+
+
+## Upcoming Data Science Training Program Workshops
+
+
+[Introduction to Linear Mixed Effects Models](https://gladstone.org/events/introduction-linear-mixed-effects-models)   
+April 25-April 26, 2024 1-3pm PDT
+
+[Single Cell RNA-Seq Data Analysis](https://gladstone.org/events/single-cell-rna-seq-data-analysis)     
+April 29-April 30, 2024 9am-4pm PDT
+
+[Single Cell ATAC-Seq Data Analysis Part 1](https://gladstone.org/events/single-cell-atac-seq-data-analysis-part-1-1)    
+May 6-May 7, 2024 1-4pm PDT
+
+
+[Complete Schedule](https://gladstone.org/events?series=189)     
+
+
+
--- a/working-on-wynton-hpc/renv.lock
+++ b/working-on-wynton-hpc/renv.lock
--- a/working-on-wynton-hpc/renv/.gitignore
+++ b/working-on-wynton-hpc/renv/.gitignore
@ -0,0 +1,7 @@
+library/
+local/
+cellar/
+lock/
+python/
+sandbox/
+staging/
--- a/working-on-wynton-hpc/renv/activate.R
+++ b/working-on-wynton-hpc/renv/activate.R
--- a/working-on-wynton-hpc/renv/settings.json
+++ b/working-on-wynton-hpc/renv/settings.json
@ -0,0 +1,19 @@
+{
+  "bioconductor.version": null,
+  "external.libraries": [],
+  "ignored.packages": [],
+  "package.dependency.fields": [
+    "Imports",
+    "Depends",
+    "LinkingTo"
+  ],
+  "ppm.enabled": null,
+  "ppm.ignored.urls": [],
+  "r.version": null,
+  "snapshot.type": "implicit",
+  "use.cache": true,
+  "vcs.ignore.cellar": true,
+  "vcs.ignore.library": true,
+  "vcs.ignore.local": true,
+  "vcs.manage.ignores": true
+}
--- a/working-on-wynton-hpc/slide_materials/Gladstone_Wynton_Docs.pdf
+++ b/working-on-wynton-hpc/slide_materials/Gladstone_Wynton_Docs.pdf
--- a/working-on-wynton-hpc/slide_materials/HPC_diagram.png
+++ b/working-on-wynton-hpc/slide_materials/HPC_diagram.png
--- a/working-on-wynton-hpc/slide_materials/compute_job_workflow.svg
+++ b/working-on-wynton-hpc/slide_materials/compute_job_workflow.svg
--- a/working-on-wynton-hpc/slide_materials/nf-core-rnaseq_metro_map_grey.png
+++ b/working-on-wynton-hpc/slide_materials/nf-core-rnaseq_metro_map_grey.png
--- a/working-on-wynton-hpc/slide_materials/status-chart.png
+++ b/working-on-wynton-hpc/slide_materials/status-chart.png
--- a/working-on-wynton-hpc/style.css
+++ b/working-on-wynton-hpc/style.css
@ -0,0 +1,175 @@
+
+/* Set font size and alignment for the message */
+.bottom-message {
+    font-size: 0.8em !important;
+    font-style: italic !important;
+    text-align: center;
+    position: relative;
+    bottom: 0 !important;
+    left: 0;
+    right: 0;
+}
+
+.reveal code {
+    background-color: #1e1e1eef; /* Dark background for code chunks */
+    color: white; /* White text for code */
+    font-size: 1.2em;
+    line-height: 1.2;
+}
+
+.code-small code {
+    background-color: #1e1e1eef; /* Dark background for code chunks */
+    color: white; /* White text for code */
+    font-size: 1em;
+    line-height: 1;
+}
+
+.reveal code::selection {
+    background-color: #d97306 !important; /* Dark orange background for selected text */
+}
+
+
+/* Specific styles for code output: background */
+.reveal code.output {
+    background-color: black; /* Black background for code outputs */
+}
+
+/* Custom class for code alt display */
+.code-alt code {
+    background-color: #ffecd0ac; /* Light peach background for alternate code display */
+    max-height: 400px !important;
+    font-family: 'Menlo', monospace;
+    font-size: 0.8em;
+    color: rgb(76, 76, 76);
+}
+
+
+
+/* Code output text color */
+.reveal pre code.output {
+    color: white;
+}
+
+  
+/* Left-align all code outputs */
+.reveal pre code {
+  text-align: left !important;
+}
+
+
+/* Add horizontal scrolling to all code outputs */
+.reveal pre code.output {
+    white-space: pre !important;
+    overflow-x: auto !important;
+}
+/* Change the font family used for code blocks */
+pre, code, kbd, samp {
+    font-family: "Courier New", Courier, monospace;
+}
+  
+  
+/* Add horizontal scrolling to all code chunks */
+.reveal pre code {
+    white-space: pre !important;
+    overflow-x: auto !important;
+}
+
+/* Change the font family used for all text except code */
+.reveal p, .reveal li, .reveal h1, .reveal h2, .reveal h3, .reveal h4, .reveal h5, .reveal h6 {
+    font-family: "Helvetica", sans-serif;
+}
+
+.reveal h3 {
+    color: black;
+    font-size: 0.7em;
+}
+/* Bold slide titles and change color */
+.reveal h2 {
+    font-weight: bold !important;
+    color: #9c0366;
+    font-size: 1.3em;
+}
+/* Bold section titles and change color */
+.reveal h1 {
+    font-weight: bold !important;
+    color: #9c0366;
+    font-size: 2.0em;
+}
+
+.reveal .slides>section:first-child h2 {
+    color: black;
+    font-weight: normal !important;
+}
+
+.reveal .slides>section:first-child h1 {
+    font-weight: bold !important;
+    color: #9c0366;
+}
+
+
+.reveal p {
+    text-align: left;
+    margin-left: 20px !important;
+  }
+.reveal ul {
+    display: block;
+    margin-left: 75px !important;
+    margin-right: 50px !important;
+}
+
+.reveal ul ul {
+    font-size: 0.75em; /* Smaller font size */
+    margin-bottom: 5px !important;
+}
+.reveal ol {
+    display: block;
+    margin-bottom: 20px;
+    margin-left: 75px;
+    margin-right: 50px;
+}
+
+
+/* Decrease size of image, remove border, shadow and center align*/
+.reveal img {
+    max-width: 60%;
+    border: none !important;
+    box-shadow: none !important;
+    display: block !important;
+    margin: 0 auto !important;
+}
+
+small {
+    font-size: 70%;
+}
+
+/* Custom classes for smaller bullet and numbered list text */
+.small-bullets ul {
+    font-size: 85%;
+}
+.small-list ol {
+    font-size: 80%;
+}
+
+.reveal li {
+    margin-bottom: 10px !important;
+}
+
+
+.less-small-bullets ul {
+    font-size: 80%;
+}
+
+.big-picture img{
+    max-width: 95%;
+
+}
+
+/* Change link color to sky blue */
+.reveal a {
+    color: #0c74dc;
+}
+
+/* Change link color to magenta on hover */
+.reveal a:hover {
+    color: #9c0366 !important;
+}
--- a/working-on-wynton-hpc/working-on-wynton-hpc.Rproj
+++ b/working-on-wynton-hpc/working-on-wynton-hpc.Rproj
@ -0,0 +1,13 @@
+Version: 1.0
+
+RestoreWorkspace: Default
+SaveWorkspace: Default
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: Sweave
+LaTeX: pdfLaTeX