+
+
+Motivation
+
+- Compute-heavy jobs (high RAM, multiple cores) should be run on
+compute nodes
+- Containers allow us to make additional software available to the
+compute nodes
+
+- Containers also allow the use of software that might be hard to install on
+Rocky 8 Linux
+- Improves reproducibility
+
+
+[Figure: Compute Jobs]
+
+
+Dockerfile Basics
+
+- Dockerfiles contain instructions to build an image in
+layers
+- Layers are added by Dockerfile instructions such as RUN and COPY
+- Images are built by navigating to the directory that contains the
+Dockerfile and running:
+
+docker build .
+
+
+Dockerfile Instructions
+
+- The first instruction is always FROM, which specifies
+the base image
+
+- Base images are a starting point with some basics already installed,
+like the OS and build tools; you can find them on DockerHub
+
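+- RUN: run a shell command while building the image
+- SHELL: set the shell used by later RUN instructions
+- USER: set the user (within the image) for later instructions and the running container
+- CMD: set the default command run when the container starts
+- COPY: copy files from the build directory into the image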
+
+See the Dockerfile
+documentation for a full list of instructions
+
+
+Example Dockerfile
+
+- Click here
+to download the example Dockerfile
+- Open it in your preferred text editor
+
+# Bioconductor base image gives us access to a lot of bioinformatics tools and R packages.
+FROM bioconductor/bioconductor_docker:RELEASE_3_17
+
+# Shell options: use bash with pipefail so that a failing command anywhere in a pipeline fails the build
+SHELL ["/bin/bash", "-o", "pipefail", "-c"]
+
+# Root permissions are required to install packages
+USER root
+
+
+# Install any UNIX packages you need
+# First we update the package list, upgrade installed packages, and then install GNU make
+# We clean up after ourselves to reduce the image size
+RUN apt-get update && apt-get upgrade -y \
+ && apt-get install -y --no-install-recommends make \
+ && apt-get clean \
+ && rm -rf /var/lib/apt/lists/*
+
+# Install Seurat and harmony
+RUN Rscript -e 'install.packages(c("Seurat","harmony"))'
+# Check if installs worked
+RUN Rscript -e 'lapply(c("Seurat","harmony"), library, character.only = TRUE)'
+
+
+# Create a non-root user and group so the container does not run as root (avoids permission issues)
+RUN groupadd -g 10001 notroot && \
+ useradd -u 10000 -g notroot notroot
+
+# Switch to the non-root user
+USER notroot:notroot
+
+# Default command to run when the container starts
+CMD ["/bin/bash"]
+
+# Copy the Dockerfile into the image (optional, but useful for reproducibility)
+COPY Dockerfile /Dockerfile
+
+
+Building Example Image
+
+- Do not run this during the workshop
+
+- It requires a lot of RAM
+
+- On macOS, make sure you have the Docker Desktop App running
+- We can provide an additional argument to the build
+command, -t, to set the name of the docker image
+
+- We can add version tags after the name using “:”
+
+
+docker build -t docker_hub_user/seurat-harmony:1.0 .
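+On an Apple Silicon Mac, Docker Desktop builds linux/arm64 images by default. Those generally will not run on x86_64
+compute nodes. This sketch assumes Wynton's nodes are x86_64 and asks docker build for linux/amd64 with --platform.
+Emulated builds are noticeably slower. build_image and DRY_RUN are hypothetical names, not part of Docker:
```shell
#!/bin/bash
# Hypothetical helper: build the image for linux/amd64 (the assumed cluster
# architecture), even when running on an arm64 Mac.
build_image() {
  local image="$1"  # e.g. docker_hub_user/seurat-harmony:1.0
  local cmd=(docker build --platform linux/amd64 -t "$image" .)
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Only print the command, to check the tag before a long build
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (from the directory that contains the Dockerfile):
#   DRY_RUN=1 build_image docker_hub_user/seurat-harmony:1.0
#   build_image docker_hub_user/seurat-harmony:1.0
```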
+
+
+Pushing Images to DockerHub
+
+- Make sure you are signed in to your DockerHub account locally
+(Docker Desktop for macOS)
+- The image name must start with your DockerHub user name
+
+docker push docker_hub_user/seurat-harmony:1.0
+
+- These can then be “pulled” onto Wynton as Apptainer image files
+(the image must be public)
+
+[alice@dev1 ~]$ apptainer pull docker://docker_hub_user/seurat-harmony:1.0
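+As we understand it, apptainer pull names the file after the image and tag by default. The command above should
+therefore produce seurat-harmony_1.0.sif. A minimal sketch of that naming rule, for predicting the file name in
+job scripts (sif_name is a hypothetical helper):
```shell
#!/bin/bash
# Hypothetical helper: predict the default .sif name created by
# `apptainer pull docker://user/image:tag`, i.e. image_tag.sif,
# with the tag defaulting to "latest".
sif_name() {
  local uri="${1#docker://}"    # drop the docker:// prefix
  local name_tag="${uri##*/}"   # last path component, e.g. seurat-harmony:1.0
  local name="${name_tag%%:*}"  # part before the colon
  local tag="latest"
  if [[ "$name_tag" == *:* ]]; then
    tag="${name_tag#*:}"        # part after the colon
  fi
  echo "${name}_${tag}.sif"
}

# Usage:
#   sif=$(sif_name docker://docker_hub_user/seurat-harmony:1.0)
#   apptainer exec "$sif" Rscript -e 'library(Seurat)'
```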
+
+
+Notes on Building Custom Images
+
+- Building is time-consuming and uses a lot of RAM on your local
+machine
+- A good base image can save you a lot of time
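+- You must run apt-get update and apt-get install in the same RUN
+instruction
+
+- Otherwise Docker may reuse a cached layer with an outdated package
+list, and the install can fail
+
+- Remember to use apt-get install -y
+
+- You cannot interact with the build while it is running, so any prompt
+(such as apt-get asking for confirmation without -y) makes the build fail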
+
+
+
+
+
+Submission Script - Basics
+
+#!/bin/bash
+# the shebang line above sets the shell language when run outside of the job scheduler
+# lines starting with #$ are instructions to the job scheduler
+#$ -S /bin/bash # the shell language when run via the job scheduler [IMPORTANT]
+#$ -cwd # job should run in the current working directory
+#$ -j y # STDERR and STDOUT should be joined
+#$ -l mem_free=1G # job requires up to 1 GiB of RAM per slot (core)
+#$ -l scratch=2G # job requires up to 2 GiB of local /scratch space
+#$ -l h_rt=1:00:00 # job requires up to 1 hour of runtime
+#$ -r y # if job crashes, it should be restarted
+
+date
+hostname
+
+## End-of-job summary, if running as a job
+[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID" # This is useful for debugging and usage purposes,
+ # e.g. "did my job exceed its memory request?"
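+The scheduler only reads lines that start with #$. A typo such as #S silently turns a directive into a plain
+comment. A small sketch for listing the directives a script actually contains before you submit it
+(show_sge_directives is a hypothetical helper):
```shell
#!/bin/bash
# Hypothetical helper: print the lines that start with "#$", i.e. the lines
# SGE treats as scheduler directives. A line starting with "#S" or "# $"
# will not appear here because it is only an ordinary comment.
show_sge_directives() {
  local script="$1"
  grep -E '^#\$' "$script"
}

# Usage:
#   show_sge_directives job1.sh
#   bash -n job1.sh   # check for bash syntax errors without running the script
```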
+
+
+Submission Script - Apptainer
+
+- Download
+this example job submission script that uses a container
+- Any path that the container needs to read from or write to must be
+mounted (bound) with APPTAINER_BINDPATH
+
+#!/bin/bash
+#$ -S /bin/bash # the shell language when run via the job scheduler
+#$ -cwd # job should run in the current working directory
+#$ -j y # STDERR and STDOUT should be joined
+#$ -l mem_free=1G # job requires up to 1 GiB of RAM per slot
+#$ -l scratch=2G # job requires up to 2 GiB of local /scratch space
+#$ -l h_rt=1:00:00 # job requires up to 1 hour of runtime
+
+
+# Mount the current directory into the container
+# Any directory that the container needs to access must be mounted
+directory=$(pwd)
+export APPTAINER_BINDPATH="$directory"
+
+h=$(hostname)
+
+apptainer run hello-world_1.0.sif figlet "$h" > "$directory/hello.txt"
+
+[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"
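+APPTAINER_BINDPATH accepts a comma-separated list of paths, so a job that needs more than one directory can bind
+all of them. A minimal sketch (join_bindpath and /path/to/data are hypothetical):
```shell
#!/bin/bash
# Hypothetical helper: join directories into the comma-separated list
# format used by APPTAINER_BINDPATH.
join_bindpath() {
  local IFS=","   # "$*" joins arguments with the first character of IFS
  echo "$*"
}

# Usage inside a job script (/path/to/data is a placeholder):
#   export APPTAINER_BINDPATH="$(join_bindpath "$PWD" /path/to/data)"
#   apptainer run hello-world_1.0.sif figlet "$(hostname)"
```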
+
+
+Parallel Processing Jobs
+
+- By default, jobs run on a single core
+- Multicore jobs must run in an SGE parallel environment (PE) and
+tell SGE how many cores the job will use
+- Do not use more cores than requested
+- There are four parallel environments on Wynton:
+
+- smp: for single-host parallel jobs using Symmetric
+multiprocessing (SMP)
+- mpi: for multiple-host parallel jobs based on MPI
+parallelization
+- mpi_onehost: for single-host parallel jobs based on
+MPI parallelization
+- mpi-8: for multi-threaded multi-host jobs based on
+MPI parallelization
+
+
+
+
+Example Parallel Job
+
+- The simplest parallel environment on Wynton is smp,
+a single node with n cores
+- Download
+this example smp job submission script
+
+#!/bin/bash
+#$ -S /bin/bash
+#$ -cwd
+#$ -j y
+#$ -pe smp 4 # 4 cores on a single node
+#$ -l mem_free=2G # 2 GiB of RAM per slot (core), so 8 GiB total
+#$ -l scratch=5G # 5 GiB of local /scratch space
+#$ -l h_rt=08:00:00 # up to 8 hours of runtime
+
+
+# Code that requires 4 cores
+# Specify the number of cores as ${NSLOTS}, which SGE sets to the number of slots requested (here 4)
+
+
+
+[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"
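+This sketch shows one way to use ${NSLOTS} in the script body. It assumes your program either reads
+OMP_NUM_THREADS or accepts a thread-count option. get_ncores and my_tool are hypothetical names:
```shell
#!/bin/bash
# Read the number of cores granted by SGE from NSLOTS, falling back to 1
# when the script runs outside the scheduler (e.g. a quick local test).
get_ncores() {
  echo "${NSLOTS:-1}"
}

ncores=$(get_ncores)

# OpenMP-based programs read OMP_NUM_THREADS; limiting it keeps the job
# within the cores it requested.
export OMP_NUM_THREADS="$ncores"

# Hypothetical tool; replace with your program and its thread option:
#   my_tool --threads "$ncores" input.txt
```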
+
+
+Array Jobs
+
+- This is a good option if the script you want to run operates on
+discrete sets of data
+
+- e.g. sample or chromosome
+
+- Download
+this example array job submission script
+
+#!/bin/bash
+#$ -S /bin/bash
+#$ -cwd
+#$ -j y
+#$ -l mem_free=1G
+#$ -l scratch=2G
+#$ -l h_rt=1:00:00
+#$ -t 1-5 # Number of tasks to run in the array (each is a job with the same resource requirements above)
+
+params=(sample1 sample2 sample3 sample4 sample5)
+
+# The task ID is stored in the variable SGE_TASK_ID
+# This variable is used to index the array of parameters
+# The task ID is 1-indexed
+param=${params[$SGE_TASK_ID - 1]}
+
+echo "Running task $SGE_TASK_ID with parameter $param"
+
+# Code for each task
+
+[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"
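+With many samples, the parameters can also be read from a text file with one entry per line, where task N uses
+line N. A sketch assuming a hypothetical samples.txt and a hypothetical helper param_for_task:
```shell
#!/bin/bash
# Hypothetical helper: print line number task_id of a file, so that array
# task N gets the Nth entry. The file's line count should match the -t range.
param_for_task() {
  local file="$1"
  local task_id="$2"
  sed -n "${task_id}p" "$file"
}

# In the job script (samples.txt is a placeholder):
#   param=$(param_for_task samples.txt "$SGE_TASK_ID")
#   echo "Running task $SGE_TASK_ID with parameter $param"
```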
+
+
+GPU Jobs
+
+- To run a GPU job, submit it to a GPU queue with
+-q gpu.q
+
+- Other GPU queues may be available to you depending on your lab
+
+- SGE sets the SGE_GPU variable to the GPU assigned to your job; use it
+so that your job only uses its assigned GPU
+
+- For CUDA based tools, add export
+CUDA_VISIBLE_DEVICES=$SGE_GPU to your submission script
+
+- GPU jobs must include a runtime request or they will be removed from
+the queue
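+A minimal GPU submission script that combines the points above. The 4-hour runtime, the use_assigned_gpu helper
+and train.py are placeholders. Your lab may have access to queues other than gpu.q:
```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -q gpu.q            # submit to the GPU queue
#$ -l h_rt=04:00:00    # GPU jobs must request a runtime (4 hours is an example value)

# Hypothetical helper: restrict CUDA tools to the GPU that SGE assigned to
# this job. Outside a GPU job SGE_GPU is empty, so report an error instead
# of exporting an empty value.
use_assigned_gpu() {
  if [[ -n "${SGE_GPU:-}" ]]; then
    export CUDA_VISIBLE_DEVICES="$SGE_GPU"
  else
    echo "SGE_GPU is not set; is this running as a GPU job?" >&2
    return 1
  fi
}

# Job body (train.py is a hypothetical GPU program):
#   use_assigned_gpu || exit 1
#   python3 train.py
```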
+
+
+
+Submitting and Querying jobs
+
+- Use qsub to submit jobs
+
+[alice@dev1 ~]$ qsub job1.sh
+Your job 714888 ("job1.sh") has been submitted
+
+- Use qstat to check the status of your jobs
+
+[alice@dev1 ~]$ qstat
+job-ID prior name user state submit/start at queue slots ja-task-ID
+-----------------------------------------------------------------------------------------------------------------
+ 714888 0.06532 job1 alice r 03/25/2024 19:54:18 member.q@msg-hmio1 1
+ 714889 0.06532 job2 alice r 03/25/2024 19:54:19 member.q@msg-hmio1 1
+Read the Wynton documentation on querying jobs for more
+information.
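+You can capture a job's ID when you submit it and check the job later with qstat -j. This sketch parses the
+confirmation message shown above. parse_job_id is hypothetical. Its handling of array-job IDs such as
+714890.1-5:1 is an assumption about SGE's output:
```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from a qsub message such as
#   Your job 714888 ("job1.sh") has been submitted
parse_job_id() {
  local message="$1"
  local id
  id=$(awk '{print $3}' <<< "$message")  # third word holds the ID
  # Array jobs may report IDs like 714890.1-5:1; keep only the numeric part
  echo "${id%%.*}"
}

# Usage on a development node:
#   job_id=$(parse_job_id "$(qsub job1.sh)")
#   qstat -j "$job_id"
```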
+
+
+Estimating Job Resources
+
+- Try to estimate the amount of RAM needed using a small test
+dataset
+- Request a little more RAM than you need to avoid having your job
+cancelled
+- Check on jobs you are running for the first time with qstat -j
+to make sure they are not going over
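+mem_free is requested per slot (core), so the total request is mem_free times the number of slots, as in the smp
+example (2 GiB x 4 = 8 GiB). This sketch turns the peak RAM of a test run into a per-slot request. The 20% safety
+margin is an arbitrary assumption, and mem_free_per_slot is a hypothetical helper:
```shell
#!/bin/bash
# Hypothetical helper: per-slot mem_free (in MiB) from the whole job's peak
# memory, adding 20% headroom (an assumed margin; adjust as needed).
mem_free_per_slot() {
  local peak_mib="$1"   # peak memory of the whole job in MiB
  local slots="$2"      # number of cores requested, e.g. with -pe smp
  local with_headroom=$(( peak_mib * 120 / 100 ))
  # Round up so that slots x per-slot request >= total with headroom
  echo "$(( (with_headroom + slots - 1) / slots ))M"
}

# Example: a test run peaked at 6000 MiB and the job will use 4 cores
#   mem_free_per_slot 6000 4   # prints 1800M -> "#$ -l mem_free=1800M"
```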
+
+
+
+Poll 3
+Anything that you can run on a compute node can be run on a
+development node.
+
+- True
+- False
+
+
+
+
+Installing Jupyter Notebooks
+
+- The preferred way to install and use Jupyter
+notebooks on Wynton is through pip, not conda
+
+python3 -m pip install --user notebook
+
+- Jupyter notebooks can only be run on development nodes
+- See the Wynton python
+documentation for more info on managing python environments on
+Wynton
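+pip install --user places the jupyter command in the bin/ folder of Python's user base. You can print that base
+with python3 -m site --user-base; it is typically ~/.local. If that folder is not on your PATH, the command will
+not be found. A sketch for checking (path_contains is a hypothetical helper):
```shell
#!/bin/bash
# Hypothetical helper: succeed if a directory is one of the entries in PATH.
path_contains() {
  local dir="$1"
  case ":${PATH}:" in
    *":${dir}:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a development node:
#   user_bin="$(python3 -m site --user-base)/bin"
#   path_contains "$user_bin" || export PATH="$user_bin:$PATH"
#   command -v jupyter
```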
+
+
+
+Running Jupyter Notebooks - Step 1
+
+- You cannot connect from outside Wynton HPC directly to a development
+node
+
+- Instead we need to use SSH port forwarding to establish the
+connection with a local web browser
+
+- Find an available TCP port:
+
+[alice@dev1 ~]$ module load CBI port4me
+[alice@dev1 ~]$ port4me --tool=jupyter
+47467
+Note the port number returned by port4me; you will need it
+later.
+
+
+Running Jupyter Notebooks - Step 2
+
+- Launch Jupyter Notebook using the port number from step 1
+
+[alice@dev1 ~]$ jupyter notebook --no-browser --port 47467
+[I 2024-03-20 14:48:45.693 ServerApp] jupyter_lsp | extension was successfully linked.
+[I 2024-03-20 14:48:45.698 ServerApp] jupyter_server_terminals | extension was successfully linked.
+[I 2024-03-20 14:48:45.703 ServerApp] jupyterlab | extension was successfully linked.
+[I 2024-03-20 14:48:45.708 ServerApp] notebook | extension was successfully linked.
+[I 2024-03-20 14:48:46.577 ServerApp] notebook_shim | extension was successfully linked.
+[I 2024-03-20 14:48:46.666 ServerApp] notebook_shim | extension was successfully loaded.
+[I 2024-03-20 14:48:46.668 ServerApp] jupyter_lsp | extension was successfully loaded.
+[I 2024-03-20 14:48:46.669 ServerApp] jupyter_server_terminals | extension was successfully loaded.
+[I 2024-03-20 14:48:46.675 LabApp] JupyterLab extension loaded from /wynton/home/boblab/alice/.local/lib/python3.11/site-packages/jupyterlab
+[I 2024-03-20 14:48:46.675 LabApp] JupyterLab application directory is /wynton/home/boblab/alice/.local/share/jupyter/lab
+[I 2024-03-20 14:48:46.677 LabApp] Extension Manager is pypi.
+[I 2024-03-20 14:48:46.707 ServerApp] jupyterlab | extension was successfully loaded.
+[I 2024-03-20 14:48:46.711 ServerApp] notebook | extension was successfully loaded.
+[I 2024-03-20 14:48:46.712 ServerApp] Serving notebooks from local directory: /wynton/home/boblab/alice
+[I 2024-03-20 14:48:46.712 ServerApp] Jupyter Server 2.13.0 is running at:
+[I 2024-03-20 14:48:46.712 ServerApp] http://localhost:47467/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
+[I 2024-03-20 14:48:46.712 ServerApp] http://127.0.0.1:47467/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
+[I 2024-03-20 14:48:46.712 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
+[C 2024-03-20 14:48:46.725 ServerApp]
+
+ To access the server, open this file in a browser:
+ file:///wynton/home/boblab/alice/.local/share/jupyter/runtime/jpserver-2853162-open.html
+ Or copy and paste one of these URLs:
+ http://localhost:47467/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
+ http://127.0.0.1:47467/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
+
+
+Running Jupyter Notebooks - Step 3
+
+- Set up SSH port forwarding on your local machine in a separate
+terminal, leave both terminals open
+
+{local}$ ssh -J alice@log1.wynton.ucsf.edu -L 47467:localhost:47467 alice@dev1
+...
+[alice@dev1 ~]$
+The notebook should now be available in your local web browser at the URL from step 2 (the port in the URL must
+match the forwarded port)
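+The local SSH command must forward the same port that port4me returned. This sketch prints that command for a
+given user, port and development node. forward_cmd is hypothetical, and the login host is copied from the
+example above:
```shell
#!/bin/bash
# Hypothetical helper (run on your local machine): print the SSH
# port-forwarding command, using the same port on both ends.
forward_cmd() {
  local user="$1"   # Wynton user name, e.g. alice
  local port="$2"   # port returned by port4me, e.g. 47467
  local node="$3"   # development node running Jupyter, e.g. dev1
  echo "ssh -J ${user}@log1.wynton.ucsf.edu -L ${port}:localhost:${port} ${user}@${node}"
}

# Usage: print the command, then copy and run it in a separate terminal
#   forward_cmd alice 47467 dev1
```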
+