Natalie Elphick
Bioinformatician I
Alex Pico
Bioinformatics Core Director
docker build .
See the Dockerfile documentation for a full list of instructions
# Bioconductor base image gives us access to a lot of bioinformatics tools and R packages.
FROM bioconductor/bioconductor_docker:RELEASE_3_17
# Shell options: with pipefail, a piped command fails if any command in the pipe fails
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# Root permissions are required to install packages
USER root
# Install any UNIX packages you need
# First we update the package list and then install GNU make
# We clean up after ourselves to reduce the image size
RUN apt-get update && apt-get upgrade -y \
&& apt-get install -y --no-install-recommends make \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install Seurat and harmony
RUN Rscript -e 'install.packages(c("Seurat","harmony"))'
# Check if installs worked
RUN Rscript -e 'lapply(c("Seurat","harmony"), library, character.only = TRUE)'
# Run container as non-root to avoid permission issues
RUN groupadd -g 10001 notroot && \
useradd -u 10000 -g notroot notroot
# Switch to the non-root user
USER notroot:notroot
# Default command to run when the container starts
CMD ["/bin/bash"]
# Copy dockerfile into the image (optional, but can be useful for reproducibility)
COPY Dockerfile /Dockerfile
docker build -t docker_hub_user/seurat-harmony:1.0 .
docker push docker_hub_user/seurat-harmony:1.0
[alice@dev1 ~]$ apptainer pull docker://docker_hub_user/seurat-harmony:1.0
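By default, apptainer pull names the downloaded image file after the image name and tag. A minimal sketch of that naming rule follows; sif_name is a hypothetical helper written for illustration, not part of Apptainer, and it assumes no explicit output filename was passed to apptainer pull.

```shell
#!/bin/bash
# Hypothetical helper: predict the default SIF filename for a docker:// URI.
sif_name() {
  local ref="${1#docker://}"   # strip the URI scheme
  local name="${ref##*/}"      # drop the registry/user prefix
  local tag="latest"           # Docker's default tag when none is given
  if [[ "$name" == *:* ]]; then
    tag="${name##*:}"
    name="${name%%:*}"
  fi
  printf '%s_%s.sif\n' "$name" "$tag"
}

sif_name docker://docker_hub_user/seurat-harmony:1.0
```

The printed name is the file to pass to apptainer run or apptainer exec in a job script.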
Run apt-get update and apt-get install in the same command
apt-get install -y
#!/bin/bash # the shell language when run outside of the job scheduler
# # lines starting with #$ are instructions to the job scheduler
#$ -S /bin/bash # the shell language when run via the job scheduler [IMPORTANT]
#$ -cwd # job should run in the current working directory
#$ -j y # STDERR and STDOUT should be joined
#$ -l mem_free=1G # job requires up to 1 GiB of RAM per slot (core)
#$ -l scratch=2G # job requires up to 2 GiB of local /scratch space
#$ -l h_rt=1:00:00 # job requires up to 1 hour of runtime
#$ -r y # if job crashes, it should be restarted
date
hostname
## End-of-job summary, if running as a job
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID" # This is useful for debugging and usage purposes,
# e.g. "did my job exceed its memory request?"
#!/bin/bash
#$ -S /bin/bash # the shell language when run via the job scheduler
#$ -cwd # job should run in the current working directory
#$ -j y # STDERR and STDOUT should be joined
#$ -l mem_free=1G # job requires up to 1 GiB of RAM per slot
#$ -l scratch=2G # job requires up to 2 GiB of local /scratch space
#$ -l h_rt=1:00:00 # job requires up to 1 hour of runtime
# Mount the current directory to the container
# Any directory that needs to be accessed by the container should be mounted
directory=$(pwd)
export APPTAINER_BINDPATH="$directory"
h=$(hostname)
singularity run hello-world_1.0.sif figlet "$h" > "$directory/hello.txt"
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"
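When a job needs more than one directory inside the container, APPTAINER_BINDPATH accepts a comma-separated list of paths. The sketch below builds that list from a bash array; join_by_comma and the /scratch entry are illustrative assumptions, so substitute the directories your job actually uses.

```shell
#!/bin/bash
# Join arguments with commas, the separator APPTAINER_BINDPATH expects.
join_by_comma() {
  local IFS=,
  printf '%s\n' "$*"
}

# Hypothetical directories; replace with the paths your job reads or writes.
bind_dirs=("$(pwd)" "/scratch")
export APPTAINER_BINDPATH="$(join_by_comma "${bind_dirs[@]}")"
echo "APPTAINER_BINDPATH=${APPTAINER_BINDPATH}"
```

Each listed path should exist on the node where the job runs, otherwise the container may fail to start or print a warning.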
By default jobs run on a single core
Multicore jobs must run in a SGE parallel environment (PE) and tell SGE how many cores the job will use
Do not use more cores than requested
There are four parallel environments on Wynton:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4 # 4 cores on a single node
#$ -l mem_free=2G # 2 GiB of RAM per slot (core), so 8 GiB total
#$ -l scratch=5G # 5 GiB of local /scratch space
#$ -l h_rt=08:00:00
# Code that requires 4 cores
# **Specify the number of cores as ${NSLOTS}**
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"
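One way to follow the ${NSLOTS} advice is to read the variable once with a fallback, so the same script also works when tested outside the scheduler. This is a sketch under assumptions: total_mem_gib is a hypothetical helper, and whether your tool honors OMP_NUM_THREADS depends on the tool.

```shell
#!/bin/bash
# NSLOTS is set by SGE inside a job; fall back to 1 core when run interactively.
ncores="${NSLOTS:-1}"

# Hypothetical helper: total memory implied by a per-slot mem_free request.
total_mem_gib() {
  local per_slot_gib="$1" slots="$2"
  echo $(( per_slot_gib * slots ))
}

# Many threaded tools read OMP_NUM_THREADS; check your tool's documentation.
export OMP_NUM_THREADS="$ncores"
echo "Using ${ncores} core(s); 2 GiB per slot totals $(total_mem_gib 2 "$ncores") GiB"
```

Passing "$ncores" to the tool's own thread option (when it has one) keeps the job within the cores it requested.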
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -l mem_free=1G
#$ -l scratch=2G
#$ -l h_rt=1:00:00
#$ -t 1-5 # Number of tasks to run in the array (each is a job with the same resource requirements above)
params=(sample1 sample2 sample3 sample4 sample5)
# The task ID is stored in the variable SGE_TASK_ID
# This variable is used to index the array of parameters
# The task ID is 1-indexed
param=${params[$SGE_TASK_ID - 1]}
echo "Running task $SGE_TASK_ID with parameter $param"
# Code for each task
[[ -n "$JOB_ID" ]] && qstat -j "$JOB_ID"
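The 1-indexed task ID to 0-indexed array lookup above can be wrapped in a small function with a bounds check, which makes it easy to test before submitting. This is an illustrative sketch: task_param is a hypothetical helper, and the fallback to task 1 is an assumption made only so the script runs outside the scheduler.

```shell
#!/bin/bash
params=(sample1 sample2 sample3 sample4 sample5)

# Hypothetical helper: map a 1-indexed task ID to its parameter.
task_param() {
  local task_id="$1"
  if (( task_id < 1 || task_id > ${#params[@]} )); then
    echo "task ID ${task_id} is outside 1-${#params[@]}" >&2
    return 1
  fi
  printf '%s\n' "${params[task_id - 1]}"
}

# SGE sets SGE_TASK_ID inside array jobs; default to 1 for a local dry run.
task_id="${SGE_TASK_ID:-1}"
if param=$(task_param "$task_id"); then
  echo "Task ${task_id} -> ${param}"
fi
```

Keeping the -t range equal to the number of array elements avoids tasks that start with an empty parameter.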
[alice@dev1 ~]$ qsub job1.sh
Your job 714888 ("job1.sh") has been submitted
[alice@dev1 ~]$ qstat
job-ID  prior    name  user   state  submit/start at      queue               slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
714888  0.06532  job1  alice  r      03/25/2024 19:54:18  member.q@msg-hmio1  1
714889  0.06532  job2  alice  r      03/25/2024 19:54:19  member.q@msg-hmio1  1
Read the "Querying jobs" page of the Wynton documentation for more information.
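The state column of qstat can be summarized with a short awk filter, for example to count running (r) or queued (qw) jobs. The sketch below assumes the default qstat layout shown above, with two header lines and the state in the fifth column; count_jobs_in_state is a hypothetical helper, and the input is the sample output rather than a live query.

```shell
#!/bin/bash
# Hypothetical helper: count jobs in a given state from qstat output on stdin.
count_jobs_in_state() {
  local state="$1"
  # Skip the two header lines; the state is the fifth whitespace-separated field.
  awk -v s="$state" 'NR > 2 && $5 == s { n++ } END { print n + 0 }'
}

# Sample input copied from the qstat output above.
count_jobs_in_state r <<'EOF'
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------
714888 0.06532 job1 alice r 03/25/2024 19:54:18 member.q@msg-hmio1 1
714889 0.06532 job2 alice r 03/25/2024 19:54:19 member.q@msg-hmio1 1
EOF
```

On a development node the same function can read live output, as in qstat | count_jobs_in_state qw.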
Anything that you can run on a compute node can be run on a development node.
Do not run this during the workshop as it will fill up the Wynton SGE queue
python3 -m pip install --user notebook
[alice@dev1 ~]$ module load CBI port4me
[alice@dev1 ~]$ port4me --tool=jupyter
47467
Note the port number returned by port4me; you will need it later.
[alice@dev1]$ jupyter notebook --no-browser --port 47467
[I 2024-03-20 14:48:45.693 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-03-20 14:48:45.698 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-03-20 14:48:45.703 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-03-20 14:48:45.708 ServerApp] notebook | extension was successfully linked.
[I 2024-03-20 14:48:46.577 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-03-20 14:48:46.666 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-03-20 14:48:46.668 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-03-20 14:48:46.669 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-03-20 14:48:46.675 LabApp] JupyterLab extension loaded from /wynton/home/boblab/alice/.local/lib/python3.11/site-packages/jupyterlab
[I 2024-03-20 14:48:46.675 LabApp] JupyterLab application directory is /wynton/home/boblab/alice/.local/share/jupyter/lab
[I 2024-03-20 14:48:46.677 LabApp] Extension Manager is pypi.
[I 2024-03-20 14:48:46.707 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-03-20 14:48:46.711 ServerApp] notebook | extension was successfully loaded.
[I 2024-03-20 14:48:46.712 ServerApp] Serving notebooks from local directory: /wynton/home/boblab/alice
[I 2024-03-20 14:48:46.712 ServerApp] Jupyter Server 2.13.0 is running at:
[I 2024-03-20 14:48:46.712 ServerApp] http://localhost:44214/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
[I 2024-03-20 14:48:46.712 ServerApp] http://127.0.0.1:44214/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
[I 2024-03-20 14:48:46.712 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2024-03-20 14:48:46.725 ServerApp]
To access the server, open this file in a browser:
file:///wynton/home/boblab/alice/.local/share/jupyter/runtime/jpserver-2853162-open.html
Or copy and paste one of these URLs:
http://localhost:44214/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
http://127.0.0.1:44214/tree?token=8e37f8d62fca6a1c9b2da429f27df5ebcec706a808c3a8f2
{local}$ ssh -J alice@log1.wynton.ucsf.edu -L 47467:localhost:47467 alice@dev1
...
[alice@dev1 ~]$
The notebook should now be available in your local web browser at the URL printed in step 2.
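Because the port from port4me appears twice in the ssh command, it can help to generate the command from variables rather than typing it by hand. The following is a sketch only: tunnel_cmd is a hypothetical helper, and it assumes the same login host and jump-host pattern used above.

```shell
#!/bin/bash
# Hypothetical helper: print the SSH port-forwarding command for a given
# user, port, and development node (defaults to dev1).
tunnel_cmd() {
  local user="$1" port="$2" dev_host="${3:-dev1}"
  printf 'ssh -J %s@log1.wynton.ucsf.edu -L %s:localhost:%s %s@%s\n' \
    "$user" "$port" "$port" "$user" "$dev_host"
}

tunnel_cmd alice 47467
```

Run the printed command in a terminal on your local computer, not on Wynton.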
[alice@dev1 ~]$ module load CBI rstudio-server-controller
[alice@dev1 ~]$ rsc start
alice, your personal RStudio Server 2023.09.1-494 running R 4.3.2 is available on:
<http://127.0.0.1:20612>
Importantly, if you are running from a remote machine without direct access
to dev1, you need to set up SSH port forwarding first, which you can do by
running:
ssh -L 20612:dev1:20612 alice@log1.wynton.ucsf.edu
in a second terminal from your local computer.
Any R session started times out after being idle for 120 minutes.
WARNING: You now have 10 minutes, until 2023-11-15 17:06:50-08:00, to
connect and log in to the RStudio Server before everything times out.
Your one-time random password for RStudio Server is: y+IWo7rfl7Z7MRCPI3Z4
Note the password and URL; you will need both to log in to the server instance.
{local}$ ssh -L 20612:dev1:20612 alice@log1.wynton.ucsf.edu
alice@log1.wynton.ucsf.edu's password: XXXXXXXXXXXXXXXXXXX
[alice@log1 ~]$
Introduction to Linear Mixed Effects Models
April 25-April 26, 2024 1-3pm PDT
Single Cell RNA-Seq Data Analysis
April 29-April 30, 2024 9am-4pm PDT
Single Cell ATAC-Seq Data Analysis Part 1
May 6-May 7, 2024 1-4pm PDT