mirror of https://github.com/gladstone-institutes/Bioinformatics-Workshops.git
synced 2025-11-30 09:45:43 -08:00
final edits
parent b0dd9c589e
commit d482023d75
4 changed files with 189 additions and 174 deletions
|
|
@ -585,10 +585,16 @@ document.addEventListener('DOMContentLoaded', function(e) {
|
|||
</section>
|
||||
<section id="introductions" class="slide level2">
|
||||
<h2>Introductions</h2>
|
||||
<p><strong>Natalie Elphick</strong><br />
|
||||
Bioinformatician I</p>
|
||||
<p><strong>Alex Pico (TA)</strong><br />
|
||||
Bioinformatics Core Director</p>
|
||||
<p>Instructor:</p>
|
||||
<p> <strong>Natalie Elphick</strong><br />
|
||||
<em>Bioinformatician I</em></p>
|
||||
<p>TAs:</p>
|
||||
<p> <strong>Alex Pico</strong><br />
|
||||
<em>Bioinformatics Core Director</em><br />
|
||||
<strong>Ayushi Agrawal</strong><br />
|
||||
<em>Bioinformatician III</em><br />
|
||||
<strong>Min-Gyoung Shin</strong><br />
|
||||
<em>Bioinformatician III</em></p>
|
||||
</section>
|
||||
<section id="target-audience" class="slide level2">
|
||||
<h2>Target Audience</h2>
|
||||
|
|
@ -634,7 +640,7 @@ specifications</a>)<br />
|
|||
<li>Funded and administered cooperatively by UCSF campus IT and key
|
||||
research groups</li>
|
||||
</ul>
|
||||
<p><a href="https://wynton.ucsf.edu">https://wynton.ucsf.edu</a></p>
|
||||
<p><a href="https://wynton.ucsf.edu" class="uri">https://wynton.ucsf.edu</a></p>
|
||||
</section></section>
|
||||
<section>
|
||||
<section id="node-types-and-logging-in" class="title-slide slide level1">
|
||||
|
|
@ -781,8 +787,8 @@ requests</li>
|
|||
<li>Certain input/output patterns can be problematic</li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="beegfs---io-patterns" class="slide level2 small-bullets">
|
||||
<h2 class="small-bullets">BeeGFS - I/O patterns</h2>
|
||||
<section id="beegfs---io-patterns" class="slide level2">
|
||||
<h2>BeeGFS - I/O patterns</h2>
|
||||
<ul>
|
||||
<li>Anything that requires lots of metadata operations can feel slow
|
||||
<ul>
|
||||
|
|
@ -791,21 +797,18 @@ and directory searches (<strong>conda</strong>)</li>
|
|||
</ul></li>
|
||||
<li>Keep the number of reads and writes to a single directory to a
|
||||
reasonable number</li>
|
||||
<li>If using conda, putting the conda application inside a Apptainer
|
||||
(formerly singularity) container will result in better performance</li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="beegfs---tips" class="slide level2">
|
||||
<h2>BeeGFS - Tips</h2>
|
||||
<section id="beegfs---takehome-message" class="slide level2 small-bullets">
|
||||
<h2 class="small-bullets">BeeGFS - Takehome Message</h2>
|
||||
<ul>
|
||||
<li>Prefer fewer, large files over many small ones</li>
|
||||
<li>Distribute reading and writing over several directories
|
||||
<ul>
|
||||
<li>Including compute job output and error files</li>
|
||||
</ul></li>
|
||||
<li>Use local scratch (/scratch) when possible</li>
|
||||
<li>Distribute reading and writing over several directories</li>
|
||||
<li>Use local scratch (<strong>/scratch</strong>) when possible</li>
|
||||
<li>Don’t include anything in <strong>/wynton</strong> in your default
|
||||
LD_LIBRARY_PATH</li>
|
||||
<li>If using conda, putting the conda application inside a Apptainer
|
||||
(formerly singularity) container will result in better performance</li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="storage-1" class="slide level2 small-bullets">
|
||||
|
|
@ -872,18 +875,22 @@ contacting Gladstone IT<br />
|
|||
knowledge base page</a></li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="storage-advice" class="slide level2">
|
||||
<h2>Storage Advice</h2>
|
||||
<section id="storage-advice" class="slide level2 small-bullets">
|
||||
<h2 class="small-bullets">Storage Advice</h2>
|
||||
<ul>
|
||||
<li>Always back up anything you store under
|
||||
<strong>/wynton</strong></li>
|
||||
<li>Back up your data on <strong>/gladstone</strong> if you have access
|
||||
to it
|
||||
<li>If you have access to it keep all of your data on
|
||||
<strong>/gladstone</strong>
|
||||
<ul>
|
||||
<li>A large number of jobs reading and writing to these directories will
|
||||
<li>A large number of jobs reading and writing to these directories may
|
||||
be slower since it is NFS mounted not BeeGFS</li>
|
||||
</ul></li>
|
||||
<li>Use the scratch directories to store temporary files</li>
|
||||
<li>Use the scratch directories to store temporary files
|
||||
<ul>
|
||||
<li>e.g. A large amount of .fastq that you do not need after the
|
||||
alignment step</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
</section></section>
|
||||
<section>
|
||||
|
|
@ -906,6 +913,13 @@ be slower since it is NFS mounted not BeeGFS</li>
|
|||
</ul>
|
||||
<pre><code>{local}$ scp alice@dt1.wynton.ucsf.edu:/path/to/local_file.tsv /destination/path</code></pre>
|
||||
</section>
|
||||
<section id="hands-on" class="slide level2">
|
||||
<h2>Hands-on</h2>
|
||||
<ul>
|
||||
<li>Use scp to copy this <a href="https://www.dropbox.com/scl/fi/463ymz88q89d2co90kj30/candidatus_carsonella_ruddii_complete_genome.fasta?rlkey=9x64iek2yy149sh2i1r9sse9y&dl=1">file</a>
|
||||
into your home directory on Wynton</li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="gui-sftp-clients" class="slide level2 small-bullets">
|
||||
<h2 class="small-bullets">GUI SFTP Clients</h2>
|
||||
<ul>
|
||||
|
|
@ -955,8 +969,7 @@ UCSF Box</a> instructions</li>
|
|||
</section>
|
||||
<section id="poll-1" class="slide level2">
|
||||
<h2>Poll 1</h2>
|
||||
<p>Which of these can you <strong>not</strong> log in to from your
|
||||
computer?</p>
|
||||
<p>Poll 1 - Which of these can you <strong>not</strong> SSH in to?</p>
|
||||
<ol type="1">
|
||||
<li>Login Nodes</li>
|
||||
<li>Development Nodes</li>
|
||||
|
|
@ -967,7 +980,7 @@ computer?</p>
|
|||
<section id="poll-2" class="slide level2">
|
||||
<h2>Poll 2</h2>
|
||||
<p>The <strong>/wynton</strong> directory is backed up on a nightly
|
||||
basis so do not need to back up the data you store here.</p>
|
||||
basis, so there is no need to back up anything stored here.</p>
|
||||
<ol type="1">
|
||||
<li>True</li>
|
||||
<li>False</li>
|
||||
|
|
@ -1136,7 +1149,7 @@ run:</li>
|
|||
<ul>
|
||||
<li>Please take some time to fill out the workshop survey if you are not
|
||||
attending part 2:<br />
|
||||
<a href="https://www.surveymonkey.com/r/F75J6VZ">https://www.surveymonkey.com/r/F75J6VZ</a></li>
|
||||
<a href="https://www.surveymonkey.com/r/F75J6VZ" class="uri">https://www.surveymonkey.com/r/F75J6VZ</a></li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="upcoming-data-science-training-program-workshops" class="slide level2">
|
||||
|
|
|
|||
|
|
@ -585,10 +585,14 @@ document.addEventListener('DOMContentLoaded', function(e) {
|
|||
</section>
|
||||
<section id="introductions" class="slide level2">
|
||||
<h2>Introductions</h2>
|
||||
<p><strong>Natalie Elphick</strong><br />
|
||||
Bioinformatician I</p>
|
||||
<p><strong>Alex Pico (TA)</strong><br />
|
||||
Bioinformatics Core Director</p>
|
||||
<p>Instructor:</p>
|
||||
<p> <strong>Natalie Elphick</strong><br />
|
||||
<em>Bioinformatician I</em></p>
|
||||
<p>TAs:</p>
|
||||
<p> <strong>Alex Pico</strong><br />
|
||||
<em>Bioinformatics Core Director</em><br />
|
||||
<strong>Michela Traglia</strong><br />
|
||||
<em>Senior Statistician</em></p>
|
||||
</section>
|
||||
<section id="target-audience" class="slide level2">
|
||||
<h2>Target Audience</h2>
|
||||
|
|
@ -939,8 +943,8 @@ cancelled</li>
|
|||
</section>
|
||||
<section id="poll-3" class="slide level2">
|
||||
<h2>Poll 3</h2>
|
||||
<p>Anything that you can run on a compute node can be run on a
|
||||
development node.</p>
|
||||
<p>Any submitted job to compute nodes can also be run on development
|
||||
nodes.</p>
|
||||
<ol type="1">
|
||||
<li>True</li>
|
||||
<li>False</li>
|
||||
|
|
@ -1155,6 +1159,8 @@ into Wynton</li>
|
|||
</section>
|
||||
<section id="bioinformatics-questions" class="slide level2">
|
||||
<h2>Bioinformatics Questions</h2>
|
||||
<p>For any bioinformatics specific questions feel free to reach out to
|
||||
the Gladstone Bioinformatics Core.</p>
|
||||
<ul>
|
||||
<li>Email
|
||||
<ul>
|
||||
|
|
@ -1174,8 +1180,11 @@ into Wynton</li>
|
|||
<section id="thank-you" class="slide level2">
|
||||
<h2>Thank You!</h2>
|
||||
<ul>
|
||||
<li>Please take some time to fill out the workshop survey:<br />
|
||||
<a href="https://www.surveymonkey.com/r/F75J6VZ">https://www.surveymonkey.com/r/F75J6VZ</a></li>
|
||||
<li><p>Please take some time to fill out the workshop survey:<br />
|
||||
<a href="https://www.surveymonkey.com/r/F75J6VZ">https://www.surveymonkey.com/r/F75J6VZ</a></p></li>
|
||||
<li><p>Want some additional Wynton training?<br />
|
||||
Check out the UCSF library <a href="https://calendars.library.ucsf.edu/event/12197724">Introduction to
|
||||
Wynton HPC Cluster</a> Workshop</p></li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="upcoming-data-science-training-program-workshops" class="slide level2">
|
||||
|
|
|
|||
|
|
@ -25,50 +25,50 @@ output:
|
|||
<center>*Press the ? key for tips on navigating these slides*</center>
|
||||
|
||||
## Introductions
|
||||
Instructor:
|
||||
|
||||
**Natalie Elphick**
|
||||
Bioinformatician I
|
||||
**Natalie Elphick**
|
||||
*Bioinformatician I*
|
||||
|
||||
**Alex Pico (TA)**
|
||||
Bioinformatics Core Director
|
||||
|
||||
TAs:
|
||||
|
||||
**Alex Pico**
|
||||
*Bioinformatics Core Director*
|
||||
**Ayushi Agrawal**
|
||||
*Bioinformatician III*
|
||||
**Min-Gyoung Shin**
|
||||
*Bioinformatician III*
|
||||
|
||||
## Target Audience
|
||||
- Prior experience with UNIX command-line
|
||||
|
||||
|
||||
- Prior experience with UNIX command-line
|
||||
|
||||
## Part 1:
|
||||
|
||||
1. What is an HPC cluster?
|
||||
2. Node Types and Logging in
|
||||
3. Storage
|
||||
4. Data Transfer
|
||||
5. Installing Software
|
||||
6. Containers
|
||||
|
||||
|
||||
1. What is an HPC cluster?
|
||||
2. Node Types and Logging in
|
||||
3. Storage
|
||||
4. Data Transfer
|
||||
5. Installing Software
|
||||
6. Containers
|
||||
|
||||
# What is Wynton HPC?
|
||||
|
||||
## High-performance Computing Cluster {.smaller-picture}
|
||||
|
||||
- A collection of specialized computers (nodes) connected together on a fast local network
|
||||
- A collection of specialized computers (nodes) connected together on a fast local network
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
## Wynton {.small-bullets}
|
||||
|
||||
- A HPC Linux environment available to all UCSF researchers for free
|
||||
- A HPC Linux environment available to all UCSF researchers for free\
|
||||
- Uses the Rocky 8 linux OS
|
||||
- Includes several hundred compute nodes and a large shared storage system ([Cluster specifications](https://wynton.ucsf.edu/hpc/about/specs.html))
|
||||
- Funded and administered cooperatively by UCSF campus IT and key research groups
|
||||
|
||||
[https://wynton.ucsf.edu](https://wynton.ucsf.edu)
|
||||
|
||||
- Includes several hundred compute nodes and a large shared storage system ([Cluster specifications](https://wynton.ucsf.edu/hpc/about/specs.html))\
|
||||
- Funded and administered cooperatively by UCSF campus IT and key research groups
|
||||
|
||||
<https://wynton.ucsf.edu>
|
||||
|
||||
# Node Types and Logging in
|
||||
|
||||
|
|
@ -86,15 +86,14 @@ Bioinformatics Core Director
|
|||
- The primary method to log in is to use an SSH client application
|
||||
- The Wynton HPC is up to date with information on logging in: [Access Cluster](https://wynton.ucsf.edu/hpc/get-started/access-cluster.html)
|
||||
|
||||
<u>Names</u>:
|
||||
|
||||
|
||||
<u>Names</u>:
|
||||
|
||||
log1, log2 and plog (for PHI users)
|
||||
|
||||
## Login {.small-bullets}
|
||||
|
||||
- Connect to the UCSF or Gladstone WiFi networks (or the respective VPN) or using [2FA](https://wynton.ucsf.edu/hpc/get-started/duo-signup.html)
|
||||
- **ssh [your-username]@[node].wynton.ucsf.edu**
|
||||
- Connect to the UCSF or Gladstone WiFi networks (or the respective VPN) or using [2FA](https://wynton.ucsf.edu/hpc/get-started/duo-signup.html)\
|
||||
- **ssh [your-username]\@[node].wynton.ucsf.edu**
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo "{local}$ ssh alice@log1.wynton.ucsf.edu
|
||||
|
|
@ -104,20 +103,20 @@ alice@log1.wynton.ucsf.edu's password:
|
|||
|
||||
- There will not be any visual feedback when typing your password
|
||||
|
||||
|
||||
## The Development Nodes {.small-bullets}
|
||||
|
||||
- Has a set of [core software](https://wynton.ucsf.edu/hpc/software/core-software.html) installed
|
||||
- e.g. git, vim, nano, make and python
|
||||
- e.g. git, vim, nano, make and python
|
||||
- Also has access to [software repositories](https://wynton.ucsf.edu/hpc/software/software-repositories.html) some which are maintained by other users or research groups
|
||||
- e.g. matlab, R and openjdk
|
||||
- e.g. matlab, R and openjdk
|
||||
- Cannot be logged in to directly, only from a login node
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo "ssh dev1"
|
||||
```
|
||||
|
||||
<u>Names</u>:
|
||||
|
||||
<u>Names</u>:
|
||||
|
||||
dev[1-3], gpudev1, pdev1 (PHI) and pgpudev1 (PHI)
|
||||
|
||||
## Data Transfer Nodes {.small-bullets}
|
||||
|
|
@ -127,88 +126,81 @@ dev[1-3], gpudev1, pdev1 (PHI) and pgpudev1 (PHI)
|
|||
- Limited software
|
||||
- Use for transferring files to and from Wynton
|
||||
|
||||
<u>Example</u>:
|
||||
<u>Example</u>:
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo "{local}$ scp local_file.tsv alice@dt1.wynton.ucsf.edu:~/"
|
||||
```
|
||||
|
||||
<u>Names</u>:
|
||||
|
||||
<u>Names</u>:
|
||||
|
||||
dt1 and dt2
|
||||
|
||||
|
||||
|
||||
## Compute Nodes {.small-bullets}
|
||||
|
||||
- Can **not** be logged in to directly
|
||||
- No internet or UCSF network access
|
||||
- No internet or UCSF network access\
|
||||
- Used to run non-interactive compute job scripts
|
||||
- The software to run the job script is provided using a container
|
||||
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
# Storage
|
||||
|
||||
|
||||
## The File System {.small-bullets}
|
||||
|
||||
- A file system how information is stored and retrieved on a computer
|
||||
- Consists of files and directories
|
||||
- Consists of files and directories
|
||||
- A local file system is function of the operating system and only accessible from a single computer
|
||||
- A shared file system is accessible from multiple computers
|
||||
|
||||
|
||||
|
||||
## BeeGFS {.small-bullets}
|
||||
|
||||
- Wynton uses a *parallel* shared file system called BeeGFS
|
||||
- The files are stored as "chunks" spread across many different servers
|
||||
- The files are stored as "chunks" spread across many different servers
|
||||
- BeeGFS has multiple services that work together to manage the file system
|
||||
- Storage (stores the chunks)
|
||||
- Metadata (tracks the chunks and information about their file)
|
||||
- Management (tracks all of the services)
|
||||
- Client (provides linux access to the file system)
|
||||
|
||||
- Storage (stores the chunks)
|
||||
- Metadata (tracks the chunks and information about their file)
|
||||
- Management (tracks all of the services)
|
||||
- Client (provides linux access to the file system)
|
||||
|
||||
## BeeGFS - Advantages
|
||||
|
||||
## BeeGFS - Advantages
|
||||
- High throughput
|
||||
- Redundancy can be built in by mirroring services
|
||||
- Adding new storage is fast and does not require downtime
|
||||
|
||||
## BeeGFS - Caveats
|
||||
## BeeGFS - Caveats
|
||||
|
||||
- For any client node, performance is limited by the network bandwidth of that node
|
||||
- Network latency becomes extremely important for all metadata requests
|
||||
- Certain input/output patterns can be problematic
|
||||
|
||||
## BeeGFS - I/O patterns {.small-bullets}
|
||||
- Anything that requires lots of metadata operations can feel slow
|
||||
## BeeGFS - I/O patterns
|
||||
|
||||
- Anything that requires lots of metadata operations can feel slow
|
||||
- e.g: lots of writes to the same directory and lots of file lookups and directory searches (**conda**)
|
||||
- Keep the number of reads and writes to a single directory to a reasonable number
|
||||
- If using conda, putting the conda application inside a Apptainer (formerly singularity) container will result in better performance
|
||||
|
||||
## BeeGFS - Tips
|
||||
|
||||
## BeeGFS - Takehome Message {.small-bullets}
|
||||
|
||||
- Prefer fewer, large files over many small ones
|
||||
- Distribute reading and writing over several directories
|
||||
- Including compute job output and error files
|
||||
- Use local scratch (**/scratch**) when possible
|
||||
- Don't include anything in **/wynton** in your default LD_LIBRARY_PATH
|
||||
|
||||
- If using conda, putting the conda application inside a Apptainer (formerly singularity) container will result in better performance
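
As a hedged illustration of the "fewer, large files" advice in the list above, the sketch below packs a directory of small files into one tar archive, so a shared file system handles a single large file instead of many small ones. The function name `bundle_small_files` and the demo paths are made up for this example and are not part of Wynton. Only standard `tar` and `mktemp` are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Wynton tool): pack every file under a
# directory into one gzip-compressed tar archive.
bundle_small_files() (
  src_dir="$1"   # directory holding the small files
  archive="$2"   # path of the single archive to create
  # -C changes into src_dir first, so the archive stores relative paths
  tar -czf "$archive" -C "$src_dir" .
)

# Demo in a throwaway directory; nothing is written outside of it.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/parts"
for i in 1 2 3 4 5; do
  echo "record $i" > "$demo_dir/parts/part_$i.txt"
done
bundle_small_files "$demo_dir/parts" "$demo_dir/parts.tar.gz"

# Count the part files stored inside the single archive
tar -tzf "$demo_dir/parts.tar.gz" | grep -c 'part_'
```

Whether bundling pays off depends on how the downstream tools read their input, so treat this as one possible pattern rather than a rule.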
|
||||
|
||||
## Storage {.small-bullets}
|
||||
|
||||
- **Wynton storage is not backed up**
|
||||
- /wynton/home/**[group_name]**/**[user]**
|
||||
- PHI users : /wynton/protected/home/**[group_name]**/**[user]**
|
||||
- User home directory - limited to 500 GiB
|
||||
- PHI users : /wynton/protected/home/**[group_name]**/**[user]**
|
||||
- User home directory - limited to 500 GiB
|
||||
- /wynton/group/**[group_name]**
|
||||
- PHI users : /wynton/protected/group/**[group_name]**
|
||||
- User group directory - disk quota varies by group
|
||||
- Use this directory for any analysis you want to share with your lab
|
||||
- PHI users : /wynton/protected/group/**[group_name]**
|
||||
- User group directory - disk quota varies by group
|
||||
- Use this directory for any analysis you want to share with your lab
|
||||
- [More information on disk quotas](https://wynton.ucsf.edu/hpc/howto/storage-size.html#file-sizes-and-disk-quotas)
|
||||
|
||||
To check your group disk quota run:
|
||||
|
|
@ -217,76 +209,79 @@ To check your group disk quota run:
|
|||
echo 'beegfs-ctl --getquota --storagepoolid=12 --gid "$(id --group)"'
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Scratch - Temporary Storage {.small-bullets}
|
||||
|
||||
- Local **/scratch** - 0.1-1.8 TiB/node storage unique to each compute node
|
||||
- Can only be accessed from the specific compute node
|
||||
- Use this to store intermediate files only needed for a job
|
||||
- Use this to store intermediate files only needed for a job\
|
||||
- **/wynton/scratch** and **/wynton/protected/scratch** (for PHI users)
|
||||
- 703 TiB storage accessible from everywhere
|
||||
- No quotas
|
||||
|
||||
<br></br>
|
||||
|
||||
|
||||
<br></br>
|
||||
|
||||
**Files not used for 2 weeks are automatically deleted**
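
A minimal sketch of how one might spot files at risk under the two-week rule above. It treats "not used" as "not accessed", which is an assumption: the slides do not say which timestamp the cleanup checks. The function name `list_stale_files` is hypothetical; `find -atime` is standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory whose last access
# time is at least `days` days old (default 14).
# Assumption: access time stands in for "not used"; the cluster's actual
# cleanup criterion is not stated in the slides.
list_stale_files() (
  dir="$1"
  days="${2:-14}"
  # find -atime +N matches files accessed more than N whole days ago
  # (fractions are dropped), so N = days - 1 selects files whose last
  # access is at least `days` days old.
  find "$dir" -type f -atime "+$((days - 1))" -print
)

# Usage (my_project is a made-up directory name):
#   list_stale_files /wynton/scratch/my_project
```

Anything this lists that is still needed should be copied to backed-up storage rather than left in scratch.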
|
||||
|
||||
|
||||
|
||||
## Gladstone HIVE
|
||||
|
||||
- Gladstone's HIVE storage server is mounted directly to Wynton under **/gladstone**
|
||||
- Only certain HIVE folders are accessible directly on Wynton
|
||||
- Files under **/gladstone** are backed up
|
||||
- Naming: **/gladstone/[lab]**
|
||||
- Directories that are shared between multiple labs can be set up by contacting Gladstone IT
|
||||
- Directories that are shared between multiple labs can be set up by contacting Gladstone IT\
|
||||
- For more information visit the [IT knowledge base page](https://help.gladstone.org/support/solutions/articles/14000033963)
|
||||
|
||||
|
||||
## Storage Advice
|
||||
## Storage Advice {.small-bullets}
|
||||
|
||||
- Always back up anything you store under **/wynton**
|
||||
- Back up your data on **/gladstone** if you have access to it
|
||||
- A large number of jobs reading and writing to these directories will be slower since it is NFS mounted not BeeGFS
|
||||
- If you have access to it keep all of your data on **/gladstone**
|
||||
- A large number of jobs reading and writing to these directories may be slower since it is NFS mounted not BeeGFS
|
||||
- Use the scratch directories to store temporary files
|
||||
|
||||
|
||||
- e.g. A large amount of .fastq that you do not need after the alignment step
|
||||
|
||||
# Data Transfer
|
||||
|
||||
|
||||
|
||||
## Secure Copy - scp
|
||||
|
||||
- Local file to Wynton
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo "{local}$ scp /path/to/local_file.tsv alice@dt1.wynton.ucsf.edu:/destination/path"
|
||||
```
|
||||
|
||||
- Copy a directory to a folder on Wynton
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo "{local}$ scp -r local_folder/ alice@dt1.wynton.ucsf.edu:/destination/path"
|
||||
```
|
||||
|
||||
- Copy a single file to Wynton from your local machine
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo "{local}$ scp alice@dt1.wynton.ucsf.edu:/path/to/local_file.tsv /destination/path"
|
||||
```
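
With `scp`, the source always comes first and the destination second. In the last example above the remote path comes first, so that command copies a file from Wynton to the local machine. The sketch below prints, without running, the command for each direction. The helper name `wynton_scp_cmd` and the file paths are hypothetical; the host `dt1.wynton.ucsf.edu` and the user `alice` are taken from the slides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the scp command for one direction.
# scp takes the source first and the destination second.
wynton_scp_cmd() (
  direction="$1"   # "up" = local to Wynton, "down" = Wynton to local
  user="$2"
  src="$3"
  dest="$4"
  host="dt1.wynton.ucsf.edu"   # data transfer node named in the slides
  case "$direction" in
    up)   printf 'scp %s %s@%s:%s\n' "$src" "$user" "$host" "$dest" ;;
    down) printf 'scp %s@%s:%s %s\n' "$user" "$host" "$src" "$dest" ;;
    *)    echo "direction must be 'up' or 'down'" >&2; exit 1 ;;
  esac
)

wynton_scp_cmd up   alice /path/to/local_file.tsv  /destination/path
wynton_scp_cmd down alice /path/to/remote_file.tsv /destination/path
```

Printing the command first is a cheap way to check the direction before any data moves.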
|
||||
|
||||
|
||||
## Hands-on
|
||||
|
||||
- Use scp to copy this [file](https://www.dropbox.com/scl/fi/463ymz88q89d2co90kj30/candidatus_carsonella_ruddii_complete_genome.fasta?rlkey=9x64iek2yy149sh2i1r9sse9y&dl=1) into your home directory on Wynton
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## GUI SFTP Clients {.small-bullets}
|
||||
|
||||
- These let you transfer files to and from Wynton using a GUI
|
||||
- [2 factor authentication](https://wynton.ucsf.edu/hpc/get-started/duo-signup.html) may be required
|
||||
- [Cyberduck](https://cyberduck.io/)
|
||||
- Navigate to Preferences -> Transfers -> General
|
||||
- [Cyberduck](https://cyberduck.io/)
|
||||
- Navigate to Preferences -\> Transfers -\> General
|
||||
- change the Transfer Files setting "Use browser connection" instead of "Open Multiple connections"
|
||||
|
||||
- [FileZilla](https://filezilla-project.org/)
|
||||
- [FileZilla](https://filezilla-project.org/)
|
||||
- In the General tab, select ‘SFTP’ as the Protocol instead of ‘FTP’
|
||||
- For Logon Type, select ‘Interactive’ instead of ‘Ask for Password’
|
||||
- Under the Transfer Settings tab, you might need to click the ‘Limit number of simultaneous connections’ and make sure the ‘Maximum number of connections’ is set to 1
|
||||
|
||||
|
||||
## Globus
|
||||
|
||||
- [Globus](https://wynton.ucsf.edu/hpc/transfers/globus.html) is a service for moving, syncing, and sharing large amounts of data
|
||||
|
|
@ -300,27 +295,21 @@ echo "{local}$ scp alice@dt1.wynton.ucsf.edu:/path/to/local_file.tsv /destinatio
|
|||
- Do this from a data transfer node using screen/tmux
|
||||
- Do not use rclone for transfers to Box, follow the [Wynton to UCSF Box](https://wynton.ucsf.edu/hpc/transfers/ucsf-box.html) instructions
|
||||
|
||||
## Poll 1
|
||||
|
||||
Poll 1 - Which of these can you **not** SSH in to?
|
||||
|
||||
|
||||
## Poll 1
|
||||
|
||||
Which of these can you **not** log in to from your computer?
|
||||
|
||||
1. Login Nodes
|
||||
2. Development Nodes
|
||||
3. Data transfer Nodes
|
||||
4. Compute Nodes
|
||||
1. Login Nodes
|
||||
2. Development Nodes
|
||||
3. Data transfer Nodes
|
||||
4. Compute Nodes
|
||||
|
||||
## Poll 2
|
||||
|
||||
The **/wynton** directory is backed up on a nightly basis so do not need to back up the data you store here.
|
||||
|
||||
1. True
|
||||
2. False
|
||||
|
||||
|
||||
The **/wynton** directory is backed up on a nightly basis, so there is no need to back up anything stored here.
|
||||
|
||||
1. True
|
||||
2. False
|
||||
|
||||
# Installing Software
|
||||
|
||||
|
|
@ -331,9 +320,9 @@ The **/wynton** directory is backed up on a nightly basis so do not need to back
|
|||
- <u>Always install software in a development node</u>
|
||||
- Download a precompiled binary or [install from source](https://wynton.ucsf.edu/hpc/howto/install-from-source.html)
|
||||
|
||||
## Install Samtools from Source {.small-list}
|
||||
## Install Samtools from Source {.small-list}
|
||||
|
||||
1. Download and extract source code
|
||||
1. Download and extract source code
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo '[alice@dev1 ~]$ mkdir -p "/scratch/$USER"
|
||||
|
|
@ -342,13 +331,14 @@ echo '[alice@dev1 ~]$ mkdir -p "/scratch/$USER"
|
|||
[alice@dev1 alice]$ tar -x -f samtools-1.19.2.tar.bz2'
|
||||
```
|
||||
|
||||
2. Create install location and configure
|
||||
2. Create install location and configure
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo '[alice@dev1 ~]$ mkdir -p $HOME/software/samtools-1.14'
|
||||
echo '[alice@dev1 ~]$ ./configure --prefix=$HOME/software/samtools-1.14'
|
||||
```
|
||||
3. Build and install
|
||||
|
||||
3. Build and install
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo '[alice@dev1 ~]$ make'
|
||||
|
|
@ -358,9 +348,8 @@ echo '[alice@dev1 ~]$ make install'
|
|||
## Install Nextflow
|
||||
|
||||
- Scientific workflow system with a community maintained set of [core bioinformatics analysis](https://nf-co.re/) pipelines
|
||||
- We will cover an example RNA-seq pipeline in part 2
|
||||
- These can be configured to use the Wynton compute job submission system
|
||||
|
||||
- We will cover an example RNA-seq pipeline in part 2\
|
||||
- These can be configured to use the Wynton compute job submission system
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo '[alice@dev1 ~]$ cd ~/software'
|
||||
|
|
@ -372,11 +361,8 @@ echo '[alice@dev1 ~]$ wget -qO- https://get.nextflow.io | bash'
|
|||
echo '[alice@dev1 ~]$ nextflow -v'
|
||||
```
|
||||
|
||||
|
||||
|
||||
# Containers
|
||||
|
||||
|
||||
## Motivation {.small-bullets .small-picture}
|
||||
|
||||
- Compute heavy jobs (high RAM, multiple cores) should be run on compute nodes
|
||||
|
|
@ -386,25 +372,28 @@ echo '[alice@dev1 ~]$ nextflow -v'
|
|||
|
||||

|
||||
|
||||
|
||||
## Definitions {.small-bullets}
|
||||
|
||||
- **Containers:** An isolated environment for running software that is created from an *image* file, preventing conflicts with the host system.
|
||||
- **Images:** An ordered collection of root filesystem changes that contain all necessary dependencies, ensuring software run identically across various computing platforms.
|
||||
|
||||
- **Images:** An ordered collection of root filesystem changes that contain all necessary dependencies, ensuring software run identically across various computing platforms.
|
||||
|
||||
## Apptainer {.small-bullets}
|
||||
|
||||
- Wynton supports [Apptainer](https://wynton.ucsf.edu/hpc/software/apptainer.html) (formerly singularity) containers
|
||||
- Wynton supports [Apptainer](https://wynton.ucsf.edu/hpc/software/apptainer.html) (formerly singularity) containers
|
||||
|
||||
- [Docker](https://docs.docker.com/) is a commonly used image creation software, these can be turned into apptainer image files (.sif) easily
|
||||
|
||||
- apptainer run <image_file>
|
||||
- Run predefined script within container
|
||||
- apptainer exec <image_file>
|
||||
- Execute any command within container
|
||||
- apptainer shell <image_file>
|
||||
- Run bash shell within container
|
||||
|
||||
- Run predefined script within container
|
||||
|
||||
- apptainer exec <image_file>
|
||||
|
||||
- Execute any command within container
|
||||
|
||||
- apptainer shell <image_file>
|
||||
|
||||
- Run bash shell within container
|
||||
|
||||
## Example Container - Hello World
|
||||
|
||||
|
|
@ -414,7 +403,7 @@ echo '[alice@dev1 ~]$ nextflow -v'
|
|||
echo '[alice@dev1 ~]$ apptainer pull docker://natalie23gill/hello-world:1.0'
|
||||
```
|
||||
|
||||
- Execute the "hi" command in the container
|
||||
- Execute the "hi" command in the container
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup',comment=NA, highlight=TRUE, echo=FALSE}
|
||||
echo '[alice@dev1 ~]$ apptainer exec hello-world_1.0.sif hi'
|
||||
|
|
@ -428,7 +417,6 @@ echo ' __ __ ____ _ __ __ __ __
|
|||
/_/ /_/\___/_/_/\____/ |__/|__/\____/_/ /_/\__,_/ (_) '
|
||||
```
|
||||
|
||||
|
||||
## Example Container
|
||||
|
||||
- This container has **figlet** installed which creates ASCII art from text input
|
||||
|
|
@ -451,28 +439,22 @@ echo '[alice@dev1 ~]$ apptainer exec hello-world_1.0.sif figlet your_text'
|
|||
echo '[alice@dev1 ~]$ apptainer exec hello-world_1.0.sif cat /Dockerfile'
|
||||
```
|
||||
|
||||
|
||||
# End of Part 1
|
||||
|
||||
## Thank You!
|
||||
|
||||
- Please take some time to fill out the workshop survey if you are not attending part 2:
|
||||
[https://www.surveymonkey.com/r/F75J6VZ](https://www.surveymonkey.com/r/F75J6VZ)
|
||||
|
||||
- Please take some time to fill out the workshop survey if you are not attending part 2:\
|
||||
<https://www.surveymonkey.com/r/F75J6VZ>
|
||||
|
||||
## Upcoming Data Science Training Program Workshops
|
||||
|
||||
|
||||
[Introduction to Linear Mixed Effects Models](https://gladstone.org/events/introduction-linear-mixed-effects-models)
|
||||
[Introduction to Linear Mixed Effects Models](https://gladstone.org/events/introduction-linear-mixed-effects-models)\
|
||||
April 25-April 26, 2024 1-3pm PDT
|
||||
|
||||
[Single Cell RNA-Seq Data Analysis](https://gladstone.org/events/single-cell-rna-seq-data-analysis)
|
||||
[Single Cell RNA-Seq Data Analysis](https://gladstone.org/events/single-cell-rna-seq-data-analysis)\
|
||||
April 29-April 30, 2024 9am-4pm PDT
|
||||
|
||||
[Single Cell ATAC-Seq Data Analysis Part 1](https://gladstone.org/events/single-cell-atac-seq-data-analysis-part-1-1)
|
||||
[Single Cell ATAC-Seq Data Analysis Part 1](https://gladstone.org/events/single-cell-atac-seq-data-analysis-part-1-1)\
|
||||
May 6-May 7, 2024 1-4pm PDT
|
||||
|
||||
|
||||
[Complete Schedule](https://gladstone.org/events?series=189)
|
||||
|
||||
|
||||
[Complete Schedule](https://gladstone.org/events?series=189)
|
||||
|
|
|
|||
|
|
@ -25,12 +25,18 @@ output:
|
|||
<center>*Press the ? key for tips on navigating these slides*</center>
|
||||
|
||||
## Introductions
|
||||
Instructor:
|
||||
|
||||
**Natalie Elphick**
|
||||
Bioinformatician I
|
||||
**Natalie Elphick**
|
||||
*Bioinformatician I*
|
||||
|
||||
**Alex Pico (TA)**
|
||||
Bioinformatics Core Director
|
||||
|
||||
TAs:
|
||||
|
||||
**Alex Pico**
|
||||
*Bioinformatics Core Director*
|
||||
**Michela Traglia**
|
||||
*Senior Statistician*
|
||||
|
||||
|
||||
## Target Audience
|
||||
|
|
@ -244,7 +250,7 @@ Read the [querying jobs](https://wynton.ucsf.edu/hpc/scheduler/list-jobs.html) W
|
|||
|
||||
## Poll 3
|
||||
|
||||
Anything that you can run on a compute node can be run on a development node.
|
||||
Any submitted job to compute nodes can also be run on development nodes.
|
||||
|
||||
1. True
|
||||
2. False
|
||||
|
|
@ -417,6 +423,8 @@ alice1@log1.wynton.ucsf.edu:s password: XXXXXXXXXXXXXXXXXXX
|
|||
|
||||
## Bioinformatics Questions
|
||||
|
||||
For any bioinformatics specific questions feel free to reach out to the Gladstone Bioinformatics Core.
|
||||
|
||||
- Email
|
||||
- [bioinformatics@gladstone.ucsf.edu](mailto:bioinformatics@gladstone.ucsf.edu)
|
||||
- Slack channel #questions-about-bioinformatics
|
||||
|
|
@ -431,6 +439,9 @@ alice1@log1.wynton.ucsf.edu:s password: XXXXXXXXXXXXXXXXXXX
|
|||
- Please take some time to fill out the workshop survey:
|
||||
[https://www.surveymonkey.com/r/F75J6VZ](https://www.surveymonkey.com/r/F75J6VZ)
|
||||
|
||||
- Want some additional Wynton training?
|
||||
Check out the UCSF library [Introduction to Wynton HPC Cluster](https://calendars.library.ucsf.edu/event/12197724) Workshop
|
||||
|
||||
|
||||
## Upcoming Data Science Training Program Workshops
|
||||
|
||||
|
|
|
|||