mirror of
https://github.com/gladstone-institutes/Bioinformatics-Workshops.git
synced 2025-11-30 09:45:43 -08:00
add part 2
This commit is contained in:
parent
d6aa2d5bec
commit
47920f7348
6 changed files with 1594 additions and 1527 deletions
|
|
@ -639,13 +639,14 @@ workshop.</p>
|
|||
<h2>Commands</h2>
|
||||
<p>Shell commands are basic instructions used to perform specific
|
||||
tasks.</p>
|
||||
<p><br></p>
|
||||
<p>Basic structure of commands:<br />
|
||||
<code>command_name -[option(s)] [argument(s)]</code></p>
|
||||
<p><code>command_name -[option(s)] [argument(s)]</code></p>
|
||||
<p>Example:</p>
|
||||
<pre class="text"><code>ls -lah part_1</code></pre>
|
||||
<p>Here we are providing multiple options to the <code>ls</code> command
|
||||
and the directory <strong>part_1</strong> as an argument</p>
|
||||
<ul>
|
||||
<li>To cancel a command press CTRL+C</li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="man-pull-up-the-manual-page-for-a-command" class="slide level2">
|
||||
<h2>man: pull up the manual page for a command</h2>
|
||||
|
|
@ -998,22 +999,22 @@ size)</li>
|
|||
</ul>
|
||||
<pre class="text"><code>ls -lah part_1</code></pre>
|
||||
<pre><code>total 8
|
||||
drwx---rw-@ 4 nelphick staff 128B Apr 14 11:26 .
|
||||
drwxr-xr-x@ 5 nelphick staff 160B Apr 14 11:26 ..
|
||||
-rw-r--rw-@ 1 nelphick staff 0B Apr 11 16:29 .hidden_file.txt
|
||||
-rw-r--rw-@ 1 nelphick staff 60B Apr 12 15:40 list_numbers.tsv</code></pre>
|
||||
drwx---rw-@ 4 nelphick staff 128B Apr 16 21:10 .
|
||||
drwxr-xr-x@ 5 nelphick staff 160B Apr 16 21:10 ..
|
||||
-rw-r--r--@ 1 nelphick staff 0B Apr 11 16:29 .hidden_file.txt
|
||||
-rw-r--r--@ 1 nelphick staff 60B Apr 12 15:40 list_numbers.tsv</code></pre>
|
||||
</section>
|
||||
<section id="cd-move-to-a-directory" class="slide level2">
|
||||
<h2>cd: move to a directory</h2>
|
||||
<pre class="text"><code>cd unix_workshop_2023/part_1
|
||||
ls -l</code></pre>
|
||||
<pre><code>total 8
|
||||
-rw-r--rw-@ 1 nelphick staff 60 Apr 12 15:40 list_numbers.tsv</code></pre>
|
||||
-rw-r--r--@ 1 nelphick staff 60 Apr 12 15:40 list_numbers.tsv</code></pre>
|
||||
<pre><code>cd ..
|
||||
ls -l</code></pre>
|
||||
<pre><code>total 0
|
||||
drwx---rw-@ 4 nelphick staff 128 Apr 14 11:26 part_1
|
||||
drwxr-xr-x@ 2 nelphick staff 64 Apr 14 11:26 part_2</code></pre>
|
||||
drwx---rw-@ 4 nelphick staff 128 Apr 16 21:10 part_1
|
||||
drwxr-xr-x@ 2 nelphick staff 64 Apr 16 21:10 part_2</code></pre>
|
||||
</section></section>
|
||||
<section>
|
||||
<section id="creating-and-altering-files" class="title-slide slide level1">
|
||||
|
|
@ -1094,17 +1095,17 @@ used on <strong>cannot be recovered</strong></li>
|
|||
<h2>Check-in</h2>
|
||||
<p>If you are following along with the commands we have run so far, this
|
||||
is the file structure you should have:</p>
|
||||
<pre class="text"><code>ls ./*</code></pre>
|
||||
<pre><code>./new_directory:
|
||||
<pre class="text"><code>ls *</code></pre>
|
||||
<pre><code>new_directory:
|
||||
new_file1.txt
|
||||
|
||||
./part_1:
|
||||
part_1:
|
||||
list_numbers.tsv
|
||||
|
||||
./part_2:</code></pre>
|
||||
part_2:</code></pre>
|
||||
<ul>
|
||||
<li><code>*</code> is a wildcard so <code>ls</code> will list and
|
||||
directories in the current one</li>
|
||||
<li>“*” represents any number of characters, including zero characters
|
||||
so this command runs ls on all of the folders</li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="text-editors" class="slide level2">
|
||||
|
|
@ -1296,17 +1297,33 @@ cat part_1/list_numbers.csv</code></pre>
|
|||
<h2>Check-in</h2>
|
||||
<p>If you followed along with the commands we have run so far, you
|
||||
should have this directory structure:</p>
|
||||
<pre class="text"><code>ls ./*</code></pre>
|
||||
<pre><code>./new_directory:
|
||||
<pre class="text"><code>ls *</code></pre>
|
||||
<pre><code>new_directory:
|
||||
new_file1.txt
|
||||
|
||||
./part_1:
|
||||
part_1:
|
||||
list_numbers.csv
|
||||
list_numbers.tsv
|
||||
subset_list_numbers.tsv
|
||||
|
||||
./part_2:
|
||||
part_2:
|
||||
homo_sapiens.refseq.tsv.gz</code></pre>
|
||||
</section>
|
||||
<section id="sort-sort-values" class="slide level2">
|
||||
<h2>sort: sort values</h2>
|
||||
<pre class="text"><code>cat part_1/list_numbers.csv | cut -d "," -f 1 | sort -n</code></pre>
|
||||
<pre><code>1
|
||||
7
|
||||
13</code></pre>
|
||||
<ul>
|
||||
<li>-n : sort numerically (default is alphabetical)</li>
|
||||
</ul>
|
||||
<pre class="text"><code>cat part_1/list_numbers.csv | cut -d "," -f 8 | sort -nu</code></pre>
|
||||
<pre><code>1
|
||||
3</code></pre>
|
||||
<ul>
|
||||
<li>-u : sort and remove duplicates</li>
|
||||
</ul>
|
||||
</section></section>
|
||||
<section>
|
||||
<section id="end-of-part-1" class="title-slide slide level1">
|
||||
|
|
@ -1316,6 +1333,7 @@ homo_sapiens.refseq.tsv.gz</code></pre>
|
|||
<section id="other-helpful-commands" class="slide level2 small-bullets">
|
||||
<h2 class="small-bullets">Other helpful commands</h2>
|
||||
<ul>
|
||||
<li><code>wc</code> : count lines and words</li>
|
||||
<li><code>chmod</code> : Change the permissions of a file or
|
||||
directory</li>
|
||||
<li><code>chown</code> : Change the owner of a file or directory</li>
|
||||
|
|
|
|||
1106
docs/Intro_to_Unix_Part_2.html
Normal file
File diff suppressed because one or more lines are too long
|
|
@ -2,6 +2,12 @@
|
|||
title: "Introduction to Unix Command-line - Part 1"
|
||||
author: "Natalie Elphick"
|
||||
date: "April 17th"
|
||||
knit: (function(input, ...) {
|
||||
rmarkdown::render(
|
||||
input,
|
||||
output_dir = "../docs"
|
||||
)
|
||||
})
|
||||
output:
|
||||
revealjs::revealjs_presentation:
|
||||
css: style.css
|
||||
|
|
@ -119,9 +125,6 @@ Both bash and zsh should be able to run all of the commands in this workshop.
|
|||
|
||||
Shell commands are basic instructions used to perform specific tasks.
|
||||
|
||||
<br>
|
||||
|
||||
Basic structure of commands:
|
||||
`command_name -[option(s)] [argument(s)]`
|
||||
|
||||
|
||||
|
|
@ -132,6 +135,8 @@ ls -lah part_1
|
|||
|
||||
Here we are providing multiple options to the `ls` command and the directory **part_1** as an argument
|
||||
|
||||
- To cancel a command press CTRL+C
|
||||
|
||||
## man: pull up the manual page for a command
|
||||
|
||||
|
||||
|
|
@ -245,6 +250,9 @@ ls .
|
|||
ls -lah part_1
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
## cd: move to a directory
|
||||
|
||||
```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA}
|
||||
|
|
@ -344,10 +352,11 @@ du -h */*
|
|||
|
||||
If you are following along with the commands we have run so far, this is the file structure you should have:
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA}
|
||||
ls ./*
|
||||
ls *
|
||||
```
|
||||
|
||||
- `*` is a wildcard so `ls` will list and directories in the current one
|
||||
- "*" represents any number of characters, including zero characters so this command runs ls on all of the folders
|
||||
|
||||
|
||||
## Text editors
|
||||
|
||||
|
|
@ -512,13 +521,31 @@ cat part_1/list_numbers.csv
|
|||
If you followed along with the commands we have run so far, you should have this directory structure:
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA}
|
||||
ls ./*
|
||||
ls *
|
||||
```
|
||||
|
||||
|
||||
|
||||
## sort: sort values
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
cat part_1/list_numbers.csv | cut -d "," -f 1 | sort -n
|
||||
```
|
||||
|
||||
- -n : sort numerically (default is alphabetical)
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
cat part_1/list_numbers.csv | cut -d "," -f 8 | sort -nu
|
||||
```
|
||||
|
||||
- -u : sort and remove duplicates
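The difference between the default, `-n`, and `-nu` orderings can be seen on a few made-up values (these numbers are illustrative and are not taken from `list_numbers.csv`). A minimal sketch:

```bash
# Hypothetical sample values, including a repeated 1
values=$(printf '13\n1\n7\n1\n')

# Default sort compares text character by character, so 13 lands before 7
alpha=$(echo "$values" | LC_ALL=C sort)

# -n compares the numeric value instead
numeric=$(echo "$values" | sort -n)

# -nu sorts numerically and also drops the repeated 1
unique=$(echo "$values" | sort -nu)

echo "default:"; echo "$alpha"
echo "numeric:"; echo "$numeric"
echo "numeric + unique:"; echo "$unique"
```

`LC_ALL=C` is set only so the text comparison behaves the same regardless of locale settings.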
|
||||
|
||||
|
||||
# End of Part 1
|
||||
|
||||
## Other helpful commands {.small-bullets}
|
||||
|
||||
- `wc` : count lines and words
|
||||
- `chmod` : Change the permissions of a file or directory
|
||||
- `chown` : Change the owner of a file or directory
|
||||
- `df` : Display information about disk usage and available space
|
||||
|
|
@ -573,4 +600,6 @@ rm -r new_directory
|
|||
rm part_2/homo_sapiens.refseq.tsv.gz
|
||||
rm part_1/subset_list_numbers.tsv
|
||||
rm part_1/list_numbers.csv
|
||||
```
|
||||
```
|
||||
|
||||
|
||||
|
|
|
|||
File diff suppressed because one or more lines are too long
|
|
@ -2,6 +2,12 @@
|
|||
title: "Introduction to Unix Command-line - Part 2"
|
||||
author: "Natalie Elphick"
|
||||
date: "April 18th"
|
||||
knit: (function(input, ...) {
|
||||
rmarkdown::render(
|
||||
input,
|
||||
output_dir = "../docs"
|
||||
)
|
||||
})
|
||||
output:
|
||||
revealjs::revealjs_presentation:
|
||||
css: style.css
|
||||
|
|
@ -26,10 +32,407 @@ Bioinformatician I
|
|||
**Yihang Xin (TA)**
|
||||
Software Engineer II
|
||||
|
||||
<br>
|
||||
|
||||
# Setup
|
||||
|
||||
Run the following commands if you did not attend part 1:
|
||||
|
||||
|
||||
```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
mkdir unix_workshop
|
||||
```
|
||||
|
||||
|
||||
```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
cd unix_workshop
|
||||
```
|
||||
|
||||
|
||||
```{r, engine='bash', eval=FALSE, results="hide", highlight=FALSE, comment=NA, echo = TRUE}
|
||||
curl -L -o unix_workshop_2023.tar.gz 'https://www.dropbox.com/s/smb12au2y82jmvq/unix_workshop_2023.tar.gz?dl=0'
|
||||
```
|
||||
|
||||
|
||||
```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
tar -xzf unix_workshop_2023.tar.gz
|
||||
```
|
||||
|
||||
|
||||
```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
cd unix_workshop_2023
|
||||
```
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results="hide", highlight=FALSE, comment=NA, echo = TRUE}
|
||||
curl -o part_2/homo_sapiens.refseq.tsv.gz https://ftp.ensembl.org/pub/current_tsv/homo_sapiens/Homo_sapiens.GRCh38.109.refseq.tsv.gz
|
||||
```
|
||||
|
||||
# File Compression
|
||||
|
||||
## Command-line tools for compression
|
||||
|
||||
- Compression reduces the size of a file
|
||||
- `gzip` : compresses a file and replaces it with a compressed version (.gz)
|
||||
- `tar` : create and manipulate archive files
|
||||
|
||||
<br>
|
||||
|
||||
**Archive**: a single file that bundles one or more files and/or folders, and is often compressed as well
|
||||
|
||||
|
||||
|
||||
## gzip/gunzip: compress/uncompress a file
|
||||
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
gunzip part_2/homo_sapiens.refseq.tsv.gz
|
||||
du -h part_2/homo_sapiens.refseq.tsv
|
||||
```
|
||||
|
||||
- The uncompressed file is 27 megabytes
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
gzip part_2/homo_sapiens.refseq.tsv
|
||||
du -h part_2/homo_sapiens.refseq.tsv.gz
|
||||
```
|
||||
|
||||
- Compressing it reduces it to roughly a tenth of the size
|
||||
|
||||
## Note
|
||||
|
||||
- The magnitude of the compression depends on the type of data
|
||||
- The units for file sizes are not the same across all systems
|
||||
- Some systems define a kilobyte as 1000 bytes, while others define it as 1024 bytes
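To make the unit difference concrete, the arithmetic below converts the 27 megabyte figure from the previous slide into bytes under both conventions. It is only a worked calculation; the variable names are illustrative.

```bash
size=27   # the uncompressed size reported by du -h, in "megabytes"

# 1 megabyte = 1000 * 1000 bytes (decimal convention)
decimal_bytes=$((size * 1000 * 1000))

# 1 megabyte = 1024 * 1024 bytes (binary convention)
binary_bytes=$((size * 1024 * 1024))

echo "1000-based: $decimal_bytes bytes"
echo "1024-based: $binary_bytes bytes"
echo "difference: $((binary_bytes - decimal_bytes)) bytes"
```

The same file can therefore be reported with slightly different sizes depending on the tool and system.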
|
||||
|
||||
|
||||
|
||||
## tar: compressing folders into archives
|
||||
|
||||
- `tar` does not compress on its own; with the `-z` option it uses gzip to create compressed archive files
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
tar -czf part_1.tar.gz part_1
|
||||
ls -l
|
||||
```
|
||||
|
||||
- -c: create a new archive
|
||||
- -f: specify the name of the archive file
|
||||
- -z: compress the archive with gzip
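The flags above can be tried safely on a throwaway folder. This sketch assumes a temporary directory and a hypothetical folder named `demo_dir`; it also uses `-t`, which lists the contents of an archive without extracting it, and `-C`, which tells `tar` which directory to work from.

```bash
# Build a small folder to archive (names are made up for this example)
workdir=$(mktemp -d)
mkdir "$workdir/demo_dir"
echo "1 2 3" > "$workdir/demo_dir/a.txt"
echo "4 5 6" > "$workdir/demo_dir/b.txt"

# -c create, -z compress with gzip, -f name of the archive file
tar -czf "$workdir/demo_dir.tar.gz" -C "$workdir" demo_dir

# -t lists what is inside the archive without unpacking it
listing=$(tar -tzf "$workdir/demo_dir.tar.gz")
echo "$listing"
```

Listing an archive first is a cheap way to check what will be written before running `tar -xzf`.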
|
||||
|
||||
## Unarchiving
|
||||
|
||||
- We did this in part 1 to unarchive the workshop folders
|
||||
|
||||
|
||||
```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
tar -xzf part_1.tar.gz
|
||||
```
|
||||
- -x: extract an archive
|
||||
- -z: uncompress the archive with gzip
|
||||
- -f: specify the name of the archive file
|
||||
|
||||
|
||||
## gunzip -c: cat compressed files
|
||||
|
||||
- To avoid uncompressing a large file just to read its contents, we can use `gunzip -c`
|
||||
- This will output the uncompressed contents to the terminal and leave the `.gz` file unchanged
|
||||
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
gunzip -c part_2/homo_sapiens.refseq.tsv.gz | head
|
||||
```
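A small self-contained check of this behaviour, using a hypothetical `demo.txt` in a temporary directory rather than the workshop file:

```bash
workdir=$(mktemp -d)
printf 'line1\nline2\nline3\n' > "$workdir/demo.txt"

# gzip replaces demo.txt with demo.txt.gz
gzip "$workdir/demo.txt"

# gunzip -c writes the uncompressed text to standard output only
first_line=$(gunzip -c "$workdir/demo.txt.gz" | head -n 1)
echo "$first_line"

# The compressed file is still there, and no uncompressed copy was created
ls "$workdir"
```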
|
||||
|
||||
|
||||
|
||||
# System Variables
|
||||
|
||||
## What are system variables?
|
||||
|
||||
- Special variables that contain information about the system's configuration and state
|
||||
- Used by the OS and programs to change their behavior based on the system's state
|
||||
|
||||
Example:
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
echo $HOME
|
||||
```
|
||||
|
||||
|
||||
## Common System Variables
|
||||
|
||||
- **$PWD** : The working directory
|
||||
- **$HOME** : The current user's home directory
|
||||
- **$PS1** : the shell prompt string
|
||||
- **$TMPDIR** : location of temporary files (not set on every system)
|
||||
|
||||
|
||||
|
||||
## PATH: locations of executable files
|
||||
|
||||
- When you enter a command, the shell searches the directories listed in `$PATH` to find its associated executable file
|
||||
|
||||
```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
echo $PATH
|
||||
```
|
||||
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
echo "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/mysql/bin"
|
||||
```
|
||||
|
||||
|
||||
|
||||
- The shell checks these directories in the order they appear and uses the first matching executable it finds
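The "first match wins" rule can be demonstrated with two throwaway directories that each contain a program with the same name. Everything here (`path_demo`, `first`, `second`) is made up for illustration, and the `PATH` change is confined to a subshell so the current session is not affected.

```bash
demo_root=$(mktemp -d)
mkdir "$demo_root/first" "$demo_root/second"

# Put an executable called path_demo in both directories
for d in first second
do
    printf '#!/bin/sh\necho "found in %s"\n' "$d" > "$demo_root/$d/path_demo"
    chmod u+x "$demo_root/$d/path_demo"
done

# Inside $( ... ) the modified PATH only lasts for this one lookup
result=$(export PATH="$demo_root/first:$demo_root/second:$PATH"; path_demo)
echo "$result"
```

Swapping the order of `first` and `second` in the `PATH` value would make the other copy run instead.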
|
||||
|
||||
## export: set system variables
|
||||
|
||||
- Useful for setting variables you want to be used across programs
|
||||
- You can add new software to your `$PATH` like this:
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
echo 'export PATH="/path/to/new/software:$PATH"'
|
||||
```
|
||||
|
||||
- This will modify the `$PATH` for the current terminal session
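What `export` changes is whether programs started from the shell can see the variable. A minimal sketch, with hypothetical variable names `demo_local` and `DEMO_EXPORTED`:

```bash
# A plain shell variable: visible in this shell only
demo_local="not exported"

# An exported variable: passed on to programs started from this shell
export DEMO_EXPORTED="exported"

# Start a child bash process and ask it what it can see
child_output=$(bash -c 'echo "local=[${demo_local}] exported=[${DEMO_EXPORTED}]"')
echo "$child_output"
```

The child process prints an empty value for `demo_local` because it was never exported.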
|
||||
|
||||
## Modifying the PATH for all future terminal sessions
|
||||
|
||||
- Add the export line to your `~/.bashrc` or `~/.zshrc`
|
||||
- **Proceed with caution**
|
||||
- Make backups of these and read this [guide](https://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_01.html)
|
||||
- Changing the `$PATH` incorrectly can <span style="color:#b01212;font-weight:bold">break system functionality</span>
|
||||
|
||||
|
||||
## which: locate the executable associated with a command
|
||||
|
||||
- This command shows the location of the executable that the shell finds first in the `$PATH`
|
||||
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
which ls
|
||||
```
|
||||
|
||||
- Useful for checking which copy is used when multiple versions of a program are installed
|
||||
|
||||
# Shell Scripting
|
||||
|
||||
|
||||
## What is a script?
|
||||
|
||||
- Scripts are executable files for reusing code
|
||||
- By convention scripts end in `.sh`
|
||||
- The first line of the script is called the shebang
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
echo "nano part_2/example_script.sh"
|
||||
cat ../materials/example_script.sh > part_2/example_script.sh
|
||||
```
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
cat ../materials/example_script.sh | head -n 1
|
||||
```
|
||||
|
||||
- The text that follows `#!` tells the OS where the interpreter is
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
which bash
|
||||
```
|
||||
|
||||
## chmod: making a script executable
|
||||
|
||||
- By default, files are not executable
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
ls -l part_2/example_script.sh
|
||||
```
|
||||
|
||||
- We can set the execute bit like this
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
chmod u+x part_2/example_script.sh
|
||||
ls -l part_2/example_script.sh
|
||||
```
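Instead of reading the permission string from `ls -l`, the execute bit can also be checked with the shell's `-x` test. This sketch uses a temporary file as a stand-in for `example_script.sh`:

```bash
# mktemp creates a new file that is readable and writable, but not executable
script=$(mktemp)
printf '#!/bin/bash\necho "hello from the script"\n' > "$script"

if [ -x "$script" ]; then before="executable"; else before="not executable"; fi

# Give the owner (u) execute (x) permission
chmod u+x "$script"

if [ -x "$script" ]; then after="executable"; else after="not executable"; fi

echo "before chmod: $before"
echo "after chmod:  $after"
```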
|
||||
|
||||
## Example
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
cat ../materials/example_script.sh
|
||||
```
|
||||
|
||||
## Let's run it
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
./part_2/example_script.sh part_2/homo_sapiens.refseq.tsv.gz
|
||||
```
|
||||
|
||||
## Loops
|
||||
|
||||
- Useful for iterating over lines of a file or lists
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
for i in {1..3}
|
||||
do
|
||||
|
||||
echo $i
|
||||
|
||||
done
|
||||
```
|
||||
|
||||
## While loops
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
count=0
|
||||
|
||||
while [ $count -lt 5 ] # loop while count is less than 5
|
||||
do
|
||||
echo $count
|
||||
count=$((count+1))
|
||||
done
|
||||
```
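A `while` loop combined with `read` is a common way to iterate over the lines of a file. The sketch below assumes a small temporary file with made-up gene names:

```bash
input_file=$(mktemp)
printf 'geneA\ngeneB\ngeneC\n' > "$input_file"

line_number=0

# read -r takes one line at a time; the loop ends when the file runs out
while IFS= read -r line
do
    line_number=$((line_number+1))
    echo "$line_number: $line"
done < "$input_file"
```

`IFS=` and `-r` keep leading spaces and backslashes in each line exactly as they appear in the file.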
|
||||
|
||||
|
||||
## If statements
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
x=5
|
||||
|
||||
if [ $x -gt 10 ] # check if x is greater than 10
|
||||
then
|
||||
echo "x is greater than 10"
|
||||
else
|
||||
echo "x is not greater than 10"
|
||||
fi # end if statement
|
||||
```
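Besides numeric comparisons such as `-gt`, the same `[ ... ]` syntax can test files, and `elif` adds further branches. A short sketch using a temporary directory as the thing being tested:

```bash
target=$(mktemp -d)

if [ -f "$target" ]        # -f : true for a regular file
then
    kind="file"
elif [ -d "$target" ]      # -d : true for a directory
then
    kind="directory"
else
    kind="missing"
fi

echo "$target is a $kind"
```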
|
||||
|
||||
# Other Useful Commands
|
||||
|
||||
## sed : stream editor
|
||||
|
||||
- Parses and transforms text, using a compact programming language
|
||||
- It reads and modifies text line by line from a file or input stream
|
||||
- Supports [regular expressions](https://v4.software-carpentry.org/regexp/index.html)
|
||||
- Useful for replacing text
|
||||
|
||||
Example:
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
echo "sed 's/search_string/replace_string/g' input.txt > output.txt"
|
||||
```
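The substitution pattern can be tried without any input file by piping text into `sed`. The sample text below is made up; the two commands differ only in the trailing `g`.

```bash
sample='chr2 geneB chr2'

# With g: every match on the line is replaced
all_matches=$(echo "$sample" | sed 's/chr/chromosome_/g')

# Without g: only the first match on each line is replaced
first_match=$(echo "$sample" | sed 's/chr/chromosome_/')

echo "$all_matches"
echo "$first_match"
```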
|
||||
|
||||
## ssh : secure shell - connect to a remote server
|
||||
|
||||
- Logging in to a remote server
|
||||
- Remote desktop for the terminal
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
echo "ssh username@remote"
|
||||
```
|
||||
- The `username` would be your user on the remote server and `remote` is the hostname or IP address of the remote server or computer
|
||||
|
||||
## scp : secure copy
|
||||
|
||||
- Copy files to or from a remote server or computer
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
echo "scp [options] [source] [destination]"
|
||||
```
|
||||
- Copy from local to remote
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
echo "scp /path/to/local/file.txt username@remote:/path/to/remote/directory/"
|
||||
```
|
||||
- Copy from remote to local
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
echo "scp username@remote:/path/to/file.txt /path/to/local/directory/"
|
||||
```
|
||||
|
||||
- -r : copy a whole folder
|
||||
|
||||
# AWK
|
||||
|
||||
## awk : processing structured data
|
||||
|
||||
- A small programming language that is designed to work with structured data
|
||||
- Has more complicated syntax than commands like `cut`, but processes large files efficiently
|
||||
- Designed to read a file or input stream line by line
|
||||
- Operates on **records** (lines) and **fields** (columns)
|
||||
|
||||
Basic command:
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE}
|
||||
echo "awk options 'pattern {action}' input_file"
|
||||
```
|
||||
|
||||
## Example : Sum the first 2 columns of a file
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
awk -F '\t' '{print $1+$2}' part_1/list_numbers.tsv
|
||||
```
|
||||
|
||||
- -F : provides the field separator
|
||||
- `$1,$2` : the first and second fields
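The same command works on text piped in from another program. Here three made-up tab-separated rows stand in for `list_numbers.tsv`:

```bash
# Each row has two tab-separated fields
sums=$(printf '1\t2\n3\t4\n10\t20\n' | awk -F '\t' '{print $1+$2}')
echo "$sums"
```

awk runs the `{print $1+$2}` action once per record, so the output has one sum per input line.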
|
||||
|
||||
## Example : Find the average of a column
|
||||
|
||||
- For this example we only want the average if the 5th column equals "RefSeq_mRNA"
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE}
|
||||
gunzip -c part_2/homo_sapiens.refseq.tsv.gz | \
|
||||
awk -F '\t' '$5 == "RefSeq_mRNA" {sum += $7; count++} \
|
||||
END {print sum / count}'
|
||||
|
||||
```
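The pattern-then-`END` structure is easier to follow on a tiny table. This sketch uses three invented rows with the type in column 2 and the value in column 3 (not the column layout of the RefSeq file), and adds a guard so that no division happens when nothing matches:

```bash
average=$(printf 'g1\tRefSeq_mRNA\t10\ng2\tRefSeq_ncRNA\t99\ng3\tRefSeq_mRNA\t20\n' |
  awk -F '\t' '$2 == "RefSeq_mRNA" {sum += $3; count++}
               END {if (count > 0) print sum / count; else print "no matching rows"}')

echo "$average"
```

Only the two `RefSeq_mRNA` rows are added to `sum`, so the row with value 99 does not affect the result.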
|
||||
|
||||
## Resources for learning AWK and sed
|
||||
|
||||
- [The GNU AWK manual](https://www.gnu.org/software/gawk/manual/gawk.html)
|
||||
- [AWK Tutorial by Bruce Barnett](https://www.grymoire.com/Unix/Awk.html)
|
||||
- [Sed Tutorial by Bruce Barnett](https://www.grymoire.com/Unix/Sed.html)
|
||||
|
||||
|
||||
|
||||
# End of Part 2
|
||||
|
||||
|
||||
|
||||
|
||||
## Survey
|
||||
|
||||
- Please take some time to fill out the workshop survey:
|
||||
https://www.surveymonkey.com/r/DY7K5ZY
|
||||
|
||||
|
||||
## Additional learning materials
|
||||
|
||||
- Software Carpentry provides a self-paced course:
|
||||
- [The Unix Shell](https://swcarpentry.github.io/shell-novice/)
|
||||
|
||||
- Free online books:
|
||||
- [The Unix Workbench](https://seankross.com/the-unix-workbench/index.html)
|
||||
- [The Linux Command Line](http://linuxcommand.org/tlcl.php)
|
||||
|
||||
|
||||
|
||||
|
||||
## Upcoming Data Science Training Program Workshops
|
||||
|
||||
[Linear Mixed Effects Modeling](https://gladstone.org/index.php/events/linear-mixed-effects-modeling-0)
|
||||
April 24-April 25, 2023 10:00am-12:00pm PDT
|
||||
|
||||
[Machine Learning](https://gladstone.org/index.php/events/machine-learning)
|
||||
April 28, 2023 10:00am-12:00pm PDT
|
||||
|
||||
[Advanced Cytoscape Automation](https://gladstone.org/index.php/events/advanced-cytoscape-automation-2)
|
||||
May 2, 2023 1:00-4:00pm PDT
|
||||
|
||||
[Introduction to RNA-Seq Analysis](https://gladstone.org/index.php/events/introduction-rna-seq-analysis-4)
|
||||
May 15-May 16, 2023 9:00am-12:00pm PDT
|
||||
|
||||
|
||||
|
||||
|
||||
```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo=FALSE}
|
||||
rm part_2/example_script.sh
|
||||
rm part_2/homo_sapiens.refseq.tsv*
|
||||
rm part_1.tar.gz
|
||||
```
|
||||
|
||||
|
||||
|
|
|
|||
11
intro-unix-command-line/materials/example_script.sh
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
#!/bin/bash
|
||||
|
||||
# This is a comment. Comments are ignored by the shell.
|
||||
|
||||
# $1 is the first argument passed to the script
|
||||
echo "Counting the genes in $1"
|
||||
|
||||
# count the unique genes in the file
|
||||
u_genes=$(gunzip -c "$1" | cut -f 1 | sort -u | wc -l)
|
||||
|
||||
echo "There are $u_genes unique genes in $1"
|
||||