From 47920f73481db613fdc637462eb8fc242b926b17 Mon Sep 17 00:00:00 2001 From: Natalie Elphick Date: Sun, 16 Apr 2023 21:12:31 -0700 Subject: [PATCH] add part 2 --- docs/Intro_to_Unix_Part_1.html | 58 +- docs/Intro_to_Unix_Part_2.html | 1106 ++++++++++++ .../Intro_to_Unix_Part_1.Rmd | 43 +- .../Intro_to_Unix_Part_1.html | 1500 ----------------- .../Intro_to_Unix_Part_2.Rmd | 403 +++++ .../materials/example_script.sh | 11 + 6 files changed, 1594 insertions(+), 1527 deletions(-) create mode 100644 docs/Intro_to_Unix_Part_2.html delete mode 100644 intro-unix-command-line/Intro_to_Unix_Part_1.html create mode 100644 intro-unix-command-line/materials/example_script.sh diff --git a/docs/Intro_to_Unix_Part_1.html b/docs/Intro_to_Unix_Part_1.html index 4af4529..d478ba7 100644 --- a/docs/Intro_to_Unix_Part_1.html +++ b/docs/Intro_to_Unix_Part_1.html @@ -639,13 +639,14 @@ workshop.

Commands

Shell commands are basic instructions used to perform specific tasks.

-


-

Basic structure of commands:
-command_name -[option(s)] [argument(s)]

+

command_name -[option(s)] [argument(s)]

Example:

ls -lah part_1

Here we are providing multiple options to the ls command and the directory part_1 as an argument

+

man: pull up the manual page for a command

@@ -998,22 +999,22 @@ size)
ls -lah part_1
total 8
-drwx---rw-@ 4 nelphick  staff   128B Apr 14 11:26 .
-drwxr-xr-x@ 5 nelphick  staff   160B Apr 14 11:26 ..
--rw-r--rw-@ 1 nelphick  staff     0B Apr 11 16:29 .hidden_file.txt
--rw-r--rw-@ 1 nelphick  staff    60B Apr 12 15:40 list_numbers.tsv
+drwx---rw-@ 4 nelphick staff 128B Apr 16 21:10 . +drwxr-xr-x@ 5 nelphick staff 160B Apr 16 21:10 .. +-rw-r--r--@ 1 nelphick staff 0B Apr 11 16:29 .hidden_file.txt +-rw-r--r--@ 1 nelphick staff 60B Apr 12 15:40 list_numbers.tsv

cd: move to a directory

cd unix_workshop_2023/part_1
 ls -l
total 8
--rw-r--rw-@ 1 nelphick  staff  60 Apr 12 15:40 list_numbers.tsv
+-rw-r--r--@ 1 nelphick staff 60 Apr 12 15:40 list_numbers.tsv
cd ..
 ls -l
total 0
-drwx---rw-@ 4 nelphick  staff  128 Apr 14 11:26 part_1
-drwxr-xr-x@ 2 nelphick  staff   64 Apr 14 11:26 part_2
+drwx---rw-@ 4 nelphick staff 128 Apr 16 21:10 part_1 +drwxr-xr-x@ 2 nelphick staff 64 Apr 16 21:10 part_2
@@ -1094,17 +1095,17 @@ used on cannot be recovered

Check-in

If you are following along with the commands we have run so far, this is the file structure you should have:

-
ls ./*
-
./new_directory:
+
ls *
+
new_directory:
 new_file1.txt
 
-./part_1:
+part_1:
 list_numbers.tsv
 
-./part_2:
+part_2:
    -
  • * is a wildcard so ls will list and -directories in the current one
  • +
  • “*” represents any number of characters, including zero characters +so this command runs ls on all of the folders
@@ -1296,17 +1297,33 @@ cat part_1/list_numbers.csv

Check-in

If you followed along with the commands we have run so far, you should have this directory structure:

-
ls ./*
-
./new_directory:
+
ls *
+
new_directory:
 new_file1.txt
 
-./part_1:
+part_1:
 list_numbers.csv
 list_numbers.tsv
 subset_list_numbers.tsv
 
-./part_2:
+part_2:
 homo_sapiens.refseq.tsv.gz
+
+
+

sort: sort values

+
cat part_1/list_numbers.csv | cut -d "," -f 1 | sort -n
+
1
+7
+13
+
    +
  • -n : sort numerically (default is alphabetical)
  • +
+
cat part_1/list_numbers.csv | cut -d "," -f 8 | sort -nu
+
1
+3
+
    +
  • -u : sort and remove duplicates
  • +
@@ -1316,6 +1333,7 @@ homo_sapiens.refseq.tsv.gz

Other helpful commands

    +
  • wc : count lines and words
  • chmod : Change the permissions of a file or directory
  • chown : Change the owner of a file or directory
  • diff --git a/docs/Intro_to_Unix_Part_2.html b/docs/Intro_to_Unix_Part_2.html new file mode 100644 index 0000000..8d58c61 --- /dev/null +++ b/docs/Intro_to_Unix_Part_2.html @@ -0,0 +1,1106 @@ + + + + + + + Introduction to Unix Command-line - Part 2 + + + + + + + + + + + + + + + + + +
    +
    + +
    +

    Introduction to Unix Command-line - Part 2

    +

    Natalie Elphick

    +

    April 18th

    +
    + +
    +

    +
    +Press the ? key for tips on navigating these slides +
    +
    +
    +

    Introductions

    +

    Natalie Elphick
    +Bioinformatician I

    +


    +

    Yihang Xin (TA)
    +Software Engineer II

    +


    +
    +
    +

    Setup

    +

    Run the following commands if you did not attend part 1:

    +
    mkdir unix_workshop
    +
    cd unix_workshop
    +
    curl -L -o unix_workshop_2023.tar.gz 'https://www.dropbox.com/s/smb12au2y82jmvq/unix_workshop_2023.tar.gz?dl=0'
    +
    tar -xzf unix_workshop_2023.tar.gz
    +
    cd unix_workshop_2023
    +
    curl -o part_2/homo_sapiens.refseq.tsv.gz https://ftp.ensembl.org/pub/current_tsv/homo_sapiens/Homo_sapiens.GRCh38.109.refseq.tsv.gz
    +
    + +
    +
    +

    File Compression

    + +
    +
    +

    Command-line tools for compression

    +
      +
    • Compression reduces the size of a file
    • +
    • gzip : compresses a file and replaces it with a +compressed version (.gz)
    • +
    • tar : create and manipulate archive files
    • +
    +


    +

    Archive: a single file that bundles one or more +files and/or folders, and is often compressed as well

    +
    +
    +

    gzip/gunzip: compress/uncompress a file

    +
    gunzip part_2/homo_sapiens.refseq.tsv.gz
    +du -h part_2/homo_sapiens.refseq.tsv
    +
     26M    part_2/homo_sapiens.refseq.tsv
    +
      +
    • The uncompressed file is about 26 megabytes
    • +
    +
    gzip part_2/homo_sapiens.refseq.tsv
    +du -h part_2/homo_sapiens.refseq.tsv.gz
    +
    2.7M    part_2/homo_sapiens.refseq.tsv.gz
    +
      +
    • Compressing it shrinks it to about a tenth of the size
    • +
    +
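To see the size difference on a file you control, the sketch below builds a small, repetitive demo file and compares byte counts before and after `gzip`. The file name `demo.tsv` and its contents are made up for illustration, and the exact sizes will vary with the data.

```shell
# Work in a throwaway directory so nothing in the workshop folder changes
tmpdir=$(mktemp -d)

# Write 1000 identical lines; repetitive text compresses very well
for i in $(seq 1 1000); do
    echo "ENSG00000160072 RefSeq_mRNA DIRECT"
done > "$tmpdir/demo.tsv"

orig_size=$(wc -c < "$tmpdir/demo.tsv")

# gzip replaces demo.tsv with demo.tsv.gz
gzip "$tmpdir/demo.tsv"
gz_size=$(wc -c < "$tmpdir/demo.tsv.gz")

echo "original: $orig_size bytes, compressed: $gz_size bytes"

# gunzip restores demo.tsv and removes demo.tsv.gz
gunzip "$tmpdir/demo.tsv.gz"
```

`wc -c` counts bytes, so the comparison does not depend on how `du` rounds or which unit convention the system uses.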
    +
    +

    Note

    +
      +
    • The magnitude of the compression depends on the type of data
    • +
    • The units for file sizes are not the same across all systems +
        +
      • Some systems define a kilobyte as 1000 bytes, while others define it +as 1024 bytes
      • +
    • +
    +
    +
    +

    tar: compressing folders into archives

    +
      +
    • Does not provide compression on its own; it uses gzip to create +compressed archive files
    • +
    +
    tar -czf part_1.tar.gz part_1
    +ls -l
    +
    total 8
    +drwx---rw-@ 4 nelphick  staff  128 Apr 16 21:10 part_1
    +-rw-r--r--@ 1 nelphick  staff  814 Apr 16 21:11 part_1.tar.gz
    +drwxr-xr-x@ 3 nelphick  staff   96 Apr 16 21:11 part_2
    +
      +
    • -c: create a new archive
    • +
    • -f: specify the name of the archive file
    • +
    • -z: compress the archive with gzip
    • +
    +
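Before sharing an archive it can help to check what it contains. The sketch below is one way to do that on a throwaway folder; `demo_dir` and `numbers.csv` are hypothetical names, and the `-t` (list) and `-C` (change directory) options are not covered in these slides.

```shell
# Build a small folder to archive
tmpdir=$(mktemp -d)
mkdir "$tmpdir/demo_dir"
echo "1,2,3" > "$tmpdir/demo_dir/numbers.csv"

# -C makes tar switch to $tmpdir first, so the archive stores the
# relative path demo_dir/... instead of the full temporary path
tar -czf "$tmpdir/demo_dir.tar.gz" -C "$tmpdir" demo_dir

# -t lists the archive contents without extracting anything
listing=$(tar -tzf "$tmpdir/demo_dir.tar.gz")
echo "$listing"
```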
    +
    +

    Unarchiving

    +
      +
    • We did this in part 1 to unarchive the workshop folders
    • +
    +
    tar -xzf part_1.tar.gz
    +
      +
    • -x: extract an archive
    • +
    • -z: uncompress the archive with gzip
    • +
    • -f: specify the name of the archive file
    • +
    +
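By default `tar -x` extracts into the current directory. As a hedged sketch, the example below extracts into a separate folder with `-C` so existing files are not overwritten; the folder names `demo_dir` and `restored` are invented for the example.

```shell
# Create a small archive to work with
tmpdir=$(mktemp -d)
mkdir "$tmpdir/demo_dir" "$tmpdir/restored"
echo "1,2,3" > "$tmpdir/demo_dir/numbers.csv"
tar -czf "$tmpdir/demo_dir.tar.gz" -C "$tmpdir" demo_dir

# -C makes tar switch to the target directory before extracting
tar -xzf "$tmpdir/demo_dir.tar.gz" -C "$tmpdir/restored"

# The extracted copy has the same contents as the original
restored=$(cat "$tmpdir/restored/demo_dir/numbers.csv")
echo "$restored"
```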
    +
    +

    gunzip -c: cat compressed files

    +
      +
    • To avoid uncompressing a large file just to read its contents, we +can use gunzip -c
    • +
    • This will output the contents of the file to the terminal
    • +
    +
    gunzip -c part_2/homo_sapiens.refseq.tsv.gz | head
    +
    gene_stable_id  transcript_stable_id    protein_stable_id   xref    db_name info_type   source_identity xref_identity   linkage_type
    +ENSG00000160072 ENST00000673477 ENSP00000500094 NP_001304167    RefSeq_peptide  INFERRED_PAIR   -   -   -
    +ENSG00000160072 ENST00000673477 ENSP00000500094 NP_114127   RefSeq_peptide  DIRECT  100 100 -
    +ENSG00000160072 ENST00000673477 ENSP00000500094 NM_001317238    RefSeq_mRNA DIRECT  90  82  -
    +ENSG00000160072 ENST00000673477 ENSP00000500094 NM_031921   RefSeq_mRNA DIRECT  100 100 -
    +ENSG00000160072 ENST00000673477 ENSP00000500094 XM_005244806    RefSeq_mRNA_predicted   DIRECT  45  94  -
    +ENSG00000160072 ENST00000673477 ENSP00000500094 XM_011542241    RefSeq_mRNA_predicted   DIRECT  35  87  -
    +ENSG00000160072 ENST00000673477 ENSP00000500094 XM_011542244    RefSeq_mRNA_predicted   DIRECT  90  87  -
    +ENSG00000160072 ENST00000673477 ENSP00000500094 XM_047431593    RefSeq_mRNA_predicted   SEQUENCE_MATCH  90  96  -
    +ENSG00000160072 ENST00000673477 ENSP00000500094 XR_001737468    RefSeq_ncRNA_predicted  DIRECT  -   -   -
    +
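Because `gunzip -c` writes to standard output, it can feed any command that reads a stream. A minimal sketch, using a made-up three-line file rather than the workshop data:

```shell
# Create a tiny compressed file: one header line plus two data lines
tmpdir=$(mktemp -d)
printf 'gene\txref\nG1\tNM_1\nG2\tNM_2\n' > "$tmpdir/demo.tsv"
gzip "$tmpdir/demo.tsv"

# Count lines straight from the compressed file
n_lines=$(gunzip -c "$tmpdir/demo.tsv.gz" | wc -l)
echo "lines: $n_lines"

# The .gz file is left untouched and no uncompressed copy is written
ls "$tmpdir"
```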
    +
    +
    +

    System Variables

    + +
    +
    +

    What are system variables?

    +
      +
    • Special variables that contain information about the system’s +configuration and state
    • +
    • Used by the OS and programs to change their behavior based on the +system’s state
    • +
    +

    Example:

    +
    echo $HOME
    +
    /Users/nelphick
    +
    +
    +

    Common System Variables

    +
      +
    • $PWD : The working directory
    • +
    • $HOME : The current user’s home directory
    • +
    • $PS1 : the shell prompt string
    • +
    • $TMPDIR : location of temporary files
    • +
    +
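Which variables are set differs between systems, so it is worth checking before relying on one. The loop below is a small sketch that prints a few of them; `SHELL` and `TMPDIR` are additions for illustration and may be unset on your machine.

```shell
for name in PWD HOME SHELL TMPDIR; do
    # printenv prints the value, or fails if the variable is not set
    value=$(printenv "$name" || echo "(not set)")
    echo "$name=$value"
done
```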
    +
    +

    PATH: locations of executable files

    +
      +
    • When you enter a command, the shell searches the directories in +$PATH to find the matching executable file
    • +
    +
    echo $PATH
    +
    /usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/mysql/bin
    +
      +
    • The shell checks these directories in the order they appear and uses +the first matching executable it finds
    • +
    +
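A long `$PATH` is easier to read with one directory per line. One way to do that, assuming the `tr` command is available (it is on macOS and Linux):

```shell
# Replace each colon with a newline; the search order runs top to bottom
echo "$PATH" | tr ':' '\n'

# Count how many directories are searched
n_dirs=$(echo "$PATH" | tr ':' '\n' | wc -l)
echo "directories on PATH: $n_dirs"
```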
    +
    +

    export: set system variables

    +
      +
    • Useful for setting variables you want to be used across +programs
    • +
    • You can add new software to your $PATH like this:
    • +
    +
    export PATH="/path/to/new/software:$PATH"
    +
      +
    • This will modify the $PATH for the current terminal +session
    • +
    +
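The difference between a plain shell variable and an exported one shows up when another program is started. In this sketch the variable names `local_only` and `SHARED_VALUE` are hypothetical, and `bash -c` stands in for any program launched from the terminal.

```shell
# A plain assignment is visible only inside the current shell
local_only="not exported"

# export also passes the variable to programs started from this shell
export SHARED_VALUE="exported"

# Start a child shell and see what it receives
child_sees=$(bash -c 'echo "local_only=[$local_only] SHARED_VALUE=[$SHARED_VALUE]"')
echo "$child_sees"
# prints: local_only=[] SHARED_VALUE=[exported]
```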
    +
    +

    Modifying the PATH for all future terminal sessions

    +
      +
    • Add the export line to your ~/.bashrc or +~/.zshrc
    • +
    • Proceed with caution +
        +
      • Make backups of these and read this guide
      • +
      • Changing the $PATH incorrectly can break system +functionality
      • +
    • +
    +
    +
    +

    which: locate the executable associated with a command

    +
      +
    • Shows the location of the executable that is found +on the $PATH for a command
    • +
    +
    which ls
    +
    /bin/ls
    +
      +
    • Useful for checking which copy runs when multiple versions of a program are +installed
    • +
    +
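Inside scripts, `command -v` is a commonly used alternative to `which` because it is built into the shell. It is not covered in these slides, so treat this as an optional sketch:

```shell
# Look up the executable the shell would run for "ls"
ls_path=$(command -v ls)
echo "$ls_path"

# command -v returns a non-zero status when nothing is found,
# which makes it handy for checking that a tool is installed
if command -v gunzip > /dev/null; then
    echo "gunzip is available"
else
    echo "gunzip was not found on the PATH"
fi
```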
    +
    +
    +

    Shell Scripting

    + +
    +
    +

    What is a script?

    +
      +
    • Scripts are executable files for reusing code
    • +
    • By convention scripts end in .sh
    • +
    • The first line of a script is called the shebang
    • +
    +
    nano part_2/example_script.sh
    +
    #!/bin/bash
    +
      +
    • The text that follows #! tells the OS where the +interpreter is
    • +
    +
    which bash
    +
    /bin/bash
    +
    +
    +

    chmod: making a script executable

    +
      +
    • By default, files are not executable
    • +
    +
    ls -l part_2/example_script.sh
    +
    -rw-r--r--@ 1 nelphick  staff  287 Apr 16 21:11 part_2/example_script.sh
    +
      +
    • We can set the execute bit like this
    • +
    +
    chmod u+x part_2/example_script.sh
    +ls -l part_2/example_script.sh
    +
    -rwxr--r--@ 1 nelphick  staff  287 Apr 16 21:11 part_2/example_script.sh
    +
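The same steps can be tried on a throwaway script. This sketch assumes bash lives at `/bin/bash`, as in the `which bash` output earlier; the script name `hello.sh` is made up.

```shell
# Create a two-line script in a temporary directory
tmpdir=$(mktemp -d)
printf '#!/bin/bash\necho "hello from script"\n' > "$tmpdir/hello.sh"

# [ -x FILE ] is true only if the file is executable
if [ -x "$tmpdir/hello.sh" ]; then
    before="executable"
else
    before="not executable"
fi

# Give the owner (u) execute (x) permission, then run the script
chmod u+x "$tmpdir/hello.sh"
output=$("$tmpdir/hello.sh")

echo "before chmod: $before"
echo "after chmod the script prints: $output"
```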
    +
    +

    Example

    +
    #!/bin/bash
    +
    +# This is a comment. Comments are ignored by the shell.
    +
    +# $1 is the first argument passed to the script
    +echo "Counting the genes in $1"
    +
    +# count the unique gene IDs in column 1 (the header line is counted too)
    +u_genes=$(gunzip -c "$1" | cut -f 1 | sort -u | wc -l)
    +
    +echo "There are $u_genes unique genes in $1"
    +
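Two details of the script are worth knowing. The file starts with a header line, which `sort -u` counts as one more unique value, and some versions of `wc` pad their output with spaces. The variation below is a sketch, not the workshop script: it skips the header with `tail -n +2` and strips the padding with `tr -d ' '`, using a tiny made-up file.

```shell
# Tiny compressed file: a header line, then two genes (G1 appears twice)
tmpdir=$(mktemp -d)
printf 'gene_stable_id\txref\nG1\tNM_1\nG1\tNM_2\nG2\tNM_3\n' | gzip > "$tmpdir/demo.tsv.gz"

# tail -n +2 starts output at line 2, which drops the header
u_genes=$(gunzip -c "$tmpdir/demo.tsv.gz" | tail -n +2 | cut -f 1 | sort -u | wc -l)

# Remove any spaces wc added around the number
u_genes=$(echo "$u_genes" | tr -d ' ')

echo "There are $u_genes unique genes"
```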
    +
    +

    Let’s run it

    +
    ./part_2/example_script.sh part_2/homo_sapiens.refseq.tsv.gz
    +
    Counting the genes in part_2/homo_sapiens.refseq.tsv.gz
    +There are    32538 unique genes in part_2/homo_sapiens.refseq.tsv.gz
    +
    +
    +

    Loops

    +
      +
    • Useful for iterating over lines of a file or lists
    • +
    +
    for i in {1..3}
    +do
    +
    +echo $i
    +
    +done
    +
    1
    +2
    +3
    +
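To iterate over the lines of a file, a `while read` loop is a common pattern. It is sketched here on a temporary two-line file whose contents are invented for the example.

```shell
# Make a small file with one folder name per line
tmpfile=$(mktemp)
printf 'part_1\npart_2\n' > "$tmpfile"

n=0
# read -r keeps backslashes as-is; IFS= keeps leading/trailing spaces
while IFS= read -r line
do
    n=$((n+1))
    echo "line $n: $line"
done < "$tmpfile"
```

Feeding the file in with `< "$tmpfile"` rather than a pipe keeps the loop in the current shell, so `n` still holds the line count afterwards.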
    +
    +

    While loops

    +
    count=0
    +
    +while [ $count -lt 5 ]        # loop while count is less than 5
    +do
    +    echo $count
    +    count=$((count+1))
    +done
    +
    0
    +1
    +2
    +3
    +4
    +
    +
    +

    If statements

    +
    x=5
    +
    +if [ $x -gt 10 ]                      # check if x is greater than 10
    +then
    +    echo "x is greater than 10"
    +else
    +    echo "x is not greater than 10"
    +fi                                    # end if statement
    +
    x is not greater than 10
    +
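When there are more than two cases, `elif` adds further checks. A short sketch along the same lines, where the variable `result` is introduced only for this example:

```shell
x=10

if [ "$x" -gt 10 ]          # -gt : greater than
then
    result="greater than 10"
elif [ "$x" -eq 10 ]        # -eq : equal to
then
    result="equal to 10"
else
    result="less than 10"
fi

echo "x is $result"
```

Quoting `"$x"` keeps the test from failing with a syntax error if the variable happens to be empty.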
    +
    +
    +

    Other Useful Commands

    + +
    +
    +

    sed : stream editor

    +
      +
    • Parses and transforms text, using a compact programming +language
    • +
    • It reads and modifies text line by line from a file or input +stream
    • +
    • Supports regular +expressions
    • +
    • Useful for replacing text
    • +
    +

    Example:

    +
    sed 's/search_string/replace_string/g' input.txt > output.txt
    +
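`sed` also reads from a pipe, so a substitution can be tried without creating files. In this sketch the two input lines are made up, and the replacement string is left empty so the matched text is deleted.

```shell
# Remove the "RefSeq_" prefix wherever it appears
result=$(printf 'NM_1 RefSeq_mRNA\nNP_1 RefSeq_peptide\n' | sed 's/RefSeq_//g')
echo "$result"
# prints:
# NM_1 mRNA
# NP_1 peptide
```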
    +
    +

    ssh : secure shell - connect to a remote server

    +
      +
    • Logging in to a remote server
    • +
    • Remote desktop for the terminal
    • +
    +
    ssh username@remote
    +
      +
    • username is your user name on the remote server +and remote is the hostname or IP address of that +server or computer
    • +
    +
    +
    +

    scp : secure copy

    +
      +
    • Copy files to or from a remote server or computer
    • +
    +
    scp [options] [source] [destination]
    +
      +
    • Copy from local to remote
    • +
    +
    scp /path/to/local/file.txt username@remote:/path/to/remote/directory/
    +
      +
    • Copy from remote to local
    • +
    +
    scp username@remote:/path/to/file.txt /path/to/local/directory/
    +
      +
    • -r : copy a whole folder
    • +
    +
    +
    +
    +

    AWK

    + +
    +
    +

    awk : processing structured data

    +
      +
    • A small programming language that is designed to work with +structured data
    • +
    • Has more complicated syntax but is faster at processing large +files
    • +
    • Designed to read a file or input stream line by line
    • +
    • Operates on records (lines) and +fields (columns)
    • +
    +

    Basic command:

    +
    awk options 'pattern {action}' input_file
    +
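The `pattern` part decides which lines the `action` runs on. As an illustration with made-up numbers, this sketch prints the first field only for lines where it is greater than 5:

```shell
# Three tab-separated lines are piped in instead of reading a file
big=$(printf '1\t3\n7\t8\n13\t4\n' | awk -F '\t' '$1 > 5 {print $1}')
echo "$big"
# prints:
# 7
# 13
```

If the pattern is left out, the action runs on every line.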
    +
    +

    Example : Add the first 2 columns of each line

    +
    awk -F '\t' '{print $1+$2}' part_1/list_numbers.tsv
    +
    4
    +15
    +17
    +
      +
    • -F : provides the field separator
    • +
    • $1,$2 : the first and second fields
    • +
    +
    +
    +

    Example : Find the average of a column

    +
      +
    • For this example we only want the average if the 5th column equals +“RefSeq_mRNA”
    • +
    +
    gunzip -c part_2/homo_sapiens.refseq.tsv.gz | \
    +awk -F '\t' '$5 == "RefSeq_mRNA" {sum += $7; count++} \
    +END {print sum / count}'
    +
    +
    65.4642
    +
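Two edge cases are worth guarding against in a calculation like this. awk treats a non-numeric value such as `-` as 0 when adding, so any matching rows that hold `-` in column 7 would pull the average down, and dividing by `count` fails if no rows match. The sketch below handles both on a small made-up file; the extra `$7 != "-"` condition and the `if` in the `END` block are additions, not part of the workshop command.

```shell
# Four tab-separated rows laid out like the workshop file (9 columns trimmed to 7)
tmpfile=$(mktemp)
{
    printf 'g\tt\tp\tNM_1\tRefSeq_mRNA\tDIRECT\t90\n'
    printf 'g\tt\tp\tNM_2\tRefSeq_mRNA\tDIRECT\t100\n'
    printf 'g\tt\tp\tNM_3\tRefSeq_mRNA\tDIRECT\t-\n'
    printf 'g\tt\tp\tNP_1\tRefSeq_peptide\tDIRECT\t10\n'
} > "$tmpfile"

avg=$(awk -F '\t' '
    # keep RefSeq_mRNA rows that have a real number in column 7
    $5 == "RefSeq_mRNA" && $7 != "-" {sum += $7; count++}
    END {
        if (count > 0) { print sum / count }
        else { print "no matching rows" }
    }' "$tmpfile")

echo "$avg"
# prints: 95
```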
    +
    +

    Resources for learning AWK and sed

    + +
    +
    +
    +

    End of Part 2

    + +
    +
    +

    Survey

    + +
    +
    +

    Additional learning materials

    + +
    +
    +

    Upcoming Data Science Training Program Workshops

    +

    Linear +Mixed Effects Modeling
    +April 24-April 25, 2023 10:00am-12:00pm PDT

    +

    Machine +Learning
    +April 28, 2023 10:00am-12:00pm PDT

    +

    Advanced +Cytoscape Automation
    +May 2, 2023 1:00-4:00pm PDT

    +

    Introduction +to RNA-Seq Analysis
    +May 15-May 16, 2023 9:00am-12:00pm PDT

    +
    +
    +
    + + + + + + + + + + + + + + diff --git a/intro-unix-command-line/Intro_to_Unix_Part_1.Rmd b/intro-unix-command-line/Intro_to_Unix_Part_1.Rmd index f098dba..c1feae7 100644 --- a/intro-unix-command-line/Intro_to_Unix_Part_1.Rmd +++ b/intro-unix-command-line/Intro_to_Unix_Part_1.Rmd @@ -2,6 +2,12 @@ title: "Introduction to Unix Command-line - Part 1" author: "Natalie Elphick" date: "April 17th" +knit: (function(input, ...) { + rmarkdown::render( + input, + output_dir = "../docs" + ) + }) output: revealjs::revealjs_presentation: css: style.css @@ -119,9 +125,6 @@ Both bash and zsh should be able to run all of the commands in this workshop. Shell commands are basic instructions used to perform specific tasks. -
    - -Basic structure of commands: `command_name -[option(s)] [argument(s)]` @@ -132,6 +135,8 @@ ls -lah part_1 Here we are providing multiple options to the `ls` command and the directory **part_1** as an argument +- To cancel a command press CTRL+C + ## man: pull up the manual page for a command @@ -245,6 +250,9 @@ ls . ls -lah part_1 ``` + + + ## cd: move to a directory ```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA} @@ -344,10 +352,11 @@ du -h */* If you are following along with the commands we have run so far, this is the file structure you should have: ```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA} -ls ./* +ls * ``` -- `*` is a wildcard so `ls` will list and directories in the current one +- "*" represents any number of characters, including zero characters so this command runs ls on all of the folders + ## Text editors @@ -512,13 +521,31 @@ cat part_1/list_numbers.csv If you followed along with the commands we have run so far, you should have this directory structure: ```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA} -ls ./* +ls * ``` + + +## sort: sort values + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +cat part_1/list_numbers.csv | cut -d "," -f 1 | sort -n +``` + +- -n : sort numerically (default is alphabetical) + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +cat part_1/list_numbers.csv | cut -d "," -f 8 | sort -nu +``` + +- -u : sort and remove duplicates + + # End of Part 1 ## Other helpful commands {.small-bullets} +- `wc` : count lines and words - `chmod` : Change the permissions of a file or directory - `chown` : Change the owner of a file or directory - `df` : Display information about disk usage and available space @@ -573,4 +600,6 @@ rm -r new_directory rm part_2/homo_sapiens.refseq.tsv.gz rm part_1/subset_list_numbers.tsv rm part_1/list_numbers.csv 
-``` \ No newline at end of file +``` + + diff --git a/intro-unix-command-line/Intro_to_Unix_Part_1.html b/intro-unix-command-line/Intro_to_Unix_Part_1.html deleted file mode 100644 index 4af4529..0000000 --- a/intro-unix-command-line/Intro_to_Unix_Part_1.html +++ /dev/null @@ -1,1500 +0,0 @@ - - - - - - - Introduction to Unix Command-line - Part 1 - - - - - - - - - - - - - - - - - -
    -
    - -
    -

    Introduction to Unix Command-line - Part 1

    -

    Natalie Elphick

    -

    April 17th

    -
    - -
    -

    -
    -Press the ? key for tips on navigating these slides -
    -
    -
    -

    Introductions

    -

    Natalie Elphick
    -Bioinformatician I

    -


    -

    Yihang Xin (TA)
    -Software Engineer II

    -
    -
    -
    -

    The Unix Command-line

    - -
    -
    -

    What is Unix?

    -
      -
    • A family of operating systems that date back to 1970s
    • -
    • Designed to run on different types of computer hardware
    • -
    • macOS and Linux are descendants of Unix
    • -
    • Often used in industry and scientific research
    • -
    -
    -
    -

    What is the Command-line?

    -
      -
    • A text-based interface for interacting with the operating -system
    • -
    • Execute commands to perform various tasks like: -
        -
      • Navigating the file system
      • -
      • Editing files
      • -
    • -
    • Command-line interfaces (CLIs) are how users interacted with -computers before graphical interfaces like Windows
    • -
    -
    -
    -

    Advantages of using CLIs

    -
      -
    • Allows for automation and batch processing of data
    • -
    • Provides more control and flexibility over data manipulation
    • -
    • Interact with High-Performance Computing clusters (HPCs) like -Wynton
    • -
    • Speed
    • -
    -
    -
    -

    The Terminal

    -
      -
    • The software that provides access to the CLI

    • -
    • Open your terminal

      -
        -
      • Mac: press Cmd + Space and type “terminal”
      • -
      • Linux: press Ctrl+Alt+T
        -
      • -
      • Windows: open the Ubuntu app
      • -
    • -
    -
    -
    -

    Download the workshop materials

    -
      -
    • Copy and paste each of these into your terminal, press enter after -each one:
    • -
    -
    mkdir unix_workshop
    -
    cd unix_workshop
    -
    curl -L -o unix_workshop_2023.tar.gz 'https://www.dropbox.com/s/smb12au2y82jmvq/unix_workshop_2023.tar.gz?dl=0'
    -
    tar -xzf unix_workshop_2023.tar.gz
    -
    cd unix_workshop_2023
    -
    -
    -

    The shell

    -
      -
    • A shell is a specific type of CLI that provides access to the OS as -a whole
    • -
    • There are several different Unix shells -
        -
      • Bash : The most widely used and the default shell on most Linux -systems
      • -
      • Zsh : an extended version of Bash and is now the default on newer -versions of macOS
      • -
    • -
    -
    -
    -

    The shell

    -
      -
    • Check which shell you are using by typing:
    • -
    -
    echo $0
    -
    bash
    -

    Both bash and zsh should be able to run all of the commands in this -workshop.

    -
    -
    -

    Commands

    -

    Shell commands are basic instructions used to perform specific -tasks.

    -


    -

    Basic structure of commands:
    -command_name -[option(s)] [argument(s)]

    -

    Example:

    -
    ls -lah part_1
    -

    Here we are providing multiple options to the ls command -and the directory part_1 as an argument

    -
    -
    -

    man: pull up the manual page for a command

    -
    man echo
    -
    ECHO(1)             General Commands Manual            ECHO(1)
    -
    -NAME
    -     echo – write arguments to the standard output
    -
    -SYNOPSIS
    -     echo [-n] [string ...]
    -
    -DESCRIPTION
    -     The echo utility writes any specified operands, separated by single blank
    -     (‘ ’) characters and followed by a newline (‘\n’) character, to the
    -     standard output.
    -
    -     The following option is available:
    -
    -     -n    Do not print the trailing newline character.  This may also be
    -       achieved by appending ‘\c’ to the end of the string, as is done by
    -       iBCS2 compatible systems.  Note that this option as well as the
    -       effect of ‘\c’ are implementation-defined in IEEE Std 1003.1-2001
    -       (“POSIX.1”) as amended by Cor. 1-2002.  Applications aiming for
    -       maximum portability are strongly encouraged to use printf(1) to
    -       suppress the newline character.
    -
    -     Some shells may provide a builtin echo command which is similar or
    -     identical to this utility.  Most notably, the builtin echo in sh(1) does
    -     not accept the -n option.  Consult the builtin(1) manual page.
    -
    -EXIT STATUS
    -     The echo utility exits 0 on success, and >0 if an error occurs.
    -
    -SEE ALSO
    -     builtin(1), csh(1), printf(1), sh(1)
    -
    -STANDARDS
    -     The echo utility conforms to IEEE Std 1003.1-2001 (“POSIX.1”) as amended
    -     by Cor. 1-2002.
    -
    -macOS 13.2          April 12, 2003              macOS 13.2
    -
    -
    -

    Manual pages

    -

    Use the arrow keys to navigate the manual and press q to -close it

    -
    MAN(1)              General Commands Manual         MAN(1)
    -
    -NAME
    -     man, apropos, whatis – display online manual documentation pages
    -
    -SYNOPSIS
    -     man [-adho] [-t | -w] [-M manpath] [-P pager] [-S mansect]
    -     [-m arch[:machine]] [-p [eprtv]] [mansect] page ...
    -
    -     man -f [-d] [-M manpath] [-P pager] [-S mansect] keyword ...
    -     whatis [-d] [-s mansect] keyword ...
    -
    -     man -k [-d] [-M manpath] [-P pager] [-S mansect] keyword ...
    -     apropos [-d] [-s mansect] keyword ...
    -
    -DESCRIPTION
    -     The man utility finds and displays online manual documentation pages.  If
    -     mansect is provided, man restricts the search to the specific section of
    -     the manual.
    -
    -     The sections of the manual are:
    -       1.   General Commands Manual
    -       2.   System Calls Manual
    -       3.   Library Functions Manual
    -       4.   Kernel Interfaces Manual
    -       5.   File Formats Manual
    -       6.   Games Manual
    -       7.   Miscellaneous Information Manual
    -       8.   System Manager's Manual
    -       9.   Kernel Developer's Manual
    -
    -     Options that man understands:
    -
    -     -M manpath
    -         Forces a specific colon separated manual path instead of the
    -         default search path.  See manpath(1).  Overrides the MANPATH
    -         environment variable.
    -
    -     -P pager
    -         Use specified pager.  Defaults to “less -sR” if color support is
    -         enabled, or “less -s”.  Overrides the MANPAGER environment
    -         variable, which in turn overrides the PAGER environment variable.
    -
    -     -S mansect
    -         Restricts manual sections searched to the specified colon
    -         delimited list.  Defaults to “1:8:2:3:3lua:n:4:5:6:7:9:l”.
    -         Overrides the MANSECT environment variable.
    -
    -     -a      Display all manual pages instead of just the first found for each
    -         page argument.
    -
    -     -d      Print extra debugging information.  Repeat for increased
    -         verbosity.  Does not display the manual page.
    -
    -     -f      Emulate whatis(1).  Note that only a subset of options will have
    -         any effect when man is invoked in this mode.  See the below
    -         description of whatis options for details.
    -
    -     -h      Display short help message and exit.
    -
    -     -k      Emulate apropos(1).  Note that only a subset of options will have
    -         any effect when man is invoked in this mode.  See the below
    -         description of apropos options for details.
    -
    -     -m arch[:machine]
    -         Override the default architecture and machine settings allowing
    -         lookup of other platform specific manual pages.  This option is
    -         accepted, but not implemented, on macOS.
    -
    -     -o      Force use of non-localized manual pages.  See IMPLEMENTATION
    -         NOTES for how locale specific searches work.  Overrides the
    -         LC_ALL, LC_CTYPE, and LANG environment variables.
    -
    -     -p [eprtv]
    -         Use the list of given preprocessors before running nroff(1) or
    -         troff(1).  Valid preprocessors arguments:
    -
    -         e       eqn(1)
    -         p       pic(1)
    -         r       refer(1)
    -         t       tbl(1)
    -         v       vgrind(1)
    -
    -         Overrides the MANROFFSEQ environment variable.
    -
    -     -t      Send manual page source through troff(1) allowing transformation
    -         of the manual pages to other formats.
    -
    -     -w      Display the location of the manual page instead of the contents
    -         of the manual page.
    -
    -     Options that apropos and whatis understand:
    -
    -     -d      Same as the -d option for man.
    -
    -     -s      Same as the -S option for man.
    -
    -     When man is operated in apropos or whatis emulation mode, only a subset
    -     of its options will be honored.  Specifically, -d, -M, -P, and -S have
    -     equivalent functionality in the apropos and whatis implementation
    -     provided.  The MANPATH, MANSECT, and MANPAGER environment variables will
    -     similarly be honored.
    -
    -IMPLEMENTATION NOTES
    -   Locale Specific Searches
    -     The man utility supports manual pages in different locales.  The search
    -     behavior is dictated by the first of three environment variables with a
    -     nonempty string: LC_ALL, LC_CTYPE, or LANG.  If set, man will search for
    -     locale specific manual pages using the following logic:
    -
    -       lang_country.charset
    -       lang.charset
    -       en.charset
    -
    -     For example, if LC_ALL is set to “ja_JP.eucJP”, man will search the
    -     following paths when considering section 1 manual pages in
    -     /usr/share/man:
    -
    -       /usr/share/man/ja_JP.eucJP/man1
    -       /usr/share/man/ja.eucJP/man1
    -       /usr/share/man/en.eucJP/man1
    -       /usr/share/man/man1
    -
    -   Displaying Specific Manual Files
    -     The man utility also supports displaying a specific manual page if passed
    -     a path to the file as long as it contains a ‘/’ character.
    -
    -ENVIRONMENT
    -     The following environment variables affect the execution of man:
    -
    -     LC_ALL, LC_CTYPE, LANG
    -             Used to find locale specific manual pages.  Valid values
    -             can be found by running the locale(1) command.  See
    -             IMPLEMENTATION NOTES for details.  Influenced by the -o
    -             option.
    -
    -     MACHINE_ARCH, MACHINE
    -             Used to find platform specific manual pages.  If unset,
    -             the output of “sysctl hw.machine_arch” and “sysctl
    -             hw.machine” is used respectively.  See IMPLEMENTATION
    -             NOTES for details.  Corresponds to the -m option.
    -
    -     MANPATH         The standard search path used by man(1) may be changed by
    -             specifying a path in the MANPATH environment variable.
    -             Invalid paths, or paths without manual databases, are
    -             ignored.  Overridden by -M.  If MANPATH begins with a
    -             colon, it is appended to the default list; if it ends
    -             with a colon, it is prepended to the default list; or if
    -             it contains two adjacent colons, the standard search path
    -             is inserted between the colons.  If none of these
    -             conditions are met, it overrides the standard search
    -             path.
    -
    -     MANROFFSEQ      Used to determine the preprocessors for the manual source
    -             before running nroff(1) or troff(1).  If unset, defaults
    -             to tbl(1).  Corresponds to the -p option.
    -
    -     MANSECT         Restricts manual sections searched to the specified colon
    -             delimited list.  Corresponds to the -S option.
    -
    -     MANWIDTH        If set to a numeric value, used as the width manpages
    -             should be displayed.  Otherwise, if set to a special
    -             value “tty”, and output is to a terminal, the pages may
    -             be displayed over the whole width of the screen.
    -
    -     MANCOLOR        If set, enables color support.
    -
    -     MANPAGER        Program used to display files.
    -
    -             If unset, and color support is enabled, “less -sR” is
    -             used.
    -
    -             If unset, and color support is disabled, then PAGER is
    -             used.  If that has no value either, “less -s” is used.
    -
    -FILES
    -     /etc/man.conf
    -         System configuration file.
    -     /usr/local/etc/man.d/*.conf
    -         Local configuration files.
    -
    -EXIT STATUS
    -     The man utility exits 0 on success, and >0 if an error occurs.
    -
    -EXAMPLES
    -     Show the manual page for stat(2):
    -
    -       $ man 2 stat
    -
    -     Show all manual pages for ‘stat’.
    -
    -       $ man -a stat
    -
    -     List manual pages which match the regular expression either in the title
    -     or in the body:
    -
    -       $ man -k '\<copy\>.*archive'
    -
    -     Show the manual page for ls(1) and use cat(1) as pager:
    -
    -       $ man -P cat ls
    -
    -     Show the location of the ls(1) manual page:
    -
    -       $ man -w ls
    -
    -SEE ALSO
    -     apropos(1), intro(1), mandoc(1), manpath(1), whatis(1), intro(2),
    -     intro(3), intro(3lua), intro(4), intro(5), man.conf(5), intro(6),
    -     intro(7), mdoc(7), intro(8), intro(9)
    -
    -macOS 13.2          January 9, 2021             macOS 13.2
    -
    -
    -

    echo: print a string or value of a variable

    -
      -
    • Variables : a named container that holds a value or -data
    • -
    • Strings : sequence of characters
    • -
    -
    message="Hello, World!"
    -echo $message
    -
    Hello, World!
    -

    Here, we assign the string “Hello, World!” to the variable -message and use echo to print its value.
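As a small illustration of why the variable is usually wrapped in double quotes when it is printed, the sketch below assumes bash (zsh does not split unquoted variables by default). The names `quoted` and `unquoted` are made up for this example.

```shell
# Assign without spaces around "="; the value contains three spaces in a row
message="Hello,   World!"

# Quoted expansion passes the value to echo as a single argument, unchanged
quoted=$(echo "$message")

# Unquoted expansion is split into words in bash, so echo joins them with one space
unquoted=$(echo $message)

echo "$quoted"
echo "$unquoted"
```

Quoting every expansion (`"$message"`) is a common habit because it keeps spaces and special characters in the value intact.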

    -
    -
    -

    history: list previously run commands

    -
      -
    • You can also cycle through previously run commands using the up and -down arrow keys
    • -
    • By default bash stores the last 500 commands, zsh stores the last -1000
    • -
    -
    history
    -
      -
    • Use the command clear to clear the output from the -terminal
    • -
    -
    -
    - -
    -

    The File System

    -

    Unix File system

    -
    -
    -

    Paths

    -
      -
    • Root directory - / 
    • -
    • Current working directory  . 
      -
    • -
    • Directory above the current one  .. 
    • -
    • User home directory  ~ 
    • -
    • Absolute: The path starting from root
    • -
    -
    /data/file1.txt
    -
      -
    • Relative: Path from the current directory
    • -
    -
    file1.txt
    -./file1.txt
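The path shortcuts above can be tried out safely in a throwaway directory. This is only a sketch: the directory comes from `mktemp -d`, so the absolute path will differ on every machine, and `mkdir`/`touch` (covered later in the workshop) are used just to create something to point at.

```shell
# Build a scratch directory containing data/file1.txt
workdir=$(mktemp -d)
mkdir -p "$workdir/data"
touch "$workdir/data/file1.txt"

# Make data/ the working directory
cd "$workdir/data"

# Three relative paths to the same file
rel_plain=$(ls file1.txt)
rel_dot=$(ls ./file1.txt)
rel_up=$(ls ../data/file1.txt)

# One absolute path to the same file (starts from the root directory /)
abs=$(ls "$workdir/data/file1.txt")

echo "$rel_plain" "$rel_dot" "$rel_up" "$abs"
```

Relative paths stop working once you `cd` somewhere else, whereas the absolute path keeps pointing at the same file.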
    -
    -
    -

    -
    -What is the absolute path to file2.txt? -
    -

    Unix File system

    -
    -
    -

    -
    -What is the relative path to file1.txt if the working -directory is /home/user ? -
    -

    Unix File system

    -
    -
    -

    pwd: print working directory

    -
    pwd
    -
    /Users/your_username/unix_workshop_2023
    -
      -
    • The default working directory when you log in or open a terminal is -your user home directory  ~ 
    • -
    -
    -
    -

    ls: list contents of a directory

    -
    ls .
    -
    part_1
    -part_2
    -
      -
    • -l show more information (file permissions and -size)
    • -
    • -a show all (hidden files)
    • -
    • -h file sizes in human readable format (e.g., 1K, -2G)
    • -
    -
    ls -lah part_1
    -
    total 8
    -drwx---rw-@ 4 nelphick  staff   128B Apr 14 11:26 .
    -drwxr-xr-x@ 5 nelphick  staff   160B Apr 14 11:26 ..
    --rw-r--rw-@ 1 nelphick  staff     0B Apr 11 16:29 .hidden_file.txt
    --rw-r--rw-@ 1 nelphick  staff    60B Apr 12 15:40 list_numbers.tsv
    -
    -
    -

    cd: move to a directory

    -
    cd unix_workshop_2023/part_1
    -ls -l
    -
    total 8
    --rw-r--rw-@ 1 nelphick  staff  60 Apr 12 15:40 list_numbers.tsv
    -
    cd ..
    -ls -l
    -
    total 0
    -drwx---rw-@ 4 nelphick  staff  128 Apr 14 11:26 part_1
    -drwxr-xr-x@ 2 nelphick  staff   64 Apr 14 11:26 part_2
    -
    -
    -
    -

    Creating and Altering Files

    - -
    -
    -

    File Permissions

    -

    File Permissions

    -
      -
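    • Permissions for a file or folder are represented by 10 -characters: the first gives the type (d for a directory, - for a regular file) and the remaining 9 form three groups of 3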
    • -
    • Each group of 3 represents the permissions for different users: -
        -
      • Owner of the file/folder
      • -
      • Group that owns the file/folder
      • -
      • Others - everyone else
      • -
    • -
    -
    -
    -

    File Permissions

    -

    File Permissions

    -
      -
    • There are three types of permissions: -
        -
      • r - read/view the contents of a file/folder
      • -
      • w - write to (modify) a file, or add and remove entries in a folder
      • -
      • x - execute the file or access a directory
      • -
    • -
    -
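To connect the r/w/x letters to what `ls -l` prints, here is a minimal sketch using a scratch file. The file name `notes.txt` is hypothetical, and `chmod` and `cut` are introduced elsewhere in the workshop; they are used here only to set and display the permission string.

```shell
# Create a scratch file to experiment on
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.txt"

# Owner: read + write, group: read only, others: nothing
chmod u=rw,g=r,o= "$demo_dir/notes.txt"

# Keep only the first 10 characters of the long listing: type + 3 groups of 3
perms=$(ls -l "$demo_dir/notes.txt" | cut -c 1-10)
echo "$perms"   # prints -rw-r-----
```

Reading the result left to right: `-` (regular file), `rw-` (owner), `r--` (group), `---` (others).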
    -
    -

    touch: create an empty file

    -
    touch new_file.txt
    -
      -
    • If the file exists, it will update the time stamp
    • -
    -
    -
    -

    mkdir: make a directory

    -
    mkdir new_directory
    -
      -
    • -p make parent directories if they don’t exist
    • -
    -
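A short sketch of what `-p` changes, run in a scratch directory. The nested path `projects/2023/raw` is made up for illustration.

```shell
base=$(mktemp -d)

# Without -p, mkdir fails because the parent "projects/2023" does not exist yet
if mkdir "$base/projects/2023/raw" 2>/dev/null; then
    plain="ok"
else
    plain="failed"
fi
echo "plain mkdir: $plain"

# With -p, every missing parent directory is created along the way
mkdir -p "$base/projects/2023/raw"

# -p also makes the command safe to repeat: no error if the directory exists
mkdir -p "$base/projects/2023/raw"
```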
    -
    -

    mv: move a file or folder

    -
    mv new_file.txt new_directory
    -
      -
    • Also used to rename files/folders
    • -
    -
    mv new_directory/new_file.txt new_directory/new_file1.txt
    -
    -
    -

    cp: copy a file or folder

    -
    cp new_directory/new_file1.txt new_directory/new_file2.txt
    -
      -
    • -r to copy a folder (recursive)
    • -
    -
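As a sketch of the recursive copy, the block below builds a small folder tree in a scratch location and copies it. All file and folder names are hypothetical.

```shell
# Source folder with one file and one subfolder
src=$(mktemp -d)
touch "$src/a.txt"
mkdir "$src/sub"
touch "$src/sub/b.txt"

# Destination parent; "copy" does not exist yet, so cp creates it
dest_parent=$(mktemp -d)
cp -r "$src" "$dest_parent/copy"

# List the copied tree; the original folder is left untouched
ls -R "$dest_parent/copy"
```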
    -
    -

    rm: remove a file permanently

    -
      -
    • This command should always be used with care since the files it is -used on cannot be recovered
    • -
    -
    rm new_directory/new_file2.txt
    -
    -
    -

    du: check the size of a file or folder

    -
    du -h */*
    -
      0B    new_directory/new_file1.txt
    -4.0K    part_1/list_numbers.tsv
    -

    -h - Displays the output in human readable format

    -
    -
    -

    Check-in

    -

    If you are following along with the commands we have run so far, this -is the file structure you should have:

    -
    ls ./*
    -
    ./new_directory:
    -new_file1.txt
    -
    -./part_1:
    -list_numbers.tsv
    -
    -./part_2:
    -
      -
    • * is a wildcard that matches any name, so ls will list the files and -directories in the current one
    • -
    -
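It can help to see that the shell, not `ls`, expands the wildcard before the command runs. A minimal sketch with made-up file names in a scratch directory:

```shell
demo=$(mktemp -d)
cd "$demo"
touch a.tsv b.tsv notes.txt

# The shell replaces *.tsv with the matching names before echo runs
matches=$(echo *.tsv)
echo "$matches"   # prints: a.tsv b.tsv
```

Running `echo` with a pattern first is a cheap way to preview what a wildcard will match before passing it to a command such as `rm`.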
    -
    -

    Text editors

    -
      -
    • Command-line text editors provide lots of keyboard shortcuts to -navigate and alter files
    • -
    • Some commonly used ones are: -
        -
      • Vim : Feature rich, steep learning curve
      • -
      • nano : Simple and user friendly
      • -
    • -
    -
    -
    -

    nano

    -
    nano new_directory/new_file1.txt
    -

    nano

    -
    -
    -

    Shortcuts for nano

    -
      -
    • Ctrl + X :  Exit nano
    • -
    • Ctrl + O :  Save the file (write Out)
    • -
    • Ctrl + W :  Search for a string or regular expression
    • -
    • Ctrl + K :  Cut (remove) the current line or selection
    • -
    • Ctrl + U :  Uncut (paste) the most recently cut text
    • -
    • Ctrl + A :  Move the cursor to the start of the current line
    • -
    • Ctrl + E :  Move the cursor to the end of the current line
    • -
    • Ctrl + G :  Show the help menu
    • -
    -
    -
    -
    -

    Installing Software

    - -
    -
    -

    Package managers

    -
      -
    • Used to install and manage software

    • -
    • macOS

      -
        -
      • homebrew - not included with the OS -and needs to be installed
      • -
    • -
    • WSL/Linux

      -
        -
      • apt-get -- included with Ubuntu
      • -
    • -
    -

    We will not install any software in this workshop but these are how -you would access additional software/commands.

    -
    -
    -
    -

    Downloading Files

    - -
    -
    -

    curl: download files from the internet

    -
      -
    • We used curl at the beginning to download the materials
    • -
    • curl supports multiple protocols but the most commonly used one is -HTTPS
    • -
    -
    curl -o part_2/homo_sapiens.refseq.tsv.gz https://ftp.ensembl.org/pub/current_tsv/homo_sapiens/Homo_sapiens.GRCh38.109.refseq.tsv.gz
    -
      -
    • -o gives the output file name and location
    • -
    -
    -
    -

    Other file transfer tools

    -
      -
    • wget : like curl, supports fewer protocols and is not always -installed
    • -
    • scp (secure copy) : used to encrypt and transfer files using SSH -protocol -
        -
      • Commonly used to transfer files to and from HPC clusters (e.g., Wynton). Example:
      • -
    • -
    -
    scp /path/to/local/file user@remote.host:/path/to/remote/directory
    -
    -
    -
    -

    Searching Files and Combining Commands

    - -
    -
    -

    grep: searching files with regular expressions

    -
      -
    • Searches the contents of the input file and returns the lines that -have a match

    • -
    • Regular Expressions : sequence of characters -that forms a search pattern

    • -
    -
    grep "7" part_1/list_numbers.tsv
    -
    7   8   52  13  6   42  79  1
    -13  4   9   82  67  71  93  3
    -
    -
    -

    grep options

    -
      -
    • -i : ignore case
    • -
    • -v : invert match
    • -
    • -r : recursively searches in all files and subdirectories of a -directory
    • -
    • -c : counts the number of matching lines in each file
    • -
    -
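The options above can be combined. The sketch below uses a small made-up tab-separated file (not part of the workshop materials) to show how `-i`, `-v` and `-c` change the result.

```shell
# Three-line scratch file; note the capital G on the first line
f=$(mktemp)
printf 'Gene\tBRCA1\ngene\tTP53\nexon\tE1\n' > "$f"

# Case-sensitive count: only the line starting with lowercase "gene"
count_exact=$(grep -c "gene" "$f")

# -i ignores case, so "Gene" matches too
count_any_case=$(grep -ic "gene" "$f")

# -v inverts the match: count the lines that do NOT contain "gene"
count_inverted=$(grep -vc "gene" "$f")

echo "$count_exact $count_any_case $count_inverted"   # prints: 1 2 2
```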
    -
    -

    Regular Expressions

    -
      -
    • For a more extensive overview of regular expressions click here
    • -
    • Some basic ones are: -
        -
      • ” ^ ” Matches the beginning of a line
      • -
      • ” . ” Matches any single character except newline
      • -
      • ” $ ” Matches the end of a line
      • -
    • -
    -
    grep "3$" part_1/list_numbers.tsv
    -
    1   3   6   10  11  22  0   3
    -13  4   9   82  67  71  93  3
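To see how the anchors narrow a search, here is a sketch on a small made-up file where the string 13 appears at the start, at the end, and in the middle of a number.

```shell
f=$(mktemp)
printf '13\t4\n7\t13\n113\t9\n' > "$f"

# No anchor: every line contains "13" somewhere
anywhere=$(grep -c "13" "$f")

# ^ anchors to the start of the line: only the first line begins with 13
starts=$(grep -c "^13" "$f")

# $ anchors to the end of the line: only the second line ends with 13
ends=$(grep -c '13$' "$f")

# . stands for any single character: one character, then 13, at line start ("113")
dot=$(grep -c "^.13" "$f")

echo "$anywhere $starts $ends $dot"   # prints: 3 1 1 1
```

Single quotes around `13$` are a defensive choice here so the shell never tries to treat `$` as the start of a variable.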
    -
    -
    -

    head/tail: view the first or last n lines of a file

    -
    head -n 1 part_1/list_numbers.tsv
    -
    1   3   6   10  11  22  0   3
    -
    tail -n 1 part_1/list_numbers.tsv
    -
    13  4   9   82  67  71  93  3
    -
      -
    • The default n is 10
    • -
    • Useful for getting a look at the format of a file
    • -
    -
    -
    -

    cat: print the contents of a file

    -
    cat part_1/list_numbers.tsv
    -
    1   3   6   10  11  22  0   3
    -7   8   52  13  6   42  79  1
    -13  4   9   82  67  71  93  3
    -
    -
    -

    cut: get specific columns from a file

    -
    cut -f 1-3,6 part_1/list_numbers.tsv
    -
    1   3   6   22
    -7   8   52  42
    -13  4   9   71
    -
      -
    • -f : fields that should be returned
    • -
    • -d : delimiter - character that the columns are separated by
    • -
    -

    By default cut expects columns to be separated by tab -characters.
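For files that are not tab-separated, `-d` sets the delimiter. A minimal sketch on a made-up comma-separated file:

```shell
f=$(mktemp)
printf 'id,name,score\n1,ada,90\n2,lin,85\n' > "$f"

# Use a comma as the delimiter and keep columns 1 and 3
cols=$(cut -d "," -f 1,3 "$f")
echo "$cols"

# First and last line of the result, kept for inspection
first_line=$(echo "$cols" | head -n 1)
last_line=$(echo "$cols" | tail -n 1)
```

The output keeps the same delimiter as the input, so the header line becomes `id,score`.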

    -
    -
    -

    Combining Commands

    -
      -
    • Pipes “|” connect one command to another
    • -
    • The output of the previous command is used as the input for the next -one
    • -
    • Chaining commands allows you to do complex operations on text -streams
    • -
    -
    grep "3$" part_1/list_numbers.tsv | cut -f 1-3
    -
    1   3   6
    -13  4   9
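Pipes can chain more than two commands. The sketch below counts the unique values in the first column of a made-up file; `sort -u` and `wc -l` are not covered in this part, and the trailing `tr -d ' '` is only there because some systems pad the output of `wc` with spaces.

```shell
f=$(mktemp)
printf 'geneA\t1\ngeneB\t2\ngeneA\t3\n' > "$f"

# column 1 -> sorted unique values -> number of lines -> strip padding
unique=$(cut -f 1 "$f" | sort -u | wc -l | tr -d ' ')
echo "$unique"   # prints: 2
```

Each stage reads only the output of the stage before it, so the pipeline can be built up and checked one command at a time.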
    -
    -
    -

    Output to a file

    -
      -
    • The output of any command can be written to a file with the ” > ” -character
    • -
    -
    grep "3$" part_1/list_numbers.tsv | cut -f 1-3 > part_1/subset_list_numbers.tsv
    -
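One thing worth knowing about `>` is that it replaces the target file each time; `>>` appends instead. A small sketch using a scratch file:

```shell
out=$(mktemp)

echo "first"  > "$out"    # creates or overwrites the file
echo "second" > "$out"    # overwrites again: "first" is gone
echo "third"  >> "$out"   # appends a new line at the end

cat "$out"

# Record the first line and the number of lines for inspection
first_line=$(head -n 1 "$out")
line_count=$(grep -c "" "$out")
```

Because `>` overwrites without asking, it is worth double-checking the output path before redirecting into an existing file.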
    -
    -

    tr: translate or substitute characters

    -
    cat part_1/list_numbers.tsv | tr "\t" "," > part_1/list_numbers.csv
    -cat part_1/list_numbers.csv
    -
    1,3,6,10,11,22,0,3
    -7,8,52,13,6,42,79,1
    -13,4,9,82,67,71,93,3
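Besides swapping one character for another, `tr` accepts character ranges and can delete characters with `-d`. A minimal sketch on made-up strings:

```shell
# Translate a range: every lowercase letter becomes its uppercase counterpart
upper=$(echo "acgt" | tr 'a-z' 'A-Z')

# -d deletes the listed characters instead of replacing them
no_spaces=$(echo "A C G T" | tr -d ' ')

echo "$upper $no_spaces"   # prints: ACGT ACGT
```

Note that `tr` only reads from standard input, which is why it appears after a pipe rather than taking a file name.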
    -
    -
    -

    Check-in

    -

    If you followed along with the commands we have run so far, you -should have this directory structure:

    -
    ls ./*
    -
    ./new_directory:
    -new_file1.txt
    -
    -./part_1:
    -list_numbers.csv
    -list_numbers.tsv
    -subset_list_numbers.tsv
    -
    -./part_2:
    -homo_sapiens.refseq.tsv.gz
    -
    -
    -
    -

    End of Part 1

    - -
    -
    -

    Other helpful commands

    -
      -
    • chmod : Change the permissions of a file or -directory
    • -
    • chown : Change the owner of a file or directory
    • -
    • df : Display information about disk usage and available -space
    • -
    • ps : Display information about running processes
    • -
    • kill : Stop a running process
    • -
    • less : View the contents of a file one page at a -time
    • -
    • date : prints the date and time
    • -
    • curl wttr.in : check the weather
    • -
    -
    -
    -

    Survey

    - -
    -
    -

    Additional learning materials

    - -
    -
    -

    Upcoming Data Science Training Program Workshops

    -

    Linear -Mixed Effects Modeling
    -April 24-April 25, 2023 10:00am-12:00pm PDT

    -

    Machine -Learning
    -April 28, 2023 10:00am-12:00pm PDT

    -

    Advanced -Cytoscape Automation
    -May 2, 2023 1:00-4:00pm PDT

    -

    Introduction -to RNA-Seq Analysis
    -May 15-May 16, 2023 9:00am-12:00pm PDT

    -
    -
    -
    - - - - - - - - - - - - - - diff --git a/intro-unix-command-line/Intro_to_Unix_Part_2.Rmd b/intro-unix-command-line/Intro_to_Unix_Part_2.Rmd index 5770ea9..36cafce 100644 --- a/intro-unix-command-line/Intro_to_Unix_Part_2.Rmd +++ b/intro-unix-command-line/Intro_to_Unix_Part_2.Rmd @@ -2,6 +2,12 @@ title: "Introduction to Unix Command-line - Part 2" author: "Natalie Elphick" date: "April 18th" +knit: (function(input, ...) { + rmarkdown::render( + input, + output_dir = "../docs" + ) + }) output: revealjs::revealjs_presentation: css: style.css @@ -26,10 +32,407 @@ Bioinformatician I **Yihang Xin (TA)** Software Engineer II +
    + +# Setup + +Run the following commands if you did not attend part 1: + + +```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +mkdir unix_workshop +``` + + +```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +cd unix_workshop +``` + + +```{r, engine='bash', eval=FALSE, results="hide", highlight=FALSE, comment=NA, echo = TRUE} +curl -L -o unix_workshop_2023.tar.gz 'https://www.dropbox.com/s/smb12au2y82jmvq/unix_workshop_2023.tar.gz?dl=0' +``` + + +```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +tar -xzf unix_workshop_2023.tar.gz +``` + + +```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +cd unix_workshop_2023 +``` + +```{r, engine='bash', eval=TRUE, results="hide", highlight=FALSE, comment=NA, echo = TRUE} +curl -o part_2/homo_sapiens.refseq.tsv.gz https://ftp.ensembl.org/pub/current_tsv/homo_sapiens/Homo_sapiens.GRCh38.109.refseq.tsv.gz +``` # File Compression +## Command-line tools for compression + +- Compression reduces the size of a file +- `gzip` : compresses a file and replaces it with a compressed version (.gz) +- `tar` : create and manipulate archive files + +
    + +**Archive**: a single file that contains one or more files and/or folders that have been compressed + + + +## gzip/gunzip: compress/uncompress a file + + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +gunzip part_2/homo_sapiens.refseq.tsv.gz +du -h part_2/homo_sapiens.refseq.tsv +``` + +- The uncompressed file is 27 megabytes + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +gzip part_2/homo_sapiens.refseq.tsv +du -h part_2/homo_sapiens.refseq.tsv.gz +``` + +- Compressing it makes it a 10th of the size + +## Note + +- The magnitude of the compression depends on type of data +- The units for file sizes are not the same across all systems + - Some systems define a kilobyte as 1000 bytes, while others define it as 1024 bytes + + + +## tar: compressing folders into archives + +- Does not provide compression on its own, it uses gzip to create compressed archive files + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +tar -czf part_1.tar.gz part_1 +ls -l +``` + +- -c: create a new archive +- -f: specify the name of the archive file +- -z: compress the archive with gzip + +## Unarchiving + +- We did this in part 1 to unarchive the workshop folders + + +```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +tar -xzf part_1.tar.gz +``` +- -x: extract an archive +- -z: uncompress the archive with gzip +- -f: specify the name of the archive file + + +## gunzip -c: cat compressed files + +- To avoid uncompressing a large file just to read its contents, we can use `gunzip -c` +- This will output the the file to the terminal + + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +gunzip -c part_2/homo_sapiens.refseq.tsv.gz | head +``` + + + +# System Variables + +## What are system variables? 
+ +- Special variables that contain information about the system's configuration and state +- Used by the OS and programs to change their behavior based on the system's state + +Example: +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +echo $HOME +``` + + +## Common System Variables + +- **$PWD** : The working directory +- **$HOME** : The current user's home directory +- **$PS1** : the shell prompt string +- **$TEMP** : location of temporary files + + + +## PATH: locations of executable files + +- When you enter a command, the OS searches the directories in the `$PATH` to find its associated executable file + +```{r, engine='bash', eval=FALSE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +echo $PATH +``` + + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +echo "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/mysql/bin" +``` + + + +- The OS will check these directories in the order they appear and use the first executable it finds + +## export: set system variables + +- Useful for setting variables you want to be used across programs +- You can add new software to your `$PATH` like this: + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +echo 'export PATH="/path/to/new/software:$PATH"' +``` + +- This will modify the `$PATH` for the current terminal session + +## Modifying the PATH for all future terminal sessions + +- Add the export line to your `~/.bashrc` or `~/.zshrc` +- **Proceed with caution** + - Make backups of these and read this [guide](https://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_01.html) + - Changing the `$PATH` incorrectly can break system functionality + + +## which: locate the executable associated with a command + +- This command shows the location of the executable that the OS finds + + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, 
comment=NA, echo = TRUE} +which ls +``` + +- Useful to check if there are multiple versions of a software installed + +# Shell Scripting + + +## What is a script? + +- Scripts are executable files for reusing code +- By convention scripts end in `.sh` +- This first line of the script is called the shebang + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +echo "nano part_2/example_script.sh" +cat ../materials/example_script.sh > part_2/example_script.sh +``` + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +cat ../materials/example_script.sh | head -n 1 +``` + +- The text that follows `#!` tells the OS where the interpreter is + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +which bash +``` + +## chmod: making a script executable + +- By default, files are not executable + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +ls -l part_2/example_script.sh +``` + +- We can set the execute bit like this + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +chmod u+x part_2/example_script.sh +ls -l part_2/example_script.sh +``` + +## Example + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +cat ../materials/example_script.sh +``` + +## Let's run it + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +./part_2/example_script.sh part_2/homo_sapiens.refseq.tsv.gz +``` + +## Loops + +- Useful for iterating over lines of a file or lists + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +for i in {1..3} +do + +echo $i + +done +``` + +## While loops + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +count=0 + +while [ $count -lt 5 ] # loop while count is less than 5 +do 
+ echo $count + count=$((count+1)) +done +``` + + +## If statements + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +x=5 + +if [ $x -gt 10 ] # check if x is greater than 10 +then + echo "x is greater than 10" +else + echo "x is not greater than 10" +fi # end if statement +``` + +# Other Useful Commands + +## sed : stream editor + +- Parses and transforms text, using a compact programming language +- It reads and modifies text line by line from a file or input stream +- Supports [regular expressions](https://v4.software-carpentry.org/regexp/index.html) +- Useful for replacing text + +Example: +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +echo "sed 's/search_string/replace_string/g' input.txt > output.txt" +``` + +## ssh : secure shell - conect to remote server + +- Logging in to a remote server +- Remote desktop for the terminal +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +echo "ssh username@remote" +``` +- The `username` would be your user on the remote server and `remote` is the hostname or IP address of the remote server or computer + +## scp : secure copy + +- Copy files from a remote server or computer + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +echo "scp [options] [source] [destination]" +``` +- Copy from local to remote + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +echo "scp /path/to/local/file.txt username@remote:/path/to/remote/directory/" +``` +- Copy from remote to local + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +echo "scp username@remote:/path/to/file.txt /path/to/local/directory/" +``` + +- -r : copy a whole folder + +# AWK + +## awk : processing structured data + +- A small programming language that is designed to work with structured data +- Has 
more complicated syntax but is faster at processing large files +- Designed to read a file or input stream line by line +- Operates on **records** (lines) and **fields** (columns) + +Basic command: + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = FALSE} +echo "awk options 'pattern {action}' input_file" +``` + +## Example : Sum the first 2 columns of a file + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +awk -F '\t' '{print $1+$2}' part_1/list_numbers.tsv +``` + +- -F : provides the field separator +- `$1,$2` : the first and second fields + +## Example : Find the average of a column + +- For this example we only want the average if the 5th column equals "RefSeq_mRNA" + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo = TRUE} +gunzip -c part_2/homo_sapiens.refseq.tsv.gz | \ +awk -F '\t' '$5 == "RefSeq_mRNA" {sum += $7; count++} \ +END {print sum / count}' + +``` + +## Resources for learning AWK and sed + +- [The GNU AWK manual](https://www.gnu.org/software/gawk/manual/gawk.html) +- [AWK Tutorial by Bruce Barnett](https://www.grymoire.com/Unix/Awk.html) +- [Sed Tutorial by Bruce Barnett](https://www.grymoire.com/Unix/Sed.html) + + + +# End of Part 2 +## Survey + +- Please take some time to fill out the workshop survey: +https://www.surveymonkey.com/r/DY7K5ZY + + +## Additional learning materials + +- Software carpentry provides a self paced course: + - [The Unix Shell](https://swcarpentry.github.io/shell-novice/) + +- Free online books: + - [The Unix Workbench](https://seankross.com/the-unix-workbench/index.html) + - [The Linux Command Line](http://linuxcommand.org/tlcl.php) + + + + +## Upcoming Data Science Training Program Workshops + +[Linear Mixed Effects Modeling](https://gladstone.org/index.php/events/linear-mixed-effects-modeling-0) +April 24-April 25, 2023 10:00am-12:00pm PDT + +[Machine 
Learning](https://gladstone.org/index.php/events/machine-learning) +April 28, 2023 10:00am-12:00pm PDT + +[Advanced Cytoscape Automation](https://gladstone.org/index.php/events/advanced-cytoscape-automation-2) +May 2, 2023 1:00-4:00pm PDT + +[Introduction to RNA-Seq Analysis](https://gladstone.org/index.php/events/introduction-rna-seq-analysis-4) +May 15-May 16, 2023 9:00am-12:00pm PDT + + + + +```{r, engine='bash', eval=TRUE, results='markup', highlight=FALSE, comment=NA, echo=FALSE} +rm part_2/example_script.sh +rm part_2/homo_sapiens.refseq.tsv* +rm part_1.tar.gz +``` + + diff --git a/intro-unix-command-line/materials/example_script.sh b/intro-unix-command-line/materials/example_script.sh new file mode 100644 index 0000000..88e7223 --- /dev/null +++ b/intro-unix-command-line/materials/example_script.sh @@ -0,0 +1,11 @@ +#!/bin/bash + +# This is a comment. Comments are ignored by the shell. + +# $1 is the first argument passed to the script +echo "Counting the genes in $1" + +# count the unique genes in the file +u_genes=$(gunzip -c $1 | cut -f 1 | sort -u | wc -l) + +echo "There are $u_genes unique genes in $1"