## Requesting an account

Request an account by registering at the [genome center computing page](https://computing.genomecenter.ucdavis.edu/).

## Logging on

Use `ssh` or `mosh` to log on to the cluster head node at barbera.genomecenter.ucdavis.edu.

First, modify `.ssh/config` on your computer to contain

```
GSSAPIAuthentication=yes
```

Then, at the Linux/Unix command line on your computer, enter

```
kinit -l 14d jmaloof@GENOMECENTER.UCDAVIS.EDU  # 14-day max; change the username to yours
```

Once you log on to Barbera with `mosh` or `ssh` you will find that you do not have permission for your home directory. To authorize yourself:

```
kinit -l 14d  # only needs to be done once every 14 days
aklog         # enables your home directory; if you get a message about not being authorized, run kinit first
```

## Working directory

Your home directory has little storage space. For analyses, please use the Maloof Lab share:

```
cd /share/malooflab
```

Please create your own directory within `malooflab`.

## Shared files

Please keep genome fasta files, etc. at

```
cd /share/malooflab/ref_genomes
```

## Using the cluster

Most analyses are done by submitting a batch script to the queue.

__Do not run analyses on the head node!!__

## modules

You will need to load _modules_ that contain the programs you want to use. You can see what is available with:

```
module avail
```

## Slurm Script

You need to create and edit a Slurm script to submit commands for processing. These scripts can either be simple (a single job) or an array job.

### single job

The script below runs STAR on one fastq file at a time, using 16 CPUs.

```
#!/bin/bash
#SBATCH --partition=production              # partition to submit to
#SBATCH --job-name=Sol150_Star_Run          # job name
#SBATCH --nodes=1                           # single node; anything more than 1 will not run
#SBATCH --ntasks=16                         # equivalent to cpus
#SBATCH --mem=100000                        # in MB, memory pool for all cores; default is 2GB per cpu
#SBATCH --time=1-00:00:00                   # time limit (days-hours:minutes:seconds); default is 1 day
#SBATCH --output=Sol150_Star_Run_single.out # STDOUT
#SBATCH --error=Sol150_Star_Run_single.err  # STDERR
#SBATCH --mail-user=jnmaloof@ucdavis.edu
#SBATCH --mail-type=ALL

# This will be run once for a single process
/bin/hostname
start=`date +%s`

# Load STAR module 2.5.2b
module load star/2.5.2b

# Change directory
cd /share/malooflab/Julin/Solanum/Sol150

#files=`ls fastq`
files=`ls -1 fastq_cat | head -n 1`

for f in $files
do
    fbase=`basename $f .fastq.gz`
    mkdir -p STAR/${fbase}.STARout
    STAR \
        --genomeDir /share/malooflab/ref_genomes/S_lycopersicum/SL3.00_STAR_REF \
        --readFilesIn /share/malooflab/Julin/Solanum/Sol150/fastq_cat/${f} \
        --quantMode TranscriptomeSAM GeneCounts \
        --twopassMode Basic \
        --alignIntronMax 10000 \
        --runThreadN 16 \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix ./STAR/${fbase}.STARout/${fbase}_ \
        --outReadsUnmapped Fastx \
        --outSAMattrRGline ID:${fbase} \
        --readFilesCommand zcat
done

end=`date +%s`
runtime=$((end-start))
echo $runtime seconds to completion
```

### array job

This script creates a separate job for each fastq file. The `--array=0-63` range must match the number of fastq files (64 here, indexed from 0); one way to count them is sketched below.
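A minimal sketch for working out the `--array` range, assuming the `raw-fastq` layout used in the script that follows; the glob is taken from that example and should be adjusted to point at your own fastq files:

```
# Count the fastq files so the --array range matches; adjust the glob to your data.
N=$(ls -1 20180202-data/raw-fastq/*/*gz | wc -l)
echo "Found $N files; use --array=0-$((N - 1))"
```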
```
#!/bin/bash
#SBATCH --partition=production       # partition to submit to
#SBATCH --job-name=Brapa_Kallisto    # job name
#SBATCH --array=0-63                 # adjust to match the number of fastq files
#SBATCH --nodes=1                    # single node; anything more than 1 will not run
#SBATCH --ntasks=1                   # equivalent to cpus; stick to around 20 max on gc64 or gc128 nodes
#SBATCH --mem=4000                   # in MB, memory pool for all cores; default is 2GB per cpu
#SBATCH --time=0-01:00:00            # time limit (days-hours:minutes:seconds); default is 1 day
#SBATCH --output=Kallisto_%A_%a.out  # STDOUT
#SBATCH --error=Kallisto_%A_%a.err   # STDERR
#SBATCH --mail-user=jnmaloof@ucdavis.edu
#SBATCH --mail-type=ALL

# This will be run once for each array task
/bin/hostname
start=`date +%s`

# Load Kallisto
module load kallisto

# Change directory
cd /share/malooflab/Julin/Brapa_microbes/20180202-samples/

# Identify each array run
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

# Create an array of file names
filelist=($(ls 20180202-data/raw-fastq/*/*gz))

# Now pick the file that corresponds to the current array task.
# Note that for this script the number of array tasks should equal the number of files.
f=${filelist[${SLURM_ARRAY_TASK_ID}]}

# Trim off directory info and file extensions
outdir=$(basename $f .fastq.gz)
echo "file stem: " $outdir

kallisto quant \
    --index /share/malooflab/ref_genomes/B_rapa/V3.0/B_rapa_CDS_V3.0_k31_kallisto_index \
    --output-dir 20180202-data/kallisto_outV3.0/$outdir \
    --plaintext \
    --single \
    -l 250 \
    -s 40 \
    $f

end=`date +%s`
runtime=$((end-start))
echo $runtime seconds to completion
```

## submitting your script

```
sbatch script.slurm
```

## checking on your job status

```
squeue -u jmaloof  # change to your username
```

## interactive session

If you need to install packages (e.g. for R), compile programs, or move large files (e.g. with sftp), you should start an interactive session. Log on to Barbera first, and then from Barbera:

```
screen
srun -p production -N 1 -n 1 --time=0-04 --mem=4000 --pty /bin/bash
```
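Once the interactive shell starts, a typical workflow looks like the sketch below. This is standard `screen` and Slurm behavior rather than anything specific to Barbera, and the R module name is only an example (check `module avail` for what actually exists):

```
# Inside the interactive shell: load what you need and work as usual.
module load R     # example module name; check `module avail` for the real one
# Detach from screen without killing the session: press Ctrl-a, then d.
# Later, from the head node, reattach to the detached session:
screen -r
# When finished, end the interactive job so the allocation is released:
exit
```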