## Requesting an account

Request an account by registering at the [genome center computing page](https://computing.genomecenter.ucdavis.edu/).

## Logging on

Use `ssh` or `mosh` to log on to the cluster head node at barbera.genomecenter.ucdavis.edu.

First, modify `.ssh/config` on your computer to contain

```
GSSAPIAuthentication=yes
```

Then, at the Linux/Unix command line on your computer, enter

```
kinit -l 14d jmaloof@GENOMECENTER.UCDAVIS.EDU  # 14-day max; change the username to yours
```

Once you log on to Barbera with `mosh` or `ssh` you will find that you do not have permission for your home directory. To authorize yourself:

```
kinit -l 14d  # only needs to be done once every 14 days
aklog         # enables your home directory; if you get a message about not being authorized, run kinit first
```

## Working directory

Your home directory has little storage space. For analyses, please use the Maloof Lab share:

```
cd /share/malooflab
```

Please create your own directory within `malooflab`.

## Shared files

Please keep genome fasta files, etc. at

```
cd /share/malooflab/ref_genomes
```

## Using the cluster

Most analyses are done by submitting a batch script to the queue.

__Do not run analyses on the head node!!__

## modules

You will need to load _modules_ that contain the programs you want to use. You can see what is available with:

```
module avail
```

## Slurm Script

You need to create and edit a Slurm script to submit commands for processing. These scripts can either be simple (a single job) or an array job.

### single job

The script below runs STAR on one fastq file at a time, using 16 CPUs.

```
#!/bin/bash
#SBATCH --partition=production              # partition to submit to
#SBATCH --job-name=Sol150_Star_Run          # job name
#SBATCH --nodes=1                           # single node; anything more than 1 will not run
#SBATCH --ntasks=16                         # equivalent to cpus
#SBATCH --mem=100000                        # in MB, memory pool for all cores; default is 2GB per cpu
#SBATCH --time=1-00:00:00                   # time limit (days-hours:minutes:seconds); default is 1 day
#SBATCH --output=Sol150_Star_Run_single.out # STDOUT
#SBATCH --error=Sol150_Star_Run_single.err  # STDERR
#SBATCH --mail-user=jnmaloof@ucdavis.edu
#SBATCH --mail-type=ALL

# This will be run once for a single process
/bin/hostname
start=`date +%s`

# Load STAR module 2.5.2b
module load star/2.5.2b

# Change directory
cd /share/malooflab/Julin/Solanum/Sol150

#files=`ls fastq`
files=`ls -1 fastq_cat | head -n 1`

for f in $files
do
    fbase=`basename $f .fastq.gz`
    mkdir -p STAR/${fbase}.STARout
    STAR \
        --genomeDir /share/malooflab/ref_genomes/S_lycopersicum/SL3.00_STAR_REF \
        --readFilesIn /share/malooflab/Julin/Solanum/Sol150/fastq_cat/${f} \
        --quantMode TranscriptomeSAM GeneCounts \
        --twopassMode Basic \
        --alignIntronMax 10000 \
        --runThreadN 16 \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix ./STAR/${fbase}.STARout/${fbase}_ \
        --outReadsUnmapped Fastx \
        --outSAMattrRGline ID:${fbase} \
        --readFilesCommand zcat
done

end=`date +%s`
runtime=$((end-start))
echo $runtime seconds to completion
```

### array job

This script creates a separate job for each fastq file. The `--array=0-63` range must match the number of fastq files (64 here, indexed from 0); one way to count them is sketched below.
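A minimal sketch for working out the `--array` range, assuming the `raw-fastq` layout used in the script that follows; the glob is taken from that example and should be adjusted to point at your own fastq files:

```
# Count the fastq files so the --array range matches; adjust the glob to your data.
N=$(ls -1 20180202-data/raw-fastq/*/*gz | wc -l)
echo "Found $N files; use --array=0-$((N - 1))"
```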
```
#!/bin/bash
#SBATCH --partition=production       # partition to submit to
#SBATCH --job-name=Brapa_Kallisto    # job name
#SBATCH --array=0-63                 # adjust to match the number of fastq files
#SBATCH --nodes=1                    # single node; anything more than 1 will not run
#SBATCH --ntasks=1                   # equivalent to cpus; stick to around 20 max on gc64 or gc128 nodes
#SBATCH --mem=4000                   # in MB, memory pool for all cores; default is 2GB per cpu
#SBATCH --time=0-01:00:00            # time limit (days-hours:minutes:seconds); default is 1 day
#SBATCH --output=Kallisto_%A_%a.out  # STDOUT
#SBATCH --error=Kallisto_%A_%a.err   # STDERR
#SBATCH --mail-user=jnmaloof@ucdavis.edu
#SBATCH --mail-type=ALL

# This will be run once for each array task
/bin/hostname
start=`date +%s`

# Load Kallisto
module load kallisto

# Change directory
cd /share/malooflab/Julin/Brapa_microbes/20180202-samples/

# Identify each array run
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

# Create an array of file names
filelist=($(ls 20180202-data/raw-fastq/*/*gz))

# Now pick the file that corresponds to the current array task.
# Note that for this script the number of array tasks should equal the number of files.
f=${filelist[${SLURM_ARRAY_TASK_ID}]}

# Trim off directory info and file extensions
outdir=$(basename $f .fastq.gz)
echo "file stem: " $outdir

kallisto quant \
    --index /share/malooflab/ref_genomes/B_rapa/V3.0/B_rapa_CDS_V3.0_k31_kallisto_index \
    --output-dir 20180202-data/kallisto_outV3.0/$outdir \
    --plaintext \
    --single \
    -l 250 \
    -s 40 \
    $f

end=`date +%s`
runtime=$((end-start))
echo $runtime seconds to completion
```

## submitting your script

```
sbatch script.slurm
```

## checking on your job status

```
squeue -u jmaloof  # change to your username
```

## interactive session

If you need to install packages (e.g. for R), compile programs, or move large files (e.g. with sftp), you should start an interactive session. Log on to Barbera first, and then from Barbera:

```
screen
srun -p production -N 1 -n 1 --time=0-04 --mem=4000 --pty /bin/bash
```
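Once the interactive shell starts, a typical workflow looks like the sketch below. This is standard `screen` and Slurm behavior rather than anything specific to Barbera, and the R module name is only an example (check `module avail` for what actually exists):

```
# Inside the interactive shell: load what you need and work as usual.
module load R     # example module name; check `module avail` for the real one
# Detach from screen without killing the session: press Ctrl-a, then d.
# Later, from the head node, reattach to the detached session:
screen -r
# When finished, end the interactive job so the allocation is released:
exit
```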