Request an account by registering at the genome center computing page
Use ssh or mosh to log on to the cluster head node at barbera.genomecenter.ucdavis.edu.
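For example, with ssh (replace the username with your own; mosh uses the same form once the setup below is done):

```bash
ssh jmaloof@barbera.genomecenter.ucdavis.edu
```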
To log on with mosh, first modify .ssh/config on your computer to contain
GSSAPIAuthentication=yes
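One way to do this, as a minimal sketch (ssh_config accepts both `Option value` and `Option=value`; the Host stanza shown here simply scopes the setting to Barbera):

```bash
# in ~/.ssh/config on your own computer
Host barbera.genomecenter.ucdavis.edu
    GSSAPIAuthentication yes
```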
Then, at the Linux/Unix command line on your computer, enter
kinit -l 14d jmaloof@GENOMECENTER.UCDAVIS.EDU # 14 day max; change the username to yours
Once you log on to Barbera with mosh or ssh, you will find that you do not have permission for your home directory. To authorize yourself:
kinit -l 14d # only needs to be done once every 14 days
aklog # enables your home directory. If you get a message about not being authorized, then do kinit first.
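If you are not sure whether your ticket is still valid, you can check with klist, the standard Kerberos ticket-listing tool (not part of the original instructions, but available wherever kinit is):

```bash
klist   # shows your current Kerberos tickets and when they expire
```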
Your home directory has little storage space. For analyses, please use the Maloof Lab share:
cd /share/malooflab
Please create your own directory within malooflab
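For example (the directory name below is a placeholder; use your own):

```bash
mkdir /share/malooflab/your_username
cd /share/malooflab/your_username
```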
Please keep genome fasta files, etc. at
cd /share/malooflab/ref_genomes
Most analyses are done by submitting a batch script to the queue. Do not run analyses on the head node!!
You will need to load modules that contain the programs you want to use. You can see what is available with:
module avail
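Then load the modules you need. A quick sketch (the module name and version below are taken from the STAR script later in this page; pick whatever the module avail listing actually shows):

```bash
module load star/2.5.2b   # load a specific version of a program
module list               # confirm which modules are currently loaded
```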
You need to create and edit a SLURM script to submit commands for processing. These scripts can be simple (a single job) or can define an array job.
The script below runs STAR on one fastq file at a time, using 16 CPUs:
```bash
#!/bin/bash
#SBATCH --partition=production              # partition to submit to
#SBATCH --job-name=Sol150_Star_Run          # Job name
#SBATCH --nodes=1                           # single node, anything more than 1 will not run
#SBATCH --ntasks=16                         # equivalent to cpus
#SBATCH --mem=100000                        # in MB, memory pool all cores, default is 2GB per cpu
#SBATCH --time=1-00:00:00                   # expected time of completion in days, hours, minutes, seconds, default 1-day
#SBATCH --output=Sol150_Star_Run_single.out # STDOUT
#SBATCH --error=Sol150_Star_Run_single.err  # STDERR
#SBATCH --mail-user=jnmaloof@ucdavis.edu
#SBATCH --mail-type=ALL

# This will be run once for a single process
/bin/hostname

start=`date +%s`

# Load STAR Module 2.5.2b
module load star/2.5.2b

# Change directory
cd /share/malooflab/Julin/Solanum/Sol150

#files=`ls fastq`
files=`ls -1 fastq_cat | head -n 1`

for f in $files
do
    fbase=`basename $f .fastq.gz`
    mkdir -p STAR/${fbase}.STARout
    STAR \
        --genomeDir /share/malooflab/ref_genomes/S_lycopersicum/SL3.00_STAR_REF \
        --readFilesIn /share/malooflab/Julin/Solanum/Sol150/fastq_cat/${f} \
        --quantMode TranscriptomeSAM GeneCounts \
        --twopassMode Basic \
        --alignIntronMax 10000 \
        --runThreadN 16 \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix ./STAR/${fbase}.STARout/${fbase}_ \
        --outReadsUnmapped Fastx \
        --outSAMattrRGline ID:${fbase} \
        --readFilesCommand zcat
done

end=`date +%s`
runtime=$((end-start))
echo $runtime seconds to completion
```
The script below creates a separate job for each fastq file (an array job):
```bash
#!/bin/bash
#SBATCH --partition=production          # partition to submit to
#SBATCH --job-name=Brapa_Kallisto       # Job name
#SBATCH --array=0-63                    # for this script, adjust to match the number of fastq files
#SBATCH --nodes=1                       # single node, anything more than 1 will not run
#SBATCH --ntasks=1                      # equivalent to cpus; stick to around 20 max on gc64 or gc128 nodes
#SBATCH --mem=4000                      # in MB, memory pool all cores, default is 2GB per cpu
#SBATCH --time=0-01:00:00               # expected time of completion in days-hours:minutes:seconds, default 1-day
#SBATCH --output=Kallisto_%A_%a.out     # STDOUT
#SBATCH --error=Kallisto_%A_%a.err      # STDERR
#SBATCH --mail-user=jnmaloof@ucdavis.edu
#SBATCH --mail-type=ALL

# This will be run once per array task
/bin/hostname

start=`date +%s`

# Load Kallisto
module load kallisto

# Change directory
cd /share/malooflab/Julin/Brapa_microbes/20180202-samples/

# Identify each array run
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

# create an array of file names:
filelist=($(ls 20180202-data/raw-fastq/*/*gz))

# now pick the file that corresponds to the current array task
# note that for this script the number of array tasks should equal the number of files
f=${filelist[${SLURM_ARRAY_TASK_ID}]}

# trim off directory info and file extensions:
outdir=$(basename $f .fastq.gz)
echo "file stem: " $outdir

kallisto quant \
    --index /share/malooflab/ref_genomes/B_rapa/V3.0/B_rapa_CDS_V3.0_k31_kallisto_index \
    --output-dir 20180202-data/kallisto_outV3.0/$outdir \
    --plaintext \
    --single \
    -l 250 \
    -s 40 \
    $f

end=`date +%s`
runtime=$((end-start))
echo $runtime seconds to completion
```
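Note that the --array range (0-63 here, i.e. 64 tasks) must match the number of input files. One way to count them before setting the range, using the same glob the script uses:

```bash
ls 20180202-data/raw-fastq/*/*gz | wc -l
```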
Submit the script to the queue with
sbatch script.slurm
and check on your jobs with
squeue -u jmaloof # change to your username
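If a job goes wrong, you can cancel it with scancel (a standard SLURM command), giving it the job ID shown in the squeue output:

```bash
scancel 1234567   # the job ID here is illustrative
```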
If you need to install packages (e.g. for R), compile programs, or move large files (e.g. with sftp), you should start an interactive session. Log on to Barbera first, and then from Barbera:
screen # start a screen session so the interactive job survives disconnects
srun -p production -N 1 -n 1 --time=0-04 --mem=4000 --pty /bin/bash # 1 node, 1 task, 4 hours, 4GB
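Because the srun shell runs inside screen, it survives a dropped connection. These are stock screen commands, not specific to Barbera:

```bash
# Detach from the session with Ctrl-a d; later, back on Barbera:
screen -r   # reattach to your running session
exit        # from inside the srun shell: end the interactive job when finished
```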