Source of Sequencing Database and SRA submission

Download the templates from:

Example templates can be found at:

1. Add researcher to research table if not there
2. Add species to species table if not there
3. Add experiment info to experiment table
4. Create the sample sheet. Sample_ID should be a sequentially increasing number starting with one after the last one in the database.
5. Create the data file sheet. Sample_ID here needs to match the sample ID from step 4. Example R script is at XXX

mySQLworkbench notes:
1. On the mac, the import icon does not work (does not allow you to specify a table to ammend)
2. You can instead right-click on the table name under schema and choose data-import-wizard
3. If you are using the data-import-wizard, then you can have NULL values you in your columns and that is helpful.

The query command for generating an SRA submission table is:

```
SELECT Sample.*, Experiment.*, Data_file.*
FROM ((Sample
INNER JOIN Experiment ON Sample.Sample_experiment = Experiment.Experiment_name)
INNER JOIN Data_file ON Sample.Sample_id = Data_file.Data_file_sample_id)
WHERE Experiment_name = "Tomato_phosphorous_deprivation";
```
(Change the experiment name to match yours)

## Troubleshooting (Kazu, 2022. Some might be specific to Brassica microbe data specific).

* When I used "Import" function in mySQLworkbench, the output csv contained empty rows, which resulted errors in loading file in NCBI SRA submission step. Remove those rows or copy and paste the non-empty rows to create a new csv file solved the NCBI SRA submission errors.
* Fastq file names (for example) should be unique. For example I had duplicated file names in Brassica rapa validation experiment ("f1_5e_dead_cr_8_a142_S41_L002_R1_001.fastq.gz") in two folders. This is because the first sequences did not work well and tried the second sequences and Julin used both files for mapping. So Julin modified the file names in the original files. Accordingly changed all fastq files in that directory in data file ("data_file_B_rapa_validation_small.csv").
* sample_id is necessary for data file, so that I used textsplit(), vlookup(), or substitute() to add it to data file from sample file in excel. Add the sample_id under "Data_file_sample" column.

* For SRA submission we do not need to work on Mapping_analysis_to_file in mySQLworkbench.

* At SRA submission portal, I had bunch of errors in "Biosample Attributes"

* Upload "sample" Ecxel file to 5. biosample attributes. -> bunch of errors.
* Delete "sample_id" in the sample Ecxcel file
* Change format of Sample_date from 10/19/2017 to 2017-10-19 by changing "Format" in Excel tab.
* Add "age" and "developmental stage" as "Three weeks after germination" and "vegetagive"
* Add "FPsc" as "isolate"
* Change "Sample_SRA_tissue" into "tissue"
* Add "geo_location" column and input "Wyoming, USA" (error). "USA: Wyoming" (suggested)

* SRA Metadata
* Download template SRA_metadata.xlsx from SRA submission page
* Copy "sample_name" column in "sample_Br_microbiome_validation_export.xlsx" to "sample_name"
* Add needed info in SRA_metadata.xlsx (see an example)

* Submission was not accepted due to errors (I got messages from NCBI while I was on vacation). Julin solved the problems (The last raw in excel file was missing (I do not know why).

Sequencing Database and SRA submission sequencing_database_and_sra_submission