.ssf file details

Poseidon v2.7.0 added an option to specify sequencing source data. The data is stored in a tab-separated table, much like the .janno file, but following a different schema, typically with the file ending .ssf for "Sequencing Source File". The primary entities in this table are sequencing entities (typically corresponding to DNA libraries or even multiple runs/lanes of the same library). The link to the samples listed in the .janno file is made through a foreign-key relationship from the column poseidon_IDs in this file to Poseidon_ID in the .janno file. The relationship is many-to-many, so each row in the .ssf file can contain multiple Poseidon_IDs, and multiple rows can link to the same Poseidon_ID.

Here is an example of such a file:

poseidon_IDs	udg	library_built	sample_accession	study_accession	run_accession	sample_alias	secondary_sample_accession	first_public	last_updated	instrument_model	library_layout	library_source	instrument_platform	library_name	library_strategy	fastq_aspera	fastq_bytes	fastq_md5	fastq_ftp	read_count	submitted_ftp
Ash033.SG	minus	ds	SAMEA7050454	PRJEB39316	ERR4331996	2	ERS4811084	2021-04-12	2020-07-09	Illumina HiSeq 2500	SINGLE	GENOMIC	ILLUMINA	Ash033_all	WGS	fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/006/ERR4331996/ERR4331996.fastq.gz	649563861	9bd0fceb5ab46cb894ea33765c122e83	ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/006/ERR4331996/ERR4331996.fastq.gz	23386349	ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4331996/Ash033_all.merged.hs37d5.fa.cons.90perc.bam
Ash002.SG	minus	ds	SAMEA7050404	PRJEB39316	ERR4332592	1	ERS4811035	2021-04-12	2020-07-10	Illumina HiSeq 2500	SINGLE	GENOMIC	ILLUMINA	Ash002_all	WGS	fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/002/ERR4332592/ERR4332592.fastq.gz	194164761	6d8831f5bb8ba9870cb55f834e98ab4d	ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/002/ERR4332592/ERR4332592.fastq.gz	6471092	ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4332592/Ash002_all.merged.hs37d5.fa.cons.90perc.bam
Ash040.SG	minus	ds	SAMEA7050455	PRJEB39316	ERR4332593	3	ERS4811085	2021-04-12	2020-07-10	Illumina HiSeq 2500	SINGLE	GENOMIC	ILLUMINA	Ash040_all	WGS	fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/003/ERR4332593/ERR4332593.fastq.gz	276693447	539852f3d7fb574b2a1e4f1c0059f163	ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/003/ERR4332593/ERR4332593.fastq.gz	9442394	ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4332593/Ash040_all.merged.hs37d5.fa.cons.90perc.bam
Ash129.SG	minus	ds	SAMEA7050457	PRJEB39316	ERR4332594	5	ERS4811087	2021-04-12	2020-07-10	Illumina HiSeq 2500	SINGLE	GENOMIC	ILLUMINA	Ash129_all	WGS	fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/004/ERR4332594/ERR4332594.fastq.gz	2503643788	4991d0030214687c7c1706c640a2b37e	ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/004/ERR4332594/ERR4332594.fastq.gz	76891816	ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4332594/Ash129_all.merged.hs37d5.fa.cons.90perc.bam
Ash131.SG	minus	ds	SAMEA7050458	PRJEB39316	ERR4332631	6	ERS4811088	2021-04-12	2020-07-10	Illumina HiSeq 2500	SINGLE	GENOMIC	ILLUMINA	Ash131_all	WGS	fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/001/ERR4332631/ERR4332631.fastq.gz	535696334	3c680096683defd93b35a5f4004e9e87	ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/001/ERR4332631/ERR4332631.fastq.gz	16781594	ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4332631/Ash131_all.merged.hs37d5.fa.cons.90perc.bam
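
As a rough illustration of the many-to-many link between .ssf rows and samples, the following Python sketch reads an .ssf table and builds a lookup from each Poseidon_ID to the run accessions that mention it. It assumes that multiple entries in the poseidon_IDs column are separated by semicolons; that separator is an assumption here and should be checked against the schema. The function name `runs_per_poseidon_id` and the sample and run names in the tiny table are made up for this example.

```python
import csv
import io
from collections import defaultdict


def runs_per_poseidon_id(ssf_text, sep=";"):
    """Map each Poseidon_ID to the run_accession values of the rows listing it.

    Assumption: multiple IDs in the poseidon_IDs column are joined by `sep`.
    """
    reader = csv.DictReader(io.StringIO(ssf_text), delimiter="\t")
    index = defaultdict(list)
    for row in reader:
        for pid in row["poseidon_IDs"].split(sep):
            pid = pid.strip()
            if pid:  # skip empty fragments such as a trailing separator
                index[pid].append(row["run_accession"])
    return dict(index)


# Made-up table: one run shared by two samples, one sample with two runs.
example = (
    "poseidon_IDs\trun_accession\n"
    "SampleA;SampleB\tRUN001\n"
    "SampleA\tRUN002\n"
)
print(runs_per_poseidon_id(example))
```

Inverting the table this way makes both directions of the relationship visible: a single row can contribute to several samples, and a single sample can collect several rows.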

To help with generating such a table from data at the European Nucleotide Archive (ENA), we have a convenience script that downloads the table, except for the udg, library_built, and poseidon_IDs columns, which need to be added manually.
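
The manual step could look roughly like the sketch below: given a downloaded ENA run table as text, it prepends the three columns that the archive cannot supply. The function name `add_manual_columns` and the per-run annotation dictionary are hypothetical and are not part of the convenience script. The values used in the usage lines are taken from the first row of the example above.

```python
import csv
import io

# The three columns that have to be filled in by hand.
MANUAL_COLUMNS = ["poseidon_IDs", "udg", "library_built"]


def add_manual_columns(ena_text, annotations):
    """Prepend poseidon_IDs, udg and library_built to an ENA run table.

    `annotations` maps run_accession -> dict holding the three manual values.
    A run without an entry raises KeyError, so gaps are not written silently.
    """
    reader = csv.DictReader(io.StringIO(ena_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=MANUAL_COLUMNS + list(reader.fieldnames),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        manual = annotations[row["run_accession"]]
        writer.writerow({**{c: manual[c] for c in MANUAL_COLUMNS}, **row})
    return out.getvalue()


# Usage with a two-column excerpt of the example table.
ena = "run_accession\tread_count\nERR4331996\t23386349\n"
ssf = add_manual_columns(
    ena,
    {"ERR4331996": {"poseidon_IDs": "Ash033.SG", "udg": "minus", "library_built": "ds"}},
)
print(ssf)
```

Failing loudly on a missing annotation is a design choice for this sketch; a real workflow might prefer to write a placeholder and report the affected runs instead.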

Note that due to the heterogeneous way users submit data to these archives, many columns in this table are less formalised than they could be. We generally advise simply using the above script and copying whatever data is in the archives into the respective columns.