Poseidon v2.7.0 added an option to specify sequencing source data. This is a tab-separated table, much like the .janno file, but following a different schema, typically with the file extension .ssf for "Sequencing Source File". The primary entities in this table are sequencing entities (typically DNA libraries, or even multiple runs/lanes of the same library). The link to the samples listed in the .janno file is made through a foreign-key relationship from the column poseidon_IDs in this file to the column Poseidon_ID in the .janno file. The relationship is many-to-many: each row in the .ssf file can list multiple Poseidon_IDs, and multiple rows can link to the same Poseidon_ID.
Here is an example of such a file:
poseidon_IDs udg library_built sample_accession study_accession run_accession sample_alias secondary_sample_accession first_public last_updated instrument_model library_layout library_source instrument_platform library_name library_strategy fastq_aspera fastq_bytes fastq_md5 fastq_ftp read_count submitted_ftp
Ash033.SG minus ds SAMEA7050454 PRJEB39316 ERR4331996 2 ERS4811084 2021-04-12 2020-07-09 Illumina HiSeq 2500 SINGLE GENOMIC ILLUMINA Ash033_all WGS fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/006/ERR4331996/ERR4331996.fastq.gz 649563861 9bd0fceb5ab46cb894ea33765c122e83 ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/006/ERR4331996/ERR4331996.fastq.gz 23386349 ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4331996/Ash033_all.merged.hs37d5.fa.cons.90perc.bam
Ash002.SG minus ds SAMEA7050404 PRJEB39316 ERR4332592 1 ERS4811035 2021-04-12 2020-07-10 Illumina HiSeq 2500 SINGLE GENOMIC ILLUMINA Ash002_all WGS fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/002/ERR4332592/ERR4332592.fastq.gz 194164761 6d8831f5bb8ba9870cb55f834e98ab4d ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/002/ERR4332592/ERR4332592.fastq.gz 6471092 ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4332592/Ash002_all.merged.hs37d5.fa.cons.90perc.bam
Ash040.SG minus ds SAMEA7050455 PRJEB39316 ERR4332593 3 ERS4811085 2021-04-12 2020-07-10 Illumina HiSeq 2500 SINGLE GENOMIC ILLUMINA Ash040_all WGS fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/003/ERR4332593/ERR4332593.fastq.gz 276693447 539852f3d7fb574b2a1e4f1c0059f163 ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/003/ERR4332593/ERR4332593.fastq.gz 9442394 ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4332593/Ash040_all.merged.hs37d5.fa.cons.90perc.bam
Ash129.SG minus ds SAMEA7050457 PRJEB39316 ERR4332594 5 ERS4811087 2021-04-12 2020-07-10 Illumina HiSeq 2500 SINGLE GENOMIC ILLUMINA Ash129_all WGS fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/004/ERR4332594/ERR4332594.fastq.gz 2503643788 4991d0030214687c7c1706c640a2b37e ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/004/ERR4332594/ERR4332594.fastq.gz 76891816 ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4332594/Ash129_all.merged.hs37d5.fa.cons.90perc.bam
Ash131.SG minus ds SAMEA7050458 PRJEB39316 ERR4332631 6 ERS4811088 2021-04-12 2020-07-10 Illumina HiSeq 2500 SINGLE GENOMIC ILLUMINA Ash131_all WGS fasp.sra.ebi.ac.uk:/vol1/fastq/ERR433/001/ERR4332631/ERR4332631.fastq.gz 535696334 3c680096683defd93b35a5f4004e9e87 ftp.sra.ebi.ac.uk/vol1/fastq/ERR433/001/ERR4332631/ERR4332631.fastq.gz 16781594 ftp.sra.ebi.ac.uk/vol1/run/ERR433/ERR4332631/Ash131_all.merged.hs37d5.fa.cons.90perc.bam
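The many-to-many link between poseidon_IDs and Poseidon_ID can be resolved with a few lines of Python. The following is only a minimal sketch, not part of the Poseidon tooling: the function name link_samples_to_runs, the sample and run names in the toy input, and the reduced column set are all made up for illustration. It also assumes that multiple IDs within one poseidon_IDs cell are separated by ";", which should be checked against the schema.

```python
import csv
import io
from collections import defaultdict

# Assumption: several IDs in one poseidon_IDs cell are separated by ";".
ID_SEPARATOR = ";"


def link_samples_to_runs(ssf_handle, janno_ids):
    """Map each Poseidon_ID to the run accessions whose rows reference it.

    Returns (runs_by_sample, unknown_ids), where unknown_ids holds IDs that
    appear in the .ssf table but not in the given set of .janno IDs.
    """
    runs_by_sample = defaultdict(list)
    unknown_ids = set()
    for row in csv.DictReader(ssf_handle, delimiter="\t"):
        for pid in row["poseidon_IDs"].split(ID_SEPARATOR):
            pid = pid.strip()
            if not pid:
                continue
            if pid in janno_ids:
                runs_by_sample[pid].append(row["run_accession"])
            else:
                unknown_ids.add(pid)
    return dict(runs_by_sample), unknown_ids


# Toy stand-in for an .ssf file (hypothetical names, only two columns).
example_ssf = (
    "poseidon_IDs\trun_accession\n"
    "SampleA\tRUN001\n"
    "SampleA;SampleB\tRUN002\n"
    "SampleC\tRUN003\n"
)

runs, unknown = link_samples_to_runs(
    io.StringIO(example_ssf), {"SampleA", "SampleB"}
)
print(runs)
print(unknown)
```

With a real package, ssf_handle would be the opened .ssf file and janno_ids the set of values in the Poseidon_ID column of the .janno file. Any ID reported in unknown_ids points to a broken foreign-key link.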
To help generate such a table from data at the ENA (European Nucleotide Archive), we provide a convenience script that downloads the table. It fills in all columns except udg, library_built, and poseidon_IDs, which need to be added manually.
Note that due to the heterogeneous way users submit data to these archives, many columns in this table are less formalised than they could be. We generally advise simply using the convenience script mentioned above and copying whatever data is in the archives into the respective columns.
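Adding the three manual columns can also be scripted once the values are known. The sketch below is one possible approach, not the project's own procedure. The function add_manual_columns, the annotations dictionary keyed by run_accession, and the toy input values are hypothetical. All archive columns are copied through unchanged, in line with the advice above.

```python
import csv
import io

# The three columns that the download from the ENA does not provide.
MANUAL_COLUMNS = ["poseidon_IDs", "udg", "library_built"]


def add_manual_columns(ena_tsv, annotations):
    """Return .ssf text: the ENA table with the manual columns prepended.

    annotations maps run_accession -> dict holding the MANUAL_COLUMNS values
    (a hypothetical structure chosen for this sketch).
    """
    reader = csv.DictReader(io.StringIO(ena_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=MANUAL_COLUMNS + reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Raises KeyError if a run has not been annotated yet.
        manual = annotations[row["run_accession"]]
        writer.writerow({**row, **{col: manual[col] for col in MANUAL_COLUMNS}})
    return out.getvalue()


# Toy input with made-up values; a real ENA table has many more columns.
ena_tsv = "run_accession\tsample_alias\nRUN001\talias1\n"
annotations = {
    "RUN001": {"poseidon_IDs": "SampleA", "udg": "minus", "library_built": "ds"},
}
ssf_text = add_manual_columns(ena_tsv, annotations)
print(ssf_text)
```

Failing loudly on a missing annotation is a deliberate choice here, so that no row ends up in the .ssf file without its link to a Poseidon_ID.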