Output files
Here, using the results produced in the Non-bacterial dataset section, we give users a glimpse over the main outputs produced by MpGAP. The command used in there wrote the results under the genome_assembly
directory.
Note
Please take note that the pipeline uses the directory set with the --output
parameter as a storage place in which it will create a folder for the final results, separated by sample, technology and assembly strategy.
Directory tree
After a successful execution, you will have something like this:
# Directory tree from the running dir
genome_assembly
├── aspergillus_fumigatus # directory containing the assembly results for a given sample these are written with the 'id' value. In our example we have only one, but if input data samplesheet had more samples we would have one sub-directory for each.
│ └── longreads_only # results for long reads only assembly. A sub-directory is created for results of each assembly strategy to allow you running multiple strategies at once
│ ├── 00_quality_assessment # QC reports
│ ├── canu # Canu assembly
│ ├── flye # Flye assembly
│ ├── medaka_polished_contigs # Assemblies of all assemblers polished with medaka
│ ├── raven # Raven assembly
│ ├── shasta # Shasta assembly
│ └── wtdbg2 # Shasta assembly
├── bacannot_samplesheet.yml # a template input ready for bacannot pipeline
├── final_assemblies # A folder contatining a copy of all the assemblies generated, raw and polished
│ ├── aspergillus_fumigatus_canu_assembly.fasta
│ ├── aspergillus_fumigatus_canu_medaka_consensus.fa
│ ├── aspergillus_fumigatus_flye_assembly.fasta
│ ├── aspergillus_fumigatus_flye_medaka_consensus.fa
│ ├── < ... > etc.
├── input.yml # Copy of given input samplesheet for data provenance
└── pipeline_info # directory containing the nextflow execution reports
├── mpgap_report_2023-12-28_12-25-18.html
├── mpgap_timeline_2023-12-28_12-25-18.html
└── mpgap_tracing_2023-12-28_12-25-18.txt
The pre-formatted Bacannot input samplesheet
Once finished, the pipeline also generates a file called bacannot_samplesheet.yml
(showed below). Basically this samplesheet defines all the minimum definitions in order to annotate these generated genomes using the Bacannot pipeline.
samplesheet:
- id: aspergillus_fumigatus_shasta
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_shasta_assembly.fasta
- id: aspergillus_fumigatus_flye
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_flye_assembly.fasta
- id: aspergillus_fumigatus_raven
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_raven_assembly.fasta
- id: aspergillus_fumigatus_wtdbg2
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_wtdbg2_assembly.fasta
- id: aspergillus_fumigatus_canu
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_canu_assembly.fasta
- id: aspergillus_fumigatus_shasta_medaka
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_shasta_medaka_consensus.fa
- id: aspergillus_fumigatus_canu_medaka
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_canu_medaka_consensus.fa
- id: aspergillus_fumigatus_raven_medaka
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_raven_medaka_consensus.fa
- id: aspergillus_fumigatus_flye_medaka
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_flye_medaka_consensus.fa
- id: aspergillus_fumigatus_wtdbg2_medaka
assembly: /array1/falmeida/git_repos/MpGAP/testing/results/final_assemblies/aspergillus_fumigatus_wtdbg2_medaka_consensus.fa
Note
One must keep in mind that, this template samplesheet contains only the bare minimum to launch bacannot but many other customizations are possible. For example, one can also set, for each input genome, a different resfinder panel if not wanting to run the same for all. And many other things. Therefore, users can/must use this output as a template for easily customization of the bacannot pipeline input to readily use the results of the mpgap pipeline.
For more information, please refer to the Bacannot documentation.
Example of QC outputs
Here I am going to display just a very few examples of results produced, focusing on the QC, as the main result is a normal assembly, performed by each assembler.
Summary of Assembly Statistics in TXT format
Open it here.
MultiQC Report - HTML
Open it here.
Quast Report of Flye assembly - HTML
Open it here.