Basecalling-pipeline
Overview
This Jenkins pipeline is designed for parallelizing the basecalling procedure on multiple nodes through the BC_software. Most commands are executed on the ORFEO cluster through SSH, utilizing the ‘orfeo_jenkins_onpexp’ credentials.
Pipeline Structure
The pipeline consists of the following stages:
1. Cleanup tmp dir
- Purpose: Cleans up from temporary files in logs, output, input, and BC_software directories on the cluster.
- Actions:
- Remove and recreate temporary directories for logs and output.
- Cleanup input directory from previous runs.
- Cleanup BC_software directories from the temporary directories called server_node_(node_name).
2. Pull project repository on the Cluster
- Purpose: Ensures the latest version of the BC-pipelines repository is pulled on the cluster.
- Actions:
- Execute ‘git pull’ in the BC-pipelines directory.
3. Generate setup based on configuration file
- Purpose: Uses the provided configuration file to create the sbatch file for the basecalling procedure.
- Actions:
- Execute configuration.py using the provided JSON file.
- Display the content of the generated script_resources.sh file.
4. Start the basecalling run
- Purpose: Launches the basecalling procedure on the cluster.
- Actions:
- Execute the bash script with the configured parameters.
- Capture the job ID for monitoring.
5. Wait for Basecalling to end
- Purpose: Pauses the pipeline until the basecalling job is completed.
- Actions:
- Use a waiting script to monitor the completion of the basecalling job.
6. Create Final file
- Purpose: Placeholder stage for potential step to recap data about the run.
- Actions:
- Display a message indicating the creation of the final file.
7. Send Report to User
- Purpose: Sends a report to the user, possibly through a messaging service.
- Actions:
- Display a message indicating the report is being sent.
Pipeline Parameters
- configFilePath: Path to the configuration JSON file. Default value is set to ‘/u/area/jenkins_onpexp/BC-pipelines/configurations/config_1_dgx.json’
Triggers
The pipeline is triggered periodically using the ‘pollSCM’ trigger every hour.
In the future, a trigger based on the live reading of a directory will be included.