Python

Python configuration file templates can be generated as follows:

distribute template python

At the time of writing, it outputs something like this:

---
meta:
  batch_name: your_jobset_name
  namespace: example_namespace
  matrix: ~
  capabilities:
    - gfortran
    - python3
    - apptainer
python:
  initialize:
    build_file:
      path: /path/to/build.py
    required_files:
      - path: /file/always/present/1.txt
        alias: optional_alias.txt
      - path: /another/file/2.json
        alias: ~
      - path: /maybe/python/utils_file.py
        alias: ~
  jobs:
    - name: job_1
      file:
        path: execute_job.py
      required_files:
        - path: job_configuration_file.json
          alias: ~
        - path: job_configuration_file_with_alias.json
          alias: input.json

meta section

The meta section is identical across the configuration file formats; it works the same way here as it does elsewhere.

The namespace field is a string that groups one or more batch_names under a common namespace.

The batch_name is a simple string identifying the jobs that are in this batch.

If your distribute cluster has been configured by your administrator for matrix support, you can set the matrix field to your matrix username (in the form @your-user:matrix.org) and you will receive a message when all of the jobs in this batch finish.

The capabilities section is a list of dependencies that must be present on the compute node. Since you are responsible for bundling all of your dependencies in your apptainer image, the only capability required here is apptainer. If you require a GPU on the compute node, you can also specify the gpu capability.

On the server, each batch_name must be unique within its namespace: the result of submitting two batches named name_xyz under the same namespace is undefined. This is because, on the server, each batch name is a subdirectory of its namespace. For example, if we run two job batches job_batch_01 and job_batch_02 under the namespace example_namespace, the directory structure might look like

example_namespace/
├── job_batch_01
│   ├── job_1
│   │   └── stdout.txt
│   ├── job_2
│   │   └── stdout.txt
│   └── job_3
│       └── stdout.txt
└── job_batch_02
    ├── job_1
    │   └── stdout.txt
    ├── job_2
    │   └── stdout.txt
    └── job_3
        └── stdout.txt

For example, if I am running many aeroacoustic studies with OpenFOAM, a namespace might be austin_openfoam_aeroacoustic, and I would then run batches with incremented names: supersonic_aeroacoustic_00X, subsonic_aeroacoustic_00X, transonic_aeroacoustic_00X. My meta section for one of these job batches would look like:

meta:
  batch_name: subsonic_aeroacoustic_004
  namespace: austin_openfoam_aeroacoustic
  matrix: "@my-username:matrix.org"
  capabilities:
    - apptainer
    - gpu

python section

The python section has two subsections: initialize, which handles compiling and setting up all code for execution, and jobs, which specifies the list of jobs that are part of this job batch.

initialize section

The build_file key contains a path to a python file. This file is responsible for compiling code and setting up the environment on the system so that every job in the batch can execute.
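A minimal build file might look like the sketch below. The file name solver.f90 and the compiler invocation are hypothetical; your build file can run whatever setup your jobs need, and files listed under the initialize required_files appear in the ./input directory.

```python
# build.py - hypothetical setup script, run once before any job executes.
import subprocess
import sys
from pathlib import Path


def build_command(source: Path) -> list[str]:
    """Compiler invocation for a (hypothetical) Fortran solver input file."""
    return ["gfortran", "-O2", str(source), "-o", "solver"]


def main() -> None:
    # solver.f90 is a hypothetical file shipped via required_files.
    source = Path("input") / "solver.f90"
    if source.exists():
        subprocess.run(build_command(source), check=True)
    else:
        print(f"nothing to compile: {source} not present", file=sys.stderr)


if __name__ == "__main__":
    main()
```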

The required_files key contains a list of files, each with a path and an optional alias. The path points to a file on disk, and the alias is the filename under which that file will appear in the /input directory when compiling all related tasks. An in-depth explanation of input files is given in the input files section.

jobs

This field contains the list of jobs, the file used to execute the job, and the location to other input files on disk. For each job, there are three configuration keys:

name: the name of the job within this batch. Job names must be unique within the batch_name from the meta section: there cannot be two jobs with the same name in the same batch.

file: contains a path key to a python file on disk that will be used to operate on any input files and execute any required code for this job.

required_files: a list of files, each containing a path key and an optional alias key. These files are configured identically to the required_files in the initialize section above: path is a location on disk, and alias is the optional name under which the file will appear in the /input directory when the job is run.

For example, if the jobs section of the config file contained

  jobs:
    - name: job_1
      file:
        path: execute_job.py
      required_files:
        - path: job_configuration_file.json
          alias: ~
        - path: job_configuration_file_with_alias.json
          alias: input.json

then this batch would have one job named job_1. When this job is executed, the ./input directory of the apptainer image will contain the files job_configuration_file.json and input.json (the second file having been aliased). Additionally, any files specified in the required_files of the initialize section will also be present in the ./input directory.

At the start of execution, your execute_job.py file is called, and it operates on the files in ./input to produce results. Results are stored in the ./distribute_save directory.