API Reference

The recommended way to use the API is to start by looking at the final function distribute_compute_config.write_config_to_file() and work backwards to all the constituent functions.

distribute_compute_config.apptainer_config(meta: Meta, description: Description, slurm: Slurm | None = None) → ApptainerConfig

assemble the outputs of metadata() and description() into a configuration object that can be written to disk

Parameters:

meta – output of metadata() function
description – output of description() function
slurm – global-level attributes to pass to Slurm, created with slurm(). These values will be overridden by job-level Slurm attributes specified in job(). If you do not intend to run this job set on a Slurm cluster, you may ignore this value.

You can write the resulting ApptainerConfig object to disk with write_config_to_file().
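As a rough end-to-end sketch, the functions documented on this page can be chained as shown below. It uses only the calls and signatures listed here; the function name build_and_write_config, the paths, and the batch names are placeholders chosen for illustration, and whether the referenced files must exist when file() is called is not confirmed by this page.

```python
def build_and_write_config(output_path: str = "distribute-jobs.yaml") -> None:
    """Hypothetical helper: assemble a one-job batch and write it to disk."""
    # Imported inside the function so this sketch can be loaded where the
    # package is not installed; a real script would import at the top.
    import distribute_compute_config as distribute

    # Batch-level metadata (placeholder names).
    meta = distribute.metadata(
        "test_namespace", "test_batch", ["apptainer"], "@matrx_id:matrix.org"
    )

    # Container image and files shared by every job (placeholder paths).
    initialize = distribute.initialize(
        sif_path="./some/path/to/file.sif",
        required_files=[],
        required_mounts=[],
    )

    # One job with its own input file.
    config_file = distribute.file(
        "./path/to/config1.json", alias="config.json", relative=True
    )
    job_1 = distribute.job("job_1", [config_file])

    description = distribute.description(initialize, jobs=[job_1])

    # The slurm argument is optional and is omitted here.
    config = distribute.apptainer_config(meta, description)
    distribute.write_config_to_file(config, output_path)
```

Calling build_and_write_config() would then be expected to produce distribute-jobs.yaml in the working directory, assuming the placeholder paths are replaced with real ones.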

distribute_compute_config.metadata(namespace: str, batch_name: str, capabilities: List[str], matrix_username: str | None = None) → Meta

construct the metadata Meta object for information about what this job batch will run

Parameters:

namespace (str) – namespace where several batch_name runs may live
batch_name (str) – the name of this batch of jobs
capabilities (List[str]) – required capabilities for your job
matrix_username (Optional[str]) – a matrix username in the format @your_username:homeserver_url

For an apptainer job, the capabilities are simply ["apptainer"]. If you need GPU capabilities, use ["apptainer", "gpu"].
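The two rules above (which capabilities to request, and the shape of a matrix username) can be sketched as small helpers. Both function names and the regular expression are illustrative assumptions; this is not validation performed by distribute itself.

```python
import re
from typing import List

# Shape described above: "@your_username:homeserver_url".
# The pattern is an assumption for illustration, not the library's own check.
MATRIX_ID = re.compile(r"^@[^:\s]+:\S+$")


def capabilities_for(needs_gpu: bool) -> List[str]:
    """Return the capability list for an apptainer job, with or without a GPU."""
    return ["apptainer", "gpu"] if needs_gpu else ["apptainer"]


def looks_like_matrix_id(username: str) -> bool:
    """Check that a username has the '@name:homeserver' shape."""
    return MATRIX_ID.match(username) is not None
```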

Example:

import distribute_compute_config as distribute
matrix_user = "@matrx_id:matrix.org"
batch_name = "test_batch"
namespace = "test_namespace"
capabilities = ["apptainer"]

meta = distribute.metadata(namespace, batch_name, capabilities, matrix_user)

distribute_compute_config.description(initialize: Initialize, jobs: List[Job]) → Description

combines initialization information with a list of jobs to describe the solver files, how they should be started, and what they will run

Parameters:

initialize – You can create an Initialize struct with initialize()
jobs – information for each job can be created from the job() function.

Example:

import distribute_compute_config as distribute

initialize = distribute.initialize(
    sif_path="./some/path/to/file.sif",
    required_files=[],
    required_mounts=[]
)

job_1_config_file = distribute.file(
    "./path/to/config1.json",
    alias="config.json",
    relative=True
)

job_1 = distribute.job("job_1", [job_1_config_file])

description = distribute.description(initialize, jobs=[job_1])

distribute_compute_config.file(path: str, relative=False, alias=None) → File

create a file to appear in the /input directory of the solver

Parameters:

path – a path to the file on disk. path should be an absolute path, unless relative=True is also specified. If relative=True is not specified, the full directory structure to the file must already exist.
relative – a flag for if the path specified is a relative path or not. Defaults to False
alias – how the file should be renamed when it appears in the /input directory of your solver. If no alias is specified, the current name of the file will be used.

For example, a file with the path ./path/to/config_1.json will appear as /input/config_1.json. With alias="config.json", this file will appear as /input/config.json.
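The naming rule above can be modelled in a few lines of plain Python. The helper input_name is hypothetical and only mirrors the rule as described here; it does not call distribute.

```python
import posixpath
from typing import Optional


def input_name(path: str, alias: Optional[str] = None) -> str:
    """Model where a file lands in /input: the alias if given, else its own name."""
    name = alias if alias is not None else posixpath.basename(path)
    return posixpath.join("/input", name)
```

With this model, input_name("./path/to/config_1.json") gives "/input/config_1.json", and passing alias="config.json" gives "/input/config.json".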

Example:

import distribute_compute_config as distribute

# config1.json appears as `/input/config.json`
config_file_1 = distribute.file(
    "./path/to/config1.json",
    alias="config.json",
    relative=True
)

# config2.json appears as `/input/config2.json`
config_file_2 = distribute.file(
    "./path/to/config2.json",
    relative=True
)

# config3.json appears as `/input/config3.json`, folder structure
# `/root/path/to/` must already exist
config_file_3 = distribute.file("/root/path/to/config3.json")

distribute_compute_config.initialize(sif_path: str, required_files: List[File], required_mounts: List[str]) → Initialize

create the initialize section for loading apptainer .sif files, required container mounts, and input files present in all runs.

Also see the documentation for creating a File with the file() function

Parameters:

sif_path – The path to the .sif file produced by apptainer build
required_files – These files appear in the /input directory of every job, in addition to the File specified in each Job
required_mounts – a list of paths (as strings) inside the container that should be mutable.
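Since every job sees the files from initialize() in addition to its own, it can help to preview the combined /input listing. The sketch below works on already-resolved file names rather than File objects, and the helper job_input_listing is hypothetical. This page does not say what happens when a shared file and a job file resolve to the same name, so the sketch only reports such collisions.

```python
from typing import List, Tuple


def job_input_listing(
    shared_names: List[str], job_names: List[str]
) -> Tuple[List[str], List[str]]:
    """Return (paths expected in /input, names present in both lists)."""
    collisions = sorted(set(shared_names) & set(job_names))
    combined = sorted(set(shared_names) | set(job_names))
    return ["/input/" + name for name in combined], collisions
```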

Example:

import distribute_compute_config as distribute

sif_path = "./path/to/some/container.sif"

initial_condition = distribute.file(
    "./path/to/some/file.h5",
    relative=True,
    alias="initial_condition.h5"
)
required_files = [initial_condition]

required_mounts = ["/solver/extra_mount"]

initialize = distribute.initialize(sif_path, required_files, required_mounts)

distribute_compute_config.job(name: str, required_files: List[File], slurm: Slurm | None = None) → Job

creates a job from its name and the required input files

once you have a list of jobs that should be run in the batch, move on to creating a description()

Also see the documentation for creating a File with the file() function

Parameters:

name – the name of the job. It should be unique in combination with the batch_name of this job.
required_files – a python list of files that should appear in the /input directory when this job is run, along with the required_files specified in initialize().
slurm – job-level attributes to pass to Slurm, created with slurm(). These values will override global Slurm attributes specified in apptainer_config(). If you do not intend to run this job set on a Slurm cluster, you may ignore this value.

name should therefore be unique within this batch, since batch_name remains constant. The name should also briefly describe what the job will run. This makes it easier to use distribute pull to download the files later.
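A quick check for repeated job names before building the description might look like the following. The helper duplicate_names is hypothetical and operates on plain strings; how distribute itself reacts to duplicate names is not stated on this page.

```python
from collections import Counter
from typing import List


def duplicate_names(job_names: List[str]) -> List[str]:
    """Return the job names that occur more than once, sorted alphabetically."""
    counts = Counter(job_names)
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty result means every name in the batch is unique.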

File types can be constructed with the file() function

Example:

import distribute_compute_config as distribute

job_1_config_file = distribute.file(
    "./path/to/config1.json",
    alias="config.json",
    relative=True
)
job_1_required_files = [job_1_config_file]

job_1 = distribute.job("job_1", job_1_required_files)

distribute_compute_config.write_config_to_file(config: ApptainerConfig, path: str)

write the output of apptainer_config() to a path

Parameters:

config – output of apptainer_config() function
path – the path that the config file should be written to, usually with the name distribute-jobs.yaml

Example:

See the User Documentation page on the Python API for a worked example.

The slurm() function configures the attributes that will be passed to Slurm if you intend to run the job set on a cluster. It returns the Slurm object accepted by apptainer_config() and job().

Parameters:

job_name – the name of the job as it will appear in Slurm. Defaults to the job name chosen in the distribute configuration file.
output – file name where the stdout of the process will be dumped.
nodes – the number of Slurm nodes to use. This essentially corresponds to the number of physical CPU units you would like your job to run across. On pronghorn, with no multithreading, more than 32 ntasks will require more than 1 node.
ntasks – the number of tasks to use in Slurm. This should correspond to the number of MPI processes you would like to use. If you would execute your job with mpirun -np 4, then this value would be 4.
cpus_per_task – The number of CPUs that each task will use. Most likely, this parameter should be set to 1. If you have 16 physical cores on a CPU, and ntasks=4, and you want to fully utilize the CPU, this number would be set to 4
mem_per_cpu – The amount of memory that each cpu should be allocated. To specify an amount in gigabytes (megabytes), use G (M) as the suffix. For example, requesting 100 megabytes of memory for each cpu would be 100M
hint – Any hints you want to pass to Slurm. This can be blank, or possibly nomultithread as well as any other valid Slurm hint.
time – The amount of time your job will require, in the format of HH:MM:SS. For a 1 hour and 20 minute job, this would be 01:20:00
partition – The slurm partition you wish to run on. This is probably cpu-core-0 on pronghorn for CPU tasks.
account – The billing account attached to this job.
mail_user – An email address to send mail to after the job completes.
mail_type – Email type. Possibly ALL
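The string formats for time and mem_per_cpu, and the relationship between ntasks and nodes, can be sketched with a few helpers. All three function names are hypothetical. The default of 32 cores per node is taken from the pronghorn remark above, and the node estimate assumes no multithreading; treat it as a rough guide rather than a rule Slurm enforces.

```python
import math


def slurm_time(hours: int = 0, minutes: int = 0, seconds: int = 0) -> str:
    """Format a duration as HH:MM:SS, carrying overflow (e.g. 90 minutes)."""
    total = hours * 3600 + minutes * 60 + seconds
    h, remainder = divmod(total, 3600)
    m, s = divmod(remainder, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def mem_per_cpu(amount: int, unit: str) -> str:
    """Format a memory request with the M (megabytes) or G (gigabytes) suffix."""
    if unit not in ("M", "G"):
        raise ValueError("unit must be 'M' or 'G'")
    return f"{amount}{unit}"


def nodes_needed(ntasks: int, cpus_per_task: int = 1, cores_per_node: int = 32) -> int:
    """Estimate the node count needed to give every task its CPUs."""
    return math.ceil(ntasks * cpus_per_task / cores_per_node)
```

For the 1 hour and 20 minute job mentioned above, slurm_time(1, 20) gives "01:20:00", and mem_per_cpu(100, "M") gives "100M".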

When generating Slurm routines, distribute will default all values to the root-level slurm value specified in apptainer_config(), and then override these values with the values from job(). With this, you may set global attributes (such as the number of tasks to use, the number of nodes to request, your email, etc.), and then override them at the job level with the specifics that each job requires.

If all jobs that you are submitting in this batch are homogeneous (for example, same grid size, time step, etc.), then there is little need to specify job-level Slurm attributes.
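The override behaviour described above can be modelled with plain dictionaries. This is only a sketch of the described semantics, not distribute's implementation: the helper effective_slurm is hypothetical, and it assumes that a job-level attribute left unset (None) falls back to the global value.

```python
from typing import Any, Dict, Optional


def effective_slurm(
    global_attrs: Dict[str, Any], job_attrs: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Start from the global attributes, then apply job-level values that are set."""
    merged = dict(global_attrs)  # copy so the global settings are not mutated
    if job_attrs:
        merged.update(
            {key: value for key, value in job_attrs.items() if value is not None}
        )
    return merged
```

For example, a global setting of ntasks=4 combined with a job-level ntasks=8 yields 8 for that job, while every other global attribute is kept.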

Example:

import distribute_compute_config as distribute

slurm = distribute.slurm(
    output="output.txt",
    nodes=1,
    ntasks=4,
    cpus_per_task=1,
    # 10 megabytes of memory allocated per CPU
    mem_per_cpu="10M",
    hint="nomultithread",
    # 30 minutes of runtime
    time="00:30:00",
    partition="cpu-core-0",
    account="my_account",
)