Configuration#

Most of the choices that HyperShell makes about timing, task bundling, coordination, logging, and such are configurable by the user. This configuration is loaded when HyperShell starts and is constructed from several sources including an ordered merger of files, environment variables, and command-line options.

In order of precedence (lowest to highest), three files are loaded:


| Level  | BSD/Linux                 | Windows                              |
|--------|---------------------------|--------------------------------------|
| System | /etc/hypershell.toml      | %ProgramData%\HyperShell\Config.toml |
| User   | ~/.hypershell/config.toml | %AppData%\HyperShell\Config.toml     |
| Local  | ./.hypershell/config.toml | .\.hypershell\Config.toml            |


The TOML format is modern and minimal.

Every configurable option can be set in one of these files. Further, every option can also be set by an environment variable, whose name is the HYPERSHELL_ prefix followed by the dotted path to that option, uppercased, with dots replaced by underscores.

For example, set the logging level at the user level with a command:

Set user-level configuration option

hyper-shell config set logging.level info --user

The file should now look something like this:

~/.hypershell/config.toml

# File automatically created on 2022-07-02 11:57:29.332993
# Settings here are merged automatically with defaults and environment variables

[logging]
level = "info"

Alternatively, you can set an environment variable and the runtime configuration would be equivalent:

Define environment variable

export HYPERSHELL_LOGGING_LEVEL=INFO

Finally, any option defined within a configuration file whose name ends with _env or _eval is automatically expanded from the named environment variable or shell expression, respectively. This is useful both as a dynamic feature and as a means to obfuscate sensitive information, such as database connection details.

~/.hypershell/config.toml

# File automatically created on 2022-07-02 11:57:29.332993
# Settings here are merged automatically with defaults and environment variables

[logging]
level = "info"

[database]
provider = "postgres"
database = "hypershell"
host = "my.instance.university.edu"
user = "me"
password_eval = "pass hypershell/database/password"  # Decrypt using GNU Pass
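The _env variant works the same way, but reads the value from a named environment variable rather than evaluating a shell expression. A minimal sketch (the variable name here is illustrative):

```toml
[database]
password_env = "MY_DB_PASSWORD"  # value taken from this environment variable
```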


Parameter Reference#

[logging]

Logging configuration. See also logging section.

.level

One of DEVEL, TRACE, DEBUG, INFO, WARNING, ERROR, or CRITICAL (default: WARNING)

INFO level messages are reserved for clients when tasks begin running. There are numerous WARNING events (e.g., non-zero exit status of a task). DEBUG level messages signal component thread start/stop and individual task-level behavior. TRACE contains detailed information on all other behavior, particularly iterative messages emitted while components are waiting for something.

ERROR messages track when things fail but the application can continue; e.g., when command template expansion fails on an individual task.

CRITICAL messages are emitted when the application will halt or crash. Some of these are expected (such as incorrect command-line arguments) but in the event of an uncaught exception within the application a full traceback is written to a file and logged.

DEVEL messages are meant for development purposes and track every single state-transition in all component threads.

.datefmt

Date/time format; standard strftime codes apply (default: '%Y-%m-%d %H:%M:%S').

.format

Log message format.

Default set by the “default” logging.style. See the available attributes defined by the underlying Python logging interface.

Additional attributes provided beyond the standard include app_id, hostname, hostname_short, relative_name, time formats in elapsed, elapsed_ms, elapsed_delta, and elapsed_hms, as well as all ANSI colors and formats as ansi_x where x is one of reset, bold, faint, italic, underline, black, red, green, yellow, blue, magenta, cyan, white, and ansi_level contains the standard color for the current message severity level.
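For instance, a custom format using a few of the extra attributes listed above might look like the following (a sketch only; this assumes the standard %-style attribute syntax of the underlying Python logging interface):

```toml
[logging]
format = "%(ansi_level)s%(elapsed_hms)s [%(hostname_short)s] %(levelname)-8s%(ansi_reset)s %(message)s"
```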

.style

Presets for logging.format which can be difficult to define correctly. Options are default, detailed, detailed-compact, and system.

[database]

Database configuration and connection details. See also database section.

.provider

Database provider (default: ‘sqlite’). Supported alternatives include ‘postgres’ (or compatible). Support for other providers may be considered in the future.

.file

Path to the database file. Only applicable to the SQLite provider; SQLite does not understand any other connection detail.

.database

Name for database. Not applicable for SQLite.

.schema

Not applicable to all RDBMS providers. For PostgreSQL the default schema is public. Specifying the schema may be useful for running multiple instances within the same database.

.host

Hostname or address of database server (default: localhost).

.port

Port number to connect with database server. The default value depends on the provider, e.g., 5432 for PostgreSQL.

.user

Username for database server account. If provided, a password must also be provided. Defaults to the local account.

.password

Password for database server account. If provided, a user must also be provided. Defaults to the local account.

See also note on _env and _eval.

.echo

Special parameter that enables verbose logging of all database transactions.

[connection_args]

Specify additional connection details for the underlying SQL dialect provider, e.g., sqlite3 or psycopg2.

*

Any additional arguments are forwarded to the provider, e.g., encoding = 'utf-8'.
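For example, extending the database example from earlier (a sketch, assuming this section nests under [database], consistent with the surrounding reference):

```toml
[database.connection_args]
encoding = "utf-8"  # forwarded verbatim to the underlying provider
```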

[server]

Section for server workflow parameters.

.bind

Bind address (default: localhost).

When running locally, the default is recommended. To allow remote clients to connect over the network, bind the server to 0.0.0.0.

.port

Port number (default: 50001).

This is an arbitrary choice and simply must be an available port. The default chosen here is typically available on most platforms and is not known to be claimed by any major software.

.auth

Cryptographic authorization key to connect with server (default: <not secure>).

The default KEY used by the server and client is not secure and only a placeholder. Users are expected to choose a secure KEY. The cluster automatically generates a secure one-time KEY.
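One way to provide a secure key is through the corresponding environment variable, following the naming rule described earlier; using openssl here is just one option among many:

```shell
# Generate a 16-byte random key, hex-encoded (32 characters).
export HYPERSHELL_SERVER_AUTH="$(openssl rand -hex 16)"
echo "${#HYPERSHELL_SERVER_AUTH}"  # 32
```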

.queuesize

Maximum number of task bundles on the shared queue (default: 1).

This blocks the next bundle from being published by the scheduler until a client has taken the currently prepared bundle. On smaller scales this is usually best and has only a modest performance impact, preventing the scheduler from getting too far ahead of the currently running tasks.

On large scale workflows with many clients (e.g., 100) it may be advantageous to allow the scheduler to work ahead in selecting new tasks.

.bundlesize

Size of task bundle (default: 1).

The default value allows for greater concurrency and responsiveness on small scales. This is used by the submit thread to accumulate bundles for either database commits and/or publishing to the queue. If a database is in use, the scheduler thread selects tasks from the database in batches of this size.

Using larger bundles is a good idea for large distributed workflows; specifically, it is best to coordinate bundle size with the number of executors in use by each client.

See also -b/--bundlesize command-line option.

.attempts

Attempts for auto-retry on failed tasks (default: 1).

If a database is in use, there is an opportunity to automatically retry failed tasks. A task is considered to have failed if it has a non-zero exit status. The original task record is not overwritten; a new task is submitted and later scheduled.

Counterpart to the -r/--max-retries command-line option. Setting --max-retries 1 is equivalent to setting .attempts to 2.

See also .eager.

.eager

Schedule failed tasks before new tasks (default: false).

If .attempts is greater than one, this option controls the eagerness for re-scheduling failed tasks. By default, failed tasks are only scheduled once no novel tasks remain.
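Putting .attempts and .eager together, a retry policy permitting one automatic retry might look like this in a configuration file:

```toml
[server]
attempts = 2   # equivalent to --max-retries 1 on the command line
eager = false  # schedule retries only after novel tasks are exhausted
```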

.wait

Polling interval in seconds for database queries during scheduling (default: 5). This waiting only occurs when no tasks are returned by the query.

.evict

Eviction period in seconds for clients (default: 600).

If a client fails to register a heartbeat within this period of time, it is considered defunct and is evicted. When there are no more tasks to schedule, the server sends a disconnect request to all registered clients and waits until a confirmation is returned for each; if a defunct client were never evicted, this shutdown process would hang.

[client]

Section for client workflow parameters.

.bundlesize

Size of task bundle (default: 1).

The default value allows for greater concurrency and responsiveness on small scales.

Using larger bundles is a good idea for larger distributed workflows; specifically, it is best to coordinate bundle size with the number of executors in use by each client. It is also a good idea to coordinate bundle size between the client and server so that the client returns the same sized bundles that it receives.

See also -b/--bundlesize command-line option.

.bundlewait

Seconds to wait before flushing task bundle (default: 5).

If this period of time expires since the previous bundle was returned to the server, the current group of finished tasks will be pushed regardless of bundlesize.

For larger distributed workflows it is a good idea to make this waiting period sufficiently long so that most bundles are returned whole.

See also -w/--bundlewait command-line option.
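Following the advice above, a larger distributed workflow might coordinate bundle parameters between server and client; for example (values are illustrative):

```toml
[server]
bundlesize = 8

[client]
bundlesize = 8   # match the server so bundles are returned whole
bundlewait = 30  # seconds; long enough for most bundles to fill before flushing
```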

.heartrate

Interval in seconds between heartbeats sent to the server (default: 10).

Even on the largest scales the default interval should be fine.

.timeout

Timeout in seconds for client. Automatically shutdown if no tasks received (default: never).

This feature allows for gracefully scaling down a cluster when task throughput subsides.

[submit]

Section for submit workflow parameters.

.bundlesize

Size of task bundle (default: 1).

The default value allows for greater concurrency and responsiveness on small scales. Using larger bundles is a good idea for large distributed workflows; specifically, it is best to coordinate bundle size with the number of executors in use by each client.

See also -b/--bundlesize command-line option.

.bundlewait

Seconds to wait before flushing tasks (default: 5).

If this period of time expires since the previous bundle was pushed to the database, the current bundle will be pushed regardless of how many tasks have been accumulated.

See also -w/--bundlewait command-line option.

[task]

Section for task runtime settings.

.cwd

Explicitly set the working directory for all tasks.

.timeout

Task-level walltime limit (default: none).

Executors will send a progression of SIGINT, SIGTERM, and SIGKILL. If the process still persists, the executor itself will shut down.

[ssh]

SSH configuration section.

.args

SSH connection arguments; e.g., -i ~/.ssh/some.key. It is preferable, however, to configure SSH directly in ~/.ssh/config.

[nodelist]

This can be a single list of hostnames, or a section containing multiple named lists. Reference a named group from the command line with --ssh-group.

Such as,

.mycluster = ['mycluster-01', 'mycluster-02', 'mycluster-03']
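Written out in a configuration file, the example above would look like:

```toml
[nodelist]
mycluster = ["mycluster-01", "mycluster-02", "mycluster-03"]
```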

[autoscale]

Define an autoscaling policy and parameters.

.policy

Either fixed or dynamic.

A fixed policy will seek to maintain a definite size and allows for recovery in the event that clients halt for some reason (e.g., due to expected faults or timeouts).

A dynamic policy maintains a minimum size and grows up to some maximum size depending on the observed task pressure given the specified scaling factor.

See also .factor, .period, .size.init, .size.min, and .size.max.

.factor

Scaling factor (default: 1).

A dimensionless quantity used by the dynamic policy. This value expresses some multiple of the average task duration in seconds.

The autoscaler periodically checks the ratio toc / (factor x avg_duration), where toc is the estimated time of completion for all remaining tasks given the current throughput of active clients. This ratio is referred to as the task pressure; if it exceeds 1, the pressure is considered high and another client is added, unless the cluster is already at its maximum size.

For example, if the average task length is 30 minutes, and we set factor = 2, then if the estimated time of completion of remaining tasks given currently connected executors exceeds 1 hour, we will scale up by one unit.

See also .period.
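The pressure check described above amounts to a simple ratio; here is a sketch of the arithmetic for the 30-minute example (names and values are illustrative, not the actual implementation):

```shell
toc=5400     # estimated seconds to completion of remaining tasks (90 minutes)
factor=2     # configured scaling factor
avg=1800     # average task duration in seconds (30 minutes)

# pressure = toc / (factor * avg); above 1 means scale up by one client
awk -v t="$toc" -v f="$factor" -v a="$avg" 'BEGIN { print t / (f * a) }'  # 1.5
```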

.period

Scaling period in seconds (default: 60).

The autoscaler waits for this period of time between checks and scaling events. A shorter period makes the scaling behavior more responsive but can affect database performance if checks happen too rapidly.

[size]

.init

Initial size of cluster (default: 1).

When the cluster starts, this number of clients is launched. For a fixed policy cluster, this should be given along with a .min size, likely with the same value.

.min

Minimum size of cluster (default: 0).

Regardless of autoscaling policy, if the number of launched clients drops below this value we will scale up by one. Allowing min = 0 is an important feature for efficient use of computing resources in the absence of tasks.

.max

Maximum size of cluster (default: 2).

For a dynamic autoscaling policy, this sets an upper limit on the number of launched clients. When this number is reached, scaling stops regardless of task pressure.
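A complete dynamic policy combining these parameters might look like the following (a sketch, assuming the size table nests under [autoscale] as the dotted names .size.init, .size.min, and .size.max suggest; values are illustrative):

```toml
[autoscale]
policy = "dynamic"
factor = 2
period = 60

[autoscale.size]
init = 1
min = 0
max = 8
```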

[console]

Rich text display and output parameters.

.theme

Color scheme to use by default in output (such as with task info and task search).

This option is passed to the rich library.

[export]

Any variable defined here is injected as an environment variable for tasks.

Example,

foo = 1

The environment variable FOO=1 would be defined for all tasks.



Task Environment#

A few common environment variables are defined for every task.


TASK_ID

Universal identifier (UUID) for the current task.

TASK_ARGS

Original input command-line argument line. Equivalent to {}, see templates section.

TASK_SUBMIT_ID

Universal identifier (UUID) for submitting application instance.

TASK_SUBMIT_HOST

Hostname of submitting application instance.

TASK_SUBMIT_TIME

Timestamp task was submitted.

TASK_SERVER_ID

Universal identifier (UUID) for server application instance.

TASK_SERVER_HOST

Hostname of server application instance.

TASK_SCHEDULE_TIME

Timestamp task was scheduled by server.

TASK_CLIENT_ID

Universal identifier (UUID) for client application instance.

TASK_CLIENT_HOST

Hostname of client application instance.

TASK_COMMAND

Final command line for task.

TASK_ATTEMPT

Integer number of attempts for current task (starts at 1).

TASK_PREVIOUS_ID

Universal identifier (UUID) for previous attempt (if any).

TASK_CWD

Current working directory for the current task.

TASK_OUTPATH

Absolute file path where standard output is directed (if defined).

TASK_ERRPATH

Absolute file path where standard error is directed (if defined).


Further, any environment variable starting with HYPERSHELL_EXPORT_ will be injected into the task environment sans prefix; e.g., HYPERSHELL_EXPORT_FOO would define FOO in the task environment. You can also define such variables in the export section of your configuration file(s); e.g.,

~/.hypershell/config.toml

# File automatically created on 2022-07-02 11:57:29.332993
# Settings here are merged automatically with defaults and environment variables

[logging]
level = "info"

# Options defined as a list will be joined with a ":" on BSD/Linux or ";" on Windows
# Environment variables will be in all-caps (e.g., FOO and PATH).
[export]
foo = "value"
path = ["/some/bin", "/some/other/bin"]
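The HYPERSHELL_EXPORT_ prefix convention described above can be sketched in the shell; the variable name here is illustrative:

```shell
# Variables with this prefix are injected into the task environment without it:
export HYPERSHELL_EXPORT_DATA_DIR="/tmp/data"
# A task would then see DATA_DIR=/tmp/data in its environment.

# The prefix removal itself is plain string manipulation:
name="HYPERSHELL_EXPORT_DATA_DIR"
echo "${name#HYPERSHELL_EXPORT_}"  # DATA_DIR
```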