hyper-shell
Release v2.3.0 (Getting Started)
HyperShell is an elegant, cross-platform, high-performance computing utility for processing shell commands over a distributed, asynchronous queue. It is a highly scalable workflow automation tool for many-task scenarios.
Several tools offer similar functionality, but none brings it all together in a single tool with the user ergonomics we provide. Novel design elements include, but are not limited to, (1) cross-platform support, (2) a client-server design, (3) staggered launch at large scales, (4) persistent hosting of the server, and, optionally, (5) a database in the loop for persisting task metadata and automated retries.
HyperShell is pure Python and is tested on Linux, macOS, and Windows 10 in Python 3.9+ environments. The server and client do not even need to run on the same platform.
Features
Simple, Scalable
Take a listing of shell commands and process them in parallel.
In this example, we use the -t option to specify a template for input arguments that are not fully formed shell commands. Larger workloads will want to use a database for managing tasks and scheduling. This small example, however, can run with --no-db to disable the database and submit tasks directly to the shared queue.
Hello World
seq 4 | hyper-shell cluster -N2 -t 'echo {}' --no-db
Output
1
2
3
4
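The Hello World example boils down to expanding a template once per input line and running the resulting commands on a small pool of workers. The Python sketch below illustrates that idea with the standard library only. It is not HyperShell's implementation: the helper names (`expand_simple`, `run_task`, `run_all`) are hypothetical, `workers` merely plays the role of -N, and a POSIX shell is assumed for `echo`.

```python
import subprocess
from collections.abc import Iterable
from concurrent.futures import ThreadPoolExecutor


def expand_simple(template: str, args: str) -> str:
    """Insert the task arguments wherever the template contains {} (hypothetical helper)."""
    return template.replace("{}", args)


def run_task(command: str) -> str:
    """Run one shell command and return its stdout without the trailing newline."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.rstrip("\n")


def run_all(lines: Iterable[str], template: str = "echo {}", workers: int = 2) -> list[str]:
    """Expand each non-empty input line into a command and run the commands concurrently.

    Executor.map returns results in input order, even though tasks run in parallel.
    """
    commands = [expand_simple(template, line.strip()) for line in lines if line.strip()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, commands))


if __name__ == "__main__":
    # Same input as `seq 4`
    for output in run_all(["1", "2", "3", "4"]):
        print(output)
```

Because `Executor.map` yields results in input order, this sketch always prints 1 through 4 in sequence. That ordering is a property of the sketch, not a statement about how HyperShell orders output from concurrent tasks.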
Scale out to remote servers with SSH, and even define groups in your configuration file.
By default, the stdout and stderr of all commands are joined and written out directly. Capture individual task stdout and stderr with --capture. Set the logging level to INFO to see each task start, or to DEBUG to see additional detail about what is running, where, and when.
Distributed Cluster over SSH
hyper-shell cluster tasks.in -N4 --ssh-group=xyz --capture
Logs
2022-03-14 12:29:19.659 a00.cluster.xyz INFO [hypershell.client] Running task (5fb74a31-fc38-4535-8b45-c19bc3dbedee)
2022-03-14 12:29:19.665 a01.cluster.xyz INFO [hypershell.client] Running task (c1d32c32-3e76-48e0-b2c3-9420ea20b41b)
2022-03-14 12:29:19.668 a02.cluster.xyz INFO [hypershell.client] Running task (4a6e19ec-d325-468f-a55b-03a797eb51d5)
2022-03-14 12:29:19.671 a03.cluster.xyz INFO [hypershell.client] Running task (09587f55-4b50-4e2b-a528-55c60667b62a)
2022-03-14 12:29:19.674 a04.cluster.xyz INFO [hypershell.client] Running task (1336f778-c9ab-4111-810e-229d572be62e)
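To make the difference between joined and captured output concrete, here is a minimal Python sketch of a single task run with capture. It assigns a UUID task ID, logs a "Running task" message at INFO (with extra detail at DEBUG), and writes stdout and stderr to separate files. The function name, the logger name, and the `<task_id>.out`/`<task_id>.err` file layout are assumptions for illustration, not HyperShell's actual behavior. A POSIX shell is assumed.

```python
import logging
import subprocess
import tempfile
import uuid
from pathlib import Path

log = logging.getLogger("sketch.client")  # hypothetical logger name


def run_with_capture(command: str, outdir: Path) -> str:
    """Run one task, capturing its stdout and stderr to separate files.

    Assumed layout: <outdir>/<task_id>.out and <outdir>/<task_id>.err.
    """
    task_id = str(uuid.uuid4())
    log.info("Running task (%s)", task_id)
    log.debug("Task %s command: %s", task_id, command)
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    (outdir / f"{task_id}.out").write_text(result.stdout)
    (outdir / f"{task_id}.err").write_text(result.stderr)
    log.debug("Task %s exited with status %d", task_id, result.returncode)
    return task_id


if __name__ == "__main__":
    # INFO shows each task start; use logging.DEBUG to also see commands and exit codes
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s [%(name)s] %(message)s")
    with tempfile.TemporaryDirectory() as tmp:
        task_id = run_with_capture("echo hello; echo oops 1>&2", Path(tmp))
        print((Path(tmp) / f"{task_id}.out").read_text(), end="")
```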
Flexible
One of several novel features of hyper-shell is the ability to stand up the server independently on one machine and then connect to it using a client from a different environment.
Start the hyper-shell server and set the bind address to 0.0.0.0 to allow remote connections.
The server schedules tasks on a distributed queue. It is recommended that you protect your instance with a private key (-k/--auth).
Server
hyper-shell server -H '0.0.0.0' -k '<AUTHKEY>' --print < tasks.in > tasks.failed
Connect to the running server from a different host (even from a different platform, e.g., Windows). You can connect with any number of clients from any number of hosts. The separate client connections will each pull tasks off the queue asynchronously, balancing the load.
Client
hyper-shell client -H '<HOSTNAME>' -k '<AUTHKEY>'
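The server/client split can be approximated with the standard library's `multiprocessing.managers.BaseManager`, which serves a shared object over TCP and rejects connections that present the wrong authentication key. The sketch below is only an analogy under that assumption. `TaskManager`, `start_server`, and `connect` are hypothetical names, not HyperShell's API, and the server runs in a background thread on 127.0.0.1 so that the example is self-contained.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

# The shared task queue lives in the server; clients only hold proxies to it.
_tasks = queue.Queue()


def _get_tasks():
    return _tasks


class TaskManager(BaseManager):
    """Hypothetical manager exposing one shared task queue as "get_tasks"."""


TaskManager.register("get_tasks", callable=_get_tasks)


def start_server(authkey: bytes, host: str = "127.0.0.1", port: int = 0):
    """Serve the queue from a daemon thread and return the bound (host, port).

    Port 0 asks the OS for a free port; binding to "0.0.0.0" would accept
    remote connections instead, much like -H '0.0.0.0' above.
    """
    server = TaskManager(address=(host, port), authkey=authkey).get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect(address, authkey: bytes):
    """Connect as one client and return a proxy to the shared queue.

    A mismatched authkey raises multiprocessing.AuthenticationError here.
    """
    client = TaskManager(address=address, authkey=authkey)
    client.connect()
    return client.get_tasks()


if __name__ == "__main__":
    address = start_server(b"change-me")
    submitter = connect(address, b"change-me")
    for n in range(1, 5):
        submitter.put(f"echo {n}")
    # Two independent clients pull from the same queue
    worker_a = connect(address, b"change-me")
    worker_b = connect(address, b"change-me")
    print(worker_a.get(), worker_b.get(), worker_a.get(), worker_b.get())
```

Each `connect` call opens its own connection. In this sketch, several clients can therefore pull from the same queue, and a client that uses a different key fails with `multiprocessing.AuthenticationError` as soon as it connects.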
Dynamic
Special variables are automatically defined for each individual task. For example, TASK_ID gives a unique UUID for each task (regardless of which client executes the task).
Further, any environment variable defined with the HYPERSHELL_EXPORT_ prefix will be injected into the environment of each task, sans prefix.
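The per-task environment described above can be sketched in Python as follows. This illustration rests on stated assumptions and is not HyperShell's code: the function name `task_environment` is hypothetical, the prefixed variable is also left in place, and TASK_ARGS is simply set to the raw argument string.

```python
import os
import subprocess
import sys
import uuid
from typing import Mapping, Optional

EXPORT_PREFIX = "HYPERSHELL_EXPORT_"


def task_environment(args: str, base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build the environment for one task (hypothetical helper, not HyperShell's code).

    Copies the base environment (os.environ by default), re-exports every
    HYPERSHELL_EXPORT_<NAME> variable as <NAME>, and adds TASK_ID and TASK_ARGS.
    """
    env = dict(os.environ if base is None else base)
    for name, value in list(env.items()):
        if name.startswith(EXPORT_PREFIX) and len(name) > len(EXPORT_PREFIX):
            env[name[len(EXPORT_PREFIX):]] = value  # same value, prefix stripped
    env["TASK_ID"] = str(uuid.uuid4())  # fresh UUID for every task
    env["TASK_ARGS"] = args
    return env


if __name__ == "__main__":
    env = task_environment("a b c", {**os.environ, "HYPERSHELL_EXPORT_FOO": "bar"})
    code = "import os; print(os.environ['FOO'], os.environ['TASK_ARGS'])"
    subprocess.run([sys.executable, "-c", code], env=env, check=True)  # prints: bar a b c
```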
Use -t (short for --template) to expand a template; {} inserts the incoming task arguments (alternatively, use TASK_ARGS). Be sure to use single quotes to delay variable expansion, so that your own shell does not expand the template before HyperShell receives it. Many meta-patterns are supported (see the full overview of templates):
- File operations (e.g., the basename '{/}')
- Slicing on whitespace (e.g., first '{[0]}', first three '{[:3]}', every other '{[::2]}')
- Sub-commands (e.g., '{% dirname @ %}')
- Lambda expressions in x (e.g., '{= x + 1 =}')
Templates
hyper-shell cluster tasks.in -N12 -t './some_program.py {} >outputs/{/-}.out'
In fact, capturing stdout and stderr is supported directly with the --capture option.
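To show how such meta-patterns can be expanded, here is a hypothetical single-pass expander in Python that covers a subset of them: `{}`, `{/}`, whitespace slicing, and lambda expressions in x. Sub-commands are omitted. Joining slices with spaces and converting x to a number when it parses as one are assumptions of this sketch, and HyperShell's own template engine may behave differently.

```python
import os
import re

# Hypothetical subset: {}, {/}, {[i]} or {[start:stop:step]}, and {= expression in x =}
_PATTERN = re.compile(
    r"\{=(?P<expr>.+?)=\}"            # lambda expression in x, e.g. {= x + 1 =}
    r"|\{\[(?P<index>[-0-9:]+)\]\}"   # whitespace slicing, e.g. {[0]}, {[:3]}, {[::2]}
    r"|\{(?P<basename>/)?\}"          # {} (all arguments) or {/} (basename)
)


def _as_number(text: str):
    """Convert x to int or float when possible, else keep the string (an assumption)."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def _slice(args: str, spec: str) -> str:
    """Index or slice the whitespace-separated arguments; slices are re-joined with spaces."""
    parts = args.split()
    if ":" not in spec:
        return parts[int(spec)]
    bounds = [int(s) if s else None for s in spec.split(":")]
    return " ".join(parts[slice(*bounds)])


def expand(template: str, args: str) -> str:
    """Expand every supported pattern in one pass over the template."""
    def replace(match: re.Match) -> str:
        if match.group("expr") is not None:
            scope = {"x": _as_number(args)}
            return str(eval(match.group("expr").strip(), {"__builtins__": {}}, scope))
        if match.group("index") is not None:
            return _slice(args, match.group("index"))
        if match.group("basename"):
            return os.path.basename(args)
        return args

    return _PATTERN.sub(replace, template)


if __name__ == "__main__":
    print(expand("./some_program.py {} >outputs/{/}.out", "data/run_01.csv"))
```

Substituting in a single `re.sub` pass ensures that text inserted from the arguments is never rescanned for further patterns. The lambda form uses `eval` with no builtins, which is convenient for a sketch but does not provide a sandbox.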
See the full documentation for environment variables under configuration.