pytasuku is a task execution system implemented in Python. Think of it as a tool similar to GNU Make, except that you have to implement the command line interface yourself.
The code should work with Python version 3.8 or later. The UI is implemented with tkinter, which should come automatically with your Python distribution.
You can just copy the src/pytasuku directory to your source code repository, or you can install it with one of the following tools.
pip install git+https://github.com/pkhungurn/pytasuku.git
poetry add git+https://github.com/pkhungurn/pytasuku.git
tasuku allows you to define "tasks." A task is a piece of computation that you want to run, such as compiling some code, linking some programs, creating or removing files, and so on. A task can depend on other tasks, which means that it can be executed only after all of its dependencies have been executed. In this way, the tasks form a dependency graph in which tasks are vertices and there is a directed edge from each dependency to every task that depends on it. tasuku ensures that the graph is well formed; that is, that the graph has no cycles. When you use tasuku to execute a task, it traverses the dependency graph and executes tasks in the right topological order.
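To make the idea of "the right topological order" concrete, here is a minimal sketch of a depth-first traversal that lists a task's dependencies before the task itself and rejects cycles. It is not tasuku's actual implementation; the `execution_order` function and the plain dictionary used as the graph are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def execution_order(dependencies: Dict[str, List[str]], target: str) -> List[str]:
    """Return the tasks needed for `target`, with dependencies listed first."""
    order: List[str] = []
    done = set()         # tasks already placed in `order`
    in_progress = set()  # tasks on the current depth-first path

    def visit(name: str) -> None:
        if name in done:
            return
        if name in in_progress:
            # Reaching a task that is still being visited means there is a cycle.
            raise ValueError(f"dependency cycle detected at task '{name}'")
        in_progress.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        in_progress.remove(name)
        done.add(name)
        order.append(name)

    visit(target)
    return order


# Hypothetical graph: each task name maps to the names it depends on.
graph = {
    "a.txt": [],
    "b.txt": ["a.txt"],
    "create_files": ["a.txt", "b.txt"],
}
print(execution_order(graph, "create_files"))
# prints ['a.txt', 'b.txt', 'create_files']
```

Each task appears exactly once in the result, even when several other tasks depend on it.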
Similar to Make, there are two main types of tasks.
- A command task is a task that is always executed when invoked or when one of its dependencies needs to be executed.
- A file task is a task that produces a file. It is executed if (1) the output file does not exist, (2) the output file's timestamp is older than that of one of its (transitive) file-task dependencies, or (3) one of its dependencies was executed. The idea is that a file task is executed only when its output needs to be updated.
It is not advisable to make a file task dependent on a command task because the file task will then always be executed regardless of the file's timestamp.
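The timestamp part of the file task rule, conditions (1) and (2), can be sketched as follows. This is an illustration of the Make-style check under my own assumptions, not tasuku's code; `file_task_needs_run` is a hypothetical helper, and the caller is assumed to pass the output files of all (transitive) file-task dependencies.

```python
import os
from typing import List


def file_task_needs_run(output_path: str, dependency_paths: List[str]) -> bool:
    """Return True if the output is missing or older than any dependency file."""
    # Condition (1): the output file does not exist yet.
    if not os.path.exists(output_path):
        return True
    # Condition (2): some dependency file was modified after the output.
    output_mtime = os.path.getmtime(output_path)
    return any(
        os.path.exists(dep) and os.path.getmtime(dep) > output_mtime
        for dep in dependency_paths
    )
```

Condition (3), rerunning because a dependency was just executed, depends on the state of the current run and is deliberately left out of this sketch.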
A task name is similar to a path name of files. The path is always relative to the current directory. For example, you can have
a.txt
b/c.txt
b/d/e.txt
b/create_all
b/remove_all
The first three tasks are supposed to be file tasks (which generally take the name of their output files), and the last two command tasks. We can see that tasks can form directory structures like files, and this is a nice way to organize tasks when there's a mixture of file and command tasks.
Executing a task requires you to know its name. Remembering task names when you have created several tens of them can be daunting. tasuku comes with a UI that helps you navigate the task directory structure in order to pick one task to execute.
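As a rough illustration of how path-like task names give rise to a browsable directory structure, the sketch below groups names into a nested dictionary and prints it as an indented tree. The `build_task_tree` and `render` functions are hypothetical and say nothing about how the real UI is implemented.

```python
from typing import Dict, List


def build_task_tree(task_names: List[str]) -> Dict[str, dict]:
    """Group path-like task names into a nested dictionary."""
    tree: Dict[str, dict] = {}
    for name in task_names:
        node = tree
        for part in name.split("/"):
            # Create the child node on first sight, then descend into it.
            node = node.setdefault(part, {})
    return tree


def render(tree: Dict[str, dict], indent: int = 0) -> List[str]:
    """Return one line per node, indented by depth; directories end with '/'."""
    lines: List[str] = []
    for key in sorted(tree):
        children = tree[key]
        suffix = "/" if children else ""
        lines.append("  " * indent + key + suffix)
        lines.extend(render(children, indent + 1))
    return lines


names = ["a.txt", "b/c.txt", "b/d/e.txt", "b/create_all", "b/remove_all"]
print("\n".join(render(build_task_tree(names))))
```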
I use Poetry to maintain dependencies. Follow Poetry's installation instructions to install it on your system.
Next, you need to create a Python environment with Python of at least version 3.8. For example, I used Anaconda to do the job. After installing Anaconda, I ran the following command in my shell.
conda create -n pytasuku python=3.8
Then, you can activate the environment by running the command below.
conda activate pytasuku
Next, clone this repository, and change your working directory to the repo's directory. Invoke Poetry to install the package.
poetry install
There's example code in the src/example directory. Execute
poetry run python src/example/run_ui.py
to run the task-picking UI. To execute a task directly, run a command like
poetry run python src/example/run.py <task-name>
For example, to create all the files that are part of the example, run:
poetry run python src/example/run.py data/create_all
To delete all the files to start the process over, run:
poetry run python src/example/run.py data/delete_all
A workspace is an object that keeps track of tasks and their dependencies. It allows you to execute tasks in the correct topological order. Before you can define any tasks, you need to create an instance of the Workspace class.
from pytasuku import Workspace
workspace = Workspace()
A task is just a Python function that has no arguments and returns nothing.
A command task can be defined by decorating a function with the @command_task decorator. The decorator takes three arguments, in order.
- A workspace that is going to hold the task.
- The task name.
- A list of names of the task's dependencies.
Below, we create two tasks. The second depends on the first.
from pytasuku import command_task

# task_0 has no dependencies.
@command_task(workspace, 'task_0', [])
def run_task_0():
    print("Running task_0")

# task_1 depends on task_0, so task_0 runs first.
@command_task(workspace, 'task_1', ['task_0'])
def run_task_1():
    print("Running task_1")
Similarly, a file task can be created using the @file_task decorator, which takes the same arguments as the @command_task decorator. One thing to keep in mind is that you are responsible for making sure that the function you define actually creates the file named by the task. It won't work otherwise.
from pytasuku import file_task

# Each file task must write the file that its task name refers to.
@file_task(workspace, 'a.txt', [])
def create_a_txt():
    with open('a.txt', 'wt') as fout:
        fout.write("AAA")

@file_task(workspace, 'b.txt', ['a.txt'])
def create_b_txt():
    with open('b.txt', 'wt') as fout:
        fout.write("BBB")

# This command task does nothing itself; it only pulls in its dependencies.
@command_task(workspace, 'create_files', ['a.txt', 'b.txt'])
def create_files():
    pass
In the above listing, we create two file tasks, a.txt and b.txt, where the second depends on the first. Note that the functions for these tasks actually write new files whose names are the task names. IT IS THE RESPONSIBILITY OF YOU, THE USER, TO DO THIS. Lastly, we create a command task that depends on the file tasks. The command task itself does not do anything. However, since it depends on the two file tasks, the system will check whether the files exist and are up to date every time you invoke the command task. If not, the files will be created.
To run a task, you need to start a session, which is a time period dedicated to running tasks. When you start a session, the system builds the dependency graph of the tasks, freezes it, and checks whether the graph is well formed. If it is not, the system throws an exception. Otherwise, you can run tasks by calling the run method of the Workspace class.
The Workspace class has the start_session and end_session methods that do what their names say. So, the following snippet runs the create_files task we just defined.
workspace.start_session()
workspace.run("create_files")
workspace.end_session()
However, there's also the session method, which can be used with Python's with statement. This method returns a context manager that starts a session when you enter the with block and ends it when you leave.
with workspace.session():
    workspace.run("create_files")
After a session ends, you can start creating tasks again.
If you do not want to specify a task's name programmatically, you can run a task selector UI that allows you to pick a task. The UI takes care of creating a session and running the task. Invoking the UI is very simple: after you have created all the tasks, pass the workspace to the run_task_selector_ui function.
from pytasuku.task_selector_ui import run_task_selector_ui
run_task_selector_ui(workspace)
The src/example directory contains an example of how I usually organize my project. It has three main files.
- src/example/tasks.py is responsible for defining the tasks, which is done in the define_tasks function that takes a Workspace as an argument.
- src/example/run.py is the command line interface of the system.
- src/example/run_ui.py runs the task selector UI.
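Since you have to write the command line interface yourself, here is a minimal sketch of what such a script might look like. It is not the contents of the actual src/example/run.py; the `main` function and its usage message are hypothetical, and the only workspace methods it relies on are `session()` and `run()`, both described earlier.

```python
import sys
from typing import List, Optional


def main(workspace, argv: Optional[List[str]] = None) -> int:
    """Run the task named by the single command-line argument."""
    args = sys.argv[1:] if argv is None else argv
    if len(args) != 1:
        print("usage: python run.py <task-name>", file=sys.stderr)
        return 2
    # Start a session, run the requested task, and end the session on exit.
    with workspace.session():
        workspace.run(args[0])
    return 0
```

In a real script, you would create a Workspace, pass it to your define_tasks function, and then call something like `sys.exit(main(workspace))`.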
tasuku is one of the libraries I wrote myself and rely on whenever I have to manage multi-step computations. As an example, in machine learning research, your workflow might look like the following.
- Download some raw data from the web.
- Split the data into training, validation, and test datasets.
- Train ML models on the training dataset under several hyperparameter settings.
- Evaluate the models using the validation dataset.
- Pick the best model according to some metrics.
- Evaluate the best model using the test dataset.
You can see that each step (except for the first one) depends on those that come before it. Moreover, some of the steps (like Step 3) can take a very long time to complete.
It does not make sense to implement the steps in a single program that runs them sequentially. Some of the steps can fail (e.g., because of bugs in your code, or because of a blackout while you are training your models), and you might want to retry them. A sequential program would redo everything from scratch, not just the parts that you want to retry. tasuku allows you to take advantage of "cached" results, much like Make rebuilds only the parts of a program that need to change when you modify a source file.
I created tasuku instead of using other build tools such as Make, Rake, Gradle, or Bazel because I wanted more control over the system. This gives me the freedom to define what tasks are, dictate the format of task names, and build the task-picking UI without having to study existing systems in detail.
- [2022/01/03] v0.1.0: First release.