This is a collection of tools to generate random sample data in a given directory and simulate file operations on the generated data.
The suite of tools can be deployed on a Kubernetes cluster using the provided Ansible Playbook.
file-generator.py is a Python program that generates sample data in a given directory. It accepts the following command-line arguments:
Argument | Type | Required | Description |
---|---|---|---|
--size | string | Yes | Total size of random data to create in supported units: b,Ki,Mi,Gi,Ti |
--max-files | int | Yes | Maximum number of files to create |
--min-files | int | No | Minimum number of files to create (Defaults to 1) |
--dest-dir | string | Yes | Destination directory for generated data |
--help, -h | - | No | Print help |
When --min-files and --max-files are provided, the program spreads the total size given by --size over a number of files in the range [min-files, max-files].
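The size handling described above can be sketched as follows. This is a minimal illustration, not the code in file-generator.py: the helper names `parse_size` and `split_size` are hypothetical, the unit suffixes are assumed to be binary multiples, and the even split is an assumption (the real program may distribute sizes differently).

```python
import random

# Assumed binary multipliers for the suffixes listed under --size.
UNITS = {"b": 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_size(text):
    """Convert a string such as '10Mi' into a number of bytes."""
    for suffix, multiplier in UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * multiplier
    raise ValueError(f"unsupported size: {text!r}")


def split_size(total_bytes, min_files, max_files, rng=random):
    """Pick a file count in [min_files, max_files] and split total_bytes over it."""
    count = rng.randint(min_files, max_files)
    base, remainder = divmod(total_bytes, count)
    # The first `remainder` files get one extra byte so the sizes sum exactly.
    return [base + (1 if i < remainder else 0) for i in range(count)]
```

For example, `split_size(parse_size("1Gi"), 10, 100)` returns between 10 and 100 file sizes whose sum is exactly 1 GiB.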
file-operations.py is a Python program that performs different file operations on files in the given directory. The operations are performed randomly.
The supported file operations are:
- Creating a new file
- Appending data to an existing file
- Deleting an existing file
- Removing bytes from an existing file
- Changing permissions on an existing file
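One way to picture the random selection among these operations is the sketch below. It is an assumption-laden illustration rather than the actual file-operations.py: all function names are hypothetical, "removing bytes" is modelled as truncation, and the 1 to 1024 byte payload sizes are arbitrary.

```python
import os
import random


def create_file(directory, rng):
    # Hypothetical naming scheme: a random hex suffix.
    path = os.path.join(directory, f"file-{rng.getrandbits(32):08x}.bin")
    with open(path, "wb") as handle:
        handle.write(os.urandom(rng.randint(1, 1024)))


def append_data(path, rng):
    with open(path, "ab") as handle:
        handle.write(os.urandom(rng.randint(1, 1024)))


def delete_file(path, rng):
    os.remove(path)


def remove_bytes(path, rng):
    size = os.path.getsize(path)
    # Truncate to a random shorter length; an empty file is left untouched.
    if size > 0:
        os.truncate(path, rng.randint(0, size - 1))


def change_permissions(path, rng):
    # Every mode keeps owner read/write so later operations can open the file.
    os.chmod(path, rng.choice([0o600, 0o640, 0o644]))


# Operations that act on an existing file.
OPERATIONS = {
    "append": append_data,
    "delete": delete_file,
    "truncate": remove_bytes,
    "chmod": change_permissions,
}


def run_random_operation(directory, rng=random):
    """Perform one randomly chosen operation and return its name."""
    files = [entry.path for entry in os.scandir(directory) if entry.is_file()]
    name = rng.choice(["create", *OPERATIONS]) if files else "create"
    if name == "create":
        create_file(directory, rng)
    else:
        OPERATIONS[name](rng.choice(files), rng)
    return name
```

When the directory is empty, the sketch always creates a file, since the other four operations need an existing target.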
file-operations.py accepts the following command-line arguments:
Argument | Type | Required | Description |
---|---|---|---|
--buffer | string | Yes | Extra wiggle room for file operations in supported units: b,Ki,Mi,Gi,Ti |
--dest-dir | string | Yes | Destination directory for generated data |
--help, -h | - | No | Print help |
Some file operations add data to existing files. The --buffer option sets an upper limit on the additional data created by the program.
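A cap like this could be tracked with a small budget object, as in the sketch below. The class `BufferBudget` and its methods are hypothetical, and crediting bytes back when data is removed is an assumption; the source only states that --buffer limits the additional data.

```python
class BufferBudget:
    """Track how much extra data has been written against a fixed limit."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def reserve(self, requested):
        """Return how many of the requested bytes may be written (possibly 0)."""
        granted = max(0, min(requested, self.limit - self.used))
        self.used += granted
        return granted

    def release(self, freed):
        """Credit bytes back after a delete or truncate (assumed behaviour)."""
        self.used = max(0, self.used - freed)
```

An append operation would call `reserve` before writing and write only the granted number of bytes, so the total added data never exceeds the limit.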
To pause the file operations, set the PAUSE_OPERATIONS environment variable to True. The operations resume when it is set to False.
file-operations.py runs a scanner thread in the background that periodically updates the list of files. You can set a custom interval for the scanner with the SCANNER_INTERVAL environment variable, which defaults to 120 seconds. If the destination directory contains a very large number of files, consider setting a higher value. For Kubernetes deployments, both environment variables are passed through the settings ConfigMap:
kind: ConfigMap
apiVersion: v1
metadata:
  name: settings
data:
  OPERATOR_PAUSE: "False"
  SCANNER_INTERVAL: "600"
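On the Python side, settings like these arrive as strings in the process environment. The sketch below shows one way they could be read; the function names are hypothetical, the pause variable uses the name PAUSE_OPERATIONS as given in the text, and falling back to the default on an invalid interval is an assumption.

```python
import os


def operations_paused(environ=os.environ):
    """Return True when PAUSE_OPERATIONS is set to 'True' (case-insensitive)."""
    return environ.get("PAUSE_OPERATIONS", "False").strip().lower() == "true"


def scanner_interval(environ=os.environ, default=120):
    """Return the scanner interval in seconds, or the default if unset or invalid."""
    raw = environ.get("SCANNER_INTERVAL")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    # Reject zero or negative intervals (assumed policy).
    return value if value > 0 else default
```

Because environment values are always strings, the ConfigMap values are quoted; Kubernetes requires ConfigMap data values to be strings.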
To deploy the above workloads on a Kubernetes cluster, simply run the Ansible Playbook:
ansible-playbook playbook.yml
The playbook creates a Deployment that launches a Pod with two containers: one runs file-generator.py to create random data in a Persistent Volume, and the other runs file-operations.py to perform random operations on the generated data.
The playbook uses defaults.yml for configuration. Here are the available options to configure the playbook:
Variable | Description |
---|---|
file_size | Sets --size option |
max_files | Sets --max-files option |
min_files | Sets --min-files option |
pvc_size | Size of volume (needs to be greater than or equal to file_size option) |
buffer | Sets --buffer option |
namespace | Namespace for workload |
deployment_name | Name of the workload deployment |
image | Workload Docker image (see the build instructions below to build your own image) |
destroy | Deletes the workload when set to true |
To build your own image, simply run:
docker build -t <your_image> -f Dockerfile .
To push, run:
docker push <your_image>
Set the image variable to use your own image in the Ansible Playbook for the workload.