No symlinks can be added to this repo because: 1) this repo can be mirrored to storage platforms that do not support symlinks, and 2) Python's zipfile library does not support symlinks.
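One way to check a checkout for stray symlinks before committing is sketched below. This is a minimal sketch, not part of the repo's tooling: the function name `find_symlinks` is hypothetical, and skipping the `.git` directory is an assumption about what is worth checking.

```python
import os
from pathlib import Path


def find_symlinks(root):
    """Return sorted paths (relative to root) of every symlink under root."""
    root = Path(root)
    found = []
    # os.walk does not follow symlinked directories by default, so a linked
    # directory is reported once and never entered.
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: git internals are not interesting here.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                found.append(path.relative_to(root).as_posix())
    return sorted(found)


if __name__ == "__main__":
    for link in find_symlinks("."):
        print(f"symlink found: {link}")
```

Run from the top of the repo, it prints one line per symlink and nothing when the tree is clean.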
rdg_datasets
- <name_of_some_rdg>/
  - [conditional] import.py
  - [conditional] generate.py
  - [conditional] migrate.py
  - [conditional] README.md
  - storage_format_version_N/<rdg_contents>
  - storage_format_version_N+1/<rdg_contents>
  - storage_format_version_N+2/<rdg_contents>
csv_datasets
- <name_of_some_csv_dataset>/
  - <csv_dataset_contents>
misc_datasets
- <name_of_some_misc_dataset>
  - README.md describing what this misc dataset is and what it can be used for
If an rdg can be imported from csv, its directory should, if possible, be named identically to the csv dataset's directory.
misc_datasets should be used only when the dataset does not fall into one of the other categories
The following scripts are conditionally present for each RDG. One of the following options must be present.
import.py: Script to import this RDG from CSV. When to use: This is the preferred option. If the RDG can be imported from CSV, do this.
generate.py: Script to generate this RDG in a special way. When to use: If this RDG is not importable from CSV. If generation is difficult to automate, consider using migrate.py.
migrate.py: Script to migrate this RDG from its current storage_format_version to the latest. When to use: Some special RDGs are not easy to generate in an automated fashion, but can be easily migrated to the latest storage_format_version by loading and storing them.
README.md: Describes how to create the RDG manually if it is not feasible to generate or migrate it. If this RDG cannot be created by any of the above scripts, the steps to create it must be described in detail here. If this is a special RDG, created to cover a specific test case, describe how it is special and the test case(s) here.
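The rule that each RDG must ship one of these options can be checked mechanically. The following is a minimal sketch, assuming the `rdg_datasets` layout described above; the function name `rdgs_missing_options` is hypothetical and is not part of the `uprev` tooling.

```python
from pathlib import Path

# The four options named in the organization section.
OPTIONS = ("import.py", "generate.py", "migrate.py", "README.md")


def rdgs_missing_options(rdg_datasets):
    """Return names of RDG directories that contain none of OPTIONS."""
    missing = []
    for rdg_dir in sorted(Path(rdg_datasets).iterdir()):
        if not rdg_dir.is_dir():
            continue
        if not any((rdg_dir / name).is_file() for name in OPTIONS):
            missing.append(rdg_dir.name)
    return missing


if __name__ == "__main__":
    for name in rdgs_missing_options("rdg_datasets"):
        print(f"{name}: no import.py, generate.py, migrate.py, or README.md")
```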
- the "main" function must be called uprev to be found by the global uprev script
- the uprev function must return the path to where the new rdg can be found
- the scripts must keep the organization outlined above
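A skeleton that satisfies these three requirements might look like the sketch below. The parameter list is an assumption: this README only states that the function must be named `uprev` and must return the path to the new rdg, so `rdg_dir` and `storage_format_version` are hypothetical names, and the body is a placeholder where the real import, generate, or migrate work would go.

```python
from pathlib import Path


def uprev(rdg_dir, storage_format_version):
    """Create the RDG at the requested version and return its path.

    Hypothetical signature: the arguments actually passed by the global
    uprev script are not documented here.
    """
    rdg_dir = Path(rdg_dir)
    # Keep the documented organization:
    # <name_of_some_rdg>/storage_format_version_N/<rdg_contents>
    target = rdg_dir / f"storage_format_version_{storage_format_version}"
    target.mkdir(parents=True, exist_ok=True)

    # Placeholder: import from CSV, generate, or load and store the RDG
    # so that its contents land directly inside `target`.

    return target
```

Returning the directory itself, rather than a file inside it, matches the requirement that the rdg contents sit directly in the `storage_format_version_#` directory.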
- ensure the most recent main commit of this repo is checked out:
git checkout main; git pull
- run
./uprev build_tools --build_dir=<katana_build_dir>
- run
./uprev rdgs --help
to see the required args
- ex:
./uprev rdgs --storage_format_version <N> --build_dir <katana_build_dir>
- ensure
./uprev validate
passes for all rdgs
- make a new commit with the message
upreved rdgs to storage_format_version_M
- create a katana repo PR to bump up the version of this submodule
Required information:
- name of your rdg
- storage_format_version of your rdg
- you can see what storage_format_version your rdg is by running
grep -rni "storage_format_version" *
in the directory containing your rdg
- if there are no matches, your rdg is
storage_format_version_1
- wherever you see
<rdg-name>
replace it with the name of your rdg
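For environments without grep, the lookup above can be approximated in Python. This is a rough sketch, not an exact equivalent of `grep -rni`: the function name `grep_storage_format_version` is hypothetical, files are read as bytes on the assumption that some RDG files are binary, and hidden files are included.

```python
from pathlib import Path


def grep_storage_format_version(rdg_path):
    """Return (relative path, line number, line) for each case-insensitive match."""
    needle = b"storage_format_version"
    root = Path(rdg_path)
    matches = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Bytes are used so that binary files do not raise decoding errors.
        for lineno, line in enumerate(path.read_bytes().splitlines(), start=1):
            if needle in line.lower():
                text = line.decode("utf-8", errors="replace").strip()
                matches.append((path.relative_to(root).as_posix(), lineno, text))
    return matches


if __name__ == "__main__":
    found = grep_storage_format_version(".")
    for rel_path, lineno, text in found:
        print(f"{rel_path}:{lineno}:{text}")
    if not found:
        # Per the rule above, no matches means storage_format_version_1.
        print("no matches: this rdg is storage_format_version_1")
```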
- take a look at the organization section above, specifically the
rdg_datasets
section
- create a directory in
rdg_datasets
called <rdg-name>
- create a
storage_format_version_#
directory in your <rdg-name>
directory
- ensure the # in
storage_format_version_#
matches the version in the rdg
- put the rdg contents in the
storage_format_version_#
directory
- it is important that the contents of the rdg are directly in the
storage_format_version_#
directory, and not nested inside another directory
- create a
README.md
in your <rdg-name>
directory for your rdg with general notes about what this rdg tests and how it was created
- copy one of
[migrate.py, import.py, generate.py]
from another rdg to your <rdg-name>
directory
- take a look at the definitions of each of these scripts above to see which is appropriate
- modify the script's variables to match your rdg
Now your rdg can be easily upreved to the latest storage_format_version
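Before committing a new rdg, the layout rules above can be sanity-checked with a short script. This is a sketch under stated assumptions: `check_rdg_layout` is a hypothetical helper, it assumes every subdirectory of an rdg directory should be a `storage_format_version_#` directory, and it treats "a version directory that holds nothing but a single subdirectory" as a heuristic sign that the contents were nested by mistake.

```python
import re
from pathlib import Path

VERSION_DIR = re.compile(r"^storage_format_version_(\d+)$")


def check_rdg_layout(rdg_dir):
    """Return a list of human-readable problems; an empty list means none found."""
    rdg_dir = Path(rdg_dir)
    problems = []
    subdirs = [p for p in sorted(rdg_dir.iterdir()) if p.is_dir()]
    if not any(VERSION_DIR.match(p.name) for p in subdirs):
        problems.append("no storage_format_version_# directory")
    for p in subdirs:
        if not VERSION_DIR.match(p.name):
            # Catches misspelled directory names as well.
            problems.append(f"unexpected directory name: {p.name}")
            continue
        entries = list(p.iterdir())
        if not entries:
            problems.append(f"{p.name} is empty")
        elif len(entries) == 1 and entries[0].is_dir():
            # Heuristic: the rdg contents were probably copied one level too deep.
            problems.append(f"{p.name} holds only a nested directory")
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_rdg_layout(sys.argv[1]):
        print(problem)
```

Invoked as `python check_layout.py rdg_datasets/<rdg-name>` (the script filename is arbitrary), it prints nothing when the directory follows the organization described above.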