d3m-primitives
Machine learning algorithms, or primitives, created by BYU-DML for DARPA's D3M project. These primitives are wrapped to fit within the D3M ecosystem.
metafeature_extraction
Extracts metafeatures from tabular data.
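A metafeature describes a dataset as a whole rather than any single row. The sketch below only illustrates that idea with a few simple statistics computed from a table stored as a dict of equal-length column lists, with `None` marking a missing value. The specific metafeatures (row and column counts, missing-value ratio, numeric-column ratio) and the function name `simple_metafeatures` are hypothetical examples. They are not this primitive's actual feature set or API.

```python
from typing import Any, Dict, List, Optional


def simple_metafeatures(columns: Dict[str, List[Optional[Any]]]) -> Dict[str, float]:
    """Compute a few illustrative (hypothetical) dataset-level metafeatures.

    `columns` maps column names to equal-length lists; None marks a missing value.
    """
    n_columns = len(columns)
    n_rows = len(next(iter(columns.values()))) if columns else 0
    n_cells = n_rows * n_columns

    # Count missing cells across the whole table.
    n_missing = sum(value is None for values in columns.values() for value in values)

    def is_numeric(values: List[Optional[Any]]) -> bool:
        # A column counts as numeric if every known value is an int or float (bools excluded).
        known = [v for v in values if v is not None]
        return bool(known) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in known
        )

    n_numeric = sum(is_numeric(values) for values in columns.values())

    return {
        "n_rows": float(n_rows),
        "n_columns": float(n_columns),
        "missing_ratio": n_missing / n_cells if n_cells else 0.0,
        "numeric_column_ratio": n_numeric / n_columns if n_columns else 0.0,
    }


if __name__ == "__main__":
    table = {"age": [31, None, 45, 27], "city": ["Provo", "Orem", None, "Provo"]}
    print(simple_metafeatures(table))
```

For the example table, two of the eight cells are missing and one of the two columns is numeric.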
random_sampling_imputer
Imputes missing values in tabular data by randomly sampling other known values from the same column.
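The idea behind random sampling imputation fits in a few lines of plain Python. This is an illustrative sketch, not the primitive's code. It assumes a table held as a dict of equal-length column lists with `None` marking missing values, and the function name `random_sampling_impute` is hypothetical. Because it samples from the list of known values including duplicates, each replacement follows the column's observed value frequencies.

```python
import random
from typing import Any, Dict, List, Optional


def random_sampling_impute(
    columns: Dict[str, List[Optional[Any]]], seed: Optional[int] = None
) -> Dict[str, List[Any]]:
    """Fill each None with a value drawn at random from the known values of its column."""
    rng = random.Random(seed)  # local RNG so that passing a seed makes results reproducible
    imputed: Dict[str, List[Any]] = {}
    for name, values in columns.items():
        known = [v for v in values if v is not None]
        if not known:
            # Nothing to sample from; this sketch leaves such a column unchanged.
            imputed[name] = list(values)
            continue
        # Known values are kept; each missing value gets an independent random draw.
        imputed[name] = [rng.choice(known) if v is None else v for v in values]
    return imputed


if __name__ == "__main__":
    table = {"age": [31, None, 45, 27], "city": ["Provo", "Orem", None, "Provo"]}
    print(random_sampling_impute(table, seed=0))
```

How the actual primitive handles a column with no known values is not described here; the sketch simply returns such a column as-is. The input table is never modified in place.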
How to update
When Cloning This Repo
- Clone this repository and `cd` into it.
- Clone the primitives git submodule with `git submodule update --init --recursive`. Then `cd` into the `submission/primitives` directory and verify that it is synced to the current state of the D3M primitives source repo, not just to BYU's fork. See configuring a remote for a fork and syncing a fork.
- Create a `.env` file in the repo root and populate the `DATASETS` and `WORKER_ID` environment variables. `DATASETS` should be the path to the root folder of the D3M datasets repo. `WORKER_ID` should be an ID that uniquely identifies the machine the pipelines are run on.
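A `.env` file is a list of `KEY=value` lines. The sketch below checks that both required variables are present and non-empty before anything is run. The helper names (`parse_env`, `missing_variables`, `check_env_file`) are hypothetical and not part of this repo, and the parser only handles the simple `KEY=value` form with optional quotes and `#` comments.

```python
from pathlib import Path
from typing import Dict, List

# The two variables this README asks for.
REQUIRED_VARIABLES = ("DATASETS", "WORKER_ID")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_variables(env: Dict[str, str]) -> List[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


def check_env_file(path: Path) -> List[str]:
    """Read a .env file (FileNotFoundError if absent) and list missing required variables."""
    return missing_variables(parse_env(path.read_text()))
```

Run from the repo root, `check_env_file(Path(".env"))` returns an empty list when both `DATASETS` and `WORKER_ID` are set.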
If The Repo Is Already Cloned
- Update the primitives submodule (skip this if the submodule was just cloned): `git submodule update --recursive`.
- Sync the primitives submodule with the upstream D3M primitives repo:
  - `cd submission/primitives`
  - With the byu-dml branch checked out, pull the master branch of the parent repository into it: `git pull https://gitlab.com/datadrivendiscovery/primitives`
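The update steps above can be scripted. This is a hypothetical helper, not part of the repo, that runs the same git commands in order from the repo root. The explicit `git checkout byu-dml` is an added assumption: `git submodule update` leaves the submodule on a detached HEAD, so the byu-dml branch must be checked out before pulling into it. It defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

UPSTREAM_URL = "https://gitlab.com/datadrivendiscovery/primitives"
SUBMODULE_DIR = Path("submission/primitives")  # relative to the repo root


def update_commands() -> List[Tuple[Path, List[str]]]:
    """Return (working directory, command) pairs for the update steps, in order."""
    return [
        (Path("."), ["git", "submodule", "update", "--recursive"]),
        # Assumption: a local byu-dml branch already exists in the submodule.
        (SUBMODULE_DIR, ["git", "checkout", "byu-dml"]),
        (SUBMODULE_DIR, ["git", "pull", UPSTREAM_URL]),
    ]


def run_update(dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also execute it."""
    for cwd, command in update_commands():
        print(f"({cwd}) $ {' '.join(command)}")
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing command.
            subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_update()` shows what would run; `run_update(dry_run=False)` executes the commands.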
Remaining Steps
- Update the `Dockerfile` with the latest tag from D3M, then pull the image and start the container: `docker-compose up -d --build`. Note that as primitive authors submit their primitives, the image behind a tag changes while the tag itself stays the same. When this happens, one solution is to delete the image with `docker rmi <image id>` and pull it again.
- Update the primitives, if necessary. At a minimum, you will likely need to update this repo's dependencies to honor the dependencies and version ranges specified by the D3M core package. Be sure to update the version numbers in `byudml/__init__.py`.
- Next, run the tests, generate the primitive JSON files, generate the pipelines, and run the pipelines:
  - Execute `docker exec -it test-d3m-primitives bash` to enter the docker container.
  - Execute `./run_tests.sh`. This command verifies that nothing is broken, generates new pipeline and primitive JSONs with updated digests and versions, runs the pipelines, and places the generated files in the correct folder of the `primitives` submodule. NOTE: Verify that the glob pattern in `submission.utils.get_new_d3m_path` correctly captures the D3M version in the `primitives` submodule.
  - Execute `exit` to leave the container.
- Commit the updated primitive JSONs and pipelines in the submodule, i.e. our fork of the D3M primitives repo. Note: do not commit directly to the master branch; commit to a branch whose name identifies the new D3M package version and our organization.
- Update this repo by committing the changes to the submodule: stage it with `git add submission/primitives/` (plus any other changed files with `git add`), then run `git commit` and `git push`.
- Release this package.
- Push the primitives submodule with `git push origin byu-dml` (this pushes to https://gitlab.com/byu-dml/primitives) and verify that the CI passes. If it fails, start over at step 4. NOTE: this package must be released before it can be tested with CI.
- Create a merge request from the byu-dml branch of https://gitlab.com/byu-dml/primitives to the master branch of https://gitlab.com/datadrivendiscovery/primitives.
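The NOTE about `submission.utils.get_new_d3m_path` is easiest to check by seeing which folder a glob actually selects. The following is a hypothetical stand-in, not the repo's function. It assumes the submodule holds D3M version folders named like `v2020.1.9` directly under its root. A bare `v*` glob can also match unrelated names, so each match is validated with a regex. Versions are compared numerically so that `v2020.10.1` ranks above `v2020.9.30`, which a plain string comparison would get wrong.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumption: version folders look like "v2020.1.9" directly under the submodule root.
VERSION_GLOB = "v*"
VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_version(name: str) -> Optional[Tuple[int, ...]]:
    """Turn "v2020.1.9" into (2020, 1, 9); return None for names that do not match."""
    match = VERSION_RE.match(name)
    return tuple(int(part) for part in match.groups()) if match else None


def latest_d3m_version_dir(primitives_root: Path) -> Optional[Path]:
    """Return the version folder with the highest numeric version, or None if there is none."""
    if not primitives_root.is_dir():
        return None
    best: Optional[Tuple[Tuple[int, ...], Path]] = None
    for candidate in primitives_root.glob(VERSION_GLOB):
        version = parse_version(candidate.name)
        if version is None or not candidate.is_dir():
            continue  # the glob alone also matches files and names like "v_old"
        if best is None or version > best[0]:
            best = (version, candidate)
    return best[1] if best else None
```

Pointing `latest_d3m_version_dir(Path("submission/primitives"))` at the submodule shows which version folder a pattern like this would pick, which can then be compared with what `get_new_d3m_path` returns.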