NEXT

NEXT is a machine learning system that runs in the cloud and makes it easy to develop, evaluate, and apply active learning in the real world. Ask better questions. Get better results. Faster. Automated.

Primary language: Python. License: Apache License 2.0.

Have a question? Ask us on Gitter! We encourage asking the dev team questions.

Documentation: https://github.com/nextml/NEXT/wiki

Website: http://nextml.org

NEXT is a system that makes it easy to develop, evaluate, and apply active learning.

The talks below give a good, brief introduction to NEXT at the highest level. For scientists and developers, we most strongly recommend the PyData Ann Arbor talk. It's an enhanced and refined version of the SciPy talk.

Venue | Audience | Length | Link
PyData Ann Arbor | Scientists and developers | 1 hour | https://www.youtube.com/watch?v=rTyu4QTXZTc
SciPy 2017 | Scientific Python developers | 30 minutes | https://www.youtube.com/watch?v=blPjDYCvppY
Simons Institute conference on Interactive Learning | Machine learning researchers | 30 minutes | https://youtu.be/ESXgbZQ1ZTk?t=1732

We give more detail on launching experiments and getting set up in the SciPy 2017 proceedings: http://conference.scipy.org/proceedings/scipy2017/pdfs/scott_sievert.pdf.

This readme contains a quick start to launch the NEXT system on EC2, and to replicate and launch the experiments from the NEXT paper. There are more detailed launch instructions here.

For more information, in-depth tutorials, and API docs, we recommend visiting our GitHub wiki here. You can contact us at contact@nextml.org.

We have an experimental AMI that can be used to run NEXT in a purely application-based environment rather than a development environment. The AMI includes a basic version of our frontend. It is still highly experimental, and we give no guarantee that it is up to date with the current code. For more info please visit here.

Testing

Run py.test from NEXT/next. Tests will be run from your local machine but will ping an EC2 server to simulate a client.

Individual files can also be run with py.test. Running py.test test_api.py will only run test_api.py and allow relative imports (which allows from next.utils import timeit).

By default py.test captures stdout; pass the -s flag to disable capturing and see printed output as the tests run.

pytest is installable with pip install pytest and has a strict backwards compatibility policy.
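As a minimal sketch of a test file that can be run this way, the example below defines one test that prints a line. The file name test_example.py and the helper mean are hypothetical and are not part of the NEXT repository; the sketch is written for Python 3.

```python
# test_example.py: hypothetical file, not part of the NEXT repository.


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / float(len(values))


def test_mean():
    result = mean([1, 2, 3, 4])
    # py.test captures stdout by default, so this line is shown only when
    # the -s flag is given (or when the test fails).
    print("mean is", result)
    assert result == 2.5
```

Running py.test -s test_example.py should run only this file and show the printed line.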

Getting the code

You can download the latest version of NEXT from GitHub with the following clone command:

$ git clone https://github.com/nextml/NEXT.git

We are actively working to develop and improve NEXT, but users should be aware of the following caveats:

  • NEXT currently supports only UNIX-based operating systems (i.e., Windows compatibility is not yet available).
  • An Amazon Web Services account is needed to launch NEXT on EC2. We have worked hard to make this process as simple as possible, at the cost of making it harder to run the full NEXT stack on a local machine. We plan to make NEXT usable on a personal computer in the future.

Launching NEXT on EC2

First, you must set your Amazon Web Services (AWS) account credentials as environment variables. If you don't already have an AWS account, you can follow our AWS account quickstart here or the official AWS account set-up guide here for an in-depth introduction. Make sure you have access to your

  • AWS access key id
  • AWS secret access key
  • Key Pair (pem file)

Make sure to note the region in which your key pair was created. By default, the script assumes the region is Oregon (us-west-2). If you choose a different region, specify it with --region=<region> (e.g., --region=us-west-2) every time you use the next_ec2.py script. The region code is shown on the EC2 dashboard; for example, after selecting the region "Oregon," the dashboard shows us-west-2. If another region is used, an --ami option must also be included. For ease, we recommend using the Oregon region.

Export your AWS credentials as environment variables using:

$ export AWS_SECRET_ACCESS_KEY=[your_secret_aws_access_key_here]
$ export AWS_ACCESS_KEY_ID=[your_aws_access_key_id_here]

Note that you'll need your AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID again later, so save them in a secure place.
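Before moving on, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch of such a check, written for Python 3; the helper missing_env_vars is hypothetical and is not part of NEXT, and only the two variable names come from the export commands above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("All AWS credential variables are set.")
```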

Install the local python packages needed for NEXT:

$ cd NEXT
$ sudo pip install -r local_requirements.txt

Throughout the rest of this tutorial, we will be using the next_ec2.py startup script heavily. For more options and instructions, run python next_ec2.py without any arguments. Additionally, python next_ec2.py -h will print help for the available options.

For persistent data storage, we first need to create a bucket in AWS S3 using:

$ cd ec2
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] createbucket [cluster-name]

where:

  • [keypair] is the name of your EC2 key pair
  • [key-file] is the private key file for your key pair
  • [cluster-name] is the custom name you create and assign to your cluster

This will print out another environment variable command export AWS_BUCKET_NAME=[bucket_uid]. Copy and paste this command into your terminal.

You will also need your bucket_uid later, so save it in a file alongside your AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID for later reference.
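The createbucket call above has the same shape as the launch, get-master, and destroy calls used elsewhere in this readme: the same two flags, then an action and the cluster name. As a rough sketch (Python 3), the hypothetical helper below assembles that argument list so the same values can be reused. The function name and the placeholder values are assumptions for illustration; only the flags and actions come from this readme.

```python
def build_next_ec2_command(action, cluster_name, key_pair, identity_file,
                           region=None):
    """Build the argument list for one next_ec2.py invocation.

    Hypothetical helper; the flags and actions are the ones shown in
    this readme (createbucket, launch, get-master, destroy).
    """
    command = [
        "python", "next_ec2.py",
        "--key-pair=" + key_pair,
        "--identity-file=" + identity_file,
    ]
    if region is not None:
        # Needed only when the key pair was not created in us-west-2.
        # The readme notes that an --ami option is then required as well;
        # it is not handled in this sketch.
        command.append("--region=" + region)
    command += [action, cluster_name]
    return command


if __name__ == "__main__":
    # Placeholder values; replace them with your own key pair, pem file,
    # and cluster name.
    args = build_next_ec2_command("createbucket", "my-cluster",
                                  key_pair="my-keypair",
                                  identity_file="my-keypair.pem")
    print(" ".join(args))
```

The resulting list could be passed to subprocess.check_call from inside the ec2 directory, which avoids shell quoting issues with the file path.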

Now you are ready to fire up the NEXT system using our launch command. This command will create a new EC2 instance, pull the NEXT repository to that instance, install all of the relevant Docker images, and finally run all Docker containers.

WARNING: Users should note that this script launches a single m3.large machine, the current default NEXT EC2 instance type. This instance type costs $0.14 per hour to run. For more detailed EC2 pricing information, refer to this AWS page. You can specify a different instance type with the --instance-type option.

$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] launch [cluster-name]

Once your terminal shows a stream of many multi-colored docker appliances, you are successfully running the NEXT system!
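Since the instance bills for as long as it runs, a quick estimate of the running cost can be useful. The sketch below (Python 3) simply multiplies hours by the $0.14 hourly rate quoted in the warning above. The function estimate_cost is hypothetical, and the rate is an assumption that will differ for other instance types, regions, or current AWS pricing.

```python
def estimate_cost(hours, hourly_rate=0.14, instances=1):
    """Estimated cost in US dollars, rounded to cents.

    The default rate is the m3.large figure quoted in this readme;
    check the AWS pricing page for current values.
    """
    return round(hours * hourly_rate * instances, 2)


if __name__ == "__main__":
    print("One day:  $%.2f" % estimate_cost(24))
    print("30 days:  $%.2f" % estimate_cost(24 * 30))
```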

Replicating NEXT adaptive learning experiments

Because NEXT aims to make it easy to reproduce empirical active learning results, we provide a simple command to initialize the experiments performed in this study.

First, in a new terminal, export your AWS credentials and use get-master to obtain your public EC2 DNS.

$ export AWS_BUCKET_NAME=[your_aws_bucket_name_here]
$ cd NEXT/ec2
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] get-master [cluster-name]

Then export this public EC2 DNS.

$ export NEXT_BACKEND_GLOBAL_HOST=[your_public_ec2_DNS_here]
$ export NEXT_BACKEND_GLOBAL_PORT=8000

Now you can execute run_examples.py to initialize and launch the NEXT experiments.

$ cd ../examples
$ python run_examples.py

Once initialized, this script will return a link that you can distribute yourself or post as a HIT on Mechanical Turk. Visit:

http://your_public_ec2_DNS_here:8000/query/query_page/query_page/[exp_uid]/[exp_key]

where [exp_uid] and [exp_key] are the unique identifiers for each of the Dueling Bandits Pure Exploration, Active Non-Metric Multidimensional Scaling (MDS), and Tuple Bandits Pure Exploration experiments. See this wiki page for a little more information.
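If you need to rebuild a query link later, for instance to send it to participants, the URL pattern above can be filled in programmatically. This is a small sketch in Python 3; the function query_page_url is hypothetical, and it assumes the NEXT_BACKEND_GLOBAL_HOST and NEXT_BACKEND_GLOBAL_PORT variables exported earlier are still set whenever host and port are not passed explicitly.

```python
import os


def query_page_url(exp_uid, exp_key, host=None, port=None):
    """Fill in the query page URL pattern shown in this readme.

    Hypothetical helper; falls back to the NEXT_BACKEND_GLOBAL_* environment
    variables when host or port is not given.
    """
    host = host or os.environ["NEXT_BACKEND_GLOBAL_HOST"]
    port = port or os.environ.get("NEXT_BACKEND_GLOBAL_PORT", "8000")
    return "http://{}:{}/query/query_page/query_page/{}/{}".format(
        host, port, exp_uid, exp_key)
```

For example, query_page_url("my_exp_uid", "my_exp_key", host="ec2.example.com") builds the link for a placeholder host without reading the host from the environment.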

Navigate to the strange_fruit_triplet query link (the last one that printed out to your terminal) and answer some questions! Doing so will provide the system with data you can view and interact with in the next step.

Accessing NEXT experiment results, dashboards, and data visualizations

You can access interactive experiment dashboards and data visualizations by clicking on an experiment at:

  • http://your_public_ec2_DNS:8000/dashboard/experiment_list

To obtain all logs for an experiment through our RESTful API, visit:

  • http://your_public_ec2_DNS:8000/api/experiment/[exp_uid]/[exp_key]/logs

Where, again, [exp_uid] corresponds to the unique Experiment ID shown on the experiment dashboard pages.
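The logs endpoint can also be requested from a script rather than a browser. The sketch below uses only the Python 3 standard library. The helpers logs_url and fetch_logs are hypothetical, and the response is returned as raw text because this readme does not specify the format; decoding it as UTF-8 is an assumption.

```python
import urllib.request


def logs_url(host, exp_uid, exp_key, port=8000):
    """Fill in the logs URL pattern shown in this readme."""
    return "http://{}:{}/api/experiment/{}/{}/logs".format(
        host, port, exp_uid, exp_key)


def fetch_logs(host, exp_uid, exp_key, port=8000, timeout=30):
    """Request the logs and return the response body as text.

    Hypothetical helper; assumes the body is UTF-8 encoded and makes no
    attempt to parse it.
    """
    url = logs_url(host, exp_uid, exp_key, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_logs requires a running NEXT instance, so it is best tried only after the launch step has completed.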

If you'd like to back up your database to access your data later, refer to this wiki for detailed steps.

Finally, you can terminate your EC2 instance and shut down NEXT using:

$ cd ../ec2
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] destroy [cluster-name]