Cannot mount data to a specific directory nor add multiple datasets
jmsmkn opened this issue · 2 comments
Three problems here:
- You cannot specify the data mount directory with the CLI
- Data is mounted at /input and not /<ID>
- As a result, you cannot mount more than one dataset
The docs show how to mount datasets to specific directories, but doing this with the CLI results in an error. Take, for example, the public Kaggle cats and dogs dataset with ID SyccinddLDdS7p3vzcwGQ2.
Command:
$ floyd run --data SyccinddLDdS7p3vzcwGQ2:dataset
Expected output:
The experiment launches with the data mounted at /dataset
Actual output:
$ floyd run --data SyccinddLDdS7p3vzcwGQ2:dataset
New version of CLI (0.9.7) is now available. To upgrade run:
pip install -U floyd-cli
Creating project run. Total upload size: 9.9MiB
Syncing code ...
Error: One or more request parameter is incorrect.
If I take away the dataset key it works:
$ floyd run --mode jupyter --data SyccinddLDdS7p3vzcwGQ2
New version of CLI (0.9.7) is now available. To upgrade run:
pip install -U floyd-cli
Creating project run. Total upload size: 9.9MiB
Syncing code ...
RUN ID NAME VERSION
---------------------- ------------------ ---------
[Truncated]
However, the dataset is not mounted at /<ID> as stated in the docs, but at /input. On the instance, the data is there:
# ls /
bin code etc input lib64 mnt output root run_jupyter.sh srv tmp var
boot dev home lib media opt proc run sbin sys usr
# ls /input
test train
Now, if I add a second dataset, say the public MNIST dataset with ID Gbya2j64ApqjSHt3vDpdSh, this gets mounted at /input and takes precedence over the first dataset (so I never actually have access to the SyccinddLDdS7p3vzcwGQ2 data):
$ floyd run --mode jupyter --data SyccinddLDdS7p3vzcwGQ2 --data Gbya2j64ApqjSHt3vDpdSh
Then on the instance:
# ls /
bin code etc input lib64 mnt output root run_jupyter.sh srv tmp var
boot dev home lib media opt proc run sbin sys usr
# ls /input
t10k-images-idx3-ubyte.gz t10k-labels-idx1-ubyte.gz train-images-idx3-ubyte.gz train-labels-idx1-ubyte.gz
Note that test and train from the Kaggle dataset are missing.
System info: floyd-cli 0.9.9, Python 3.5, see #51
Hi, sorry that the documentation is a little bit confusing; we are working on updating it. There are two things:
- if you only specify one data input, it will always be mounted at "/input"
- if you specify multiple data inputs, you need to tag each of them, for example: --data SyccinddLDdS7p3vzcwGQ2:foo --data Gbya2j64ApqjSHt3vDpdSh:bar. Then you should be able to access them at the "/foo" and "/bar" directories.
We will be releasing new documentation this week to clarify this.
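For anyone who wants the two rules above in one place, here is a rough Python sketch of how the mount points seem to be resolved. It is only a model of the behaviour described in this thread, not floyd-cli code: the function name resolve_mounts is made up for illustration, and the "last one wins" handling of several untagged datasets is taken from the /input listing in the report.

```python
def resolve_mounts(data_args):
    """Map each --data value to the directory it would be mounted at.

    Hypothetical helper: models the behaviour described in this thread,
    not the actual floyd-cli implementation.
    """
    mounts = {}
    for arg in data_args:
        # "ID:tag" mounts at /tag; a bare "ID" falls back to /input.
        dataset_id, sep, tag = arg.partition(":")
        mount_point = "/" + tag if sep and tag else "/input"
        # A later dataset at the same mount point replaces the earlier one,
        # matching what was observed with two untagged datasets.
        mounts[mount_point] = dataset_id
    return mounts


# Two untagged datasets collide at /input; only the second is visible.
print(resolve_mounts(["SyccinddLDdS7p3vzcwGQ2", "Gbya2j64ApqjSHt3vDpdSh"]))

# Tagging each dataset gives every one its own directory.
print(resolve_mounts(["SyccinddLDdS7p3vzcwGQ2:foo", "Gbya2j64ApqjSHt3vDpdSh:bar"]))
```

So if you need both datasets at once, tag every --data argument rather than relying on the /input default.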
I've seen that single data sets can have mountpoints again too, thank you for fixing it so quickly!