Docker setup fails with permissions error
bhtucker opened this issue · 7 comments
Summary
After a fresh install, I attempted to run arthur:
./bin/run_arthur.sh
++ pwd
+ docker run --rm --interactive --tty {volumes omitted} --env DATA_WAREHOUSE_CONFIG=/opt/data-warehouse/warehouse_config --env ARTHUR_DEFAULT_PREFIX=bhtucker arthur-redshift-etl:latest
+ cd /opt/src/arthur-redshift-etl
+ python3 setup.py --quiet develop
error: could not create 'python/redshift_etl.egg-info': Permission denied
Details
Prior steps were only git clone and ./bin/build_arthur.sh.
Propose label: Bug, maybe documentation bug?
Here's what I ran:
docker image rm arthur-redshift-etl
docker system prune
rm -rf arthur-redshift-etl
git clone git@github.com:harrystech/arthur-redshift-etl.git
cd arthur-redshift-etl
bin/build_arthur.sh
bin/run_arthur.sh
This sequence of commands puts me into a Docker image running Arthur.
Here's the state of next:
6265949 (HEAD -> next, origin/next, origin/HEAD) Merge branch 'master' into next
9e1568c Merge pull request #236 from harrystech/flake8-fixes
d32b550 (tag: v1.28.0, origin/master) Merge remote-tracking branch 'origin/next'
(The additional commit on next just changed some comments, not related to our Docker setup.)
Please make sure you're on the latest version shown above. The "permission denied" error may have popped up during development on next, when the user arthur inside the Docker container wasn't the owner of /opt/src.
To debug, please re-run the docker command and add --entrypoint bash right before the image tag.
In the shell, please take a look at:
ls -la /opt/src/arthur-redshift-etl/
cd /opt/src/arthur-redshift-etl/
touch hello
python setup.py develop
and let me know what error messages show up.
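The probe steps above can be bundled into one function, shown here as a minimal sketch (the function name probe_src_dir is hypothetical, not part of Arthur). It prints the UID/GID the shell runs as, lists the directory itself, and replaces "touch hello" with creating and removing a scratch file, so nothing is left behind. The default path is the source mount point used by run_arthur.sh in this thread.

```bash
# Hypothetical helper, not part of Arthur: run it inside the container
# (started with --entrypoint bash). Returns 1 if the directory is not writable.
probe_src_dir() {
  local src_dir="${1:-/opt/src/arthur-redshift-etl}"
  local probe_file
  # Which user are we? Inside the image this is expected to be "arthur".
  echo "running as uid=$(id -u) gid=$(id -g)"
  # Like the manual "ls -la" step, but only for the directory entry itself.
  ls -lad "$src_dir"
  # Like "touch hello": try to create (then remove) a scratch file.
  if probe_file=$(mktemp "$src_dir/.probe.XXXXXX" 2>/dev/null); then
    rm -f "$probe_file"
    echo "writable: $src_dir"
  else
    echo "NOT writable: $src_dir"
    return 1
  fi
}
```

Inside the container, call probe_src_dir with no arguments, or pass another directory to check it instead.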
Will do, thanks @tvogels01 !
Some more info:
First, the 'debug probe' commands on the fresh image, with no volumes mounted:
(venv) (aws:, prefix:) $
(venv) (aws:, prefix:) $ ls -la /opt/src/arthur-redshift-etl/
total 116
drwxr-xr-x 1 arthur arthur 4096 Aug 28 16:46 .
drwxr-xr-x 1 arthur arthur 4096 Aug 28 16:45 ..
drwxrwxr-x 4 arthur arthur 4096 Aug 27 22:14 .arthurenv
-rw-rw-r-- 1 arthur arthur 430 Aug 27 22:06 .dockerignore
-rw-rw-r-- 1 arthur arthur 459 Aug 27 22:06 .editorconfig
-rw-rw-r-- 1 arthur arthur 2728 Aug 27 22:06 Dockerfile
-rw-rw-r-- 1 arthur arthur 5731 Aug 27 22:06 INSTALL.md
-rw-rw-r-- 1 arthur arthur 1070 Aug 27 22:06 LICENSE
-rw-rw-r-- 1 arthur arthur 19208 Aug 27 22:06 README.md
-rw-rw-r-- 1 arthur arthur 440 Aug 27 22:06 TODO.md
drwxrwxr-x 2 arthur arthur 4096 Aug 27 22:06 bin
drwxrwxr-x 2 arthur arthur 4096 Aug 27 22:06 cloudformation
drwxrwxr-x 2 arthur arthur 4096 Aug 27 22:06 etc
drwxrwxr-x 2 arthur arthur 4096 Aug 27 22:06 githooks
drwxrwxr-x 3 arthur arthur 4096 Aug 27 22:06 log_processing
drwxrwxr-x 1 arthur arthur 4096 Aug 27 22:15 python
-rw-rw-r-- 1 arthur arthur 2469 Aug 27 22:06 readme_release.md
-rw-rw-r-- 1 arthur arthur 149 Aug 27 22:06 requirements-dev.txt
-rw-rw-r-- 1 arthur arthur 131 Aug 27 22:06 requirements-linters.txt
-rw-rw-r-- 1 arthur arthur 218 Aug 27 22:06 requirements.txt
drwxrwxr-x 2 arthur arthur 4096 Aug 27 22:52 schemas
-rw-rw-r-- 1 arthur arthur 1543 Aug 28 16:44 setup.cfg
-rw-rw-r-- 1 arthur arthur 1565 Aug 27 22:06 setup.py
drwxrwxr-x 2 arthur arthur 4096 Aug 27 22:06 sql
(venv) (aws:, prefix:) $ cd /opt/src/arthur-redshift-etl/
(venv) (aws:, prefix:) $ touch hello
(venv) (aws:, prefix:) $ python setup.py develop
running develop
running egg_info
writing entry points to python/redshift_etl.egg-info/entry_points.txt
writing dependency_links to python/redshift_etl.egg-info/dependency_links.txt
writing top-level names to python/redshift_etl.egg-info/top_level.txt
writing python/redshift_etl.egg-info/PKG-INFO
reading manifest file 'python/redshift_etl.egg-info/SOURCES.txt'
writing manifest file 'python/redshift_etl.egg-info/SOURCES.txt'
running build_ext
Creating /opt/local/redshift_etl/venv/lib/python3.5/site-packages/redshift-etl.egg-link (link to python)
Removing redshift-etl 1.28.0 from easy-install.pth file
Adding redshift-etl 1.28.0 to easy-install.pth file
Installing run_tests.py script to /opt/local/redshift_etl/venv/bin
Installing arthur.py script to /opt/local/redshift_etl/venv/bin
Installing compare_events.py script to /opt/local/redshift_etl/venv/bin
Installing install_extraction_pipeline.sh script to /opt/local/redshift_etl/venv/bin
Installing install_pizza_load_pipeline.sh script to /opt/local/redshift_etl/venv/bin
Installing install_rebuild_pipeline.sh script to /opt/local/redshift_etl/venv/bin
Installing install_refresh_pipeline.sh script to /opt/local/redshift_etl/venv/bin
Installing install_upgrade_pipeline.sh script to /opt/local/redshift_etl/venv/bin
Installing install_validation_pipeline.sh script to /opt/local/redshift_etl/venv/bin
Installing launch_ec2_instance.sh script to /opt/local/redshift_etl/venv/bin
Installing launch_emr_cluster.sh script to /opt/local/redshift_etl/venv/bin
Installing re_run_partial_pipeline.py script to /opt/local/redshift_etl/venv/bin
Installing sns_subscribe.sh script to /opt/local/redshift_etl/venv/bin
Installing submit_arthur.sh script to /opt/local/redshift_etl/venv/bin
Installing terminate_emr_cluster.sh script to /opt/local/redshift_etl/venv/bin
Installed /opt/src/arthur-redshift-etl/python
Processing dependencies for redshift-etl==1.28.0
Finished processing dependencies for redshift-etl==1.28.0
(venv) (aws:, prefix:) $ exit
Works as expected.
Then, the run_arthur.sh test:
./bin/run_arthur.sh
You must set DATA_WAREHOUSE_CONFIG when not specifying the config directory.
Ok, fair enough, I'll set one (to an existing directory on my machine):
export DATA_WAREHOUSE_CONFIG=/home/bhtucker/third_party/warehouse_config/
$ ./bin/run_arthur.sh
++ pwd
+ docker run --rm --interactive --tty --volume /home/bhtucker/third_party:/opt/data-warehouse --volume /home/bhtucker/third_party/arthur-redshift-etl:/opt/src/arthur-redshift-etl --volume /home/bhtucker/.aws:/home/arthur/.aws --volume /home/bhtucker/.ssh:/home/arthur/.ssh:ro --env DATA_WAREHOUSE_CONFIG=/opt/data-warehouse/warehouse_config --env ARTHUR_DEFAULT_PREFIX=bhtucker arthur-redshift-etl:latest
+ cd /opt/src/arthur-redshift-etl
+ python3 setup.py --quiet develop
error: [Errno 13] Permission denied
With 'probes':
docker run --rm --interactive --tty --volume /home/bhtucker/third_party:/opt/data-warehouse --volume /home/bhtucker/third_party/arthur-redshift-etl:/opt/src/arthur-redshift-etl --volume /home/bhtucker/.aws:/home/arthur/.aws --volume /home/bhtucker/.ssh:/home/arthur/.ssh:ro --env DATA_WAREHOUSE_CONFIG=/opt/data-warehouse/warehouse_config --env ARTHUR_DEFAULT_PREFIX=bhtucker --entrypoint bash arthur-redshift-etl:latest
(venv) (aws:, prefix:bhtucker) $ ls -la /opt/src/arthur-redshift-etl/
total 516
drwxrwxr-x 15 1002 1005 4096 Aug 28 16:44 .
drwxr-xr-x 1 arthur arthur 4096 Aug 28 16:45 ..
drwxrwxr-x 4 1002 1005 4096 Aug 27 22:14 .arthurenv
-rw-rw-r-- 1 1002 1005 430 Aug 27 22:06 .dockerignore
-rw-rw-r-- 1 1002 1005 459 Aug 27 22:06 .editorconfig
drwxrwxr-x 8 1002 1005 4096 Aug 28 16:44 .git
drwxrwxr-x 3 1002 1005 4096 Aug 27 22:06 .github
-rw-rw-r-- 1 1002 1005 406 Aug 27 22:06 .gitignore
-rw-rw-r-- 1 1002 1005 2728 Aug 27 22:06 Dockerfile
-rw-rw-r-- 1 1002 1005 5731 Aug 27 22:06 INSTALL.md
-rw-rw-r-- 1 1002 1005 1070 Aug 27 22:06 LICENSE
-rw-rw-r-- 1 1002 1005 19208 Aug 27 22:06 README.md
-rw-rw-r-- 1 1002 1005 440 Aug 27 22:06 TODO.md
-rw-rw-r-- 1 1002 1005 381244 Aug 27 22:51 arthur.log
drwxrwxr-x 2 1002 1005 4096 Aug 27 22:06 bin
drwxrwxr-x 2 1002 1005 4096 Aug 27 22:06 cloudformation
drwxrwxr-x 2 1002 1005 4096 Aug 27 23:07 dist
drwxrwxr-x 2 1002 1005 4096 Aug 27 22:06 etc
drwxrwxr-x 2 1002 1005 4096 Aug 27 22:06 githooks
drwxrwxr-x 3 1002 1005 4096 Aug 27 22:06 log_processing
drwxrwxr-x 5 1002 1005 4096 Aug 27 22:15 python
-rw-rw-r-- 1 1002 1005 2469 Aug 27 22:06 readme_release.md
-rw-rw-r-- 1 1002 1005 149 Aug 27 22:06 requirements-dev.txt
-rw-rw-r-- 1 1002 1005 131 Aug 27 22:06 requirements-linters.txt
-rw-rw-r-- 1 1002 1005 218 Aug 27 22:06 requirements.txt
drwxrwxr-x 2 1002 1005 4096 Aug 27 22:52 schemas
-rw-rw-r-- 1 1002 1005 1543 Aug 28 16:44 setup.cfg
-rw-rw-r-- 1 1002 1005 1565 Aug 27 22:06 setup.py
drwxrwxr-x 2 1002 1005 4096 Aug 27 22:06 sql
drwxrwxr-x 2 1002 1005 4096 Aug 27 22:06 wiki
(venv) (aws:, prefix:bhtucker) $ cd /opt/src/arthur-redshift-etl/
(venv) (aws:, prefix:bhtucker) $ touch hello
touch: cannot touch 'hello': Permission denied
(venv) (aws:, prefix:bhtucker) $ python setup.py develop
running develop
running egg_info
error: [Errno 13] Permission denied
So I suppose arthur, the container user, isn't allowed to write back out to my src dir. I confess I don't use volumes for anything but read-only config files, so I don't know how this should work.
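One way to test that hypothesis is to compare the numeric owner of the mounted directory with the UID the container user runs as. On a Linux host, a bind mount typically shows the host's numeric UID/GID unchanged, which would explain why the listing above shows 1002 1005 instead of arthur. The sketch below assumes GNU coreutils stat (as in a Linux container); the function name is hypothetical.

```bash
# Hypothetical diagnostic: does the current user own the given directory?
# Prints "match" or "mismatch"; returns 2 if the directory can't be stat'ed.
check_owner_match() {
  local dir="${1:-/opt/src/arthur-redshift-etl}"
  local dir_uid dir_gid
  # GNU stat: %u / %g are the numeric owner and group.
  dir_uid=$(stat -c '%u' "$dir") || return 2
  dir_gid=$(stat -c '%g' "$dir") || return 2
  echo "directory $dir owned by uid=$dir_uid gid=$dir_gid"
  echo "running as uid=$(id -u) gid=$(id -g)"
  if [ "$dir_uid" = "$(id -u)" ]; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}
```

A "mismatch" result, combined with group permissions that don't grant write access to the container user's group, is consistent with the "Permission denied" from touch and setup.py develop.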
I'll have to dig into Docker volumes to see what might cause this issue. For me, the owner of /opt/src stays arthur.
To unblock you, I'd suggest that you switch into your warehouse directory before starting arthur. This puts the image into "standalone" mode, which means it won't try to run python setup.py develop.
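The thread only shows the script's message ("looked for setup.py"), so the following is a minimal sketch of that decision, assuming the check is made in the current working directory. It is not the actual run_arthur.sh code, and the function name is hypothetical.

```bash
# Hypothetical sketch of the mode switch: a directory containing setup.py is
# treated as an Arthur source checkout ("development"), anything else as
# "standalone". Assumes the check uses the current working directory by default.
detect_arthur_mode() {
  local dir="${1:-$PWD}"
  if [ -f "$dir/setup.py" ]; then
    echo "development"
  else
    echo "standalone"
  fi
}
```

Run from a warehouse directory (no setup.py), this reports standalone. Run from the arthur-redshift-etl checkout, it reports development, which is the path that ends in setup.py develop.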
Here's what happens on my laptop:
~/repos/harrystech/arthur-redshift-etl/bin/run_arthur.sh
Did not find source path (looked for setup.py) -- switching to standalone mode.
Changes to code in /opt/src/arthur-redshift-etl will not be preservd between runs.
However, changes to your schemas or config will be reflected in your local filesystem.
+ docker run --rm --interactive --tty ... --env DATA_WAREHOUSE_CONFIG=/opt/data-warehouse/config_data_development --env ARTHUR_DEFAULT_PREFIX=tom ... arthur-redshift-etl:latest
Also, I just realized that directories get mounted multiple times in your setup. Maybe that's part of the issue? Please create a new directory above the config so that it's parallel to this repo.
--volume /home/bhtucker/third_party:/opt/data-warehouse --volume /home/bhtucker/third_party/arthur-redshift-etl:/opt/src/arthur-redshift-etl
Notice how third_party is in both places.
Goal is something like this:
--volume /home/bhtucker/third_party/warehouse_repo:/opt/data-warehouse --volume /home/bhtucker/third_party/arthur-redshift-etl:/opt/src/arthur-redshift-etl
This assumes that the configuration directory is now /home/bhtucker/third_party/warehouse_repo/warehouse_config.
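The "mounted multiple times" problem can be checked mechanically: two --volume host paths overlap when one lies inside the other. Below is a minimal sketch with hypothetical function names. It compares whole path components, so /home/x/third does not count as containing /home/x/third_party.

```bash
# Hypothetical check: is host path $2 equal to or inside host path $1?
is_nested_path() {
  local outer="${1%/}" inner="${2%/}"
  # Appending "/" to both sides makes the match respect path components.
  case "$inner/" in
    "$outer"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Two mounts overlap if either host path contains the other.
mounts_overlap() {
  is_nested_path "$1" "$2" || is_nested_path "$2" "$1"
}
```

With the original flags, mounts_overlap /home/bhtucker/third_party /home/bhtucker/third_party/arthur-redshift-etl succeeds (overlap). With warehouse_repo as the first mount, it fails (no overlap).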
Running from the 'warehouse' directory rather than from the arthur source directory seems wise.
I had also forgotten that the sibling setup is 'repo', then 'config' next to e.g. 'sources'. I've added that layer.
In fact, I had run pip install -e . before running in Docker (muscle memory). So I cleared that out and tried again without it; the error simply changes slightly to: error: could not create 'python/redshift_etl.egg-info': Permission denied
Perhaps I have some global Docker settings or version info I don't know about. Does the classic virtualenv setup still work? That's my preference anyway :)
Using a virtualenv might work, but I haven't tested it in a while. I don't see why it wouldn't.
Here's what the permissions should look like:
$ ls -lad /opt/ /opt/data-warehouse/ /opt/local/ /opt/src/ /opt/src/arthur-redshift-etl/python/redshift_etl.egg-info/
drwxr-xr-x 1 root root 4096 Aug 28 11:40 /opt/
drwxr-xr-x 37 arthur arthur 1184 Aug 28 16:55 /opt/data-warehouse/
drwxr-xr-x 1 arthur arthur 4096 Aug 28 11:40 /opt/local/
drwxr-xr-x 1 arthur arthur 4096 Aug 28 11:40 /opt/src/
drwxr-xr-x 8 arthur arthur 256 Aug 28 17:32 /opt/src/arthur-redshift-etl/python/redshift_etl.egg-info/
Note how the user stays arthur.
In the end, this is the directory structure that you're aiming for:
top/warehouse/config/
top/warehouse/schemas/
top/arthur-redshift-etl/bin/
top/arthur-redshift-etl/etc/
top/arthur-redshift-etl/python/
...
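To connect that layout to the docker run flags, here is a hedged sketch (hypothetical helper name). It checks for top/warehouse/config and a source checkout with setup.py, then prints the two --volume arguments, using the container paths that run_arthur.sh used earlier in this thread. With this layout, DATA_WAREHOUSE_CONFIG inside the container would presumably point at /opt/data-warehouse/config.

```bash
# Hypothetical helper: validate the sibling layout under $1 and print the
# matching --volume arguments, one per line.
volume_args_for_layout() {
  local top="${1%/}"
  # Config lives inside the warehouse directory; the Arthur checkout sits next to it.
  if [ ! -d "$top/warehouse/config" ] || [ ! -f "$top/arthur-redshift-etl/setup.py" ]; then
    echo "unexpected layout under $top" >&2
    return 1
  fi
  printf '%s\n' "--volume $top/warehouse:/opt/data-warehouse"
  printf '%s\n' "--volume $top/arthur-redshift-etl:/opt/src/arthur-redshift-etl"
}
```

Because warehouse and arthur-redshift-etl are siblings, neither host path contains the other, so no files are mounted twice.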
For now, what happens if you simply comment out the line python setup.py develop in bin/entrypoint.sh?
It's not needed unless you develop code inside the Docker container. The other lines (setting PATH and activating the virtual env) are needed.
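Instead of commenting the line out, the develop step could be guarded. The actual bin/entrypoint.sh is not shown in this thread, so this is only a hypothetical sketch (the function name maybe_develop is made up). It runs setup.py develop when a source checkout is present and writable, and otherwise skips it, which behaves like the commented-out line. PATH and virtualenv activation would stay as they are.

```bash
# Hypothetical guard for the develop step (not the actual bin/entrypoint.sh).
maybe_develop() {
  local src_dir="${1:-/opt/src/arthur-redshift-etl}"
  if [ ! -f "$src_dir/setup.py" ]; then
    echo "No setup.py in $src_dir, skipping 'setup.py develop'"
  elif [ ! -w "$src_dir" ]; then
    # Rough check only: develop also writes python/redshift_etl.egg-info.
    echo "$src_dir is not writable by uid $(id -u), skipping 'setup.py develop'" >&2
  else
    (cd "$src_dir" && python3 setup.py --quiet develop)
  fi
}
```

With a bind-mounted checkout owned by another UID, this prints a warning instead of failing with "Permission denied".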