higlass/higlass-manage

Avoid creating a duplicate copy of data in the media directory

annashcherbina opened this issue · 10 comments

Thank you for this amazing tool!

I have searched the docs, and perhaps I am missing this, but it seems that a dataset must be copied to the media directory for it to be ingested into higlass with the --no-upload flag. Alternatively, if the flag is not provided, the data will be uploaded to the internal higlass database and copied to the media folder? I tried to symlink my datasets into the media folder, but it looks like the links are not preserved within the higlass instance. I also tried creating sshfs mounts to datasets on a different server in the media folder, but that didn't work either. Is there a way to avoid copying the data in order to visualize it in higlass? I have about 1000 bigwig tracks to load, so I would like to avoid creating a redundant copy of the files if possible. Thank you!

Hey! Symlinks unfortunately don't work because Docker can't follow them back to their origin.

One option is to set the media directory to the directory that has your bigwig files using the --media-dir option. You can then ingest using the --no-upload flag.
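
A minimal sketch of that workflow, with illustrative paths and instance name; the exact semantics of --no-upload and the ingest flags should be checked against higlass-manage ingest --help:

# point the media directory at the folder that already holds the bigwig files
higlass-manage start -n default --media-dir /path/to/bigwigs

# register a track in place instead of copying it into the media directory
higlass-manage ingest --hg-name default \
               --filetype bigwig --datatype vector \
               --no-upload /path/to/bigwigs/track1.bw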

Thank you for the quick response. I did try the latter option with --no-upload, but the challenge is that the tracks are on a different server than the one higlass is running on. I tried to create an sshfs mount to the server with the tracks and set that mount as the --media-dir, but it looks like the sshfs mount does not get preserved within the docker container either.

It looks like passing the --privileged flag can allow sshfs mounts to propagate into docker, but when running higlass-manage start to create the server, the docker container seems to get created implicitly, so there is no way to pass this flag? Is there a way to run higlass-manage start such that the sshfs mount point gets passed as the media dir?

Is there a way to run higlass-manage start such that the sshfs mount point gets passed as the media dir?

So that's what --media-dir should be doing, but if it doesn't work then that's a problem. I know of people who have used it with other network mounts. If the data is accessible via http, there may be another solution, but I'll have to double-check to make sure it works.

With tag v0.6.34:

  1. Running the server with higlass-manage:
sshfs annashch@remoteserver:/path/to/my/data /srv/ssd/higlass/media2 
higlass-manage start -t /srv/ssd/higlass/tmp \
               -d /srv/ssd/higlass/data \
               --site-url 171.67.96.244 \
               -p 8989 \
               -n atlas \
               -m /srv/ssd/higlass/media2 \
               --public-data \
               --workers 40 \
               --use-redis \
               --version v0.6.34 \
               --redis-dir /srv/ssd/higlass/redis

Output:

Pulling redis:5.0.3-alpine
done
Pulling latest image... 
done
Data directory: /srv/ssd/higlass/data
Temp directory: ()
Starting... atlas 8989
Traceback (most recent call last):
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/docker/api/client.py", line 261, in _raise_for_status
    response.raise_for_status()
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.35/containers/09dacf127143768b22c646dcd6cccc2d1832c56a07d66e2739ec649a22b7b726/start

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/annashch/miniconda3/envs/highglass/bin/higlass-manage", line 8, in <module>
    sys.exit(cli())
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/higlass_manage/start.py", line 105, in start
    redis_port)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/higlass_manage/start.py", line 278, in _start
    detach=True)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/docker/models/containers.py", line 809, in run
    container.start()
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/docker/models/containers.py", line 400, in start
    return self.client.api.start(self.id, **kwargs)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/docker/api/container.py", line 1095, in start
    self._raise_for_status(res)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/docker/api/client.py", line 263, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/users/annashch/miniconda3/envs/highglass/lib/python3.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 500 Server Error: Internal Server Error ("error while creating mount source path '/srv/ssd/higlass/media2': mkdir /srv/ssd/higlass/media2: file exists")
(highglass) annashch@nandi:/users/annashch/higlass$ 

  2. Running with docker:
docker run --detach \
       --publish 8989:80 \
       --volume /srv/ssd/higlass/data:/data \
       --volume /srv/ssd/higlass/media2:/media \
       --volume /srv/ssd/higlass/tmp:/tmp \
       --volume /srv/ssd/higlass/redis:/redis \
       --name atlas \
       -e SITE_URL=171.67.96.244 \
       --privileged \
       higlass/higlass-docker:v0.6.34

Gives:

docker: Error response from daemon: error while creating mount source path '/srv/ssd/higlass/media2': mkdir /srv/ssd/higlass/media2: file exists.

(Sorry, I had a typo in my docker command; it's fixed above.) It seems that case 2 (with docker run) also gives the error.
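
As an aside not confirmed in this thread: Docker's "error while creating mount source path ... file exists" message is commonly reported when the bind-mount source is a FUSE/sshfs mount that the root-owned Docker daemon cannot stat, since FUSE restricts access to the mounting user by default. A hedged sketch of the usual suggestion, assuming user_allow_other is enabled in /etc/fuse.conf:

# remount with allow_other so the root-owned Docker daemon can read the path
fusermount -u /srv/ssd/higlass/media2
sshfs -o allow_other annashch@remoteserver:/path/to/my/data /srv/ssd/higlass/media2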

Just to be sure, do you get that same error if you don't mount /srv/ssd/higlass/media as an sshfs filesystem?

If I don't mount /srv/ssd/higlass/media, there is no error:

Pulling redis:5.0.3-alpine
done
Pulling latest image...
done
Data directory: /srv/ssd/higlass/data
Temp directory: ()
Starting... atlas 8989
Docker started: higlass-manage-container-atlas
sending request 1
Waiting to start (tilesets)...
sending request 2
Waiting to start (tilesets)...
sending request 3
Waiting to start (tilesets)...
sending request 4
Waiting to start (tilesets)...
sending request 5
Waiting to start (tilesets)...
sending request 6
Non 200 status code returned (502), waiting...
sending request 7
Non 200 status code returned (502), waiting...
sending request 8
Non 200 status code returned (502), waiting...
sending request 9
Non 200 status code returned (502), waiting...
sending request 10
Non 200 status code returned (502), waiting...
sending request 11
Non 200 status code returned (502), waiting...
sending request 12
Non 200 status code returned (502), waiting...
sending request 13
Non 200 status code returned (502), waiting...
sending request 14
Non 200 status code returned (502), waiting...
sending request 15
Non 200 status code returned (502), waiting...
sending request 16
Non 200 status code returned (502), waiting...
sending request 17
Non 200 status code returned (502), waiting...
sending request 18
Non 200 status code returned (502), waiting...
sending request 19
Non 200 status code returned (502), waiting...
sending request 20
Non 200 status code returned (502), waiting...
sending request 21
Non 200 status code returned (502), waiting...
sending request 22
Non 200 status code returned (502), waiting...
sending request 23
Non 200 status code returned (502), waiting...
sending request 24
Non 200 status code returned (502), waiting...
sending request 25
Non 200 status code returned (502), waiting...
sending request 26
Non 200 status code returned (502), waiting...
sending request 27
Non 200 status code returned (502), waiting...
sending request 28
Non 200 status code returned (502), waiting...
sending request 29
Non 200 status code returned (502), waiting...
sending request 30
Non 200 status code returned (502), waiting...
sending request 31
public_data: True
Replaced js file
Started

I got around the issue by modifying the source code of start.py to pass "privileged=True" to client.containers.run.

This allowed me to run the sshfs mount command from within the running docker container.
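
For reference, a minimal sketch of what that change amounts to, written against the docker Python SDK directly rather than the actual start.py code; the image tag, ports, and volume paths mirror the commands above, and everything else is illustrative:

import docker

client = docker.from_env()

# Launch the higlass container with privileged=True so that sshfs/FUSE
# mounts can be created inside it (the workaround described above).
client.containers.run(
    "higlass/higlass-docker:v0.6.34",
    ports={"80/tcp": 8989},
    volumes={
        "/srv/ssd/higlass/data": {"bind": "/data", "mode": "rw"},
        "/srv/ssd/higlass/media2": {"bind": "/media", "mode": "rw"},
        "/srv/ssd/higlass/tmp": {"bind": "/tmp", "mode": "rw"},
    },
    environment={"SITE_URL": "171.67.96.244"},
    name="higlass-manage-container-atlas",
    privileged=True,  # key change: allows mounting sshfs inside the container
    detach=True,
)

Once the container is up, the sshfs mount can then be created from inside it (e.g. via docker exec -it higlass-manage-container-atlas bash, assuming sshfs is available in the image).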

Oh, fantastic! Is there anything else to be done for this issue then?

I guess the workaround is OK, but it's definitely suboptimal to have to modify the source code and re-run the mount manually from within the running docker container. Is this not a common use case (i.e. the tracks to visualize are not located on the same server that's running higlass)? Would you consider a pull request to provide a "--privileged" flag to higlass-manage start so that there is no need to modify the start.py file to ingest remote tracks?

Is this not a common use case (i.e. the tracks to visualize are not located on the same server that's running higlass)?

It has certainly come up before. That's actually where the separate --media-dir option for higlass-server came from.

Would you consider a pull request to provide a "--privileged" flag to higlass-manage start so that there is no need to modify the start.py file to ingest remote tracks?

Absolutely! In fact, I'd be very grateful for it. Thanks for taking the time to investigate and make this work!!
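
For anyone picking this up, a rough sketch of the shape such a flag could take, not the actual higlass-manage code; the option is wired up with click and forwarded to docker-py, and all names here are illustrative:

import click
import docker

@click.command()
@click.option("--privileged", is_flag=True, default=False,
              help="Run the container in privileged mode (e.g. for sshfs/FUSE mounts).")
def start(privileged):
    # Forward the CLI flag straight through to docker-py's containers.run()
    client = docker.from_env()
    client.containers.run(
        "higlass/higlass-docker:v0.6.34",
        name="higlass-manage-container-default",
        privileged=privileged,
        detach=True,
    )

if __name__ == "__main__":
    start()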