tuna/tunasync

bug: shell file is not mapped into docker volume while using "command provider"

r00t1900 opened this issue · 4 comments

Env

  • tunasync: 0.8.0
  • tunasynctl: 0.8.0
  • tunasync-scripts: master@7817785
  • system: debian 10
  • arch: amd64
  • docker: 20.10.9
  • tunathu/bandersnatch: latest

Description

tunasync cannot run a custom shell script at its configured path:

tunasync worker -c worker.conf -v --debug:

[22-01-01 11:06:18][DEBUG][runner.go:53] volume: /tmp/tunasync/pypi:/tmp/tunasync/pypi
[22-01-01 11:06:18][DEBUG][runner.go:127] Command start: [docker run --rm -a STDOUT -a STDERR --name tunasync-job-pypi -w /tmp/tunasync/pypi -u 0:0 -v /tmp/tunasync/log/tunasync/pypi:/tmp/tunasync/log/tunasync/pypi -v /tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log:/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log -v /tmp/tunasync/pypi:/tmp/tunasync/pypi -e TUNASYNC_MIRROR_NAME=pypi -e TUNASYNC_WORKING_DIR=/tmp/tunasync/pypi -e TUNASYNC_UPSTREAM_URL=https://pypi.python.org/ -e TUNASYNC_LOG_DIR=/tmp/tunasync/log/tunasync/pypi -e TUNASYNC_LOG_FILE=/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log tunathu/bandersnatch:latest /home/scripts/pypi.sh]
[22-01-01 11:06:18][DEBUG][cmd_provider.go:145] set isRunning to true: pypi
[22-01-01 11:06:18][DEBUG][base_provider.go:168] calling Wait: pypi
[22-01-01 11:06:18][DEBUG][job.go:169] provider started
[22-01-01 11:06:18][DEBUG][worker.go:469] reporting on manager url: http://localhost:12345/workers/test_worker/schedules
[22-01-01 11:06:18][DEBUG][worker.go:448] reporting on manager url: http://localhost:12345/workers/test_worker/jobs/pypi
[22-01-01 11:06:18][DEBUG][worker.go:469] reporting on manager url: http://localhost:12345/workers/test_worker/schedules
[22-01-01 11:06:18][DEBUG][base_provider.go:165] set isRunning to false: pypi
[22-01-01 11:06:18][DEBUG][job.go:180] syncing done
[22-01-01 11:06:18][WARNIN][job.go:213] failed syncing pypi: exit status 127
[22-01-01 11:06:18][DEBUG][job.go:215] post-fail hooks

/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log:

root@tuna-docker-supported:~# cat /tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_20.log.fail
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "/home/scripts/pypi.sh": stat /home/scripts/pypi.sh: no such file or directory: unknown.
time="2022-01-01T11:20:33+08:00" level=error msg="error waiting for container: context canceled"

Analysis

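According to this debug output, the docker command does not map pypi.sh into the container's filesystem, which is most likely the cause of the "no such file or directory" error (and of the exit status 127 reported by the worker, which docker run returns when the container's command cannot be found).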
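
This kind of mistake can be caught before starting the worker by checking whether the command path falls under the container side of any volume mapping. The helper below is a minimal sketch, not part of tunasync; the name path_is_mounted is hypothetical, and it assumes volume specs in Docker's host:container or host:container:mode form with no colons inside the paths themselves.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of tunasync): succeed if the
# container-side path $1 lies under the container side of any volume spec
# given as "host:container" or "host:container:mode".
path_is_mounted() {
    local path="$1" spec rest target
    shift
    for spec in "$@"; do
        rest="${spec#*:}"     # strip the host part
        target="${rest%%:*}"  # strip an optional ":ro"/":rw" suffix
        target="${target%/}"  # ignore a trailing slash on the mount point
        if [[ "$path" == "$target" || "$path" == "$target"/* ]]; then
            return 0
        fi
    done
    return 1
}

# Volumes from the failing run above: none of them covers /home/scripts.
vols=(
    /tmp/tunasync/log/tunasync/pypi:/tmp/tunasync/log/tunasync/pypi
    /tmp/tunasync/pypi:/tmp/tunasync/pypi
)
if ! path_is_mounted /home/scripts/pypi.sh "${vols[@]}"; then
    echo "/home/scripts/pypi.sh is not visible inside the container"
fi
```

With a mapping such as /path/to/tunasync-scripts:/home/scripts:ro in the list, the same check succeeds.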

Solution

I tried appending -v /home/scripts/pypi.sh:/home/scripts/pypi.sh to the docker command and running it manually, and it works:

# The first -v line below is the added volume mapping for pypi.sh.
docker run --rm -a STDOUT -a STDERR --name tunasync-job-pypi -w /tmp/tunasync/pypi -u 0:0 \
-v /home/scripts/pypi.sh:/home/scripts/pypi.sh \
-v /tmp/tunasync/log/tunasync/pypi:/tmp/tunasync/log/tunasync/pypi \
-v /tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log:/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log \
-v /tmp/tunasync/pypi:/tmp/tunasync/pypi \
-e TUNASYNC_MIRROR_NAME=pypi \
-e TUNASYNC_WORKING_DIR=/tmp/tunasync/pypi \
-e TUNASYNC_UPSTREAM_URL=https://pypi.python.org/ \
-e TUNASYNC_LOG_DIR=/tmp/tunasync/log/tunasync/pypi \
-e TUNASYNC_LOG_FILE=/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log \
tunathu/bandersnatch:latest /home/scripts/pypi.sh

command output:

Syncing to /tmp/tunasync/pypi
2022-01-01 04:06:26,421 INFO: Selected storage backend: filesystem (configuration.py:128)
2022-01-01 04:06:26,421 INFO: Selected compare method: stat (configuration.py:174)
2022-01-01 04:06:26,740 INFO: Initialized project plugin allowlist_project, filtering ['tf-nightly-cpu'] (allowlist_name.py:31)
2022-01-01 04:06:26,744 INFO: Initialized project plugin blocklist_project, filtering [] (blocklist_name.py:27)
2022-01-01 04:06:26,800 INFO: Status file /tmp/tunasync/pypi/status missing. Starting over. (mirror.py:601)
2022-01-01 04:06:26,800 INFO: Syncing with https://pypi.python.org/. (mirror.py:56)
2022-01-01 04:06:26,800 INFO: Current mirror serial: 0 (mirror.py:267)
2022-01-01 04:06:26,800 INFO: Syncing all packages. (mirror.py:282)
2022-01-01 04:06:43,845 INFO: Package 'tf-nightly-cpu' is allowlisted (allowlist_name.py:88)
2022-01-01 04:06:43,955 INFO: Trying to reach serial: 12451048 (mirror.py:299)
2022-01-01 04:06:43,955 INFO: 1 packages to sync. (mirror.py:301)
2022-01-01 04:06:43,978 INFO: No metadata filters are enabled. Skipping metadata filtering (mirror.py:75)
2022-01-01 04:06:43,978 INFO: No release filters are enabled. Skipping release filtering (mirror.py:77)
2022-01-01 04:06:43,978 INFO: No release file filters are enabled. Skipping release file filtering (mirror.py:79)
2022-01-01 04:06:43,981 INFO: Fetching metadata for package: tf-nightly-cpu (serial 12447857) (package.py:57)
2022-01-01 04:06:44,648 INFO: Downloading: https://files.pythonhosted.org/packages/46/2a/07af15a0d8ca3f75a53621dab60f92f72d7046c511dbeeee303cb947b187/tf_nightly_cpu-2.7.0.dev20210701-cp36-cp36m-macosx_10_14_x86_64.whl (mirror.py:933)

Further

  • Why do we need to add this mapping manually? How is the current mirror site run? Is this a bug or not?
  • Can we simply change the upstream from https://pypi.org to https://pypi.tuna.tsinghua.edu.cn? We tried this to speed up mirroring, but received the following error:
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/master.py", line 216, in rpc
    return await method()
  File "/usr/local/lib/python3.9/site-packages/aiohttp_xmlrpc/client.py", line 121, in __remote_call
    return self._parse_response((await response.read()), method_name)
  File "/usr/local/lib/python3.9/site-packages/aiohttp_xmlrpc/client.py", line 82, in _parse_response
    response = etree.fromstring(body, parser)
  File "src/lxml/etree.pyx", line 3252, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1912, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1800, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 7
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: meta line 5 and head, line 7, column 8

worker.conf:

[global]
name = "test_worker"
log_dir = "/tmp/tunasync/log/tunasync/{{.Name}}"
mirror_dir = "/tmp/tunasync"
concurrent = 10
interval = 1

[docker]
enable = true

[manager]
api_base = "http://localhost:12345"
token = ""
ca_cert = ""

[cgroup]
enable = false
base_path = "/sys/fs/cgroup"
group = "tunasync"

[server]
hostname = "localhost"
listen_addr = "127.0.0.1"
listen_port = 6000
ssl_cert = ""
ssl_key = ""

[[mirrors]]
name = "pypi"
provider = "command"
upstream = "https://pypi.tuna.tsinghua.edu.cn/"
command = "/home/scripts/pypi.sh"
docker_image = "tunathu/bandersnatch:latest"
interval = 5

manager.conf:

debug = false

[server]
addr = "127.0.0.1"
port = 12345
ssl_cert = ""
ssl_key = ""

[files]
db_type = "bolt"
db_file = "/tmp/tunasync/manager.db"
ca_cert = ""

/home/scripts/pypi.sh:

#!/bin/bash
set -e
BANDERSNATCH=${BANDERSNATCH:-"/usr/local/bin/bandersnatch"}
TUNASYNC_UPSTREAM=${TUNASYNC_UPSTREAM_URL:-"https://pypi.tuna.tsinghua.edu.cn/"}
CONF="/tmp/bandersnatch.conf"
INIT=${INIT:-"0"}

if [ ! -d "$TUNASYNC_WORKING_DIR" ]; then
        mkdir -p $TUNASYNC_WORKING_DIR
        INIT="1"
fi

echo "Syncing to $TUNASYNC_WORKING_DIR"

if [[ $INIT == "0" ]]; then
(
        cat << EOF
[mirror]
directory = ${TUNASYNC_WORKING_DIR}
master = ${TUNASYNC_UPSTREAM}
json = true
timeout = 300
workers = 5
hash-index = false
stop-on-error = false
delete-packages = true
compare-method = stat

[plugins]
enabled =
    blocklist_project
    allowlist_project

[allowlist]
packages =
    tf-nightly-cpu
EOF
        for i in $PYPI_EXCLUDE; do
                echo "    $i"
        done
) > $CONF
        exec $BANDERSNATCH -c $CONF mirror 
else
        cat > $CONF << EOF
[mirror]
directory = ${TUNASYNC_WORKING_DIR}
master = ${TUNASYNC_UPSTREAM}
json = true
timeout = 15
workers = 10
hash-index = false
stop-on-error = false
delete-packages = false
EOF

        exec $BANDERSNATCH -c $CONF mirror
fi

Thanks for viewing.

Your analysis is correct. This is intended behavior rather than a bug, because tunasync does not know how to set up the mapping. The script configured in the command field is executed inside the Docker container, so it must either be built into the image or mapped in from another location. The mapping can be declared once in the [docker] section so that it does not have to be repeated for every mirror; extra per-mirror mappings go in docker_volumes. For example:

[docker]
volumes = [
        "/path/to/tunasync-scripts:/home/scripts:ro",
]

[[mirrors]]
name = "foo"
provider = "command"
upstream = "xxxxx"
command = "/home/scripts/foo.sh"
docker_image = "foo_image:latest"
docker_volumes = [
  "/path/to/additional_volume1:/path/to/mountpoint:ro",
  "/path/to/additional_volume2:/path/to/mountpoint2:ro"
]
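
The effect of declaring the scripts volume once in [docker] can be illustrated as follows: every mirror receives the global volumes plus its own docker_volumes, each turned into a -v argument of docker run. This is a sketch of the idea only, not tunasync's code; the names GLOBAL_VOLUMES and volume_args are hypothetical, and emitting global volumes before per-mirror ones is an assumption.

```bash
#!/usr/bin/env bash
# Illustrative sketch only (not tunasync's implementation): every mirror gets
# the global [docker] volumes, followed by its own docker_volumes.
# The ordering (global first) is an assumption.
GLOBAL_VOLUMES=("/path/to/tunasync-scripts:/home/scripts:ro")

# Print one docker argument per line: "-v", then the volume spec.
volume_args() {
    local v
    for v in "${GLOBAL_VOLUMES[@]}" "$@"; do   # "$@" = this mirror's docker_volumes
        printf '%s\n' -v "$v"
    done
}

# Mirror "foo" from the example: one global plus two per-mirror volumes.
mapfile -t foo_args < <(volume_args \
    "/path/to/additional_volume1:/path/to/mountpoint:ro" \
    "/path/to/additional_volume2:/path/to/mountpoint2:ro")
# Print (rather than run) the resulting command line.
echo docker run --rm "${foo_args[@]}" foo_image:latest /home/scripts/foo.sh
```

A mirror with no docker_volumes still gets the scripts mount, which is why /home/scripts/foo.sh resolves inside the container.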

Bandersnatch relies on the XML-RPC interface provided by the official pypi.org, so it cannot sync the PyPI repository from an alternative source (the lxml tag-mismatch error above is what happens when the XML-RPC call receives an HTML page instead). However, its latest release adds a new option, download-mirror, which fetches package metadata from the RPC interface on pypi.org and the actual package files from an alternative source.
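
Applied to the pypi.sh above, this means keeping master on https://pypi.org and putting the faster host into download-mirror. The function below is a minimal sketch under stated assumptions: it assumes a bandersnatch release that supports download-mirror, DOWNLOAD_MIRROR is a hypothetical variable, and whether a given mirror serves files under the same /packages/ paths as files.pythonhosted.org should be verified before relying on it. The remaining keys are copied from the original script.

```bash
#!/usr/bin/env bash
# Sketch (assumptions: a bandersnatch release that supports download-mirror;
# DOWNLOAD_MIRROR is a hypothetical variable): metadata from pypi.org,
# package files from an alternative mirror.
write_bandersnatch_conf() {
    local conf="$1"
    local workdir="${TUNASYNC_WORKING_DIR:-/tmp/tunasync/pypi}"
    local download_mirror="${DOWNLOAD_MIRROR:-https://pypi.tuna.tsinghua.edu.cn}"
    # master stays on the official index, which provides the XML-RPC API;
    # only package downloads are redirected to download-mirror.
    cat > "$conf" << EOF
[mirror]
directory = ${workdir}
master = https://pypi.org
download-mirror = ${download_mirror}
json = true
timeout = 300
workers = 5
hash-index = false
stop-on-error = false
delete-packages = true
compare-method = stat
EOF
}

conf="$(mktemp)"
write_bandersnatch_conf "$conf"
cat "$conf"
rm -f "$conf"
```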

Thank you for replying. This really helps a lot, bravo!

Thank you, your reasoning and steps were both right. Problem solved :)