builder does not detect finished image in subfolder
jafaruddinlie opened this issue · 50 comments
Links
- container collection: https://singularity-hub.org/collections/4402
- GitHub repository or recipe file: https://github.com/jafaruddinlie/shub/blob/master/vardict/Singularity.vardict_1
Version of Singularity
Local: 3.5.3
shub: singularity-builder-3.4.1-100GB
Behavior when Building Locally
Builds fine.
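(For reference, the local build was invoked roughly like this; the exact command is an assumption based on the recipe path above.)
sudo singularity build vardict.sif vardict/Singularity.vardict_1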
Error on Singularity Hub
The build looks OK but exited with this error:
INFO:    Adding labels
WARNING: Label: APPLICATION_NAME already exists and force option is false, not overwriting
WARNING: Label: APPLICATION_VERSION already exists and force option is false, not overwriting
WARNING: Label: MAINTAINER_NAME already exists and force option is false, not overwriting
WARNING: Label: MAINTAINER_EMAIL already exists and force option is false, not overwriting
INFO:    Adding environment to container
INFO:    Adding runscript
INFO:    Creating SIF file...
INFO:    Build complete: /root/build/container.sif
ERROR Final image does not exist.
What do you think is going on?
Not really sure - maybe the version of Singularity is not supported?
Could you please attach the entire log here as a text file?
I don't see the end message about the finish time, which concerns me - perhaps the builder was killed because of memory or filesystem limits. Could you please run again and:
- enable debug mode in your collection settings
- report the total build time on Singularity hub
- report the memory and build time on your local machine
And then again include the log here! It’s after midnight so I’m off to bed but I’ll take a look tomorrow. Err, today but later :)
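For the local memory and build time, one way to capture both is GNU time (a sketch; the recipe filename and output name are assumptions):
sudo /usr/bin/time -v singularity build vardict.sif Singularity.vardict_1
# "Elapsed (wall clock) time" gives the build time,
# "Maximum resident set size" gives an approximate peak memory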
Hope you had a good night's sleep!
Here's the log as requested (debug mode turned on)
I didn't manage to note down the time it takes to build on shub, but on my local machine it is around 30 minutes; the container is 3.7GB, and memory used during the build is around 1GB.
singularity-build-log-jafaruddinlie_shub-June 4, 2020, 2_35 a.m..txt
Thank you! The build time looks OK, similar to that on your host:
Start Time: Thu Jun 4 07:07:44 UTC 2020.
End Time: Thu Jun 4 07:35:30 UTC 2020
I'll need to debug this interactively, if not tomorrow then over the weekend (it's already after dinner time the next day since you originally posted, so time to relax!)
hey @jafaruddinlie ! I've tested your build, and the container does complete successfully. There are two issues that I found (that we can debug separately) to figure out which is leading to the failure. The first is the container test. To test the container, we execute the "ls" command. However, your container doesn't seem to have an ls:
Singularity 64ae85d53aaa966cc99ad7793127893992a82bf8617936fb0923c3aaa6270919.sif:/> ls
bash: ls: command not found
So even if there is an issue before that, the test would fail at this step. Do you know why your image doesn't have ls? The next issue is a potential bug with the path (oy vey) for the builder, and I'm not sure why it hasn't happened before. Your resulting image gets placed in the same directory as the recipe file, but is looked for one directory above it (see the illustration below). So, to test this, would you mind trying to put the recipe one folder up (in the root of the repo)? If that turns out to be the issue, and you figure out the test command with ls to get it running, then I'll need to create a new builder with the fix (and then ask you to test). In summary:
- figure out why your container can't run an ls
- try a recipe build from the root
Thanks!
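To illustrate the suspected path bug (the paths and the check below are an illustration, not the builder's actual code):
# the SIF is written next to the recipe, inside the subfolder:
ls vardict/container.sif              # exists
# but the final check looks one directory above, where nothing was written:
test -f container.sif || echo "ERROR Final image does not exist."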
- I found out that one of the export PATH lines had a typo in the %environment section (see the sketch below). I've fixed this and re-uploaded the Singularity file, but it still has the same error.
- Same updated recipe, built from the root, works.
Both logs are attached.
singularity-build-log-jafaruddinlie_shub-vardict_notroot_dir.txt
singularity-build-log-jafaruddinlie_shub-vardict_rootdir..txt
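For reference, the corrected export in %environment looks roughly like this (the path is a placeholder, not the actual value from the recipe):
%environment
    export PATH=/opt/vardict/bin:$PATH    # placeholder path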
Great! So we know the exact bug now. I update builders when there is a major release of Singularity and it coincides with a server restart, so unfortunately that won't be any time soon. I'll rename this issue to reflect that the image path is not detected when the recipe is in a subfolder, and in the meantime you'll have to do builds with the recipe in the root. Thanks for reporting this issue!
@jafaruddinlie if you could, might we be able to keep the recipe around for use when I develop the builder? I can ping you again when that time comes to keep you updated about the process.
Yep, not a problem!
I can confirm this problem as well. It seems to be an issue with the newer builder only, as I had a recipe that built successfully with the 2-5 builder. I moved to trying the 3-4-2 builder and could not get an error-free build. When I found this issue, I moved my recipe to the top directory of my GitHub repo and the build was successful.
Yep thanks for reporting! I’ll be able to update the builder to address this bug for the next round of server work. In the meantime, your approach to move the recipe to root is what I suggest.
hey @jafaruddinlie and @singular55 - I took a look at the server and at the work to update to the newer Singularity 3.6.1, and that particular task is substantial enough that I'm going to wait for the larger server refactor closer to the winter. However, that doesn't mean we can't fix this issue for the current latest version on shub, which is 3.4.2! So I have prepared a fixed image, singularity-builder-v3-4-2, that should be able to handle subfolder recipes, and I plan to roll it out for the next round of server work, scheduled for two weeks from today, Friday August 21st. What I'll do then is make this image available as an option for your collection, and if it works for you, I'll ping you on here to test the images! If the test cases (building from root and from a subfolder) are good, we can remove the old builder and make this one the default. Thanks in advance for your help! <3
hey @jafaruddinlie and @singular55, I have a development builder for you to test! If you go to your collection settings, there should be a new entry (the last one in the list) for a builder with version 3.4.2 (no size mentioned). Could you please give this a test with 1) a recipe in the base of the repository (to make sure nothing was broken) and 2) a recipe in a subfolder? Please take your time!
@singular55 I think you attached it to an email (which doesn't come through) could you show it here? I also deleted the various headers / other email bits that probably shouldn't be on here.
Also could you please make sure to have debug turned on in your collection settings? It might not add additional info, but just in case, it can be helpful.
Funny thing, when you say "show it here", that comes through in the email as well...
Here's a paste inline. Debug is already on.
Start Time: Fri Aug 28 15:21:03 UTC 2020.
Cloning into '/tmp/tmpa5bl_laj'...
warning: redirecting to https://github.com/singular55/container01.git/
Switched to a new branch 'sing_353_shub'
Branch 'sing_353_shub' set up to track remote branch 'sing_353_shub' from 'origin'.
Return value of 137.
Killed: Fri Aug 28 17:21:03 UTC 2020.
This branch builds with other builders, IIRC.
You're right! The log was so small I thought it was email signature leftovers :) Thanks for the report, I'll take a look when I can clear up some time.
Yeah, looks like it hung up right away. Path seems wrong in the GitHub link, now that I look. Thanks!
@singular55 it looks like the build has an extensive setup section, and that there wasn't output / change for 2 hours. Do you have a simple recipe you could test?
I'm working on a new one now that looks like it built successfully, I'll double check and see if that one works.
Normally the recipe I ran earlier is about a 20-25min build on shub, so 2 hours wouldn't be normal (I guess unless one of the wget's is non-responsive for some reason.)
@vsoch Yes, it works with the smaller recipe, both from root and from a subdir!
Awesome! 😎 Just curious, what is your reasoning for doing so much of the build in %setup instead of %post? I think we might have seen the full output of the timeout or other issue if it had been done there.
Good question. I'm not really a Singularity expert, but I thought from my initial attempts that package installs that copied files from outside sources only worked in %setup for me. Should they work after the yum install instructions in %post? I recall having problems with that and the wget download and extract steps.
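For reference, a download-and-extract step done entirely in %post (after the yum installs) would look roughly like this; the URL is a placeholder, not one from the actual recipe:
%post
    yum install -y wget tar gzip
    # download and extract inside the container at build time (placeholder URL)
    wget https://example.com/tool.tar.gz -O /tmp/tool.tar.gz
    tar -xf /tmp/tool.tar.gz -C /opt
    rm /tmp/tool.tar.gz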
Hi @vsoch, I can confirm both builds work fine with the 3.4.2 builder that you set for us.
Awesome @jafaruddinlie! @singular55 I’ll take a look at your recipe tomorrow and test building locally, and also give a shot at adding those sections to post. Have a good evening (morning? afternoon?) everyone!
hey @singular55! I built your container like this:
Bootstrap: docker
From: centos:7
%labels
MAINTAINER singular55
%environment
LANG=C.UTF-8
# couldn't change LC_ALL on target
#LC_ALL=C.UTF-8
PATH=/bin_override:$PATH
LIBRARY_PATH=/lib_override:$LIBRARY_PATH
LD_LIBRARY_PATH=/lib_override:$LD_LIBRARY_PATH
#WORKDIR=/work
WRITEABLE=~/Container_Writeable
#export LC_ALL LANG PATH LIBRARY_PATH LD_LIBRARY_PATH WORKDIR
export LANG PATH LIBRARY_PATH LD_LIBRARY_PATH WRITEABLE
%files
eclipse.ini /eclipse.ini
eclipse-parallel.ini /eclipse-parallel.ini
%post
mkdir -p /lib_override
mkdir -p /bin_override
#mkdir -p /work
yum -y install epel-release
yum repolist
yum install -y git meld wget kdiff3 firefox
# mysql uses libnuma
yum install -y numactl-libs
# fix some X / DBus issues?
dbus-uuidgen > /var/lib/dbus/machine-id
## Eclipse for Scientific Computing
# https://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/2020-03/R/eclipse-parallel-2020-03-R-linux-gtk-x86_64.tar.gz
wget http://ftp.osuosl.org/pub/eclipse/technology/epp/downloads/release/2020-03/R/eclipse-parallel-2020-03-R-linux-gtk-x86_64.tar.gz -O eclipse-parallel.tar.gz
tar -xf eclipse-parallel.tar.gz -C /bin_override
rm eclipse-parallel.tar.gz
# TODO - ini file
cp /eclipse-parallel.ini /bin_override/eclipse/eclipse.ini
mv /bin_override/eclipse /bin_override/eclipse-parallel
# eclipse
wget http://ftp.osuosl.org/pub/eclipse/technology/epp/downloads/release/2019-09/R/eclipse-jee-2019-09-R-linux-gtk-x86_64.tar.gz -O eclipse.tar.gz
tar -xf eclipse.tar.gz -C /bin_override
rm eclipse.tar.gz
cp /eclipse.ini /bin_override/eclipse/
## agraph
# http://franz.com/ftp/pri/acl/ag/ag6.4.0/linuxamd64.64/agraph-6.4.0-linuxamd64.64.tar.gz
wget http://franz.com/ftp/pri/acl/ag/ag6.4.0/linuxamd64.64/agraph-6.4.0-linuxamd64.64.tar.gz -O agraph.tar.gz
tar -xf agraph.tar.gz -C /bin_override
rm /agraph.tar.gz
## tomcat
# https://archive.apache.org/dist/tomcat/tomcat-8/v8.5.47/bin/apache-tomcat-8.5.47.tar.gz
wget https://archive.apache.org/dist/tomcat/tomcat-8/v8.5.47/bin/apache-tomcat-8.5.47.tar.gz -O tomcat.tar.gz
tar -xf tomcat.tar.gz -C /bin_override
# Apache installed as u:root g:root with no group or other permissions. For us to run apache from the
# container we need other permissions, it looks like.
chmod -R o+rx /bin_override/apache-tomcat-8.5.47
rm tomcat.tar.gz
## mysql - full install
# https://dev.mysql.com/downloads/file/?id=495278 - login
# https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.20-linux-glibc2.12-x86_64.tar.xz
wget https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.20-linux-glibc2.12-x86_64.tar.xz -O mysql.tar.xz
tar -xf mysql.tar.xz -C /bin_override
rm mysql.tar.xz
## lite/minimal mysql
wget https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.20-linux-x86_64-minimal.tar.xz -O mysql.tar.xz
tar -xf mysql.tar.xz -C /bin_override
rm mysql.tar.xz
#eclipse jee
##wget http://ftp.osuosl.org/pub/eclipse/technology/epp/downloads/release/2019-09/R/eclipse-jee-2019-09-R-linux-gtk-x86_64.tar.gz -O /tmp/eclipse.tar.gz
#tar -xf /tmp/eclipse.tar.gz -C /opt
#tar -xf /tmp/eclipse.tar.gz -C ${SINGULARITY_ROOTFS}
# ~ is /root for singularity hub
##tar -xf /tmp/eclipse.tar.gz -C ~
##rm /tmp/eclipse.tar.gz
#moved from /opt/eclipse
##cp eclipse.ini ~/eclipse/
%files
#eclipse.ini /opt/eclipse/
eclipse.ini eclipse.ini
eclipse-parallel.ini eclipse-parallel.ini
%runscript
#exec /bin/echo "Hi there, container runscript!"
#exec /usr/bin/meld
mkdir -p ${WRITEABLE}
touch ${WRITEABLE}/HiThere
/bin/echo "Config files should go in ${WRITEABLE}."
%apprun meld
exec meld "$@"
%apprun firefox
exec firefox "$@"
%apprun eclipse
exec /bin_override/eclipse/eclipse "$@"
%apprun kdiff3
exec kdiff3 "$@"
%apprun eclipse-parallel
exec /bin_override/eclipse-parallel/eclipse "$@"
When I tested your recipe before the change, it also exited with 127 because wget could not be found. I have it on my machine, so it's likely not available to the build at that step.
Thanks to you both for testing this out! I'm going to close the issue - there is a lot of server work to do at the end of the year, but hopefully this should hold over until then.
Thanks for the tips! I recall having some trouble like that in the past. I wonder if the newer 3.x versions of Singularity have changed behavior somewhat.
I have seen changed behavior - it used to be that you could copy files from the host to /tmp and have it work. I was testing old recipes from a few years back and the files were no longer found there, so I wonder if that could have been the issue. It also seems that the host's software is not accessible to the build, which makes sense, because otherwise the build could escalate privileges and do something on the host.
Yes, that's what I saw; I think that was the issue. In 2.5, I needed wget and had to install it before using it, but I could not get wget to run in the same section after the 'yum install'. Sounds like an improvement if that works now. Thanks for that. I have not made changes to the recipe since starting to use 3.x versions of the builder/HPC installs (which was only recently).
Hi @vsoch. I am having the same issue as described in #221, which pointed me here. I have read through the conversation but am still unsure what the solution is. Could you help me please? The definition file can be found here, and the builder log is attached below. I am simply trying to build a container with a bunch of Python packages. Both singularity-builder-3-4-2-100gb and singularity-builder-3-2-1-100gb-private give the same error, and for some unknown reason the build took more than two hours and was terminated when I use singularity-builder-v3-4-2.
singularity-build-log-HERA-Team_hera-rtp-singularity-Sept. 12, 2020, 10 10 a.m..txt
You'll want to use the last builder (the one you report taking more than 2 hours). If that's indeed the case, it's likely running low on memory when converting to SIF, and you won't be able to use Singularity Hub for such a large image.
Let me clarify. The build takes about 20 minutes on my laptop and the output image is about 1.4GB. The same definition file takes an extremely long time to build with singularity-builder-v3-4-2 on Singularity Hub. By the time the build was terminated after two hours, it had not finished installing Ubuntu packages, which I found really strange.
How much memory does your laptop have?
And could you please include the log for the build that times out?
Here is the build log for the build that times out.
singularity-build-log-HERA-Team_hera-rtp-singularity-Sept. 12, 2020, 3 40 p.m..txt
And my laptop has 8GB of memory.
That should be comparable then! Let's try a few things to debug, because it shouldn't just hang like that:
- Remove apt-get clean, in case this removes something needed in /tmp.
- Remove any sections of the build recipe that are empty / that you don't use.
- Try installing from ubuntu:18.04 from docker instead of library (see the sketch below).
- If those don't give insight, try removing the vim install to see if it times out on something else.
The only insight we have is that it's hanging on something, so we need to figure out what.
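For the third suggestion, the header change is just (a sketch; the rest of the recipe stays the same):
Bootstrap: docker
From: ubuntu:18.04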
I have tried all of the above, including switching from Ubuntu to CentOS, with no success. The builds always stop somewhere while installing OS packages. Here is the debug log of the CentOS recipe.
Is it possible that there is an issue with my account? Are there some good definition files that are guaranteed to build that I could use for testing?
There aren't any differences between accounts - the only customization you can do is to specify the builder (which you already know about!)
The way I'd debug this is to start bare bones - literally just have your recipe like this:
Bootstrap: docker
From: centos:8
And then slowly add one command at a time. Your recipe is hugely complex, and what we need to do is figure out the exact line that is triggering the timeout. Once we know that, we'll have something to work with!
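For example, a first increment might add a single %post step (an illustration only, not a prescribed order):
Bootstrap: docker
From: centos:8

%post
    yum install -y wget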
Hi. I did several more tests.
- A barebones Ubuntu recipe, i.e. just copying the OS image, builds fine as expected.
- Copying the OS and updating the package list with apt-get update also builds fine, as it should.
- Installing packages with apt-get install gives confusing errors:
  - It builds fine when git, wget (and vim) were installed. osonly-test log
  - It times out with return code 1 if only git or wget was installed. I did not try installing other packages. git-test log
- Adding a wget ... line after apt-get install git wget vim times out with return code 137, but I am unsure if the cause is wget, because the debug log seems to terminate during apt-get install. conda-test log
All definition files mentioned can be found in this repo, and all tests were built with the singularity-builder-v3-4-2 builder.
At this point I am very much dumbfounded by the inconsistency. Is there a way to track the build in more detail in the debug log, such as making it print each executed command? I tried adding echo ... lines, but they don't show up in the build log.
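One way to get per-command tracing is to turn on shell xtrace at the top of %post (a sketch; this assumes the builder's log captures the shell's stderr):
%post
    set -x                       # echo each command as it is executed
    apt-get update
    apt-get install -y git wget vim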
Did you verify it's building the commit you think it's building? You should see all print statements in the log (e.g., your echos).
I have noticed that errors and operations seem to be interleaved in the build logs with the latest builder. At least from my experience it looks like a failure will occur higher in the log than the end of the log file, and installer operations will still be shown after that point until the build ends. Look for your echo statements to be embedded in the log earlier than you expected and see if you can find them.
@vsoch Hi. Yes, I double checked and it was building the right commit. However, the echo statements do not appear in the log.
@piyanatk please point me to the exact commit (recipe) and the builder you are using, and I'll try to reproduce your error. I'm not sure how else to help.
@vsoch Please see the recipe here: https://github.com/HERA-Team/hera-rtp-singularity/blob/287f83e344e81462ba5bc1b5bf0d16593b788f8d/hera-rtp/Singularity.hera-rtp-ubuntu-conda2
And here is the error log that I got using the singularity-builder-v3-4-2 builder.
singularity-build-log-HERA-Team_hera-rtp-singularity-Sept. 20, 2020, 5 05 a.m.(1).txt
hey @piyanatk - I've done the build with:
- the older 2.5.1 and that worked
- 3.2.1 also worked.
- 3.4.2 (the newer builder): I reproduced the issue with the same recipe, and indeed it hangs when unpacking wget. While on the server I updated the recipe to use 16.04 and it worked, but it didn't reproduce when I ran it again. However, the previous builder for 3.4.2 (the one with the 100gb size) works like a charm.
For the hanging builder, I don't have insight for you here, but there are records of this sort of thing happening - if you have ideas for something I could try I'd be happy to, but figuring out the specifics is beyond the level of support that I can offer, at least until we have many more reports of this issue.
Why do you not want to use any of the working builders? The older 3.4.2, as long as the recipe is in the root, works.
If you want to try the Sylabs library that's also an option, as is Google Cloud Build. You can also build a docker container and pull down to Singularity, either via Docker Hub or Quay.io (my goto choice typically). Good luck!
Hi @vsoch. Thank you for checking on this! I think when I tried the older builder, the image was not saved (exactly this issue, I think), and you suggested that I use the newer builder. I will give 3.2.1 a try and report back.
Unfortunately I do not have any insight, as I am a novice on this myself. I am just trying to get a container built for the collaboration that I am involved with so that we can test it on a cluster.
You might consider an automated build to Docker Hub or Quay and then pulling down to Singularity, e.g., for the repository vanessa/salad:
singularity pull docker://vanessa/salad
This is especially useful for development containers that warrant many builds a day, as Singularity Hub is more intended to build final / "I want to publish this" containers.
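As a sketch of that workflow, with placeholder image names:
# build and push a Docker image, then pull it down as a SIF
docker build -t myuser/mycontainer:latest .
docker push myuser/mycontainer:latest
singularity pull docker://myuser/mycontainer:latest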