Installation issues
Closed this issue · 23 comments
With v2020-08-04, I encountered several installation issues after following the native install guide by cloning, running `sudo make deps-ubuntu` followed by `make all`:
- `sudo make deps-ubuntu` creates `.../ocrd_all/.git` and `~/.parallel` with root permissions, requiring manually setting the correct file ownership via `chown -R user:user`
- When I now run `parallel --citation` followed by `will cite` and re-trigger the build via `make all`, it hangs immediately at the first line with `sem --fg --id ocrd_all git submodule sync cor-asv-ann`
- After deactivating `cor-asv-ann` and continuing with `make all` I discovered that all modules are empty (`*** No rule to make target 'install'.`) and need to be individually cloned via `git submodule update --init` and then again `make all` to arrive at executable modules.
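The ownership problem from the first point can be checked before re-running the build. The following is only a minimal sketch, assuming bash and a POSIX `find`; the helper name `list_owned_by` is hypothetical and not part of ocrd_all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path under the given directories
# that is owned by the given user (name or numeric ID).
list_owned_by() {
    local owner="$1"
    shift
    find "$@" -user "$owner" -print 2>/dev/null
}

# Possible usage (not executed here): show what a sudo run left behind,
# then hand it back to the regular user.
#   list_owned_by root ~/.parallel /path/to/ocrd_all/.git
#   sudo chown -R "$(id -un):$(id -gn)" ~/.parallel /path/to/ocrd_all/.git
```

Listing first and changing ownership second keeps the `chown -R` limited to paths that actually need it.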
Thanks @bertsky. When I apply the correct permissions and start the build from your fix-sub-venv branch with `sudo make deps-ubuntu`, it hangs for me again at `sem --fg --id ocrd_all_git git submodule sync core` though.
> When I apply the correct permissions and start the build from your fix-sub-venv branch with `sudo make deps-ubuntu`, it hangs for me again at `sem --fg --id ocrd_all_git git submodule sync core` though.

Strange! This indicates there still is a problem with `parallel`.
Sorry, I need your help:
- Does `echo 1 | parallel echo` print "1" (and nothing else) for you?
- What does `~/.parallel` look like?
- Does `sudo bash -c "echo ~"` show your user's home, or `/root`?
> Does `echo 1 | parallel echo` print "1" (and nothing else) for you?

Yes, just `1` and nothing else.
> What does `~/.parallel` look like?

drwxr-xr-x 4 cnd cnd 4096 Aug 14 17:01 .
drwxr-xr-x 47 cnd cnd 4096 Aug 17 14:06 ..
drwxr-xr-x 3 cnd cnd 4096 Aug 17 14:05 semaphores
drwxr-xr-x 2 cnd cnd 4096 Aug 14 16:50 tmp
-rw-r--r-- 1 cnd cnd 0 Aug 14 17:01 will-cite
> Does `sudo bash -c "echo ~"` show your user's home, or `/root`?

It prints my user homedir (`/home/cnd`).
Thx for your help!
Thanks – this looks good regarding "registration" state.
And is there anything under `~/.parallel/semaphores` (while `sudo make deps-ubuntu` hangs)?
If so, then for each of these directories, you should see the PID of the process in this semaphore in one of the file names, so you could `pstree -as PID` them. We are looking for another process that might block the semaphore. (Otherwise I am currently out of ideas why the semaphore cannot be entered.)
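The check described above can be scripted. This is a rough sketch rather than an official tool: it assumes the layout seen in this thread (`~/.parallel/semaphores/id-NAME/PID@HOST`), and the function name `list_sem_holders` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every holder file under a GNU parallel
# semaphore directory and report whether its PID is still running here.
# Assumption: holder files are named PID@HOST, as seen in this thread.
list_sem_holders() {
    local sem_root="${1:-$HOME/.parallel/semaphores}"
    local entry name pid
    [ -d "$sem_root" ] || return 0
    for entry in "$sem_root"/*/*@*; do
        [ -e "$entry" ] || continue              # glob matched nothing
        name=$(basename "$entry")
        pid=${name%%@*}                          # part before the first '@'
        case "$pid" in ''|*[!0-9]*) continue ;; esac
        # kill -0 only probes the process; /proc covers PIDs of other users
        if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
            echo "$entry alive"
        else
            echo "$entry dead"
        fi
    done
}
```

Any entry reported as `alive` is a candidate for `pstree -as PID`; an entry reported as `dead` points to a leftover from an earlier run.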
When I trigger another run of `sudo make deps-ubuntu`, `id-ocrd_all_git` emerges under `~/.parallel/semaphores`. Within that, I have `10778@XYZ` - so I assume `10778` is the PID here?
`pstree -as 10778` prints a long tree, what should I be looking for here exactly?
> When I trigger another run of `sudo make deps-ubuntu`, `id-ocrd_all_git` emerges under `~/.parallel/semaphores`. Within that, I have `10778@XYZ` - so I assume `10778` is the PID here?

Yes, exactly.
> `pstree -as 10778` prints a long tree, what should I be looking for here exactly?

It's fine as long as there is no other process for our sema name `id-ocrd_all_git`. So there should be a `git` beneath `sem` at the leaf.
Now the question becomes why that `git` process is blocking. Could you show me `cat /proc/PID/wchan` of that child PID?
> So there should be a `git` beneath `sem` at the leaf.

Sorry, not sure I can follow... it looks like this (missing `git` leaf?):
[...]
│ │ ├─zsh
│ │ │ └─sudo make deps-ubuntu
│ │ │ └─make deps-ubuntu
│ │ │ └─perl /usr/bin/sem --fg --id ocrd_all_git git submodule sync core
[...]
The output of `cat /proc/10778/wchan` tells me there is "No such file or directory"...
> it looks like this (missing `git` leaf?)

Yes! This means the problem is on `parallel`'s part. But you saw no other users of that semaphore.
This might be an instance of this bug. What does `sem --version` say?
> The output of `cat /proc/10778/wchan` tells me there is "No such file or directory"...

Probably not Linux, but never mind (see above).
> But you saw no other users of that semaphore.

Correct.
Output of `sem --version`:
GNU parallel 20161222
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.
I'm on regular Ubuntu 18.04 btw.
> Output of `sem --version`: `GNU parallel 20161222`

Oh, ok. I have the same version, but cannot reproduce.
I am running out of ideas. Could you please try `make deinit` to remove all submodules, and then try `sudo make --trace deps-ubuntu | tee build.log` again?
After running `make deinit`, `sudo make --trace deps-ubuntu | tee build.log` fails with `make: *** No rule to make target 'deps-ubuntu'. Stop.`
I guess I will just wait until #145 is merged and try again from scratch...
> fails with `make: *** No rule to make target 'deps-ubuntu'. Stop.`

Sorry, didn't anticipate that either! I'd be very much interested in that build.log I must say ...
Tried again from a fresh clone of fix-sub-venv, same issue again. It halts at `sem --fg --id ocrd_all_git git submodule sync core`.
I then `ctrl+c` that, run `make deinit` which returns me
make -C ocrd_olena clean-olena BUILD_DIR=/home/cnd/tmp/dev/ocrd/ocrd_all/venv/build/ocrd_olena
make[1]: Entering directory '/home/cnd/tmp/dev/ocrd/ocrd_all/ocrd_olena'
make[1]: No rule to make target 'clean-olena'. Stop.
make[1]: Leaving directory '/home/cnd/tmp/dev/ocrd/ocrd_all/ocrd_olena'
Makefile:282: recipe for target 'clean-olena' failed
make: [clean-olena] Error 2
Running `sudo make --trace deps-ubuntu | tee build.log` enters the same thing again, build.log not very verbose but contains `Makefile:116: target 'core' does not exist`. When I run `make modules` instead, it halts with `sem --fg --id ocrd_all_git git submodule sync cor-asv-ann`.
> Tried again from a fresh clone of fix-sub-venv, same issue again. It halts at `sem --fg --id ocrd_all_git git submodule sync core`.

It does take a while to fetch every submodule (esp. the large/recursive ones), but it continues after that (in a fresh clone on Ubuntu 18.04) here.
> I then `ctrl+c` that, run `make deinit` which returns me
> make -C ocrd_olena clean-olena BUILD_DIR=/home/cnd/tmp/dev/ocrd/ocrd_all/venv/build/ocrd_olena
> make[1]: Entering directory '/home/cnd/tmp/dev/ocrd/ocrd_all/ocrd_olena'
> make[1]: No rule to make target 'clean-olena'. Stop.
> make[1]: Leaving directory '/home/cnd/tmp/dev/ocrd/ocrd_all/ocrd_olena'
> Makefile:282: recipe for target 'clean-olena' failed
> make: [clean-olena] Error 2

ah, should make clean-olena ignore errors... (consider `make -k deinit`)
> Running `sudo make --trace deps-ubuntu | tee build.log` enters the same thing again, build.log not very verbose but contains `Makefile:116: target 'core' does not exist`.

Sure – it needs to check out the submodule first.
> When I run `make modules` instead, it halts with `sem --fg --id ocrd_all_git git submodule sync cor-asv-ann`.

So, depending on whether your submodules get updated as part of `deps-ubuntu` or part of `modules`, the first module it "stops" at is different. That's nothing unusual either.
Again: are you sure you just didn't stop a running `git clone` there?
Pretty much had it running with 0% CPU and disk since roughly 11:15 today, so yes I assume it hangs - also the 2nd run is still at the first step ;)
Thanks anyway for all your efforts!
`make -k deinit` also fails because of same error(s) with olena (`Target 'deinit' not remade because of errors.`).
> Pretty much had it running with 0% CPU and disk since roughly 11:15 today, so yes I assume it hangs - also the 2nd run is still at the first step ;)

Oh, I see. That brings in another perspective: Your first run (which I assume started before your original posting above) failed because of #140 and the other problems identified down the road and solved in #145. But you didn't cancel it, so the original `sem` call was still waiting for user input. It also still claimed the semaphore, preventing any other calls (from later attempts) to enter it. (Although I don't understand why we didn't see the original semaphore in your `~/.parallel`. Perhaps that directory was already gone and replaced by a new one of the same name?) Under that hypothesis, as soon as you actually interrupt the original run, all further runs on https://github.com/bertsky/ocrd_all/tree/fix-sub-venv should work.
I did cancel the 1st build though before starting the 2nd one!
So anyways I now stopped also the 2nd run, `rm -rf`'d my clone and rebooted, to be sure. After that, I checked out your branch once more but kept running into the exact same issue again.
However, after I manually removed any contents under `~/.parallel/semaphores`, I am now able to get the build working again.
> I did cancel the 1st build though before starting the 2nd one!

Then I misread you.

> So anyways I now stopped also the 2nd run, `rm -rf`'d my clone and rebooted, to be sure. After that, I checked out your branch once more but kept running into the exact same issue again. However, after I manually removed any contents under `~/.parallel/semaphores`, I am now able to get the build working again.

Wow! Good to know. So `parallel` does not invalidate its lock files, even after reboot. I'll try to add some respective clean-up to #145.
Anyway, thanks @cneud for being thorough!
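For anyone hitting the same hang, the manual removal that resolved it can be narrowed down to stale entries only. The sketch below is an illustration, not the clean-up that went into #145: it assumes holder files are named `PID@HOST`, that `HOST` matches the output of `uname -n`, and that an entry whose PID no longer exists on this host is safe to delete. The function name `clean_stale_sems` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove semaphore holder files whose PID is gone.
# Assumptions: files are named PID@HOST and HOST equals `uname -n` here.
clean_stale_sems() {
    local sem_root="${1:-$HOME/.parallel/semaphores}"
    local this_host entry name pid host
    this_host=$(uname -n)
    [ -d "$sem_root" ] || return 0
    for entry in "$sem_root"/*/*@*; do
        [ -e "$entry" ] || continue              # glob matched nothing
        name=$(basename "$entry")
        pid=${name%%@*}
        host=${name#*@}
        case "$pid" in ''|*[!0-9]*) continue ;; esac
        [ "$host" = "$this_host" ] || continue   # cannot judge other hosts
        # Only delete when the process is neither signalable nor in /proc
        if ! kill -0 "$pid" 2>/dev/null && [ ! -d "/proc/$pid" ]; then
            rm -rf -- "$entry"
            echo "removed $entry"
        fi
    done
}

# Possible usage (not executed here), before re-running the build:
#   clean_stale_sems ~/.parallel/semaphores
```

Entries recorded for a different host name are left alone, so if the host part does not match `uname -n` on a given system, nothing is removed and the directory has to be inspected by hand.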
Thank you for all your help and the quick fix! This already took quite a bite out of our day, I cannot wait to see #145 merged ;)
Btw I assume the files under `~/.parallel/semaphores` were even created in a previous failed build before my 1st attempt of today, which would explain that `cat /proc/PID/wchan` returned "Not found".
> This already took quite a bite out of our day, I cannot wait to see #145 merged ;)

Same here, but we'll need a consensus first.

> Btw I assume the files under `~/.parallel/semaphores` were even created in a previous failed build before my 1st attempt of today, which would explain that `cat /proc/PID/wchan` returned "Not found".

Yup, that might very well have been just that!