OCR-D/ocrd_all

Installation issues

Closed this issue · 23 comments

cneud commented

With v2020-08-04, I encountered several installation issues after following the native install guide by cloning, running sudo make deps-ubuntu followed by make all:

  • sudo make deps-ubuntu creates .../ocrd_all/.git and ~/.parallel with root permissions, requiring manually setting the correct file ownership via chown -R user:user

  • When I now run parallel --citation followed by will cite and re-trigger the build via make all, it hangs immediately at the first line with sem --fg --id ocrd_all git submodule sync cor-asv-ann

  • After deactivating cor-asv-ann and continuing with make all I discovered that all modules are empty (*** No rule to make target 'install'.) and need to be individually cloned via git submodule update --init, followed by another make all, to arrive at executable modules.

All this looks like #140 and follow-up errors. Please try with the fixes in #145 (i.e. reset directory ownership, pull from PR, then sudo make deps-ubuntu and make all)
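Before resetting ownership, it can help to see which paths are actually affected. The following is only a sketch: not_mine is a hypothetical helper, not part of ocrd_all, and it assumes a find that accepts a numeric user ID for -user (GNU find does).

```shell
# List paths that are NOT owned by the current user.
# not_mine is a hypothetical helper name, not part of ocrd_all.
not_mine() {
    # -user accepts a numeric user ID as well as a user name
    find "$@" ! -user "$(id -u)" -print 2>/dev/null
}
```

Run from the ocrd_all checkout, not_mine .git ~/.parallel should print nothing once ownership is correct. Anything it lists can be handed back with sudo chown -R "$(id -un):$(id -gn)" .git ~/.parallel.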

cneud commented

Thanks @bertsky. When I apply the correct permissions and start the build from your fix-sub-venv branch with sudo make deps-ubuntu, it hangs for me again at sem --fg --id ocrd_all_git git submodule sync core though.

When I apply the correct permissions and start the build from your fix-sub-venv branch with sudo make deps-ubuntu, it hangs for me again at sem --fg --id ocrd_all_git git submodule sync core though.

Strange! This indicates there still is a problem with parallel.

Sorry, I need your help:

  • Does echo 1 | parallel echo print "1" (and nothing else) for you?
  • What does ~/.parallel look like?
  • Does sudo bash -c "echo ~" show your user's home, or /root?

cneud commented

Does echo 1 | parallel echo print "1" (and nothing else) for you?

Yes, just 1 and nothing else.

What does ~/.parallel look like?

drwxr-xr-x  4 cnd cnd 4096 Aug 14 17:01 .
drwxr-xr-x 47 cnd cnd 4096 Aug 17 14:06 ..
drwxr-xr-x  3 cnd cnd 4096 Aug 17 14:05 semaphores
drwxr-xr-x  2 cnd cnd 4096 Aug 14 16:50 tmp
-rw-r--r--  1 cnd cnd    0 Aug 14 17:01 will-cite

Does sudo bash -c "echo ~" show your user's home, or /root?

It prints my user homedir (/home/cnd).

Thx for your help!

Thanks – this looks good regarding "registration" state.

And is there anything under ~/.parallel/semaphores (while sudo make deps-ubuntu hangs)?

If so, then for each of these directories, you should see the PID of the process in this semaphore in one of the file names, so you could pstree -as PID them. We are looking for another process that might block the semaphore. (Otherwise I am currently out of ideas why the semaphore cannot be entered.)
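The lookup described above can be sketched as a small shell function. It assumes the layout reported in this thread, namely one id-NAME directory per semaphore containing a file named PID@HOST; list_sem_pids is a hypothetical helper name.

```shell
# Print "SEMAPHORE-ID PID" for every holder file under the semaphore directory.
# Assumption: holder files are named PID@HOST inside id-NAME directories.
list_sem_pids() {
    sem_root="${1:-$HOME/.parallel/semaphores}"
    for entry in "$sem_root"/id-*/*@*; do
        # Skip the unexpanded pattern when nothing matches
        [ -e "$entry" ] || continue
        name=${entry##*/}            # e.g. 10778@XYZ
        id=${entry%/*}; id=${id##*/} # e.g. id-ocrd_all_git
        printf '%s %s\n' "$id" "${name%%@*}"
    done
}
```

The output can then be fed to pstree, for example list_sem_pids | while read -r id pid; do pstree -as "$pid"; done.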

cneud commented

When I trigger another run of sudo make deps-ubuntu, id-ocrd_all_git emerges under ~/.parallel/semaphores. Within that, I have 10778@XYZ - so I assume 10778 is the PID here?

pstree -as 10778 prints a long tree, what should I be looking for here exactly?

When I trigger another run of sudo make deps-ubuntu, id-ocrd_all_git emerges under ~/.parallel/semaphores. Within that, I have 10778@XYZ - so I assume 10778 is the PID here?

Yes, exactly.

pstree -as 10778 prints a long tree, what should I be looking for here exactly?

It's fine as long as there is no other process for our sema name id-ocrd_all_git. So there should be a git beneath sem at the leaf.

Now the question becomes why that git process is blocking. Could you show me cat /proc/PID/wchan of that child PID?

cneud commented

So there should be a git beneath sem at the leaf.

Sorry not sure I can follow...it looks like this (missing git leaf?):

[...]
  │   │   ├─zsh
  │   │   │   └─sudo make deps-ubuntu
  │   │   │       └─make deps-ubuntu
  │   │   │           └─perl /usr/bin/sem --fg --id ocrd_all_git git submodule sync core
[...]

The output of cat /proc/10778/wchan tells me there is "No such file or directory"...

it looks like this (missing git leaf?)

yes! This means the problem is on parallel's part. But you saw no other users of that semaphore.

This might be an instance of this bug. What does sem --version say?

The output of cat /proc/10778/wchan tells me there is "No such file or directory"...

Probably not Linux, but never mind (see above).

cneud commented

But you saw no other users of that semaphore.

Correct.

Output of sem --version:

GNU parallel 20161222
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.

Web site: http://www.gnu.org/software/parallel

When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.

I'm on regular Ubuntu 18.04 btw.

Output of sem --version:

GNU parallel 20161222

Oh, ok. I have the same version, but cannot reproduce.

I am running out of ideas. Could you please try make deinit to remove all submodules, and then try sudo make --trace deps-ubuntu | tee build.log again?

cneud commented

After running make deinit, sudo make --trace deps-ubuntu | tee build.log fails with make: *** No rule to make target 'deps-ubuntu'. Stop.

I guess I will just wait until #145 is merged and try again from scratch...

fails with make: *** No rule to make target 'deps-ubuntu'. Stop.

Sorry, didn't anticipate that either! I'd be very much interested in that build.log I must say ...

cneud commented

Tried again from a fresh clone of fix-sub-venv, same issue again. It halts at sem --fg --id ocrd_all_git git submodule sync core.

I then ctrl+c that, run make deinit which returns me

make -C ocrd_olena clean-olena BUILD_DIR=/home/cnd/tmp/dev/ocrd/ocrd_all/venv/build/ocrd_olena
make[1]: Entering directory '/home/cnd/tmp/dev/ocrd/ocrd_all/ocrd_olena'
make[1]: No rule to make target 'clean-olena'. Stop.
make[1]: Leaving directory '/home/cnd/tmp/dev/ocrd/ocrd_all/ocrd_olena'
Makefile:282: recipe for target 'clean-olena' failed
make: [clean-olena] Error 2

Running sudo make --trace deps-ubuntu | tee build.log enters the same thing again, build.log not very verbose but contains Makefile:116: target 'core' does not exist. When I run make modules instead, it halts with sem --fg --id ocrd_all_git git submodule sync cor-asv-ann.

Tried again from a fresh clone of fix-sub-venv, same issue again. It halts at sem --fg --id ocrd_all_git git submodule sync core.

It does take a while to fetch every submodule (esp. the large/recursive ones), but it continues after that (in a fresh clone on Ubuntu 18.04) here.

I then ctrl+c that, run make deinit which returns me

make -C ocrd_olena clean-olena BUILD_DIR=/home/cnd/tmp/dev/ocrd/ocrd_all/venv/build/ocrd_olena
make[1]: Entering directory '/home/cnd/tmp/dev/ocrd/ocrd_all/ocrd_olena'
make[1]: No rule to make target 'clean-olena'. Stop.
make[1]: Leaving directory '/home/cnd/tmp/dev/ocrd/ocrd_all/ocrd_olena'
Makefile:282: recipe for target 'clean-olena' failed
make: [clean-olena] Error 2

Ah, we should make clean-olena ignore errors... (consider make -k deinit)

Running sudo make --trace deps-ubuntu | tee build.log enters the same thing again, build.log not very verbose but contains Makefile:116: target 'core' does not exist.

Sure – it needs to checkout the submodule first.

When I run make modules instead, it halts with sem --fg --id ocrd_all_git git submodule sync cor-asv-ann.

So, depending on whether your submodules get updated as part of deps-ubuntu or part of modules, the first module it "stops" at is different. That's nothing unusual either.

Again: are you sure you just didn't stop a running git clone there?

cneud commented

Pretty much had it running with 0% CPU and disk since roughly 11:15 today, so yes I assume it hangs - also the 2nd run is still at the first step ;)

Thanks anyway for all your efforts!

cneud commented

make -k deinit also fails because of same error(s) with olena (Target 'deinit' not remade because of errors.).

Pretty much had it running with 0% CPU and disk since roughly 11:15 today, so yes I assume it hangs - also the 2nd run is still at the first step ;)

Oh, I see. That brings in another perspective: Your first run (which I assume started before your original posting above) failed because of #140 and the other problems identified down the road and solved in #145. But you didn't cancel it, so the original sem call was still waiting for user input. It also still claimed the semaphore, preventing any other calls (from later attempts) to enter it. (Although I don't understand why we didn't see the original semaphore in your ~/.parallel. Perhaps that directory was already gone and replaced by a new one of the same name?) Under that hypothesis, as soon as you actually interrupt the original run, all further runs on https://github.com/bertsky/ocrd_all/tree/fix-sub-venv should work.

cneud commented

I did cancel the 1st build though before starting the 2nd one!

So anyways I now stopped also the 2nd run, rm -rf'd my clone and rebooted, to be sure. After that, I checked out your branch once more but kept running into the exact same issue again.

However, after I manually removed any contents under ~/.parallel/semaphores, I am now able to get the build working again.

I did cancel the 1st build though before starting the 2nd one!

Then I misread you.

So anyways I now stopped also the 2nd run, rm -rf'd my clone and rebooted, to be sure. After that, I checked out your branch once more but kept running into the exact same issue again.

However, after I manually removed any contents under ~/.parallel/semaphores, I am now able to get the build working again.

Wow! Good to know. So parallel does not invalidate its lock files, even after reboot. I'll try to add some respective clean-up to #145.
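A possible way to find such leftover lock directories is sketched below. It is not the clean-up that went into #145. It assumes Linux (/proc), holder files named PID@HOST as in the 10778@XYZ example, and that every holder ran on the local host; stale_sem_dirs is a hypothetical helper name.

```shell
# Print semaphore directories in which no recorded holder PID is still alive.
# Assumptions: Linux /proc, holder files named PID@HOST, all holders local.
stale_sem_dirs() {
    sem_root="${1:-$HOME/.parallel/semaphores}"
    for dir in "$sem_root"/id-*/; do
        [ -d "$dir" ] || continue
        alive=0
        for entry in "$dir"*@*; do
            [ -e "$entry" ] || continue
            name=${entry##*/}
            pid=${name%%@*}
            # A running process has a /proc/PID directory on Linux
            if [ -d "/proc/$pid" ]; then
                alive=1
            fi
        done
        if [ "$alive" -eq 0 ]; then
            printf '%s\n' "${dir%/}"
        fi
    done
    return 0
}
```

The function only reports candidates. After checking that no build is running, the listed directories can be removed by hand, which matches the manual fix reported above.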

Anyway, thanks @cneud for being thorough!

cneud commented

Thank you for all your help and the quick fix! This already took quite a bit of our day, I cannot wait to see #145 merged ;)

cneud commented

Btw I assume the files under ~/.parallel/semaphores were even created in a previous failed build before my 1st attempt of today, which would explain that cat /proc/PID/wchan returned "Not found".

This already took quite a bit of our day, I cannot wait to see #145 merged ;)

Same here, but we'll need a consensus first.

Btw I assume the files under ~/.parallel/semaphores were even created in a previous failed build before my 1st attempt of today, which would explain that cat /proc/PID/wchan returned "Not found".

Yup, that might very well have been just that!