seomoz/simhash-py

from .table import PyCorpus as Corpus


After running "sudo python setup.py install":

Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import simhash
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "simhash/__init__.py", line 3, in <module>
    from .table import PyCorpus as Corpus
ImportError: No module named table

I can import simhash, but when I call simhash.Corpus(6, 3), it hangs. Very curious.

I've been able to reproduce this on a fresh Ubuntu Precise box, so hopefully I'll have a fix shortly.

As for the import failure, it's probably because you're running Python from the simhash-py directory. Instead of importing the installed package, it picks up the local source copy, which has no table.py: simhash.table is built as a compiled extension instead.

Try changing directories and then importing simhash to see if that works.
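To check which copy of a package Python would actually import, a minimal sketch (Python 3.5+, not the 2.7 interpreter shown above) can use the standard library's importlib.util.find_spec, which reports the file a module name resolves to without executing it. The helper names import_origin and is_shadowed_by_cwd are hypothetical and not part of simhash-py.

```python
import importlib.util
import os


def import_origin(name):
    """Return the file `import name` would load, or None.

    find_spec walks sys.path in order (in an interactive session the first
    entry is the current directory) and does not execute a top-level module,
    so it is safe to call before importing.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None  # not found, or built-in/namespace module with no single file
    return spec.origin


def is_shadowed_by_cwd(name, cwd=None):
    """True if `name` would be loaded from somewhere under `cwd`."""
    origin = import_origin(name)
    if origin is None:
        return False
    cwd = os.path.abspath(cwd or os.getcwd())
    origin = os.path.abspath(origin)
    return os.path.commonpath([cwd, origin]) == cwd


if __name__ == "__main__":
    # Run from inside the simhash-py checkout: True means the source tree
    # (which lacks the compiled simhash.table) would be imported instead.
    print(import_origin("simhash"), is_shadowed_by_cwd("simhash"))
```

If this prints True from inside the checkout, changing to any other directory before importing should make Python fall through to the installed copy.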

With respect to the freezing, I've not yet been able to replicate that. I booted up a Vagrant box and this is what it took to get this up and running:

sudo apt-get install -y python-pip python-dev git
sudo pip install cython

# LibJudy
curl -OL http://downloads.sourceforge.net/project/judy/judy/Judy-1.0.5/Judy-1.0.5.tar.gz
tar xf Judy-1.0.5.tar.gz
rm Judy-1.0.5.tar.gz
pushd judy-1.0.5
./configure --prefix=/usr/local
make
sudo make install
popd
rm -r judy-1.0.5

# Make sure that /usr/local/lib is available for runtime use
echo '/usr/local/lib' | sudo tee /etc/ld.so.conf.d/local.conf
sudo ldconfig

# And now build ours
git clone https://github.com/seomoz/simhash-py
pushd simhash-py
git submodule update --init --recursive
sudo python setup.py install

From there, I was able to import it once I left the simhash-py directory, and successfully complete queries. Can you confirm whether or not this works for you?
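The ldconfig step matters because the compiled extension presumably links against libJudy at runtime. A rough way to confirm the library is discoverable, sketched with the standard library's ctypes.util.find_library (on Linux it consults the ldconfig cache first, with gcc or ld as fallbacks, so it approximates rather than replicates the runtime loader). The helper name missing_shared_libraries is hypothetical.

```python
import ctypes.util


def missing_shared_libraries(names):
    """Return the subset of `names` that find_library cannot locate.

    Names are given without the "lib" prefix or ".so" suffix, e.g. "Judy"
    for libJudy.so.
    """
    return [name for name in names if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    missing = missing_shared_libraries(["Judy"])
    print("missing:", missing or "none")
```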

I found that simhash-cpp only runs on 64-bit systems. On a 32-bit system it hangs because of:
seomoz/simhash-cpp#2 (comment)
http://judy.sourceforge.net/doc/Judy_3x.htm
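Judy's documentation describes its Word_t key type as an unsigned integer the same size as a pointer, so on a 32-bit build a Judy key holds only 32 bits, not a full 64-bit simhash. A small Python 3 illustration of what that narrowing does to keys; the helper names are hypothetical, and this shows the word-size mismatch rather than diagnosing the hang itself.

```python
import struct


def native_word_bits():
    """Pointer width of the running Python interpreter, in bits (64 on x86-64)."""
    return struct.calcsize("P") * 8


def to_word(value, bits):
    """Keep only the low `bits` bits, as storing into an unsigned word of that width would."""
    return value & ((1 << bits) - 1)


# Two 64-bit hashes that differ only in their upper 32 bits.
a = 0x0123456789ABCDEF
b = 0xFEDCBA9889ABCDEF

if __name__ == "__main__":
    print("native word:", native_word_bits(), "bits")
    print("distinct as 64-bit keys:", to_word(a, 64) != to_word(b, 64))  # True
    print("distinct as 32-bit keys:", to_word(a, 32) != to_word(b, 32))  # False
```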

Yes, we've only attempted builds on 64-bit machines, so it is unlikely to work on 32-bit systems.

What exactly are you trying to use a 32-bit machine for? The whole library is based on 64-bit hashes, so it will be most efficient on a 64-bit machine. Plus, if you have a large corpus of documents, you're going to want large quantities of RAM, and so probably need a 64-bit machine anyway. With 4GB of RAM and 64-bit hashes, you'll only be able to have about 67 million documents and that's if you use every addressable byte in the system, which is not realistic.
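A back-of-the-envelope version of that estimate: a 32-bit address space reaches at most 2^32 bytes (4 GiB), and the document count depends on how many bytes each document costs, which the comment doesn't spell out. The ~67 million figure works out to 64 bytes per document, while a single bare 8-byte hash per document would allow about 537 million, so the estimate presumably budgets for overhead beyond the raw hash (an assumption; the per-document costs below are illustrative, not measurements of simhash-py).

```python
ADDRESS_SPACE_32BIT = 2 ** 32  # bytes reachable with a 32-bit pointer (4 GiB)
HASH_BYTES = 64 // 8           # one 64-bit simhash is 8 bytes


def max_documents(bytes_per_document, total_bytes=ADDRESS_SPACE_32BIT):
    """Upper bound on documents if every addressable byte held document data."""
    return total_bytes // bytes_per_document


if __name__ == "__main__":
    # Bare hash only, no table or Judy overhead: 536,870,912 (about 537 million).
    print(format(max_documents(HASH_BYTES), ","))
    # 64 bytes per document reproduces the ~67 million quoted: 67,108,864.
    print(format(max_documents(64), ","))
```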

Stale, and has sort of veered into the territory of #2, which has been resolved.