louismullie/treat

Impossible to install French package

Opened this issue · 7 comments

Hi, I'm running Fedora 25 and trying to use treat. The gem itself installed seamlessly (gem install treat, with gem 2.5.1), and I then wanted to install the French package.

The first error came from stanford-core-nlp, which couldn't be built because JAVA_HOME wasn't set. A simple export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk resolved the problem.
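For anyone hitting the same build failure, here is a minimal sketch of a pre-flight check to run before the installer. The helper name java_home_problem is hypothetical (it is not part of treat or stanford-core-nlp), and it only checks that the variable is set and points to an existing directory, not that a working JDK lives there.

```ruby
# Hypothetical helper: returns nil when JAVA_HOME looks usable,
# otherwise a short description of what is wrong.
def java_home_problem(env = ENV)
  home = env['JAVA_HOME']
  return 'JAVA_HOME is not set' if home.nil? || home.empty?
  return "JAVA_HOME points to a missing directory: #{home}" unless File.directory?(home)
  nil
end

# Print a warning on stderr if the current environment looks wrong.
problem = java_home_problem
warn problem if problem
```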

However, I ran into a more serious problem: downloading the models for the Punkt segmenter for French fails. Here is what I did and the result:

▶ ruby --version
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
▶ irb --version
irb 0.9.6(09/06/30)
▶ irb 
irb(main):001:0> require 'treat'
=> true
irb(main):002:0>  Treat::Core::Installer.install 'french'

Treat Installer, v. 2.1.0


1. Installing core dependencies.

Installing nokogiri...
Building native extensions.  This could take a while...
WARN: Unresolved specs during Gem::Specification.reset:
      json (~> 1.8)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Installing ferret...
Building native extensions.  This could take a while...
Installing bson_ext...
Building native extensions.  This could take a while...
Installing mongo...
Installing lda-ruby...
Building native extensions.  This could take a while...
Installing stanford-core-nlp...
Building native extensions.  This could take a while...
Fetching: bind-it-0.2.7.gem (100%)
Fetching: stanford-core-nlp-0.5.3.gem (100%)
Installing linguistics...
This library also presents tie-ins for the 'linkparser' and
'wordnet' libraries, which you can enable by installing the
gems of the same name.
Installing ruby-readability...
Installing whatlanguage...
Installing chronic...
Installing kronic...
Installing nickel...
Installing decisiontree...
Installing rb-libsvm...
Building native extensions.  This could take a while...
Installing ruby-fann...
Building native extensions.  This could take a while...
Installing zip...
Installing loggability...
Installing tf-idf-similarity...
Installing narray...
Building native extensions.  This could take a while...
Installing fastimage...
Installing fuzzy-string-match...
Installing levenshtein-ffi...
Building native extensions.  This could take a while...

2. Installing dependencies for the French language.

Installing punkt-segmenter...
Installing tactful_tokenizer...
Installing stanford-core-nlp...

3. Downloading models for the Punkt segmenter for the French language.

RuntimeError: Couldn't download http://www.louismullie.com/treat/punkt/french.yaml (Max number of attempts reached). Error: (Couldn't download https://coreslicer.com/treat/punkt/french.yaml (Max number of attempts reached). Error: (Response code was not 200 , but 404.))
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:138:in `rescue in download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:147:in `download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:149:in `download_punkt_models'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:55:in `install'
        from (irb):2
        from /home/psychoslave/.rbenv/versions/2.3.1/bin/irb:11:in `<main>'

Is my gem version out of sync with the repository structure it's trying to fetch from? Does it result from a deprecation decision? Should I move to a newer version of treat through the github repository?

Actually, the same happens for English:

6. Downloading models for the Punkt segmenter for the English language.

RuntimeError: Couldn't download http://www.louismullie.com/treat/punkt/english.yaml (Max number of attempts reached). Error: (Couldn't download https://coreslicer.com/treat/punkt/english.yaml (Max number of attempts reached). Error: (Response code was not 200 , but 404.))
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:138:in `rescue in download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:147:in `download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:149:in `download_punkt_models'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:55:in `install'
        from (irb):3
        from /home/psychoslave/.rbenv/versions/2.3.1/bin/irb:11:in `<main>'
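The 404 responses can be checked outside the installer. The sketch below rebuilds the two URLs that appear in the error messages and asks each server for its status code. The helper names punkt_model_urls and http_status are hypothetical, and the host list is copied from the traces above rather than from the treat source.

```ruby
require 'net/http'
require 'uri'

# Hosts and path layout copied from the error messages in this issue.
PUNKT_HOSTS = ['http://www.louismullie.com', 'https://coreslicer.com'].freeze

# Hypothetical helper: builds the model URLs for one language.
def punkt_model_urls(language)
  PUNKT_HOSTS.map { |host| "#{host}/treat/punkt/#{language}.yaml" }
end

# Hypothetical helper: returns the HTTP status code as a string,
# or the error class name if the request could not be made at all.
def http_status(url)
  Net::HTTP.get_response(URI(url)).code
rescue StandardError => e
  e.class.name
end

# Usage (needs network access):
#   punkt_model_urls('french').each { |url| puts "#{url} -> #{http_status(url)}" }
```

Note that http_status does not follow redirects, so a 301 or 302 from the first host would show up as such rather than as the final 404.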

I also tried with the git repository version:

▶ bundle install
Your Gemfile lists the gem rspec (>= 0) more than once.
You should probably keep only one of them.
While it's not a problem now, it could cause errors if you change the version of one of them later.
Your Gemfile lists the gem rake (>= 0) more than once.
You should probably keep only one of them.
While it's not a problem now, it could cause errors if you change the version of one of them later.
Your Gemfile lists the gem simplecov (>= 0) more than once.
You should probably keep only one of them.
While it's not a problem now, it could cause errors if you change the version of one of them later.
Fetching gem metadata from https://rubygems.org/..........
Fetching version metadata from https://rubygems.org/.
Resolving dependencies...
Using rake 12.0.0
Using birch 0.1.1
Using diff-lcs 1.3
Using docile 1.1.5
Using guess_html_encoding 0.0.11
Using json 1.8.6
Using mime-types 1.25.1
Using mini_portile2 2.2.0
Using progressbar 1.8.2
Using rspec-support 3.6.0
Using rubyzip 0.9.9
Using simplecov-html 0.10.1
Installing unicode-display_width 1.3.0
Using bundler 1.14.6
Using yomu 0.2.4
Using nokogiri 1.8.0
Using rspec-core 3.6.0
Using rspec-expectations 3.6.0
Using rspec-mocks 3.6.0
Using schiphol 1.0.2
Using simplecov 0.14.1
Installing terminal-table 1.8.0
Using ruby-readability 0.7.0
Installing rspec 3.6.0
Using treat 2.1.0 from source at `.`
Bundle complete! 13 Gemfile dependencies, 25 gems now installed.
Use `bundle show [gemname]` to see where a bundled gem is installed.

▶ irb  
irb(main):001:0> require 'treat'
=> true
irb(main):002:0> Treat::Core::Installer.install 'english'

Treat Installer, v. 2.1.0


1. Installing core dependencies.

Installing nokogiri...
Building native extensions.  This could take a while...
WARN: Unresolved specs during Gem::Specification.reset:
      json (~> 1.8)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Installing ferret...
Building native extensions.  This could take a while...
Installing bson_ext...
Building native extensions.  This could take a while...
Installing mongo...
Installing lda-ruby...
Building native extensions.  This could take a while...
Installing stanford-core-nlp...
Installing linguistics...
This library also presents tie-ins for the 'linkparser' and
'wordnet' libraries, which you can enable by installing the
gems of the same name.
Installing ruby-readability...
Installing whatlanguage...
Installing chronic...
Installing kronic...
Installing nickel...
Installing decisiontree...
Installing rb-libsvm...
Building native extensions.  This could take a while...
Installing ruby-fann...
Building native extensions.  This could take a while...
Installing zip...
Installing loggability...
Installing tf-idf-similarity...
Installing narray...
Building native extensions.  This could take a while...
Installing fastimage...
Installing fuzzy-string-match...
Installing levenshtein-ffi...
Building native extensions.  This could take a while...

2. Installing dependencies for the English language.

Installing rbtagger...
Building native extensions.  This could take a while...
Installing ruby-stemmer...
Building native extensions.  This could take a while...
Installing punkt-segmenter...
Installing tactful_tokenizer...
Installing nickel...
Installing rwordnet...
Installing uea-stemmer...
Installing engtagger...
Installing activesupport...
Installing srx-english...
Installing scalpel...

3. Downloading models for the Punkt segmenter for the English language.

RuntimeError: Couldn't download http://www.louismullie.com/treat/punkt/english.yaml (Max number of attempts reached). Error: (Couldn't download https://coreslicer.com/treat/punkt/english.yaml (Max number of attempts reached). Error: (Response code was not 200 , but 404.))
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:138:in `rescue in download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/schiphol-1.0.2/lib/schiphol.rb:147:in `download'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:149:in `download_punkt_models'
        from /home/psychoslave/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0/lib/treat/core/installer.rb:55:in `install'
        from (irb):2
        from /home/psychoslave/.rbenv/versions/2.3.1/bin/irb:11:in `<main>'

It seems my previous install didn't actually use the git repository version. The server has indeed changed in the repository; however, I'm astonished that gem install doesn't pick up such an old change:

▶ git blame lib/treat/core/installer.rb | grep 'Server ='
f1f8c010 lib/treat/core/installer.rb    (Andrew Brown 2016-05-24 13:19:51 -0500   8)   Server = 's3.amazonaws.com/static-public-assets'
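To see which server the installed copy of the gem points to (as opposed to the git checkout), one option is to read the installer.rb that RubyGems actually resolves. This is only a sketch: server_from_source is a hypothetical helper, and it assumes the constant is written as a single-line string assignment like the one shown by git blame.

```ruby
# Hypothetical helper: extracts the string assigned to `Server = '...'`
# from the text of an installer.rb file, or returns nil if none is found.
def server_from_source(source)
  match = source.match(/^\s*Server\s*=\s*['"]([^'"]+)['"]/)
  match && match[1]
end

# Usage, assuming the treat gem is installed:
#   spec = Gem::Specification.find_by_name('treat')
#   path = File.join(spec.gem_dir, 'lib', 'treat', 'core', 'installer.rb')
#   puts path
#   puts server_from_source(File.read(path))
```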

So here is how to actually use the repository version:

# gem install wordnet # suggested by install log
gem build treat.gemspec
gem install treat-2.1.0.gem 

And then in irb, the following will work:

Treat::Core::Installer.install

But any attempt to install the French package still fails, because no french.yaml is accessible on the server:

Treat::Core::Installer.install 'french'

So the French package isn't available on the server, but it should be possible to bypass the problem by copying the relevant file directly into the ./models/punkt/ subdirectory of the directory where the gem is installed. In my case that is ~/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/treat-2.1.0.
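The copy step could look like the sketch below. install_punkt_model is a hypothetical helper, and the target name <language>.yaml is an assumption based on the file name the installer tries to download; I have not verified that treat loads the model from exactly this path.

```ruby
require 'fileutils'

# Hypothetical helper: copies a Punkt model file to
# <gem_dir>/models/punkt/<language>.yaml and returns the target path.
def install_punkt_model(model_path, gem_dir, language)
  target_dir = File.join(gem_dir, 'models', 'punkt')
  FileUtils.mkdir_p(target_dir) # create the directory if it is missing
  target = File.join(target_dir, "#{language}.yaml")
  FileUtils.cp(model_path, target)
  target
end

# Usage, assuming the treat gem is installed and ./french.yaml exists:
#   gem_dir = Gem::Specification.find_by_name('treat').gem_dir
#   puts install_punkt_model('french.yaml', gem_dir, 'french')
```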

You then need a french.yaml. It looks like the Punkt model files are based on ".pickle" files, as used in NLTK for example. I need to investigate further to find these files, but here are their JSON equivalents: harrisj/punkt@7c64ff0#diff-4bfc17cd24c1afdec0c3ea5f6513a402

Hey @psychoslave, did you finally get it to work?