Abhijit-2592/spacy-langdetect

Spacy V3 decorator string name

rennanvoa2 opened this issue · 5 comments

Hello guys,
With the V3 update when I run the example code it complains:

ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy_cld.spacy_cld.LanguageDetector object at 0x7fb8d9051ed0> (name: 'None').

- If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead.

- If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`.

- If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline.

I figured out that we now have to pass a string name to nlp.add_pipe, but which one?

I've tried nlp.add_pipe("langdetect"), nlp.add_pipe("LanguageDetector"), and nlp.add_pipe("languagedetector"), and none of them works.

Can you help me with this?

Hi,

Since I'm new to spaCy and Python, I'm not sure if this is the correct way to implement it, but for Python 3.9 with spaCy 3.0.3 the following worked for me:

import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

# Add LanguageDetector and assign it a string name
@Language.factory("language_detector")
def create_language_detector(nlp, name):
    return LanguageDetector(language_detection_function=None)

# Use a blank pipeline; a trained model also works, e.g. nlp = spacy.load("en_core_web_sm")
nlp = spacy.blank("en")

# Add sentencizer for longer text
nlp.add_pipe('sentencizer')

# Add components using their string names
nlp.add_pipe("language_detector")

# Analyze components and their attributes
text = "This is an English text."
doc = nlp(text)

# Document level language detection.
print(doc._.language)

# See what happened to the pipes
nlp.analyze_pipes(pretty=True)

I got on this track with: Language-specific pipeline

Is this the right way to use it with spaCy 3?

How do I use the result for language-specific processing?
Do I have to load language-specific models, e.g.
nlp_en = spacy.load("en_core_web_sm") and
nlp_de = spacy.load("de_core_news_sm")?
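One pattern (a sketch of my own, not something from the spacy-langdetect docs) is to run the detector first and then dispatch each text to the matching pipeline. The mapping below uses the standard small spaCy model names; the `route` helper is hypothetical, and in real code each value would be replaced by the result of `spacy.load(...)` once the models are downloaded:

```python
# Hypothetical dispatch table: detected language code -> model name.
# In practice you would store loaded pipelines here, e.g.
# {"en": spacy.load("en_core_web_sm"), "de": spacy.load("de_core_news_sm")}.
MODELS = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
}

def route(detected_language: str) -> str:
    """Pick the model for the detected language, falling back to English."""
    return MODELS.get(detected_language, MODELS["en"])
```

With loaded pipelines in the table, you would call the chosen pipeline on the text a second time to get language-specific tagging and parsing.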

Many thanks and best regards,

Cusard

same problem

Hello everybody!
Thanks to @Cusard I got the example code to work with the current spacy version.

import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

@Language.factory("language_detector")
def create_language_detector(nlp, name):
    return LanguageDetector(language_detection_function=None)

nlp = spacy.load("en_core_web_sm")

nlp.add_pipe('language_detector')
text = 'This is an English text.'
doc = nlp(text)
# document level language detection. Think of it like average language of the document!
print(doc._.language)
# sentence level language detection
for sent in doc.sents:
    print(sent, sent._.language)

The output looks like this:

{'language': 'en', 'score': 0.9999983570159962}
This is an english text. {'language': 'en', 'score': 0.9999956329695125}
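Since `doc._.language` is a plain dict with `language` and `score` keys (as the output above shows), downstream code can gate on the detector's confidence. A minimal sketch, with an arbitrary threshold of my own choosing:

```python
def confident_language(result: dict, threshold: float = 0.9):
    """Return the language code only when the score clears the threshold,
    otherwise None (treat the detection as unreliable)."""
    if result.get("score", 0.0) >= threshold:
        return result["language"]
    return None
```

For example, `confident_language(doc._.language)` would return `'en'` for the document above, while a low-score result would yield `None`.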

Thanks for sharing the solution. It worked for me too.

It would be nice if the project home page were updated with this example: https://spacy.io/universe/project/spacy-langdetect

The example provided by @FelixSiegfriedRiedel works for me with v3.3.

I've also raised an issue about updating the documentation: explosion/spaCy#11038