ykdojo/editdojo

Automatically detect if the given text is Japanese or English with Python

ykdojo opened this issue · 10 comments

I think I'm going to release the Twitter-based version of this product for Japanese and English first. So, we should be able to detect with Python whether a given tweet is written in Japanese or English. This way, we can show Japanese tweets from Japanese learners only to native speakers of the language. Same with English.
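As a rough starting point, here is a dependency-free sketch of what such a check could look like. It is not code from this repo: the name `guess_language` is hypothetical, and it assumes that any kana or kanji character marks a tweet as Japanese, which will misclassify Chinese text and English tweets that quote a single Japanese word.

```python
import re

# Hiragana (U+3040-U+309F) and katakana (U+30A0-U+30FF) are specific to Japanese.
KANA = re.compile(r"[\u3040-\u30ff]")
# CJK unified ideographs (kanji). Shared with Chinese, so this is weaker evidence.
KANJI = re.compile(r"[\u4e00-\u9fff]")
# Basic Latin letters, used here as a stand-in for "English".
LATIN = re.compile(r"[A-Za-z]")


def guess_language(text):
    """Return 'ja', 'en', or None when the text has no usable letters.

    Hypothetical helper for illustration only. Japanese is checked first so
    that @mentions and URLs in a Japanese tweet do not make it look English.
    """
    if KANA.search(text) or KANJI.search(text):
        return "ja"
    if LATIN.search(text):
        return "en"
    return None
```

Checking for Japanese characters first is a deliberate choice: almost every tweet contains some Latin letters (handles, links, hashtags), while kana almost never appear in an English tweet.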

@ykdojo Does this mean that whenever someone tweets in Japanese, only people who know Japanese will be able to see it, or will all members of the community see it? If we notify only people who know Japanese, then users of this Twitter app would have to register as, for example, learning English and knowing Japanese. Is your thinking similar to this? That is what I have understood so far. By the way, I am very interested in contributing to this app idea, since I can gain more knowledge from it. We could do this for other languages as well, here in India :)

Small doubt :(

Hmm here's an example to clarify.

Suppose User A is learning Japanese, and her native language is English.

She starts using one of her Twitter accounts, say, @user_a_jp, to tweet in Japanese.

Then, Japanese native speakers should start seeing these tweets so they can fix them.

However, my one concern is: what if @user_a_jp starts tweeting in both Japanese and English? We should probably ignore all English tweets in that case.

For something like this, we could look into the langdetect library. Following the example above, if @user_a_jp writes a tweet for which the library returns 'en', we would ignore the tweet.
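A minimal sketch of that idea, assuming the langdetect package is installed (`pip install langdetect`). The `detect` function, `DetectorFactory.seed`, and `LangDetectException` come from langdetect; the helper names `detect_language` and `should_show` are made up for illustration and are not from the PR.

```python
def detect_language(text):
    """Return a language code such as 'ja' or 'en', or None if undetectable."""
    # Imported lazily so the filtering logic below can be exercised
    # with a stand-in detector even when langdetect is not installed.
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    # langdetect is non-deterministic by default; a fixed seed makes
    # results reproducible, which matters for short texts like tweets.
    DetectorFactory.seed = 0
    try:
        return detect(text)
    except LangDetectException:
        # Raised when the text has no usable features (empty, only digits, etc.).
        return None


def should_show(tweet_text, learning_lang, detector=detect_language):
    """Keep a tweet only if it is written in the language the user is learning.

    `learning_lang` is a code like 'ja'. `detector` is injectable so a
    different detection backend can be swapped in later.
    """
    return detector(tweet_text) == learning_lang
```

With this, an English tweet from @user_a_jp would be skipped because `should_show(text, "ja")` is false for it. Detection on very short tweets can be unreliable, so treating an undetectable tweet as "do not show" is only one possible default.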

Oh yeah, the langdetect library looks good!

Would you like me to go ahead and create a few functions that make use of the library? @ykdojo

NOTE: there's already a PR for this. #29

Will come back to this when it's more immediately useful.

Would it be easier to use Google Translate's automatic language detection feature, or is that something extra and unnecessary? @ykdojo

Yeah, actually I think that will be ideal.