Japanese model #450

Closed
vochicong opened this issue Jun 2, 2017 · 7 comments

Comments

@vochicong

Hi, I am interested in using CoreNLP (and DeepDive) with Japanese.
Are you working on Japanese?
And how can I start building a Japanese model?
How did you build Chinese or English models?

Thanks!

@manning
Member

manning commented Jun 10, 2017

No, we are not currently working on Japanese. The first requirement for making models for any language is labeled data to train them on, since most of the components in CoreNLP use supervised learning. Traditionally, the public availability of Japanese language corpora hasn't been very good, but now the Japanese Universal Dependencies corpora, for example, could be used to train several components (segmenter, POS, depparse). However, I still don't know of any usable Japanese NER data. The other requirement is somebody willing to do the work. In general, our expansions to other languages have occurred because somebody was interested in having the language available for some reason.
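To make the point about Universal Dependencies data concrete, here is a minimal sketch (not CoreNLP's actual training pipeline) of how one CoNLL-U file yields supervision for all three components mentioned above. It relies only on the documented CoNLL-U layout of ten tab-separated columns. The function name `read_conllu` and the three-token `SAMPLE` sentence are hypothetical and made up for illustration; they are not taken from a real treebank.

```python
from typing import Iterator, List, Tuple

# One token as (FORM, UPOS, HEAD, DEPREL).
Token = Tuple[str, str, int, str]

# Hypothetical sample in CoNLL-U layout, invented for illustration.
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = (
    "# text = 猫が寝る\n"
    "1\t猫\t猫\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tが\tが\tADP\t_\t_\t1\tcase\t_\t_\n"
    "3\t寝る\t寝る\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence from CoNLL-U text."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ...".
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (1-2) and empty nodes (1.1).
            continue
        head = int(cols[6]) if cols[6].isdigit() else -1  # -1: unannotated
        sentence.append((cols[1], cols[3], head, cols[7]))
    if sentence:
        # The file may not end with a blank line.
        yield sentence


for sent in read_conllu(SAMPLE):
    forms = [form for form, _, _, _ in sent]
    # Segmenter data: unsegmented text plus the gold token boundaries.
    print("segmenter:", "".join(forms), "->", forms)
    # POS tagger data: (token, tag) pairs.
    print("pos:", [(form, upos) for form, upos, _, _ in sent])
    # Dependency parser data: (head index, relation) per token.
    print("depparse:", [(head, rel) for _, _, head, rel in sent])
```

Joining the forms without spaces works as segmenter input here because Japanese is written without whitespace between words. Named entity labels are not part of this format, which is consistent with NER needing a separate annotated corpus.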

@vochicong
Author

@manning Thank you for your reply. Actually, I am an NLP newbie and don't yet fully understand your valuable answer ;)
But I think that Kuromoji, a Japanese morphological analyzer implemented in Java, looks like a promising candidate for embedding in CoreNLP.

I will search for Japanese NER data, and if I find any I will share it with you.

BTW, I enjoy your Natural Language Processing with Deep Learning course very much! Thanks for it too!

@vochicong
Author

vochicong commented Aug 28, 2017

I found jigg. It is said to have an interface similar to CoreNLP's, and it actually includes CoreNLP and Kuromoji, a Japanese tokenizer. The authors were inspired by CoreNLP and once tried to build a Japanese extension to it, but later decided to create jigg for more flexibility.

For Japanese NER, they use JUMAN/KNP.

@AngledLuffa
Contributor

FWIW (mostly for archival reasons at this point), there are now Japanese models for stanfordnlp:

https://stanfordnlp.github.io/stanfordnlp/models.html#human-languages-supported-by-stanfordnlp

@vochicong
Author

Thank you @AngledLuffa for your update.

@devanghingu

@AngledLuffa @vochicong I just checked the Stanford package and Stanza, but relation extraction (KBP) is still not supported.

@AngledLuffa
Contributor

There is no relation extraction model in Stanza.
