This repository was archived by the owner on Aug 6, 2025. It is now read-only.

Option to install MeCab automatically (also on Docker) and fix MeCab encoding issues#97

Closed
yannvgn wants to merge 3 commits into facebookresearch:master from yannvgn:mecab-installation

Conversation

@yannvgn yannvgn commented Oct 6, 2019

Hi @hoschwenk

I'm the maintainer of laserembeddings, a packaged version of LASER installable with pip.
While generating sample sentence embeddings (used to make sure the results I get with my package match those computed here), I had trouble processing Japanese sentences.

I followed the MeCab installation instructions, but I ran into odd issues that seemed to be related to the encoding MeCab uses.

I also noticed that MeCab is not installed by default in the Docker version of LASER (I don't see a reason not to install it there by default).

This PR in a nutshell:

  • I added the --install-mecab option to install_external_tools.sh (disabled by default) to attempt an automatic installation of MeCab (which should work in many environments)
  • MeCab is now installed by default on Docker
  • MeCab is now installed with UTF-8 encoding (I followed this: Trying to get Japanese tokenization to work #54 (comment))
  • I fixed the line that checks whether MeCab is already installed (so that it is not reinstalled when install_external_tools.sh is run again)
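
To make the first and last points concrete, here is a minimal sketch of how an opt-in flag and an "already installed" check could fit together. It is not taken from the PR diff: the function names (`parse_args`, `mecab_installed`, `plan_mecab`), the `install_mecab` variable and the `mecab/bin/mecab` path layout are all assumptions made for illustration. The sketch only reports what it would do and never builds anything.

```shell
#!/bin/bash
# Hypothetical sketch: names and paths are illustrative, not from the PR.

install_mecab=0   # opt-in: automatic installation is disabled by default

# Parse command-line flags; only --install-mecab is recognised here.
parse_args() {
  install_mecab=0
  for arg in "$@"; do
    case "$arg" in
      --install-mecab) install_mecab=1 ;;
      *) echo "unknown option: $arg" >&2; return 1 ;;
    esac
  done
}

# Succeed if an executable MeCab binary already exists under the given
# tools directory (the mecab/bin/mecab layout is an assumption).
mecab_installed() {
  [ -x "$1/mecab/bin/mecab" ]
}

# Print the action the script would take, without performing it.
# A real install step would build MeCab and its dictionary for UTF-8,
# presumably with something like `./configure --with-charset=utf8`
# (an assumption here; the PR text only says "installed in UTF-8").
plan_mecab() {
  local tools_dir="$1"
  if mecab_installed "$tools_dir"; then
    echo "skip: already installed"
  elif [ "$install_mecab" -eq 1 ]; then
    echo "install"
  else
    echo "skip: not requested"
  fi
}
```

Checking for the installed binary before looking at the flag is what makes a second run a no-op, even when --install-mecab is passed again.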

This should hopefully resolve #54 and #92.

Last but not least: THANK YOU for releasing and open-sourcing LASER. It is SO helpful for those of us who work in multilingual contexts. ✨✨✨

fix mecab installation test
install mecab on dockerized laser
@facebook-github-bot

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot facebook-github-bot added the "CLA Signed" label (Do not delete this pull request or issue due to inactivity.) Oct 6, 2019
@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

yannvgn commented Nov 20, 2020

I'm closing this, as MeCab is no longer needed with LASER2 (#153 (comment)).

@yannvgn yannvgn closed this Nov 20, 2020