Feature/translation - add an functionality to translate event data etc. #167
Merged
yammesicka merged 17 commits into PythonFreeCourse:develop from annashtirberg:feature/translation on Feb 2, 2021
Commits
4902b04 translation feature stuff (annashtirberg)
4bb5f83 Initial translation stuff (annashtirberg)
b2c7ad4 cleaned up models (annashtirberg)
ce39245 added iso to requirements.txt (annashtirberg)
053d02e added language to user registration (annashtirberg)
47570a5 added tests (annashtirberg)
08e392c Removed unused files (annashtirberg)
6a05a9b Merge branch 'develop' of https://github.com/PythonFreeCourse/calenda… (annashtirberg)
7a725a3 added text blob to requirements.txt and made sure it has it's corpora (annashtirberg)
4c09398 added translation feature and tests (annashtirberg)
e6154fe pulled from upstream (annashtirberg)
b22c6dc sorted requirements.txt and removed duplicated (annashtirberg)
20c5c60 Fixed flake8 error (annashtirberg)
fb0a76d Added test to translation and fixed stuff (annashtirberg)
784d231 Merge branch 'develop' of https://github.com/PythonFreeCourse/calenda… (annashtirberg)
839adbc Fixed flake8 error (annashtirberg)
797ec83 add tests, changes in function get_user_language (annashtirberg)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
from typing import Optional | ||
|
||
from iso639 import languages | ||
from textblob import TextBlob, download_corpora | ||
from textblob.exceptions import NotTranslated | ||
|
||
from app.database.database import SessionLocal | ||
from loguru import logger | ||
|
||
from app.routers.user import get_users | ||
|
||
download_corpora.download_all() | ||
|
||
|
||
def translate_text(text: str, | ||
target_lang: str, | ||
original_lang: Optional[str] = None | ||
) -> str: | ||
""" | ||
Translate text to the target language | ||
optionally given the original language | ||
""" | ||
if not text.strip(): | ||
return "" | ||
if original_lang is None: | ||
original_lang = _detect_text_language(text) | ||
else: | ||
original_lang = _lang_full_to_short(original_lang) | ||
|
||
if original_lang == _lang_full_to_short(target_lang): | ||
return text | ||
|
||
try: | ||
return str(TextBlob(text).translate( | ||
from_lang=original_lang, | ||
to=_lang_full_to_short(target_lang))) | ||
except NotTranslated: | ||
return text | ||
|
||
|
||
def _detect_text_language(text: str) -> str: | ||
""" | ||
Gets some text and returns the language it is in | ||
Uses external API | ||
""" | ||
return str(TextBlob(text).detect_language()) | ||
|
||
|
||
def _get_user_language(user_id: int, session: SessionLocal) -> str: | ||
""" | ||
Gets a user-id and returns the language he speaks | ||
Uses the DB""" | ||
try: | ||
user = get_users(session, id=user_id)[0] | ||
except IndexError: | ||
logger.exception( | ||
"User was not found in the database." | ||
) | ||
return "" | ||
|
||
else: | ||
return user.language | ||
|
||
|
||
def translate_text_for_user(text: str, | ||
session: SessionLocal, | ||
user_id: int) -> str: | ||
""" | ||
Gets a text and a user-id and returns the text, | ||
translated to the language the user speaks | ||
""" | ||
target_lang = _get_user_language(user_id, session) | ||
if not target_lang: | ||
return text | ||
return translate_text(text, target_lang) | ||
|
||
|
||
def _lang_full_to_short(full_lang: str) -> str: | ||
""" | ||
Gets the full language name and | ||
converts it to a two-letter language name | ||
""" | ||
return languages.get(name=full_lang.capitalize()).alpha2 | ||
|
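`_lang_full_to_short` above assumes the language name is always found in the lookup table. A minimal sketch of a more forgiving variant follows, assuming the caller is happy to receive `None` for an unknown name. The `_NAME_TO_ALPHA2` dict and the public name `lang_full_to_short` are hypothetical stand-ins so the example runs without the `iso639` package; they are not part of this PR.

```python
from typing import Optional

# Hypothetical stand-in for the iso639 table used in the PR; only a few
# languages are listed to keep the sketch self-contained.
_NAME_TO_ALPHA2 = {
    "English": "en",
    "Russian": "ru",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Italian": "it",
}


def lang_full_to_short(full_lang: str) -> Optional[str]:
    """Return the two-letter code for a language name, or None if unknown."""
    # Normalize the same way the PR does (capitalize), plus trim whitespace.
    key = full_lang.strip().capitalize()
    return _NAME_TO_ALPHA2.get(key)
```

With this shape, a caller such as `translate_text` could return the original text when either language resolves to `None`, instead of letting a lookup error propagate.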
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,3 @@ | ||
import nltk | ||
|
||
|
||
nltk.download('punkt') |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,7 +26,8 @@ def get_placeholder_user(): | |
email='[email protected]', | ||
password='1a2s3d4f5g6', | ||
full_name='My Name', | ||
- telegram_id='' | ||
+ telegram_id='', | ||
+ language='english', | ||
) | ||
|
||
|
||
|
@@ -110,6 +111,7 @@ async def upload_user_photo( | |
# Save to database | ||
user.avatar = await process_image(pic, user) | ||
session.commit() | ||
|
||
finally: | ||
url = router.url_path_for("profile") | ||
return RedirectResponse(url=url, status_code=HTTP_302_FOUND) | ||
|
@@ -145,6 +147,6 @@ async def process_image(image, user): | |
def get_image_crop_area(width, height): | ||
if width > height: | ||
delta = (width - height) // 2 | ||
- return (delta, 0, width - delta, height) | ||
Review comment: why?
Reply: the parentheses are flagged as redundant, but removing them has no effect on the code.
+ return delta, 0, width - delta, height | ||
delta = (height - width) // 2 | ||
- return (0, delta, width, width + delta) | ||
+ return 0, delta, width, width + delta |
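To back up the reply in this thread, here is a small sketch showing that the two `return` forms build the same tuple. The function bodies mirror `get_image_crop_area` from the diff; the two function names are made up for the comparison.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]


def crop_area_with_parens(width: int, height: int) -> Box:
    """Version before the change: the tuple is wrapped in parentheses."""
    if width > height:
        delta = (width - height) // 2
        return (delta, 0, width - delta, height)
    delta = (height - width) // 2
    return (0, delta, width, width + delta)


def crop_area_without_parens(width: int, height: int) -> Box:
    """Version after the change: the comma alone builds the tuple."""
    if width > height:
        delta = (width - height) // 2
        return delta, 0, width - delta, height
    delta = (height - width) // 2
    return 0, delta, width, width + delta


# Both forms produce identical tuples for landscape and portrait inputs.
landscape = crop_area_without_parens(200, 100)
portrait = crop_area_without_parens(100, 200)
```

In Python the comma creates the tuple, so the parentheses in a `return` statement are purely stylistic.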
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,154 @@ | ||
import pytest | ||
from iso639 import languages | ||
from textblob import TextBlob | ||
|
||
from app.internal.translation import ( | ||
translate_text, | ||
translate_text_for_user, | ||
_get_user_language, | ||
_lang_full_to_short, | ||
_detect_text_language | ||
) | ||
|
||
|
||
@pytest.mark.parametrize("text, target_lang, original_lang", | ||
[("Привет мой друг", "english", "russian"), | ||
("Hola mi amigo", "english", "spanish"), | ||
("Bonjour, mon ami", "english", "french"), | ||
("Hallo, mein Freund", "english", "german"), | ||
]) | ||
def test_translate_text_with_original_lang(text, target_lang, original_lang): | ||
|
||
answer = translate_text(text, target_lang, original_lang) | ||
assert "Hello my friend" == answer | ||
assert TextBlob(text).detect_language() == languages.get( | ||
name=original_lang.capitalize()).alpha2 | ||
assert TextBlob(answer).detect_language() == languages.get( | ||
name=target_lang.capitalize()).alpha2 | ||
|
||
|
||
@pytest.mark.parametrize("text, target_lang", | ||
[("Привет мой друг", "english"), | ||
("Bonjour, mon ami", "english"), | ||
("Hallo, mein Freund", "english"), | ||
]) | ||
def test_translate_text_without_original_lang(text, target_lang): | ||
answer = translate_text(text, target_lang) | ||
assert "Hello my friend" == answer | ||
assert TextBlob(answer).detect_language() == languages.get( | ||
name=target_lang.capitalize()).alpha2 | ||
|
||
|
||
@pytest.mark.parametrize("text, target_lang, original_lang", | ||
[("Привет мой друг", "russian", "russian"), | ||
("Hola mi amigo", "spanish", "spanish"), | ||
("Bonjour, mon ami", "french", "french"), | ||
("Hallo, mein Freund", "german", "german"), | ||
("Ciao amico", "italian", "italian") | ||
]) | ||
def test_translate_text_with_same_original_target_lang_with_original_lang( | ||
text, | ||
target_lang, | ||
original_lang): | ||
answer = translate_text(text, target_lang, original_lang) | ||
assert answer == text | ||
|
||
|
||
@pytest.mark.parametrize("text, target_lang", | ||
[("Привет мой друг", "russian"), | ||
("Hola mi amigo", "spanish"), | ||
("Bonjour, mon ami", "french"), | ||
("Hallo, mein Freund", "german"), | ||
("Ciao amico", "italian") | ||
]) | ||
def test_translate_text_with_same_original_target_lang_without_original_lang( | ||
text, | ||
target_lang): | ||
answer = translate_text(text, target_lang) | ||
assert answer == text | ||
|
||
|
||
def test_translate_text_without_text_with_original_target_lang(): | ||
answer = translate_text("", "english", "russian") | ||
assert answer == "" | ||
|
||
|
||
def test_translate_text_without_text_without_original_lang(): | ||
answer = translate_text("", "english") | ||
assert answer == "" | ||
|
||
|
||
def test_lang_short_to_full(): | ||
answer = _lang_full_to_short("english") | ||
assert answer == "en" | ||
|
||
|
||
def test_get_user_language(user, session): | ||
user_id = user.id | ||
answer = _get_user_language(user_id, session=session) | ||
assert user_id == 1 | ||
assert answer.lower() == "english" | ||
|
||
|
||
@pytest.mark.parametrize("text", ["Привет мой друг", | ||
"Bonjour, mon ami", | ||
"Hello my friend"] | ||
) | ||
def test_translate_text_for_good_user(text, user, session): | ||
user_id = user.id | ||
answer = translate_text_for_user(text, session, user_id) | ||
assert answer == "Hello my friend" | ||
|
||
|
||
def test_translate_text_for_bed_user(user, session): | ||
user_id = user.id | ||
answer = translate_text_for_user("Привет мой друг", session, user_id + 1) | ||
assert answer == "Привет мой друг" | ||
|
||
|
||
def test_detect_text_language(): | ||
answer = _detect_text_language("Hello my friend") | ||
assert answer == "en" | ||
|
||
|
||
@pytest.mark.parametrize("text, target_lang, original_lang", | ||
[("Hoghhflaff", "english", "spanish"), | ||
("Bdonfdjourr", "english", "french"), | ||
("Hafdllnnc", "english", "german"), | ||
]) | ||
def test_translate_text_with_text_impossible_to_translate( | ||
|
||
text, | ||
target_lang, | ||
original_lang): | ||
answer = translate_text(text, target_lang, original_lang) | ||
assert answer == text | ||
|
||
|
||
@pytest.mark.parametrize("text, target_lang, original_lang", | ||
[("@Здравствуй#мой$друг!", "english", "russian"), | ||
("@Hola#mi$amigo!", "english", "spanish"), | ||
("@Bonjour#mon$ami!", "english", "french"), | ||
("@Hallo#mein$Freund!", "english", "german"), | ||
]) | ||
def test_translate_text_with_symbols(text, target_lang, original_lang): | ||
answer = translate_text(text, target_lang, original_lang) | ||
assert "@ Hello # my $ friend!" == answer | ||
|
||
|
||
@pytest.mark.parametrize("text, target_lang, original_lang", | ||
[("Привет мой друг", "italian", "spanish"), | ||
("Hola mi amigo", "english", "russian"), | ||
("Bonjour, mon ami", "russian", "german"), | ||
("Ciao amico", "french", "german") | ||
]) | ||
def test_translate_text_with_with_incorrect_lang( | ||
text, | ||
target_lang, | ||
original_lang): | ||
answer = translate_text(text, target_lang, original_lang) | ||
assert answer == text | ||
|
||
|
||
def test_get_user_language_for_bed_user(user, session): | ||
user_id = user.id + 1 | ||
answer = _get_user_language(user_id, session=session) | ||
assert not answer |
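The tests above go through the external translation API, so their results may depend on network access and on the service's current output. One possible way to exercise the same branches offline is to inject the translator, sketched below. Everything here is hypothetical (`NotTranslated` as a local class, `fake_translator`, the `translator` parameter), and languages are passed as two-letter codes for brevity; the PR's `translate_text` does not take a translator argument.

```python
from typing import Callable


class NotTranslated(Exception):
    """Local stand-in for the exception raised when text comes back unchanged."""


# (text, from_lang, to_lang) -> translated text
Translator = Callable[[str, str, str], str]


def translate_text(text: str, target_lang: str, original_lang: str,
                   translator: Translator) -> str:
    """Same control flow as the PR's translate_text, with the backend injected."""
    if not text.strip():
        return ""
    if original_lang == target_lang:
        return text
    try:
        return translator(text, original_lang, target_lang)
    except NotTranslated:
        return text


def fake_translator(text: str, from_lang: str, to_lang: str) -> str:
    """Deterministic fake: knows a single phrase, fails on anything else."""
    known = {("Hola mi amigo", "es", "en"): "Hello my friend"}
    try:
        return known[(text, from_lang, to_lang)]
    except KeyError:
        raise NotTranslated(text)
```

A test can then pass `fake_translator` and assert on the empty-text, same-language, translated, and untranslatable cases without any network call.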
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,10 +9,12 @@ def test_create_user(self, session): | |
username='new_test_username', | ||
password='new_test_password', | ||
email='[email protected]', | ||
language='english' | ||
) | ||
assert user.username == 'new_test_username' | ||
assert user.password == 'new_test_password' | ||
assert user.email == '[email protected]' | ||
assert user.language == 'english' | ||
session.delete(user) | ||
session.commit() | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ def user(session: Session) -> User: | |
username='test_username', | ||
password='test_password', | ||
email='[email protected]', | ||
language='english' | ||
) | ||
yield test_user | ||
delete_instance(session, test_user) | ||
|
@@ -24,6 +25,7 @@ def sender(session: Session) -> User: | |
username='sender_username', | ||
password='sender_password', | ||
email='[email protected]', | ||
language='english' | ||
) | ||
yield sender | ||
delete_instance(session, sender) | ||
|
Maybe change it to a lazy call, so that it runs the first time it's needed rather than every time?
This is the same as the other nltk uses in other pull requests.
It also only takes time the first time it runs; after that it is fast.
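For reference, the lazy call suggested in this thread could look roughly like the sketch below. It is only an illustration: `download_all` here is a hypothetical stand-in that records its calls instead of downloading anything, and `ensure_corpora` and `word_count` are made-up names, not functions from this PR.

```python
from functools import lru_cache
from typing import List

# Records calls so the sketch can be checked without network access.
download_calls: List[str] = []


def download_all() -> None:
    """Hypothetical stand-in for the corpora download done at import time."""
    download_calls.append("corpora")


@lru_cache(maxsize=None)
def ensure_corpora() -> None:
    """Run the download on first use; later calls hit the cache and do nothing."""
    download_all()


def word_count(text: str) -> int:
    """Example consumer: makes sure the corpora exist before doing its work."""
    ensure_corpora()
    return len(text.split())


word_count("Hello my friend")
word_count("Hola mi amigo")
# download_all ran once, even though word_count was called twice.
```

The trade-off is the one raised in the reply: the first call pays the download cost either way, and the lazy form only moves that cost from import time to first use.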