This repository was archived by the owner on Mar 20, 2026. It is now read-only.
✓ Model loaded successfully!
============================================================
ENGLISH TO ALBANIAN TESTS
============================================================
English: Hello, how are you?
Albanian: •••• How are you?•••How are you? •••• ••• How do you feel?•• • ••• • How are you feeling?•• How did you feel? •• •• • • • •• •
English: Good morning!
Albanian: ️️🏻🏻️🏼🏼️♀️🏼 🏼🏻🏼♀️ ️♂️🏼♀️️♀️🏻 🏼 ♀️
English: Where is the library?
Albanian: •••• •••• Where is the library?•••■ Where is the Library?••■•••
English: Thank you very much.
Albanian: Thank you very much Thank you so much
English: The weather is beautiful today.
Albanian: ️️🏻️🏼️︎️🇪️♀️️ ️♂️🏼 🏼🏼♀️ 🏼 ♀️🏼 ?? ♀️ ♂️
To Reproduce
def translate_nllb(text, src_lang='eng_Latn', tgt_lang='sqi_Latn', model_size='3.3B'):
    """Use local NLLB model (doesn't work well for Albanian)"""
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    import torch

    model_name = f"facebook/nllb-200-{model_size}"
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    inputs = tokenizer(text, return_tensors="pt").input_ids.to(device)
    translated_tokens = model.generate(
        input_ids=inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=len(inputs[0]) + 50,
        num_beams=5,
        num_return_sequences=1,
        no_repeat_ngram_size=4,
        renormalize_logits=True,
    )
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
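One diagnostic worth running (this is my suggestion, not part of the original report): `convert_tokens_to_ids` silently returns the unknown-token id when the language code is not in the tokenizer's vocabulary, so `forced_bos_token_id` would then force generation to start from `<unk>`, which can produce exactly this kind of garbage. Note also that NLLB-200's published language list uses `als_Latn` for (Tosk) Albanian, so whether `sqi_Latn` is recognized by this tokenizer is worth verifying. A minimal sketch of the check, using a hypothetical stub in place of the real NLLB tokenizer:

```python
def check_lang_token(tokenizer, lang_code):
    """Return True if lang_code maps to a real (non-UNK) token id."""
    token_id = tokenizer.convert_tokens_to_ids(lang_code)
    return token_id != tokenizer.unk_token_id

# Minimal stub standing in for an NLLB tokenizer, for illustration only;
# the vocabulary entries and ids here are made up.
class StubTokenizer:
    unk_token_id = 3
    vocab = {"eng_Latn": 256047, "als_Latn": 256011}

    def convert_tokens_to_ids(self, token):
        return self.vocab.get(token, self.unk_token_id)

tok = StubTokenizer()
print(check_lang_token(tok, "eng_Latn"))  # True: known language token
print(check_lang_token(tok, "xxx_Latn"))  # False: falls back to UNK
```

With the real tokenizer, the same two-line check on `'sqi_Latn'` would confirm or rule out a bad forced-BOS token as the cause.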
(They don't work even with MosesPunctNormalizer and sentence splitting.)
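For context, a sketch of what the sentence-splitting variant would look like (an assumption on my part; the report does not show this code). The splitter below is a naive regex, and a placeholder stands in for the real model call so the splitting logic itself is visible:

```python
import re

def split_sentences(text):
    """Naive splitter: break on ., !, ? followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def translate_by_sentence(text, translate_fn):
    """Translate each sentence independently and rejoin."""
    return " ".join(translate_fn(s) for s in split_sentences(text))

# Placeholder in place of the real translate_nllb call:
echo = lambda s: s
print(translate_by_sentence("Hello, how are you? Good morning!", echo))
# Hello, how are you? Good morning!
```

Passing `translate_nllb` as `translate_fn` reproduces the per-sentence setup; in the reported runs the output was garbled either way.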
> cat test_unesco_exact.py
#!/usr/bin/env python3
"""Test with UNESCO's EXACT implementation"""
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from sacremoses import MosesPunctNormalizer
import torch

# Initialize
print("Loading 3.3B model with UNESCO's exact approach...")
model_name = "facebook/nllb-200-3.3B"
device = "mps" if torch.backends.mps.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
punct_normalizer = MosesPunctNormalizer(lang="en")
print(f"Model loaded on {device}\n")

def translate_unesco_style(text, src_code='eng_Latn', tgt_code='sqi_Latn'):
    """Exact UNESCO implementation"""
    # Set languages
    tokenizer.src_lang = src_code
    tokenizer.tgt_lang = tgt_code
    # Normalize punctuation
    text = punct_normalizer.normalize(text)
    # Tokenize (UNESCO style - convert to list and back)
    input_tokens = tokenizer(text, return_tensors="pt").input_ids[0].cpu().numpy().tolist()
    # Generate
    translated_chunk = model.generate(
        input_ids=torch.tensor([input_tokens]).to(device),
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
        max_length=len(input_tokens) + 50,
        num_return_sequences=1,
        num_beams=5,
        no_repeat_ngram_size=4,
        renormalize_logits=True,
    )
    # Decode
    return tokenizer.batch_decode(translated_chunk, skip_special_tokens=True)[0]

# Test
print("=" * 60)
print("TESTING WITH UNESCO'S EXACT CODE")
print("=" * 60 + "\n")

tests = [
    "Hello, how are you?",
    "Good morning!",
    "Where is the library?",
    "Thank you very much.",
]
for text in tests:
    print(f"English: {text}")
    translation = translate_unesco_style(text)
    print(f"Albanian: {translation}\n")
print("=" * 60)
Code sample
Expected behavior
Environment
fairseq Version (e.g., 1.0 or main):
PyTorch Version (e.g., 1.0)
OS (e.g., Linux):
How you installed fairseq (pip, source):
Build command you used (if compiling from source):