Skip to content

Implement parallel diversity calculation #73

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 3, 2025
Merged

Conversation

Jacob-Chmura
Copy link
Member

Close #71

@Jacob-Chmura Jacob-Chmura self-assigned this Mar 1, 2025
'--num-workers',
type=click.IntRange(min=1, max=32, clamp=True),
help='Number of sub-process workers to spawn for computing diversity validation.',
default=4,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably we should default to os.process_cpu_count()

logging.info('Downloading NLTK resources.')
nltk.download('averaged_perceptron_tagger', quiet=True)
nltk.download('punkt', quiet=True)
required_resources = ['punkt_tab']
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a hard wipe of my cache and found that this is the only hard requirement for the needed nltk resource

'--num-workers',
type=click.IntRange(min=1, max=64, clamp=True),
help='Number of sub-process workers to spawn for computing diversity validation.',
default=os.cpu_count(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, since we are running python3.9 we don't have access to os.process_cpu_count() which could be strictly lower than os.cpu_count() depending on cpu affinity. Hence, this could in theory over-spawn processes, but I don't think it is a big deal.

More info: python/cpython#109649

@Jacob-Chmura Jacob-Chmura merged commit 538ef8d into main Mar 3, 2025
4 checks passed
@Jacob-Chmura Jacob-Chmura deleted the dev/parallel_bleu branch March 3, 2025 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallel Bleu Validation
2 participants