Logging Fixes & Enhancements #571

RobotSail · 2025-05-27T04:37:11Z

Addresses issue #569

Enhance training configuration and logging

Added new field log_level to TrainingArgs for improved training configuration.
Updated print_masked_samples to use print statements instead of logger for better visibility during data processing.
Integrated logging level configuration into the training setup, enabling dynamic log level adjustment.
Fixed command construction so log level is propagated down into the training command

- Added new field `log_level` to `TrainingArgs` for improved training configuration. - Updated `print_masked_samples` to use print statements instead of logger for better visibility during data processing. - Integrated logging level configuration into the training setup, enabling dynamic log level adjustment. - Fixed command construction so log level is propagated down into the training command Signed-off-by: Oleg S <[email protected]>

JamesKunstle · 2025-05-27T23:54:52Z

src/instructlab/training/main_ds.py

+    setup_root_logger(train_args.log_level)
+    setup_metric_logger("async", None, train_args.ckpt_output_dir)
+
+    logger = logging.getLogger("instructlab.training")


could this be inferred from:

Suggested change

logger = logging.getLogger("instructlab.training")

logger = logging.getLogger(__name__)

?

Will try this

JamesKunstle · 2025-05-27T23:58:11Z

src/instructlab/training/data_process.py

-                "Pretraining" if unmask else "Instruction" + " ex sample %d: %s",
-                i + 1,
-                text,
+            print(f"\033[35mOriginal Input: {orig_text}\n\033[0m")


I read your issue (#569) for this reversion and I see your point.

A user might not always wish to see these outputs, and may want to silence them by setting the log level higher than the level that this is logged. Maybe an alternative solution would be logger.info rather than logger.debug so that we could still use the logger here?

Having the formatted outputs always logged to the user isn't the standard behavior for other libraries (namely, Axolotl), so the previous behavior, while useful to us during development, might not be desirable.

@JamesKunstle I'm intentionally avoiding using the logger here because it applies formatting which misrepresents the actual content.

A user might not always wish to see these outputs, and may want to silence them by setting the log level higher than the level that this is logged. Maybe an alternative solution would be logger.info rather than logger.debug so that we could still use the logger here?

I would argue against this, for a few reasons:

Most issues during training stem from data, therefore having a preview of what data you send to the model is a crucial step

There is already a lot of noise which users do not care about, so providing an additional preview of their data does little to make that worse. If users don't wish to see a preview of their data, then all the other training logs that are printed today should also be muted. At this point, the program would likely be running somewhere where the user won't care or look at the logs anyway, so there's no reason to omit an otherwise useful step.

JamesKunstle

Generally in favor of all changes- question about switching logger.debug to print.

mergify bot added the ci-failure label May 27, 2025

RobotSail force-pushed the fix-logging branch from 40ee2bf to 61e655b Compare May 27, 2025 04:42

mergify bot added ci-failure and removed ci-failure labels May 27, 2025

JamesKunstle reviewed May 27, 2025

View reviewed changes

aldopareja self-requested a review May 28, 2025 06:26

aldopareja approved these changes May 28, 2025

View reviewed changes

mergify bot added the one-approval label May 28, 2025

JamesKunstle approved these changes May 28, 2025

View reviewed changes

mergify bot merged commit 17a7d19 into instructlab:main May 28, 2025
16 of 18 checks passed

mergify bot added ci-failure and removed one-approval ci-failure labels May 28, 2025

JamesKunstle mentioned this pull request May 28, 2025

uses __name__ in logging.getLogger #573

Merged

fynnsu mentioned this pull request Jun 12, 2025

Logging Changes Broke Data Processing Output #569

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Logging Fixes & Enhancements #571

Logging Fixes & Enhancements #571

Uh oh!

RobotSail commented May 27, 2025 •

edited

Loading

Uh oh!

JamesKunstle May 27, 2025

Uh oh!

RobotSail May 28, 2025

Uh oh!

JamesKunstle May 27, 2025 •

edited

Loading

Uh oh!

RobotSail May 28, 2025

Uh oh!

JamesKunstle left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	logger = logging.getLogger("instructlab.training")
	logger = logging.getLogger(__name__)

Logging Fixes & Enhancements #571

Logging Fixes & Enhancements #571

Uh oh!

Conversation

RobotSail commented May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JamesKunstle May 27, 2025

Choose a reason for hiding this comment

Uh oh!

RobotSail May 28, 2025

Choose a reason for hiding this comment

Uh oh!

JamesKunstle May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RobotSail May 28, 2025

Choose a reason for hiding this comment

Uh oh!

JamesKunstle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RobotSail commented May 27, 2025 •

edited

Loading

JamesKunstle May 27, 2025 •

edited

Loading