-
Notifications
You must be signed in to change notification settings - Fork 74
Add general logging implementation #500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f431306 to
3834c21
Compare
RobotSail
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! A few things need to be fixed, but overall looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main points:
- Consider using logging module for file management (only use a custom Formatter for JSONL).
- Define and enforce the form of the input dicts ("a recursive dict of string values")
We will also need some unit tests to validate the new addition. Overall, this looks like a very good start. Kudos.
src/instructlab/training/logger.py
Outdated
|
|
||
| try: | ||
| # Third Party | ||
| import wandb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be also declared via optional-dependencies group in pyproject.toml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what is best here, so will defer to the team. However, my reasons for not including it are:
- We will potentially have support for 3-4 different logging libraries and would then need an optional dependency for each.
- Each of these dependency groups would be one package each. It isn't necessarily easier or more logical to do
pip install instructlab-training[wandb]than it is to dopip install instructlab-training wandb - It should be relatively clear to the user that they need to install
wandbto use the WandbLogger and if not the error message should clarify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen packages with dozens of these one-entry dependencies. :) What it gives you is being able to request particular versions of libraries if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way is fine. For PyTorch you still have to install tensorboard separately even though it's technically part of the API. So I don't know if we need a one-off optional requirement just for wandb, but if there are other packages that we need to install to make it work then it could make sense.
src/instructlab/training/logger.py
Outdated
| """Create and initialize a logger of the specified type. | ||
| Args: | ||
| logger_type: Type of logger to create (must be one of ["file", "wandb", "tensorboard", "async"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to construct possible options programmatically to avoid drift. That said, you can probably enforce "one of" semantics with a type hint using enum constructed from allowed options.
| ) | ||
| parser.add_argument("--log_level", type=str, default="INFO") | ||
| parser.add_argument("--run_name", type=str, default=None) | ||
| parser.add_argument("--logger_type", type=str, default="async") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can list allowed choices: https://docs.python.org/3/library/argparse.html#choices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice it's better to avoid this though, I find choices to be super clunky and not as easy to work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate? As long as choices are calculated (from a enum or dict keys), one doesn't need to touch the argument at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally agree w/ Ihar- if we maintained a mapping of available logger types in instructlab.training.logger we could generate the list of logging backends available dynamically. That's self-documenting and easier for a user to inspect via the --help message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Granted, that we even have argparse as an interface to the library is not optimal and we should probably get rid of it, switching to proper function arguments.)
RobotSail
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making the requested changes and checking functionality of TensorBoard + Wandb. LGTM!
|
One other scenario that may be useful is being able to use multiple log destinations at the same time. This would allow to collect same metrics in multiple formats and use preferred tools for different analysis. This is where integration with python logging module could be also helpful since it allows to define multiple destinations for the same messages through propagate. I'm thinking of enabling all these loggers in CI training runs and collecting all of the outputs as github artifacts. |
Yeah that's something I've been thinking about. It would also be very easy to just implement a "MultiLogger" class that just loops through its nested loggers. I will look into the logging module more and try to see if it would work well with wandb/tensorboard. I do think it is useful to have the "metric logger" separate from the regular run logging, so I would want to make sure it's possible to do that while using the logging module for regular logs. |
|
@fynnsu Yes please do that if you can. I would keep the implementation simple (re-use the existing code you've already written ) and just make it so it does the existing AsyncStructuredLogger as a default and then everything else can just be added on after-the-fact. This way we can still retain log data even when using TensorBoard or anything else. |
777a5eb to
908cb5e
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Fynn Schmitt-Ulms <[email protected]>
| ) | ||
| parser.add_argument("--log_level", type=str, default="INFO") | ||
| parser.add_argument("--run_name", type=str, default=None) | ||
| parser.add_argument("--logger_type", type=str, default="async") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Granted, that we even have argparse as an interface to the library is not optimal and we should probably get rid of it, switching to proper function arguments.)
| ) | ||
| ``` | ||
| """ | ||
| if not loggers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disagree on raising since it's a valid input. What it should probably mean - if doesn't already - is that all previously set loggers should be disabled.
booxter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this is good to go. Some extra care with test env cleanup is advised (restore env vars; clean up loggers), and some usage document as requested by James (and me before, though perhaps I wasn't explicit as to what I ask for). Otherwise I'm ready to merge this in.
Signed-off-by: Fynn Schmitt-Ulms <[email protected]>
Signed-off-by: Fynn Schmitt-Ulms <[email protected]>
Signed-off-by: Fynn Schmitt-Ulms <[email protected]>
Signed-off-by: Fynn Schmitt-Ulms <[email protected]>
5f47d0a to
55a9275
Compare
|
@booxter @JamesKunstle I've added a |
booxter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pulling this off and addressing all the comments (sometimes misleading!) I like how this functionality integrates with stdlib logging approach. We should strive to be pythonic.
| python src/instructlab/training/main_ds.py \ | ||
| ... \ | ||
| --run_name "my_run" \ | ||
| --logger_type "async,tensorboard,wandb" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love how easy it is to do multi-backend, or implement another backend. ❤️
Signed-off-by: Fynn Schmitt-Ulms <[email protected]>
JamesKunstle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, very excited to have this!
Adds a general metric logger format with support for logging to json file, wandb, tensorboard, and the existing async logger.
User can select the
--logger_type:async: uses the existing AsyncStructuredLogger, this is the default option and default file names are the same for backwards compatibility reasonsfile: basic jsonl file logger (essentially same asasyncbut synchronous)tensorboard: Uses PyTorch'storch.utils.tensorboard.SummaryWriterto write logs in tensorboard format.wandb: logs to wandb. Currently untested (I don't have an account and need to set that up first)User can also specify
--run_nameas a string. Instances of{rank},{local_rank}, and{time}in the string will be replaced with their respective value.e.g.
{time}_rank{rank}->2025-04-25T17:26:01.477437_rank0