Skip to content

basic log streaming #49

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 52 commits into from
Sep 5, 2024
Merged

basic log streaming #49

merged 52 commits into from
Sep 5, 2024

Conversation

pmcurtin
Copy link
Collaborator

@pmcurtin pmcurtin commented Jul 22, 2024

[draft] #30, #55


📚 Documentation preview 📚: https://torchrunx--49.org.readthedocs.build/en/49/

@pmcurtin
Copy link
Collaborator Author

What this does:

  • The workers stream all logs to the launcher machine, where they're logged under the name torchrunx.agent-{i}.worker-{j}, i being the agent index and j being the worker's local rank.
    • The worker's root logger is replaced by one that has a SocketHandler pointing at the socketserver being run under the log_process of the launcher. Thus, any logging done at all by the worker is eventually sent there...
  • The agents stream their logs (things logged using the logging library) to the same place, logged under torchrunx.agent-{i}. Thus, they're logged in the parent logger of the worker loggers... We could use this to add some debug info to the logs.
  • The launcher logs directly to the torchrunx logger, the parent of the other loggers. Also can be used for debugging info.
  • The torchrunx logger always logs to a log file under the log_dir, named based on the timestamp. Another argument has been added to disable log propagation to the root logger. e.g. if someone starts a root logger in the launcher's caller, torchrunx logs are sent there too by default.

I still need to deal with worker stdout/stderr.

@apoorvkh
Copy link
Owner

Based on today's discussion:

  • we don't want to propagate LogRecords from the torchrunx logger (in launcher process) to the root logger
  • that means our library handles processing all logs on its own
  • our torchrunx logger stores LogRecords in an implicit hierarchy (indicating agent and worker)
  • we can pass in a dictionary (keys corresponding to agent and worker) that establishes handlers (e.g. file and stream handlers) for outputting the logrecords
    • these handlers may specify their own log level
    • if they do not specify the log level, we should "inherit" from the root logger
  • we can make some simple utilities for default hierarchical logging behavior

@pmcurtin

This comment was marked as resolved.

@pmcurtin pmcurtin linked an issue Jul 24, 2024 that may be closed by this pull request
@pmcurtin

This comment was marked as resolved.

@apoorvkh

This comment was marked as resolved.

@apoorvkh

This comment was marked as resolved.

@pmcurtin

This comment was marked as resolved.

@pmcurtin pmcurtin linked an issue Aug 13, 2024 that may be closed by this pull request
@pmcurtin
Copy link
Collaborator Author

Since the logger objects aren't used anymore, we need to attach filter objects to each hander to make sure it only handles the appropriate LogRecords.

@pmcurtin
Copy link
Collaborator Author

pmcurtin commented Sep 5, 2024

@apoorvkh the CI tests are fixed, so I think this branch is good to go?

@apoorvkh apoorvkh merged commit b9110d6 into main Sep 5, 2024
30 checks passed
@apoorvkh apoorvkh deleted the logging branch September 5, 2024 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Abstraction for logging class Switch from "printing" to "logging"
2 participants