Skip to content

Tune tbxlogger add images#37822

Merged
sven1977 merged 16 commits intoray-project:masterfrom
simonsays1980:tune-tbxlogger-add-images
Feb 20, 2024
Merged

Tune tbxlogger add images#37822
sven1977 merged 16 commits intoray-project:masterfrom
simonsays1980:tune-tbxlogger-add-images

Conversation

@simonsays1980
Copy link
Contributor

@simonsays1980 simonsays1980 commented Jul 26, 2023

Why are these changes needed?

This PR enables users to provide in the result dicitonaries also image arrays that can be presented on TensorBoard like in the following example:

tensorboard

Images can be provided either as singleton in form of an np.ndarray with dimensions (3, H, W) or in form of a 4-d np.ndarray with dimensions (N, 3, H, W) (in this case images get concatenated horizontally).

Related issue number

#21954

As this issue is a P1 Issue that should be fixed within a few weeks rllib RLlib related issues the corresponding solution involves storing the images to the episode.media as this attribute of the episode is not summarized or appended in the metrics.collect_episodes() function.

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@simonsays1980
Copy link
Contributor Author

@Yard1 @krfricke My PR fails because of a flakey test. How can I re-run? Should I just mrge the master when it updates?

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@yyyuhan
Copy link

yyyuhan commented Dec 11, 2023

I applied a similar fix with the TensorboardX logger. In my case, the media metrics are being compiled into a list within the summarize_episodes function, as shown here. Consequently, in TBXLoggerCallback, I had to retrieve the last element from this list prior to invoking the add_image method.
Could you recommend a more efficient approach for this situation, similar to how you directly incorporated the image numpy array in your solution? I would appreciate your advice. Thank you.

@JulianLorenz
Copy link

JulianLorenz commented Dec 21, 2023

I also tried to apply a similar modification to TBXLoggerCallback.
I discovered the following issue with image handling:

When the image in episode.media gets processed by the JSONLoggerCallback, significant delays (minutes!) are introduced by the JSON logger. This is because the JSON logger needs to rewrite the log file at each logging call (see #21416) . I worked around this by disabling other Logger Callbacks by setting TUNE_DISABLE_AUTO_CALLBACK_LOGGERS to 1.

Also I think it needs to be decided how to handle the case when images are logged by multiple episodes at one time in summarize_episodes. In my opinion, all images should be kept as it is currently implemented. The user can then decide how to log images across multiple episodes in his algorithm class by adding a on_train_result callback:

        def on_train_result(
                self,
                *,
                algorithm: "Algorithm",
                result: dict,
                **kwargs,
        ) -> None:
            """Called at the end of Algorithm.train().

            Args:
                algorithm: Current Algorithm instance.
                result: Dict of results returned from Algorithm.train() call.
                    You can mutate this object to add additional metrics.
                kwargs: Forward compatibility placeholder.
            """
            if 'trajectory' in result['episode_media'].keys():
                result['episode_media']['myimage'] = result['episode_media']['myimage'][0]

My modification of the TBXLoggerCallback.log_trial_result() function looks like this. I decided to make it more strict with the numpy array, to not automatically interpret any 3D array as image:

            ...
            elif (isinstance(value, list) and len(value) > 0) or (
                    isinstance(value, np.ndarray) and value.size > 0
            ):
                valid_result[full_attr] = value

                # Check for list of images:
                if all(isinstance(v, np.ndarray) and v.ndim == 3 and v.shape[0] in [1, 3] for v in value):
                    if len(value) == 1:
                        # only one image
                        self._trial_writer[trial].add_image(full_attr, value[0], global_step=step)
                    else:
                        # Multiple images - stack them as tensorboard requires
                        imgs = np.stack(value)
                        self._trial_writer[trial].add_images(full_attr, imgs, global_step=step)
                    continue

                # Check for list of videos:
                if all(isinstance(v, np.ndarray) and v.ndim == 5 and v.shape[2] in [1,3] for v in value):
                    video = np.concatenate(value, axis=1)
                    self._trial_writer[trial].add_video(
                        full_attr, video, global_step=step, fps=20)
                    continue

                # Cover either a single video or a single image
                if isinstance(value, np.ndarray) and value.size > 0:
                    # Video - Must have 5 dimensions in NTCHW format:
                    # C must be either 1 for grayscale of 3 for RGB
                    if value.ndim == 5 and value.shape[2] in [1, 3]:
                        self._trial_writer[trial].add_video(
                            full_attr, value, global_step=step, fps=20
                        )
                        continue

                    # Image - Must have 3 dimensions in CHW format.
                    # C must be either 1 for grayscale of 3 for RGB
                    if value.ndim == 3 and value.shape[0] in [1, 3]:
                        self._trial_writer[trial].add_image(
                            full_attr, value, global_step=step
                        )
                        continue

                try:
                ...

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@simonsays1980 simonsays1980 force-pushed the tune-tbxlogger-add-images branch from e10cf2c to 378aac4 Compare January 3, 2024 13:43
@sven1977 sven1977 self-assigned this Jan 4, 2024
)
continue

# Must be multi-image
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question: What if this is a single video (t, w, h, c)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the definition of add_video() only 5-dimensional inputs are accepted for this function.

Do we anywhere pass data in a lower dimensional array into this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sven1977 Do you see any cases where this setup of separating arrays by dimension could fall on our feet?

sven1977 and others added 3 commits January 4, 2024 18:40
Signed-off-by: Sven Mika <sven@anyscale.io>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ray into tune-tbxlogger-add-images

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sven1977 sven1977 merged commit 60fc906 into ray-project:master Feb 20, 2024
khluu pushed a commit that referenced this pull request Feb 21, 2024
@simonsays1980 simonsays1980 deleted the tune-tbxlogger-add-images branch February 21, 2024 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants