Skip to content

Add option to define platforms on the command line #49

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 23, 2024

Conversation

Pennycook
Copy link
Contributor

@Pennycook Pennycook commented Feb 6, 2024

Similar to #42 , the goal of this PR is to make it simpler for users to interact with Code Base Investigator by providing a command-line option that minimizes YAML interaction.

Instead of this:

codebase:
    files: []
    platforms: [ "CPU", "GPU" ]

CPU:
    commands: "cpu_compile_commands.json"

GPU:
    commands: "gpu_compile_commands.json"
codebasin -c config.yaml

It is now possible to write this:

codebasin -p cpu.json -p gpu.json

Or, if a configuration file is still being used, it is possible to include only a subset of the files in the analysis:

codebasin -c config.yaml -p CPU

Related issues

N/A

Proposed changes

  • Adds --platform (-p) option to specify platform names and/or compilation database paths.
  • Interprets -p name as a request to use a pre-defined platform, effectively acting as a filter.
  • Interprets -p path.json as a request to use an implicitly defined platform, simplifying configuration.
  • Adjusts warnings and errors to help users reason about -p behavior, and to disable config file warnings when absent.

@Pennycook Pennycook added the enhancement New feature or request label Feb 6, 2024
@Pennycook Pennycook added this to the 1.2.0 milestone Feb 6, 2024
@Pennycook Pennycook requested a review from laserkelvin February 6, 2024 17:05
@Pennycook Pennycook changed the title Platform option Add option to define platforms on the command line Feb 6, 2024
@Pennycook
Copy link
Contributor Author

@laserkelvin - What do you think of the new API proposed here? I'm a little worried that it might be hard to understand, because it's not necessarily obvious why -p or --platform expects multiple arguments. I'm also worried that it might be hard to extend, because we won't be able change the number of arguments without breaking existing command lines.

I had a few ideas for alternatives, but none of them leap out as being significantly better...

Context/Order-aware Options

codebasin --platform CPU --commands cpu_compile_commands.json --platform GPU --commands gpu_compile_commands.json
codebasin -p CPU -C cpu_compile_commands.json -p GPU -C gpu_compile_commands.json

Separate Lists of Commands and (Optional?) Platform Names

codebasin --commands cpu_compile_commands.json,gpu_compile_commands.json --platform-names CPU,GPU
codebasin -C cpu_compile_commands.json,gpu_compile_commands.json -p CPU,GPU
codebasin --commands cpu_compile_commands.json gpu_compile_commands.json --platform-names CPU GPU
codebasin -C cpu_compile_commands.json gpu_compile_commands.json -p CPU GPU

Mappings from Platform Names to Commands

codebasin --platform CPU:cpu_compile_commands.json --platform GPU:gpu_compile_commands.json
codebasin -p CPU:cpu_compile_commands.json -p GPU:gpu_compile_commands.json
codebasin --platform CPU=cpu_compile_commands.json --platform GPU=gpu_compile_commands.json
codebasin -p CPU=cpu_compile_commands.json -p GPU=gpu_compile_commands.json

Can you think of any better way to do this?

@laserkelvin
Copy link
Contributor

Honestly this kind of thing is near the boundary of "should we configure something this complicated via the CLI" so i'm not 100% sure, in that this level of composability is probably better suited for a config/input file.

If I had to choose, the mapping is probably closest to what I like, since it's explicit in what it wants albeit more verbose for the user. This post demos providing key/value pairs by creating a custom argparse.Action, and I would think the mode of interaction would then look like this:

codebasin --platforms CPU=cpu_compile_commands.json GPU=gpu_compile_commands.json

So the user will just specify all of the platform/config mappings in one go, which is more strongly hinted by using platforms (plural) as the flag. You can add this as an example to the help string for it as well.

@Pennycook
Copy link
Contributor Author

Pennycook commented Feb 14, 2024

Honestly this kind of thing is near the boundary of "should we configure something this complicated via the CLI" so i'm not 100% sure, in that this level of composability is probably better suited for a config/input file.

I agree. My preference here is actually the multi-command version, because I think it's quite explicit:

cbi define CPU cpu_compile_commands.json
cbi define GPU gpu_compile_commands.json
cbi analyze

Maybe I've been too hasty in suggesting we throw away our analysis configuration files completely? Maybe the issue is that they're just too complicated, because of 1) all the old glob stuff; and 2) YAML? Something like the below would be quite a bit nicer:

[codebase]
source-dir = "."

[platform.cpu]
commands = "cpu_compile_commands.json"
build-dir = "./build/cpu/"

[platform.gpu]
commands = "gpu_compile_commands.json"
build-dir = "./build/gpu/"

...but I can't see an easy way to replace our existing YAML file with a TOML file without breaking lots of stuff. And ideally, I'd still like a way to specify all this stuff without creating a file!

If I had to choose, the mapping is probably closest to what I like, since it's explicit in what it wants albeit more verbose for the user. This post demos providing key/value pairs by creating a custom argparse.Action, and I would think the mode of interaction would then look like this:

codebasin --platforms CPU=cpu_compile_commands.json GPU=gpu_compile_commands.json

So the user will just specify all of the platform/config mappings in one go, which is more strongly hinted by using platforms (plural) as the flag. You can add this as an example to the help string for it as well.

I'll spend some time prototyping this (and keep thinking about this question of flags vs config files).

@Pennycook Pennycook marked this pull request as draft February 14, 2024 16:42
@Pennycook
Copy link
Contributor Author

Apologies in advance for the long post, @laserkelvin...

After our discussion yesterday, I did some playing around with argparse and did some research. It turns out there's a very old bug in argparse that affects its handling of options with an unknown number of values. The GNU documentation says that POSIX convention is to only accept one value per argument (presumably because somebody realized the alternative would be hard to parse in certain corner cases). So, while we could make something like --platforms A=B C=D work, it might make it harder/impossible for us to support certain other options in future in a user-friendly way.

So, I took a step back and tried to think about why I wanted a -p flag in the first place, and what it is that I don't like about the configuration file workflow. I think I've come up with something that will satisfy us both.

Use-Case 1

  • User has a configuration file with many platform definitions in it (e.g., SYCL, CUDA, HIP, OpenMP, Kokkos).
  • User wants to run the analysis for some subset of those platforms.

With the current interface, the only way to do this is to edit the configuration file to change the platform list, or maintain multiple configuration files that differ only in their platform list. I found this really annoying when working on the P3HPC paper.

To address this, I think what we want is a way to filter the platforms that will be used in the analysis. We can do that by accepting only a name, since the rest of the platform configuration is described in the config file already:

# Run the analysis using all platforms from the config file
codebasin -c config.yaml

# Run the analysis for platforms "cpu" and "gpu", which must exist in the config file
codebasin -c config.yaml -p cpu -p gpu

Use-Case 2

  • User wants to invoke codebasin as part of a pipeline that produces compile_commands.json files.
  • User does not want to create an analysis file.

We can build on the version of -p that accepts only one argument. If it's a string, then we treat it as a name to be found in the configuration file. If it's a path to a .json file, then we treat it as an inline platform definition:

# Run the analysis for two implicitly defined platforms
codebasin -p /path/to/cpu_commands.json -p /path/to/gpu_commands.json

# Run the analysis for a mix of pre-defined and implicitly defined platforms
codebasin -c config.yaml -p cpu -p /path/to/gpu_commands.json

We'd need to come up with some rules/conventions for how we name platforms, but that doesn't seem too bad. Even if we just used the filename (or part of it) the user would still have control over how the platform appeared in CBI output.

Use-Case 3

  • Similar to Use-Case 1, user has many different platform definitions.
  • User wants to build up an analysis from modular platform definitions, instead of maintaining one giant file.

This time, we teach -p to accept a standalone platform definition in a yet-to-be-defined TOML format:

# Run the analysis for two explicitly defined platforms from different files
codebasin -p /path/to/platforms/cpu.toml -p /path/to/platforms/gpu.toml

# Do all the things
codebasin -c config.yaml -p cpu -p /path/to/gpu_commands.json -p /path/to/platforms/fpga.toml

In 99% of cases we should be able to distinguish between the different cases from context (e.g., whether things look like paths, the file extension). If a path is ambiguous, we can check to see if it validates against the compilation database or platform schema (and it will only ever match one). If we still can't tell, we can throw an error.

The documentation just needs to say that we accept -p <PLATFORM>, explain the valid PLATFORM values, and say that if no -p flags are supplied then the analysis uses all platforms it knows about.

What do you think? My proposal is that we implement the functionality for use-cases 1) and 2) initially, and leave the functionality associated with use-case 3) for later. We can address it as part of a move from YAML to TOML for the configuration files.

The set of platforms in an analysis can now be specified using:

  --platform <platform>

When combined with a configuration file, this new option can be used to limit
analysis to a subset of platforms, as below:

  -c config.yaml --platform CPU

When the platform specification is a .json file, it is interpreted as a
definition of a new platform:

  --platform /path/to/cpu.json --platform /path/to/gpu.json

Both options are intended to improve user productivity, by minimizing the amount
of time spent editing configuration files when making minor adjustments to an
existing analysis.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook marked this pull request as ready for review February 21, 2024 19:44
@Pennycook
Copy link
Contributor Author

@laserkelvin - I think I've implemented what we discussed. Please take a look and let me know what you think.

Note that although it's not really necessary right now, the ".json" requirement is groundwork for a future PR. If we want -p to eventually accept .toml files as well, it's a good idea to require explicit file extensions from the get-go.

Copy link
Contributor

@laserkelvin laserkelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had two minor things

argparse will always interpret values passed to -p as strings.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook merged commit 0ebef0f into intel:main Feb 23, 2024
@Pennycook Pennycook deleted the platform-option branch February 23, 2024 19:46
Pennycook added a commit to Pennycook/code-base-investigator that referenced this pull request Mar 6, 2024
After internal testing and review, we reached the conclusion that using -p to
both define and filter platforms was too confusing and difficult to teach.
Additionally, there are concerns that use of flags to define platforms may not
scale, since real-life use-cases may require more than a list of commands.

This commit is a partial revert of intel#49: the ability to filter platforms
remains, but the ability to define platforms directly via JSON is removed.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook mentioned this pull request Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants