Isolate docker interactions into module and simplify test docker containers #52

Merged
merged 56 commits into from
Sep 12, 2022

ImogenBits
Collaborator

Following your request of splitting up the changes in #49 (here) into multiple smaller PRs, this is the first of them where I tried to isolate all the interactions with docker into its own module and simplify the docker containers used for testing.
I also made some very minor modifications to related parts of the code, such as fixing up type annotations, passing the currently fighting teams as an argument to FightHandler.fight() instead of them being part of its state, and adding the flake8 config file.

@Benezivas
Collaborator

Thank you very much for taking the time to split up the pull request. These changes look a lot more manageable. I will add replies to this comment to collect more general questions and possible remaining issues that I find while checking the code that cannot be linked to a specific part of a file.

First observations:

  • Do you assume a changed format of the passed problems or is this pull request still compatible with the problem format of version 3.0.2? I get an error when trying to pass the biclique problem as a parameter (see below).
  • If the docker daemon is not running, it seems that this is communicated, but the program seems to try to continue its run. The run should stop instead.

I will continue to review the code in iterations, but I am confident that we can apply most of the suggested changes as-is.

battle ../algobattle-problems/problems/biclique
You can find the log files for this run in /home/henri/.algobattle_logs/2022-06-28_15:41:45.log
Running a benchmark to determine your machines I/O overhead to start and stop docker containers...
Maximal measured runtime overhead is at 1.91 seconds. Adding this amount to the configured runtime.
####################  Running Battle 1/5  ####################
==================== Iterative Battle, Instanze Size Cap: 50000 ====================
=============== Instance Size: 5/50000 ===============
Traceback (most recent call last):
  File "/home/henri/.local/bin/battle", line 164, in <module>
    results = match.run()
  File "/home/henri/.local/lib/python3.10/site-packages/algobattle/match.py", line 43, in run
    self.battle_wrapper.run_round(self.fight_handler, matchup)
  File "/home/henri/.local/lib/python3.10/site-packages/algobattle/battle_wrapper.py", line 35, in wrapper
    return function(self, *args, **kwargs)
  File "/home/henri/.local/lib/python3.10/site-packages/algobattle/battle_wrappers/iterated.py", line 80, in run_round
    approx_ratio = fight_handler.fight(matchup, n)
  File "/home/henri/.local/lib/python3.10/site-packages/algobattle/fight_handler.py", line 40, in fight
    instance, generator_solution = self._run_generator(matchup.generator, instance_size)
  File "/home/henri/.local/lib/python3.10/site-packages/algobattle/fight_handler.py", line 83, in _run_generator
    encoded_output = team.generator.run(str(instance_size), self.timeout_generator, scaled_memory, self.cpus)
AttributeError: 'str' object has no attribute 'run'

@ImogenBits
Collaborator Author

The format of the problems remains the same as in the 4.0 version, which afaik is the same as in 3.0.2. The bug you saw was caused by the input paths not being actual Path objects and me incorrectly assuming they were. The commit I just pushed should fix it.
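
A minimal sketch of the kind of fix described above, assuming the paths arrive as plain strings from the command line. The helper name `as_path` and the example value are hypothetical and not taken from the actual commit:

```python
from pathlib import Path
from typing import Union


def as_path(value: Union[str, Path]) -> Path:
    """Coerce a command line argument to a Path so later code can rely on Path methods."""
    return value if isinstance(value, Path) else Path(value)


# Hypothetical example value; argument parsers hand over plain strings by default.
problem_dir = as_path("../algobattle-problems/problems/biclique")
print(problem_dir.name)
```

Normalizing once at the boundary means the rest of the code can assume Path objects instead of checking types at every call site.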

I can't observe the same behaviour regarding docker; it always exits right away for me. I also can't see why the code might behave that way, since it raises SystemExit, which is not caught anywhere. Can you describe the error in more detail?

@ImogenBits ImogenBits force-pushed the easy_changes branch 2 times, most recently from a402a9b to ee5ad7b Compare August 28, 2022 22:22
@ImogenBits
Collaborator Author

After way more research and testing than I expected, I implemented the changes we discussed last week. The gist is that the docker module now uses the official docker api instead of shell commands. In addition I also moved the setup.py based build to a pyproject.toml one, this is now much more user friendly, future proof and installs on windows without any issues or additional user actions.

There are two more or less breaking changes here though:

  1. I renamed the cli command from battle to algobattle. I always found it unintuitive that the command is different from the project name and changing this now is a single line change in the pyproject.toml so I took the liberty. If you want to stick with battle I have no issues with changing it back.
  2. Because of limitations in the docker api lib I changed the way the containers interact with the program! Instead of using STDIN and STDOUT, it now creates a file /input in the root directory of the container and reads from /output there. I think this is a great change for the students too, when I was taking the course we basically always had our executed shell file be cat > input; run_program; cat output since interacting with files like that was much easier. All of the example docker containers you've provided also follow this structure, so migrating everything should be very easy.
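
To illustrate the file-based interface from the container's side, here is a minimal sketch of a program that reads its instance from /input and writes its answer to /output. The `solve` function is a hypothetical placeholder, and the exact file formats are not specified in this thread:

```python
from pathlib import Path


def solve(instance: str) -> str:
    """Hypothetical placeholder solver: echo the instance back in upper case."""
    return instance.upper()


def main(input_path: Path = Path("/input"), output_path: Path = Path("/output")) -> None:
    """Read the instance file, compute a solution, and write it to the output file."""
    instance = input_path.read_text()
    output_path.write_text(solve(instance))
```

Inside a container the entry point would simply call `main()` with the default paths; the parameters only exist so the sketch can be tried out against temporary files.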

There also are a couple further things of note:
A docker container that times out now no longer automatically gets treated as having produced no output. Instead we still check the output file and read from it. During the course we often had the problem that we were incrementally generating solutions, but because going even slightly over the timeout would mean a total fail, we had to create our own timeout a couple seconds shorter and kill our program then. With this change the students can now arbitrarily rewrite their output and the battle will always use the last solution generated within the timeout.
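
A rough sketch of how a solver could take advantage of this, assuming /output is an ordinary file inside the container (not a bind mount) and that a higher score means a better solution. The names `publish` and `improve` are hypothetical:

```python
import os
from pathlib import Path
from typing import Iterable, Optional, Tuple


def publish(solution: str, output_path: Path) -> None:
    """Swap the new solution into place so a reader never sees a half-written file."""
    tmp_path = output_path.with_name(output_path.name + ".tmp")
    tmp_path.write_text(solution)
    os.replace(tmp_path, output_path)  # atomic when both paths are on the same file system


def improve(candidates: Iterable[Tuple[int, str]], output_path: Path = Path("/output")) -> None:
    """Publish every candidate that beats the best score seen so far."""
    best_score: Optional[int] = None
    for score, solution in candidates:
        if best_score is None or score > best_score:
            best_score = score
            publish(solution, output_path)
```

If the container is stopped at the timeout, the output file then still holds the best solution published up to that point.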
Because we no longer use shell commands to interface with docker, the timeouts are much more precise, since we can start and stop the containers with more control. On my machine the total overhead used to be roughly 1.5 to 2 seconds; it is now only 0.5 to 1.5 seconds. More importantly, the overhead that the timeouts can't account for is only about 0.06 seconds! Because of this I removed the measure_runtime_overhead function and its associated command line interface. It's just not really needed anymore, and the values it measures vastly overestimate the actual time we need to give to the containers in addition to the timeout.
As you can see in the pyproject.toml, this project uses a fork of the docker library instead of the official PyPI version. This is because the library needs to implement a named pipe socket API to communicate with the docker daemon, but on Windows this implementation is pretty broken. I've implemented a working version myself with all the features we need for this project and I'll try to make a PR to get it into the official library, but idk how long that'll take.

@Benezivas Benezivas self-assigned this Aug 29, 2022
@Benezivas
Collaborator

Thank you for committing these changes, I will review them this week in detail and comment further here.

A short test shows that the CI pipeline currently seems to fail, could you look into it? (I have not figured out yet why the automated CI tests are currently not launched in this PR)

I do not mind the two breaking changes, they seem reasonable and since we are aiming for the next full release, such changes are to be expected. Please have an eye on not straying too far from the purpose of this specific pull request, I really appreciate your ideas and time to implement them, but it becomes much more complicated to properly review them if one pull request wants to do too many different things beyond its scope.

@ImogenBits
Collaborator Author

CI runs here now (reason was that the on field of the github actions yaml specified it to only run in the main and develop branches) and passes without issues

@ImogenBits
Collaborator Author

I made #53 and #54 to split out the unrelated changes in here. If you merge them, I can then either just rebase this branch onto them, or also split out the other small and somewhat unrelated changes: those to the battle script (having it handle signals using the Python error handling rather than the signal handler, and having the function definitions outside of the main function) and those to the teams objects (though that would be kinda messy, since they are mainly about changing the way the battle script interacts with the docker containers).

@Benezivas
Collaborator

I have merged #53 and #54 into the release candidate. I would suggest not splitting out the changes to the teams object, as this is closely correlated to the docker changes and thus arguably in scope of this PR. If the required effort is reasonable I would advocate for splitting out the changes to the battle script (that are not related to the docker changes) to keep the PR clean.

@ImogenBits
Collaborator Author

Rebased the branch on the new 4.0 branch; the file diffs now properly contain only the docker changes.
The commit history is a bit messed up because of that, but I tried to preserve it as best I could, and the old branch is still at the corresponding tag in my repo.

…xecuting the battle script from the algobattle folder directly
@Benezivas
Collaborator

Thank you for taking the time to do these additional changes. I have reviewed the changes and think that they are helpful.

For clarification, since I have not worked with Python's docker module so far: how do we ensure that running docker containers are stopped upon early program termination, e.g. when receiving a SIGTERM from the OS? Previously, this was done with the sigh module calling the kill_spawned_docker_containers function, which is removed in this PR.
I assume they will be spawned as subprocesses of the algobattle process and killed when the parent process dies?

Once this is cleared up, I am ready to merge these changes.

@ImogenBits
Collaborator Author

The python docker lib is basically a very thin wrapper around the docker engine api (which largely mirrors the docker cli arguments but with some oddities such as different defaults and some added/removed higher level commands). In particular here we essentially invoke docker rm -f CONTAINER, which will both kill the container and then remove it.
This code will run when the algobattle script is killed early because the default python signal handler raises a KeyboardInterrupt when that happens, which will be propagated from that loop, through the finally clause until we catch it in the battle module. This does mean that if you circumvent the python error handling process the spawned containers will not be stopped, but afaik the only sensible way for that to happen is if you step through it in debugging and cancel it early.
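
The cleanup pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `run_with_cleanup` and `raise_system_exit` are hypothetical names, and `container` stands for any object with a `remove(force=True)` method, such as a `Container` from the Docker SDK for Python, where that call corresponds to `docker rm -f`. Note that CPython's default handlers translate only SIGINT into KeyboardInterrupt; the SIGTERM handler shown here is an addition of this sketch, not something the PR is stated to do.

```python
from typing import Any, Callable


def raise_system_exit(signum: int, frame: Any) -> None:
    """Signal handler that turns a signal into an exception so finally clauses still run."""
    raise SystemExit(128 + signum)


def run_with_cleanup(container: Any, work: Callable[[Any], str]) -> str:
    """Run work(container) and always force-remove the container afterwards."""
    try:
        return work(container)
    finally:
        # Runs on normal return, on KeyboardInterrupt, and on SystemExit alike.
        container.remove(force=True)
```

To cover SIGTERM as well, a program could register the handler once at startup from its main thread with `signal.signal(signal.SIGTERM, raise_system_exit)`. If the process is killed without Python's exception handling being involved at all, the `finally` clause never runs and the container is left behind, which matches the caveat above.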

@Benezivas Benezivas merged commit 69a62b2 into Algorithmic-Battle:4.0.0-rc Sep 12, 2022
@ImogenBits ImogenBits deleted the easy_changes branch September 13, 2022 12:19

2 participants