Update release branch? #3206
Conversation
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
- Replaces setup.py with pyproject.toml
- Creates .python-version file that will control the python version that uv uses in the virtualenv
Signed-off-by: John Strunk <[email protected]>
Signed-off-by: John Strunk <[email protected]>
Signed-off-by: John Strunk <[email protected]>
Update GitHub workflow to include a new job that checks the uv lockfile, ensuring it is up to date by installing uv and running the lock check command. Signed-off-by: John Strunk <[email protected]>
Added post_create_environment job to generate requirements.txt using uv, and updated python install method to use the generated requirements file. Signed-off-by: John Strunk <[email protected]>
Signed-off-by: John Strunk <[email protected]>
Add keyman image and remove conditional login to GitHub Container Registry Signed-off-by: John Strunk <[email protected]>
This adds a new test job to run on macOS. It installs the package dependencies and runs the worker setup script. Due to the interactive nature of `make install`, we can't use that directly. Signed-off-by: John Strunk <[email protected]>
This updates xgboost to fix the build on macOS Signed-off-by: John Strunk <[email protected]>
graphql-server-core 1.1.1 is incompatible w/ Python 3.10 due to changes in the standard library. This ensures that a newer version gets installed. Signed-off-by: John Strunk <[email protected]>
Update build rules
We fixed some documentation around how `uv run` gets executed in sequence with `nohup` Signed-off-by: Sean P. Goggins <[email protected]>
Changed `git checkout dev` to `git checkout main` in docs file. Signed-off-by: Sean P. Goggins <[email protected]>
Fixing `nohup` and `uv` order in docs. Signed-off-by: Sean P. Goggins <[email protected]>
Fixing `nohup` sequencing using `uv run` as tested. Signed-off-by: Sean P. Goggins <[email protected]>
Fixed references to the `housekeeper` Signed-off-by: Sean P. Goggins <[email protected]>
`nohup` ordering with `uv run` Signed-off-by: Sean P. Goggins <[email protected]>
fixing `uv run` and `nohup` order in docs. Signed-off-by: Sean P. Goggins <[email protected]>
Try 2 - Switch from venv/pip to uv
Push the keyman container image
name: runner / uv-lock
runs-on: ubuntu-latest
steps:
  - name: Check out repository
    uses: actions/checkout@v4
  - name: Install uv
    uses: astral-sh/setup-uv@v6
  - name: Ensure uv lockfile is up to date
    run: uv lock --check
Check warning
Code scanning / CodeQL: Workflow does not contain permissions (Medium)
Copilot Autofix (AI, 2 months ago)
The fix involves adding a `permissions` block to the `uv-lock` job and assigning the least privileges required for its execution. Since the job steps seem to involve checking out code and running commands locally without modifying repository content, the `contents: read` permission is sufficient. This ensures the `GITHUB_TOKEN` has minimal access while keeping the workflow functional.
@@ -44,6 +44,8 @@
uv-lock:
  name: runner / uv-lock
  runs-on: ubuntu-latest
  permissions:
    contents: read
  steps:
    - name: Check out repository
      uses: actions/checkout@v4
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
…retry Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
determine whether to reset logs via new AUGUR_RESET_LOGS environment variable

There are some linting issues revealed by this PR, but they are not a problem for this PR. They should be dealt with separately.
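A minimal sketch of how such a flag might be read; the helper name is hypothetical, and only the `AUGUR_RESET_LOGS` variable name comes from the commit above:

```python
import os

def should_reset_logs() -> bool:
    # Hypothetical helper -- the PR only introduces the AUGUR_RESET_LOGS variable.
    # Treat common truthy strings as "yes"; default to keeping logs.
    return os.environ.get("AUGUR_RESET_LOGS", "").strip().lower() in ("1", "true", "yes")
```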
@@ -32,7 +32,9 @@
from keyman.KeyClient import KeyClient, KeyPublisher
[pylint] reported by reviewdog 🐶
W0611: Unused KeyClient imported from keyman.KeyClient (unused-import)
At this point I'm convinced it's just github weirdness.
We should merge a PR to bump the version number and edit the changelogs and other metadata fixes at the time we want to cut this release, but I'll stop being the squeaky wheel
Signed-off-by: Adrian Edwards <[email protected]>
Removed Files Not Needed
This changes the test for the DB connection components to not exit with failure if AUGUR_DB can't be constructed. Signed-off-by: John Strunk <[email protected]>
Address warnings from linter
Update entrypoint.sh construction of AUGUR_DB from env vars
fix missing docker -> podman in podman CI job
Handle 410 gone when querying github
@@ -124,6 +135,13 @@ def make_request(self, url, method="GET", timeout=100):
    if response.status_code == 401:
        raise NotAuthorizedException(f"Could not authorize with the github api")
    if response.status_code == 410:
        response_msg = response.json().get("message")
        if response_msg is not None:
[pylint] reported by reviewdog 🐶
R1720: Unnecessary "else" after "raise", remove the "else" and de-indent the code inside it (no-else-raise)
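For readers unfamiliar with R1720: the warning fires when an `else` follows a branch that ends in `raise`. A minimal, generic sketch (not the PR's actual code) of the pattern and the suggested de-indent:

```python
def check_status(status_code: int) -> str:
    # pylint R1720 flags:
    #   if cond: raise ...
    #   else: <more code>
    # Because `raise` already ends the branch, the `else` can be dropped
    # and its body de-indented, which is what the suggestion asks for.
    if status_code == 410:
        raise ValueError("resource gone")
    return "ok"  # formerly inside the unnecessary `else`
```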
@@ -145,13 +163,21 @@
        return self.__make_request_with_retries(url, method, timeout)
    except RetryError as e:
        raise e.last_attempt.exception()

def _decide_retry_policy(exception: Exception) -> bool:
[pylint] reported by reviewdog 🐶
E0213: Method '_decide_retry_policy' should have "self" as first argument (no-self-argument)
    Returns:
        bool: Boolean describing whether or not the request should be retried
    """
    return not isinstance(num, (UrlNotFoundException, ResourceGoneException))
[pylint] reported by reviewdog 🐶
E0602: Undefined variable 'num' (undefined-variable)
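Both reviewdog findings (E0213 no-self-argument and E0602 undefined `num`) concern this helper: its first positional parameter is the exception, not `self`, and the body refers to a name that doesn't exist. A sketch of the corrected shape, assuming a `@staticmethod` is the intended fix (the class name here is a placeholder; the exception names are from this PR, and a later commit, "Fix variable name in _decide_retry_policy", confirms the rename):

```python
class UrlNotFoundException(Exception):
    pass

class ResourceGoneException(Exception):
    pass

class GithubApiHandler:  # placeholder class name for illustration
    @staticmethod
    def _decide_retry_policy(exception: Exception) -> bool:
        """Return True if the failed request should be retried.

        404/410-style exceptions are permanent, so retrying is pointless.
        """
        # The parameter is the exception itself (was mistakenly named `num`).
        return not isinstance(exception, (UrlNotFoundException, ResourceGoneException))
```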
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
Signed-off-by: Adrian Edwards <[email protected]>
use check_output for git log fetching so that it can fail more easily
correct docs links in error messages relating to keys
Signed-off-by: Adrian Edwards <[email protected]>
@MoralCode / @JohnStrunk / @Ulincsys : The `main` branch is leaking database connections. I have been through this PR to attempt to identify what change could have this side effect. Nothing is jumping out at me. I made some comments, some of which are surely pointing a flashlight in the wrong corner. If you have a chance to look this over and identify other possible places where this leak is introduced, THAT WOULD BE AWESOME!
NOTE: I have verified the database connection leak issue does NOT exist on the `release` branch.
@retry(stop=stop_after_attempt(10), wait=wait_fixed(5), retry=retry_if_exception(lambda exc: not isinstance(exc, UrlNotFoundException)))
@retry(stop=stop_after_attempt(10), wait=wait_fixed(5), retry=retry_if_exception(_decide_retry_policy))
I think this may be causing an issue where not-found URLs are causing breaking errors.
fixed in #3235
@@ -19,7 +20,22 @@
workers = multiprocessing.cpu_count() * 2 + 1
umask = 0o007
reload = True
reload_extra_files = glob(str(Path.cwd() / '**/*.j2'), recursive=True)
Maybe some issues here ... related to the disposal of database connections.
Can you elaborate on the issues you are seeing? This is a gunicorn config, so it shouldn't be affecting DB connections
@MoralCode : In short, I have my PostgreSQL instance set to 3,000 connections, and within one hour of starting Augur, they are all consumed. Which is bad in myriad ways, right? For comparison, the version of Augur in `release` uses at most 200 connections over weeks.
@@ -22,6 +22,17 @@ class UrlNotFoundException(Exception):
class NotAuthorizedException(Exception):
    pass

class ResourceGoneException(Exception):
This is generating errors in testing. @MoralCode : basically, we are getting errors when these resource exceptions are hit and the job fails. Perhaps that is intended. Just checking.
I suspect this is fixed by #3235 (TL;DR my fault).
If a resource returns 410 Gone from GitHub, it is handled similarly to a 404 (i.e. bail out of the retry loop and keep going with collection rather than propagating the exception up to the worker and stopping everything).
I think you are right.
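The intended 404/410 behavior discussed above can be sketched as follows; the collection loop is hypothetical, and only the exception names come from this PR:

```python
class UrlNotFoundException(Exception):
    pass

class ResourceGoneException(Exception):
    pass

def collect(urls, fetch):
    """Fetch each url, skipping resources that are permanently gone or missing
    instead of letting the exception stop the whole worker."""
    results = []
    for url in urls:
        try:
            results.append(fetch(url))
        except (UrlNotFoundException, ResourceGoneException):
            # 404/410: the resource is permanently unavailable -- move on.
            continue
    return results
```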
def __init__(self) -> None:
    # Load channel names and IDs from the spec
    for channel in spec["channels"]:
        # IE: self.ANNOUNCE = "augur-oauth-announce"
        setattr(self, channel["name"], channel["id"])
    self.conn = get_redis_connection()
This seems possibly wrong. @MoralCode / @Ulincsys ... we have changed the import from `as conn` to simply be the library. I don't see where `conn` exists as an import any longer. If this fails, it seems it could be the source of the database connection leaks on `main` right now.
That sounds plausible to me. I remember seeing that this keyclient file got refactored, but maybe it needs a deeper look
Later on in my review it seemed like the object was getting used and reassigned in the new class made specifically for that. I am suspicious of Redis in general: perhaps it is being used in a way that creates a database connection but doesn't let go of it, or the process loses it. Like I said, there is no change that appears to directly alter the way we are connecting to the database, which is everyone's favorite type of bug, right?
@@ -172,7 +175,7 @@ def publish(self, key: str, platform: str):
        "key_platform": platform
    }

    conn.publish(self.ANNOUNCE, json.dumps(message))
    self.conn.publish(self.ANNOUNCE, json.dumps(message))
@MoralCode / @Ulincsys : Here as well, `conn` is referenced but I don't think it exists any longer.
@@ -1,4 +1,4 @@
from augur.tasks.init.redis_connection import redis_connection as conn
from augur.tasks.init.redis_connection import get_redis_connection
Here is where the `as conn` is removed, but it is still in use through the rest of the file.
@@ -215,7 +218,7 @@ def wait(self, timeout_seconds = 30, republish = False):
        }

    listen_delta = 0.1
    conn.publish(self.ANNOUNCE, json.dumps(message))
    self.conn.publish(self.ANNOUNCE, json.dumps(message))
no `conn`
from augur.application.logs import AugurLogger

logger = AugurLogger("KeyOrchestrator").get_logger()
conn = get_redis_connection()
OK, here `conn` exists as a call to a method.
@@ -29,8 +29,10 @@ def __init__(self, platform: str, logger: Logger):
    if not platform:
        raise ValueError("Platform must not be empty")

    self.stdout = conn
    self.stdin: PubSub = conn.pubsub(ignore_subscribe_messages = True)
    self.conn = get_redis_connection()
The change here is moving the reference from the library to this call to the `get_redis_connection()` method.
Fix variable name in _decide_retry_policy
Signed-off-by: Ulincsys <[email protected]>
cache redis connection on first retrieval; this fixes our open database connection infinite growth issue! I will talk with @Ulincsys before putting this to release to make sure I understand *why* this works. I have a conceptual grasp of how Redis is the issue, but less of a grasp of why this particular fix works. Mostly, before we release I want to understand if there are any possible side effects of this patch we need to look for.
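A sketch of the caching pattern the commit message describes, with a stand-in factory in place of the real Redis client constructor (the function names approximate Augur's `redis_connection` module but are illustrative):

```python
_redis_connection = None

def _make_connection():
    # Stand-in for the real client constructor, e.g. redis.Redis(...).
    return object()

def get_redis_connection():
    """Create the client on first call and reuse it afterwards.

    Each new client brings its own connection (pool), so constructing a fresh
    client for every caller grows the number of open connections without
    bound; caching the first instance is the fix described in the commit.
    """
    global _redis_connection
    if _redis_connection is None:
        _redis_connection = _make_connection()
    return _redis_connection
```

Why this plausibly explains the leak: the refactor replaced a single module-level connection with per-call `get_redis_connection()` invocations, so every `KeyClient`/`KeyPublisher` instance could end up holding its own connection; memoizing restores the old one-connection behavior without reintroducing the module-level import.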
Description