feat: add ip address backfill cli command #13804
        f"Backfilled {batch_size} rows. Sleeping for {sleep_time} second(s)..."
    )
    time.sleep(sleep_time)
    _backfill_ips(
If more than 10,000,000 entries need to be backfilled, this will cause an error. We've already committed the session by that point, so no progress is lost and the command can simply be run again.
Good note - where does the 10m entries limit come from?
I am also curious about this.
Signed-off-by: Mike Fiedler <[email protected]>
d7f4f59 to c082bca
registry_dict = {}
config = pretend.stub(
    registry=pretend.stub(
        __getitem__=registry_dict.__getitem__,
        __setitem__=registry_dict.__setitem__,
        settings={"warehouse.ip_salt": "NaCl"},
    )
)
config.registry["sqlalchemy.engine"] = engine
thought: I struggled with this syntax for pretend.stub() to make an object that is both dict-accessible and attribute-friendly, since we call both config.registry["somekey"] and config.registry.settings["somekey"].
If there's a better way to represent this nesting with pretend.stub(), happy to change to that!
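One possible alternative, sketched here with only the standard library rather than pretend.stub(): a small dict subclass that also carries a settings attribute. The names RegistryStub and "engine-placeholder" are hypothetical and exist only for this illustration; this is not pretend's API and not necessarily what the PR ended up using.

```python
from types import SimpleNamespace


class RegistryStub(dict):
    """Hypothetical test double: a dict that also exposes `.settings`."""

    def __init__(self, settings=None):
        super().__init__()
        # Attribute access (registry.settings[...]) lives alongside
        # item access (registry[...]) on the same object.
        self.settings = settings if settings is not None else {}


# SimpleNamespace gives plain attribute access for the outer config object.
config = SimpleNamespace(
    registry=RegistryStub(settings={"warehouse.ip_salt": "NaCl"})
)

# Placeholder standing in for the real SQLAlchemy engine fixture.
config.registry["sqlalchemy.engine"] = "engine-placeholder"

print(config.registry["sqlalchemy.engine"])
print(config.registry.settings["warehouse.ip_salt"])
```

The trade-off is that a dict subclass accepts any key or method call a real dict would, so it is looser than a stub that only defines the exact members the code under test touches.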
Python defaults to a max recursion depth of 1000, and this does batches of 10,000, so the recursion limit is reached after roughly 1,000 × 10,000 = 10,000,000 rows.
In warehouse/cli/hashing.py:
+ )
+ # Associate the IPAddress object with the Event
+ row.ip_address_obj = ip_addr
+ session.add(ip_addr)
+
+ # Update the rows with any new IPAddress objects
+ session.add_all(no_ip_obj_rows)
+ session.commit()
+
+ # If there are more rows to backfill, recurse until done
+ if continue_until_done and how_many == batch_size:
+ click.echo(
+ f"Backfilled {batch_size} rows. Sleeping for {sleep_time} second(s)..."
+ )
+ time.sleep(sleep_time)
+ _backfill_ips(
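
The recursion-depth concern discussed in this thread goes away if the batches are driven by a loop instead of a recursive call. The following is a minimal sketch of that idea, not the code in this PR: backfill_iteratively, fetch_batch, and the in-memory pending list are hypothetical stand-ins for the real query and session handling.

```python
import time


def backfill_iteratively(fetch_batch, process_batch, batch_size=10_000,
                         sleep_time=0.0, continue_until_done=True):
    """Process rows batch by batch in a loop, so the call stack never grows."""
    total = 0
    while True:
        rows = fetch_batch(batch_size)
        process_batch(rows)
        total += len(rows)
        # Mirror the diff's condition: only keep going when asked to and
        # when the last batch was full (a short batch means nothing is left).
        if not continue_until_done or len(rows) < batch_size:
            return total
        time.sleep(sleep_time)


# Hypothetical in-memory stand-in for the rows that still need backfilling.
pending = list(range(25_000))


def fetch_batch(n):
    batch = pending[:n]
    del pending[:n]
    return batch


done = []
# A tiny batch size forces 2,500+ iterations, well past the depth at which
# a recursive version would hit Python's default recursion limit.
total = backfill_iteratively(fetch_batch, done.extend, batch_size=10)
print(total)
```

With a loop, the number of batches is bounded only by the data, so the 10,000,000-row ceiling described above would no longer apply.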
Just as a sanity check, I manually applied the Python hashing to some records already populated in the DB via CDN and found the hashes to match! 🎉
To support the eventual removal of *Event.ip_address_string, create a CLI command that lets the operator run the queries needed to associate any existing UserEvent with an IpAddress record.
Related to #8158
Once merged, relies on an operator to execute:
which should backfill 10,000 events at a time.
Has toggles for --batch-size and --sleep-time, as well as --continue-until-done for once we're happy with the impact of the batch size and sleep time on the system overall.
Note: Needs updating for the other Events: File.Event, Project.Event, Organization.Event, Team.Event
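
For illustration, the three toggles described above could be declared roughly as follows. This sketch deliberately uses argparse from the standard library instead of click (which the PR uses) so that it runs without dependencies. The program name and the default sleep time are assumptions; only the 10,000 batch size and the flag names come from the description.

```python
import argparse


def build_parser():
    # "backfill-ipaddrs" is a hypothetical program name for this sketch.
    parser = argparse.ArgumentParser(prog="backfill-ipaddrs")
    # The description says 10,000 events are backfilled at a time.
    parser.add_argument("--batch-size", type=int, default=10_000)
    # The default sleep time is an assumption; the description does not state it.
    parser.add_argument("--sleep-time", type=int, default=1)
    # Off by default, so a single batch runs unless the operator opts in.
    parser.add_argument("--continue-until-done", action="store_true")
    return parser


args = build_parser().parse_args(["--batch-size", "500", "--continue-until-done"])
defaults = build_parser().parse_args([])
print(args.batch_size, args.sleep_time, args.continue_until_done)
```

Keeping --continue-until-done opt-in matches the intent stated above: run single batches first, then let the command loop once the load looks acceptable.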