Partition by row id
#32886
Replies: 1 comment
-
Sorry, I completely misunderstood you. Yes, use dynamic partitions and populate them from another asset or job. Refresh them with a sensor or something similar. Associate each partition key with an ID range, for example from int(key) to int(key) + batch_size.
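The key-to-range mapping suggested here could be sketched roughly as follows. This is only an illustration under assumptions: each partition key is taken to be the starting ID of its batch, written as a string, and the table name `my_table` and both function names are hypothetical. The sketch is plain Python and does not call Dagster itself.

```python
from typing import Tuple


def id_range_for_key(partition_key: str, batch_size: int = 1000) -> Tuple[int, int]:
    """Return the half-open ID range [start, end) covered by a partition key.

    Assumption: the partition key is the first ID of the batch, as a string.
    """
    start = int(partition_key)
    return start, start + batch_size


def query_for_key(partition_key: str, batch_size: int = 1000) -> Tuple[str, Tuple[int, int]]:
    """Build a parameterized query that fetches the rows of one partition."""
    start, end = id_range_for_key(partition_key, batch_size)
    # "my_table" is a hypothetical table name used only for illustration.
    sql = "SELECT * FROM my_table WHERE id >= ? AND id < ? ORDER BY id"
    return sql, (start, end)


sql, params = query_for_key("2000", batch_size=1000)
print(sql, params)
```

Inside a dynamically partitioned asset, the key would typically come from `context.partition_key`, so the asset body could call something like `query_for_key(context.partition_key)` and run the result against the database.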
-
Hi,
I retrieve Parquet files from a SQL database. Typically, I partition the data by month using TimePartitionDefinition and something like:
However, in some cases the date column is not indexed, and retrieving the data takes a very long time. We want to partition the data on the 'id' column instead, meaning we retrieve the data in batches of, for instance, 1000 rows. I would like something like:
Should I use static or dynamic partitioning? The issue is that I don’t know the total number of partitions in advance. The total number of rows, and therefore of partitions, will change over time. I would need to run SELECT MAX(id) to determine it.
I would also like to see the green bar of ordered partitions in the Dagster UI, where I can select and materialize a specific partition.
What do you suggest?
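To make the `SELECT MAX(id)` idea concrete, here is a minimal sketch of deriving the partition keys from the current maximum ID and finding the ones not yet registered. It uses an in-memory SQLite table as a stand-in for the real database; the table name `my_table`, the function names, and the convention that a key is the starting ID of its batch are all assumptions made for illustration.

```python
import sqlite3
from typing import Iterable, List, Optional


def fetch_max_id(conn: sqlite3.Connection, table: str) -> Optional[int]:
    """Return MAX(id) for the table, or None if the table is empty."""
    # The table name is interpolated, so it must come from trusted code.
    row = conn.execute(f"SELECT MAX(id) FROM {table}").fetchone()
    return row[0]


def expected_partition_keys(max_id: Optional[int], batch_size: int = 1000) -> List[str]:
    """List every batch start (as a string key) needed to cover IDs 0..max_id."""
    if max_id is None:
        return []
    return [str(start) for start in range(0, max_id + 1, batch_size)]


def missing_partition_keys(
    existing: Iterable[str], max_id: Optional[int], batch_size: int = 1000
) -> List[str]:
    """Return the keys that should exist but are not registered yet."""
    known = set(existing)
    return [k for k in expected_partition_keys(max_id, batch_size) if k not in known]


# Demo with an in-memory table holding IDs 1..2500 (stand-in for the real DB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO my_table (id) VALUES (?)", [(i,) for i in range(1, 2501)])

max_id = fetch_max_id(conn, "my_table")
new_keys = missing_partition_keys({"0", "1000"}, max_id)
print(max_id, new_keys)
```

With dynamic partitions in Dagster, a sensor could run this kind of check periodically and register the missing keys on a `DynamicPartitionsDefinition`, for example through `instance.add_dynamic_partitions` or an add request returned in a `SensorResult`.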