The new Joins in data.table vignette is great, thank you!
I think the updating by reference section could improve by:
1. providing a simpler example with a simple one-to-one match. In the current example, the beauty of the no-copy solution is a bit obscured by the complicated grouping ("last element selection", "tail", etc.) used.
2. providing an efficient solution for a RIGHT JOIN that minimizes copying. At least in my workflow, that is the main use case: I have a main DT on which I perform several join operations, a[DT,on=.(idA)], b[DT,on=.(idB)], .... I assume this is a super common use case, and it would be great to discuss the canonical solution to it (even if the answer is that the x[i,on=, j =] syntax does not allow for updating i by reference).
In this SO question, I provided sample code for 1., with a simpler join example, and 2., with my current solution to avoid the expensive copy.
UPDATE1: the solution proposed in SO is super elegant and performant:
x = data.table(id = c(1:5,8), newvar1=c(LETTERS[1:5],'h'),newvar2=c(5:1,-2))
#In practice x would have more vars: newvar2, ..., newvarN
i = data.table(id = 1:7,var1 = c('bla','ble','bli','blo','blu','blA','blS'),var2=7:1,var3=2*(7:1),var4=1)
cols <- setdiff(names(x), 'id')
i[, (cols) := x[.SD, on = "id", .SD, .SDcols = cols]]
This should definitely go in the tutorial; I wish I had known this earlier. In my application (i: 220m rows, 40ish columns; x: 130m rows and 5 columns), the performance gain was massive. This method is 3x faster and uses only 1/5th to 1/6th of the RAM increase relative to i <- x[i,on=.(id)]
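For readers who want to check the claim on a small scale, here is a minimal sketch that compares the update join with the copying join on toy tables shaped like the ones in this issue. The names `copied` and `addr_before` are only illustrative, and the checks assume that data.table's default column over-allocation leaves room for the new columns; this does not reproduce the timing or RAM numbers reported above.

```r
library(data.table)

# Toy tables, same shape as the example in this issue (i trimmed to 3 columns).
x <- data.table(id = c(1:5, 8), newvar1 = c(LETTERS[1:5], "h"), newvar2 = c(5:1, -2))
i <- data.table(id = 1:7,
                var1 = c("bla", "ble", "bli", "blo", "blu", "blA", "blS"),
                var2 = 7:1)

# Copying right join: allocates a brand-new table holding every column of i and x.
copied <- x[i, on = .(id)]

# Update join: look up x's columns for each row of i and add them to i in place.
addr_before <- address(i)
cols <- setdiff(names(x), "id")
i[, (cols) := x[.SD, on = "id", .SD, .SDcols = cols]]

# i is still the same object in memory ...
stopifnot(identical(address(i), addr_before))
# ... and it holds the same data as the copying join, up to column order.
stopifnot(isTRUE(all.equal(i, copied, ignore.col.order = TRUE,
                           check.attributes = FALSE)))
```

Rows of i without a match in x (ids 6 and 7 here) get NA in the new columns, which is the same result the right join produces.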
[I could not find an issue specific to this vignette, just the general #944]
@venom1204 Another very helpful SO answer on this is https://stackoverflow.com/a/44592473. I always return to it. It would be great if something similar were included in the joins vignette.
iagogv3 added the joins label on Mar 26, 2025
Hi @iagogv3,
Thank you for pointing me to that helpful Stack Overflow answer and your clear examples. I agree these recoding patterns would make the vignette more practical.
I’ll aim to commit these changes within the next day or two and tag you for review. Let me know if you’d like me to emphasize anything else!
Just noticed this after coming from #6997. Nice work on this section, but I think the by=.EACHI example really needs to be ditched altogether as there is a mult-based solution that is 500X faster. by=.EACHI could do with its own section as it is not especially linked to update-by-reference joins (where it isn't by each row of i anyway).
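To make the comparison concrete, here is a rough sketch of the two approaches on hypothetical data (the tables, the column `val`, and the names `eachi` and `last_val` are made up for illustration and are not taken from the vignette). It assumes the goal is "for each row of i, take the last matching row of x", and that every id in i has at least one match; the speed difference mentioned above is not measured here.

```r
library(data.table)

# Hypothetical data: several x rows per id, and every id in i has a match.
x <- data.table(id = c(1L, 1L, 2L, 2L, 2L, 3L),
                val = c(10, 11, 20, 21, 22, 30))
i <- data.table(id = 1:3, label = c("a", "b", "c"))

# Grouped version: j is evaluated once for each row of i.
eachi <- x[i, on = "id", .(val = tail(val, 1)), by = .EACHI]

# mult-based version: the join itself keeps only the last match per row of i,
# and the result is assigned to i by reference.
i[, last_val := x[.SD, on = "id", mult = "last", x.val]]

# Both approaches pick the same value for every row of i.
stopifnot(identical(i$last_val, eachi$val))
```

The `mult = "last"` form avoids per-group evaluation of j, which is the likely source of the gap described in the comment.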