Added shallow search for data.table in tables()#7580
Added shallow search for data.table in tables()#7580
Conversation
…o feat/adding_list_search_to_tables
|
Hello, I created a new PR in replacement of #7568 Reasons: There was some git issue there and the merge became too complex and I changed the algo because I didnt know previously that rbind or cbind would cost for re-allocation The current PR considers that part and avoids appends Previous PR : creating seperate data.table called info and rbind at the end |
|
In reply to previous comment of @jangorecki An example of when this new feature could be useful? To support lists which occur due to split.data.table or fread like the following The original code supported data.table() top level and this code adds support for list(data.table) if the arg |
|
Example of the original code and the new feature is as follows
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7580 +/- ##
=======================================
Coverage 99.02% 99.03%
=======================================
Files 87 87
Lines 16896 16934 +38
=======================================
+ Hits 16732 16771 +39
+ Misses 164 163 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Generated via commit e2ae2ce Download link for the artifact containing the test results: ↓ atime-results.zip
|
…o feat/adding_list_search_to_tables
inst/tests/tests.Rraw
Outdated
| # creating env so that the names are within it | ||
| xenv2 = new.env() | ||
| xenv2$DT = data.table(a = 1L) | ||
| xenv2$L = list(data.table(a = 1, b = 4:6), data.table(a = 2, b = 7:10)) |
There was a problem hiding this comment.
this test should also include a further-nested table to demonstrate that depth=1L is honored:
xenv2$LL = list(list(data.table(a=1L, b=4:6)))There, we'd need depth=2L to find the data.table, AIUI.
inst/tests/tests.Rraw
Outdated
|
|
||
| #2606 tables() depth=1 finds nested data.tables in lists | ||
| # creating env so that the names are within it | ||
| xenv2 = new.env() |
There was a problem hiding this comment.
why xenv2? Where is xenv?
Co-authored-by: Michael Chirico <chiricom@google.com>
|
Hello @MichaelChirico , I have added the suggested changes. |
Yes, since we follow the Boy Scout principle here, which is the practice of leaving code slightly better than you found it. |
R/tables.R
Outdated
| name_count = length(w) + total_dt | ||
| # initialize info data.table with total number of data.tables found | ||
| if (name_count==0L) { | ||
| # nocov start. Requires long-running test case |
There was a problem hiding this comment.
I don't follow, why would it be slow?
There was a problem hiding this comment.
due to format(env) it might be slow, I had tried adding test to increase coverage for this line but atime didnt pass then. will try that again.
R/tables.R
Outdated
| # we check if depth=1L is requested and add found tables to w | ||
| if (depth==1L) { | ||
| is_list = vapply_1b(obj, is.list) | ||
| is_df = vapply_1b(obj, is.data.frame) |
There was a problem hiding this comment.
note that inherits(<data.table>, "data.frame") is TRUE (or at least we can assume it to be), so is_dt is redundant
There was a problem hiding this comment.
removed is_df and kept only is_dt to count lists of df as well as the below comment says
R/tables.R
Outdated
| is_df = vapply_1b(obj, is.data.frame) | ||
| is_dt = vapply_1b(obj, is.data.table) | ||
| # list_index is a index of list which is not data.frame or data.table | ||
| list_index = which(is_list & !is_dt & !is_df) |
There was a problem hiding this comment.
anyway, why not also search data.frames?
DF = data.frame(a = 1:2)
DF$b = data.table(c = 3:4, d = 5:6)
str(DF)
# 'data.frame': 2 obs. of 2 variables:
# $ a: int 1 2
# $ b:Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
# ..$ c: int 3 4
# ..$ d: int 5 6
# ..- attr(*, ".internal.selfref")=<pointer: 0x561d6ec08e30>There was a problem hiding this comment.
updated the code, and tested
R/tables.R
Outdated
| k = cnt + length(w) # row number in info data.table | ||
| cnt = cnt + 1L | ||
| set(info, k, "NAME", new_name) | ||
| set(info, k, "NROW", nrow(DT)) |
There was a problem hiding this comment.
let's use a helper to share logic between here & the depth=0 branch
Co-authored-by: Michael Chirico <chiricom@google.com>
…m/Rdatatable/data.table into feat/adding_list_search_to_tables
| xenv$M = list(b=data.table(a=1, b=4:6), a=1:5) | ||
| xenv$N = list(a=1:5) | ||
| test(2366.1, | ||
| tables(env=xenv, depth=1L, index=TRUE)[, .(NAME, NROW, NCOL, INDICES, KEY)], |
There was a problem hiding this comment.
@MichaelChirico this test was failing as indices and key column was missing, can you please check if this is the intended testcase? I have updated it to add index.

Closes #2606
added arg
depth = 1Lto tables() one for shallow searchif
depthis 0 then its the data.tableif
depthis 1, we loop through list-like objects using is.list and which are not data.tableif
depth> 1, we throw erroradded name for the nested list found
parent[[1]]orparent$childpre-allocating info to avoid reallocation cost