Added shallow search for data.table in tables()#7580

Open
manmita wants to merge 33 commits into master from feat/adding_list_search_to_tables

Conversation

@manmita
Contributor

@manmita manmita commented Jan 9, 2026

Closes #2606

Added the argument depth to tables(); depth = 1L requests a shallow search:
If depth is 0, only top-level data.tables are reported.
If depth is 1, we also loop through list-like objects (detected with is.list) that are not themselves data.tables.
If depth > 1, we throw an error.

Nested tables found this way are named after their parent: parent[[1]] or parent$child.
info is pre-allocated to avoid reallocation cost.
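
The naming rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: find_tables_shallow is a hypothetical helper, and it grows a character vector for brevity rather than pre-allocating as the PR does.

```r
library(data.table)

# Hypothetical helper (an assumption, not code from this PR): return the labels
# of data.tables found at depth 0 (top level) and depth 1 (direct list elements).
find_tables_shallow = function(env) {
  found = character()
  for (nm in ls(env)) {
    obj = get(nm, envir = env)
    if (is.data.table(obj)) {
      found = c(found, nm)  # depth 0: the object itself is a data.table
    } else if (is.list(obj)) {
      child_names = names(obj)
      for (i in seq_along(obj)) {
        if (!is.data.table(obj[[i]])) next
        # named children become parent$child, unnamed ones parent[[i]]
        has_name = !is.null(child_names) && !is.na(child_names[i]) && nzchar(child_names[i])
        label = if (has_name) paste0(nm, "$", child_names[i]) else paste0(nm, "[[", i, "]]")
        found = c(found, label)
      }
    }
  }
  found
}

e = new.env()
e$C = data.table(a = 1, b = 4:6)
e$A = list(data.table(a = 1, b = 4:6), data.table(a = 2, b = 7:10))
e$D = list(d = data.table(a = 1, b = 4:6), x = 1:5)
res = find_tables_shallow(e)
```

Here res contains the labels A[[1]], A[[2]], C and D$d; the integer vector D$x is skipped because it is not a data.table.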

@manmita
Contributor Author

manmita commented Jan 9, 2026

Hello,

I created a new PR in replacement of #7568

Reasons: there was a git issue in that PR and the merge became too complex. I also changed the algorithm, because I did not previously know that rbind or cbind incurs a re-allocation cost.

The current PR considers that part and avoids appends

Previous PR: created a separate data.table called info and used rbind at the end.
This PR: pre-allocates a data.table of the total size and fills in the info.
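
To illustrate the difference between the two strategies, here is a small sketch. It is not the code of either PR: the tbls list and the column choice are assumptions made for the example, and only set() with the (x, i, j, value) argument order is taken from the diff under review.

```r
library(data.table)

# Illustrative input (an assumption): two tables whose sizes we want to report.
tbls = list(x = data.table(a = 1:3), y = data.table(a = 1:4, b = 4:1))

# Growing approach: each rbind allocates a new result and copies the old rows.
info_grow = data.table(NAME = character(), NROW = integer(), NCOL = integer())
for (nm in names(tbls)) {
  row = data.table(NAME = nm, NROW = nrow(tbls[[nm]]), NCOL = ncol(tbls[[nm]]))
  info_grow = rbind(info_grow, row)
}

# Pre-allocating approach: allocate all rows once, then fill them by reference.
n = length(tbls)
info = data.table(NAME = character(n), NROW = integer(n), NCOL = integer(n))
for (k in seq_len(n)) {
  set(info, k, "NAME", names(tbls)[k])
  set(info, k, "NROW", nrow(tbls[[k]]))
  set(info, k, "NCOL", ncol(tbls[[k]]))
}
```

Both loops produce the same two-row table; the second avoids the repeated copies because set() modifies info in place.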

@manmita
Contributor Author

manmita commented Jan 9, 2026

In reply to the previous comment from @jangorecki:

An example of when this new feature could be useful?

To support lists of data.tables, such as those arising from split.data.table or fread, like the following:

list(data.table(a = 1, b = 4:6),
     data.table(a = 2, b = 7:10))

The original code supported only top-level data.table objects; this code adds support for list(data.table) when the argument shallow_search = TRUE.

@manmita
Contributor Author

manmita commented Jan 9, 2026

An example of the original behavior and of the new feature:

> A = list(data.table(a = 1, b = 4:6),
      data.table(a = 2, b = 7:10))
> B = list(data.table(a = 1, b = 4:6), 1:5)
> C = data.table(a = 1, b = 4:6)
> tables()
   NAME NROW NCOL MB COLS    KEY
1:    C    3    2  0  a,b [NULL]
Total: 0MB using type_size
> tables(shallow_search = TRUE)
     NAME NROW NCOL MB COLS    KEY
1: A[[1]]    3    2  0  a,b [NULL]
2: A[[2]]    4    2  0  a,b [NULL]
3: B[[1]]    3    2  0  a,b [NULL]
4:      C    3    2  0  a,b [NULL]
Total: 0MB using type_size
> D = list(d = data.table(a = 1, b = 4:6), x = 1:5)
> tables(shallow_search = TRUE)
     NAME NROW NCOL MB COLS    KEY
1: A[[1]]    3    2  0  a,b [NULL]
2: A[[2]]    4    2  0  a,b [NULL]
3: B[[1]]    3    2  0  a,b [NULL]
4:      C    3    2  0  a,b [NULL]
5:    D$d    3    2  0  a,b [NULL]
Total: 0MB using type_size

tables() works the same as before, and tables(shallow_search = TRUE) searches one level deep.

@codecov

codecov bot commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.03%. Comparing base (6c6615c) to head (e2ae2ce).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7580   +/-   ##
=======================================
  Coverage   99.02%   99.03%           
=======================================
  Files          87       87           
  Lines       16896    16934   +38     
=======================================
+ Hits        16732    16771   +39     
+ Misses        164      163    -1     

@github-actions

github-actions bot commented Jan 9, 2026

  • HEAD=feat/adding_list_search_to_tables stopped early for DT[,.SD] improved in #4501
    Comparison Plot

Generated via commit e2ae2ce

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 2 minutes and 48 seconds
Installing different package versions 43 seconds
Running and plotting the test cases 3 minutes and 31 seconds

# creating env so that the names are within it
xenv2 = new.env()
xenv2$DT = data.table(a = 1L)
xenv2$L = list(data.table(a = 1, b = 4:6), data.table(a = 2, b = 7:10))
Member

this test should also include a further-nested table to demonstrate that depth=1L is honored:

xenv2$LL = list(list(data.table(a=1L, b=4:6)))

There, we'd need depth=2L to find the data.table, AIUI.
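
A quick check of why the doubly nested table needs a second level, written as a sketch with plain vapply rather than the PR's code:

```r
library(data.table)

LL = list(list(data.table(a = 1L, b = 4:6)))

# One level down: the only element of LL is a plain list, not a data.table.
depth1_hits = vapply(LL, is.data.table, logical(1L))

# Two levels down: the element of LL[[1]] is the data.table.
depth2_hits = vapply(LL[[1L]], is.data.table, logical(1L))
```

depth1_hits is FALSE and depth2_hits is TRUE, so a search limited to one level should not report LL.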


#2606 tables() depth=1 finds nested data.tables in lists
# creating env so that the names are within it
xenv2 = new.env()
Member

why xenv2? Where is xenv?

@manmita
Contributor Author

manmita commented Feb 27, 2026

Hello @MichaelChirico, I have added the suggested changes.
Regarding the data.table coding style, e.g. data.table(a=1L): I missed this in the tests of my previous PRs.
Is it all right if I fix them in this PR?

@ben-schwen
Member

Hello @MichaelChirico , I have added the suggested changes. regarding the coding style for data.table as data.table(a=1L), in my previous (PRs) tests, I have missed this. Is it alright if I fix them in this PR?

Yes, since we follow the Boy Scout principle here, which is the practice of leaving code slightly better than you found it.

R/tables.R Outdated
name_count = length(w) + total_dt
# initialize info data.table with total number of data.tables found
if (name_count==0L) {
# nocov start. Requires long-running test case
Member

I don't follow, why would it be slow?

Contributor Author

Because of format(env) it might be slow. I tried adding a test to increase coverage for this line, but atime didn't pass then. I will try that again.

R/tables.R Outdated
# we check if depth=1L is requested and add found tables to w
if (depth==1L) {
is_list = vapply_1b(obj, is.list)
is_df = vapply_1b(obj, is.data.frame)
Member

note that inherits(<data.table>, "data.frame") is TRUE (or at least we can assume it to be), so is_dt is redundant
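
The class relationship behind this remark can be checked directly; the snippet below is only a small illustration using standard predicates:

```r
library(data.table)

DT = data.table(a = 1:2)
DF = data.frame(a = 1:2)

dt_is_df = inherits(DT, "data.frame")  # data.table objects also carry class "data.frame"
dt_is_df2 = is.data.frame(DT)          # so the data.frame predicate accepts them too
df_is_dt = is.data.table(DF)           # a plain data.frame is not a data.table
```

Since every data.table passes is.data.frame, a filter such as is_list & !is_df already excludes data.tables without a separate is_dt term.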

Contributor Author

Removed is_df and kept only is_dt, so that data.frames are searched as well, as the comment below suggests.

R/tables.R Outdated
is_df = vapply_1b(obj, is.data.frame)
is_dt = vapply_1b(obj, is.data.table)
# list_index is a index of list which is not data.frame or data.table
list_index = which(is_list & !is_dt & !is_df)
Member

anyway, why not also search data.frames?

DF = data.frame(a = 1:2)
DF$b = data.table(c = 3:4, d = 5:6)
str(DF)
# 'data.frame':   2 obs. of  2 variables:
#  $ a: int  1 2
#  $ b:Classes ‘data.table’ and 'data.frame':     2 obs. of  2 variables:
#   ..$ c: int  3 4
#   ..$ d: int  5 6
#   ..- attr(*, ".internal.selfref")=<pointer: 0x561d6ec08e30>

Contributor Author

Updated the code and tested it.

R/tables.R Outdated
k = cnt + length(w) # row number in info data.table
cnt = cnt + 1L
set(info, k, "NAME", new_name)
set(info, k, "NROW", nrow(DT))
Member

let's use a helper to share logic between here & the depth=0 branch

xenv$M = list(b=data.table(a=1, b=4:6), a=1:5)
xenv$N = list(a=1:5)
test(2366.1,
tables(env=xenv, depth=1L, index=TRUE)[, .(NAME, NROW, NCOL, INDICES, KEY)],
Contributor Author

@MichaelChirico this test was failing because the INDICES and KEY columns were missing. Can you please check whether this is the intended test case? I have updated it to add index=TRUE.

@manmita manmita requested a review from MichaelChirico March 6, 2026 21:50

Development

Successfully merging this pull request may close these issues.

tables could look for en-list-ed data.tables as well

3 participants