feat: support automatic per-cell execution history filtering and isolated callbacks #17144

Open
shuoweil wants to merge 25 commits into main from shuowei-filter-execution-history

@shuoweil shuoweil commented May 14, 2026

This change introduces scoped query tracking and event callback management for BigQuery DataFrames within interactive notebook environments (Jupyter/Colab).

Key Changes

  • Jupyter Cell Scoping: Resolves and carries the active IPython cell execution count (cell_execution_count) through TableWidget, ExecutionSpec, query executors, and final JobMetadata.

  • Execution History Filtering: Adds events, job_ids, and all_cells parameters to session.execution_history(). When all_cells=False, it filters query logs down to the current active notebook cell.

  • Scoped Callback Support: Adds a callback parameter to _read_gbq_colab that automatically subscribes to the query progress publisher during execution and automatically unsubscribes upon completion.

  • Robustness Fixes:

    1. Instantiates expected schema/columns in _ExecutionHistory even when the dataframe is empty to prevent indexing errors.
    2. Converts custom option mappings to native Python dicts when assigning query labels to avoid validation errors in the underlying BigQuery client.
    3. Captures and propagates query_id in BigQueryFinishedEvent.
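
The scoped callback behavior described in the key changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the bigframes implementation: `Publisher`, `scoped_subscription`, and `run_query` are hypothetical names, and events are modeled as plain strings.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Optional

Callback = Callable[[str], None]  # hypothetical: events are plain strings here


class Publisher:
    """Hypothetical stand-in for a query progress publisher."""

    def __init__(self) -> None:
        self._subscribers: List[Callback] = []

    def subscribe(self, callback: Callback) -> None:
        self._subscribers.append(callback)

    def unsubscribe(self, callback: Callback) -> None:
        self._subscribers.remove(callback)

    def publish(self, event: str) -> None:
        # Iterate over a copy so a callback may unsubscribe while being notified.
        for callback in list(self._subscribers):
            callback(event)


@contextmanager
def scoped_subscription(
    publisher: Publisher, callback: Optional[Callback]
) -> Iterator[None]:
    """Subscribe for the duration of the block, then always unsubscribe."""
    if callback is None:
        yield
        return
    publisher.subscribe(callback)
    try:
        yield
    finally:
        # Runs on success and on error, so the callback never leaks.
        publisher.unsubscribe(callback)


def run_query(
    publisher: Publisher, sql: str, callback: Optional[Callback] = None
) -> None:
    with scoped_subscription(publisher, callback):
        publisher.publish(f"started: {sql}")
        publisher.publish(f"finished: {sql}")


publisher = Publisher()
received: List[str] = []
run_query(publisher, "SELECT 1", callback=received.append)
publisher.publish("unrelated event")  # not delivered: already unsubscribed
```

Putting the unsubscribe step in a `finally` clause is what keeps the callback isolated to a single call, even if the query raises.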

Verified at:
go/scrcast/NjQzOTAzMTUwMzA2MDk5MnwzZWQ2MTMzYS0xYg
Colab notebook test: screen/7d6Yt3C28BUAKEH

Fixes #<513337964> 🦕

@shuoweil shuoweil self-assigned this May 14, 2026
@shuoweil shuoweil requested review from a team as code owners May 14, 2026 22:39
@shuoweil shuoweil requested review from tswast and removed request for a team and tswast May 14, 2026 22:39
@shuoweil shuoweil marked this pull request as draft May 14, 2026 22:39

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a callback parameter to the _read_gbq_colab function, allowing users to receive query execution events. The changes include updates to the public API, the session implementation, and the addition of a unit test to verify the callback functionality. A review comment suggests refactoring the _read_gbq_colab method to eliminate code duplication in the query execution logic by using a local helper function.

Comment thread packages/bigframes/bigframes/session/__init__.py Outdated
@shuoweil shuoweil force-pushed the shuowei-filter-execution-history branch from 96d6693 to d3bf48c Compare May 14, 2026 22:44
@shuoweil shuoweil changed the title feat: add callback parameter to _read_gbq_colab feat: support universal cell-level execution history filtering May 18, 2026
@shuoweil shuoweil changed the title feat: support universal cell-level execution history filtering feat: support automatic per-cell execution history filtering and isolated callbacks May 18, 2026
@shuoweil shuoweil force-pushed the shuowei-filter-execution-history branch from 22ff695 to b9c9336 Compare May 20, 2026 20:41
@shuoweil shuoweil force-pushed the shuowei-filter-execution-history branch from ec7a56a to 5f9d824 Compare May 20, 2026 21:04
@shuoweil shuoweil added kokoro:force-run and removed kokoro:force-run labels May 21, 2026
@shuoweil shuoweil marked this pull request as ready for review May 22, 2026 21:17
class EventEnvelope:
    event: Event
    progress_bar: ProgressBarType = _DEFAULT
    cell_execution_count: Optional[int] = None
Contributor

I can't figure out what this means just from reading the parameters. Please add docstrings to explain this and the other parameters.

Contributor Author

Added a clear, comprehensive docstring to the EventEnvelope class in packages/bigframes/bigframes/core/events.py. It now explicitly documents what the wrapper does and describes each parameter, including how cell_execution_count is captured and used to filter and scope execution history on a per-cell basis.
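
For readers following the thread, per-cell filtering on `cell_execution_count` could look roughly like the sketch below. It is illustrative only: `JobRecord` and `filter_history` are hypothetical names, and the real `_ExecutionHistory` is not reproduced here. The sketch also assumes that a record with no recorded cell matches only when the current cell is likewise unknown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical, simplified stand-in for one execution-history entry."""

    job_id: str
    cell_execution_count: Optional[int] = None


def filter_history(
    records: Iterable[JobRecord],
    *,
    current_cell: Optional[int],
    job_ids: Optional[Iterable[str]] = None,
    all_cells: bool = True,
) -> List[JobRecord]:
    """Keep records matching the optional job-id filter and cell scope."""
    wanted = set(job_ids) if job_ids is not None else None
    result = []
    for record in records:
        if wanted is not None and record.job_id not in wanted:
            continue
        # With all_cells=False, keep only jobs started by the current cell.
        if not all_cells and record.cell_execution_count != current_cell:
            continue
        result.append(record)
    return result


history = [JobRecord("job-a", 3), JobRecord("job-b", 4), JobRecord("job-c", 4)]
current = filter_history(history, current_cell=4, all_cells=False)
```

Here `current` holds only the two jobs started from cell 4, while the default `all_cells=True` returns the whole history.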

    job_ids: Optional[Iterable[str]] = None,
    all_cells: bool = True,
) -> "bigframes.session._ExecutionHistory":
    import pandas  # noqa: F401
Contributor

Why is this needed?

Contributor Author

Removed the unused and redundant import pandas # noqa: F401 statement in packages/bigframes/bigframes/core/global_session.py under the execution_history function.

def _read_gbq_colab(  # type: ignore[overload-overlap]
    query_or_table: str,
    *,
    callback: Optional[Callable[[bigframes.core.events.EventEnvelope], None]] = ...,
Contributor

Why is the default ...? It doesn't comply with the type hint.

Contributor Author

Standardized the overloads of _read_gbq_colab in packages/bigframes/bigframes/pandas/io/api.py. The default values in the overloads are now None and False to explicitly match their type annotations (Optional[...] and Literal[False]) and prevent any potential type checker warnings.
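
As a rough illustration of the overload style described in this reply, the sketch below gives each overload explicit defaults that match its annotation. The function `read_query`, its `dry_run` parameter, and the return types are hypothetical and chosen only to show the pattern; they are not the actual `_read_gbq_colab` signature.

```python
from typing import Callable, Literal, Optional, overload

Callback = Callable[[str], None]  # hypothetical: events are plain strings here


@overload
def read_query(
    query: str,
    *,
    callback: Optional[Callback] = None,
    dry_run: Literal[False] = False,
) -> list: ...


@overload
def read_query(
    query: str,
    *,
    callback: Optional[Callback] = None,
    dry_run: Literal[True],
) -> dict: ...


def read_query(query, *, callback=None, dry_run=False):
    """Hypothetical implementation shared by both overloads."""
    if callback is not None:
        callback(f"started: {query}")
    if dry_run:
        # A dry run returns metadata only.
        return {"query": query}
    return [query]


rows = read_query("SELECT 1")
stats = read_query("SELECT 1", dry_run=True)
```

Writing `= ...` in overload stubs is also a common convention that type checkers accept; spelling out `None` and `False` simply makes the defaults readable directly from the signature.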

@GarrettWu
Contributor

Don't include internal HTTP links in PRs.

@shuoweil shuoweil requested a review from GarrettWu May 26, 2026 21:52

ipy = IPython.get_ipython()
if ipy is not None and hasattr(ipy, "execution_count"):
    return ipy.execution_count
Contributor

Don't rely on monkey patching; maybe consider using some existing settings or properties instead.
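
For context, a self-contained version of the lookup in the snippet above might look like the sketch below. The helper name `current_cell_execution_count` is hypothetical, and the sketch assumes that a missing IPython installation or a plain Python process should yield `None`. It reads the shell's existing `execution_count` attribute and does not patch anything.

```python
from typing import Optional


def current_cell_execution_count() -> Optional[int]:
    """Return the active IPython execution count, or None outside IPython."""
    try:
        import IPython
    except ImportError:
        # IPython is not installed, so there is no notebook cell to report.
        return None

    ipy = IPython.get_ipython()  # None when not running inside IPython
    if ipy is None:
        return None

    count = getattr(ipy, "execution_count", None)
    return count if isinstance(count, int) else None


count = current_cell_execution_count()
```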

progress_bar:
    Specifies the style of progress bar to display during execution.
cell_execution_count:
    The 1-indexed execution count of the notebook cell that triggered the event.
Contributor

Is it a job count? Please be explicit.
