BigQuery metastore: implement commit_table with commit status verification#3099
BigQuery metastore: implement commit_table with commit status verification#3099jx2lee wants to merge 5 commits intoapache:mainfrom
Conversation
|
@rambleraptor could you take a look at this? |
| ) | ||
| current_metadata_location = parameters.get(METADATA_LOCATION_PROP) | ||
| if current_metadata_location == new_metadata_location: | ||
| return "SUCCESS" |
There was a problem hiding this comment.
nit: can we extract the status to enum?
There was a problem hiding this comment.
Convert to Enum ! BigqueryCommitStatus
| raise commit_error | ||
|
|
||
| if current_table: | ||
| self._delete_old_metadata(updated_staged_table.io, current_table.metadata, updated_staged_table.metadata) |
There was a problem hiding this comment.
_delete_old_metadata is already called in table once the commit table is returned in table class.
https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1507
There was a problem hiding this comment.
Removed check current_table. Thanks!
| metadata_path=updated_staged_table.metadata_location, | ||
| ) | ||
|
|
||
| commit_error: Exception | None = None |
There was a problem hiding this comment.
Nit: non blocking it's more of an implementation detail but with the java implementation they are preventing orphaned metadata on commit failure. This doesn't affect table state, just maybe a nice to have.
I'll let @rambleraptor chime in here.
| current_metadata = FromInputFile.table_metadata(io.new_input(current_metadata_location)) | ||
|
|
||
| previous_metadata_locations = {log.metadata_file for log in current_metadata.metadata_log} | ||
| previous_metadata_location = parameters.get(PREVIOUS_METADATA_LOCATION_PROP) |
There was a problem hiding this comment.
This guard is for commit contention across concurrent writers. Could I ignore that scenairo now ?
If Writer A commits M1→M2 and, before A verifies the result, Writer B commits M2→M3, A may see the current pointer at M3 and mistakenly think its commit failed (which is why checking previous metadata history is needed).
|
Sorry I haven't taken a look at this! I'll take a look on Monday |
Closes #2893
Rationale for this change
BigQueryMetastoreCatalog.commit_tablewas not implemented.This PR implements
commit_tablefor the BigQuery metastore catalog with:metadata_log)Are these changes tested?
Yes.
uv run pytest -q tests/catalog/test_bigquery_metastore.py -qAre there any user-facing changes?
Maybe no.