Spec: Update formatting in tables to use material content tabs#14656
Spec: Update formatting in tables to use material content tabs#14656kevinjqliu merged 4 commits intoapache:mainfrom
Conversation
nastra
left a comment
There was a problem hiding this comment.
I actually like it because it makes it easier to spot the fields that are relevant for a given format version
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
I agree that tabs is probably the more sustainable as more versions are added to the spec. So I am +1 for this direction. Just to point out one benefit of the old layout. It is easier to see the changes/differences across versions in the old layout. |
Restore four pieces of content that were accidentally removed in the formatting-only tab refactor, as flagged by Steven's review: - column_sizes: restore "Does not include bytes necessary to read other columns, like footers." sentence - partitions: restore "(see below)" cross-reference to field_summary table - partition-spec: restore note that writers use this field but readers use specs from manifest files - properties: restore commit.retry.num-retries example Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Great catch @stevenzwu! I have added back those descriptors, I'm not sure how those specific statements ended up getting dropped |
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM! Verified locally
One nit about putting back the see below
| | _required_ | | | ~~**`105 block_size_in_bytes`**~~ | `long` | **Deprecated. Always write a default in v1. Do not write in v2 or v3.** | | ||
| | _optional_ | | | ~~**`106 file_ordinal`**~~ | `int` | **Deprecated. Do not write.** | | ||
| | _optional_ | | | ~~**`107 sort_columns`**~~ | `list<112: int>` | **Deprecated. Do not write.** | | ||
| | _optional_ | _optional_ | _optional_ | **`108 column_sizes`** | `map<117: int, 118: long>` | Map from column id to the total size on disk of all regions that store the column. **Does not include bytes necessary to read other columns, like footers.** Leave null for row-oriented formats (Avro) | |
There was a problem hiding this comment.
👍 just a formatting change
| | _optional_ | _required_ | _required_ | **`512 added_rows_count`** | `long` | Number of rows in all of files in the manifest that have status `ADDED`, when `null` this is assumed to be non-zero | | ||
| | _optional_ | _required_ | _required_ | **`513 existing_rows_count`** | `long` | Number of rows in all of files in the manifest that have status `EXISTING`, when `null` this is assumed to be non-zero | | ||
| | _optional_ | _required_ | _required_ | **`514 deleted_rows_count`** | `long` | Number of rows in all of files in the manifest that have status `DELETED`, when `null` this is assumed to be non-zero | | ||
| | _optional_ | _optional_ | _optional_ | **`507 partitions`** | `list<508: field_summary>` **(see below)** | A list of field summaries for each partition field in the spec. Each field in the list corresponds to a field in the manifest file’s partition spec. | |
There was a problem hiding this comment.
👍 just a formatting change, bolded (see below)
| | _required_ | | | **`schema`** | The table’s current schema. (**Deprecated**: use `schemas` and `current-schema-id` instead) | | ||
| | _optional_ | _required_ | _required_ | **`schemas`** | A list of schemas, stored as objects with `schema-id`. | | ||
| | _optional_ | _required_ | _required_ | **`current-schema-id`** | ID of the table’s current schema. | | ||
| | _required_ | | | **`partition-spec`** | The table’s current partition spec, stored as only fields. (**Deprecated**: use `partition-specs` and `default-spec-id` instead) Note that this is used by writers to partition data, but is not used when reading because reads use the specs stored in manifest files. | |
There was a problem hiding this comment.
👍 just a formatting change, switch deprecation message to be before the note
Trying to see how material content tabs look like for tables in the spec. Specifically, this is done with the intent to prepare for later spec changes like v4, where tables will either be completely different and/or make the current tables very wide and hard to represent in the site. With separate tabs , a user can just select which format version they want to look at and see the fields. It does make comparison with previous format versions a bit more involved as that's a tab switch but we have to weight that against the difficulty of adding new stuff to these tables as well.
Before:
After:
