
Apache Parquet Java Performance Improvements #3530

@iemejia

Description


These issues/PRs implement a coordinated performance improvement effort for parquet-java encoding and decoding hot paths. The work focuses on reducing CPU overhead, allocation pressure, and avoidable memory copies in commonly used readers and writers, including plain values, binary values, byte-stream split encoding, dictionary encoding, delta byte-array encoding, delta binary packing, RLE/bit-packing decoding, page assembly, and row-group flushing.

Together, the changes preserve existing Parquet format compatibility and public behavior while making the implementation more efficient internally. The improvements use more direct ByteBuffer access, batched read/write operations, reusable buffers and helpers, cached computed values, and earlier release of temporary memory. The goal of the parent issue is to track this broader optimization series as a set of focused, reviewable PRs that each improve one hot path while contributing to better end-to-end read/write performance and lower memory usage.
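As a rough illustration of the "more direct ByteBuffer access" technique, the sketch below decodes plain-encoded 32-bit ints straight from a little-endian `ByteBuffer` view instead of pulling bytes one at a time through a stream wrapper. The class and method names are illustrative only, not the actual parquet-java `PlainValuesReader` API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: batched plain-int decode directly from a ByteBuffer.
// One duplicate() and one order() call up front, then cheap relative reads,
// rather than reconstructing each int byte-by-byte from an InputStream.
public class DirectBufferDecode {
    static int[] decodeInts(ByteBuffer page, int count) {
        // duplicate() leaves the caller's buffer position untouched
        ByteBuffer buf = page.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = buf.getInt(); // relative read advances position by 4 bytes
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        page.putInt(1).putInt(2).putInt(3);
        page.flip();
        int[] values = decodeInts(page, 3);
        System.out.println(values[0] + "," + values[1] + "," + values[2]);
    }
}
```

The JIT can compile `ByteBuffer.getInt` on a heap buffer down to a single bounds-checked load, which is where much of the speedup in this style of change comes from.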

Benchmark summary

Benchmarks were run with JMH (-wi 3 -i 5 -f 1, 100k values/invocation) on Linux x86_64, JDK 25 (Temurin-25.0.3+9-LTS). The machine was an Azure VM with 8 vCPUs on an AMD EPYC 9V45 96-Core Processor, 4 cores / 8 threads visible, AVX2 and AVX-512 available, and 31 GiB RAM.

| Area / PR | Benchmark | Baseline | Optimized | Improvement |
|---|---|---|---|---|
| Plain values reader | IntEncodingBenchmark.decodePlain | 428M ops/s | 5,397M ops/s | 12.6x |
| Plain values writer | IntEncodingBenchmark.encodePlain | 183M ops/s | 328M ops/s | +80% |
| Binary hashCode cache | BinaryEncodingBenchmark.encodeDictionary LOW/1000 | 1.4M ops/s | 146M ops/s | +10,019% |
| Byte-stream split writer | ByteStreamSplitEncodingBenchmark Long | 51M ops/s | 423M ops/s | +732% |
| Byte-stream split reader | ByteStreamSplitDecodingBenchmark Float | 199M ops/s | 1,017M ops/s | +412% |
| Binary plain reader | BinaryEncodingBenchmark.decodePlain LOW/10 | 140M ops/s | 230M ops/s | +64% |
| Dictionary writers | IntEncodingBenchmark.encodeDictionary RANDOM | 14.7M ops/s | 23.4M ops/s | +59% |
| Delta byte-array writer | BinaryEncodingBenchmark.encodeDeltaByteArray HIGH/10 | 56.8M ops/s | 79.0M ops/s | +39% |
| RLE dictionary-id decode | IntEncodingBenchmark.decodeDictionary SEQUENTIAL | 418M ops/s | 539M ops/s | +29% |
| Delta integer decode | IntEncodingBenchmark.decodeDelta HIGH_CARDINALITY | 371M ops/s | 506M ops/s | +37% |
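For context on the byte-stream split rows above, the encoding gathers byte j of every value into a contiguous stream j, which makes the data more compressible; the writer optimization replaces per-value, per-byte emission with batched scatter writes into reusable arrays. The sketch below shows the layout for floats under those assumptions; it is not the parquet-java `ByteStreamSplitValuesWriter` implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative BYTE_STREAM_SPLIT round trip for 4-byte floats:
// byte j of value i is stored at output position j*n + i, producing
// four contiguous byte "planes" instead of interleaved values.
public class ByteStreamSplitSketch {
    static byte[] encodeFloats(float[] values) {
        int n = values.length;
        ByteBuffer raw = ByteBuffer.allocate(n * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) raw.putFloat(v);
        byte[] bytes = raw.array();
        byte[] out = new byte[n * 4];
        for (int i = 0; i < n; i++) {       // scatter into the 4 planes
            for (int j = 0; j < 4; j++) {
                out[j * n + i] = bytes[i * 4 + j];
            }
        }
        return out;
    }

    static float[] decodeFloats(byte[] encoded, int n) {
        byte[] bytes = new byte[n * 4];
        for (int i = 0; i < n; i++) {       // gather back from the planes
            for (int j = 0; j < 4; j++) {
                bytes[i * 4 + j] = encoded[j * n + i];
            }
        }
        ByteBuffer raw = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[n];
        for (int i = 0; i < n; i++) out[i] = raw.getFloat();
        return out;
    }
}
```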

Additional changes are primarily allocation or memory improvements rather than direct throughput microbenchmark wins: IntList.size() becomes O(1), the batch read API enables more efficient reader implementations, page assembly avoids full-page copies, and row-group flushing releases column buffers earlier to reduce peak memory usage.
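The IntList.size() change mentioned above boils down to maintaining a running counter instead of walking every slab on each call. A minimal sketch of that idea, with illustrative names rather than parquet-java's actual `IntList` fields:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a slab-backed int list. A size() that sums per-slab counts is
// O(slabs); keeping totalCount up to date on every add() makes size() O(1).
public class SlabIntList {
    private static final int SLAB_SIZE = 1024;
    private final List<int[]> slabs = new ArrayList<>();
    private int currentSlabCount = 0; // values used in the newest slab
    private int totalCount = 0;       // running counter -> O(1) size()

    public void add(int value) {
        if (slabs.isEmpty() || currentSlabCount == SLAB_SIZE) {
            slabs.add(new int[SLAB_SIZE]); // start a new slab when full
            currentSlabCount = 0;
        }
        slabs.get(slabs.size() - 1)[currentSlabCount++] = value;
        totalCount++;
    }

    public int size() {
        // previously equivalent to: (slabs.size() - 1) * SLAB_SIZE + currentSlabCount,
        // or worse, a loop over all slabs
        return totalCount;
    }
}
```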

This is a parent issue to track the ongoing work on performance improvements for encodings/decodings and other areas of the Java implementation. Since I am not a committer, I don't have permission to create sub-issues, so I am using this one as the main place to track them.

REVIEWS IN PROGRESS

  1. Optimize PlainValuesReader by reading directly from ByteBuffer (12x decode speedup) #3493
    GH-3493: Optimize PlainValuesReader with direct ByteBuffer reads #3494
  2. Optimize PlainValuesWriter by writing directly to ByteBuffer slabs (up to 2x encode speedup) #3495
    GH-3495: Optimize PlainValuesWriter with direct ByteBuffer slab writes (~2.5x encode speedup) #3496
  3. Cache hashCode() for non-reused Binary instances (huge dictionary-encode speedup) #3499
    GH-3499: Cache hashCode() for non-reused Binary instances (up to 73x dictionary-encode speedup) #3500
  4. Optimize ByteStreamSplitValuesWriter: remove per-value allocation and batch single-byte writes #3503 (NOT REVIEWED YET)
    GH-3503: Optimize ByteStreamSplitValuesWriter with batched scatter writes #3504
  5. Optimize ByteStreamSplitValuesReader page transposition #3505 (NOT REVIEWED YET)
    GH-3505: Optimize ByteStreamSplitValuesReader page transposition #3506
  6. Optimize BinaryPlainValuesReader by reading directly from ByteBuffer #3509
    GH-3509: Optimize BinaryPlainValuesReader by reading directly from ByteBuffer #3510
  7. Optimize dictionary writers by replacing fastutil Linked maps with OpenHashMap + ArrayList #3513
    Optimize dictionary writers by replacing fastutil Linked maps with OpenHashMap + ArrayList #3513
  8. Optimize DeltaByteArrayWriter and DeltaLengthByteArrayValuesWriter: remove per-value allocation and LittleEndianDataOutputStream wrapper #3516
    GH-3516: Optimize DeltaByteArrayWriter and DeltaLengthByteArrayValuesWriter #3517
  9. Reuse intermediate buffers in RunLengthBitPackingHybridDecoder PACKED path (~22% throughput on dictionary-id decode) #3522
    GH-3522: Reuse intermediate buffers in RunLengthBitPackingHybridDecoder PACKED path (~22% throughput on dictionary-id decode) #3523
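The hashCode caching change in item 3 follows a common pattern for immutable values: compute the hash once on first use and reuse it on every subsequent dictionary lookup. A hedged sketch of that pattern, using a sentinel value the way `String.hashCode` does; the class below is hypothetical and is not parquet-java's `Binary`:

```java
import java.util.Arrays;

// Sketch of lazy hashCode caching for an immutable binary value.
// 0 means "not computed yet"; if the real hash is 0 it is simply
// recomputed on each call, which is rare and still correct.
public class CachedHashBinary {
    private final byte[] bytes;
    private int cachedHash; // defaults to 0 = not computed

    public CachedHashBinary(byte[] bytes) {
        // caching is only safe because this instance never changes its bytes
        this.bytes = bytes.clone();
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {
            h = Arrays.hashCode(bytes);
            cachedHash = h; // benign data race: the write is idempotent
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashBinary
            && Arrays.equals(bytes, ((CachedHashBinary) o).bytes);
    }
}
```

Dictionary encoding hashes the same value on every insert probe, so removing the repeated `Arrays.hashCode`-style scan over the bytes is what produces the large speedup on low-cardinality data.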

Benchmarks PR

  1. Add JMH benchmarks for encoding/decoding paths and fix parquet-benchmarks shaded jar #3511
    GH-3511: Add JMH encoding benchmarks and fix parquet-benchmarks shaded jar #3512

WARNING: The associated GH-XXXX prefixes in the PR titles below are not correct.

  1. GH-3522: Optimize IntList.size() from O(slabs) to O(1) with running counter #3533 (NOT REVIEWED YET)
  2. GH-3522: Optimize delta binary packing with batch unpack32/pack32 and cached packers (+13-37% decode) #3534 (NOT REVIEWED YET)
  3. GH-3522: Add batch read APIs to ValuesReader hierarchy #3535 (NOT REVIEWED YET)
  4. GH-3522: Eliminate unnecessary page-size copies in compressed page assembly and CRC checksums #3536 (NOT REVIEWED YET)
  5. GH-3522: Reduce peak memory during row group flush by eagerly releasing column buffers #3537 (NOT REVIEWED YET)
