Conversation
Merging this PR will improve performance by 11.36%
Performance Changes
Comparing Footnotes
|
1ac487c to
1f71c78
Compare
ScalarValue::Array
1f71c78 to
b1a66d3
Compare
b1a66d3 to
71deb1e
Compare
## Summary Tracking Issue: #6771 Adds a `VortexSession` parameter to the scalar value deserialization functions, and updates all callers to pass a session. We want this in order to support adding an `Array` variant to the `ScalarValue` enum. That is coming in a followup PR (draft is at #6717). This PR just makes it easier to review what is going on. ## API Changes Adds a `VortexSession` parameter in a few places. ## Testing This changes doesn't actually make use of the session yet, so there is no logical changes (yet). --------- Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
ce65460 to
4069a0d
Compare
| let scalar_value = ScalarValue::Array(list); | ||
| Scalar::try_new(array.dtype().clone(), Some(scalar_value)) |
There was a problem hiding this comment.
Is it worth making a constructor for this?
| let array_len: usize = array_value.row_count.try_into().map_err( | ||
| |_| vortex_err!(Serde: "array row_count {} exceeds usize::MAX", array_value.row_count), | ||
| )?; | ||
|
|
||
| // TODO(connor): Do we care about alignment? | ||
| // We don't care about alignment here since ArrayParts will handle it. | ||
| let parts = ArrayParts::try_from(array_value.data.as_slice())?; | ||
|
|
||
| let encoding_ids = array_value | ||
| .encoding_ids | ||
| .iter() | ||
| .map(|id| ArrayId::new_arc(Arc::from(id.clone()))) | ||
| .collect::<Vec<_>>(); | ||
| let ctx = ArrayContext::new(encoding_ids); | ||
|
|
||
| parts.decode(elem_dtype, array_len, &ctx, session) |
There was a problem hiding this comment.
I have no idea if this is correct
| DType::List(..) => self.as_list().is_empty(), | ||
| DType::FixedSizeList(_, list_size, _) => self.as_list().len() == *list_size as usize, |
There was a problem hiding this comment.
Is this strange? (value vs self)
| // TODO(connor): Do we need this? | ||
|
|
||
| impl TryFrom<&[u8]> for ArrayParts { | ||
| type Error = VortexError; | ||
|
|
||
| fn try_from(value: &[u8]) -> Result<Self, Self::Error> { |
There was a problem hiding this comment.
Should we just reimplement this for slices instead of byte buffer? It is not clear to me why alignment is important.
| // Two arrays: compare element-by-element without allocating intermediate Vecs. | ||
| (ScalarValue::Array(a), ScalarValue::Array(b)) => { | ||
| // TODO(connor): Is there a better way to do this? | ||
| a.len() == b.len() | ||
| && a.dtype() == b.dtype() | ||
| && (0..a.len()).all(|i| a.scalar_at(i).ok() == b.scalar_at(i).ok()) | ||
| } | ||
| // Cross-variant comparisons between `Array` and `List`: expand the `Array` side. | ||
| (ScalarValue::Array(_), ScalarValue::List(_)) | ||
| | (ScalarValue::List(_), ScalarValue::Array(_)) => { | ||
| let lhs = self.as_list_expanded().vortex_expect( | ||
| "something went wrong when expanding lhs scalar value for comparison", | ||
| ); | ||
| let rhs = other.as_list_expanded().vortex_expect( | ||
| "something went wrong when expanding rhs scalar value for comparison", | ||
| ); | ||
| lhs == rhs | ||
| } |
There was a problem hiding this comment.
This whole thing feels ugly...
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
4069a0d to
4a4858f
Compare
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
Benchmarks: Random AccessSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
Benchmarks: CompressionSummary
Detailed Results Table
|
gatesn
left a comment
There was a problem hiding this comment.
I think we can convert to ScalarValue::Array during deserialization, then we don't need all the cross-list x Array checks?
Summary
Tracking Issue: #6771
Adds an
Arrayvariant toScalarValuethat holds anArrayRef.This also requires passing a
VortexSessionin a lot more places (this is why we needed to do #6751, #6759, and #6761).Changes (claude summarized)
ListScalar::try_newnow handles bothScalarValue::ListandScalarValue::Arrayvia inline matching, replacing the removedScalarValue::as_list()method.ScalarValue::PartialEq/PartialOrdhandle cross-variant(Array, List)comparisons.(Array, Array)equality compares element-by-element without allocating.Scalar::is_zeroroutes list dtypes throughself.as_list()instead of the rawScalarValue, which now handles both variants.ListScalar::as_array()exposes the underlyingArrayRefwhen array-backed.append_valueinListBuilder,FixedSizeListBuilder, andListViewBuildernow callsappend_array_as_listdirectly when the scalar is array backed, avoiding per-element expansion.ListScalartrait impls:PartialEqandHashavoid materializing aVec<Scalar>by comparing/hashing element-by-element.API Changes
The main change is the addition of the
Arrayvariant toScalarValue. Everything else is dependent on that change.Testing
TODO
The existing tests should probably be sufficient but I will add some more.