[Spark] SparkWidthBucket return_type is Int32, should be Int64 to match Spark

### Describe the bug

`SparkWidthBucket::return_type` returns `Int32`, but Spark's `WidthBucket.dataType` is `LongType`:

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1825

```rust
// datafusion/spark/src/function/math/width_bucket.rs
fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
    Ok(Int32)
}
```

The `n_bucket` input was aligned to `i64` to match Spark in #20330, but the return type was left as `Int32`. The kernel still builds `Int32Array`.

This produces wrong results in any consumer that plans against Spark's declared output type (`Int64`) but receives an `Int32Array` at runtime. With two rows per batch, the consumer reads 16 bytes as two `Int64` values from an 8-byte `Int32` buffer: the first value packs both `Int32` results into a single `Int64`, and the second is read from uninitialized bytes.

Concretely, for `width_bucket(value, 0.0, 10.0, 5)` over `Range(0, 10)` split into 5 partitions of 2 rows each:

| value | expected (Int64) | observed |
|---|---|---|
| 0 | 1 | 4294967297 (= 0x1_00000001) |
| 1 | 1 | 0 |
| 2 | 2 | 8589934594 (= 0x2_00000002) |
| 3 | 2 | 0 |
| ... | ... | ... |
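
The packed values in the table can be reproduced in isolation. The following is a minimal sketch, not code from DataFusion or Comet: `reinterpret_pairs_as_i64` is a hypothetical helper, and it assumes a little-endian layout. It models only the first `Int64` slot of each batch, since the second slot is read from uninitialized bytes and cannot be reproduced deterministically.

```rust
/// Hypothetical helper: mimics a consumer that reads an `Int32` buffer as if
/// it held `Int64` values. Each pair of adjacent `i32` values is packed into
/// one `i64`, assuming little-endian byte order.
fn reinterpret_pairs_as_i64(values: &[i32]) -> Vec<i64> {
    values
        .chunks_exact(2)
        .map(|pair| {
            let mut bytes = [0u8; 8];
            // The first i32 lands in the low 4 bytes, the second in the high 4 bytes.
            bytes[..4].copy_from_slice(&pair[0].to_le_bytes());
            bytes[4..].copy_from_slice(&pair[1].to_le_bytes());
            i64::from_le_bytes(bytes)
        })
        .collect()
}
```

For a batch whose two `Int32` results are `[1, 1]`, this yields `4294967297`; for `[2, 2]`, it yields `8589934594`, matching the observed column for values 0 and 2.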

### To Reproduce

Run any consumer that respects Spark's declared `LongType` for `WidthBucket` against `SparkWidthBucket`. The bug reproduces in DataFusion Comet with the `width_bucket - with range data` test in `CometMathExpressionSuite` (https://github.com/apache/datafusion-comet/issues/4347).

### Expected behavior

`SparkWidthBucket::return_type` returns `Int64` and the kernel builds `Int64Array`, matching Spark.
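
For reference, here is a standalone sketch of a scalar bucket computation that produces `i64`, checked against the expected column above. It is not the DataFusion kernel: `width_bucket_i64` is a hypothetical name, and the null and out-of-range rules follow my reading of Spark's `WidthBucket`, so treat the edge cases as assumptions.

```rust
/// Hypothetical scalar sketch of a Spark-style `width_bucket` returning `i64`.
/// `None` stands in for SQL NULL. Edge-case rules are assumptions based on
/// Spark's `WidthBucket` expression, not taken from the DataFusion source.
fn width_bucket_i64(value: f64, min: f64, max: f64, num_bucket: i64) -> Option<i64> {
    // Inputs for which no bucket can be defined.
    if num_bucket <= 0
        || num_bucket == i64::MAX
        || value.is_nan()
        || min == max
        || min.is_nan()
        || min.is_infinite()
        || max.is_nan()
        || max.is_infinite()
    {
        return None;
    }

    let lower = min.min(max);
    let upper = min.max(max);
    let n = num_bucket as f64;

    let bucket = if min < max {
        // Ascending bounds: bucket 0 is below the range, n + 1 is at or above it.
        if value < lower {
            0
        } else if value >= upper {
            num_bucket + 1
        } else {
            (n * (value - lower) / (upper - lower)) as i64 + 1
        }
    } else {
        // Descending bounds: buckets are counted from the upper bound downwards.
        if value > upper {
            0
        } else if value <= lower {
            num_bucket + 1
        } else {
            (n * (upper - value) / (upper - lower)) as i64 + 1
        }
    };
    Some(bucket)
}
```

With the arguments from the table, `width_bucket_i64(v, 0.0, 10.0, 5)` gives `Some(1)` for `v = 0.0` and `v = 1.0`, and `Some(2)` for `v = 2.0` and `v = 3.0`.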

### Additional context

Related: #20330 (input parameter alignment).

