The DictSet is problematic, particularly when handling simdjson objects and arrays. A Relation is faster and uses less memory. The DictSet is still useful for legacy datasets that PyArrow cannot load.