This repository was archived by the owner on Feb 22, 2023. It is now read-only.

Possible to factor out the null bytemap and/or use more of arrow compute API? #191

@davesque

I noticed that fletcher converts the null bitmap into a null bytemap as a step in many computations on arrays that contain null values. Would you be interested in eventually factoring this step out, or in accepting PRs that do? I think it would require a fair amount of custom Cython or Numba code that iterates over the null bitmap directly, alongside the values buffer. It might be worth doing, though: it could narrow the gap with Pandas on some of the benchmarks in your benchmarking suite, or even overtake it.

Also, I noticed a number of other places where it might be possible to make simple calls to the Arrow compute API. As an experiment, I modified the FletcherBaseArray.sum method to call pyarrow.compute.sum directly. The downside is that you can no longer control null handling via skipna. However, it is a lot faster (35-40% faster than either Pandas or the current Fletcher implementation). That makes me wonder whether it would be worth implementing more of Fletcher's internals via Cython and Arrow's compute API.

What are your thoughts on these things?
