@benkrikler
Member

When there are many bins in a binned dataframe, producing all of them, even those that aren't actually filled, can consume a lot of memory and processing time. This new option makes the resulting dataframe much sparser by only filling bins with non-zero contents. It can be combined with the existing pad_missing option to force the empty bins to be added back in during the merging step.

Note that in future this option might be enabled or disabled automatically in conjunction with pad_missing, since it is really a performance optimisation control that users often will not want to have to know about.
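
To illustrate the trade-off described above, here is a minimal pandas sketch. It assumes (based only on the branch name) that the new option corresponds to the `observed` argument of `DataFrame.groupby`; the toy data, the bin labels, and the padding step via `reindex` are hypothetical and are not taken from the fast_carpenter code.

```python
import pandas as pd

# Hypothetical toy data: two variables, each cut into three labelled bins.
labels = ["low", "mid", "high"]
df = pd.DataFrame({"x": [0.5, 1.5, 1.7], "y": [0.2, 0.2, 2.5]})
x_bins = pd.cut(df["x"], bins=[0, 1, 2, 3], labels=labels)
y_bins = pd.cut(df["y"], bins=[0, 1, 2, 3], labels=labels)

# observed=False: every combination of bins gets a row, filled or not.
dense = df.groupby([x_bins, y_bins], observed=False).size()

# observed=True: only the bin combinations that actually occur get a row.
sparse = df.groupby([x_bins, y_bins], observed=True).size()

# Assumed equivalent of padding the missing bins back in at a later step:
# reindex the sparse result against the full product of bin labels.
full_index = pd.MultiIndex.from_product([labels, labels], names=["x", "y"])
padded = sparse.reindex(full_index, fill_value=0)

print(len(dense), len(sparse), len(padded))
```

With many bins and few filled combinations, the sparse result stays small until the padding step, which is the saving this PR is after.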

@codecov

codecov bot commented Apr 3, 2020

Codecov Report

Merging #118 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #118      +/-   ##
==========================================
+ Coverage   65.38%   65.40%   +0.02%     
==========================================
  Files          20       20              
  Lines        1456     1457       +1     
==========================================
+ Hits          952      953       +1     
  Misses        504      504              
Impacted Files                                Coverage Δ
fast_carpenter/summary/binned_dataframe.py    85.71% <100.00%> (+0.08%) ⬆️
fast_carpenter/version.py                     100.00% <100.00%> (ø)

@benkrikler benkrikler merged commit 8fa33c0 into master Apr 3, 2020
@benkrikler benkrikler deleted the BK_add_observed_option branch April 3, 2020 18:07