Skip to content

API: Use lowest precision floating type found in reindex with columns #51444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

lithomas1
Copy link
Member

@lithomas1 lithomas1 commented Feb 17, 2023

The problem there was that the .values call here

value_slice = chunk.reindex(columns=level_vals_used).values

upcasts the values, so the float32 will become float64.

I had to modify reindex in order to make this possible, since I don't think you can easily tell whether an all-nan col was there before or was created because of re-indexing.

  • I guess could modify stack, but then I'd be duplicating reindex logic.

expected = DataFrame(
{
"a": np.array([1, 2, 3], dtype="float32"),
"d": np.array([np.nan, np.nan, np.nan], dtype="float32"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is a good idea. This might lead to precision issues if you don't expect this.

Is there a good argument for making this float32 (internally speaking)

Copy link
Member Author

@lithomas1 lithomas1 Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't mean to PR so early. This should only apply to all nan cols. We'd use the highest precision float in the frame found if any, otherwise it's float64

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I understand that. But if you set values into the all nan columns later on you are stuck with float32

@simonjayhawkins simonjayhawkins added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Feb 22, 2023
@lithomas1
Copy link
Member Author

I think I overcomplicated this.

@lithomas1 lithomas1 closed this Feb 23, 2023
@lithomas1 lithomas1 deleted the fix-reindex-upcasting branch February 23, 2023 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.stack() sometimes changes dtype from float32 to float64
3 participants