Skip to content

Broaden read csv param types #630

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 6, 2023
Merged

Broaden read csv param types #630

merged 7 commits into from
Apr 6, 2023

Conversation

EFord36
Copy link
Contributor

@EFord36 EFord36 commented Apr 4, 2023

Submitted as 1 PR for the 2 issues because the relevant code is so close that I thought it might be quicker to review together, plus trying to have both as separate PRs would likely result in merge conflicts if/when one was merged before the other. The commits addressing the two issues are separate though, so it would be easy for me to split into 2 separate PR if this is preferred.

As well as read_csv, modify read_table, read_clipboard, read_excel, which all have the relevant params (I searched for other functions but let me know if I missed any).

Closes #605 in a different way than suggested in that ticket, in order to allow broader types than just lists of string or ints, since list-likes are allowed here. Naming of the type alias UsecolsArgType was inspired by ColspaceArgType, happy to change if desired.

Please let me know if coverage of more different types for names and/or usecols is needed in the tests, or if there are stylistic details you would like me to change. There is a lot of repetition between the tests for the different functions that have these params - I'm happy to work on reducing it, but I thought it might mean restructuring some of the existing tests, so I didn't want to fiddle with it too much until I knew if it (and the actual functional change) was desired.

Passes tests (poetry run poe test_all) locally when the last commit before my changes is 7730abc - the two commits after that (6819e32 and 9c301b3) result in test failures for me locally that seem unrelated to my change, and I get if I just run on main with no changes.

EFord36 added 5 commits April 4, 2023 22:29
Broaden the type hint for the 'names' param of read_csv (and read_table,
which behaves similarly) from previous list[str], so that other valid
types are accepted by mypy.
Noticed as I found clipboard after the changes to read_csv and
read_table, and it calls it, so should match - but it was missing None
as an option.
Match prior change to read_csv, since read_clipboard calls read_csv.
Match prior change to read_csv, read_table, read_clipboard.
This fixes the pycharm tooltip problem in gh-605, as well as allowing
more list-like types of strings (tuples of strings, as well as mutable
sequences of strings other than list), and callables that accept
hashables, not just strings.
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this nice PR and for combining the changes for both issues in the same PR, and the extensive tests.

I would like to add tests that make sure that a plain string is not accepted where it is not supposed to be. Something like this:

        if TYPE_CHECKING_INVALID_USAGE:
            pd.read_excel(path, names="foo")  # type: ignore[call-overload] # pyright: ignore[reportGeneralTypeIssues]
            pd.read_csv(path, usecols="foo")  # type: ignore[arg-type] # pyright: ignore[reportGeneralTypeIssues]

I tried the above, with your PR, and it worked, but you should add tests for the names and usecols argument where str is not accepted, and I think it is just for names for read_excel().

I think that for read_excel() you use call-overload, but for the rest, arg-type should work.

EFord36 added 2 commits April 5, 2023 11:38
Strings aren't valid arguments here (except for read_excel, where we
have a test now to check that this is accepted). Adding tests to make
sure the type hints aren't overly wide and accept string arguments by
mistake.
@EFord36 EFord36 requested a review from Dr-Irv April 5, 2023 10:53
@EFord36
Copy link
Contributor Author

EFord36 commented Apr 5, 2023

Added tests to make sure the plain string isn't accepted (and one for read_excel to check it is accepted where it should be).

Thanks for the guidance, it made adding the tests nice and quick for me!

Running the tests a few times, turns out it needs arg-type for usecols, but call-overload for names.

(I'm speculating maybe because usecols has a specific type alias that is shared between all the overloads, mypy is smart enough there to take into account that the argument type is the same between overloads, and raise arg-type, whereas for usecols, because the still shared type hint is now more complex, it doesn't? I may be missing something obvious, but to me it seems like the failures are for the same reason.)

@EFord36 EFord36 mentioned this pull request Apr 5, 2023
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @EFord36 . Nice work.

@Dr-Irv Dr-Irv merged commit a9dc8ba into pandas-dev:main Apr 6, 2023
@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Apr 6, 2023

(I'm speculating maybe because usecols has a specific type alias that is shared between all the overloads, mypy is smart enough there to take into account that the argument type is the same between overloads, and raise arg-type, whereas for usecols, because the still shared type hint is now more complex, it doesn't? I may be missing something obvious, but to me it seems like the failures are for the same reason.)

TBH, I've never figured out exactly why mypy picks one error over another. So we put in the error that was reported to make sure that we get through CI. For these kinds of tests where we are using TYPE_CHECKING_INVALID_USAGE, the goal is to make sure that mypy and pyright flag them as known errors. The specific error reported is less of a concern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants