-
Notifications
You must be signed in to change notification settings - Fork 119
Fix: Allow \r in unquoted fields when row separator doesn't contain \r #346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
lib/csv/parser.rb
Outdated
no_unquoted_values = "\r\n".encode(@encoding) | ||
# Only exclude characters that are actually part of the row separator | ||
# instead of hardcoding "\r\n" | ||
row_separator_chars = @row_separator.chars.map { |c| Regexp.escape(c) }.join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need Regexp.escape
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need chars.map.join
?
Why can't we use Regexp.escape(@row_separator)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Using Regexp.escape(@row_separator)?
now. Thanks!
…ed implementation based on review feedback
…b.com/jsxs0/csv into fix-issue-60-accept-cr-without-quotes
lib/csv/parser.rb
Outdated
# Only exclude characters that are actually part of the row separator | ||
# instead of hardcoding "\r\n" | ||
no_unquoted_values = Regexp.escape(@row_separator).encode(@encoding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you fix indent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Thanks!
test/csv/parse/test_general.rb
Outdated
@@ -139,27 +139,24 @@ def test_non_regex_edge_cases | |||
end | |||
|
|||
def test_malformed_csv_cr_first_line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update test name?
test/csv/parse/test_general.rb
Outdated
# With the fix for accepting \r without quote when row separator doesn't include \r, | ||
# this should now parse successfully when row_sep is "\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this comment with suitable test name.
# With the fix for accepting \r without quote when row separator doesn't include \r, | |
# this should now parse successfully when row_sep is "\n" |
test/csv/parse/test_general.rb
Outdated
# With the fix for accepting \r without quote when row separator doesn't include \r, | ||
# this should now parse successfully when row_sep is "\n" | ||
result = CSV.parse_line("1,2\r,3", row_sep: "\n") | ||
assert_equal(["1", "2\r", "3"], result) | ||
end | ||
|
||
def test_malformed_csv_cr_middle_line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
test/csv/parse/test_general.rb
Outdated
# With the fix for accepting \r without quote when row separator doesn't include \r, | ||
# this should now parse successfully (default row_sep is "\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
test/csv/parse/test_invalid.rb
Outdated
# this should now parse successfully (default row_sep is "\n") | ||
result = CSV.parse("\n" + "\r") | ||
# This should parse as an empty first row and a second row with just "\r" | ||
assert_equal([[], ["\r"]], result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case is for invalid cases.
Can we move this to other test case?
test/csv/parse/test_unquoted_cr.rb
Outdated
|
||
def test_reject_cr_when_row_separator_includes_cr | ||
# When row separator includes \r (like \r\n), \r should still be rejected in unquoted fields | ||
data = "field1,field2,field3\r\nrow2,data,here\r\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use invalid data something like field1\r,...
for this case?
test/csv/parse/test_unquoted_cr.rb
Outdated
assert_equal(expected, CSV.parse(data, row_sep: "\r\n")) | ||
end | ||
|
||
def test_reject_cr_when_row_separator_is_cr_only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_reject_cr_when_row_separator_is_cr_only | |
def test_unquoted_cr_with_cr_row_separator |
test/csv/parse/test_unquoted_cr.rb
Outdated
|
||
def test_reject_cr_when_row_separator_is_cr_only | ||
# When row separator is just \r, \r should be rejected in unquoted fields | ||
data = "field1,field2,field3\rrow2,data,here\r" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
test/csv/parse/test_unquoted_cr.rb
Outdated
assert_equal(expected, CSV.parse(data, row_sep: "|", liberal_parsing: true)) | ||
end | ||
|
||
def test_quoted_fields_with_cr_and_custom_row_separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_quoted_fields_with_cr_and_custom_row_separator | |
def test_quoted_cr_with_custom_row_separator |
test/csv/parse/test_unquoted_cr.rb
Outdated
end | ||
|
||
def test_liberal_parsing_with_custom_row_separator | ||
# Test liberal parsing mode with custom row separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this comment? I feel that test name is described well. So I feel that this is redundant.
@kou I've made all the changes asked. |
test/csv/parse/test_unquoted_cr.rb
Outdated
def test_unquoted_cr_with_lf_row_separator | ||
data = "field1,field\rwith\rcr,field3\nrow2,data,here\n" | ||
expected = [ | ||
["field1", "field\rwith\rcr", "field3"], | ||
["row2", "data", "here"] | ||
] | ||
assert_equal(expected, CSV.parse(data, row_sep: "\n")) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same (concept) test as the changed test in test_general.rb
?
If so, we don't need this (or the test in test_general.rb
).
test/csv/parse/test_unquoted_cr.rb
Outdated
assert_equal(expected, CSV.parse(data, row_sep: "|", liberal_parsing: true)) | ||
end | ||
|
||
def test_quoted_cr_with_custom_row_separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not have test_quoted_cr...
in TestUnquortedCR
.
BTW, do we need to create test_unquoted_cr.rb
? Can we move tests in this file to test_general.rb
or something?
Co-authored-by: Sutou Kouhei <[email protected]>
@kou Thank you. I have:
|
test/csv/parse/test_general.rb
Outdated
def test_unquoted_cr_with_crlf_row_separator | ||
data = "field1\r,field2,field3\r\nrow2,data,here\r\n" | ||
assert_raise(CSV::MalformedCSVError) do | ||
CSV.parse(data, row_sep: "\r\n") | ||
end | ||
assert_equal("Unquoted fields do not allow new line <\"\\r\"> in line 1.", | ||
error.message) | ||
end | ||
|
||
def test_malformed_csv_cr_middle_line | ||
csv = <<-CSV | ||
line,1,abc | ||
line,2,"def\nghi" | ||
def test_unquoted_cr_rejected_when_included_in_row_separator | ||
data = "field1,field\r2,field3\r\nrow2,data,here\r\n" | ||
assert_raise(CSV::MalformedCSVError) do | ||
CSV.parse(data, row_sep: "\r\n") | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they the same concept tests?
Can we remove test_unquoted_cr_rejected_when_included_in_row_separator
?
test/csv/parse/test_general.rb
Outdated
def test_liberal_parsing_with_unquoted_cr_and_custom_row_separator | ||
data = "field1,field\rwith\rcr,field3|row2,data,here|" | ||
expected = [ | ||
["field1", "field\rwith\rcr", "field3"], | ||
["row2", "data", "here"] | ||
] | ||
assert_equal(expected, CSV.parse(data, row_sep: "|", liberal_parsing: true)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this to test/csv/parse/test_liberal_parsing
and remove the liberal_parsing_with_
part from the test name?
test/csv/parse/test_general.rb
Outdated
assert_equal(expected, CSV.parse(data, row_sep: "|")) | ||
end | ||
|
||
def test_unquoted_cr_rejected_when_included_in_row_separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use the same naming rule for parse error case.
def test_unquoted_cr_rejected_when_included_in_row_separator | |
def test_unquoted_cr_with_crlf_row_separator |
test/csv/parse/test_general.rb
Outdated
|
||
def test_unquoted_cr_rejected_when_included_in_row_separator | ||
data = "field1,field\r2,field3\r\nrow2,data,here\r\n" | ||
assert_raise(CSV::MalformedCSVError) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also check the error message like other tests?
assert_raise(CSV::MalformedCSVError) do | |
message = "..." | |
assert_raise(CSV::MalformedCSVError.new(message, 1)) do |
lib/csv/parser.rb
Outdated
# Only exclude characters that are actually part of the row separator | ||
# instead of hardcoding "\r\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we can remove this comment.
I feel that it's useful for commit message (the PR description in this repository) because it describes why we do this change but it may not be useful for readers of new code. (Nobody will not try using "\r\n"
here.)
Fixes #60
This has been bugging me for a while - the CSV parser was rejecting
\r
characters in unquoted fields even when the row separator was something completely different like\n
or a custom separator.For example, this would fail unnecessarily:
The problem was in
prepare_unquoted
where we were hardcoding"\r\n"
instead of checking what the actual row separator was.What changed:
\n
, then\r
is allowed in unquoted fields\r\n
, then both\r
and\n
are still properly excludedTesting:
This makes the parser more flexible while keeping it safe for the cases where
\r
should actually be restricted.