Fix: Allow \r in unquoted fields when row separator doesn't contain \r #346

jsxs0 · 2025-06-05T05:31:33Z

Fixes #60

This has been bugging me for a while - the CSV parser was rejecting \r characters in unquoted fields even when the row separator was something completely different like \n or a custom separator.

For example, this would fail unnecessarily:

CSV.parse("field1,field\rwith\rcr,field3\n", row_sep: "\n")

The problem was in prepare_unquoted where we were hardcoding "\r\n" instead of checking what the actual row separator was.

What changed:

Now we only exclude characters that are actually part of the row separator
If your row separator is \n, then \r is allowed in unquoted fields
If your row separator is \r\n, then both \r and \n are still properly excluded
Quoted fields work exactly the same as before
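
To illustrate the idea, here is a minimal sketch. It is not the actual CSV::Parser code: unquoted_value_pattern is a hypothetical helper, and the real parser builds its patterns with more care (encodings, liberal parsing, and so on). It only shows how the set of excluded characters follows from the row separator.

```ruby
# Hypothetical helper, not the actual CSV::Parser code: builds a pattern that
# matches a run of characters allowed in an unquoted field.
def unquoted_value_pattern(row_separator, column_separator: ",", quote_character: '"')
  excluded = [row_separator, column_separator, quote_character]
    .map { |s| Regexp.escape(s) }
    .join
  Regexp.new("[^#{excluded}]+")
end

line = "field\rwith\rcr,field3"

# Row separator "\n": "\r" is not excluded, so it stays inside the field.
lf_field = line[unquoted_value_pattern("\n")]

# Row separator "\r\n": both "\r" and "\n" are excluded, so the match stops at "\r".
crlf_field = line[unquoted_value_pattern("\r\n")]

puts lf_field.inspect
puts crlf_field.inspect
```

With "\n" as the row separator the first field is matched with its \r characters intact, while with "\r\n" the match ends before the first \r.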

Testing:

Updated the tests that were expecting the old behavior
Added comprehensive tests for different row separator scenarios
All existing tests still pass

This makes the parser more flexible while keeping it safe for the cases where \r should actually be restricted.

Fixes ruby#60

kou · 2025-06-05T07:02:33Z

lib/csv/parser.rb

-      no_unquoted_values = "\r\n".encode(@encoding)
+      # Only exclude characters that are actually part of the row separator
+      # instead of hardcoding "\r\n"
+      row_separator_chars = @row_separator.chars.map { |c| Regexp.escape(c) }.join


Why do we need Regexp.escape here?

Why do we need chars.map.join?
Why can't we use Regexp.escape(@row_separator)?

Fixed! Using Regexp.escape(@row_separator) now. Thanks!
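
For reference, a small sketch of why the two forms are interchangeable here. It assumes nothing about the parser internals; escaped_forms is a hypothetical helper that only compares the per-character escaping from the first revision with a single Regexp.escape call on the whole separator.

```ruby
# Hypothetical helper: returns both escaped forms of a row separator so they
# can be compared side by side.
def escaped_forms(row_separator)
  per_character = row_separator.chars.map { |c| Regexp.escape(c) }.join
  whole_string = Regexp.escape(row_separator)
  [per_character, whole_string]
end

["\n", "\r\n", "\r", "|"].each do |row_separator|
  per_character, whole_string = escaped_forms(row_separator)
  puts "#{row_separator.inspect}: same result? #{per_character == whole_string}"
end
```

Regexp.escape already escapes each special character on its own, so splitting the string first adds nothing for these separators.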

kou · 2025-06-05T07:31:39Z

lib/csv/parser.rb

+              # Only exclude characters that are actually part of the row separator
+        # instead of hardcoding "\r\n"
+        no_unquoted_values = Regexp.escape(@row_separator).encode(@encoding)


Could you fix indent?

Fixed! Thanks!

kou · 2025-06-05T07:35:02Z

test/csv/parse/test_general.rb

@@ -139,27 +139,24 @@ def test_non_regex_edge_cases
  end

  def test_malformed_csv_cr_first_line


Could you update test name?

kou · 2025-06-05T07:35:31Z

test/csv/parse/test_general.rb

+    # With the fix for accepting \r without quote when row separator doesn't include \r,
+    # this should now parse successfully when row_sep is "\n"


We don't need this comment with suitable test name.

Suggested change

# With the fix for accepting \r without quote when row separator doesn't include \r,

# this should now parse successfully when row_sep is "\n"

kou · 2025-06-05T07:35:37Z

test/csv/parse/test_general.rb

+    # With the fix for accepting \r without quote when row separator doesn't include \r,
+    # this should now parse successfully when row_sep is "\n"
+    result = CSV.parse_line("1,2\r,3", row_sep: "\n")
+    assert_equal(["1", "2\r", "3"], result)
  end

  def test_malformed_csv_cr_middle_line


kou · 2025-06-05T07:35:44Z

test/csv/parse/test_general.rb

+    # With the fix for accepting \r without quote when row separator doesn't include \r,
+    # this should now parse successfully (default row_sep is "\n")


kou · 2025-06-05T07:36:35Z

test/csv/parse/test_invalid.rb

+    # this should now parse successfully (default row_sep is "\n")
+    result = CSV.parse("\n" + "\r")
+    # This should parse as an empty first row and a second row with just "\r"
+    assert_equal([[], ["\r"]], result)


This test case is for invalid cases.
Can we move this to other test case?

kou · 2025-06-05T07:42:41Z

test/csv/parse/test_unquoted_cr.rb

+
+  def test_reject_cr_when_row_separator_includes_cr
+    # When row separator includes \r (like \r\n), \r should still be rejected in unquoted fields
+    data = "field1,field2,field3\r\nrow2,data,here\r\n"


Could you use invalid data, something like field1\r,..., for this case?

kou · 2025-06-05T07:43:06Z

test/csv/parse/test_unquoted_cr.rb

+    assert_equal(expected, CSV.parse(data, row_sep: "\r\n"))
+  end
+
+  def test_reject_cr_when_row_separator_is_cr_only


Suggested change

def test_reject_cr_when_row_separator_is_cr_only

def test_unquoted_cr_with_cr_row_separator

kou · 2025-06-05T07:43:12Z

test/csv/parse/test_unquoted_cr.rb

+
+  def test_reject_cr_when_row_separator_is_cr_only
+    # When row separator is just \r, \r should be rejected in unquoted fields
+    data = "field1,field2,field3\rrow2,data,here\r"


kou · 2025-06-05T07:43:34Z

test/csv/parse/test_unquoted_cr.rb

+    assert_equal(expected, CSV.parse(data, row_sep: "|", liberal_parsing: true))
+  end
+
+  def test_quoted_fields_with_cr_and_custom_row_separator


Suggested change

def test_quoted_fields_with_cr_and_custom_row_separator

def test_quoted_cr_with_custom_row_separator

kou · 2025-06-05T07:44:21Z

test/csv/parse/test_unquoted_cr.rb

+  end
+
+  def test_liberal_parsing_with_custom_row_separator
+    # Test liberal parsing mode with custom row separator


Do we need this comment? I feel that test name is described well. So I feel that this is redundant.

jsxs0 · 2025-06-05T08:23:48Z

@kou I've made all the changes asked.

test/csv/parse/test_invalid.rb

kou · 2025-06-05T08:53:07Z

test/csv/parse/test_unquoted_cr.rb

+  def test_unquoted_cr_with_lf_row_separator
+    data = "field1,field\rwith\rcr,field3\nrow2,data,here\n"
+    expected = [
+      ["field1", "field\rwith\rcr", "field3"],
+      ["row2", "data", "here"]
+    ]
+    assert_equal(expected, CSV.parse(data, row_sep: "\n"))
+  end


Is this the same (concept) test as the changed test in test_general.rb?
If so, we don't need this (or the test in test_general.rb).

kou · 2025-06-05T08:54:36Z

test/csv/parse/test_unquoted_cr.rb

+    assert_equal(expected, CSV.parse(data, row_sep: "|", liberal_parsing: true))
+  end
+
+  def test_quoted_cr_with_custom_row_separator


We should not have test_quoted_cr... in TestUnquotedCR.

BTW, do we need to create test_unquoted_cr.rb? Can we move tests in this file to test_general.rb or something?

jsxs0 · 2025-06-05T09:08:35Z

@kou Thank you. I have:

Removed the duplicate test
Moved the unique tests to test_general.rb
Fixed the misnamed test
Deleted the separate test file

kou · 2025-06-05T09:33:20Z

test/csv/parse/test_general.rb

+  def test_unquoted_cr_with_crlf_row_separator
+    data = "field1\r,field2,field3\r\nrow2,data,here\r\n"
+    assert_raise(CSV::MalformedCSVError) do
+      CSV.parse(data, row_sep: "\r\n")
    end
-    assert_equal("Unquoted fields do not allow new line <\"\\r\"> in line 1.",
-                 error.message)
  end

-  def test_malformed_csv_cr_middle_line
-    csv = <<-CSV
-line,1,abc
-line,2,"def\nghi"
+  def test_unquoted_cr_rejected_when_included_in_row_separator
+    data = "field1,field\r2,field3\r\nrow2,data,here\r\n"
+    assert_raise(CSV::MalformedCSVError) do
+      CSV.parse(data, row_sep: "\r\n")
+    end
+  end


Are they the same concept tests?
Can we remove test_unquoted_cr_rejected_when_included_in_row_separator?

kou · 2025-06-05T09:34:53Z

test/csv/parse/test_general.rb

+  def test_liberal_parsing_with_unquoted_cr_and_custom_row_separator
+    data = "field1,field\rwith\rcr,field3|row2,data,here|"
+    expected = [
+      ["field1", "field\rwith\rcr", "field3"],
+      ["row2", "data", "here"]
+    ]
+    assert_equal(expected, CSV.parse(data, row_sep: "|", liberal_parsing: true))
+  end


Can we move this to test/csv/parse/test_liberal_parsing and remove the liberal_parsing_with_ part from the test name?

kou · 2025-06-06T01:11:50Z

test/csv/parse/test_general.rb

+    assert_equal(expected, CSV.parse(data, row_sep: "|"))
+  end
+
+  def test_unquoted_cr_rejected_when_included_in_row_separator


We can use the same naming rule for parse error case.

Suggested change

def test_unquoted_cr_rejected_when_included_in_row_separator

def test_unquoted_cr_with_crlf_row_separator

kou · 2025-06-06T01:13:15Z

test/csv/parse/test_general.rb

+
+  def test_unquoted_cr_rejected_when_included_in_row_separator
+    data = "field1,field\r2,field3\r\nrow2,data,here\r\n"
+    assert_raise(CSV::MalformedCSVError) do


Could you also check the error message like other tests?

Suggested change

assert_raise(CSV::MalformedCSVError) do

message = "..."

assert_raise(CSV::MalformedCSVError.new(message, 1)) do

kou · 2025-06-06T01:15:54Z

lib/csv/parser.rb

+      # Only exclude characters that are actually part of the row separator
+      # instead of hardcoding "\r\n"


It seems that we can remove this comment.
I feel that it's useful for the commit message (the PR description in this repository) because it describes why we made this change, but it may not be useful for readers of the new code. (Nobody will try using "\r\n" here.)

fix: accept \r in unquoted fields when row_sep excludes \r

b3f7932

Fixes ruby#60

kou reviewed Jun 5, 2025

jsxs0 added 3 commits June 5, 2025 16:06

fix: accept \r in unquoted fields when row_sep excludes \r - simplifi…

440c545

…ed implementation based on review feedback

Merge branch 'fix-issue-60-accept-cr-without-quotes' of https://githu…

196efe4

…b.com/jsxs0/csv into fix-issue-60-accept-cr-without-quotes

Update .gitignore

5b8f693

jsxs0 requested a review from kou June 5, 2025 07:09

kou reviewed Jun 5, 2025

style: fix indentation in prepare_unquoted method

c237450

jsxs0 requested a review from kou June 5, 2025 07:35

kou reviewed Jun 5, 2025

reviewers' feedback

dd88061

jsxs0 requested a review from kou June 5, 2025 08:23

jsxs0 added 3 commits June 5, 2025 17:26

refactor(parser): simplify row separator escaping per code review

750531a

Updated test_unquoted_cr_with_crlf_row_separator

f323873

test_unquoted_cr_with_cr_row_separator test: logically problematic

cb1084d

kou reviewed Jun 5, 2025

jsxs0 and others added 2 commits June 5, 2025 17:56

Update test/csv/parse/test_invalid.rb

f2a2f8f

Co-authored-by: Sutou Kouhei <[email protected]>

Following better organization principles

313f849

kou reviewed Jun 5, 2025

test: consolidate unquoted CR tests, remove duplication

b455a09

kou reviewed Jun 6, 2025

Apply maintainer feedback: improve tests and clean up code

9be946f
