Add more Py_AASequence Convenience functions #25

jcharkow · 2025-11-28T20:53:13Z

Add some more Py_AASequence convenience functions

E.g.

seq = Py_AASequence.from_string("PEPTIDE")
seq[1]          # → Py_AASequence("E")
seq[1:4]        # → Py_AASequence("EPT")
seq[-1]         # → Py_AASequence("E")

seq1 = Py_AASequence.from_string("PEP")
seq1 + seq2                    # → Py_AASequence("PEPTIDE")
seq1 + "TIDE"                  # → Py_AASequence("PEPTIDE")
"PEP" + seq1                   # → Py_AASequence("PEPTIDE")

seq * 3         # → Py_AASequence("PEPEPEPEP")
3 * seq         # → Py_AASequence("PEPEPEPEP")

'P' in seq      # → True
'PEP' in seq    # → True

Summary by CodeRabbit

New Features
- AA sequences now behave as immutable Python sequences: indexing (including negative), slicing, concatenation, repetition, ordering and hashing.
- Indexing/iteration yield residue objects with accessible sequence text.
- Substring containment and residue counting supported.
- New to_string API with selectable modified-residue formats (default, unimod, bracket) and input validation.
- Ability to wrap existing native sequence objects.
Tests
- Updated/added tests for iteration, indexing, slicing, to_string formats and error handling.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-11-28T20:53:21Z

Walkthrough

Py_AASequence was extended into an immutable, Pythonic wrapper: supports construction from a native AASequence, int and slice indexing (including negatives, no slice.step), iteration yielding residue objects, concatenation/repetition, containment, hashing, ordering, counting, and a new to_string(modified, mod_format) API.

Changes

Cohort / File(s)	Summary
Py_AASequence core `openms_python/py_aasequence.py`	Added `from_native(cls, native_sequence: oms.AASequence)`. `__getitem__` now accepts ints and slices (returns residue objects or subsequence wrappers; supports negative indices; disallows non-None slice.step). Iteration yields residue objects. Added `__add__`, `__radd__`, `__mul__`, `__rmul__`, `__contains__`, `__hash__`, `__lt__`, `count`, and `to_string(modified=True, mod_format: Literal['default','unimod','bracket'])`. Updated imports and docstrings; raises on invalid mod_format.
Tests updated `tests/test_py_aasequence.py`	Updated tests to expect residue objects from iteration/indexing (`.sequence` access), removed an out-of-bounds negative-index test, added `to_string` format tests (default, unmodified, bracket, unimod, invalid format), slicing tests, and checks for wrapping a native pyOpenMS AASequence.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Focus review on:
- Iterator/indexing changes (residue objects vs. strings) and caller impact.
- Slice semantics: negative index handling and explicit rejection of slice.step.
- Operator overloads (__add__, __mul__, and their r-variants): type checks and NotImplemented paths.
- __hash__ and ordering (__lt__) consistency with equality semantics.
- to_string behavior across modified and mod_format options and error handling.
- from_native ownership/linkage to the native AASequence and immutability guarantees.

Poem

🐇 I nibble letters in moonlit code,

I hop through slices, neat and stowed,
I join and multiply with cheer,
Brackets and UniMod draw near,
A tiny rabbit, sequencing bold.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding convenience functions/operators to Py_AASequence class (add, mul, contains, hash, etc.).
Docstring Coverage	✅ Passed	Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch aa_seq_manual_edits_josh

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

openms_python/py_aasequence.py (2)
9-38: Clarify “immutable wrapper” wording vs exposed mutable native

The class docstring now advertises an immutable wrapper, but the native property still exposes the underlying oms.AASequence, which can be mutated externally. For callers, this is closer to “functional-style API on top of a mutable core”.

If you want to lean into immutability, consider either:

Returning a defensive copy from native, or

Documenting that native is an escape hatch that can break immutability guarantees.

245-326: New dunder methods look good; a few minor API polish opportunities

The concat/repeat/containment/comparison helpers are nicely Pythonic and keep Py_AASequence immutable by always returning new instances. A few small nits you might consider:
__mul__ / __rmul__:

You currently return NotImplemented for non-ints and for negative times. Given the docstring says “must be >= 0”, raising a ValueError for negative times would surface a clearer error than the generic TypeError produced when both __mul__ and __rmul__ return NotImplemented.
__contains__:
The example and signature are string-based (substring: str), which is great for "PEP" in seq. It might also be useful to accept Py_AASequence so subseq in seq works:
if isinstance(substring, Py_AASequence):
    substring = substring.sequence
count:

Docstring says “to be consistent with str.count()”, but AASequence.count is residue-oriented. If multi-character substrings are not intended, either:

Enforce len(residue) == 1 and document that only single one-letter codes are supported, or

Rename/document it as a residue counter, not a general substring counter.
These are all behavioral nits rather than blockers; the current implementation is otherwise coherent with the PR description.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1707b48 and af516c2.

📒 Files selected for processing (1)

openms_python/py_aasequence.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.14.6)

openms_python/py_aasequence.py

228-228: Avoid specifying long messages outside the exception class

(TRY003)

236-236: Avoid specifying long messages outside the exception class

(TRY003)

openms_python/py_aasequence.py

timosachsenberg · 2025-11-28T21:15:37Z

Nice. One thing to consider: AASequence can be modified. So many of these operations maybe ???? should fail if e.g. a N-term or C-term modification is contained. Reason is that some Modifications have a terminal specificity and concatenation could violate this

timosachsenberg · 2025-11-29T13:36:31Z

Should we just add a check and throw off this rare case for a concat occurs?

jcharkow · 2025-12-01T20:13:15Z

Yes good point, I think yes it should just fail if the resulting sequences is invalid. I'll look into fixing that

jcharkow · 2025-12-01T20:54:46Z

@timosachsenberg thinking more about our conversation earlier today and a reply to the comments above. It looks like the C++ implementation supports both + and - operators. I am not sure how if they handle N terminal and C terminal modifications. From as far as I can tell the tests do not examine modified sequences. If that is the case I think we can keep the concatenation as is?

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

openms_python/py_aasequence.py (1)
221-252: Preserve PTMs for single-index __getitem__ and handle reversed slices safely

The slice branch is now using getSubsequence, which fixes the earlier PTM-loss issue for slices, but two problems remain:
Single-index lookups still drop PTMs (major behavioral bug)
residue = self._sequence.getResidue(index)
residue_char = residue.getOneLetterCode()
return Py_AASequence.from_string(residue_char)
This reconstructs from the one-letter code only, so any residue-level or terminal modification on that position is lost. __iter__ depends on __getitem__, so iteration will also strip PTMs on modified residues.

Prefer using getSubsequence(index, 1) and wrapping the native subsequence to keep all modification info:
-        else:
-            # Handle negative indices
-            if index < 0:
-                index = len(self) + index
-            if index < 0 or index >= len(self):
-                raise IndexError(f"Index {index} out of range for sequence of length {len(self)}")
-            residue = self._sequence.getResidue(index)
-            residue_char = residue.getOneLetterCode()
-            return Py_AASequence.from_string(residue_char)
+        else:
+            # Normalize and bounds-check index.
+            if index < 0:
+                index = len(self) + index
+            if index < 0 or index >= len(self):
+                raise IndexError(f"Index {index} out of range for sequence of length {len(self)}")
+            native_subseq = self._sequence.getSubsequence(index, 1)
+            return Py_AASequence.from_native(native_subseq)
Reversed / empty slices can pass a negative length to getSubsequence
For something like seq[5:2], index.indices(len(self)) yields (5, 2, 1), so stop - start is negative. Passing this as a length to getSubsequence is at odds with normal Python slicing semantics (which should yield an empty sequence) and may trigger undefined or backend-specific behavior.

You can special-case this to return an empty sequence:
-        if isinstance(index, slice):
-            start, stop, step = index.indices(len(self))
-            if step != 1:
-                raise ValueError("Step slicing is not supported for amino acid sequences")
-            return Py_AASequence.from_native(self._sequence.getSubsequence(start, stop - start))
+        if isinstance(index, slice):
+            start, stop, step = index.indices(len(self))
+            if step != 1:
+                raise ValueError("Step slicing is not supported for amino acid sequences")
+            if stop <= start:
+                # Match Python's semantics for empty / reversed slices.
+                return Py_AASequence()
+            return Py_AASequence.from_native(
+                self._sequence.getSubsequence(start, stop - start)
+            )
Optionally, you may also want to normalize indices using the __index__ protocol so that numpy.int64 and similar integer-likes work, but that’s a nice-to-have.

Please double-check that AASequence.getSubsequence in your pyOpenMS version accepts (start, length) with length >= 0 and raises a sensible error for out-of-range inputs. To verify via docs, you can search:
OpenMS AASequence getSubsequence signature and semantics in the pyOpenMS bindings

🧹 Nitpick comments (4)

openms_python/py_aasequence.py (4)
60-72: Polish from_native docstring and parameter description

The implementation is fine, but the docstring is incomplete and has a typo (“opject”), and the native_sequence argument isn’t actually described.

You could tighten this up along these lines:
-    def from_native(cls, native_sequence: oms.AASequence) -> Py_AASequence:
-        """
-        Creates Py_AASequence from native pyOpenMS AASequence.
-        
-        Args:
-            native_sequence (oms.AASequence): 
-        
-        Returns:
-            Py_AASequence: New wrapped opject
-        
-        """
+    def from_native(cls, native_sequence: oms.AASequence) -> Py_AASequence:
+        """
+        Create a Py_AASequence from an existing pyOpenMS AASequence.
+
+        Args:
+            native_sequence: The underlying pyOpenMS AASequence instance to wrap.
+
+        Returns:
+            Py_AASequence: New wrapper around the given native sequence.
+        """
301-323: Consider using __index__ and explicit TypeError for __mul__/__rmul__

The repetition logic is straightforward and fine for core use, but there are a couple of minor API details:
def __mul__(self, times: int) -> Py_AASequence:
    if not isinstance(times, int) or times < 0:
        return NotImplemented
    return Py_AASequence.from_string(self.sequence * times)
Returning NotImplemented for negative times leads to a somewhat indirect TypeError, whereas Python’s built-ins usually allow times < 0 and return an empty sequence.

isinstance(times, int) excludes integer-likes (e.g., numpy.int64) that implement __index__.

If you want more Pythonic behavior, you could:
+    import operator
+
     def __mul__(self, times: int) -> Py_AASequence:
-        if not isinstance(times, int) or times < 0:
-            return NotImplemented
-        return Py_AASequence.from_string(self.sequence * times)
+        try:
+            n = operator.index(times)
+        except TypeError:
+            return NotImplemented
+        if n < 0:
+            # Match str/list semantics; or raise TypeError if you prefer.
+            return Py_AASequence.from_string("")
+        return Py_AASequence.from_string(self.sequence * n)
Same reasoning applies to __rmul__, which delegates to __mul__.

If you want to align exactly with CPython’s sequence semantics, please verify current behavior of "abc" * -1 and "abc" * numpy.int64(2) in your environment, and decide whether you want Py_AASequence to mirror that.

396-417: Unify string rendering and consider shortening error message for mod_format

to_string is a nice central place for string rendering:
def to_string(self, modified=True, mod_format: Optional[Literal['unimod', 'bracket']] = 'unimod') -> str:
    if not modified:
        return self.unmodified_sequence
    else:
        if mod_format == 'unimod':
            return self._sequence.toUniModString()
        elif mod_format == 'bracket':
            return self._sequence.toBracketString()
        else:
            raise ValueError(f"Unsupported mod_format: {mod_format}, supported are 'unimod' and 'bracket'")
Two minor suggestions:
Reuse here in __str__ / sequence for consistency
Right now sequence uses _sequence.toString(), while to_string(modified=True, 'unimod') uses toUniModString(). If you want a single “canonical” representation, consider having __str__ and/or the sequence property delegate to to_string to avoid divergence.
Shorten the ValueError message (TRY003 hint)
Ruff is flagging the long error message. You could keep it shorter and let type hints/docs carry the rest:
   else:
       if mod_format == 'unimod':
           return self._sequence.toUniModString()
       elif mod_format == 'bracket':
           return self._sequence.toBracketString()
       else:
           raise ValueError(f"Unsupported mod_format: {mod_format}, supported are 'unimod' and 'bracket'")
   if mod_format == 'unimod':
       return self._sequence.toUniModString()
   if mod_format == 'bracket':
       return self._sequence.toBracketString()
   raise ValueError(f"Unsupported mod_format: {mod_format}")
Both are nice-to-have cleanups rather than blockers.

258-299: Add tests for concatenation with terminal modifications

Currently, the concatenation operators concatenate sequences as plain strings:
combined_str = self.sequence + other.sequence
return Py_AASequence.from_string(combined_str)
While this works for simple sequences, OpenMS uses a specific string format for terminal modifications (e.g., .(Dimethyl)PEPTIDE. or .PEPTIDE.(Label:18O(2))). Concatenating sequences with terminal modifications may produce malformed strings since the terminal notation (dots) cannot appear in the middle of a sequence string.

Add tests covering:

seq_with_Nterm_mod + "PEP"

"PEP" + seq_with_Cterm_mod

Concatenation of two sequences both carrying terminal mods

Verify and document whether:

The resulting sequence has the expected terminal/internal modifications, or

fromString() appropriately raises an error for invalid sequences

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af516c2 and 1d0d031.

📒 Files selected for processing (1)

openms_python/py_aasequence.py (5 hunks)

🧰 Additional context used

🪛 Ruff (0.14.6)

openms_python/py_aasequence.py

242-242: Avoid specifying long messages outside the exception class

(TRY003)

249-249: Avoid specifying long messages outside the exception class

(TRY003)

417-417: Avoid specifying long messages outside the exception class

(TRY003)

coderabbitai · 2025-12-01T21:01:56Z

openms_python/py_aasequence.py

+    def __contains__(self, substring: str) -> bool:
+        """Check if substring is in sequence."""
+        return self.has_substring(substring)
+
+    def __hash__(self) -> int:
+        """Make sequences hashable for use in sets/dicts."""
+        return hash(self.sequence)
+
+    def __lt__(self, other: Py_AASequence) -> bool:
+        """Lexicographic comparison by sequence."""
+        if not isinstance(other, Py_AASequence):
+            return NotImplemented
+        return self.sequence < other.sequence
+    def count(self, residue: str) -> int:
+        """Count occurrences of a residue, to be consistent with str.count()."""
+        return self._sequence.count(residue)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "py_aasequence.py" -type f

Repository: OpenMS/openms-python

Length of output: 96

🏁 Script executed:

rg -A 10 "def has_substring" --type py

Repository: OpenMS/openms-python

Length of output: 751

🏁 Script executed:

rg "def _sequence|self\._sequence\s*=" openms_python/py_aasequence.py -B 2 -A 2

Repository: OpenMS/openms-python

Length of output: 297

🏁 Script executed:

rg "def count" openms_python/py_aasequence.py -B 2 -A 5

Repository: OpenMS/openms-python

Length of output: 386

🏁 Script executed:

find . -path "*/test*" -name "*.py" -type f | head -10 | xargs rg "__contains__|\.count\(" 2>/dev/null | head -20

Repository: OpenMS/openms-python

Length of output: 46

🌐 Web query:

OpenMS AASequence::count String parameter documentation

💡 Result:

Do you mean AASequence::count (which I can't find in the AASequence API) or the String::count / substring-count behavior used with AASequence (e.g., counting residues/substrings)? Please clarify which exact method/signature you need (or paste the header/signature).

🏁 Script executed:

find . -type f -name "*test*.py" -o -name "test_*.py" | xargs grep -l "AASequence\|count\|__contains__" 2>/dev/null | head -5

Repository: OpenMS/openms-python

Length of output: 228

🏁 Script executed:

grep -r "def count" . --include="*.py" | grep -v ".pyc" | head -20

Repository: OpenMS/openms-python

Length of output: 138

🏁 Script executed:

rg "oms\.AASequence" openms_python/py_aasequence.py -B 1 -A 1 | head -30

Repository: OpenMS/openms-python

Length of output: 848

🏁 Script executed:

cat openms_python/py_aasequence.py | head -50

Repository: OpenMS/openms-python

Length of output: 1684

🏁 Script executed:

cat ./tests/test_py_aasequence.py

Repository: OpenMS/openms-python

Length of output: 7563

🏁 Script executed:

rg "test.*count|test.*contains|__contains__|\.count\(" ./tests/test_py_aasequence.py -B 2 -A 5

Repository: OpenMS/openms-python

Length of output: 46

🏁 Script executed:

rg "def __contains__" openms_python/py_aasequence.py -B 2 -A 5

Repository: OpenMS/openms-python

Length of output: 384

🏁 Script executed:

python3 << 'EOF' import sys try: import pyopenms as oms seq = oms.AASequence.fromString("PEPTIDE") # Check if count method exists if hasattr(seq, 'count'): print("count method exists on oms.AASequence") print(f"count('P') = {seq.count('P')}") print(f"count('E') = {seq.count('E')}") print(f"count('PE') = {seq.count('PE')}") else: print("count method does NOT exist on oms.AASequence") print(f"Available methods: {[m for m in dir(seq) if not m.startswith('_')]}") except Exception as e: print(f"Error: {e}") print(f"Exception type: {type(e).__name__}") EOF

Repository: OpenMS/openms-python

Length of output: 150

🏁 Script executed:

python3 << 'EOF' import sys try: from openms_python.py_aasequence import Py_AASequence seq = Py_AASequence.from_string("PEPTIDE") # Test __contains__ with non-str try: result = 123 in seq print(f"Non-str in sequence returned: {result}") except Exception as e: print(f"Non-str in sequence raised: {type(e).__name__}: {e}") # Test __contains__ with str try: result = "PEP" in seq print(f"Str 'PEP' in sequence returned: {result}") except Exception as e: print(f"Str in sequence raised: {type(e).__name__}: {e}") except Exception as e: print(f"Setup error: {e}") EOF

Repository: OpenMS/openms-python

Length of output: 150

Tighten type expectations for __contains__ and document count() behavior

Two small points here:

Type handling in __contains__

def __contains__(self, substring: str) -> bool: return self.has_substring(substring)

At runtime, __contains__ will be called with whatever the caller uses. Non-str inputs will flow through to AASequence.fromString inside has_substring, raising an unclear error.

A light guard improves robustness:

def __contains__(self, substring: str) -> bool:

"""Check if substring is in sequence."""

return self.has_substring(substring)

"""Check if a string substring is in sequence."""

if not isinstance(substring, str):

return False

return self.has_substring(substring)

2. **Undocumented behavior of `count()`** ```python def count(self, residue: str) -> int: return self._sequence.count(residue)

No tests exist for this method. The docstring claims it mirrors str.count(), but it's unclear whether this:

counts single residues (characters),

counts subsequences,

or handles modifications differently.

Add a docstring clarifying the exact behavior and include tests with simple sequences to demonstrate expected results (e.g., count("P") and count("PE") on "PEPTIDE").

🤖 Prompt for AI Agents

In openms_python/py_aasequence.py around lines 324 to 339, __contains__ currently assumes the argument is a str and forwards it to has_substring causing unclear errors for non-str inputs; add a light guard that checks isinstance(substring, str) and return False (or raise TypeError if you prefer strictness) when it's not a str, then call has_substring for valid strings. Also update the count() docstring to explicitly state it mirrors str.count (i.e., it counts occurrences of the given substring which may be a single residue or a multi-character subsequence) and add unit tests that assert expected results (e.g., "PEPTIDE".count("P") == 2 and .count("PE") == 1) to prevent regressions.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

openms_python/py_aasequence.py (2)
324-339: Type handling in __contains__ and documenting count() behavior

Two points here, both already raised in a previous review and still applicable:

__contains__ currently forwards any object to has_substring, which will eventually hit AASequence.fromString with a non-str and raise a less-than-clear error. A light type guard would make behavior more predictable:
     def __contains__(self, substring: str) -> bool:
-        """Check if substring is in sequence."""
-        return self.has_substring(substring)
+        """Check if a string substring is in sequence."""
+        if not isinstance(substring, str):
+            return False
+        return self.has_substring(substring)
count() simply forwards to _sequence.count(residue) with a brief docstring. It’s not obvious whether this counts single residues, arbitrary substrings, or how modifications are treated. Clarifying the docstring (e.g., “Counts occurrences of the given residue or subsequence, mirroring str.count()”) and adding a couple of small tests would make the behavior much clearer.

221-252: Integer indexing still loses PTMs and mishandles very negative indices

Slices are now correctly implemented via getSubsequence, but the single-index path still:

Builds the result from residue.getOneLetterCode() ⇒ any PTMs on that residue are silently dropped.

Fails to guard indices < -len(self): for index = -len(self) - 1, the adjusted index stays negative and is passed to getResidue, which is undefined/likely erroneous.

This is the same core concern as in the earlier review; slices were fixed, but integer indexing remains inconsistent with the rest of the API and with PTM-preserving behavior.

A more robust implementation would reuse the native subsequence API for both slices and single indices, and perform full bounds checking:
     def __getitem__(self, index):
         """
         Get residue(s) at position(s).

         Supports both single indexing and slicing, returning Py_AASequence objects.
         """
-        if isinstance(index, slice):
-            start, stop, step = index.indices(len(self))
-            if step != 1:
-                raise ValueError("Step slicing is not supported for amino acid sequences")
-            return Py_AASequence.from_native(self._sequence.getSubsequence(start, stop - start))
-        else:
-            # Handle negative indices
-            if index < 0:
-                index = len(self) + index
-            if index >= len(self):
-                raise IndexError(f"Index {index} out of range for sequence of length {len(self)}")
-            residue = self._sequence.getResidue(index)
-            residue_char = residue.getOneLetterCode()
-            return Py_AASequence.from_string(residue_char)
+        if isinstance(index, slice):
+            start, stop, step = index.indices(len(self))
+            if step != 1:
+                raise ValueError("Step slicing is not supported for amino acid sequences")
+            return Py_AASequence.from_native(
+                self._sequence.getSubsequence(start, stop - start)
+            )
+
+        # Normalize integer index (including negative) and check bounds
+        if not isinstance(index, int):
+            raise TypeError(f"Indices must be integers or slices, not {type(index).__name__}")
+        if index < 0:
+            index += len(self)
+        if index < 0 or index >= len(self):
+            raise IndexError(f"Index {index} out of range for sequence of length {len(self)}")
+
+        # Use native subsequence to preserve PTMs for single residues
+        return Py_AASequence.from_native(
+            self._sequence.getSubsequence(index, 1)
+        )
This keeps PTMs intact for integer indexing, aligns with how slicing is now done, and ensures negative indices behave like standard Python sequences.

🧹 Nitpick comments (4)

tests/test_py_aasequence.py (2)

125-135: Indexing tests align with wrapper semantics, but no coverage for modified residues

The updated assertions on seq[i].sequence correctly enforce that element access returns a wrapped residue. Once the integer-index PTM bug in __getitem__ is fixed, consider adding a case like aa_seq[-2].sequence == "M(Oxidation)" on a modified sequence to guard against regressions.

285-291: Slicing tests validate modification preservation; consider single-index coverage later

The new slicing cases, especially aa_seq[-2:].sequence == 'M(Oxidation)R', correctly verify that subsequence operations preserve PTMs. After fixing integer indexing in __getitem__, it would be worth adding a complementary test for aa_seq[-2] on the same modified sequence.

openms_python/py_aasequence.py (2)

60-72: Minor docstring polish for from_native

from_native is a useful public helper; the docstring could be tightened (e.g., describe native_sequence instead of leaving it blank and fix “opject” → “object”), but behavior itself looks fine.

258-323: String-based concatenation/repetition and terminal modifications

__add__, __radd__, __mul__, and __rmul__ build new sequences by string concatenation/repetition followed by from_string(). That’s simple and works for unmodified sequences, but for N-/C-terminal–specific modifications this may:

Rely entirely on AASequence.fromString to detect invalid combinations, or

Silently reinterpret or misplace terminal-specific modifications after concatenation/repetition.

Given the earlier discussion about terminal modifications and invalid concatenations, it would be good to make the intended behavior explicit:

Decide whether these operators should:

reject combinations that violate terminal-mod rules (preferred, via checks after construction), or

fully delegate to the native parsing semantics and document that behavior.

Add a couple of tests covering concatenation/repetition of modified sequences to pin down and guard the chosen semantics.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d0d031 and 70eb83d.

📒 Files selected for processing (2)

openms_python/py_aasequence.py (5 hunks)
tests/test_py_aasequence.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.14.6)

openms_python/py_aasequence.py

242-242: Avoid specifying long messages outside the exception class

(TRY003)

249-249: Avoid specifying long messages outside the exception class

(TRY003)

427-427: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)

tests/test_py_aasequence.py (2)

116-123: Iteration behavior test matches new API

Using res.sequence in test_py_aasequence_iteration correctly reflects that __iter__ now yields sequence-wrapped residues rather than raw strings; this anchors the new contract well.

260-283: to_string tests provide solid coverage across formats

These checks exercise the default, unmodified, bracket, UniMod, and invalid-format paths of to_string, which is exactly what’s needed to pin down the new API behavior.

openms_python/py_aasequence.py

timosachsenberg · 2025-12-01T21:58:41Z

@timosachsenberg thinking more about our conversation earlier today and a reply to the comments above. It looks like the C++ implementation supports both + and - operators. I am not sure how if they handle N terminal and C terminal modifications. From as far as I can tell the tests do not examine modified sequences. If that is the case I think we can keep the concatenation as is?

I think it is also wrong in the c++ implementation in that case. But we could handle the error there (c++) and keep this simpler.

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

openms_python/py_aasequence.py (1)
324-326: Add type guard in __contains__ for non‑string inputs.

As noted in a previous review, __contains__ will be called with whatever the user provides. Non‑str inputs flow through to AASequence.fromString inside has_substring, raising an unclear error.

Apply this diff to improve robustness:
     def __contains__(self, substring: str) -> bool:
-        """Check if substring is in sequence."""
+        """Check if a string substring is in sequence."""
+        if not isinstance(substring, str):
+            return False
         return self.has_substring(substring)
Based on past review feedback.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70eb83d and d9d6c47.

📒 Files selected for processing (1)

openms_python/py_aasequence.py (5 hunks)

🧰 Additional context used

🪛 Ruff (0.14.7)

openms_python/py_aasequence.py

242-242: Avoid specifying long messages outside the exception class

(TRY003)

249-249: Avoid specifying long messages outside the exception class

(TRY003)

427-427: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)

openms_python/py_aasequence.py (2)
396-427: LGTM! Optional: centralize supported‑formats list.

The previous review concerns (mismatched default, incomplete error message, unnecessary Optional) have been addressed. The signature, docstring, and error message now correctly reflect 'default' as the default and list all three supported formats.

To address the Ruff TRY003 hint and improve maintainability, consider centralizing the supported formats:
     def to_string(self, modified=True, mod_format: Literal['default', 'unimod', 'bracket'] = 'default') -> str:
         """
         Get string representation of the sequence.
 
         Args:
             modified (bool): Whether to include modifications in the string.
             mod_format (Optional[Literal['default', 'unimod', 'bracket']]): Format for modifications.
                 'default' for OpenMS format,
                 'unimod' for UniMod format,
                 'bracket' for bracket notation.
                  Default is 'default'.                
 
         Returns:
             str: Amino acid sequence as string.
 
         Example:
             >>> seq = Py_AASequence.from_string("PEPTIDE")
             >>> seq_str = seq.to_string()
         """
         if not modified:
             return self.unmodified_sequence
         
-        else:
-            if mod_format == 'default':
-                return self._sequence.toString()
-            elif mod_format == 'unimod':
-                return self._sequence.toUniModString()
-            elif mod_format == 'bracket':
-                return self._sequence.toBracketString()
-            else:
-                raise ValueError(f"Unsupported mod_format: {mod_format}, supported are 'default', 'unimod' and 'bracket'")
+        if mod_format == 'default':
+            return self._sequence.toString()
+        elif mod_format == 'unimod':
+            return self._sequence.toUniModString()
+        elif mod_format == 'bracket':
+            return self._sequence.toBracketString()
+        
+        supported = ('default', 'unimod', 'bracket')
+        raise ValueError(
+            f"Unsupported mod_format: {mod_format!r}; supported are {supported}"
+        )
258-299: Add test coverage or document limitations for concatenation with terminal-modified sequences.

The __add__ and __radd__ implementations concatenate sequences via string representation and re-parse:
combined_str = self.sequence + other.sequence
return Py_AASequence.from_string(combined_str)
Terminal modifications are supported (as shown by has_n_terminal_modification and has_c_terminal_modification properties), but concatenating sequences with terminal modifications may produce invalid results. For example, concatenating a sequence with a C-terminal modification to one with an N-terminal modification could create malformed strings during re-parsing.

Either add test cases for modified sequence concatenation to verify behavior, or document this as a known limitation if not supported by the underlying C++ implementation.

coderabbitai · 2025-12-01T22:12:05Z

openms_python/py_aasequence.py

+    @classmethod
+    def from_native(cls, native_sequence: oms.AASequence) -> Py_AASequence:
+        """
+        Creates Py_AASequence from native pyOpenMS AASequence.
+
+        Args:
+            native_sequence (oms.AASequence): 
+
+        Returns:
+            Py_AASequence: New wrapped opject
+
+        """
+        return cls(native_sequence)


⚠️ Potential issue | 🟡 Minor

Fix typos and complete docstring in from_native.

The docstring has two issues:

Line 66: Args description is incomplete—no text follows native_sequence (oms.AASequence):.

Line 69: Typo "opject" should be "object".

Apply this diff:

@classmethod def from_native(cls, native_sequence: oms.AASequence) -> Py_AASequence: """ Creates Py_AASequence from native pyOpenMS AASequence. Args: - native_sequence (oms.AASequence): + native_sequence (oms.AASequence): Native pyOpenMS AASequence object to wrap. Returns: - Py_AASequence: New wrapped opject + Py_AASequence: New wrapped object. """ return cls(native_sequence)

🤖 Prompt for AI Agents

In openms_python/py_aasequence.py around lines 60 to 72, the from_native docstring is incomplete and has a typo: fill in the Args description for native_sequence to explain that it is a pyOpenMS AASequence to be wrapped (e.g., "native_sequence (oms.AASequence): the native pyOpenMS amino acid sequence to wrap"), correct "opject" to "object" in the Returns section, and optionally tighten wording to "New wrapped object" or "Wrapped Py_AASequence instance" so the docstring reads clearly and completely.

openms_python/py_aasequence.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

openms_python/py_aasequence.py (2)
61-73: Fix typos and complete docstring in from_native.

The docstring has two issues flagged in previous review:

Line 67: Args description is incomplete—no text follows native_sequence (oms.AASequence):.

Line 70: Typo "opject" should be "object".

Apply this diff:
-            native_sequence (oms.AASequence): 
+            native_sequence (oms.AASequence): Native pyOpenMS AASequence object to wrap.
 
         Returns:
-            Py_AASequence: New wrapped opject
+            Py_AASequence: New wrapped object.
324-326: Add type guard to __contains__ for non-string inputs.

Currently, passing a non‑str to __contains__ flows through to has_substring, which calls AASequence.fromString(substring) and raises an unclear error. Per previous review, add a guard:
 def __contains__(self, substring: str) -> bool:
-    """Check if substring is in sequence."""
+    """Check if substring is in sequence."""
+    if not isinstance(substring, str):
+        return False
     return self.has_substring(substring)

🧹 Nitpick comments (5)

tests/test_py_aasequence.py (2)
260-283: Tests pass but miss key edge cases for to_string.

The tests verify happy‑path behavior for each mod_format, but don't cover:

Unmodified sequences (where modified=True vs False should yield the same result)

Multiple modifications in one sequence

Edge‑case formats (e.g., what if the C++ method returns unexpected output?)

Consider adding:
def test_py_aasequence_to_string_unmodified():
    """Verify to_string on unmodified sequence."""
    seq = Py_AASequence.from_string("PEPTIDE")
    assert seq.to_string(modified=True) == "PEPTIDE"
    assert seq.to_string(modified=False) == "PEPTIDE"
285-297: Add test coverage for count() with modifications.

The current test only exercises unmodified sequences. Given that the implementation warns about not accounting for modifications (line 339 in py_aasequence.py), add a test demonstrating the behavior with modified sequences:
def test_count_with_modifications():
    """Verify count behavior on modified sequences."""
    seq = Py_AASequence.from_string("PEPTIDEM(Oxidation)")
    # Should count unmodified M
    assert seq.count('M') == 1
    # Should not count modified notation
    assert seq.count('M(Oxidation)') == 0
openms_python/py_aasequence.py (3)
258-299: **Verify concatenation behavior with terminal modifications.**Per pyOpenMS documentation, AASequence supports concatenation with the + operator, and N- and C-terminal modifications are represented by brackets to the right of the dots terminating the sequence. As raised in the PR discussion (PR objectives), concatenating sequences with terminal modifications may produce invalid results because the C‑terminus of the first sequence and the N‑terminus of the second are no longer terminal after joining.

The current implementation (lines 279, 281) simply concatenates string representations and calls from_string, which may parse successfully but create a biochemically invalid sequence. For example, concatenating a sequence with a C‑terminal modification and one with an N‑terminal modification could result in an intermediate residue incorrectly carrying terminal‑specific modifications.

Per the PR discussion, the decision is to defer validation to the C++ side. However, document this limitation:
 def __add__(self, other: Py_AASequence | str) -> Py_AASequence:
     """
     Concatenate sequences.
+    
+    Note: Concatenation does not validate terminal modifications. If either
+    sequence has N- or C-terminal modifications, the result may be invalid.

     Args:
Based on PR objectives and reviewer discussion.

337-340: count() correctly uses unmodified sequence but has static analysis hint.

The implementation correctly uses self.unmodified_sequence.count(residue) and warns users about modification handling, which is appropriate. The static analysis hint (Ruff B028) suggests adding stacklevel=2 to the warning:
-    warnings.warn("count method does not account for modifications")
+    warnings.warn("count method does not account for modifications", stacklevel=2)
This makes the warning point to the caller's location rather than this method.

398-428: Remove Optional from type hint if None is unsupported.

The signature allows mod_format to be None via the Optional type hint (implied by the import at line 5), but None is not handled and would fall through to the else block, raising ValueError. Either:

Remove Optional if None is not a valid input, or

Handle None explicitly (e.g., treat as alias for 'default').

Additionally, the Ruff TRY003 hint suggests extracting the list of supported formats to reduce the error message length:
+SUPPORTED_MOD_FORMATS = ('default', 'unimod', 'bracket')
+
 def to_string(
     self,
     modified: bool = True,
-    mod_format: Optional[Literal['default', 'unimod', 'bracket']] = 'default',
+    mod_format: Literal['default', 'unimod', 'bracket'] = 'default',
 ) -> str:
     ...
     else:
-        raise ValueError(f"Unsupported mod_format: {mod_format}, supported are 'default', 'unimod' and 'bracket'")
+        raise ValueError(
+            f"Unsupported mod_format: {mod_format!r}; supported: {SUPPORTED_MOD_FORMATS}"
+        )
Similar to past review feedback.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9d6c47 and f2fa5e8.

📒 Files selected for processing (2)

openms_python/py_aasequence.py (5 hunks)
tests/test_py_aasequence.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.14.7)

openms_python/py_aasequence.py

243-243: Avoid specifying long messages outside the exception class

(TRY003)

250-250: Avoid specifying long messages outside the exception class

(TRY003)

339-339: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

428-428: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)

openms_python/py_aasequence.py (1)

222-252: LGTM: PTM preservation now correct.

The reworked __getitem__ correctly uses getSubsequence(start, length) for both slices (line 244) and single indices (line 251), preserving post‑translational modifications. This resolves the critical issue flagged in previous reviews where getResidue().getOneLetterCode() discarded PTMs.

The negative‑index handling (line 247) and bounds check (line 249) are correct.

Minor: The Ruff TRY003 hints (lines 243, 250) suggest extracting long error messages into exception classes, but this is a nitpick; inline messages are acceptable for this use case.

jcharkow added 2 commits November 28, 2025 15:26

feature: add "+" and slicing for AASequence

ebb5cdf

feature: add more pythonic methods

af516c2

jcharkow requested a review from timosachsenberg November 28, 2025 20:53

coderabbitai bot reviewed Nov 28, 2025

View reviewed changes

openms_python/py_aasequence.py Show resolved Hide resolved

jcharkow added 2 commits December 1, 2025 15:32

fix: slicing for modified peptide sequences

4167d76

feature: add string export method

1d0d031

coderabbitai bot reviewed Dec 1, 2025

View reviewed changes

test: fix tests, allow for negative indices in slicing

70eb83d

coderabbitai bot reviewed Dec 1, 2025

View reviewed changes

openms_python/py_aasequence.py Outdated Show resolved Hide resolved

fix: address ai review comments

d9d6c47

coderabbitai bot reviewed Dec 1, 2025

View reviewed changes

jcharkow added 2 commits December 1, 2025 17:30

fix: critical slicing bug

266a926

address ai comments

f2fa5e8

coderabbitai bot reviewed Dec 1, 2025

View reviewed changes

timosachsenberg merged commit 204e8b6 into master Dec 2, 2025
2 checks passed

coderabbitai bot mentioned this pull request Dec 16, 2025

Feature/py residue #26

Open

Add more Py_AASequence Convenience functions #25

Add more Py_AASequence Convenience functions #25

Uh oh!

Conversation

jcharkow commented Nov 28, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

timosachsenberg commented Nov 28, 2025

Uh oh!

timosachsenberg commented Nov 29, 2025

Uh oh!

jcharkow commented Dec 1, 2025

Uh oh!

jcharkow commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

timosachsenberg commented Dec 1, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jcharkow commented Nov 28, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 28, 2025 •

edited

Loading

jcharkow commented Dec 1, 2025 •

edited

Loading