-
Notifications
You must be signed in to change notification settings - Fork 1
Add more Py_AASequence Convenience functions #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughPy_AASequence was extended into an immutable, Pythonic wrapper: supports construction from a native AASequence, int and slice indexing (including negatives, no slice.step), iteration yielding residue objects, concatenation/repetition, containment, hashing, ordering, counting, and a new to_string(modified, mod_format) API. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
openms_python/py_aasequence.py (2)
9-38: Clarify “immutable wrapper” wording vs exposed mutablenativeThe class docstring now advertises an immutable wrapper, but the
nativeproperty still exposes the underlyingoms.AASequence, which can be mutated externally. For callers, this is closer to “functional-style API on top of a mutable core”.If you want to lean into immutability, consider either:
- Returning a defensive copy from
native, or- Documenting that
nativeis an escape hatch that can break immutability guarantees.
245-326: New dunder methods look good; a few minor API polish opportunitiesThe concat/repeat/containment/comparison helpers are nicely Pythonic and keep
Py_AASequenceimmutable by always returning new instances. A few small nits you might consider:
__mul__/__rmul__:
- You currently return
NotImplementedfor non-ints and for negativetimes. Given the docstring says “must be >= 0”, raising aValueErrorfor negativetimeswould surface a clearer error than the genericTypeErrorproduced when both__mul__and__rmul__returnNotImplemented.__contains__:
- The example and signature are string-based (
substring: str), which is great for"PEP" in seq. It might also be useful to acceptPy_AASequencesosubseq in seqworks:if isinstance(substring, Py_AASequence): substring = substring.sequencecount:
- Docstring says “to be consistent with
str.count()”, butAASequence.countis residue-oriented. If multi-character substrings are not intended, either:
- Enforce
len(residue) == 1and document that only single one-letter codes are supported, or- Rename/document it as a residue counter, not a general substring counter.
These are all behavioral nits rather than blockers; the current implementation is otherwise coherent with the PR description.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
openms_python/py_aasequence.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.6)
openms_python/py_aasequence.py
228-228: Avoid specifying long messages outside the exception class
(TRY003)
236-236: Avoid specifying long messages outside the exception class
(TRY003)
|
Nice. One thing to consider: AASequence can be modified. So many of these operations maybe ???? should fail if e.g. a N-term or C-term modification is contained. Reason is that some Modifications have a terminal specificity and concatenation could violate this |
|
Should we just add a check and throw off this rare case for a concat occurs? |
|
Yes good point, I think yes it should just fail if the resulting sequences is invalid. I'll look into fixing that |
|
@timosachsenberg thinking more about our conversation earlier today and a reply to the comments above. It looks like the C++ implementation supports both + and - operators. I am not sure how if they handle N terminal and C terminal modifications. From as far as I can tell the tests do not examine modified sequences. If that is the case I think we can keep the concatenation as is? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
openms_python/py_aasequence.py (1)
221-252: Preserve PTMs for single-index__getitem__and handle reversed slices safelyThe slice branch is now using
getSubsequence, which fixes the earlier PTM-loss issue for slices, but two problems remain:
Single-index lookups still drop PTMs (major behavioral bug)
residue = self._sequence.getResidue(index) residue_char = residue.getOneLetterCode() return Py_AASequence.from_string(residue_char)This reconstructs from the one-letter code only, so any residue-level or terminal modification on that position is lost.
__iter__depends on__getitem__, so iteration will also strip PTMs on modified residues.Prefer using
getSubsequence(index, 1)and wrapping the native subsequence to keep all modification info:- else: - # Handle negative indices - if index < 0: - index = len(self) + index - if index < 0 or index >= len(self): - raise IndexError(f"Index {index} out of range for sequence of length {len(self)}") - residue = self._sequence.getResidue(index) - residue_char = residue.getOneLetterCode() - return Py_AASequence.from_string(residue_char) + else: + # Normalize and bounds-check index. + if index < 0: + index = len(self) + index + if index < 0 or index >= len(self): + raise IndexError(f"Index {index} out of range for sequence of length {len(self)}") + native_subseq = self._sequence.getSubsequence(index, 1) + return Py_AASequence.from_native(native_subseq)Reversed / empty slices can pass a negative length to
getSubsequence
For something likeseq[5:2],index.indices(len(self))yields(5, 2, 1), sostop - startis negative. Passing this as a length togetSubsequenceis at odds with normal Python slicing semantics (which should yield an empty sequence) and may trigger undefined or backend-specific behavior.You can special-case this to return an empty sequence:
- if isinstance(index, slice): - start, stop, step = index.indices(len(self)) - if step != 1: - raise ValueError("Step slicing is not supported for amino acid sequences") - return Py_AASequence.from_native(self._sequence.getSubsequence(start, stop - start)) + if isinstance(index, slice): + start, stop, step = index.indices(len(self)) + if step != 1: + raise ValueError("Step slicing is not supported for amino acid sequences") + if stop <= start: + # Match Python's semantics for empty / reversed slices. + return Py_AASequence() + return Py_AASequence.from_native( + self._sequence.getSubsequence(start, stop - start) + )Optionally, you may also want to normalize indices using the
__index__protocol so thatnumpy.int64and similar integer-likes work, but that’s a nice-to-have.Please double-check that
AASequence.getSubsequencein your pyOpenMS version accepts(start, length)withlength >= 0and raises a sensible error for out-of-range inputs. To verify via docs, you can search:OpenMS AASequence getSubsequence signature and semantics in the pyOpenMS bindings
🧹 Nitpick comments (4)
openms_python/py_aasequence.py (4)
60-72: Polishfrom_nativedocstring and parameter descriptionThe implementation is fine, but the docstring is incomplete and has a typo (“opject”), and the
native_sequenceargument isn’t actually described.You could tighten this up along these lines:
- def from_native(cls, native_sequence: oms.AASequence) -> Py_AASequence: - """ - Creates Py_AASequence from native pyOpenMS AASequence. - - Args: - native_sequence (oms.AASequence): - - Returns: - Py_AASequence: New wrapped opject - - """ + def from_native(cls, native_sequence: oms.AASequence) -> Py_AASequence: + """ + Create a Py_AASequence from an existing pyOpenMS AASequence. + + Args: + native_sequence: The underlying pyOpenMS AASequence instance to wrap. + + Returns: + Py_AASequence: New wrapper around the given native sequence. + """
301-323: Consider using__index__and explicit TypeError for__mul__/__rmul__The repetition logic is straightforward and fine for core use, but there are a couple of minor API details:
def __mul__(self, times: int) -> Py_AASequence: if not isinstance(times, int) or times < 0: return NotImplemented return Py_AASequence.from_string(self.sequence * times)
- Returning
NotImplementedfor negativetimesleads to a somewhat indirectTypeError, whereas Python’s built-ins usually allowtimes < 0and return an empty sequence.isinstance(times, int)excludes integer-likes (e.g.,numpy.int64) that implement__index__.If you want more Pythonic behavior, you could:
+ import operator + def __mul__(self, times: int) -> Py_AASequence: - if not isinstance(times, int) or times < 0: - return NotImplemented - return Py_AASequence.from_string(self.sequence * times) + try: + n = operator.index(times) + except TypeError: + return NotImplemented + if n < 0: + # Match str/list semantics; or raise TypeError if you prefer. + return Py_AASequence.from_string("") + return Py_AASequence.from_string(self.sequence * n)Same reasoning applies to
__rmul__, which delegates to__mul__.If you want to align exactly with CPython’s sequence semantics, please verify current behavior of
"abc" * -1and"abc" * numpy.int64(2)in your environment, and decide whether you want Py_AASequence to mirror that.
396-417: Unify string rendering and consider shortening error message formod_format
to_stringis a nice central place for string rendering:def to_string(self, modified=True, mod_format: Optional[Literal['unimod', 'bracket']] = 'unimod') -> str: if not modified: return self.unmodified_sequence else: if mod_format == 'unimod': return self._sequence.toUniModString() elif mod_format == 'bracket': return self._sequence.toBracketString() else: raise ValueError(f"Unsupported mod_format: {mod_format}, supported are 'unimod' and 'bracket'")Two minor suggestions:
Reuse here in
__str__/sequencefor consistency
Right nowsequenceuses_sequence.toString(), whileto_string(modified=True, 'unimod')usestoUniModString(). If you want a single “canonical” representation, consider having__str__and/or thesequenceproperty delegate toto_stringto avoid divergence.Shorten the
ValueErrormessage (TRY003 hint)
Ruff is flagging the long error message. You could keep it shorter and let type hints/docs carry the rest:
else:if mod_format == 'unimod':return self._sequence.toUniModString()elif mod_format == 'bracket':return self._sequence.toBracketString()else:raise ValueError(f"Unsupported mod_format: {mod_format}, supported are 'unimod' and 'bracket'")
if mod_format == 'unimod':return self._sequence.toUniModString()if mod_format == 'bracket':return self._sequence.toBracketString()raise ValueError(f"Unsupported mod_format: {mod_format}")Both are nice-to-have cleanups rather than blockers.
258-299: Add tests for concatenation with terminal modificationsCurrently, the concatenation operators concatenate sequences as plain strings:
combined_str = self.sequence + other.sequence return Py_AASequence.from_string(combined_str)While this works for simple sequences, OpenMS uses a specific string format for terminal modifications (e.g.,
.(Dimethyl)PEPTIDE.or.PEPTIDE.(Label:18O(2))). Concatenating sequences with terminal modifications may produce malformed strings since the terminal notation (dots) cannot appear in the middle of a sequence string.Add tests covering:
seq_with_Nterm_mod + "PEP""PEP" + seq_with_Cterm_mod- Concatenation of two sequences both carrying terminal mods
Verify and document whether:
- The resulting sequence has the expected terminal/internal modifications, or
fromString()appropriately raises an error for invalid sequences
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
openms_python/py_aasequence.py(5 hunks)
🧰 Additional context used
🪛 Ruff (0.14.6)
openms_python/py_aasequence.py
242-242: Avoid specifying long messages outside the exception class
(TRY003)
249-249: Avoid specifying long messages outside the exception class
(TRY003)
417-417: Avoid specifying long messages outside the exception class
(TRY003)
openms_python/py_aasequence.py
Outdated
| def __contains__(self, substring: str) -> bool: | ||
| """Check if substring is in sequence.""" | ||
| return self.has_substring(substring) | ||
|
|
||
| def __hash__(self) -> int: | ||
| """Make sequences hashable for use in sets/dicts.""" | ||
| return hash(self.sequence) | ||
|
|
||
| def __lt__(self, other: Py_AASequence) -> bool: | ||
| """Lexicographic comparison by sequence.""" | ||
| if not isinstance(other, Py_AASequence): | ||
| return NotImplemented | ||
| return self.sequence < other.sequence | ||
| def count(self, residue: str) -> int: | ||
| """Count occurrences of a residue, to be consistent with str.count().""" | ||
| return self._sequence.count(residue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
find . -name "py_aasequence.py" -type fRepository: OpenMS/openms-python
Length of output: 96
🏁 Script executed:
rg -A 10 "def has_substring" --type pyRepository: OpenMS/openms-python
Length of output: 751
🏁 Script executed:
rg "def _sequence|self\._sequence\s*=" openms_python/py_aasequence.py -B 2 -A 2Repository: OpenMS/openms-python
Length of output: 297
🏁 Script executed:
rg "def count" openms_python/py_aasequence.py -B 2 -A 5Repository: OpenMS/openms-python
Length of output: 386
🏁 Script executed:
find . -path "*/test*" -name "*.py" -type f | head -10 | xargs rg "__contains__|\.count\(" 2>/dev/null | head -20Repository: OpenMS/openms-python
Length of output: 46
🌐 Web query:
OpenMS AASequence::count String parameter documentation
💡 Result:
Do you mean AASequence::count (which I can't find in the AASequence API) or the String::count / substring-count behavior used with AASequence (e.g., counting residues/substrings)? Please clarify which exact method/signature you need (or paste the header/signature).
🏁 Script executed:
find . -type f -name "*test*.py" -o -name "test_*.py" | xargs grep -l "AASequence\|count\|__contains__" 2>/dev/null | head -5Repository: OpenMS/openms-python
Length of output: 228
🏁 Script executed:
grep -r "def count" . --include="*.py" | grep -v ".pyc" | head -20Repository: OpenMS/openms-python
Length of output: 138
🏁 Script executed:
rg "oms\.AASequence" openms_python/py_aasequence.py -B 1 -A 1 | head -30Repository: OpenMS/openms-python
Length of output: 848
🏁 Script executed:
cat openms_python/py_aasequence.py | head -50Repository: OpenMS/openms-python
Length of output: 1684
🏁 Script executed:
cat ./tests/test_py_aasequence.pyRepository: OpenMS/openms-python
Length of output: 7563
🏁 Script executed:
rg "test.*count|test.*contains|__contains__|\.count\(" ./tests/test_py_aasequence.py -B 2 -A 5Repository: OpenMS/openms-python
Length of output: 46
🏁 Script executed:
rg "def __contains__" openms_python/py_aasequence.py -B 2 -A 5Repository: OpenMS/openms-python
Length of output: 384
🏁 Script executed:
python3 << 'EOF'
import sys
try:
import pyopenms as oms
seq = oms.AASequence.fromString("PEPTIDE")
# Check if count method exists
if hasattr(seq, 'count'):
print("count method exists on oms.AASequence")
print(f"count('P') = {seq.count('P')}")
print(f"count('E') = {seq.count('E')}")
print(f"count('PE') = {seq.count('PE')}")
else:
print("count method does NOT exist on oms.AASequence")
print(f"Available methods: {[m for m in dir(seq) if not m.startswith('_')]}")
except Exception as e:
print(f"Error: {e}")
print(f"Exception type: {type(e).__name__}")
EOFRepository: OpenMS/openms-python
Length of output: 150
🏁 Script executed:
python3 << 'EOF'
import sys
try:
from openms_python.py_aasequence import Py_AASequence
seq = Py_AASequence.from_string("PEPTIDE")
# Test __contains__ with non-str
try:
result = 123 in seq
print(f"Non-str in sequence returned: {result}")
except Exception as e:
print(f"Non-str in sequence raised: {type(e).__name__}: {e}")
# Test __contains__ with str
try:
result = "PEP" in seq
print(f"Str 'PEP' in sequence returned: {result}")
except Exception as e:
print(f"Str in sequence raised: {type(e).__name__}: {e}")
except Exception as e:
print(f"Setup error: {e}")
EOFRepository: OpenMS/openms-python
Length of output: 150
Tighten type expectations for __contains__ and document count() behavior
Two small points here:
-
Type handling in
__contains__def __contains__(self, substring: str) -> bool: return self.has_substring(substring)
At runtime,
__contains__will be called with whatever the caller uses. Non-strinputs will flow through toAASequence.fromStringinsidehas_substring, raising an unclear error.A light guard improves robustness:
def __contains__(self, substring: str) -> bool:
- """Check if substring is in sequence."""
- return self.has_substring(substring)
- """Check if a string substring is in sequence."""
- if not isinstance(substring, str):
-
return False - return self.has_substring(substring)
2. **Undocumented behavior of `count()`**
```python
def count(self, residue: str) -> int:
return self._sequence.count(residue)
No tests exist for this method. The docstring claims it mirrors str.count(), but it's unclear whether this:
- counts single residues (characters),
- counts subsequences,
- or handles modifications differently.
Add a docstring clarifying the exact behavior and include tests with simple sequences to demonstrate expected results (e.g., count("P") and count("PE") on "PEPTIDE").
🤖 Prompt for AI Agents
In openms_python/py_aasequence.py around lines 324 to 339, __contains__
currently assumes the argument is a str and forwards it to has_substring causing
unclear errors for non-str inputs; add a light guard that checks
isinstance(substring, str) and return False (or raise TypeError if you prefer
strictness) when it's not a str, then call has_substring for valid strings. Also
update the count() docstring to explicitly state it mirrors str.count (i.e., it
counts occurrences of the given substring which may be a single residue or a
multi-character subsequence) and add unit tests that assert expected results
(e.g., "PEPTIDE".count("P") == 2 and .count("PE") == 1) to prevent regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
openms_python/py_aasequence.py (2)
324-339: Type handling in__contains__and documentingcount()behaviorTwo points here, both already raised in a previous review and still applicable:
__contains__currently forwards any object tohas_substring, which will eventually hitAASequence.fromStringwith a non-str and raise a less-than-clear error. A light type guard would make behavior more predictable:def __contains__(self, substring: str) -> bool: - """Check if substring is in sequence.""" - return self.has_substring(substring) + """Check if a string substring is in sequence.""" + if not isinstance(substring, str): + return False + return self.has_substring(substring)
count()simply forwards to_sequence.count(residue)with a brief docstring. It’s not obvious whether this counts single residues, arbitrary substrings, or how modifications are treated. Clarifying the docstring (e.g., “Counts occurrences of the given residue or subsequence, mirroringstr.count()”) and adding a couple of small tests would make the behavior much clearer.
221-252: Integer indexing still loses PTMs and mishandles very negative indicesSlices are now correctly implemented via
getSubsequence, but the single-index path still:
- Builds the result from
residue.getOneLetterCode()⇒ any PTMs on that residue are silently dropped.- Fails to guard indices
< -len(self): forindex = -len(self) - 1, the adjusted index stays negative and is passed togetResidue, which is undefined/likely erroneous.This is the same core concern as in the earlier review; slices were fixed, but integer indexing remains inconsistent with the rest of the API and with PTM-preserving behavior.
A more robust implementation would reuse the native subsequence API for both slices and single indices, and perform full bounds checking:
def __getitem__(self, index): """ Get residue(s) at position(s). Supports both single indexing and slicing, returning Py_AASequence objects. """ - if isinstance(index, slice): - start, stop, step = index.indices(len(self)) - if step != 1: - raise ValueError("Step slicing is not supported for amino acid sequences") - return Py_AASequence.from_native(self._sequence.getSubsequence(start, stop - start)) - else: - # Handle negative indices - if index < 0: - index = len(self) + index - if index >= len(self): - raise IndexError(f"Index {index} out of range for sequence of length {len(self)}") - residue = self._sequence.getResidue(index) - residue_char = residue.getOneLetterCode() - return Py_AASequence.from_string(residue_char) + if isinstance(index, slice): + start, stop, step = index.indices(len(self)) + if step != 1: + raise ValueError("Step slicing is not supported for amino acid sequences") + return Py_AASequence.from_native( + self._sequence.getSubsequence(start, stop - start) + ) + + # Normalize integer index (including negative) and check bounds + if not isinstance(index, int): + raise TypeError(f"Indices must be integers or slices, not {type(index).__name__}") + if index < 0: + index += len(self) + if index < 0 or index >= len(self): + raise IndexError(f"Index {index} out of range for sequence of length {len(self)}") + + # Use native subsequence to preserve PTMs for single residues + return Py_AASequence.from_native( + self._sequence.getSubsequence(index, 1) + )This keeps PTMs intact for integer indexing, aligns with how slicing is now done, and ensures negative indices behave like standard Python sequences.
🧹 Nitpick comments (4)
tests/test_py_aasequence.py (2)
125-135: Indexing tests align with wrapper semantics, but no coverage for modified residuesThe updated assertions on
seq[i].sequencecorrectly enforce that element access returns a wrapped residue. Once the integer-index PTM bug in__getitem__is fixed, consider adding a case likeaa_seq[-2].sequence == "M(Oxidation)"on a modified sequence to guard against regressions.
285-291: Slicing tests validate modification preservation; consider single-index coverage laterThe new slicing cases, especially
aa_seq[-2:].sequence == 'M(Oxidation)R', correctly verify that subsequence operations preserve PTMs. After fixing integer indexing in__getitem__, it would be worth adding a complementary test foraa_seq[-2]on the same modified sequence.openms_python/py_aasequence.py (2)
60-72: Minor docstring polish forfrom_native
from_nativeis a useful public helper; the docstring could be tightened (e.g., describenative_sequenceinstead of leaving it blank and fix “opject” → “object”), but behavior itself looks fine.
258-323: String-based concatenation/repetition and terminal modifications
__add__,__radd__,__mul__, and__rmul__build new sequences by string concatenation/repetition followed byfrom_string(). That’s simple and works for unmodified sequences, but for N-/C-terminal–specific modifications this may:
- Rely entirely on
AASequence.fromStringto detect invalid combinations, or- Silently reinterpret or misplace terminal-specific modifications after concatenation/repetition.
Given the earlier discussion about terminal modifications and invalid concatenations, it would be good to make the intended behavior explicit:
- Decide whether these operators should:
- reject combinations that violate terminal-mod rules (preferred, via checks after construction), or
- fully delegate to the native parsing semantics and document that behavior.
- Add a couple of tests covering concatenation/repetition of modified sequences to pin down and guard the chosen semantics.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
openms_python/py_aasequence.py(5 hunks)tests/test_py_aasequence.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.6)
openms_python/py_aasequence.py
242-242: Avoid specifying long messages outside the exception class
(TRY003)
249-249: Avoid specifying long messages outside the exception class
(TRY003)
427-427: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
tests/test_py_aasequence.py (2)
116-123: Iteration behavior test matches new APIUsing
res.sequenceintest_py_aasequence_iterationcorrectly reflects that__iter__now yields sequence-wrapped residues rather than raw strings; this anchors the new contract well.
260-283: to_string tests provide solid coverage across formatsThese checks exercise the default, unmodified, bracket, UniMod, and invalid-format paths of
to_string, which is exactly what’s needed to pin down the new API behavior.
I think it is also wrong in the c++ implementation in that case. But we could handle the error there (c++) and keep this simpler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
openms_python/py_aasequence.py (1)
324-326: Add type guard in__contains__for non‑string inputs.As noted in a previous review,
__contains__will be called with whatever the user provides. Non‑strinputs flow through toAASequence.fromStringinsidehas_substring, raising an unclear error.Apply this diff to improve robustness:
def __contains__(self, substring: str) -> bool: - """Check if substring is in sequence.""" + """Check if a string substring is in sequence.""" + if not isinstance(substring, str): + return False return self.has_substring(substring)Based on past review feedback.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
openms_python/py_aasequence.py(5 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
openms_python/py_aasequence.py
242-242: Avoid specifying long messages outside the exception class
(TRY003)
249-249: Avoid specifying long messages outside the exception class
(TRY003)
427-427: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
openms_python/py_aasequence.py (2)
396-427: LGTM! Optional: centralize supported‑formats list.The previous review concerns (mismatched default, incomplete error message, unnecessary
Optional) have been addressed. The signature, docstring, and error message now correctly reflect'default'as the default and list all three supported formats.To address the Ruff TRY003 hint and improve maintainability, consider centralizing the supported formats:
def to_string(self, modified=True, mod_format: Literal['default', 'unimod', 'bracket'] = 'default') -> str: """ Get string representation of the sequence. Args: modified (bool): Whether to include modifications in the string. mod_format (Optional[Literal['default', 'unimod', 'bracket']]): Format for modifications. 'default' for OpenMS format, 'unimod' for UniMod format, 'bracket' for bracket notation. Default is 'default'. Returns: str: Amino acid sequence as string. Example: >>> seq = Py_AASequence.from_string("PEPTIDE") >>> seq_str = seq.to_string() """ if not modified: return self.unmodified_sequence - else: - if mod_format == 'default': - return self._sequence.toString() - elif mod_format == 'unimod': - return self._sequence.toUniModString() - elif mod_format == 'bracket': - return self._sequence.toBracketString() - else: - raise ValueError(f"Unsupported mod_format: {mod_format}, supported are 'default', 'unimod' and 'bracket'") + if mod_format == 'default': + return self._sequence.toString() + elif mod_format == 'unimod': + return self._sequence.toUniModString() + elif mod_format == 'bracket': + return self._sequence.toBracketString() + + supported = ('default', 'unimod', 'bracket') + raise ValueError( + f"Unsupported mod_format: {mod_format!r}; supported are {supported}" + )
258-299: Add test coverage or document limitations for concatenation with terminal-modified sequences.The
__add__and__radd__implementations concatenate sequences via string representation and re-parse:combined_str = self.sequence + other.sequence return Py_AASequence.from_string(combined_str)Terminal modifications are supported (as shown by
has_n_terminal_modificationandhas_c_terminal_modificationproperties), but concatenating sequences with terminal modifications may produce invalid results. For example, concatenating a sequence with a C-terminal modification to one with an N-terminal modification could create malformed strings during re-parsing.Either add test cases for modified sequence concatenation to verify behavior, or document this as a known limitation if not supported by the underlying C++ implementation.
| @classmethod | ||
| def from_native(cls, native_sequence: oms.AASequence) -> Py_AASequence: | ||
| """ | ||
| Creates Py_AASequence from native pyOpenMS AASequence. | ||
| Args: | ||
| native_sequence (oms.AASequence): | ||
| Returns: | ||
| Py_AASequence: New wrapped opject | ||
| """ | ||
| return cls(native_sequence) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typos and complete docstring in from_native.
The docstring has two issues:
- Line 66: Args description is incomplete—no text follows
native_sequence (oms.AASequence):. - Line 69: Typo "opject" should be "object".
Apply this diff:
@classmethod
def from_native(cls, native_sequence: oms.AASequence) -> Py_AASequence:
"""
Creates Py_AASequence from native pyOpenMS AASequence.
Args:
- native_sequence (oms.AASequence):
+ native_sequence (oms.AASequence): Native pyOpenMS AASequence object to wrap.
Returns:
- Py_AASequence: New wrapped opject
+ Py_AASequence: New wrapped object.
"""
return cls(native_sequence)🤖 Prompt for AI Agents
In openms_python/py_aasequence.py around lines 60 to 72, the from_native
docstring is incomplete and has a typo: fill in the Args description for
native_sequence to explain that it is a pyOpenMS AASequence to be wrapped (e.g.,
"native_sequence (oms.AASequence): the native pyOpenMS amino acid sequence to
wrap"), correct "opject" to "object" in the Returns section, and optionally
tighten wording to "New wrapped object" or "Wrapped Py_AASequence instance" so
the docstring reads clearly and completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
openms_python/py_aasequence.py (2)
61-73: Fix typos and complete docstring infrom_native.The docstring has two issues flagged in previous review:
- Line 67: Args description is incomplete—no text follows
native_sequence (oms.AASequence):.- Line 70: Typo "opject" should be "object".
Apply this diff:
- native_sequence (oms.AASequence): + native_sequence (oms.AASequence): Native pyOpenMS AASequence object to wrap. Returns: - Py_AASequence: New wrapped opject + Py_AASequence: New wrapped object.
324-326: Add type guard to__contains__for non-string inputs.Currently, passing a non‑
strto__contains__flows through tohas_substring, which callsAASequence.fromString(substring)and raises an unclear error. Per previous review, add a guard:def __contains__(self, substring: str) -> bool: - """Check if substring is in sequence.""" + """Check if substring is in sequence.""" + if not isinstance(substring, str): + return False return self.has_substring(substring)
🧹 Nitpick comments (5)
tests/test_py_aasequence.py (2)
260-283: Tests pass but miss key edge cases forto_string.The tests verify happy‑path behavior for each
mod_format, but don't cover:
- Unmodified sequences (where
modified=TruevsFalseshould yield the same result)- Multiple modifications in one sequence
- Edge‑case formats (e.g., what if the C++ method returns unexpected output?)
Consider adding:
def test_py_aasequence_to_string_unmodified(): """Verify to_string on unmodified sequence.""" seq = Py_AASequence.from_string("PEPTIDE") assert seq.to_string(modified=True) == "PEPTIDE" assert seq.to_string(modified=False) == "PEPTIDE"
285-297: Add test coverage forcount()with modifications.The current test only exercises unmodified sequences. Given that the implementation warns about not accounting for modifications (line 339 in py_aasequence.py), add a test demonstrating the behavior with modified sequences:
def test_count_with_modifications(): """Verify count behavior on modified sequences.""" seq = Py_AASequence.from_string("PEPTIDEM(Oxidation)") # Should count unmodified M assert seq.count('M') == 1 # Should not count modified notation assert seq.count('M(Oxidation)') == 0openms_python/py_aasequence.py (3)
258-299: **Verify concatenation behavior with terminal modifications.**Per pyOpenMS documentation, AASequence supports concatenation with the + operator, and N- and C-terminal modifications are represented by brackets to the right of the dots terminating the sequence. As raised in the PR discussion (PR objectives), concatenating sequences with terminal modifications may produce invalid results because the C‑terminus of the first sequence and the N‑terminus of the second are no longer terminal after joining.The current implementation (lines 279, 281) simply concatenates string representations and calls
from_string, which may parse successfully but create a biochemically invalid sequence. For example, concatenating a sequence with a C‑terminal modification and one with an N‑terminal modification could result in an intermediate residue incorrectly carrying terminal‑specific modifications.Per the PR discussion, the decision is to defer validation to the C++ side. However, document this limitation:
def __add__(self, other: Py_AASequence | str) -> Py_AASequence: """ Concatenate sequences. + + Note: Concatenation does not validate terminal modifications. If either + sequence has N- or C-terminal modifications, the result may be invalid. Args:Based on PR objectives and reviewer discussion.
337-340:count()correctly uses unmodified sequence but has static analysis hint.The implementation correctly uses
self.unmodified_sequence.count(residue)and warns users about modification handling, which is appropriate. The static analysis hint (Ruff B028) suggests addingstacklevel=2to the warning:- warnings.warn("count method does not account for modifications") + warnings.warn("count method does not account for modifications", stacklevel=2)This makes the warning point to the caller's location rather than this method.
398-428: RemoveOptionalfrom type hint ifNoneis unsupported.The signature allows
mod_formatto beNonevia theOptionaltype hint (implied by the import at line 5), butNoneis not handled and would fall through to theelseblock, raisingValueError. Either:
- Remove
OptionalifNoneis not a valid input, or- Handle
Noneexplicitly (e.g., treat as alias for'default').Additionally, the Ruff TRY003 hint suggests extracting the list of supported formats to reduce the error message length:
+SUPPORTED_MOD_FORMATS = ('default', 'unimod', 'bracket') + def to_string( self, modified: bool = True, - mod_format: Optional[Literal['default', 'unimod', 'bracket']] = 'default', + mod_format: Literal['default', 'unimod', 'bracket'] = 'default', ) -> str: ... else: - raise ValueError(f"Unsupported mod_format: {mod_format}, supported are 'default', 'unimod' and 'bracket'") + raise ValueError( + f"Unsupported mod_format: {mod_format!r}; supported: {SUPPORTED_MOD_FORMATS}" + )Similar to past review feedback.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
openms_python/py_aasequence.py(5 hunks)tests/test_py_aasequence.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
openms_python/py_aasequence.py
243-243: Avoid specifying long messages outside the exception class
(TRY003)
250-250: Avoid specifying long messages outside the exception class
(TRY003)
339-339: No explicit stacklevel keyword argument found
Set stacklevel=2
(B028)
428-428: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
openms_python/py_aasequence.py (1)
222-252: LGTM: PTM preservation now correct.The reworked
__getitem__correctly usesgetSubsequence(start, length)for both slices (line 244) and single indices (line 251), preserving post‑translational modifications. This resolves the critical issue flagged in previous reviews wheregetResidue().getOneLetterCode()discarded PTMs.The negative‑index handling (line 247) and bounds check (line 249) are correct.
Minor: The Ruff TRY003 hints (lines 243, 250) suggest extracting long error messages into exception classes, but this is a nitpick; inline messages are acceptable for this use case.
Add some more Py_AASequence convenience functions
E.g.
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.