
@pavanimanchala53 commented Dec 11, 2025

PR: Add NL Parser With Typo Tolerance, Confidence Scoring, Clarifications & Unit Tests

This PR adds a standalone natural language parser for the cortex install flow, improving demo reliability and addressing all acceptance criteria in the bounty request.

✔ Features Implemented

Typo tolerance (e.g., kubernets → kubernetes, pyhton → python)

Rule-based + fuzzy semantic intent detection

Confidence scoring (0.0–1.0)

Ambiguity handling with clarification prompts

Short explanation showing AI reasoning

Slot extraction:

Python version

Platform

Packages mentioned

12+ unit tests validating NL behavior

✔ Files Added

nl_parser.py

tests/test_nl_parser.py

✔ How to Test

Run:

python -m pytest tests/test_nl_parser.py -q
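
Or try the parser directly (a minimal sketch, assuming nl_parser is importable from the repository root and parse_request returns the fields described in this PR; the values in the comments are illustrative):

from nl_parser import parse_request

result = parse_request("instal pyhton 3.11 on ubuntu")

# Illustrative shape of the result (exact values depend on the request):
# result["intent"]          -> e.g. "setup_python_env"
# result["confidence"]      -> float in [0.0, 1.0]
# result["explanation"]     -> short text describing the reasoning
# result["slots"]           -> {"python_version": "3.11", "platform": "ubuntu", "packages": [...]}
# result["corrections"]     -> typos that were fixed, e.g. "pyhton" -> "python"
# result["clarifications"]  -> follow-up questions when the request is ambiguous
print(result["intent"], result["confidence"])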

✔ Bounty Requirement Status

All acceptance criteria met:

10+ NL test cases — ✔

Ambiguous handling — ✔

Typo tolerance — ✔

Explanation output — ✔

Confidence scoring — ✔

Unit tests — ✔

Closes: #248

Summary by CodeRabbit

  • New Features

    • Natural Language Parser now converts installation requests into structured intents with typo correction, confidence scoring, and clarification prompts for ambiguous requests.
  • Documentation

    • Added Natural Language Parser documentation with usage examples.
  • Tests

    • Added unit tests covering intent recognition, correction handling, and slot extraction.
  • Chores

    • Added CodeQL security scanning workflow for continuous code analysis.


coderabbitai bot (Contributor) commented Dec 11, 2025

Walkthrough

A new natural language parsing module is introduced to convert user installation requests into structured intents, including normalization, spelling correction, rule-based and semantic intent detection, confidence scoring, slot extraction, and clarification suggestions, accompanied by comprehensive tests and documentation.

Changes

  • Natural Language Parser Implementation (cortex/nl_parser.py): New module implementing NLP parsing with text normalization, tokenization, spelling correction, hybrid rule-based and semantic intent matching, confidence aggregation, slot extraction (python_version, platform, packages), and clarification-suggestion logic; exports 11 public functions, including the main parse_request orchestrator.
  • Parser Testing (tests/test_nl_parser.py): New pytest-based test suite with parameterized test cases for intent detection (including ambiguous cases that trigger clarifications), spelling-correction validation, and slot extraction from natural-language input.
  • Parser Documentation (docs/NL_PARSER.md): New documentation describing the NL Parser system, its core functionality (typo tolerance, hybrid detection, confidence scoring, clarifications, slot extraction), and a usage example.
  • CI/CD Infrastructure (.github/workflows/codeql.yml): New GitHub Actions workflow for CodeQL Advanced security analysis, triggered on pushes and pull requests to main, with a matrix strategy for Python analysis.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • cortex/nl_parser.py: Review the intent matching algorithms (fuzzy scoring, rule-based detection), confidence aggregation logic, and spelling correction implementation for correctness and robustness.
  • tests/test_nl_parser.py: Verify test coverage aligns with acceptance criteria (10+ test cases, ambiguity handling, typo tolerance).
  • Integration: Ensure parse_request orchestration correctly combines all components (normalization → correction → intent resolution → slot extraction → clarifications).

Poem

🐰 A parser emerges, fuzzy and wise,
Understanding requests despite typos and sighs—
"Install ML magic!" the user requests,
The rabbit detects it and passes the tests.
With confidence scored and clarifications in hand,
Natural language commands now clearly expand! 🌟

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 15.52%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Title check ❓ Inconclusive: The title 'Feature/nl parser improvements' is vague and does not convey the specific improvements (typo tolerance, confidence scoring, clarifications) implemented in this PR. Consider a more specific title such as 'Add NL parser with typo tolerance and confidence scoring'.

✅ Passed checks (3)

  • Description check ✅ Passed: The PR description is comprehensive and includes a summary, features, files added, testing instructions, and bounty criteria status. However, it does not follow the required template: the 'Related Issue' section is not formatted as expected and the checklist items are not marked as complete.
  • Linked Issues check ✅ Passed: All acceptance criteria from issue #248 are met: 12+ test cases added, ambiguity handling implemented, typo tolerance with spell correction, confidence scoring (0.0-1.0), AI reasoning in explanations, and comprehensive unit tests covering slot extraction and intent detection.
  • Out of Scope Changes check ✅ Passed: The PR includes nl_parser.py and tests/test_nl_parser.py as the primary deliverables, plus a CodeQL workflow file. The CodeQL workflow is security-related and out of scope for the NL parser bounty, but it is a reasonable infrastructure improvement that does not interfere with the core objectives.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 15

🧹 Nitpick comments (13)
test/test_installation_history.py (1)

10-16: Confirm capitalized module name and consider PEP 8 alignment

Switching to from Installation_history import ... is fine if the module is actually named with a capital I. If the module is under your control, consider renaming it to a lowercase, underscore‑separated name and adjusting imports for PEP 8 compliance; otherwise, just ensure this import matches the real filename in the package layout.

src/intent/detector.py (1)

6-10: Tighten public API typing/docstrings for Intent and detect()

The detection logic itself looks solid and matches the intended flows. To better align with the project’s typing/docstring guidelines, consider:

  • Making details more explicit than Optional[dict], e.g. Optional[dict[str, object]] or a typed alias if you expect a fixed structure.
  • Adding a short docstring to Intent (or at least to detect) describing expected input text and the shape/ordering of returned intents.

This will make the new intent surface clearer to downstream callers and type checkers. As per coding guidelines, docstrings and strong type hints are expected on public APIs.

Also applies to: 26-49
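
One possible shape for the tightened surface (a sketch only; field names beyond target are assumptions inferred from how the planner and LLM agent consume Intent):

from dataclasses import dataclass
from typing import Optional

IntentDetails = dict[str, object]  # explicit alias instead of a bare Optional[dict]

@dataclass
class Intent:
    """One detected intent, e.g. action="install", target="pytorch"."""
    action: str
    target: str
    details: Optional[IntentDetails] = None

class IntentDetector:
    def detect(self, text: str) -> list[Intent]:
        """Return intents detected in text, most specific first."""
        ...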

src/intent/context.py (1)

13-69: Add explicit -> None return types on mutating SessionContext methods

SessionContext behavior looks fine, but several public methods (__init__, set_gpu, add_intents, add_installed, add_clarification, reset) currently omit their -> None return annotations. Adding these will fully satisfy the “type hints required” guideline and make intent clearer to type checkers:

-    def set_gpu(self, gpu_name: str):
+    def set_gpu(self, gpu_name: str) -> None:
@@
-    def add_intents(self, intents: List[Intent]):
+    def add_intents(self, intents: List[Intent]) -> None:
@@
-    def add_installed(self, pkg: str):
+    def add_installed(self, pkg: str) -> None:
@@
-    def add_clarification(self, question: str):
+    def add_clarification(self, question: str) -> None:
@@
-    def reset(self):
+    def reset(self) -> None:

(You can similarly annotate __init__ if you want full consistency.)

As per coding guidelines, type hints are required.

src/intent/planner.py (1)

6-49: Use GPU_PACKAGES in has_gpu and annotate it as ClassVar

Right now GPU_PACKAGES is never used, and has_gpu only checks for a gpu target, so a plan with only cuda/pytorch intents won’t trigger GPU detection/config steps. You can both fix this and satisfy the Ruff ClassVar warning by:

-from typing import List
+from typing import List, ClassVar
@@
-    GPU_PACKAGES = ["cuda", "cudnn", "pytorch", "tensorflow"]
+    GPU_PACKAGES: ClassVar[List[str]] = ["cuda", "cudnn", "pytorch", "tensorflow"]
@@
-        # 1. If GPU-related intents exist → add GPU detection
-        has_gpu = any(i.target == "gpu" for i in intents)
+        # 1. If GPU-related intents exist → add GPU detection
+        has_gpu = any(
+            i.target == "gpu" or i.target in self.GPU_PACKAGES
+            for i in intents
+        )

This keeps the behavior aligned with the intent of “GPU-related intents” while cleaning up the static-analysis complaint.

src/test_intent_detection.py (1)

1-15: Tests cover core detector behavior; type hints optional for consistency

These tests nicely exercise the happy path and an empty-input case for IntentDetector. If you want full alignment with the “type hints required” policy, you could add return annotations like def test_detector_basic() -> None, but that’s purely consistency polish on test code.

src/test_clarifier.py (1)

1-12: Clarifier integration test looks good; consider optional type annotations

The test correctly validates that a clarification question is produced for an underspecified ML request. As with the other tests, you could optionally add -> None on the test function to mirror the project’s typing convention, but the current test is functionally solid.

src/test_llm_agent.py (2)

3-9: Consider using a factory method or instance attribute for mutable content.

The content class attribute on line 8 is a mutable list. While this works for the current test, it could cause unexpected behavior if multiple Response instances share state. Since this is a simple mock, it's low risk, but using an instance attribute or property would be safer.

 class MockMessages:
     def create(self, **kwargs):
         class Response:
             class Content:
                 text = "install: tensorflow\ninstall: jupyter"
-            content = [Content()]
+            @property
+            def content(self):
+                return [self.Content()]
         return Response()

15-28: Test file location inconsistent with project structure.

This test file is in src/ while other NL parser tests are in tests/test_nl_parser.py. Consider moving to tests/test_llm_agent.py for consistency. Also, the test could benefit from additional assertions to verify the structure of the plan (e.g., that plan items are strings) and that suggestions is a list.

     assert "plan" in result
     assert len(result["plan"]) > 0
     assert "suggestions" in result
+    assert isinstance(result["plan"], list)
+    assert isinstance(result["suggestions"], list)
src/test_planner.py (1)

4-16: Test validates planner correctly; consider moving to tests/ and using constants.

The test properly validates the CUDA/PyTorch/GPU pipeline. Assertions against hardcoded strings (e.g., "Install CUDA 12.3 + drivers") may become brittle if planner output changes. Consider either:

  1. Moving to tests/test_planner.py for consistency
  2. Optionally defining expected plan steps as constants shared with the planner

Current test logic is correct.
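
A sketch of option 2 above, if you go that route (the constant names and import path are hypothetical; the planner would need to export whatever step strings it actually emits):

# Hypothetical constants exposed by the planner module:
CUDA_STEP = "Install CUDA 12.3 + drivers"
PYTORCH_STEP = "Install PyTorch (GPU build)"

# The test then asserts against the shared constants instead of string literals:
from src.intent.planner import CUDA_STEP, PYTORCH_STEP  # hypothetical exports

def test_planner_pipeline() -> None:
    plan = ...  # build the plan exactly as the existing test does
    assert CUDA_STEP in plan
    assert PYTORCH_STEP in plan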

src/intent/llm_agent.py (2)

67-70: Consider logging swallowed exceptions and using narrower exception types.

Bare except Exception catches can mask unexpected errors. While the fallback behavior is appropriate for resilience, logging the exception would aid debugging in production.

+import logging
+
+logger = logging.getLogger(__name__)
+
         # 4. Improve intents using LLM (safe)
         try:
             improved_intents = self.enhance_intents_with_llm(text, intents)
-        except Exception:
+        except Exception as e:
+            logger.warning("LLM enhancement failed, using rule-based intents: %s", e)
             improved_intents = intents

Also applies to: 79-82


100-106: LLM prompt may produce inconsistent output format.

The prompt at lines 96-106 asks for "install: package" format but doesn't strongly constrain the LLM output. Consider adding examples or using a more structured prompt to improve reliability of parsing.

 Return improvements or extra intents.
-Format: "install: package" or "configure: component"
+Format each intent on its own line using exactly one of:
+- install: <package_name>
+- configure: <component>
+- verify: <target>
+
+Example:
+install: tensorflow
+configure: gpu
 """
nl_parser.py (2)

142-142: Clarify deduplication idiom with a comment.

The dict.fromkeys() pattern for deduplication is clever but not immediately obvious to all readers.

Apply this diff:

-        slots["packages"] = list(dict.fromkeys(pkgs))  # unique preserve order
+        # Deduplicate while preserving order (dict.fromkeys maintains insertion order in Python 3.7+)
+        slots["packages"] = list(dict.fromkeys(pkgs))

173-220: Consider adding input length validation.

The function processes arbitrary-length text without validation. While not likely to be a practical issue for typical use cases, extremely long inputs could cause performance degradation.

Consider adding a length check at the start of the function:

 def parse_request(text: str) -> Dict[str, Any]:
     """Main function used by tests and demo."""
+    # Prevent potential DoS from extremely long inputs
+    if len(text) > 10000:
+        return {
+            "intent": "unknown",
+            "confidence": 0.0,
+            "explanation": "Input too long",
+            "slots": {},
+            "corrections": [],
+            "clarifications": ["Please provide a shorter request (max 10000 characters)"],
+        }
+    
     norm = normalize(text)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a1755c and ac7fbae.

📒 Files selected for processing (15)
  • .gitignore (0 hunks)
  • nl_parser.py (1 hunks)
  • requirements.txt (1 hunks)
  • src/intent/clarifier.py (1 hunks)
  • src/intent/context.py (1 hunks)
  • src/intent/detector.py (1 hunks)
  • src/intent/llm_agent.py (1 hunks)
  • src/intent/planner.py (1 hunks)
  • src/test_clarifier.py (1 hunks)
  • src/test_context.py (1 hunks)
  • src/test_intent_detection.py (1 hunks)
  • src/test_llm_agent.py (1 hunks)
  • src/test_planner.py (1 hunks)
  • test/test_installation_history.py (1 hunks)
  • tests/test_nl_parser.py (1 hunks)
💤 Files with no reviewable changes (1)
  • .gitignore
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • src/intent/detector.py
  • src/intent/context.py
  • src/test_clarifier.py
  • tests/test_nl_parser.py
  • src/intent/planner.py
  • src/intent/llm_agent.py
  • src/test_llm_agent.py
  • src/test_intent_detection.py
  • src/intent/clarifier.py
  • test/test_installation_history.py
  • src/test_context.py
  • src/test_planner.py
  • nl_parser.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_nl_parser.py
**/*install*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*install*.py: Dry-run by default for all installations in command execution
No silent sudo execution - require explicit user confirmation
Implement audit logging to ~/.cortex/history.db for all package operations

Files:

  • test/test_installation_history.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Type hints required in Python code

Applied to files:

  • requirements.txt
🧬 Code graph analysis (2)
tests/test_nl_parser.py (1)
nl_parser.py (1)
  • parse_request (173-220)
src/intent/llm_agent.py (1)
src/intent/detector.py (2)
  • Intent (7-10)
  • detect (26-49)
🪛 Ruff (0.14.8)
src/intent/planner.py

8-8: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

src/intent/llm_agent.py

69-69: Do not catch blind exception: Exception

(BLE001)


81-81: Do not catch blind exception: Exception

(BLE001)

src/test_llm_agent.py

4-4: Unused method argument: kwargs

(ARG002)


8-8: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


22-22: Unused lambda argument: a

(ARG005)


22-22: Unused lambda argument: k

(ARG005)

🔇 Additional comments (8)
requirements.txt (1)

12-12: Validate PyYAML pin against supported Python versions and security policy

Adding PyYAML==6.0.3 looks fine; just confirm this exact pin is required (vs >=) for your nl parsing/config use cases and that it’s compatible with your minimum Python version and current vulnerability guidance.

src/intent/clarifier.py (1)

6-33: Clarification logic is coherent and matches intended ambiguity handling

The branching in needs_clarification cleanly captures the key ambiguous cases (GPU presence, generic ML tools, CUDA without GPU mention, PyTorch version choice) with clear questions, and types/docstrings are in good shape. No changes needed here.

src/test_context.py (1)

4-13: Test logic is sound; consider moving to tests/ directory.

The test correctly validates SessionContext's core functionality: GPU setting, intent addition, and installed package tracking. For consistency with tests/test_nl_parser.py, consider relocating this file to tests/test_context.py.

tests/test_nl_parser.py (1)

1-37: Comprehensive test coverage aligns with PR requirements.

The test suite covers:

  • 12 parametrized intent cases (exceeds 10+ requirement)
  • Typo tolerance ("kubernets", "pyhton")
  • Slot extraction (python_version, platform)
  • Ambiguity handling with clarification checks
  • Confidence scoring (>= 0.5 threshold)

This satisfies the acceptance criteria from issue #248.

src/intent/llm_agent.py (1)

119-119: The .lower() call is appropriate and correctly designed for this codebase.

The downstream systems (detector.py and planner.py) consistently use lowercase for all internal package matching. The COMMON_PACKAGES dictionary in detector.py (line 17) and GPU_PACKAGES list in planner.py (line 8) define packages as lowercase: "pytorch", "tensorflow", "cuda", etc. All package comparisons in planner.py (lines 29, 32) check against lowercase strings. The lowercasing at line 119 is intentional and necessary—user-facing display strings with proper casing (e.g., "Install PyTorch") are generated separately in the output messages.

nl_parser.py (3)

60-67: LGTM!

The spell correction logic is sound, with a reasonable fuzzy match cutoff of 0.75 to avoid over-correction.
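
For readers skimming the review, the approved pattern is roughly the following (a sketch, not the module's exact code; the vocabulary contents are assumed):

import difflib

VOCAB = ["kubernetes", "python", "docker", "nginx", "tensorflow", "pytorch"]  # assumed vocabulary

def correct_token(token: str) -> tuple[str, bool]:
    """Return (possibly corrected token, whether a correction was applied)."""
    matches = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.75)
    if matches and matches[0] != token:
        return matches[0], True
    return token, False

# correct_token("kubernets") -> ("kubernetes", True)
# correct_token("nginx")     -> ("nginx", False)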


85-97: LGTM!

The semantic matching logic correctly finds the best intent by comparing against all examples. The O(n*m) complexity is acceptable given the small, fixed set of intents and examples.
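
The described approach, sketched (the intent/example table below is illustrative, not the module's actual data):

import difflib

INTENT_EXAMPLES = {  # illustrative only
    "install_docker": ["install docker", "set up docker containers"],
    "setup_python_env": ["set up a python environment", "create a virtualenv"],
    "install_ml": ["install pytorch", "set up machine learning tools"],
}

def semantic_intent(text: str) -> tuple[str, float]:
    """Return the intent whose canonical example is most similar to the text."""
    best_intent, best_score = "unknown", 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = difflib.SequenceMatcher(None, text, example).ratio()
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent, best_score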


173-220: LGTM on the overall parsing logic!

The main parsing workflow is well-structured: normalization → spell correction → dual intent detection (rule + semantic) → slot extraction → confidence aggregation → clarification decisions. The cascading intent selection logic (lines 184-195) appropriately prioritizes rule-based matches, validates with semantic agreement, and falls back gracefully to semantic-only or unknown.

mikejmorgan-ai (Member) left a comment

Code Review: Request Changes

Great work @pavanimanchala53! The NL parsing logic is solid and meets all acceptance criteria. A few structural fixes needed before merge:

Issues to Fix

1. Code Location (High Priority)

  • nl_parser.py is in the root directory but should be in cortex/ package
  • The src/intent/ module creates confusion - Cortex uses cortex/ not src/
  • Please consolidate into cortex/nl_parser.py and cortex/intent/ if needed

2. Tests Location (Medium)

  • Tests in src/test_*.py should be in tests/ directory
  • Move all test files to tests/ to match project convention

3. Duplicate Dependency (Low)

  • requirements.txt now has both pyyaml>=6.0.0 and PyYAML==6.0.3
  • Remove the duplicate line

4. Missing Documentation (Medium)

  • Add docs/NL_PARSER.md with:
    • Feature overview
    • Usage examples
    • API reference

5. Missing Newline

  • Add final newline to tests/test_nl_parser.py

What's Working Well ✅

  • Typo correction (kubernets → kubernetes)
  • Confidence scoring
  • Clarification prompts for ambiguous requests
  • 12+ test cases
  • Clean intent detection logic

Once these structural fixes are done, this is ready to merge. Happy to help if you have questions!


Bounty: $75 upon merge (+ $75 bonus after funding)

@pavanimanchala53 force-pushed the feature/nl-parser-improvements branch from 5dabcd6 to 397ca8a on December 13, 2025 at 04:41
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (6)
cortex/nl_parser.py (1)

184-195: Intent selection logic may suppress valid semantic matches.

When rule_int != "unknown" (lines 187-189), the semantic intent is ignored even if sem_int has a higher confidence score. This could lead to suboptimal intent selection when rules produce lower-confidence matches than semantic matching.

Consider incorporating semantic confidence into the decision:

-    elif rule_int != "unknown":
-        chosen_intent = rule_int
-        c_classifier = 0.0
+    elif rule_int != "unknown":
+        # Prefer semantic if it has significantly higher confidence
+        if c_sem > c_rule + 0.2 and sem_int != "unknown":
+            chosen_intent = sem_int
+            c_classifier = 0.0
+        else:
+            chosen_intent = rule_int
+            c_classifier = 0.0
tests/test_clarifier.py (1)

4-12: Expand test coverage for Clarifier.

The test only verifies that a clarification question is returned for one input. Consider adding:

  1. A test where clarification is NOT needed (high-confidence, unambiguous intent)
  2. A test validating the question content is relevant to GPU/ML context
def test_clarifier_no_clarification_needed():
    """Test that specific requests don't trigger clarification."""
    d = IntentDetector()
    c = Clarifier()
    
    text = "install pytorch with CUDA 12.3"
    intents = d.detect(text)
    
    question = c.needs_clarification(intents, text)
    # Verify behavior for specific requests
    # (assertion depends on expected Clarifier logic)
tests/test_planner.py (2)

13-16: Assertions are tightly coupled to exact output strings.

The assertions depend on exact wording like "Install CUDA 12.3 + drivers". If the planner's output format changes slightly, these tests will fail even if the functionality is correct.

Consider asserting on key characteristics rather than exact strings:

# More resilient assertions
assert any("CUDA" in step for step in plan)
assert any("PyTorch" in step for step in plan)
assert any("GPU" in step and "environment" in step.lower() for step in plan)
assert "Verify" in plan[-1]

4-16: Add test cases for non-GPU and edge scenarios.

The test only covers the GPU/CUDA pipeline. Consider adding tests for:

  1. Non-GPU intents (e.g., web server installation)
  2. Empty intent list
  3. Mixed intents (some GPU, some not)
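
For instance, cases 1 and 2 might look like this (a sketch; the Planner/Intent construction and method names are assumptions and should mirror whatever the existing GPU test uses):

def test_planner_empty_intents() -> None:
    planner = Planner()
    plan = planner.build_plan([])          # method name is an assumption
    assert isinstance(plan, list)
    assert not any("GPU" in step for step in plan)

def test_planner_non_gpu_intent() -> None:
    planner = Planner()
    intents = [Intent(action="install", target="nginx")]  # constructor shape is an assumption
    plan = planner.build_plan(intents)
    assert not any("CUDA" in step for step in plan)
    assert any("nginx" in step.lower() for step in plan)
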
tests/test_llm_agent.py (1)

3-9: Mock response content is not verified in assertions.

The mock returns "install: tensorflow\ninstall: jupyter" but the test only verifies that plan exists and is non-empty. Consider asserting that the mocked intents actually influence the plan:

# Verify mock intents are reflected in the plan
assert any("tensorflow" in step.lower() or "TensorFlow" in step for step in result["plan"])
tests/test_context.py (1)

4-13: Consider splitting into focused test cases and adding coverage for untested methods.

The single test combines GPU, intents, and installed tracking. Per test best practices, consider:

  1. Splitting into test_context_gpu, test_context_intents, test_context_installed
  2. Adding tests for reset(), add_clarification(), and get_clarifications() methods
def test_context_reset():
    ctx = SessionContext()
    ctx.set_gpu("NVIDIA RTX 4090")
    ctx.add_installed("cuda")
    
    ctx.reset()
    
    assert ctx.get_gpu() is None
    assert ctx.is_installed("cuda") is False
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5dabcd6 and 397ca8a.

📒 Files selected for processing (8)
  • cortex/nl_parser.py (1 hunks)
  • docs/NL_PARSER.md (1 hunks)
  • tests/test_clarifier.py (1 hunks)
  • tests/test_context.py (1 hunks)
  • tests/test_intent_detection.py (1 hunks)
  • tests/test_llm_agent.py (1 hunks)
  • tests/test_nl_parser.py (1 hunks)
  • tests/test_planner.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/NL_PARSER.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_nl_parser.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/test_context.py
  • tests/test_llm_agent.py
  • tests/test_planner.py
  • tests/test_intent_detection.py
  • cortex/nl_parser.py
  • tests/test_clarifier.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_context.py
  • tests/test_llm_agent.py
  • tests/test_planner.py
  • tests/test_intent_detection.py
  • tests/test_clarifier.py
🧠 Learnings (2)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Docstrings required for all public APIs

Applied to files:

  • cortex/nl_parser.py
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Type hints required in Python code

Applied to files:

  • cortex/nl_parser.py
🧬 Code graph analysis (5)
tests/test_context.py (1)
src/intent/detector.py (1)
  • Intent (7-10)
tests/test_llm_agent.py (1)
src/intent/llm_agent.py (1)
  • process (46-89)
tests/test_planner.py (1)
src/intent/detector.py (1)
  • Intent (7-10)
tests/test_intent_detection.py (1)
src/intent/detector.py (3)
  • IntentDetector (12-49)
  • Intent (7-10)
  • detect (26-49)
tests/test_clarifier.py (1)
src/intent/detector.py (1)
  • detect (26-49)
🪛 Ruff (0.14.8)
tests/test_llm_agent.py

4-4: Unused method argument: kwargs

(ARG002)


8-8: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


22-22: Unused lambda argument: a

(ARG005)


22-22: Unused lambda argument: k

(ARG005)

🔇 Additional comments (5)
cortex/nl_parser.py (2)

173-220: LGTM: parse_request orchestrates the parsing pipeline correctly.

The function properly sequences normalization, tokenization, spell correction, dual-path intent detection, slot extraction, confidence aggregation, and clarification generation. The explanation output fulfills the PR requirement to show AI reasoning.


147-157: The original review identifies a mathematical scenario that does not actually occur in the code. When examining the parse_request function (lines 184-195), c_classifier is set to 0.95 only when both rule_int and sem_int agree. In this case, both c_rule and c_sem would typically be high (not 0.95 and 1.0 as claimed), resulting in maximum confidence approaching 0.99, not 0.78.

The scenario described in the review—"0.4 * 0.95 + 0.4 * 1.0 + 0.2 * 0.0"—incorrectly mixes the c_classifier=0.95 case (agreement) with the c_classifier=0.0 weighting pattern, which is inconsistent with how the code executes.

The actual design is sound: clarification thresholds (< 0.6 at line 161, < 0.75 at line 168) appropriately trigger when confidence is lower, and test expectations (>= 0.5) are consistent with this behavior.

Likely an incorrect or invalid review comment.

tests/test_llm_agent.py (1)

15-28: LGTM: Mock-based LLM testing approach is appropriate.

The test correctly isolates LLM behavior by injecting a mock, preventing real API calls during testing. The lambda to disable clarification is a valid pattern for controlling test behavior.

tests/test_context.py (1)

1-13: LGTM: Core SessionContext functionality is verified.

The test validates the essential state management: GPU context, intent history, and installation tracking work as expected.

tests/test_intent_detection.py (1)

1-1: LGTM!

The import statement correctly imports the necessary classes from the intent detector module.

@pavanimanchala53 force-pushed the feature/nl-parser-improvements branch from 36ad2fc to 3630728 on December 13, 2025 at 05:06
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/test_nl_parser.py (1)

14-14: This issue was already flagged in a previous review.

The expected intent value should be "unknown" instead of "ambiguous" to match the actual return value from parse_request.

🧹 Nitpick comments (2)
.github/workflows/codeql.yml (1)

85-94: Manual build placeholder will fail if enabled.

The manual build step exits with code 1 as a placeholder. Since the current build-mode is "none", this step won't execute. However, if the build mode is changed to "manual" in the future, the workflow will fail at this step.

This is acceptable as it forces explicit implementation, but consider documenting this behavior in comments or removing the step entirely since Python typically doesn't require manual builds.

tests/test_nl_parser.py (1)

34-37: Consider tightening the platform assertion.

The test accepts both "mac" and "macos" for the input "on mac", but the regex in nl_parser.py (line 125) will capture "mac" exactly as written. The assertion could be more precise:

-    assert r["slots"].get("platform") in ("mac", "macos")
+    assert r["slots"].get("platform") == "mac"

However, the current approach provides flexibility for future implementation changes, so this is optional.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36ad2fc and 3630728.

📒 Files selected for processing (3)
  • .github/workflows/codeql.yml (1 hunks)
  • nl_parser.py (1 hunks)
  • tests/test_nl_parser.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/test_nl_parser.py
  • nl_parser.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_nl_parser.py
🧠 Learnings (2)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Docstrings required for all public APIs

Applied to files:

  • nl_parser.py
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Type hints required in Python code

Applied to files:

  • nl_parser.py
🧬 Code graph analysis (1)
tests/test_nl_parser.py (1)
nl_parser.py (1)
  • parse_request (173-220)
🔇 Additional comments (9)
.github/workflows/codeql.yml (1)

14-56: LGTM!

The CodeQL workflow configuration is appropriate for the Python project. Triggers are correctly set for push/PR to main with scheduled scans, permissions are properly scoped, and the matrix configuration with build-mode: none is correct for Python (no compilation needed).

tests/test_nl_parser.py (2)

29-32: LGTM!

The test correctly verifies spell correction behavior by checking that the misspelled token appears in the corrections list and that the final intent is correctly determined after correction.


1-37: No issues found. Test coverage is 95%, which exceeds the >80% requirement specified in the coding guidelines.

nl_parser.py (6)

60-67: LGTM!

The spell correction logic correctly uses difflib.get_close_matches with a reasonable cutoff (0.75) and properly tracks whether a correction was made. The approach of returning a tuple (corrected_token, was_corrected) provides clear semantics.


85-97: LGTM!

The semantic intent matching logic correctly compares the input text against all canonical examples using fuzzy matching and selects the intent with the highest similarity score. The approach is straightforward and appropriate for the lightweight NLP requirements.


100-121: Rule-based intent detection provides good coverage.

The keyword-based rules provide a solid fallback with reasonable priority ordering (Docker+K8s > Docker > K8s > web > Python > ML). The confidence scores (0.9-0.95) are appropriately high for exact keyword matches.

Note: The kubernetes-only case (lines 109-110) was flagged in a previous review for clarification.


124-144: Slot extraction logic is well-structured.

The regex patterns correctly use word boundaries and capture groups for version numbers, platforms, and packages. The deduplication technique using dict.fromkeys() (line 142) efficiently preserves order while removing duplicates.
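
As a reference for the pattern described (a sketch; the actual regexes and package list in the module may differ):

import re

def extract_slots(text: str) -> dict[str, object]:
    """Extract python_version, platform, and packages slots from normalized text."""
    slots: dict[str, object] = {}
    version = re.search(r"\bpython\s*(\d+(?:\.\d+)?)\b", text)
    if version:
        slots["python_version"] = version.group(1)
    platform = re.search(r"\b(ubuntu|debian|fedora|arch|mac|windows)\b", text)
    if platform:
        slots["platform"] = platform.group(1)
    pkgs = re.findall(r"\b(docker|kubernetes|nginx|apache|pytorch|tensorflow)\b", text)
    if pkgs:
        # dict.fromkeys keeps first-seen order while dropping duplicates
        slots["packages"] = list(dict.fromkeys(pkgs))
    return slots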


147-157: Confidence aggregation formula is sound.

The weighted average (40% rule, 40% semantic, 20% classifier) with spell-correction penalty (10% per correction) provides reasonable confidence scoring. The penalty clamping ensures graceful degradation with multiple corrections, and the final clamping to [0.0, 1.0] prevents edge cases.
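
In code form, that aggregation is essentially the following (a sketch of the formula as described; the penalty cap value is an assumption):

def aggregate_confidence(c_rule: float, c_sem: float,
                         c_classifier: float, n_corrections: int) -> float:
    """Weighted confidence: 40% rule, 40% semantic, 20% classifier, minus 10% per correction."""
    score = 0.4 * c_rule + 0.4 * c_sem + 0.2 * c_classifier
    penalty = min(0.1 * n_corrections, 0.3)  # cap value assumed; the module clamps the penalty
    return max(0.0, min(1.0, score - penalty))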


173-220: Main parse_request workflow is well-orchestrated.

The function correctly sequences normalization, spell correction, dual intent detection (rule + semantic), intent selection logic, slot extraction, confidence aggregation, and clarifications. The intent selection logic (lines 184-195) appropriately prioritizes agreement between methods and provides sensible fallbacks.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (12)
cortex/nl_parser.py (12)

1-4: Redundant import already flagged in previous review.


7-13: Missing type annotation already flagged in previous review.


16-45: Missing type annotation and logic issues already flagged in previous reviews.


48-53: Missing docstring already flagged in previous review.


56-57: Missing docstring already flagged in previous review.


70-78: Missing docstring already flagged in previous review.


81-82: Missing docstring already flagged in previous review.


100-121: Kubernetes-only intent handling already flagged; minor: unnecessary variable alias.

Line 102 (t = text) creates an alias that doesn't add clarity. Consider using text directly.

 def rule_intent(text: str) -> Tuple[str, float]:
     """Simple keyword/rule-based detection."""
-    t = text
-
-    if "docker" in t:
-        if "kubernetes" in t or "k8s" in t or "kubectl" in t:
+    if "docker" in text:
+        if "kubernetes" in text or "k8s" in text or "kubectl" in text:
             return "install_docker_k8s", 0.95
         return "install_docker", 0.9
-
-    if "kubernetes" in t or "k8s" in t or "kubectl" in t:
+    if "kubernetes" in text or "k8s" in text or "kubectl" in text:
         return "install_docker_k8s", 0.9
-
-    if "nginx" in t or "apache" in t or "httpd" in t or "web server" in t:
+    if "nginx" in text or "apache" in text or "httpd" in text or "web server" in text:
         return "install_web_server", 0.9
-
-    if "python" in t or "venv" in t or "conda" in t or "anaconda" in t:
+    if "python" in text or "venv" in text or "conda" in text or "anaconda" in text:
         return "setup_python_env", 0.9
-
-    if any(word in t for word in ("tensorflow", "pytorch", "torch", "machine learning", "ml")):
+    if any(word in text for word in ("tensorflow", "pytorch", "torch", "machine learning", "ml")):
         return "install_ml", 0.9
-
     return "unknown", 0.0

124-126: Missing type annotations already flagged in previous review.


129-144: Missing docstring already flagged in previous review.


147-157: Missing type hints and docstring already flagged in previous review.


160-170: Missing type hints and docstring already flagged in previous review.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3630728 and 94ea467.

📒 Files selected for processing (2)
  • cortex/nl_parser.py (1 hunks)
  • docs/NL_PARSER.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/NL_PARSER.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/nl_parser.py
🧠 Learnings (2)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Docstrings required for all public APIs

Applied to files:

  • cortex/nl_parser.py
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Type hints required in Python code

Applied to files:

  • cortex/nl_parser.py
🔇 Additional comments (5)
cortex/nl_parser.py (5)

60-67: LGTM!

The function has type hints and a brief docstring. The spell correction logic using get_close_matches with a 0.75 cutoff is reasonable for typo tolerance.


85-97: LGTM!

The semantic matching logic is straightforward and correct. The function properly returns the best-matching intent with its score, allowing callers to apply their own thresholds.


184-195: LGTM with note on design choice.

The intent resolution prioritizes rule-based detection over semantic matching, which is a reasonable design for reliability. When both methods agree, the confidence bonus (c_classifier=0.95) effectively rewards consensus.


205-211: LGTM!

The explanation string provides good transparency into the parser's reasoning, showing spell corrections and both rule-based and semantic scores. This meets the PR objective of displaying reasoning to the user.


213-220: LGTM!

The return structure is well-organized with all required fields for the cortex install flow: intent, confidence, explanation, slots, corrections, and clarifications.

pavanimanchala53 (Author) commented


Hi @mikejmorgan-ai,

This branch is ready for review. Please let me know if everything looks good so we can proceed with the merge.

Thanks!
