

Vikaspal8923
Contributor

@Vikaspal8923 Vikaspal8923 commented Jun 23, 2025

Details

Implemented the "Structured Output Compliance" evaluation metric, which validates model outputs as JSON/JSON-LD and returns a boolean result plus a "reason."
This extends the LLM-as-a-judge evaluation in both the frontend (Online Evaluation tab) and the Python SDK.
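To make the "boolean result plus a reason" shape concrete, here is a minimal deterministic sketch using only the standard library. It is not the PR's LLM-as-a-judge implementation; the helper name `check_json_output` is hypothetical, and it only checks that the output parses as JSON (it does not interpret JSON-LD semantics such as `@context`).

```python
import json
from typing import List, Tuple


def check_json_output(output: str) -> Tuple[bool, List[str]]:
    """Hypothetical helper: report whether `output` parses as JSON, plus reasons."""
    try:
        json.loads(output)
    except json.JSONDecodeError as exc:
        # Surface the parser's message and position so the reason is actionable.
        return False, [f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    return True, ["Output parses as valid JSON."]


ok, reasons = check_json_output('{"name": "John", "age": 30}')
bad, bad_reasons = check_json_output('{"name": "John", "age": }')
```

An LLM judge can go further than this (for example, checking the output against a schema described in natural language), which is presumably why the metric is built on the judge rather than on a plain parser.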

Change checklist

  • Code follows repository coding style
  • Added/updated unit and integration tests
  • Documentation updated where needed

Issues

Closes #2558
Resolves #2528
/claim #2528

Testing

  • Verified that the metric appears and works in the Online Evaluation tab (frontend)
  • Ran SDK unit + integration tests with sample structured output inputs

Documentation

  • Updated /docs/evaluation/metrics/structure_output_compliance.mdx with new metric details
  • Updated changelog with feature entry
  • Added usage example in SDK docs

Demo

Video:
https://github.com/user-attachments/assets/47ffd3e9-6642-4678-9e72-87765c747bac

@Vikaspal8923 Vikaspal8923 requested a review from a team as a code owner June 23, 2025 14:40
@aadereiko aadereiko self-assigned this Jun 25, 2025
@Vikaspal8923
Contributor Author

@vincentkoc Could I get a review on this if there is no update on the first PR raised by the other contributor?

@andrescrz
Member

Hi @Vikaspal8923

Apologies for the delay in reviewing this PR. Could you please resolve the current conflicts? I’ll make sure someone reviews it as soon as possible once the conflicts are addressed.

Thank you for your patience!

@Vikaspal8923
Contributor Author

Vikaspal8923 commented Jul 28, 2025

@andrescrz Sorry for the delay from my side. I will resolve the conflict soon.

@Vikaspal8923
Contributor Author

@andrescrz I have resolved the conflicts. You can review now; let me know if any changes are needed and I will address them.

@Vikaspal8923
Contributor Author

Hi @andrescrz, sorry to message again, but could I get a review on my PR?

Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces a new "Structured Output Compliance" evaluation metric that validates whether LLM outputs conform to an expected JSON schema or are at least valid JSON. The metric uses an LLM-as-a-judge approach and is integrated into both the Python SDK and the frontend UI for online evaluations.

Key changes:

  • Implementation of the StructuredOutputCompliance metric in the Python SDK with template, parser, and metric components
  • Frontend integration adding the metric to LLM judge options and UI templates
  • Documentation for the new metric with usage examples

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `sdks/python/src/opik/evaluation/metrics/llm_judges/structure_output_compliance/template.py` | Defines the prompt template and query generation for structured output validation |
| `sdks/python/src/opik/evaluation/metrics/llm_judges/structure_output_compliance/parser.py` | Parses LLM output and validates the response format for the compliance metric |
| `sdks/python/src/opik/evaluation/metrics/llm_judges/structure_output_compliance/metric.py` | Main metric implementation with sync/async scoring methods |
| `sdks/python/src/opik/evaluation/metrics/__init__.py` | Exports the new StructuredOutputCompliance metric |
| `sdks/python/examples/metrics.py` | Adds usage example for the new metric |
| `apps/opik-frontend/src/types/llm.ts` | Adds `structure_compliance` to the `LLM_JUDGE` enum |
| `apps/opik-frontend/src/constants/llm.ts` | Defines frontend template configuration for structured output compliance |
| `apps/opik-documentation/documentation/fern/docs/evaluation/metrics/structure_output_compliance.mdx` | Documentation for the new metric with examples and usage |
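The parser.py entry describes parsing the judge's reply and validating its format. A minimal sketch of that step follows, assuming the judge is asked to answer with a JSON object of the form `{"score": <bool>, "reason": [<str>, ...]}`. Those field names and the `ParsedVerdict` / `parse_judge_response` names are hypothetical stand-ins, not the PR's actual parser or the SDK's result type.

```python
import json
from dataclasses import dataclass


@dataclass
class ParsedVerdict:
    """Hypothetical result container, standing in for whatever type the SDK returns."""

    value: float
    reason: str


def parse_judge_response(content: str) -> ParsedVerdict:
    """Parse an assumed judge reply of the form {"score": <bool>, "reason": [<str>, ...]}."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("Judge response must be a JSON object")
    score = data.get("score")
    reasons = data.get("reason", [])
    if not isinstance(score, bool):
        raise ValueError(f"Expected a boolean 'score', got {score!r}")
    if not isinstance(reasons, list) or not all(isinstance(r, str) for r in reasons):
        raise ValueError("Expected 'reason' to be a list of strings")
    # Map the boolean verdict onto a numeric score: compliant -> 1.0, otherwise 0.0.
    return ParsedVerdict(value=1.0 if score else 0.0, reason=" ".join(reasons))
```

With this mapping a compliant output scores 1.0, which is consistent with the `assert result.value > 0.5` check in the integration test requested later in the thread. Rejecting malformed replies with `ValueError` keeps a bad judge response from being silently scored as compliant.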

@yaricom
Contributor

yaricom commented Aug 7, 2025

Hi @Vikaspal8923! Thank you for your contribution! Please fix the linter errors:

You need to install pre-commit as described in the Contribution guide and run it locally:

```bash
cd sdks/python
pre-commit run --all-files
```

@yaricom
Contributor

yaricom commented Aug 7, 2025

@Vikaspal8923 The metric implementation looks very promising. However, you should add unit and integration tests to ensure reliability. I’ve left comments highlighting the areas where tests are needed.

Please refer to how other metrics are covered in Opik.

@Vikaspal8923
Contributor Author

@yaricom Sure.

Vikaspal8923 and others added 4 commits August 8, 2025 00:08
@yaricom
Contributor

yaricom commented Aug 28, 2025

Hi @Vikaspal8923 ! You can register and get your own OpenAI API key at https://platform.openai.com

@Vikaspal8923
Contributor Author

@yaricom, I have run all the integration tests and fixed the test failures. Can you take a look and let me know if any further changes are needed?
Screenshot: [image]

@yaricom
Contributor

yaricom commented Aug 29, 2025

@Vikaspal8923
Contributor Author

@yaricom Are there any changes I need to address, or is it ready now?

@yaricom
Contributor

yaricom commented Sep 2, 2025

@Vikaspal8923 Please fix this error:

https://github.com/comet-ml/opik/actions/runs/17410830550/job/49427360921?pr=2554

### 📋 PR Linter Failed

❌ **Invalid Title Format.** Your PR title must include a ticket/issue number and may optionally include component tags (`[FE]`, `[BE]`, etc.).

  - **Internal contributors: Open a JIRA ticket and link to it:** `[OPIK-xxxx] [COMPONENT] Your change`
  - **External contributors: Open a Github Issue and link to it via its number:** `[issue-xxxx] [COMPONENT] Your change`

  *Example: `[issue-3108] [BE] [FE] Fix authentication bug` or `[OPIK-1234] Fix bug`*

---

❌ **Missing Section.** The description is missing the `## Change checklist` section.

---

❌ **Missing Section.** The description is missing the `## Issues` section.

---

❌ **Missing Section.** The description is missing the `## Testing` section.

---

❌ **Missing Section.** The description is missing the `## Documentation` section.
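The title rule and required sections in the linter output can be approximated locally before pushing. This is a minimal sketch, assuming the format is exactly the one quoted (an `[OPIK-n]` or `[issue-n]` prefix, optional upper-case component tags, then a description) and that sections are detected by their literal `## ` headings; it is not the linter's actual implementation, and `is_valid_pr_title` and `missing_sections` are hypothetical names.

```python
import re
from typing import List

# Hypothetical approximation of the quoted title rule; not the real PR linter.
TITLE_PATTERN = re.compile(
    r"^\[(?:OPIK-\d+|issue-\d+)\]"  # required JIRA ticket or GitHub issue reference
    r"(?: \[[A-Z]+\])*"             # zero or more component tags such as [FE], [BE], [SDK]
    r" (?!\[)\S.*$"                 # a non-empty description that is not just another tag
)

REQUIRED_SECTIONS = ("## Change checklist", "## Issues", "## Testing", "## Documentation")


def is_valid_pr_title(title: str) -> bool:
    return TITLE_PATTERN.match(title) is not None


def missing_sections(description: str) -> List[str]:
    # Literal substring check for each required markdown heading.
    return [section for section in REQUIRED_SECTIONS if section not in description]
```

Under this approximation, the original title "add evaluation structure output compliance" fails, while "[issue-2528] [SDK] Add Structured Output Compliance evaluation metric" passes.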

@Vikaspal8923 Vikaspal8923 changed the title add evaluation structure output compliance [issue-2554] [SDK] Add Structured Output Compliance evaluation metric Sep 3, 2025
@Vikaspal8923 Vikaspal8923 changed the title [issue-2554] [SDK] Add Structured Output Compliance evaluation metric [issue-2558] [SDK] Add Structured Output Compliance evaluation metric Sep 3, 2025
@Vikaspal8923 Vikaspal8923 changed the title [issue-2558] [SDK] Add Structured Output Compliance evaluation metric [issue-2528] [SDK] Add Structured Output Compliance evaluation metric Sep 3, 2025
@Vikaspal8923
Contributor Author

@yaricom, I’ve updated the PR title and description to follow the required format. Please recheck and let me know if any issues persist.

@yaricom
Contributor

yaricom commented Sep 3, 2025

@Vikaspal8923 Please add an integration test that checks a JSON schema, as you mentioned in the example usage.

```python
from opik.evaluation import metrics

# model_parametrizer and assert_helpers are assumed to come from the surrounding
# Opik integration-test module (they are not defined in this snippet).


@model_parametrizer
def test__structured_output_compliance__with_json_schema(model):
    """Test structured output compliance with schema validation."""
    structured_output_metric = metrics.StructuredOutputCompliance(
        model=model, track=False
    )
    schema = '{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}, "required": ["name", "age"]}'

    result = structured_output_metric.score(
        output='{"name": "John", "age": 30}', schema=schema
    )

    assert_helpers.assert_score_result(result)
    assert result.value > 0.5
```
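The requested test delegates the compliance judgment to an LLM. As a deterministic cross-check of the same fixture, here is a hand-rolled validator for only the small JSON Schema subset that fixture uses (`"type": "object"`, `properties` of type `string` or `integer`, and `required`). It is a sketch under that assumption, not a full JSON Schema implementation and not part of the PR; `schema_violations` is a hypothetical name, and a dedicated validator such as the `jsonschema` package would be the usual choice for real schemas.

```python
import json
from typing import Any, List

# Type checks for the only property types the test fixture uses.
_PROPERTY_TYPES = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def schema_violations(output: str, schema: str) -> List[str]:
    """Return violations of a tiny JSON Schema subset; [] means the output complies."""
    try:
        instance: Any = json.loads(output)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc.msg}"]
    spec = json.loads(schema)
    # Only object-typed schemas are in scope for this sketch.
    if not isinstance(instance, dict):
        return ["expected a JSON object"]
    problems = [
        f"missing required property '{key}'"
        for key in spec.get("required", [])
        if key not in instance
    ]
    for key, prop in spec.get("properties", {}).items():
        check = _PROPERTY_TYPES[prop["type"]]
        if key in instance and not check(instance[key]):
            problems.append(f"property '{key}' is not of type {prop['type']}")
    return problems


# Same schema string as in the integration test above.
SCHEMA = ('{"type": "object", "properties": {"name": {"type": "string"}, '
          '"age": {"type": "integer"}}, "required": ["name", "age"]}')
```

For the test's output `{"name": "John", "age": 30}` this returns an empty list, matching the expectation that the judge scores it above 0.5.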

@Vikaspal8923
Contributor Author

@yaricom Added 👍.
Screenshot 2025-09-04 121536

```python
import logging

import pydantic

LOGGER = logging.getLogger(__name__)


class StructuredOutputComplianceResponseFormat(pydantic.BaseModel):
    ...  # fields are not shown in this review excerpt
```
Contributor

@yaricom yaricom Sep 4, 2025


Please move this class (StructuredOutputComplianceResponseFormat) into schema.py module.

@Vikaspal8923
Contributor Author

@yaricom Done 👍.

@yaricom yaricom merged commit 8d21105 into comet-ml:main Sep 5, 2025
@yaricom
Contributor

yaricom commented Sep 5, 2025

@Vikaspal8923 Thank you for your contribution!

Development

Successfully merging this pull request may close these issues.

[FR]: New Evaluaton Metric "Structured Output Compliance"
9 participants