[issue-2528] [SDK] Add Structured Output Compliance evaluation metric #2554
Conversation
@vincentkoc Can I get a review on this, since there has been no update on the first PR raised by the other contributor?
Apologies for the delay in reviewing this PR. Could you please resolve the current conflicts? I'll make sure someone reviews it as soon as possible once the conflicts are addressed. Thank you for your patience!
@andrescrz Sorry for the delay on my side. I will resolve the conflicts soon.
@andrescrz I have resolved the conflicts. You can review now and let me know if any changes are needed in the PR; I'd be happy to address them.
Hi @andrescrz, sorry to message again, but could I get a review on my PR?
Pull Request Overview
This PR introduces a new "Structured Output Compliance" evaluation metric that validates whether LLM outputs conform to an expected JSON schema or to valid JSON format in general. The metric uses an LLM-as-a-judge approach and is integrated into both the Python SDK and the frontend UI for online evaluations (a minimal usage sketch follows the change list below).
Key changes:
- Implementation of the StructuredOutputCompliance metric in the Python SDK with template, parser, and metric components
- Frontend integration adding the metric to LLM judge options and UI templates
- Documentation for the new metric with usage examples
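For orientation, here is a minimal usage sketch, assuming the class name, `track` parameter, and `score()` signature shown in the integration test later in this thread; the `reason` attribute is inferred from the PR description and is not confirmed by the diff:

```python
# Minimal sketch, assuming StructuredOutputCompliance is exported from
# opik.evaluation.metrics (per the __init__.py change in the file list below)
# and that score() accepts `output` plus an optional `schema`.
from opik.evaluation.metrics import StructuredOutputCompliance

metric = StructuredOutputCompliance(track=False)

# Without a schema, the judge only checks that the output is valid JSON.
result = metric.score(output='{"name": "John", "age": 30}')

# The PR description says the metric returns a boolean result plus a "reason";
# the exact attribute names here are an assumption.
print(result.value, result.reason)
```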
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
sdks/python/src/opik/evaluation/metrics/llm_judges/structure_output_compliance/template.py | Defines the prompt template and query generation for structured output validation |
sdks/python/src/opik/evaluation/metrics/llm_judges/structure_output_compliance/parser.py | Parses LLM output and validates the response format for the compliance metric |
sdks/python/src/opik/evaluation/metrics/llm_judges/structure_output_compliance/metric.py | Main metric implementation with sync/async scoring methods |
sdks/python/src/opik/evaluation/metrics/init.py | Exports the new StructuredOutputCompliance metric |
sdks/python/examples/metrics.py | Adds usage example for the new metric |
apps/opik-frontend/src/types/llm.ts | Adds structure_compliance to LLM_JUDGE enum |
apps/opik-frontend/src/constants/llm.ts | Defines frontend template configuration for structured output compliance |
apps/opik-documentation/documentation/fern/docs/evaluation/metrics/structure_output_compliance.mdx | Documentation for the new metric with examples and usage |
Hi @Vikaspal8923! Thank you for your contribution! Please fix the linter errors. You need to install pre-commit as described in the Contribution guide and run it locally:

```bash
cd sdks/python
pre-commit run --all-files
```
@Vikaspal8923 The metric implementation looks very promising. However, you should add unit and integration tests to ensure reliability. I've left comments highlighting the areas where tests are needed. Please refer to how other metrics are covered in Opik.
@yaricom sure
Hi @Vikaspal8923! You can register and get your own OpenAI API key at https://platform.openai.com
@yaricom, I have run all the integration tests and fixed the test failures. Can you take a look and let me know if there are any further changes?
@yaricom Are there any changes I need to address, or is it ready now?
@Vikaspal8923 Please fix this error: https://github.com/comet-ml/opik/actions/runs/17410830550/job/49427360921?pr=2554
@yaricom, I've updated the PR title and description to follow the required format. Please recheck it and let me know if any issues persist.
@Vikaspal8923 Please add an integration test that checks a JSON schema, as you mentioned in the example usage:

```python
@model_parametrizer
def test__structured_output_compliance__with_json_schema(model):
    """Test structured output compliance with schema validation."""
    structured_output_metric = metrics.StructuredOutputCompliance(
        model=model, track=False
    )
    schema = '{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}, "required": ["name", "age"]}'
    result = structured_output_metric.score(
        output='{"name": "John", "age": 30}', schema=schema
    )
    assert_helpers.assert_score_result(result)
    assert result.value > 0.5
```
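A hedged companion sketch, not part of the review itself: it mirrors the test above for the negative case, on the assumption that a non-compliant output scores below the same 0.5 threshold.

```python
# Hypothetical negative case; assumes the judge assigns a low score when a
# required field is missing from the output. Uses the same `metrics` and
# `assert_helpers` imports as the test above.
@model_parametrizer
def test__structured_output_compliance__with_json_schema__non_compliant(model):
    """An output missing a required field should be judged non-compliant."""
    structured_output_metric = metrics.StructuredOutputCompliance(
        model=model, track=False
    )
    schema = '{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}, "required": ["name", "age"]}'
    result = structured_output_metric.score(
        output='{"name": "John"}',  # "age" is missing
        schema=schema,
    )
    assert_helpers.assert_score_result(result)
    assert result.value < 0.5
```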
@yaricom Added 👍.
```python
LOGGER = logging.getLogger(__name__)


class StructuredOutputComplianceResponseFormat(pydantic.BaseModel):
```
Please move this class (`StructuredOutputComplianceResponseFormat`) into the `schema.py` module.
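A minimal sketch of what that move might look like; the field names are a guess based on the PR details ("a boolean result plus a 'reason'") and are not confirmed by the diff:

```python
# Hypothetical schema.py after the suggested move; field names assumed from
# the PR description, not from the actual implementation.
import pydantic


class StructuredOutputComplianceResponseFormat(pydantic.BaseModel):
    score: bool
    reason: str
```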
@yaricom Done 👍.
@Vikaspal8923 Thank you for your contribution!
Details
Implemented the "Structured Output Compliance" evaluation metric, which validates model outputs as JSON/JSON-LD and returns a boolean result plus a "reason."
This extends the LLM-as-a-judge evaluation in both the frontend (Online Evaluation tab) and the Python SDK.
Change checklist
Issues
Closes #2558
Resolves #2528
/claim #2528
Testing
Documentation
Added /docs/evaluation/metrics/structure_output_compliance.mdx with new metric details.
Demo
Video:
https://github.com/user-attachments/assets/47ffd3e9-6642-4678-9e72-87765c747bac