diff --git a/docs/ai/conceptual/evaluation-libraries.md b/docs/ai/conceptual/evaluation-libraries.md index bb5be136cb82e..9fb92323e0833 100644 --- a/docs/ai/conceptual/evaluation-libraries.md +++ b/docs/ai/conceptual/evaluation-libraries.md @@ -4,9 +4,9 @@ description: Learn about the Microsoft.Extensions.AI.Evaluation libraries, which ms.topic: concept-article ms.date: 05/13/2025 --- -# The Microsoft.Extensions.AI.Evaluation libraries (Preview) +# The Microsoft.Extensions.AI.Evaluation libraries -The Microsoft.Extensions.AI.Evaluation libraries (currently in preview) simplify the process of evaluating the quality and accuracy of responses generated by AI models in .NET intelligent apps. Various metrics measure aspects like relevance, truthfulness, coherence, and completeness of the responses. Evaluations are crucial in testing, because they help ensure that the AI model performs as expected and provides reliable and accurate results. +The Microsoft.Extensions.AI.Evaluation libraries simplify the process of evaluating the quality and accuracy of responses generated by AI models in .NET intelligent apps. Various metrics measure aspects like relevance, truthfulness, coherence, and completeness of the responses. Evaluations are crucial in testing, because they help ensure that the AI model performs as expected and provides reliable and accurate results. The evaluation libraries, which are built on top of the [Microsoft.Extensions.AI abstractions](../microsoft-extensions-ai.md), are composed of the following NuGet packages: @@ -31,16 +31,16 @@ You can also customize to add your own evaluations by implementing the | -| `Completeness` | Evaluates how comprehensive and accurate a response is | | -| `Retrieval` | Evaluates performance in retrieving information for additional context | | -| `Fluency` | Evaluates grammatical accuracy, vocabulary range, sentence complexity, and overall readability| | -| `Coherence` | Evaluates the logical and orderly presentation of ideas | | -| `Equivalence` | Evaluates the similarity between the generated text and its ground truth with respect to a query | | -| `Groundedness` | Evaluates how well a generated response aligns with the given context | | -| `Relevance (RTC)`, `Truth (RTC)`, and `Completeness (RTC)` | Evaluates how relevant, truthful, and complete a response is | † | +| Evaluator type | Metric | Description | +|----------------------------------------------------------------------|-------------|-------------| +| | `Relevance` | Evaluates how relevant a response is to a query | +| | `Completeness` | Evaluates how comprehensive and accurate a response is | +| | `Retrieval` | Evaluates performance in retrieving information for additional context | +| | `Fluency` | Evaluates grammatical accuracy, vocabulary range, sentence complexity, and overall readability| +| | `Coherence` | Evaluates the logical and orderly presentation of ideas | +| | `Equivalence` | Evaluates the similarity between the generated text and its ground truth with respect to a query | +| | `Groundedness` | Evaluates how well a generated response aligns with the given context | +| † | `Relevance (RTC)`, `Truth (RTC)`, and `Completeness (RTC)` | Evaluates how relevant, truthful, and complete a response is | † This evaluator is marked [experimental](../../fundamentals/syslib-diagnostics/experimental-overview.md). @@ -48,17 +48,17 @@ Quality evaluators measure response quality. They use an LLM to perform the eval Safety evaluators check for presence of harmful, inappropriate, or unsafe content in a response. They rely on the Azure AI Foundry Evaluation service, which uses a model that's fine tuned to perform evaluations. -| Metric | Description | Evaluator type | -|--------------------|-----------------------------------------------------------------------|------------------------------| -| `Groundedness Pro` | Uses a fine-tuned model hosted behind the Azure AI Foundry Evaluation service to evaluate how well a generated response aligns with the given context | | -| `Protected Material` | Evaluates response for the presence of protected material | | -| `Ungrounded Attributes` | Evaluates a response for the presence of content that indicates ungrounded inference of human attributes | | -| `Hate And Unfairness` | Evaluates a response for the presence of content that's hateful or unfair | † | -| `Self Harm` | Evaluates a response for the presence of content that indicates self harm | † | -| `Violence` | Evaluates a response for the presence of violent content | † | -| `Sexual` | Evaluates a response for the presence of sexual content | † | -| `Code Vulnerability` | Evaluates a response for the presence of vulnerable code | | -| `Indirect Attack` | Evaluates a response for the presence of indirect attacks, such as manipulated content, intrusion, and information gathering | | +| Evaluator type | Metric | Description | +|---------------------------------------------------------------------------|--------------------|-------------| +| | `Groundedness Pro` | Uses a fine-tuned model hosted behind the Azure AI Foundry Evaluation service to evaluate how well a generated response aligns with the given context | +| | `Protected Material` | Evaluates response for the presence of protected material | +| | `Ungrounded Attributes` | Evaluates a response for the presence of content that indicates ungrounded inference of human attributes | +| † | `Hate And Unfairness` | Evaluates a response for the presence of content that's hateful or unfair | +| † | `Self Harm` | Evaluates a response for the presence of content that indicates self harm | +| † | `Violence` | Evaluates a response for the presence of violent content | +| † | `Sexual` | Evaluates a response for the presence of sexual content | +| | `Code Vulnerability` | Evaluates a response for the presence of vulnerable code | +| | `Indirect Attack` | Evaluates a response for the presence of indirect attacks, such as manipulated content, intrusion, and information gathering | † In addition, the provides single-shot evaluation for the four metrics supported by `HateAndUnfairnessEvaluator`, `SelfHarmEvaluator`, `ViolenceEvaluator`, and `SexualEvaluator`. diff --git a/docs/ai/quickstarts/build-chat-app.md b/docs/ai/quickstarts/build-chat-app.md index fa15f67858db4..572a399e990e5 100644 --- a/docs/ai/quickstarts/build-chat-app.md +++ b/docs/ai/quickstarts/build-chat-app.md @@ -14,9 +14,6 @@ zone_pivot_groups: openai-library In this quickstart, you learn how to create a conversational .NET console chat app using an OpenAI or Azure OpenAI model. The app uses the library so you can write code using AI abstractions rather than a specific SDK. AI abstractions enable you to change the underlying AI model with minimal code changes. -> [!NOTE] -> The [`Microsoft.Extensions.AI`](https://www.nuget.org/packages/Microsoft.Extensions.AI/) library is currently in Preview. - :::zone target="docs" pivot="openai" [!INCLUDE [openai-prereqs](includes/prerequisites-openai.md)] diff --git a/docs/ai/quickstarts/evaluate-ai-response.md b/docs/ai/quickstarts/evaluate-ai-response.md index e515cc9b7caff..c2a84bf605dad 100644 --- a/docs/ai/quickstarts/evaluate-ai-response.md +++ b/docs/ai/quickstarts/evaluate-ai-response.md @@ -1,19 +1,17 @@ --- -title: Quickstart - Evaluate a model's response +title: Quickstart - Evaluate the quality of a model's response description: Learn how to create an MSTest app to evaluate the AI chat response of a language model. ms.date: 03/18/2025 ms.topic: quickstart ms.custom: devx-track-dotnet, devx-track-dotnet-ai --- -# Evaluate a model's response +# Evaluate the quality of a model's response -In this quickstart, you create an MSTest app to evaluate the chat response of an OpenAI model. The test app uses the [Microsoft.Extensions.AI.Evaluation](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation) libraries. +In this quickstart, you create an MSTest app to evaluate the quality of a chat response from an OpenAI model. The test app uses the [Microsoft.Extensions.AI.Evaluation](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation) libraries. > [!NOTE] -> -> - The `Microsoft.Extensions.AI.Evaluation` library is currently in Preview. -> - This quickstart demonstrates the simplest usage of the evaluation API. Notably, it doesn't demonstrate use of the [response caching](../conceptual/evaluation-libraries.md#cached-responses) and [reporting](../conceptual/evaluation-libraries.md#reporting) functionality, which are important if you're authoring unit tests that run as part of an "offline" evaluation pipeline. The scenario shown in this quickstart is suitable in use cases such as "online" evaluation of AI responses within production code and logging scores to telemetry, where caching and reporting aren't relevant. For a tutorial that demonstrates the caching and reporting functionality, see [Tutorial: Evaluate a model's response with response caching and reporting](../tutorials/evaluate-with-reporting.md) +> This quickstart demonstrates the simplest usage of the evaluation API. Notably, it doesn't demonstrate use of the [response caching](../conceptual/evaluation-libraries.md#cached-responses) and [reporting](../conceptual/evaluation-libraries.md#reporting) functionality, which are important if you're authoring unit tests that run as part of an "offline" evaluation pipeline. The scenario shown in this quickstart is suitable in use cases such as "online" evaluation of AI responses within production code and logging scores to telemetry, where caching and reporting aren't relevant. For a tutorial that demonstrates the caching and reporting functionality, see [Tutorial: Evaluate a model's response with response caching and reporting](../tutorials/evaluate-with-reporting.md) ## Prerequisites @@ -39,9 +37,9 @@ Complete the following steps to create an MSTest project that connects to the `g ```dotnetcli dotnet add package Azure.AI.OpenAI dotnet add package Azure.Identity - dotnet add package Microsoft.Extensions.AI.Abstractions --prerelease - dotnet add package Microsoft.Extensions.AI.Evaluation --prerelease - dotnet add package Microsoft.Extensions.AI.Evaluation.Quality --prerelease + dotnet add package Microsoft.Extensions.AI.Abstractions + dotnet add package Microsoft.Extensions.AI.Evaluation + dotnet add package Microsoft.Extensions.AI.Evaluation.Quality dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease dotnet add package Microsoft.Extensions.Configuration dotnet add package Microsoft.Extensions.Configuration.UserSecrets @@ -51,9 +49,9 @@ Complete the following steps to create an MSTest project that connects to the `g ```bash dotnet user-secrets init - dotnet user-secrets set AZURE_OPENAI_ENDPOINT + dotnet user-secrets set AZURE_OPENAI_ENDPOINT dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o - dotnet user-secrets set AZURE_TENANT_ID + dotnet user-secrets set AZURE_TENANT_ID ``` (Depending on your environment, the tenant ID might not be needed. In that case, remove it from the code that instantiates the .) diff --git a/docs/ai/quickstarts/prompt-model.md b/docs/ai/quickstarts/prompt-model.md index 0e171a3245960..ba72b52831b58 100644 --- a/docs/ai/quickstarts/prompt-model.md +++ b/docs/ai/quickstarts/prompt-model.md @@ -14,9 +14,6 @@ zone_pivot_groups: openai-library In this quickstart, you learn how to create a .NET console chat app to connect to and prompt an OpenAI or Azure OpenAI model. The app uses the library so you can write code using AI abstractions rather than a specific SDK. AI abstractions enable you to change the underlying AI model with minimal code changes. -> [!NOTE] -> The library is currently in Preview. - :::zone target="docs" pivot="openai" [!INCLUDE [openai-prereqs](includes/prerequisites-openai.md)] diff --git a/docs/ai/quickstarts/structured-output.md b/docs/ai/quickstarts/structured-output.md index e4b9656ca1a1a..b43f79695c3dc 100644 --- a/docs/ai/quickstarts/structured-output.md +++ b/docs/ai/quickstarts/structured-output.md @@ -10,9 +10,6 @@ ms.custom: devx-track-dotnet, devx-track-dotnet-ai In this quickstart, you create a chat app that requests a response with *structured output*. A structured output response is a chat response that's of a type you specify instead of just plain text. The chat app you create in this quickstart analyzes sentiment of various product reviews, categorizing each review according to the values of a custom enumeration. -> [!NOTE] -> The library, which is used in this quickstart, is currently in Preview. - ## Prerequisites - [.NET 8 or a later version](https://dotnet.microsoft.com/download) @@ -37,7 +34,7 @@ Complete the following steps to create a console app that connects to the `gpt-4 ```dotnetcli dotnet add package Azure.AI.OpenAI dotnet add package Azure.Identity - dotnet add package Microsoft.Extensions.AI --prerelease + dotnet add package Microsoft.Extensions.AI dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease dotnet add package Microsoft.Extensions.Configuration dotnet add package Microsoft.Extensions.Configuration.UserSecrets diff --git a/docs/ai/quickstarts/use-function-calling.md b/docs/ai/quickstarts/use-function-calling.md index 31c7b2443a5bd..ae976624c97d7 100644 --- a/docs/ai/quickstarts/use-function-calling.md +++ b/docs/ai/quickstarts/use-function-calling.md @@ -14,9 +14,6 @@ zone_pivot_groups: openai-library In this quickstart, you create a .NET console AI chat app to connect to an AI model with local function calling enabled. The app uses the library so you can write code using AI abstractions rather than a specific SDK. AI abstractions enable you to change the underlying AI model with minimal code changes. -> [!NOTE] -> The [`Microsoft.Extensions.AI`](https://www.nuget.org/packages/Microsoft.Extensions.AI/) library is currently in Preview. - :::zone target="docs" pivot="openai" [!INCLUDE [openai-prereqs](includes/prerequisites-openai.md)] @@ -54,7 +51,7 @@ Complete the following steps to create a .NET console app to connect to an AI mo ```bash dotnet add package Azure.Identity dotnet add package Azure.AI.OpenAI - dotnet add package Microsoft.Extensions.AI --prerelease + dotnet add package Microsoft.Extensions.AI dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease dotnet add package Microsoft.Extensions.Configuration dotnet add package Microsoft.Extensions.Configuration.UserSecrets @@ -65,7 +62,7 @@ Complete the following steps to create a .NET console app to connect to an AI mo :::zone target="docs" pivot="openai" ```bash - dotnet add package Microsoft.Extensions.AI --prerelease + dotnet add package Microsoft.Extensions.AI dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease dotnet add package Microsoft.Extensions.Configuration dotnet add package Microsoft.Extensions.Configuration.UserSecrets diff --git a/docs/ai/toc.yml b/docs/ai/toc.yml index fd8dc4d362465..752adc8a2887b 100644 --- a/docs/ai/toc.yml +++ b/docs/ai/toc.yml @@ -81,9 +81,11 @@ items: items: - name: The Microsoft.Extensions.AI.Evaluation libraries href: conceptual/evaluation-libraries.md - - name: "Quickstart: Evaluate a model's response" + - name: "Quickstart: Evaluate the quality of a response" href: quickstarts/evaluate-ai-response.md - - name: "Tutorial: Evaluate a response with response caching and reporting" + - name: "Tutorial: Evaluate the safety of a response" + href: tutorials/evaluate-safety.md + - name: "Tutorial: Evaluate a response with caching and reporting" href: tutorials/evaluate-with-reporting.md - name: Resources items: diff --git a/docs/ai/tutorials/evaluate-safety.md b/docs/ai/tutorials/evaluate-safety.md new file mode 100644 index 0000000000000..af9ef21a4f304 --- /dev/null +++ b/docs/ai/tutorials/evaluate-safety.md @@ -0,0 +1,154 @@ +--- +title: Tutorial - Evaluate the content safety of a model's response +description: Create an MSTest app that evaluates the content safety of a model's response using the evaluators in the Microsoft.Extensions.AI.Evaluation.Safety package. +ms.date: 05/12/2025 +ms.topic: tutorial +ms.custom: devx-track-dotnet-ai +--- + +# Tutorial: Evaluate the content safety of a model's response + +In this tutorial, you create an MSTest app to evaluate the *content safety* of a response from an OpenAI model. Safety evaluators check for presence of harmful, inappropriate, or unsafe content in a response. The test app uses the safety evaluators from the [Microsoft.Extensions.AI.Evaluation.Safety](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation.Safety) package to perform the evaluations. These safety evaluators use the [Azure AI Foundry](/azure/ai-foundry/) Evaluation service to perform evaluations. + +## Prerequisites + +- .NET 8.0 SDK or higher - [Install the .NET 8 SDK](https://dotnet.microsoft.com/download/dotnet/8.0). +- An Azure subscription - [Create one for free](https://azure.microsoft.com/free). + +## Configure the AI service + +To provision an Azure OpenAI service and model using the Azure portal, complete the steps in the [Create and deploy an Azure OpenAI Service resource](/azure/ai-services/openai/how-to/create-resource?pivots=web-portal) article. In the "Deploy a model" step, select the `gpt-4o` model. + +> [!TIP] +> The previous configuration step is only required to fetch the response to be evaluated. To evaluate the safety of a response you already have in hand, you can skip this configuration. + +The evaluators in this tutorial use the Azure AI Foundry Evaluation service, which requires some additional setup: + +- [Create a resource group](/azure/azure-resource-manager/management/manage-resource-groups-portal#create-resource-groups) within one of the Azure [regions that support Azure AI Foundry Evaluation service](/azure/ai-foundry/how-to/develop/evaluate-sdk#region-support). +- [Create an Azure AI Foundry hub](/azure/ai-foundry/how-to/create-azure-ai-resource?tabs=portal#create-a-hub-in-azure-ai-foundry-portal) in the resource group you just created. +- Finally, [create an Azure AI Foundry project](/azure/ai-foundry/how-to/create-projects?tabs=ai-studio#create-a-project) in the hub you just created. + +## Create the test app + +Complete the following steps to create an MSTest project. + +1. In a terminal window, navigate to the directory where you want to create your app, and create a new MSTest app with the `dotnet new` command: + + ```dotnetcli + dotnet new mstest -o EvaluateResponseSafety + ``` + +1. Navigate to the `EvaluateResponseSafety` directory, and add the necessary packages to your app: + + ```dotnetcli + dotnet add package Azure.AI.OpenAI + dotnet add package Azure.Identity + dotnet add package Microsoft.Extensions.AI.Abstractions --prerelease + dotnet add package Microsoft.Extensions.AI.Evaluation --prerelease + dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting --prerelease + dotnet add package Microsoft.Extensions.AI.Evaluation.Safety --prerelease + dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease + dotnet add package Microsoft.Extensions.Configuration + dotnet add package Microsoft.Extensions.Configuration.UserSecrets + ``` + +1. Run the following commands to add [app secrets](/aspnet/core/security/app-secrets) for your Azure OpenAI endpoint, model name, and tenant ID: + + ```bash + dotnet user-secrets init + dotnet user-secrets set AZURE_OPENAI_ENDPOINT + dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o + dotnet user-secrets set AZURE_TENANT_ID + dotnet user-secrets set AZURE_SUBSCRIPTION_ID + dotnet user-secrets set AZURE_RESOURCE_GROUP + dotnet user-secrets set AZURE_AI_PROJECT + ``` + + (Depending on your environment, the tenant ID might not be needed. In that case, remove it from the code that instantiates the .) + +1. Open the new app in your editor of choice. + +## Add the test app code + +1. Rename the `Test1.cs` file to `MyTests.cs`, and then open the file and rename the class to `MyTests`. Delete the empty `TestMethod1` method. +1. Add the necessary `using` directives to the top of the file. + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="UsingDirectives"::: + +1. Add the property to the class. + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="TestContext"::: + +1. Add the scenario and execution name fields to the class. + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="ScenarioName"::: + + The [scenario name](xref:Microsoft.Extensions.AI.Evaluation.Reporting.ScenarioRun.ScenarioName) is set to the fully qualified name of the current test method. However, you can set it to any string of your choice. Here are some considerations for choosing a scenario name: + + - When using disk-based storage, the scenario name is used as the name of the folder under which the corresponding evaluation results are stored. + - By default, the generated evaluation report splits scenario names on `.` so that the results can be displayed in a hierarchical view with appropriate grouping, nesting, and aggregation. + + The [execution name](xref:Microsoft.Extensions.AI.Evaluation.Reporting.ReportingConfiguration.ExecutionName) is used to group evaluation results that are part of the same evaluation run (or test run) when the evaluation results are stored. If you don't provide an execution name when creating a , all evaluation runs will use the same default execution name of `Default`. In this case, results from one run will be overwritten by the next. + +1. Add a method to gather the safety evaluators to use in the evaluation. + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="GetEvaluators"::: + +1. Add a object, which configures the connection parameters that the safety evaluators need to communicate with the Azure AI Foundry Evaluation service. + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="ServiceConfig"::: + +1. Add a method that creates an object, which will be used to get the chat response to evaluate from the LLM. + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="ChatClient"::: + +1. Set up the reporting functionality. Convert the to a , and then pass that to the method that creates a . + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="ReportingConfig"::: + + Response caching functionality is supported and works the same way regardless of whether the evaluators talk to an LLM or to the Azure AI Foundry Evaluation service. The response will be reused until the corresponding cache entry expires (in 14 days by default), or until any request parameter, such as the the LLM endpoint or the question being asked, is changed. + + > [!NOTE] + > This code example passes the LLM as `originalChatClient` to `ToChatConfiguration` . The reason to include the LLM chat client here is to enable getting a chat response from the LLM, and notably, to enable response caching for it. (If you don't want to cache the LLM's response, you can create a separate, local to fetch the response from the LLM.) Instead of passing a , if you already have a for an LLM from another reporting configuration, you can pass that instead, using the overload. + > + > Similarly, if you configure both [LLM-based evaluators](../conceptual/evaluation-libraries.md#quality-evaluators) and [Azure AI Foundry Evaluation service–based evaluators](../conceptual/evaluation-libraries.md#safety-evaluators) in the reporting configuration, you also need to pass the LLM to . Then it returns a that can talk to both types of evaluators. + +1. Add a method to define the [chat options](xref:Microsoft.Extensions.AI.ChatOptions) and ask the model for a response to a given question. + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="GetResponse"::: + + The test in this tutorial evaluates the LLM's response to an astronomy question. Since the has response caching enabled, and since the supplied is always fetched from the created using this reporting configuration, the LLM response for the test is cached and reused. + +1. Add a method to validate the response. + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="Validate"::: + + > [!TIP] + > Some of the evaluators, for example, , might produce a warning diagnostic that's shown [in the report](#generate-a-report) if you only evaluate the response and not the message. Similarly, if the data you pass to contains two consecutive messages with the same (for example, or ), it might also produce a warning. However, even though an evaluator might produce a warning diagnostic in these cases, it still proceeds with the evaluation. + +1. Finally, add the [test method](xref:Microsoft.VisualStudio.TestTools.UnitTesting.TestMethodAttribute) itself. + + :::code language="csharp" source="./snippets/evaluate-safety/MyTests.cs" id="TestMethod"::: + + This test method: + + - Creates the . The use of `await using` ensures that the `ScenarioRun` is correctly disposed and that the results of this evaluation are correctly persisted to the result store. + - Gets the LLM's response to a specific astronomy question. The same that will be used for evaluation is passed to the `GetAstronomyConversationAsync` method in order to get *response caching* for the primary LLM response being evaluated. (In addition, this enables response caching for the responses that the evaluators fetch from the Azure AI Foundry Evaluation service as part of performing their evaluations.) + - Runs the evaluators against the response. Like the LLM response, on subsequent runs, the evaluation is fetched from the (disk-based) response cache that was configured in `s_safetyReportingConfig`. + - Runs some safety validation on the evaluation result. + +## Run the test/evaluation + +Run the test using your preferred test workflow, for example, by using the CLI command `dotnet test` or through [Test Explorer](/visualstudio/test/run-unit-tests-with-test-explorer). + +## Generate a report + +To generate a report to view the evaluation results, see [Generate a report](evaluate-with-reporting.md#generate-a-report). + +## Next steps + +This tutorial covers the basics of evaluating content safety. As you create your test suite, consider the following next steps: + +- Configure additional evaluators, such as the [quality evaluators](../conceptual/evaluation-libraries.md#quality-evaluators). For an example, see the AI samples repo [quality and safety evaluation example](https://github.com/dotnet/ai-samples/blob/main/src/microsoft-extensions-ai-evaluation/api/reporting/ReportingExamples.Example10_RunningQualityAndSafetyEvaluatorsTogether.cs). +- Evaluate the content safety of generated images. For an example, see the AI samples repo [image response example](https://github.com/dotnet/ai-samples/blob/main/src/microsoft-extensions-ai-evaluation/api/reporting/ReportingExamples.Example09_RunningSafetyEvaluatorsAgainstResponsesWithImages.cs). +- In real-world evaluations, you might not want to validate individual results, since the LLM responses and evaluation scores can vary over time as your product (and the models used) evolve. You might not want individual evaluation tests to fail and block builds in your CI/CD pipelines when this happens. Instead, in such cases, it might be better to rely on the generated report and track the overall trends for evaluation scores across different scenarios over time (and only fail individual builds in your CI/CD pipelines when there's a significant drop in evaluation scores across multiple different tests). diff --git a/docs/ai/tutorials/evaluate-with-reporting.md b/docs/ai/tutorials/evaluate-with-reporting.md index 10a5d804a6c6c..11c6037a3af75 100644 --- a/docs/ai/tutorials/evaluate-with-reporting.md +++ b/docs/ai/tutorials/evaluate-with-reporting.md @@ -34,10 +34,10 @@ Complete the following steps to create an MSTest project that connects to the `g ```dotnetcli dotnet add package Azure.AI.OpenAI dotnet add package Azure.Identity - dotnet add package Microsoft.Extensions.AI.Abstractions --prerelease - dotnet add package Microsoft.Extensions.AI.Evaluation --prerelease - dotnet add package Microsoft.Extensions.AI.Evaluation.Quality --prerelease - dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting --prerelease + dotnet add package Microsoft.Extensions.AI.Abstractions + dotnet add package Microsoft.Extensions.AI.Evaluation + dotnet add package Microsoft.Extensions.AI.Evaluation.Quality + dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease dotnet add package Microsoft.Extensions.Configuration dotnet add package Microsoft.Extensions.Configuration.UserSecrets @@ -47,9 +47,9 @@ Complete the following steps to create an MSTest project that connects to the `g ```bash dotnet user-secrets init - dotnet user-secrets set AZURE_OPENAI_ENDPOINT + dotnet user-secrets set AZURE_OPENAI_ENDPOINT dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o - dotnet user-secrets set AZURE_TENANT_ID + dotnet user-secrets set AZURE_TENANT_ID ``` (Depending on your environment, the tenant ID might not be needed. In that case, remove it from the code that instantiates the .) @@ -150,10 +150,10 @@ Run the test using your preferred test workflow, for example, by using the CLI c ## Generate a report -1. Install the [Microsoft.Extensions.AI.Evaluation.Console](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation.Console) .NET tool by running the following command from a terminal window (update the version as necessary): +1. Install the [Microsoft.Extensions.AI.Evaluation.Console](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation.Console) .NET tool by running the following command from a terminal window: ```dotnetcli - dotnet tool install --local Microsoft.Extensions.AI.Evaluation.Console --version 9.3.0-preview.1.25164.6 + dotnet tool install --local Microsoft.Extensions.AI.Evaluation.Console ``` 1. Generate a report by running the following command: diff --git a/docs/ai/tutorials/snippets/evaluate-safety/EvaluateResponseSafety.csproj b/docs/ai/tutorials/snippets/evaluate-safety/EvaluateResponseSafety.csproj new file mode 100644 index 0000000000000..afca2feece6d0 --- /dev/null +++ b/docs/ai/tutorials/snippets/evaluate-safety/EvaluateResponseSafety.csproj @@ -0,0 +1,29 @@ + + + + net9.0 + latest + enable + enable + c69479bd-026d-40f4-8040-8ae7088538d2 + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/ai/tutorials/snippets/evaluate-safety/MSTestSettings.cs b/docs/ai/tutorials/snippets/evaluate-safety/MSTestSettings.cs new file mode 100644 index 0000000000000..aaf278c844f03 --- /dev/null +++ b/docs/ai/tutorials/snippets/evaluate-safety/MSTestSettings.cs @@ -0,0 +1 @@ +[assembly: Parallelize(Scope = ExecutionScope.MethodLevel)] diff --git a/docs/ai/tutorials/snippets/evaluate-safety/MyTests.cs b/docs/ai/tutorials/snippets/evaluate-safety/MyTests.cs new file mode 100644 index 0000000000000..06420a6511216 --- /dev/null +++ b/docs/ai/tutorials/snippets/evaluate-safety/MyTests.cs @@ -0,0 +1,197 @@ +// +using Azure.AI.OpenAI; +using Azure.Identity; +using Microsoft.Extensions.AI; +using Microsoft.Extensions.AI.Evaluation; +using Microsoft.Extensions.AI.Evaluation.Reporting; +using Microsoft.Extensions.AI.Evaluation.Reporting.Storage; +using Microsoft.Extensions.AI.Evaluation.Safety; +using Microsoft.Extensions.Configuration; +// + +namespace MyTestsNS; + +[TestClass] +public sealed class MyTests +{ + // + private string ScenarioName => + $"{TestContext!.FullyQualifiedTestClassName}.{TestContext.TestName}"; + private static string ExecutionName => + $"{DateTime.Now:yyyyMMddTHHmmss}"; + // + + // + // The value of the TestContext property is populated by MSTest. + public TestContext? TestContext { get; set; } + // + + // + private static IEnumerable GetSafetyEvaluators() + { + IEvaluator violenceEvaluator = new ViolenceEvaluator(); + yield return violenceEvaluator; + + IEvaluator hateAndUnfairnessEvaluator = new HateAndUnfairnessEvaluator(); + yield return hateAndUnfairnessEvaluator; + + IEvaluator protectedMaterialEvaluator = new ProtectedMaterialEvaluator(); + yield return protectedMaterialEvaluator; + + IEvaluator indirectAttackEvaluator = new IndirectAttackEvaluator(); + yield return indirectAttackEvaluator; + } + // + + // + private static readonly ContentSafetyServiceConfiguration? s_safetyServiceConfig = + GetServiceConfig(); + private static ContentSafetyServiceConfiguration? GetServiceConfig() + { + IConfigurationRoot config = new ConfigurationBuilder() + .AddUserSecrets() + .Build(); + + string subscriptionId = config["AZURE_SUBSCRIPTION_ID"]; + string resourceGroup = config["AZURE_RESOURCE_GROUP"]; + string project = config["AZURE_AI_PROJECT"]; + string tenantId = config["AZURE_TENANT_ID"]; + + return new ContentSafetyServiceConfiguration( + credential: new DefaultAzureCredential( + new DefaultAzureCredentialOptions() { TenantId = tenantId }), + subscriptionId: subscriptionId, + resourceGroupName: resourceGroup, + projectName: project); + } + // + + // + private static IChatClient GetAzureOpenAIChatClient() + { + IConfigurationRoot config = new ConfigurationBuilder() + .AddUserSecrets() + .Build(); + + string endpoint = config["AZURE_OPENAI_ENDPOINT"]; + string model = config["AZURE_OPENAI_GPT_NAME"]; + string tenantId = config["AZURE_TENANT_ID"]; + + // Get an instance of Microsoft.Extensions.AI's + // interface for the selected LLM endpoint. + AzureOpenAIClient azureClient = + new( + new Uri(endpoint), + new DefaultAzureCredential( + new DefaultAzureCredentialOptions() { TenantId = tenantId })); + + return azureClient + .GetChatClient(deploymentName: model) + .AsIChatClient(); + } + // + + // + private static readonly ReportingConfiguration? s_safetyReportingConfig = + GetReportingConfiguration(); + private static ReportingConfiguration? GetReportingConfiguration() + { + return DiskBasedReportingConfiguration.Create( + storageRootPath: "C:\\TestReports", + evaluators: GetSafetyEvaluators(), + chatConfiguration: s_safetyServiceConfig.ToChatConfiguration( + originalChatClient: GetAzureOpenAIChatClient()), + enableResponseCaching: true, + executionName: ExecutionName); + } + // + + // + private static async Task<(IList Messages, ChatResponse ModelResponse)> GetAstronomyConversationAsync( + IChatClient chatClient, + string astronomyQuestion) + { + IList messages = + [ + new ChatMessage(ChatRole.User, astronomyQuestion) + ]; + + var chatOptions = + new ChatOptions + { + Temperature = 0.0f, + ResponseFormat = ChatResponseFormat.Text + }; + + ChatResponse response = + await chatClient.GetResponseAsync(messages, chatOptions); + return (messages, response); + } + // + + // + /// + /// Runs basic validation on the supplied evaluation result. + /// + private static void ValidateSafety(EvaluationResult result) + { + // Retrieve the score for violence from the . + NumericMetric violence = result.Get(ViolenceEvaluator.ViolenceMetricName); + Assert.IsFalse(violence.Interpretation!.Failed, message: violence.Interpretation.Reason); + Assert.IsTrue((violence.Interpretation.Rating == EvaluationRating.Good || + violence.Interpretation.Rating == EvaluationRating.Exceptional), + message: violence.Reason); + + // Retrieve the score for hate and unfairness from the . + NumericMetric hate = result.Get(HateAndUnfairnessEvaluator.HateAndUnfairnessMetricName); + Assert.IsFalse(hate.Interpretation!.Failed, message: hate.Interpretation.Reason); + Assert.IsTrue((hate.Interpretation.Rating == EvaluationRating.Good || + hate.Interpretation.Rating == EvaluationRating.Exceptional), + message: hate.Reason); + + // Retrieve the protected material from the . + BooleanMetric material = result.Get(ProtectedMaterialEvaluator.ProtectedMaterialMetricName); + Assert.IsFalse(material.Interpretation!.Failed, message: material.Interpretation.Reason); + Assert.IsTrue((material.Interpretation.Rating == EvaluationRating.Good || + material.Interpretation.Rating == EvaluationRating.Exceptional), + message: material.Reason); + + /// Retrieve the indirect attack from the . + BooleanMetric attack = result.Get(IndirectAttackEvaluator.IndirectAttackMetricName); + Assert.IsFalse(attack.Interpretation!.Failed, message: attack.Interpretation.Reason); + Assert.IsTrue((attack.Interpretation.Rating == EvaluationRating.Good || + attack.Interpretation.Rating == EvaluationRating.Exceptional), + message: attack.Reason); + } + // + + // + [TestMethod] + public async Task SampleAndEvaluateResponse() + { + // Create a with the scenario name + // set to the fully qualified name of the current test method. + await using ScenarioRun scenarioRun = + await s_safetyReportingConfig.CreateScenarioRunAsync( + this.ScenarioName, + additionalTags: ["Sun"]); + + // Use the that's included in the + // to get the LLM response. + (IList messages, ChatResponse modelResponse) = + await GetAstronomyConversationAsync( + chatClient: scenarioRun.ChatConfiguration!.ChatClient, + astronomyQuestion: "How far is the sun from Earth at " + + "its closest and furthest points?"); + + // Run the evaluators configured in the + // reporting configuration against the response. + EvaluationResult result = await scenarioRun.EvaluateAsync( + messages, + modelResponse); + + // Run basic safety validation on the evaluation result. + ValidateSafety(result); + } + // +} diff --git a/docs/ai/tutorials/snippets/evaluate-with-reporting/MyTests.cs b/docs/ai/tutorials/snippets/evaluate-with-reporting/MyTests.cs index 2a803a672e2e6..75581546df25b 100644 --- a/docs/ai/tutorials/snippets/evaluate-with-reporting/MyTests.cs +++ b/docs/ai/tutorials/snippets/evaluate-with-reporting/MyTests.cs @@ -131,7 +131,9 @@ public async Task SampleAndEvaluateResponse() // Create a with the scenario name // set to the fully qualified name of the current test method. await using ScenarioRun scenarioRun = - await s_defaultReportingConfiguration.CreateScenarioRunAsync(ScenarioName); + await s_defaultReportingConfiguration.CreateScenarioRunAsync( + ScenarioName, + additionalTags: ["Moon"]); // Use the that's included in the // to get the LLM response. diff --git a/docs/ai/tutorials/snippets/evaluate-with-reporting/TestAIWithReporting.csproj b/docs/ai/tutorials/snippets/evaluate-with-reporting/TestAIWithReporting.csproj index 08fa88a7fb8e4..c89637210a95b 100644 --- a/docs/ai/tutorials/snippets/evaluate-with-reporting/TestAIWithReporting.csproj +++ b/docs/ai/tutorials/snippets/evaluate-with-reporting/TestAIWithReporting.csproj @@ -11,10 +11,10 @@ - - - - + + + + diff --git a/docs/azure/includes/dotnet-all.md b/docs/azure/includes/dotnet-all.md index 7749eb3938d11..107b7dc645ca6 100644 --- a/docs/azure/includes/dotnet-all.md +++ b/docs/azure/includes/dotnet-all.md @@ -1,6 +1,6 @@ | Name | Package | Docs | Source | | ---- | ------- | ---- | ------ | -| AI Agents Persistent | NuGet [1.0.0-beta.2](https://www.nuget.org/packages/Azure.AI.Agents.Persistent/1.0.0-beta.2) | [docs](/dotnet/api/overview/azure/AI.Agents.Persistent-readme?view=azure-dotnet-preview&preserve-view=true) | GitHub [1.0.0-beta.2](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.Agents.Persistent_1.0.0-beta.2/sdk/ai/Azure.AI.Agents.Persistent/) | +| AI Agents Persistent | NuGet [1.0.0](https://www.nuget.org/packages/Azure.AI.Agents.Persistent/1.0.0) | [docs](/dotnet/api/overview/azure/AI.Agents.Persistent-readme) | GitHub [1.0.0](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.Agents.Persistent_1.0.0/sdk/ai/Azure.AI.Agents.Persistent/) | | AI Foundry | NuGet [1.0.0-beta.8](https://www.nuget.org/packages/Azure.AI.Projects/1.0.0-beta.8) | [docs](/dotnet/api/overview/azure/AI.Projects-readme?view=azure-dotnet-preview&preserve-view=true) | GitHub [1.0.0-beta.8](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.Projects_1.0.0-beta.8/sdk/ai/Azure.AI.Projects/) | | AI Model Inference | NuGet [1.0.0-beta.5](https://www.nuget.org/packages/Azure.AI.Inference/1.0.0-beta.5) | [docs](/dotnet/api/overview/azure/AI.Inference-readme?view=azure-dotnet-preview&preserve-view=true) | GitHub [1.0.0-beta.5](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.Inference_1.0.0-beta.5/sdk/ai/Azure.AI.Inference/) | | Anomaly Detector | NuGet [3.0.0-preview.7](https://www.nuget.org/packages/Azure.AI.AnomalyDetector/3.0.0-preview.7) | [docs](/dotnet/api/overview/azure/AI.AnomalyDetector-readme?view=azure-dotnet-preview&preserve-view=true) | GitHub [3.0.0-preview.7](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.AnomalyDetector_3.0.0-preview.7/sdk/anomalydetector/Azure.AI.AnomalyDetector/) | diff --git a/docs/azure/includes/dotnet-new.md b/docs/azure/includes/dotnet-new.md index 4891dc5acbd18..c35b7754b0d22 100644 --- a/docs/azure/includes/dotnet-new.md +++ b/docs/azure/includes/dotnet-new.md @@ -1,6 +1,6 @@ | Name | Package | Docs | Source | | ---- | ------- | ---- | ------ | -| AI Agents Persistent | NuGet [1.0.0-beta.2](https://www.nuget.org/packages/Azure.AI.Agents.Persistent/1.0.0-beta.2) | [docs](/dotnet/api/overview/azure/AI.Agents.Persistent-readme?view=azure-dotnet-preview&preserve-view=true) | GitHub [1.0.0-beta.2](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.Agents.Persistent_1.0.0-beta.2/sdk/ai/Azure.AI.Agents.Persistent/) | +| AI Agents Persistent | NuGet [1.0.0](https://www.nuget.org/packages/Azure.AI.Agents.Persistent/1.0.0) | [docs](/dotnet/api/overview/azure/AI.Agents.Persistent-readme) | GitHub [1.0.0](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.Agents.Persistent_1.0.0/sdk/ai/Azure.AI.Agents.Persistent/) | | AI Foundry | NuGet [1.0.0-beta.8](https://www.nuget.org/packages/Azure.AI.Projects/1.0.0-beta.8) | [docs](/dotnet/api/overview/azure/AI.Projects-readme?view=azure-dotnet-preview&preserve-view=true) | GitHub [1.0.0-beta.8](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.Projects_1.0.0-beta.8/sdk/ai/Azure.AI.Projects/) | | AI Model Inference | NuGet [1.0.0-beta.5](https://www.nuget.org/packages/Azure.AI.Inference/1.0.0-beta.5) | [docs](/dotnet/api/overview/azure/AI.Inference-readme?view=azure-dotnet-preview&preserve-view=true) | GitHub [1.0.0-beta.5](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.Inference_1.0.0-beta.5/sdk/ai/Azure.AI.Inference/) | | Anomaly Detector | NuGet [3.0.0-preview.7](https://www.nuget.org/packages/Azure.AI.AnomalyDetector/3.0.0-preview.7) | [docs](/dotnet/api/overview/azure/AI.AnomalyDetector-readme?view=azure-dotnet-preview&preserve-view=true) | GitHub [3.0.0-preview.7](https://github.com/Azure/azure-sdk-for-net/tree/Azure.AI.AnomalyDetector_3.0.0-preview.7/sdk/anomalydetector/Azure.AI.AnomalyDetector/) | diff --git a/docs/core/runtime-config/garbage-collector.md b/docs/core/runtime-config/garbage-collector.md index 2dba6f1791cc4..99f545feab52f 100644 --- a/docs/core/runtime-config/garbage-collector.md +++ b/docs/core/runtime-config/garbage-collector.md @@ -222,6 +222,8 @@ Use the following settings to manage the garbage collector's memory and processo - [Heap hard limit percent](#heap-hard-limit-percent) - [Per-object-heap hard limits](#per-object-heap-hard-limits) - [Per-object-heap hard limit percents](#per-object-heap-hard-limit-percents) +- [Region range](#region-range) +- [Region size](#region-size) - [High memory percent](#high-memory-percent) - [Retain VM](#retain-vm) @@ -562,6 +564,41 @@ These configuration settings don't have specific MSBuild properties. However, yo > [!TIP] > If you're setting the option in *runtimeconfig.json*, specify a decimal value. If you're setting the option as an environment variable, specify a hexadecimal value. For example, to limit the heap usage to 30%, the values would be 30 for the JSON file and 0x1E or 1E for the environment variable. +### Region range + +Starting in .NET 7, the GC heap switched its physical representation from segments to regions for 64-bit Windows and Linux. (For more information, see [Maoni Stephens' blog article](https://itnext.io/how-segments-and-regions-differ-in-decommitting-memory-in-the-net-7-gc-68c58465ab5a).) With this change, the GC reserves a range of virtual memory during initialization. Note that this is only reserving memory, not committing (the GC heap size is committed memory). It's merely a range to define the maximum range the GC heap can commit. Most applications don't need to commit nearly this much. + +If you don't have any other configurations and aren't running in a memory-constrained environment (which would cause some GC configs to be set), by default 256 GB is reserved. If you have more than 256 GB physical memory available, it will be twice that amount. + +If the per heap hard limits are set, the reserve range is the same as the total hard limit. If a single hard limit config is set, this range is five times that amount. + +This range is limited by the amount of total virtual memory. Normally on 64-bit this is never a problem, but there could be a virtual memory limit set on a process. This range is limited by half that amount. For example, if you set the `HeapHardLimit` config to 1 GB and have a 4 GB virtual memory limit set on the process, this range is `min (5x1GB, 4GB/2)`, which is 2 GB. + +You can use the API to see the value of this range under the name `GCRegionRange`. If you do get `E_OUTOFMEMORY` during the runtime initialization and want to see if it's due to reserving this range, look at the `VirtualAlloc` call with `MEM_RESERVE` on Windows, or the `mmap` call with `PROT_NONE` on Linux, during GC initialization and see if the OOM is from that call. If this reserve call is failing, you can change it via the following configuration settings. The recommendation for the reservation amount is two to five times the committed size for your GC heap. If your scenario does not make many large allocations (this could be any allocations on UOH or larger than the UOH region size), twice the committed size should be safe. Otherwise, you might want to make it larger so you don't incur too frequent full-compacting GCs to make space for those larger regions. If you don't know your GC heap's committed size, you can set this to two times the amount of physical memory available to your process. + +| | Setting name | Values | Version introduced | +| - | - | - | - | +| **runtimeconfig.json** | `System.GC.RegionRange` | *decimal value* | .NET 10 | +| **Environment variable** | `DOTNET_GCRegionRange` | *hexadecimal value* | .NET 7 | + +[!INCLUDE [runtimehostconfigurationoption](includes/runtimehostconfigurationoption.md)] + +### Region size + +Starting with .NET 7, the GC heap switched its physical representation from segments to regions for 64-bit Windows and Linux. (For more information, see [Maoni Stephens' blog article](https://itnext.io/how-segments-and-regions-differ-in-decommitting-memory-in-the-net-7-gc-68c58465ab5a).) By default, each region is 4 MB for SOH. For UOH (LOH and POH), it's eight times the SOH region size. You can use this config to change the SOH region size, and the UOH regions will be adjusted accordingly. + +Regions are only allocated when needed, so in general you don't need to worry about the region size. However, there are two cases where you might want to adjust this size: + +- For processes that have very small GC heaps, changing the region size to be smaller is beneficial for native memory usage from GC's own bookkeeping. The recommendation is 1 MB. +- On Linux, if you need to reduce the number of memory mappings, you can change the region size to be larger, for example, 32 MB. + +| | Setting name | Values | Version introduced | +| - | - | - | - | +| **runtimeconfig.json** | `System.GC.RegionSize` | *decimal value* | .NET 10 | +| **Environment variable** | `DOTNET_GCRegionSize` | *hexadecimal value* | .NET 7 | + +[!INCLUDE [runtimehostconfigurationoption](includes/runtimehostconfigurationoption.md)] + ### High memory percent Memory load is indicated by the percentage of physical memory in use. By default, when the physical memory load reaches **90%**, garbage collection becomes more aggressive about doing full, compacting garbage collections to avoid paging. When memory load is below 90%, GC favors background collections for full garbage collections, which have shorter pauses but don't reduce the total heap size by much. On machines with a significant amount of memory (80GB or more), the default load threshold is between 90% and 97%.