| 1 | +--- |
| 2 | +title: Retrieval Augmented Generation with Feast |
| 3 | +description: How Feast empowers ML Engineers to ship RAG applications to Production. |
| 4 | +date: 2025-03-17 |
| 5 | +authors: ["Francisco Javier Arceo"] |
| 6 | +--- |
| 7 | + |
| 8 | +<div class="hero-image"> |
| 9 | + <img src="/images/blog/space.jpg" alt="Exploring the Possibilities of AI" loading="lazy"> |
| 10 | +</div> |
| 11 | + |
| 12 | + |
| 13 | +## Why Feature Stores Make Sense for GenAI and RAG |
| 14 | + |
| 15 | +Feature stores have been developed over the [past decade](./what-is-a-feature-store) to address the challenges AI |
| 16 | +practitioners face in managing, serving, and scaling machine learning models in production. |
| 17 | + |
| 18 | +Some of the key challenges include: |
| 19 | +* Accessing the right raw data |
| 20 | +* Building features from raw data |
| 21 | +* Combining features into training data |
| 22 | +* Calculating and serving features in production |
| 23 | +* Monitoring features in production |
| 24 | + |
| 25 | +Feast was designed specifically to address these challenges.
| 26 | + |
| 27 | +These same challenges extend naturally to Generative AI (GenAI) applications, with the exception of model training. In
| 28 | +GenAI applications, the foundation model is typically pre-trained, and the focus is on fine-tuning it or simply calling it
| 29 | +as an endpoint from a provider (e.g., OpenAI or Anthropic).
| 30 | + |
| 31 | +For GenAI use cases, feature stores enable the efficient management of context and metadata, both during |
| 32 | +training/fine-tuning and at inference time. |
| 33 | + |
| 34 | +By using a feature store for your application, you can treat the LLM context, including the prompt,
| 35 | +as features. This means you can not only manage input context, document processing, data formatting, tokenization,
| 36 | +chunking, and embeddings, but also track and version the context used during model inference, ensuring consistency,
| 37 | +transparency, and reproducibility across models and iterations.
| 38 | + |
| 39 | +With Feast, ML engineers can streamline the embedding generation process, ensure consistency across both offline and |
| 40 | +online environments, and track the lineage of data and transformations. By leveraging a feature store, GenAI
| 41 | +applications gain scalability, maintainability, and reproducibility, which makes feature stores a good fit for complex
| 42 | +AI applications and enterprise needs.
| 43 | + |
| 44 | +## Feast Now Supports RAG |
| 45 | + |
| 46 | +With the rise of generative AI applications, the need to serve vectors has grown quickly. Feast now has alpha support |
| 47 | +for vector similarity search to power retrieval augmented generation (RAG) systems in production. |
| 48 | + |
| 49 | +<div class="content-image"> |
| 50 | + <img src="/images/blog/milvus-rag.png" alt="Retrieval Augmented Generation with Milvus and Feast" loading="lazy"> |
| 51 | +</div> |
| 52 | + |
| 53 | +This allows ML Engineers and Data Scientists to use the power of their feature store to easily deploy GenAI |
| 54 | +applications using RAG to production. More importantly, Feast offers the flexibility to customize and scale your |
| 55 | +production RAG applications through our scalable transformation systems (streaming, request-time, and batch). |
| 56 | + |
| 57 | +## Retrieval Augmented Generation (RAG) |
| 58 | +[RAG](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) is a technique that combines generative models |
| 59 | +(e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g.,
| 60 | +question answering).
| 61 | + |
| 62 | +The typical RAG process involves: |
| 63 | +1. Sourcing text data relevant for your application |
| 64 | +2. Transforming each text document into smaller chunks of text |
| 65 | +3. Transforming those chunks of text into embeddings |
| 66 | +4. Inserting those chunks of text along with some identifier for the chunk and document in some database |
| 67 | +5. Retrieving those chunks of text along with the identifiers at run-time to inject that text into the LLM's context |
| 68 | +6. Calling some API to run inference with your LLM to generate contextually relevant output |
| 69 | +7. Returning the output to some end user |
| 70 | + |
| 71 | +Implicit in steps (1)-(4) is the need to scale to large amounts of data (i.e., using some form of distributed computing),
| 72 | +to orchestrate that scaling through a batch or streaming pipeline, and to customize key transformation decisions
| 73 | +(e.g., tokenization, model, chunking, and data formatting).
| 74 | + |
| 75 | +## Powering Retrieval in Production |
| 76 | +To power the Retrieval step of RAG in production, we need to handle data ingestion, data transformation, indexing, |
| 77 | +and serving web requests from an API. |
| 78 | + |
| 79 | +Building high-availability software that can handle these requirements and scale as your data scales is a
| 80 | +non-trivial task. This is a strength of Feast: Kubernetes, large-scale data frameworks like
| 81 | +Spark and Flink, and the ability to ingest and transform data in real time through the Feast Feature Server
| 82 | +make a powerful combination.
| 83 | + |
| 84 | +## Beyond Vector Similarity Search |
| 85 | +RAG patterns often use vector similarity search for the retrieval step, but this is not the
| 86 | +only useful retrieval pattern. In fact, standard entity-based retrieval can be very powerful for
| 87 | +applications where relevant user context is necessary.
| 88 | +
| 89 | +For example, many RAG applications are customer chatbots, and they benefit significantly from user data (e.g.,
| 90 | +account balance or location) to generate contextually relevant output. Feast can help you manage this user data
| 91 | +using its existing entity-based retrieval patterns.
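
As a rough sketch of what entity-based retrieval could look like for a chatbot, the helper below calls `get_online_features` on a Feast `FeatureStore`. The feature view `user_profile`, its features, and the `user_id` entity are hypothetical names chosen for illustration; they are not part of the demo in this post.

```python
def fetch_user_context(store, user_id: int) -> dict:
    """Look up per-user features by entity key and flatten them into one dict.

    `store` is expected to be a feast.FeatureStore (or anything exposing the
    same get_online_features method).
    """
    response = store.get_online_features(
        features=[
            # Hypothetical feature view and feature names
            "user_profile:account_balance",
            "user_profile:location",
        ],
        entity_rows=[{"user_id": user_id}],
    )
    # to_dict() maps each name to a list with one value per entity row
    return {name: values[0] for name, values in response.to_dict().items()}


def render_user_context(context: dict) -> str:
    """Format the user features as plain text to prepend to the LLM prompt."""
    return "\n".join(f"{name}: {value}" for name, value in sorted(context.items()))
```

The rendered text can be injected into the prompt alongside any chunks returned by vector search.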
| 92 | + |
| 93 | +## The Benefits of Feast |
| 94 | +Fine-tuning is the holy grail for optimizing your RAG systems. By logging the documents, data, and context retrieved
| 95 | +during inference, you can later fine-tune both the generator and *the retriever* for
| 96 | +your particular needs.
| 97 | + |
| 98 | +This means that Feast can help you not only serve your documents, user data, and other metadata for production |
| 99 | +RAG applications, but it can also help you scale your embeddings on large amounts of data (e.g., using Spark to embed
| 100 | +gigabytes of documents), re-use the same code online and offline, track changes to your transformations, data sources, and |
| 101 | +RAG-sources to provide you with replayability and data lineage, and prepare your datasets so you can fine-tune your |
| 102 | +embedding, retrieval, or generator models later. |
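
To make the logging idea concrete, here is a minimal sketch that appends each retrieval to a JSON Lines file and later reads it back as (query, retrieved text) pairs. The record layout and function names are assumptions made for illustration; this is not a Feast API, and a production setup would more likely write to a data source that Feast can read.

```python
import json
import time
from pathlib import Path


def log_retrieval(log_path: str, query: str, retrieved: list[dict], answer: str) -> dict:
    """Append one inference-time record: the query, what was retrieved, and the output."""
    record = {
        "timestamp": time.time(),
        "query": query,
        "retrieved": [
            {"document_id": row["document_id"], "text": row["text"]} for row in retrieved
        ],
        "answer": answer,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_training_pairs(log_path: str) -> list[tuple[str, str]]:
    """Read the log back as (query, retrieved text) pairs.

    Such pairs are candidate examples for retriever fine-tuning once they
    have been labeled or filtered for relevance.
    """
    pairs = []
    with Path(log_path).open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            pairs.extend((record["query"], row["text"]) for row in record["retrieved"])
    return pairs
```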
| 103 | + |
| 104 | +Historically, Feast catered to Data Scientists and ML Engineers who implemented their own data and feature transformations. Now,
| 105 | +many RAG providers handle this out of the box for you. We will invest in creating extensible implementations to make it easier
| 106 | +to ship your applications.
| 107 | + |
| 108 | +## Feast Powered by Milvus |
| 109 | + |
| 110 | +[Milvus](https://milvus.io/) is a high-performance, open-source vector database that provides a powerful and efficient way to store
| 111 | +and retrieve embeddings. By using Feast with Milvus, you can easily deploy RAG applications to production and scale |
| 112 | +your retrieval systems on Kubernetes using the Feast Operator or the [Feature Server Helm Chart](https://github.com/feast-dev/feast/tree/master/infra/charts/feast-feature-server). |
| 113 | + |
| 114 | +This tutorial will walk you through building a basic RAG application with Milvus and Feast; i.e., ingesting embedded |
| 115 | +documents in Milvus and retrieving the most similar documents for a given query embedding. |
| 116 | + |
| 117 | +This example consists of 5 steps: |
| 118 | +1. Configuring Milvus |
| 119 | +2. Defining your Data Sources and Views |
| 120 | +3. Updating your Registry |
| 121 | +4. Ingesting the Data |
| 122 | +5. Retrieving the Data |
| 123 | + |
| 124 | +The full demo is available on our [GitHub repository](https://github.com/feast-dev/feast/tree/master/examples/rag). |
| 125 | + |
| 126 | +### Step 1: Configure Milvus |
| 127 | +Configure Milvus in a simple `feature_store.yaml` file.
```yaml
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "IVF_FLAT"

offline_store:
  type: file
entity_key_serialization_version: 3
# Defaults to no_auth for authentication and authorization; other possible values are kubernetes and oidc. Refer to the documentation for more details.
auth:
  type: no_auth
```
| 146 | +
| 147 | +### Step 2: Define your Data Sources and Views |
| 148 | +You define your data declaratively using Feast's `FeatureView` and `Entity` objects, which are meant to be an easy way |
| 149 | +to give your software engineers and data scientists a common language to define data they want to ship to production. |
| 150 | + |
| 151 | +Here is an example of how you might define a `FeatureView` for document retrieval. Notice how we define the `vector`
| 152 | +field and enable vector search by setting `vector_index=True` and the distance metric to `COSINE`.
| 153 | +
| 154 | +That's it; the rest of the implementation is already handled for you by Feast and Milvus.
| 155 | + |
```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.data_format import ParquetFormat
from feast.types import Array, Float32, String

# Entity that keys each row: one ID per document
document = Entity(
    name="document_id",
    description="Document ID",
    value_type=ValueType.INT64,
)

# Parquet file used as the batch source for the feature view
source = FileSource(
    file_format=ParquetFormat(),
    path="./data/my_data.parquet",
    timestamp_field="event_timestamp",
)

# Define the view for retrieval
city_embeddings_feature_view = FeatureView(
    name="city_embeddings",
    entities=[document],
    schema=[
        Field(
            name="vector",
            dtype=Array(Float32),
            vector_index=True,  # Vector search enabled
            vector_search_metric="COSINE",  # Distance metric configured
        ),
        Field(name="state", dtype=String),
        Field(name="sentence_chunks", dtype=String),
        Field(name="wiki_summary", dtype=String),
    ],
    source=source,
    ttl=timedelta(hours=2),
)
```
| 188 | + |
| 189 | +### Step 3: Update your Registry |
| 190 | +After we have defined our code, we run `feast apply` in the same folder as the `feature_store.yaml` file to
| 191 | +update the registry with our metadata.
```bash
feast apply
```
| 195 | + |
| 196 | +### Step 4: Ingest your Data |
| 197 | +Now that we have defined our metadata, we can ingest our data into Milvus using the following code: |
```python
# Assumes `store` is a feast.FeatureStore for this repo and `df` is a DataFrame of embedded documents.
store.write_to_online_store(feature_view_name="city_embeddings", df=df)
```
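
Before writing, it can help to check that each row matches the schema from Step 2 and the `embedding_dim: 384` setting from Step 1, since a vector of the wrong length is easier to catch here than to debug later. The helper below is a hypothetical pre-flight check, not part of Feast; the column names are taken from the feature view above, and `event_timestamp` is assumed to be required because it is the source's `timestamp_field`.

```python
from datetime import datetime, timezone

EMBEDDING_DIM = 384  # Matches embedding_dim in feature_store.yaml
REQUIRED_COLUMNS = {
    "document_id",
    "event_timestamp",
    "vector",
    "state",
    "sentence_chunks",
    "wiki_summary",
}


def make_row(document_id, vector, state, sentence_chunks, wiki_summary):
    """Build one row with the columns the city_embeddings feature view expects."""
    return {
        "document_id": document_id,
        "event_timestamp": datetime.now(timezone.utc),
        "vector": vector,
        "state": state,
        "sentence_chunks": sentence_chunks,
        "wiki_summary": wiki_summary,
    }


def validate_rows(rows, dim=EMBEDDING_DIM):
    """Raise ValueError if a row lacks a column or has a wrong-sized vector."""
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"row {i} is missing columns: {sorted(missing)}")
        if len(row["vector"]) != dim:
            raise ValueError(f"row {i} has a vector of length {len(row['vector'])}, expected {dim}")
    return rows
```

With pandas installed, the validated rows can be turned into the `df` used above with `pd.DataFrame(validate_rows(rows))`.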
| 201 | + |
| 202 | +### Step 5: Retrieve your Data |
| 203 | +Now that the data is actually stored in Milvus, we can easily query it using the SDK (and corresponding REST API) to |
| 204 | +retrieve the most similar documents for a given query embedding. |
```python
# Assumes `query_embedding` is the query text embedded as a list of floats
# by the same model that embedded the documents.
context_data = store.retrieve_online_documents_v2(
    features=[
        "city_embeddings:vector",
        "city_embeddings:document_id",
        "city_embeddings:state",
        "city_embeddings:sentence_chunks",
        "city_embeddings:wiki_summary",
    ],
    query=query_embedding,
    top_k=3,  # Return the three most similar chunks
    distance_metric="COSINE",
).to_df()
```
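
The retrieved rows still need to be injected into the LLM's context. One way to do that, sketched below, is to format each row as a labeled block and wrap the blocks in a prompt. The prompt wording and helper names are illustrative assumptions, and the code assumes each row is a dict keyed by the feature names requested above (for example, from `context_data.to_dict("records")`).

```python
def format_context(rows: list[dict]) -> str:
    """Render each retrieved chunk as a block labeled with its identifiers."""
    blocks = [
        f"[document_id={row['document_id']} state={row['state']}]\n{row['sentence_chunks']}"
        for row in rows
    ]
    return "\n\n".join(blocks)


def build_prompt(question: str, rows: list[dict]) -> str:
    """Combine the retrieved context and the user question into one prompt."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{format_context(rows)}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Keeping the document identifiers in the prompt makes it easier to trace an answer back to the chunks that produced it.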
| 219 | + |
| 220 | +### The Benefits of Using Feast for RAG
| 221 | +We've discussed some of the high-level benefits of using Feast for a RAG application.
| 222 | +More specifically, here are some of the concrete benefits you can expect from using Feast for RAG:
| 223 | +1. [Real-time, Stream, and Batch data Ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion) support to the Feature Server for online retrieval
| 224 | +2. [Data dictionary/metadata catalog](https://docs.feast.dev/getting-started/components/registry) autogenerated from code
| 225 | +3. [UI exposing the metadata catalog](https://docs.feast.dev/reference/alpha-web-ui)
| 226 | +4. [FastAPI Server](https://docs.feast.dev/getting-started/components/feature-server) to serve your data
| 227 | +5. [Role Based Access Control (RBAC)](https://docs.feast.dev/getting-started/concepts/permission) to govern access to your data
| 228 | +6. [Deployment on Kubernetes](https://docs.feast.dev/how-to-guides/running-feast-in-production) using our Helm Chart or our Operator
| 229 | +7. [Multiple vector database providers](https://docs.feast.dev/reference/alpha-vector-database)
| 230 | +8. [Multiple data warehouse providers](https://docs.feast.dev/reference/offline-stores/overview#functionality-matrix)
| 231 | +9. Support for different [data sources](https://docs.feast.dev/reference/data-sources/overview#functionality-matrix)
| 232 | +10. Support for stream and [batch processors (e.g., Spark and Flink)](https://docs.feast.dev/tutorials/building-streaming-features)
| 233 | + |
| 234 | +And more! |
| 235 | + |
| 236 | +## The Future of Feast and GenAI |
| 237 | + |
| 238 | +Feast will continue to invest in GenAI use cases. |
| 239 | + |
| 240 | +In particular, we will invest in (1) NLP as a first-class citizen, (2) support for images, (3) support for
| 241 | +transforming unstructured data (e.g., PDFs), (4) an enhanced GenAI-focused feature server to allow our end users to
| 242 | +more easily ship RAG to production, (5) an out-of-the-box chat UI meant for internal development and fast iteration,
| 243 | +and (6) making [Milvus](https://milvus.io/intro) a fully supported and core online store for RAG.
| 244 | + |
| 245 | +## Join the Conversation |
| 246 | + |
| 247 | +Are you interested in learning more about how Feast can help you build and deploy RAG applications to production? |
| 248 | +Reach out to us on Slack or [GitHub](https://github.com/feast-dev/feast). We'd love to hear from you!