
Commit b9e2e6c

feat: Adding blog on RAG with Milvus (#5161)
* feat: Adding blog on RAG with Milvus
* minor changes
* Adding diagram
* Rename image.png to milvus-rag.png
* Update retrieval-augmentation-with-feast.md
* incorporating Willem's feedback
* adjust blog and image
* updated copy
* adding github link
* finished blog post, good enough

Signed-off-by: Francisco Javier Arceo <[email protected]>

1 parent 569404b commit b9e2e6c

4 files changed: +262 −2 lines changed

retrieval-augmentation-with-feast.md (new file, 248 additions)

---
title: Retrieval Augmented Generation with Feast
description: How Feast empowers ML Engineers to ship RAG applications to Production.
date: 2025-03-17
authors: ["Francisco Javier Arceo"]
---

<div class="hero-image">
  <img src="/images/blog/space.jpg" alt="Exploring the Possibilities of AI" loading="lazy">
</div>

## Why Feature Stores Make Sense for GenAI and RAG

Feature stores have been developed over the [past decade](./what-is-a-feature-store) to address the challenges AI practitioners face in managing, serving, and scaling machine learning models in production.

Some of the key challenges include:
* Accessing the right raw data
* Building features from raw data
* Combining features into training data
* Calculating and serving features in production
* Monitoring features in production

And Feast was specifically designed to address these challenges.

These same challenges extend naturally to Generative AI (GenAI) applications, with the exception of model training. In GenAI applications, the foundation model is typically pre-trained, and the focus is on fine-tuning or simply using the model as an endpoint from some provider (e.g., OpenAI, Anthropic, etc.).

For GenAI use cases, feature stores enable the efficient management of context and metadata, both during training/fine-tuning and at inference time.

By using a feature store for your application, you can treat the LLM context, including the prompt, as features. This means you can manage not only input context, document processing, data formatting, tokenization, chunking, and embeddings, but also track and version the context used during model inference, ensuring consistency, transparency, and reproducibility across models and iterations.

With Feast, ML engineers can streamline the embedding generation process, ensure consistency across both offline and online environments, and track the lineage of data and transformations. By leveraging a feature store, GenAI applications benefit from enhanced scalability, maintainability, and reproducibility, making them ideal for complex AI applications and enterprise needs.

## Feast Now Supports RAG

With the rise of generative AI applications, the need to serve vectors has grown quickly. Feast now has alpha support for vector similarity search to power retrieval augmented generation (RAG) systems in production.

<div class="content-image">
  <img src="/images/blog/milvus-rag.png" alt="Retrieval Augmented Generation with Milvus and Feast" loading="lazy">
</div>

This allows ML Engineers and Data Scientists to use the power of their feature store to easily deploy GenAI applications using RAG to production. More importantly, Feast offers the flexibility to customize and scale your production RAG applications through our scalable transformation systems (streaming, request-time, and batch).

## Retrieval Augmented Generation (RAG)
[RAG](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) is a technique that combines generative models (e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., question answering).

The typical RAG process involves:
1. Sourcing text data relevant to your application
2. Transforming each text document into smaller chunks of text
3. Transforming those chunks of text into embeddings
4. Inserting those chunks of text, along with some identifier for the chunk and document, into some database
5. Retrieving those chunks of text along with the identifiers at run-time to inject that text into the LLM's context
6. Calling some API to run inference with your LLM to generate contextually relevant output
7. Returning the output to some end user

Implicit in (1)-(4) is the potential of scaling to large amounts of data (i.e., using some form of distributed computing), orchestrating that scaling through some batch or streaming pipeline, and customizing key transformation decisions (e.g., tokenization, model, chunking, data formatting, etc.). A rough sketch of steps (2) and (3) is shown below.
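
To make steps (2) and (3) concrete, here is a minimal sketch of chunking and embedding documents in Python. The naive paragraph-based chunking, the `sentence-transformers` model, and the sample document are illustrative assumptions, not part of the Feast example later in this post.

```python
from sentence_transformers import SentenceTransformer


def chunk_document(text: str, max_chars: int = 500) -> list[str]:
    # Naive chunking: split on blank lines and cap each chunk's length.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p[:max_chars] for p in paragraphs]


# all-MiniLM-L6-v2 produces 384-dimensional vectors, matching the
# embedding_dim used in the Feast configuration shown later.
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = {1: "New York City is the most populous city in the United States. ..."}
rows = []
for doc_id, text in documents.items():
    for chunk_id, chunk in enumerate(chunk_document(text)):
        rows.append({
            "document_id": doc_id,
            "chunk_id": chunk_id,
            "sentence_chunks": chunk,
            "vector": model.encode(chunk).tolist(),  # step (3): embed the chunk
        })
```

In production, the same transformation would typically run inside a batch or streaming pipeline rather than a single process.
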
## Powering Retrieval in Production
To power the retrieval step of RAG in production, we need to handle data ingestion, data transformation, indexing, and serving web requests from an API.

Building highly available software that can handle these requirements and scale as your data scales is a non-trivial task. This is a strength of Feast: combining the power of Kubernetes, large-scale data frameworks like Spark and Flink, and the ability to ingest and transform data in real time through the Feast Feature Server.

## Beyond Vector Similarity Search
RAG patterns often use vector similarity search for the retrieval step, but this is not the only retrieval pattern that can be useful. In fact, standard entity-based retrieval can be very powerful for applications where relevant user context is necessary.

For example, many RAG applications are customer chat bots, and they benefit significantly from user data (e.g., account balance, location, etc.) to generate contextually relevant output. Feast can help you manage this user data using its existing entity-based retrieval patterns, as the sketch below shows.
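
For instance, a chat bot can pull user context with Feast's standard online retrieval API before building the prompt. A minimal sketch, assuming a hypothetical `user_profile` feature view keyed on a `user_id` entity defined in your own feature repository:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Fetch user context to inject into the chat bot's prompt. The
# "user_profile" feature view and its feature names are hypothetical.
user_context = store.get_online_features(
    features=[
        "user_profile:account_balance",
        "user_profile:location",
    ],
    entity_rows=[{"user_id": "u_123"}],
).to_dict()
```
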
## The Benefits of Feast
Fine-tuning is the holy grail for optimizing your RAG systems, and by logging the documents/data and context retrieved during inference, you can fine-tune both the generator and *the retriever* of your LLM application for your particular needs.

This means that Feast can help you not only serve your documents, user data, and other metadata for production RAG applications, but also scale your embeddings over large amounts of data (e.g., using Spark to embed gigabytes of documents), re-use the same code online and offline, track changes to your transformations, data sources, and RAG sources to provide you with replayability and data lineage, and prepare your datasets so you can fine-tune your embedding, retrieval, or generator models later.

Historically, Feast catered to Data Scientists and ML Engineers who implemented their own data/feature transformations, but many RAG providers now handle this out of the box. We will invest in creating extendable implementations to make it easier to ship your applications.

## Feast Powered by Milvus

[Milvus](https://milvus.io/) is a high-performance open source vector database that provides a powerful and efficient way to store and retrieve embeddings. By using Feast with Milvus, you can easily deploy RAG applications to production and scale your retrieval systems on Kubernetes using the Feast Operator or the [Feature Server Helm Chart](https://github.com/feast-dev/feast/tree/master/infra/charts/feast-feature-server).

This tutorial will walk you through building a basic RAG application with Milvus and Feast; i.e., ingesting embedded documents into Milvus and retrieving the most similar documents for a given query embedding.

This example consists of 5 steps:
1. Configuring Milvus
2. Defining your Data Sources and Views
3. Updating your Registry
4. Ingesting the Data
5. Retrieving the Data

The full demo is available on our [GitHub repository](https://github.com/feast-dev/feast/tree/master/examples/rag).

### Step 1: Configure Milvus
Configure Milvus in a simple `feature_store.yaml` file.
```yaml
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "IVF_FLAT"

offline_store:
  type: file
entity_key_serialization_version: 3
# By default, no_auth is used for authentication and authorization; other possible values are kubernetes and oidc. Refer to the documentation for more details.
auth:
  type: no_auth
```

### Step 2: Define your Data Sources and Views
You define your data declaratively using Feast's `FeatureView` and `Entity` objects, which are meant to be an easy way to give your software engineers and data scientists a common language to define the data they want to ship to production.

Here is an example of how you might define a `FeatureView` for document retrieval. Notice how we define the `vector` field and enable vector search by setting `vector_index=True` and the distance metric to `COSINE`.

That's it; the rest of the implementation is handled for you by Feast and Milvus.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.data_format import ParquetFormat
from feast.types import Array, Float32, String

document = Entity(
    name="document_id",
    description="Document ID",
    value_type=ValueType.INT64,
)

source = FileSource(
    file_format=ParquetFormat(),
    path="./data/my_data.parquet",
    timestamp_field="event_timestamp",
)

# Define the view for retrieval
city_embeddings_feature_view = FeatureView(
    name="city_embeddings",
    entities=[document],
    schema=[
        Field(
            name="vector",
            dtype=Array(Float32),
            vector_index=True,  # Vector search enabled
            vector_search_metric="COSINE",  # Distance metric configured
        ),
        Field(name="state", dtype=String),
        Field(name="sentence_chunks", dtype=String),
        Field(name="wiki_summary", dtype=String),
    ],
    source=source,
    ttl=timedelta(hours=2),
)
```

### Step 3: Update your Registry
After defining our code, we run `feast apply` in the same folder as the `feature_store.yaml` file to update the registry with our metadata.
```bash
feast apply
```

### Step 4: Ingest your Data
Now that we have defined our metadata, we can ingest our data into Milvus using the following code:
```python
store.write_to_online_store(feature_view_name='city_embeddings', df=df)
```

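Here `store` is simply a `FeatureStore` instance pointed at the repo, and `df` is a DataFrame whose columns match the `city_embeddings` schema above. As a hedged sketch of what that preparation might look like (the embedding model and the sample row are illustrative assumptions, not part of the official example):

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore
from sentence_transformers import SentenceTransformer

store = FeatureStore(repo_path=".")
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384 dims, matching embedding_dim above

chunk = "New York City is the most populous city in the United States."
df = pd.DataFrame(
    {
        "document_id": [1],
        "state": ["New York"],
        "sentence_chunks": [chunk],
        "wiki_summary": [chunk],  # placeholder summary for illustration
        "vector": [model.encode(chunk).tolist()],
        "event_timestamp": [datetime.utcnow()],
    }
)
```
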
### Step 5: Retrieve your Data
Now that the data is actually stored in Milvus, we can easily query it using the SDK (and corresponding REST API) to retrieve the most similar documents for a given query embedding.
```python
context_data = store.retrieve_online_documents_v2(
    features=[
        "city_embeddings:vector",
        "city_embeddings:document_id",
        "city_embeddings:state",
        "city_embeddings:sentence_chunks",
        "city_embeddings:wiki_summary",
    ],
    query=query_embedding,
    top_k=3,
    distance_metric='COSINE',
).to_df()
```

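Here, `query_embedding` is produced by the same embedding model used at ingestion time. To close the loop, a minimal sketch of embedding a query and handing the retrieved chunks to an LLM; the OpenAI client and model name are assumptions for illustration, and any provider would work.

```python
from openai import OpenAI
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
query = "Which city has the largest population?"
query_embedding = model.encode(query).tolist()

# ... run retrieve_online_documents_v2 as above to produce context_data ...

# Inject the retrieved chunks into the LLM prompt (provider call is illustrative).
context_text = "\n".join(context_data["sentence_chunks"].tolist())
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context_text}"},
        {"role": "user", "content": query},
    ],
)
print(response.choices[0].message.content)
```
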
### The Benefits of using Feast for RAG
We've discussed some of the high-level benefits of using Feast for a RAG application.
More specifically, here are some of the concrete benefits you can expect from using Feast for RAG:
1. [Real-time, Stream, and Batch data Ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion) support to the Feature Server for online retrieval
2. [Data dictionary/metadata catalog](https://docs.feast.dev/getting-started/components/registry) autogenerated from code
3. [UI exposing the metadata catalog](https://docs.feast.dev/reference/alpha-web-ui)
4. [FastAPI Server](https://docs.feast.dev/getting-started/components/feature-server) to serve your data
5. [Role Based Access Control (RBAC)](https://docs.feast.dev/getting-started/concepts/permission) to govern access to your data
6. [Deployment on Kubernetes](https://docs.feast.dev/how-to-guides/running-feast-in-production) using our Helm Chart or our Operator
7. [Multiple vector database providers](https://docs.feast.dev/reference/alpha-vector-database)
8. [Multiple data warehouse providers](https://docs.feast.dev/reference/offline-stores/overview#functionality-matrix)
9. Support for different [data sources](https://docs.feast.dev/reference/data-sources/overview#functionality-matrix)
10. Support for stream and [batch processors (e.g., Spark and Flink)](https://docs.feast.dev/tutorials/building-streaming-features)

And more!

## The Future of Feast and GenAI

Feast will continue to invest in GenAI use cases.

In particular, we will invest in (1) NLP as a first-class citizen, (2) support for images, (3) support for transforming unstructured data (e.g., PDFs), (4) an enhanced GenAI-focused feature server to allow our end-users to more easily ship RAG to production, (5) an out-of-the-box chat UI meant for internal development and fast iteration, and (6) making [Milvus](https://milvus.io/intro) a fully supported and core online store for RAG.

## Join the Conversation

Are you interested in learning more about how Feast can help you build and deploy RAG applications to production? Reach out to us on Slack or [GitHub](https://github.com/feast-dev/feast); we'd love to hear from you!

Two image files added (204 KB and 265 KB).

infra/website/src/pages/index.astro

Lines changed: 14 additions & 2 deletions

```diff
@@ -33,7 +33,19 @@ features = store.get_online_features(
         "product_features:price"
     ],
     entity_rows=[{"customer_id": "C123", "product_id": "P456"}]
-).to_dict()`;
+).to_dict()
+
+# Retrieve your documents using vector similarity search for RAG
+features = store.retrieve_online_documents(
+    features=[
+        "corpus:document_id",
+        "corpus:chunk_id",
+        "corpus:chunk_text",
+        "corpus:chunk_embedding",
+    ],
+    query="What is the biggest city in the USA?"
+).to_dict()
+`;
 ---
 
 <BaseLayout title="Feast - The Open Source Feature Store for Machine Learning">
@@ -42,7 +54,7 @@ features = store.get_online_features(
   <div class="bordered-container">
     <section class="hero-section">
       <div class="max-width-wrapper">
-        <h1 class="hero-title text-smooth">Feature Serving for Production AI</h1>
+        <h1 class="hero-title text-smooth">Serving Data for Production AI</h1>
         <p class="hero-subtitle text-smooth text-center">
           Feast is an open source feature store that delivers structured data to AI and LLM applications at high scale during training and inference
         </p>
```