Retrieval Augmented Generation with Spring AI
In our last post, we looked at enriching the OpenAI model with custom data through function calls. While this technique is useful, it has its limitations and performance trade-offs. Today, we explore a more efficient way of incorporating relevant data into prompts to receive accurate and relevant model responses. Retrieval Augmented Generation, or RAG, relies on preprocessed data that is readily available upon request. In this post, we will build an Extract, Transform, Load (ETL) pipeline that stores a large corpus of weather forecasts and learn how to efficiently retrieve relevant information from a vector store.
Leveraging data that’s readily available provides a significant advantage over making real-time calls to external services. Spring AI ships with a concise toolkit that allows you to ingest data in the most popular formats, transform it, and save it in a database. In this post, we will take a closer look at how to load JSON responses from a weather API into Redis. More specifically, we will explore using Redis as a vector database.
Table of Contents
- Why Vector Database?
- Loading JSON Data
- Only Use What You Need
- Redis as a Vector Store
- Time to Answer Questions
- Summary
Why Vector Database?
Vector databases offer unique advantages over traditional relational databases, particularly for AI applications.
Unlike relational databases that rely on exact matches, vector databases excel at finding items that are similar to a given query. This is done through vector similarity search, where the database identifies and returns data points that closely resemble the input vector.
This makes vector databases better suited to work with AI models. The process begins by loading data into the database. When a query is made, the database retrieves a set of documents that are similar to the query. These documents then provide additional context to the AI model, so that it can respond as accurately as possible. This technique is known as Retrieval Augmented Generation (RAG).
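To make this concrete, here is a minimal sketch of what a similarity search looks like against Spring AI’s VectorStore abstraction. The query text and the topK value are illustrative assumptions; the actual ingest and retrieval code for our weather data follows later in the post.
import org.springframework.ai.document.Document
import org.springframework.ai.vectorstore.SearchRequest
import org.springframework.ai.vectorstore.VectorStore

// A minimal sketch: retrieve the documents most similar to a natural-language query.
// The query text and topK value are illustrative, not part of the original project.
fun findSimilar(vectorStore: VectorStore): List<Document> =
    vectorStore.similaritySearch(
        SearchRequest.query("How hot will it be in Austin tonight?").withTopK(3)
    )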
Loading JSON Data
There’s a variety of unstructured and semi-structured data formats you can use in your ingest pipeline. In our example, we stick to JSON. More specifically, we will load a weekly weather forecast from an online weather service (details can be found on GitHub).
I have downloaded weekly data for four US cities from a publicly available endpoint of the National Weather Service. Here’s an example of a forecast for a specific hour of the day in one of the cities:
{
"number": 21,
"name": "",
"startTime": "2024-07-31T22:00:00-05:00",
"endTime": "2024-07-31T23:00:00-05:00",
"isDaytime": false,
"temperature": 84,
"temperatureUnit": "F",
"temperatureTrend": "",
"probabilityOfPrecipitation": {
"unitCode": "wmoUnit:percent",
"value": 0
},
"dewpoint": {
"unitCode": "wmoUnit:degC",
"value": 22.222222222222221
},
"relativeHumidity": {
"unitCode": "wmoUnit:percent",
"value": 67
},
"windSpeed": "10 mph",
"windDirection": "SSE",
"icon": "https://api.weather.gov/icons/land/night/haze?size=small",
"shortForecast": "Haze",
"detailedForecast": ""
},
That’s a lot of information. However, we mostly care about temperature, temperatureUnit, and the time period: startTime and endTime.
When it comes to data ingest, Spring AI naturally provides its own toolkit. However, I advise against using it. As of this writing, Spring’s JsonReader has severe limitations and I can’t imagine using it in production. Instead, we simply leverage the Jackson object mapper, which offers both convenience and flexibility, especially when it comes to extracting small chunks of data from large documents.
Only Use What You Need
Now it’s time to think about our use case. We’re building a generic chatbot that’s aware of the current weather forecast. Given an hour of the day, it can advise us on how hot or cold it’s going to be at a particular location.
Instead of loading large chunks of JSON into our vector database and hoping our AI model is smart enough to make sense of all the data, we focus on providing only the information needed to accomplish the task at hand. This not only saves space in the database but, more importantly, makes for faster and more accurate chatbot responses.
Our raw JSON data comprises the following key sections:
- metadata: This is a key-value store for custom data. In our case it identifies the location the forecast relates to, for example Austin, TX or Los Angeles, CA.
- properties: A top-level data container.
- a list of time periods: An hourly breakdown of the weather forecast for a particular day.
Our data model closely reflects the structure above, which makes for easy implementation and maintenance.
data class HourlyForecast(
    val metadata: Map<String, String>,
    val properties: Properties
)

data class Properties(
    val periods: List<TimePeriod>
)

data class TimePeriod(
    val startTime: String,
    val endTime: String,
    val temperature: Int,
    val temperatureUnit: String,
)
Please note that we only extract essential information and ignore everything else. The object mapper caters to that:
@Bean
fun objectMapper(): ObjectMapper {
    return jacksonObjectMapper()
        // Ignore JSON fields that are not mapped in our data classes
        .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
}

// Later in our code
val forecast = objectMapper.readValue(file, HourlyForecast::class.java)
Redis as a Vector Store
Spring AI provides integration with Redis out of the box:
spring:
  ai:
    vectorstore:
      redis:
        uri: "redis://localhost:6379"
        index: "vectorstore"
        prefix: "default:"
In our example, we use docker-compose to run Redis locally.
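For reference, a minimal docker-compose.yml could look like the sketch below. The redis/redis-stack image (which bundles the RediSearch module needed for vector similarity search) and the port mappings are assumptions, not taken from the project’s actual compose file.
# A minimal sketch, not the project's actual compose file.
services:
  redis:
    # redis-stack bundles the RediSearch module required for vector search
    image: redis/redis-stack:latest
    ports:
      - "6379:6379"   # Redis itself, matching the uri in application.yml
      - "8001:8001"   # RedisInsight UI (optional)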
In order to populate the store, we leverage Spring’s VectorStore interface:
val documents = forecast.properties.periods.chunked(50).map { periods ->
    val json = objectMapper.writeValueAsString(periods)
    Document(json, forecast.metadata)
}
vectorStore.add(documents)
There are a couple of things to watch out for:
- Splitting documents into chunks: Trying to store large documents can take a significant amount of time. If you’re able to sensibly split input documents into smaller chunks, you are rewarded with a much faster ingest. You can read more on splitting input in the Spring AI reference manual, and there is a short sketch after this list.
- Adding metadata: Enriching documents with metadata provides additional context to our AI model. In this case, we attach location names.
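Regarding the first point, our example splits the list of time periods by hand with chunked(50). Spring AI also ships a TokenTextSplitter that can break large documents into token-sized chunks before ingest. A minimal sketch, assuming the splitter’s default settings are acceptable:
import org.springframework.ai.document.Document
import org.springframework.ai.transformer.splitter.TokenTextSplitter
import org.springframework.ai.vectorstore.VectorStore

// A sketch: split large documents into token-sized chunks before adding them to the store.
// Default TokenTextSplitter settings are assumed here.
fun ingest(vectorStore: VectorStore, largeDocuments: List<Document>) {
    val chunks = TokenTextSplitter().apply(largeDocuments)
    vectorStore.add(chunks)
}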
Time to Answer Questions
Now that we have populated Redis with weather data, we are ready to leverage the vector store as a source of enrichment. This is known as a question answering problem. Question Answering (QA) systems are designed to answer questions posed by users in natural language. These systems can be simple, answering questions with predefined responses, or complex, generating answers from a larger corpus of text, which is what we are trying to do here.
Spring AI provides a QA advisor backed by the vector store we’ve just populated:
import org.springframework.ai.chat.client.ChatClient
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor
import org.springframework.ai.vectorstore.SearchRequest

val response = ChatClient.builder(model).build().prompt()
    .advisors(
        QuestionAnswerAdvisor(
            vectorStore,
            SearchRequest.query(message)
        )
    )
    .user(message)
    .call().content()
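The SearchRequest can also be tuned, for instance to control how many similar documents are retrieved or to filter on metadata. The sketch below assumes a location metadata key, which is illustrative rather than taken from the project; depending on the vector store, filtered metadata fields may also need to be declared in the index configuration.
// A sketch of a tuned retrieval request. The "location" metadata key and the
// values below are illustrative assumptions, not part of the original project.
val tunedRequest = SearchRequest.query(message)
    .withTopK(5)
    .withFilterExpression("location == 'Austin, TX'")

val answer = ChatClient.builder(model).build().prompt()
    .advisors(QuestionAnswerAdvisor(vectorStore, tunedRequest))
    .user(message)
    .call().content()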
Now, when we ask a question that can only be answered by exploring documents in the vector store, we get a reasonable and accurate answer. Here is an example:
@Service
class ChatService(
    private val model: OpenAiChatModel,
    private val vectorStore: VectorStore
) { .. }

// Later in the code
val response = chatService.getResponse(
    "What's the temperature in Austin on 31 July 2024 at 03:00?"
)

// The response might look like this:
// The temperature in Austin on 31 July 2024 at 03:00 was 80°F.
The answer perfectly matches the record in one of the stored documents:
{
"startTime": "2024-07-31T02:00:00-05:00",
"endTime": "2024-07-31T03:00:00-05:00",
"temperature": 80,
"temperatureUnit": "F",
...
},
Please note that we didn’t have to train the model or configure it in any way. Loading the documents into the vector store was enough for our OpenAI model (gpt-3.5-turbo) to extract relevant information and answer the question.
Summary
In this post, we looked into Retrieval Augmented Generation (RAG) backed by Redis as a vector store. It showcases Spring AI as a productivity tool, albeit not the most efficient one, as discussed in the previous post. The ability to leverage a large set of semi-structured documents without having to retrain or reconfigure the OpenAI model is remarkable. We’ve seen how the model extracted useful information from the vector store on its own. This highlights the potential for innovative uses of AI in software development, opening new avenues to increased efficiency.
Thanks for reading. Feel free to check out the project from GitHub and experiment on your own.