PGvector
This section walks you through setting up the PGvector VectorStore
to store document embeddings and perform similarity searches.
What is PGvector?
PGvector is an open-source extension for PostgreSQL that enables storing and searching over machine learning-generated embeddings. It provides different capabilities that let users identify both exact and approximate nearest neighbors. It is designed to work seamlessly with other PostgreSQL features, including indexing and querying.
Prerequisites
-
OpenAI Account: Create an account at OpenAI Signup and generate the token at API Keys.
-
Access to PostgreSQL instance with the following configurations
The setup local Postgres/PGVector appendix shows how to set up a DB locally with a Docker container.
On startup, the PgVectorStore
will attempt to install the required database extensions and create the required vector_store
table with an index. Optionally, you can do this manually like so:
CREATE EXTENSION IF NOT EXISTS vector CREATE EXTENSION IF NOT EXISTS hstore CREATE EXTENSION IF NOT EXISTS "uuid-ossp" CREATE TABLE IF NOT EXISTS vector_store ( id uuid DEFAULT uuid_generate_v4() PRIMARY KEY, content text, metadata json, embedding vector(1536) ) CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops)
Configuration
To set up PgVectorStore
, you need to provide (via application.yaml
) configurations to your PostgreSQL database.
Additionally, you’ll need to provide your OpenAI API Key. Set it as an environment variable like so:
export SPRING_AI_OPENAI_API_KEY='Your_OpenAI_API_Key'
Repository
To acquire Spring AI artifacts, declare the Spring Snapshot repository:
<repository>
<id>spring-snapshots</id>
<name>Spring Snapshots</name>
<url>https://repo.spring.io/snapshot</url>
<releases>
<enabled>false</enabled>
</releases>
</repository>
Dependencies
Add these dependencies to your project:
-
PostgreSQL connection and
JdbcTemplate
auto-configuration.
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<scope>runtime</scope>
</dependency>
-
OpenAI: Required for calculating embeddings.
<dependency>
<groupId>org.springframework.experimental.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
<version>0.7.1-SNAPSHOT</version>
</dependency>
-
PGvector
<dependency>
<groupId>org.springframework.experimental.ai</groupId>
<artifactId>spring-ai-pgvector-store</artifactId>
<version>0.7.1-SNAPSHOT</version>
</dependency>
Sample Code
To configure PgVectorStore
in your application, you can use the following setup:
Add to application.yml
(using your DB credentials):
spring: datasource: url: jdbc:postgresql://localhost:5432/vector_store username: postgres password: postgres
Integrate with OpenAI’s embeddings by adding the Spring Boot OpenAI Starter to your project. This provides you with an implementation of the Embeddings client:
@Bean
public VectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingClient embeddingClient) {
return new PgVectorStore(jdbcTemplate, embeddingClient);
}
In your main code, create some documents:
List<Document> documents = List.of(
new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
new Document("The World is Big and Salvation Lurks Around the Corner"),
new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));
Add the documents to your vector store:
vectorStore.add(List.of(document));
And finally, retrieve documents similar to a query:
List<Document> results = vectorStore.similaritySearch(SearchRequest.query("Spring").withTopK(5));
If all goes well, you should retrieve the document containing the text "Spring AI rocks!!".