
Introducing Vector Search with pgvector in CockroachDB

Last edited on October 2, 2024


    We are thrilled to introduce Vector Search, available in preview in the 24.2 release of CockroachDB. Vector Search powers capabilities such as semantic search, recommendation systems, and natural language processing, which are critical for AI-driven applications, including those built on Large Language Models (LLMs).

    Resilient and scalable database for AI/ML

    Choosing CockroachDB is a future-proof decision for modernizing legacy applications. Vector Search, combined with the 99.999% SLA of our cloud offering and advanced security features such as PCI compliance and HIPAA readiness, helps applications remain resilient, scalable, and capable of meeting the evolving demands of AI and machine learning workloads across a variety of industries.

    Additionally, you can take advantage of CockroachDB's horizontal scale to store hundreds of millions to billions of vectors, so that your AI/ML applications can manage and retrieve large volumes of vector data efficiently.

    CockroachDB's distributed architecture ensures that as your data grows, the system can scale out horizontally. This maintains performance and reliability without the need for complex manual sharding or extensive reconfiguration.

    What is Vector Search?

    At its core, Vector Search is a method of finding data points that are similar to a given query by leveraging mathematical representations known as vectors. In contrast to traditional search methods that rely on exact matches, Vector Search uses the concept of similarity to deliver more relevant and nuanced results. 

    In the context of text search, vectors can capture the semantic meaning of words, phrases, or entire documents. For example, using Vector Search, a user who searches for "classical piano music" can also get recommendations for "instrumental piano pieces" and "baroque keyboard compositions".

    This is achieved through techniques like word embeddings (e.g., Word2Vec, GloVe) or more advanced models like GPT, which transform text into high-dimensional vectors. Text embeddings can also be extracted using APIs from leading AI providers like OpenAI and Anthropic and stored in CockroachDB. This allows customers to augment LLMs with additional context tied to a specific use case, which results in higher-quality answers.
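
    To make the idea of similarity concrete, here is a minimal sketch in Python. The three-dimensional vectors are hand-made stand-ins for real embeddings (which typically have hundreds or thousands of dimensions), and the names `cosine_similarity` and `rank_by_similarity` are illustrative, not part of any CockroachDB API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return document names ordered from most to least similar to the query."""
    return sorted(
        documents,
        key=lambda name: cosine_similarity(query, documents[name]),
        reverse=True,
    )


# Toy "embeddings" (assumed values, not produced by a real model).
documents = {
    "instrumental piano pieces": [0.9, 0.8, 0.1],
    "baroque keyboard compositions": [0.8, 0.9, 0.2],
    "heavy metal guitar riffs": [0.1, 0.2, 0.9],
}
query = [0.9, 0.9, 0.1]  # stands in for "classical piano music"

ranked = rank_by_similarity(query, documents)
for name in ranked:
    print(name, round(cosine_similarity(query, documents[name]), 3))
```

    With these toy values, the two piano-related entries rank above the unrelated one even though none of them matches the query text exactly.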

    [Figure: Cockroach Labs Vector Search AI workflow]

    CockroachDB’s new Vector Search capabilities can be used to power a variety of AI-related use cases.

    Why CockroachDB + Vector Search?

    CockroachDB's integration of Vector Search capabilities makes it an ideal choice for generative AI applications by combining the strengths of a vector database and an operational database into a single, horizontally scalable solution. Having a single database for both your vector and operational data simplifies your architecture and eliminates the operational and financial costs of maintaining a dedicated vector database.

    This means your generative AI applications can take advantage of CockroachDB’s industry-standard high availability and resilience features out of the box. CockroachDB's ability to distribute data evenly across many nodes supports the extensive data processing needs of generative AI applications, while its global data access and low latency enable fast and consistent search results across multiple regions.

    Beyond CockroachDB’s resilience and data domiciling features, your generative AI applications can also use its distributed SQL engine to perform complex SQL operations on vector data. For example, consider a table of items, each with a category and an associated vector. With a secondary index on the category column, a query can quickly filter items by category before performing a Vector Search. This ensures that only the relevant subset of vector data is processed during the Vector Search, which improves efficiency.

    For instance, if you are looking for items in the "electronics" category that are most similar to a given vector, the secondary index on the category column will first filter out non-electronics items. Then, Vector Search will be applied to this filtered subset, resulting in faster query execution times and more efficient use of resources.
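
    The filter-then-search pattern can be illustrated with a small in-memory sketch in Python. It only mimics the logic and is not how the database executes the query. The item data, the two-dimensional vectors, and the function names are all hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_in_category(items, category, query, k):
    """Filter by category first, then rank only the remaining items by distance."""
    # Step 1: the cheap filter (the role played by the secondary index).
    candidates = [item for item in items if item["category"] == category]
    # Step 2: the similarity search runs on the filtered subset only.
    nearest = sorted(candidates, key=lambda item: l2_distance(item["embedding"], query))
    return [item["id"] for item in nearest[:k]]


# Hypothetical rows; real embeddings would have many more dimensions.
items = [
    {"id": 1, "category": "electronics", "embedding": [0.1, 0.9]},
    {"id": 2, "category": "electronics", "embedding": [0.8, 0.2]},
    {"id": 3, "category": "furniture", "embedding": [0.2, 0.8]},
    {"id": 4, "category": "electronics", "embedding": [0.4, 0.6]},
]

result = search_in_category(items, "electronics", query=[0.2, 0.8], k=2)
print(result)
```

    Item 3 is the closest vector overall, but it is never compared against the query because the category filter removes it before any distance is computed.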

    Compatible with pgvector

    pgvector is an open-source extension for PostgreSQL that provides efficient storage, retrieval, and similarity search of vector data. CockroachDB’s Vector Search implementation uses the same interface as pgvector and aims to be compatible with pgvector’s API.
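
    As a rough sketch of what a pgvector-style interface looks like from application code, the Python snippet below builds a vector literal and two SQL statements as strings. Nothing is executed here. The table and column names are hypothetical, the `VECTOR(3)` column type and the `<->` (L2 distance) operator follow pgvector's conventions and are assumed to behave the same way in CockroachDB, and the `%s` placeholders assume a psycopg-style driver.

```python
def to_vector_literal(values):
    """Format numbers in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Hypothetical schema: one fixed-dimension vector column per item.
CREATE_TABLE_SQL = """
CREATE TABLE items (
    id INT PRIMARY KEY,
    category STRING,
    embedding VECTOR(3)
)
"""

# Nearest-neighbor query using pgvector's L2 distance operator (<->).
# The placeholders are filled in by the database driver, not by string formatting.
NEAREST_NEIGHBORS_SQL = """
SELECT id
FROM items
WHERE category = %s
ORDER BY embedding <-> %s::VECTOR
LIMIT %s
"""

params = ("electronics", to_vector_literal([0.2, 0.8, 0.1]), 5)
print(NEAREST_NEIGHBORS_SQL.strip())
print(params)
```

    With a psycopg-style driver, the statement and parameters would typically be passed together, for example `cursor.execute(NEAREST_NEIGHBORS_SQL, params)`.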

    Our support for pgvector’s API facilitates integration with tools like LangChain and Hugging Face, making it easier to incorporate real-time data into AI models. This allows CockroachDB to serve as a robust backend for Retrieval-Augmented Generation (RAG) frameworks, providing up-to-date and accurate data to enhance AI-generated content without the need for costly fine-tuning.

    We are also working on adding support for vector indexing in a future release of CockroachDB. By indexing vectors, the system can quickly narrow down the search space, leading to faster query execution times and reduced computational overhead. This will further optimize the performance of AI and machine learning workloads, making CockroachDB an even more powerful solution for large-scale, data-intensive AI applications.

    Vector Search will be available in preview across our CockroachDB Cloud offerings as well as for CockroachDB Self-Hosted deployments starting with the 24.2 release of CockroachDB. 

    How to start with Vector Search in CockroachDB

    Explore the following resources to get started with Vector Search in the 24.2 release of CockroachDB:

    Learn more about how CockroachDB makes you AI-ready: Visit here to speak with an expert.
