Vector Databases — a list…

Koa Labs
Published in KoaLabs
4 min read · Aug 7, 2023


Vector databases are all the rage because of how useful they are for LLM/GenAI applications. I figured I’d publish a list of vector database options that some of my friends have been building. There is an exceptional resource here as well, and a great description by Andy Pavlo from OtterTune in a post here that emphasizes how existing databases are rapidly implementing the capabilities one would expect from a purpose-built vector database.

If you have additions, removals, or changes, don’t hesitate to add them in the comments.

A) Open source products

Faiss — (https://faiss.ai/#)

  • Faiss is fully open source.

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, including sets that may not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.

Milvus — (https://milvus.io/)

  • Open Source

Milvus was created in 2019 with a singular goal: store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models.

As a database specifically designed to handle queries over input vectors, it is capable of indexing vectors at trillion scale. Unlike existing relational databases, which mainly deal with structured data following a pre-defined pattern, Milvus is designed from the bottom up to handle embedding vectors converted from unstructured data.

As the Internet grew and evolved, unstructured data became more and more common, including emails, papers, IoT sensor data, Facebook photos, protein structures, and much more. For computers to understand and process unstructured data, it is converted into vectors using embedding techniques. Milvus stores and indexes these vectors, and it can analyze the correlation between two vectors by calculating their similarity distance. If two embedding vectors are very similar, the original data sources are similar as well.
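
To make "similarity distance" concrete, here is a minimal sketch in plain Python of two commonly used measures, Euclidean (L2) distance and cosine similarity. The three-dimensional vectors and their names are made-up toy values rather than output from a real embedding model, and this is not Milvus code; it only illustrates the idea.

```python
import math


def euclidean_distance(a, b):
    """L2 distance between two vectors: smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (assumed values for illustration only).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten), cosine_similarity(cat, invoice))
print(euclidean_distance(cat, kitten), euclidean_distance(cat, invoice))
```

With these toy values, the "cat" and "kitten" vectors score as closer to each other than "cat" and "invoice" under both measures, which is the property a vector database relies on when it ranks results.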

Weaviate — (https://weaviate.io/)

  • Open Source and Managed version

Weaviate is an open-source vector database. It allows you to store data objects and vector embeddings from your favorite ML models, and scale seamlessly into billions of data objects.

Chroma — (https://www.trychroma.com/)

  • Open Source

Chroma is a database for building AI applications with embeddings. It comes with everything you need to get started built in, and runs on your machine. A hosted version is coming soon!

Qdrant — (https://qdrant.tech/)

  • Open Source and Managed version

Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!

Vespa — (https://vespa.ai/)

  • Open Source and Managed version

Vespa is a fully featured search engine and vector database. It supports vector search (ANN), lexical search, and search in structured data, all in the same query. Integrated machine-learned model inference allows you to apply AI to make sense of your data in real time. Together with Vespa’s proven scaling and high availability, this lets you create production-ready search applications at any scale and with any combination of features.
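
As a rough illustration of what combining structured filtering, lexical matching, and vector similarity in a single query can look like, here is a small in-memory sketch in plain Python. It does not use Vespa's API or query language. The documents, the `hybrid_search` function, the term-overlap scoring, and the `alpha` weighting are all hypothetical choices made for this example.

```python
import math

# Hypothetical documents: text, one structured field, and a toy embedding.
DOCS = [
    {"id": 1, "text": "red running shoes", "in_stock": True, "vec": [0.9, 0.1]},
    {"id": 2, "text": "blue running jacket", "in_stock": True, "vec": [0.2, 0.8]},
    {"id": 3, "text": "red trail shoes", "in_stock": False, "vec": [0.95, 0.05]},
]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def lexical_score(query, text):
    """Fraction of query terms that appear as words in the document text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(query_text, query_vec, in_stock, alpha=0.5, k=2):
    """Filter on a structured field, then rank by a weighted mix of both scores."""
    candidates = [d for d in DOCS if d["in_stock"] == in_stock]
    scored = [
        (
            alpha * cosine(query_vec, d["vec"])
            + (1 - alpha) * lexical_score(query_text, d["text"]),
            d["id"],
        )
        for d in candidates
    ]
    scored.sort(reverse=True)  # highest combined score first
    return [doc_id for _, doc_id in scored[:k]]


results = hybrid_search("red shoes", [1.0, 0.0], in_stock=True)
print(results)
```

A production engine would use an inverted index and an approximate nearest-neighbor index instead of scanning every document, but the shape of the query (filter, then a blended ranking) is the same idea.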

pgvector — (https://github.com/pgvector/pgvector)

  • Open Source

pgvector is an open-source extension for PostgreSQL that allows you to store and query vector embeddings within your database. It is easy to use and can be installed with a single command.

OpenSearch — (https://opensearch.org/)

  • Open Source

OpenSearch is a community-driven, open-source fork of Elasticsearch and Kibana, created following the license change in early 2021. It includes vector database functionality that allows you to store and index vectors and metadata, and to perform vector similarity search using k-NN indexes.

B) Proprietary products

Elasticsearch — (https://www.elastic.co/elasticsearch/)

  • Paid/Licensed

Elasticsearch is a distributed search and analytics engine that supports various types of data. One of the data types it supports is the vector field, which stores dense vectors of numeric values. In version 8.0, Elasticsearch added support for indexing vectors into a specialized data structure for fast approximate kNN retrieval through the kNN search API, along with native natural language processing (NLP) support.

Pinecone — (https://www.pinecone.io/)

  • Paid/Licensed

Pinecone makes it easy to provide long-term memory for high-performance AI applications. It’s a managed, cloud-native vector database with a simple API and no infrastructure hassles. Pinecone serves fresh, filtered query results with low latency at the scale of billions of vectors.

Redis — (https://redis.io/)

  • Paid/Licensed

Redis Enterprise manages vectors in an index data structure to enable intelligent similarity search that balances search speed and search quality. Choose from two popular techniques, FLAT (a brute-force approach) and HNSW (a faster, approximate approach), based on your data and use cases.
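
The FLAT idea is simple enough to sketch in a few lines of plain Python: compare the query against every stored vector and keep the closest ones. This is only an illustration of brute-force search, not Redis code or its API. The `flat_search` function, the key names, and the vectors are assumptions made for the example.

```python
import heapq
import math


def flat_search(index, query, k=3):
    """Exact k-nearest-neighbor search: scan every stored vector."""

    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # nsmallest keeps the k (distance, key) pairs with the smallest distance.
    return heapq.nsmallest(k, ((l2(vec, query), key) for key, vec in index.items()))


# Hypothetical keys and two-dimensional toy vectors.
index = {
    "doc:a": [0.0, 0.0],
    "doc:b": [1.0, 1.0],
    "doc:c": [0.1, 0.2],
    "doc:d": [5.0, 5.0],
}

nearest = flat_search(index, [0.0, 0.1], k=2)
print(nearest)
```

Because every query touches every vector, the cost grows linearly with the size of the index. HNSW avoids the full scan by navigating a graph of neighboring vectors, which is faster on large datasets but can occasionally miss a true nearest neighbor.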

SingleStore — (https://www.singlestore.com/)

  • Paid/Licensed

SingleStoreDB unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. Built for developers and architects, SingleStoreDB is based on a distributed SQL architecture, delivering 10–100 millisecond performance on complex queries, all while ensuring your business can effortlessly scale.

SingleStoreDB also offers built-in vector database and full-text search capabilities.


Located in the heart of Harvard Square, Koa Labs is a Seed Fund for promising start-ups. http://koalabs.com