Unlock the Meaning Behind Your Data with Vector Search

In a relational database, data is stored as-is, with no meaning attached to it. When relational data is searched, the search terms are matched against the column values stored in a table. A search of this type is called a lexical search.
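
A lexical search of this kind can be sketched in a few lines of Python. This is only an illustration: the rows, the doc_text column, and the lexical_search function are hypothetical stand-ins for a table and a query, not a database API.

```python
# Hypothetical rows standing in for a table with a text column.
rows = [
    {"id": 1, "doc_text": "orange is my favorite fruit"},
    {"id": 2, "doc_text": "orange is my favorite color"},
    {"id": 3, "doc_text": "green is my favorite color"},
]


def lexical_search(rows, term):
    """Return the ids of rows whose text contains the term as a whole word."""
    term = term.lower()
    return [row["id"] for row in rows if term in row["doc_text"].lower().split()]


# Both "orange" rows match, whether the word refers to the fruit or the color.
matches = lexical_search(rows, "orange")
print(matches)
```

The match is purely on the characters of the word, so the search has no way to tell the two uses of "orange" apart.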

What Is Semantic Search?

A semantic search takes the meaning of the query into consideration rather than doing a plain word-match lexical search. It has to interpret the meaning of a search query before trying to match the query with stored data. As an example, a search for "orange" could mean the color orange or the fruit orange. To be searchable by meaning, data has to be stored in a way that captures its meaning. Vector databases do this using vector embeddings, which are introduced in the next section. Vector search performs a semantic search on data stored as vector embeddings; it can't be run on data that has not been converted to embeddings.

Vector Embeddings

Vector embeddings are numerical representations of data elements in a multi-dimensional vector space. Images, words, and phrases are typical examples of data that is converted to embeddings. The dimensions represent features of a data element. Creating vector embeddings for non-numerical, unstructured data is no trivial task. Generating vector embeddings is a machine learning task in itself, and it requires models designed specifically for the purpose (for example, word2vec).

A question one may ask is why a numerical representation is needed. Can’t the data be stored as a tuple? As an example, the phrases “orange is my favorite fruit” and “green is my favorite color” could be stored as the tuples (orange,favorite,fruit) and (green,favorite,color). While storing data in tuple form may seem an obvious way to preserve its meaning, machine learning (ML) models can’t work with it. ML models operate only on numerical values, and that is why non-numerical data has to be stored numerically while still preserving its meaning.

The dimensions have to be defined, and there can be dozens or hundreds of them for complex data. The choice of dimensions can be made by an ML model, such as word2vec. A data element may get a high numerical value along one dimension and a much lower value along another dimension. An example vector is (25, 723, -3, 256, ..., 43).
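
To make the idea concrete, here is a minimal Python sketch that turns a phrase into a vector by averaging the vectors of its words. Everything in it is an assumption for illustration: the three-dimensional word_vectors values are made up by hand, whereas a trained model such as word2vec would learn them and would typically use many more dimensions, and averaging is only one simple way to combine word vectors.

```python
# Made-up 3-dimensional word vectors (hypothetical values). A trained model
# such as word2vec would learn these and typically use far more dimensions.
word_vectors = {
    "orange":   [0.7, 0.6, 0.1],
    "green":    [0.1, 0.9, 0.0],
    "fruit":    [0.9, 0.1, 0.2],
    "color":    [0.0, 0.9, 0.1],
    "favorite": [0.2, 0.2, 0.8],
}


def embed_phrase(phrase, vectors):
    """Average the vectors of the known words in a phrase."""
    known = [vectors[word] for word in phrase.lower().split() if word in vectors]
    if not known:
        raise ValueError("no known words in phrase")
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


fruit_phrase = embed_phrase("orange is my favorite fruit", word_vectors)
color_phrase = embed_phrase("green is my favorite color", word_vectors)
print(fruit_phrase)
print(color_phrase)
```

With these made-up values, the fruit phrase ends up with a higher value along the first dimension than the color phrase, which is the kind of difference a similarity measure can then pick up.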

A New Data Type Called VECTOR

Some databases, such as Oracle Database 23ai, have introduced a new data type called VECTOR to store vector embeddings. Vector search can be used to search structured and unstructured data by taking into consideration the meaning, or semantics, of the query in addition to its lexical aspect. An example table created with a VECTOR type column is CREATE TABLE t (id INT, doc_text CLOB, doc_vector VECTOR);

How to Measure the Similarity of Data Elements?

The numerical representations of data stored in vector embeddings can be compared using techniques that are already used with vectors, such as cosine similarity. Cosine similarity is always in the range [-1,1]. Vectors that point in the same direction have a cosine similarity of 1, orthogonal vectors 0, and vectors that point in opposite directions -1. Machine learning models can be used to determine vector similarity and return semantic search results.
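
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The Python sketch below computes it and uses it to rank stored vectors against a query vector. The function names, the stored vectors, and the query vector are hypothetical values chosen for illustration; a vector database would normally do this ranking internally, usually with an index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, stored):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The three reference cases: same direction, orthogonal, opposite.
same_direction = cosine_similarity([1, 2], [2, 4])   # approximately 1
orthogonal = cosine_similarity([1, 0], [0, 1])       # 0
opposite = cosine_similarity([1, 2], [-1, -2])       # approximately -1

# Hypothetical stored embeddings and query embedding.
stored = {
    "fruit_doc": [0.6, 0.3, 0.4],
    "color_doc": [0.1, 0.7, 0.4],
}
query = [0.8, 0.2, 0.3]
ranking = rank_by_similarity(query, stored)
print(ranking)
```

Because only the angle between vectors matters, [1, 2] and [2, 4] count as maximally similar even though their lengths differ.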

How to Score Vector Search?

Search using exact search terms is predictable. It invariably returns the same result on the same data. Lexical search can be scored precisely on the basis of whether it returns all data that matches a search term. Semantic search is not easy to score because of its variability. Some techniques such as semantic textual similarity and semantic ranking have been developed to score semantic search.
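
The lexical case can be illustrated with a recall calculation: the fraction of the items that should match that the search actually returned. The sketch below is an assumption for illustration, not one of the techniques named above; the ids are hypothetical, and for a semantic search the set of relevant items would typically have to come from human judgment, which is part of what makes scoring harder.

```python
def recall(returned_ids, relevant_ids):
    """Fraction of the relevant items that the search actually returned."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    return len(relevant & set(returned_ids)) / len(relevant)


# Lexical search: the relevant set is simply every row containing the term.
lexical_score = recall(returned_ids=[1, 2], relevant_ids=[1, 2])

# Semantic search: the relevant set here is a hypothetical hand-labeled judgment.
semantic_score = recall(returned_ids=[1, 3], relevant_ids=[1, 4])

print(lexical_score, semantic_score)
```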

Conclusion

For some types of searches, it may be relevant for a search engine to know the meaning implied by a search term. As an example, if a user searches for “orange is my favorite fruit”, a semantic search returns only data in which “orange” is used in the context of a fruit, and not a color. A lexical search, by contrast, would return all mentions of “orange”, whether the context is color or fruit. Vector search is a technique that enables semantic search by encoding the semantic relationships between data elements. The benefit is a search that takes the meaning of the query into consideration along with exact matching of the search terms.
