SEMANTIC

About Semantic Search

What is semantic?

In semantic search, 'semantic' refers to 'meaning'. In IT terminology, it describes technologies that enable computers to understand the meaning of documents and information.

What is semantic search?

Semantic search is an information retrieval process where the search engine understands the 'meaning' of the search query and the content being searched. KandaSearch leverages AI technology to provide functionality that allows the search engine to understand the meanings of text and images in a way similar to human comprehension.

Traditional keyword search looks for documents containing words that match the keywords. In contrast, semantic search returns documents that are appropriate in meaning. Additionally, the search targets include not only text but also images and other data.
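The contrast can be illustrated with a minimal sketch. The three-dimensional vectors below are hand-made stand-ins for the embeddings an AI model would produce. The vectors, the document titles, and the function names are all assumptions made for illustration, not KandaSearch's actual implementation.

```python
import math

# Hypothetical documents with hand-made 3-dimensional "meaning" vectors.
# A real system would obtain such vectors from an embedding model.
DOCS = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Shipping fees and delivery times": [0.0, 0.2, 0.9],
    "Troubleshooting login failures": [0.8, 0.3, 0.1],
}

def keyword_search(query, docs):
    """Return titles that contain every query word (case-insensitive)."""
    words = query.lower().split()
    return [t for t in docs if all(w in t.lower().split() for w in words)]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_search(query_vector, docs, top_k=2):
    """Rank titles by similarity of their vector to the query vector."""
    ranked = sorted(docs, key=lambda t: cosine(query_vector, docs[t]), reverse=True)
    return ranked[:top_k]

# The query "cannot sign in" shares no word with any title.
QUERY_VECTOR = [0.7, 0.35, 0.1]  # assumed embedding of the query

print(keyword_search("cannot sign in", DOCS))   # no title matches
print(semantic_search(QUERY_VECTOR, DOCS))      # login-related titles rank first
```

The keyword search finds nothing because no title contains the query words, while the vector comparison still ranks the login and password documents above the unrelated shipping document.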

Innovative Search Technology

Semantic search is gaining attention as a groundbreaking new technology, and its application areas are expanding. It makes possible the following types of search, which were difficult to achieve with traditional keyword search:

  • Searches that understand the meaning of queries in natural language
  • Content matching, which uses the content of one document as the query to find semantically similar documents in a collection
  • Meaning-based associative searches
  • Cross-lingual searches, such as searching English documents using Japanese (also applicable to other languages)
  • Image searches via text, as well as multimodal searches between different types of objects
  • Combining semantic search results with generative AI (LLM: Large Language Model) to generate more accurate and up-to-date responses, an approach known as RAG (Retrieval-Augmented Generation)

Generative AI, such as ChatGPT, is an innovative and impressive technology. However, it suffers from the troublesome phenomenon of "hallucination", in which the AI presents answers as fact even when they are based on non-existent information. With semantic search, it is possible to find more reliable and accurate information from a collection of trustworthy documents with clear sources.
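As a rough sketch of the RAG idea mentioned in the list above, the retrieved documents can be placed into the prompt so that the generative AI answers from sources with clear origins. The function name, the prompt wording, and the sample documents are hypothetical, and the call to the LLM itself is deliberately left out.

```python
def build_rag_prompt(question, retrieved_docs):
    """Assemble a prompt that asks the model to answer only from the sources."""
    sources = "\n".join(
        f"[{i}] {doc['title']}: {doc['text']}"
        for i, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite the source number, and say so if the sources do not contain the answer.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )

# Hypothetical search results; in practice these would come from the
# semantic search engine, ranked by closeness in meaning to the question.
retrieved = [
    {"title": "Warranty policy", "text": "Products are covered for 12 months from purchase."},
    {"title": "Return policy", "text": "Returns are accepted within 30 days."},
]

prompt = build_rag_prompt("How long is the warranty?", retrieved)
print(prompt)
```

Numbering the sources lets the generated answer point back to the document it relied on, which is what makes the response checkable.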

Comparison between Semantic Search and traditional Keyword Search:

  1. In traditional keyword search, users need to input keywords present in the document to perform a search.

    • A certain level of domain knowledge (specialized knowledge for each industry) is required to conduct a search.
    • Even with domain knowledge, a thesaurus must be maintained because multiple keywords can share the same meaning.
  2. In Semantic Search, it is not necessary for the query to match keywords exactly.

    • Even without domain knowledge, users can search by describing the information they want in their own words.
    • Searches still work even if there is a time gap between when the documents were collected and when the query is made.
  3. Traditional keyword search requires the following techniques and maintenance to improve search performance. Semantic search makes them unnecessary, which reduces effort and cost:

    • Various types of normalization such as character normalization.
    • Choosing between, and combining, morphological analysis and character N-grams.
    • Handling of keyword spelling variations and definition of synonyms.
    • Weighting of fields, etc.
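
To give a sense of the upkeep listed in item 3, here is a minimal sketch of three of those keyword-search techniques: character normalization, character N-grams, and synonym expansion. The thesaurus entries and function names are hypothetical; a production system would rely on much larger, hand-maintained dictionaries.

```python
import unicodedata

# Hypothetical hand-maintained thesaurus: each term maps to its synonyms.
SYNONYMS = {
    "pc": ["computer", "laptop"],
    "computer": ["pc", "laptop"],
}

def normalize(text):
    """Character normalization: fold full-width/compatibility forms and case."""
    return unicodedata.normalize("NFKC", text).lower()

def char_ngrams(text, n=2):
    """Character N-grams, often used where word boundaries are unclear."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def expand_query(query):
    """Add thesaurus synonyms to the normalized query terms."""
    terms = normalize(query).split()
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

print(normalize("ＰＣ"))             # full-width letters become "pc"
print(char_ngrams("東京都庁"))       # ['東京', '京都', '都庁']
print(expand_query("ＰＣ repair"))   # ['pc', 'repair', 'computer', 'laptop']
```

Every new synonym or spelling variant has to be added to the dictionary by hand, which is the maintenance effort the comparison refers to.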

Examples of Semantic Search Application:

The characteristics of semantic search make it well suited to the following business applications:

Searching FAQs and manuals

In searching FAQs and manuals, while the documents being searched are highly specialized, the searchers (such as product users, typically external individuals) are not necessarily experts. This often leads to vocabulary and knowledge gaps between the two parties.
Traditional keyword search requires the query to match the keywords exactly, whereas semantic search retrieves results that are similar in meaning even when the keywords differ. This enables users to resolve issues on their own and reduces inquiries to call centers/help desks.

Internal enterprise search, knowledge sharing, search at call centers/help desks

In internal enterprise search and knowledge sharing, the documents being searched are highly specialized and therefore carry strong 'meaning' (semantics). Semantic search, which finds items with similar meanings even when the searcher does not know the specialized terminology, is well suited to such scenarios.

Similar Product Search on E-commerce/Online Shopping Sites

On e-commerce sites and online shopping platforms, semantic search makes it possible to present other products similar to those specified during the search, based on text and image similarities. This can increase purchase opportunities.

Similar design and trademark searches, and other searches beyond just text

Semantic search can be executed even when the search target or query is not text. Therefore, it is possible to build an application that takes illustration image files on hand, searches for designs or trademarks similar to one that is about to be registered with the patent office, and displays results in order of how close they are in the 'meaning' conveyed by the images.

Inquiry-based learning and research and development

Meaning-based associative search can be easily implemented with educational resources (dictionaries, encyclopedias, textbooks) and highly specialized papers. It is suitable for exploratory learning and discovery-oriented searches such as research and development (R&D).

Semantic search allows searching based on the 'meaning' of both the query and the search target document, enabling seamless searching across different types of objects. For example, it's possible to search for similar images by specifying an image, search for images using text, or search for an English manual using Japanese.

Comparison with ChatGPT

ChatGPT is a generative AI chat tool that engages in real-time conversations with humans through text prompts and commands.
On the other hand, KandaSearch's search engine, which supports semantic search, is a tool for finding desired items from large amounts of data such as text, images, videos, and audio. The two serve entirely different purposes.

ChatGPT excels at tasks that have no single correct answer, so users need to check whether its output is correct. In contrast, KandaSearch's semantic search retrieves, from a reliable set of documents, the results closest in 'meaning' to the query specified at the time of search. Because results are retrieved from existing documents rather than generated, there is no concern about 'hallucinations', which makes it suitable for business applications that involve direct interaction with external stakeholders such as customers.

Are you hesitating to introduce semantic search engines just because generative AI chats like ChatGPT are making waves on the internet? The video below explains clearly the situations where semantic search engines should be introduced instead of generative AI chats. Be sure to check it out.

AI Chat vs. Semantic Search Engines

We also provide various information and data related to semantic search, including the introduction above, on other pages.

For estimates and details,
please feel free to contact our development team.
