RONDHUIT Semantic Search Plugin
Free

Free user registration with KandaSearch is required to use this plugin.

Solr Plugin

Latest version: 1.1.0

Developer: RONDHUIT

Downloads: 29

Last updated: 2023-01-24

Copyright: RONDHUIT Co.,LTD

An Apache Solr plugin developed by RONDHUIT. It deploys a semantic search model to the Solr server so that dense vectors can be generated inside Solr at both indexing and search time.

Semantic search plugin

Placement

The JAR file built by Gradle must be placed under {SOLR_HOME}/{collection}/lib so that Solr loads it.

Update Request Processor

About

A custom Solr UpdateRequestProcessor that computes the embeddings of a specified field and stores them in a field of type knn_vector.
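
For reference, a knn_vector field of this kind could be declared in the collection's schema roughly as follows. This is a minimal sketch assuming Solr 9's solr.DenseVectorField; the field name title_vector mirrors the targetField used in the configuration in this section, while the vectorDimension of 768 and the cosine similarity function are assumptions that have to be adjusted to the output of the deployed model.

```xml
<!-- Assumed values: vectorDimension and similarityFunction must match the deployed model -->
<fieldType name="knn_vector" class="solr.DenseVectorField"
           vectorDimension="768" similarityFunction="cosine"/>

<!-- Target field that receives the embeddings produced by the processor -->
<field name="title_vector" type="knn_vector" indexed="true" stored="true"/>
```

If the dimension declared here differs from the length of the vectors the model returns, Solr rejects the document at indexing time, so it is worth checking the model's output size first.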

URP Configuration

The URP is configured in the collection's solrconfig.xml as follows:


<updateRequestProcessorChain name="get-embeddings">
    <processor class="com.rondhuit.solr.update.dense.EmbeddingsProcessorFactory">
        <str name="hostName">host.docker.internal</str>
        <int name="portNo">8080</int>
        <str name="modelName">sonoisa</str>
        <str name="sourceField">title</str>
        <str name="targetField">title_vector</str>
    </processor>

    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
  • hostName and portNo can be omitted. Their default values are localhost and 8080 respectively.
  • modelName has to match a model deployed in TorchServe.
  • sourceField is the text field whose content is vectorized, i.e. the field for which embeddings are calculated.
  • targetField is a field of type knn_vector to hold the calculated embeddings.
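
Solr runs a named chain only when an update request selects it. One way to do that, sketched here under the assumption that every update request should pass through the chain, is to set update.chain as a default for the update handlers in solrconfig.xml. Alternatively, the parameter update.chain=get-embeddings can be passed on individual update requests.

```xml
<!-- Assumption: all handlers under /update should use the get-embeddings chain by default -->
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">get-embeddings</str>
  </lst>
</initParams>
```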

Search component

About

A custom Solr SearchComponent that computes the embeddings of a query and performs a dense vector search.

SC Configuration

The Search component is configured in the collection's solrconfig.xml as follows:

  • Component definition:

      <searchComponent name="dense-search"
                       class="com.rondhuit.solr.handler.component.DenseVectorSearchComponent">
        <str name="hostName">host.docker.internal</str>
        <int name="portNo">8080</int>
        <str name="modelName">sonoisa</str>
      </searchComponent>
    
    • hostName and portNo can be omitted. Their default values are localhost and 8080 respectively.
    • modelName has to match a model deployed in TorchServe.
  • Search Handler definition:

    DenseVectorSearchComponent should be placed before QueryComponent. To do that, we define dense-search in the first-components of the SearchHandler.

      <requestHandler name="/select" class="solr.SearchHandler">
        <lst name="defaults">
          <str name="echoParams">explicit</str>
          <int name="rows">10</int>
        </lst>
        <arr name="first-components">
          <str>dense-search</str>
        </arr>
      </requestHandler>
    

Query parameters

Query parameters to perform a KS Dense Vector Search:

Parameter | Description | Default | Required
_ks.dense | Flag to activate DenseVectorSearchComponent. | false | yes
_ks.dense.model | Model to use. Has to match a model deployed in TorchServe. | The modelName specified in DenseVectorSearchComponent. | no
_ks.dense.field | Schema field of type knn_vector holding the embeddings. | (none) | yes
_ks.dense.k | Number of nearest neighbors to return. Equivalent to the topK parameter of Solr's knn Query Parser. | 100 | no
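
If the same parameters are sent with every request, they could instead be fixed as handler defaults. The following is only a sketch: the handler name /dense-select is hypothetical, the field answer_embeddings is borrowed from the request examples in this document, and it assumes the component reads its parameters through Solr's regular request-parameter mechanism (so that handler defaults are honored), which the plugin documentation does not state explicitly.

```xml
<!-- Hypothetical handler: dense vector search enabled by default -->
<requestHandler name="/dense-select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- Assumed to be picked up like ordinary request parameters -->
    <str name="_ks.dense">true</str>
    <str name="_ks.dense.field">answer_embeddings</str>
    <int name="_ks.dense.k">10</int>
  </lst>
  <arr name="first-components">
    <str>dense-search</str>
  </arr>
</requestHandler>
```

With such a handler, a request would only need to supply q, and any of the defaults could still be overridden per request.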

Standard query request examples

http://localhost:8983/solr/covid/select?q=covid%20incubation%20period&_ks.dense=true&_ks.dense.field=answer_embeddings
http://localhost:8983/solr/covid/select?q=covid%20incubation%20period&_ks.dense=true&_ks.dense.field=answer_embeddings&fl=title%2C%20answer

JSON Request API examples

curl -X POST "http://localhost:8983/solr/covid/select" -d '
{
  "query" : "covid incubation time",
  "fields": "title,answer",
  "params": {
    "_ks.dense": "true",
    "_ks.dense.model": "sonoisa",
    "_ks.dense.field": "answer_embeddings",
    "_ks.dense.k": 10
  }
}'
