Using this plugin requires a free KandaSearch user account.
Latest version: 1.1.0
Developer: RONDHUIT
Downloads: 29
Last updated: 2023-01-24
Copyright: RONDHUIT Co.,LTD
An Apache Solr plugin developed by RONDHUIT. It deploys semantic search models to the Solr server and lets Solr generate dense vectors internally at both indexing and query time.
The JAR file built by Gradle must be placed under {SOLR_HOME}/{collection}/lib for Solr to load it.
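This step can be wrapped in a small shell function. A minimal sketch: install_plugin_jar is a hypothetical helper, and the Gradle command and the JAR's file name are assumptions, so check the project's actual build output.

```shell
#!/usr/bin/env bash
# Minimal sketch: copy the plugin JAR into {SOLR_HOME}/{collection}/lib.
# install_plugin_jar is a hypothetical helper, not part of the plugin.

install_plugin_jar() {
  local jar_path="$1"     # JAR produced by the Gradle build
  local solr_home="$2"    # corresponds to {SOLR_HOME}
  local collection="$3"   # corresponds to {collection}

  if [[ ! -f "$jar_path" ]]; then
    echo "JAR not found: $jar_path" >&2
    return 1
  fi

  local lib_dir="$solr_home/$collection/lib"
  mkdir -p "$lib_dir"          # create lib/ if the collection does not have one yet
  cp "$jar_path" "$lib_dir/"   # Solr loads the JAR from here, per the instructions above
  echo "$lib_dir/$(basename "$jar_path")"
}

# Example (hypothetical build command and paths):
#   gradle build
#   install_plugin_jar build/libs/plugin.jar /var/solr/data covid
```

The function only copies the file; Solr still has to (re)load the collection before the plugin classes become available.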
A custom Solr UpdateRequestProcessor (URP) that computes embeddings for a specified field and stores them in a knn_vector field.
The URP is configured in the collection's solrconfig.xml as follows:
<updateRequestProcessorChain name="get-embeddings">
  <processor class="com.rondhuit.solr.update.dense.EmbeddingsProcessorFactory">
    <str name="hostName">host.docker.internal</str>
    <int name="portNo">8080</int>
    <str name="modelName">sonoisa</str>
    <str name="sourceField">title</str>
    <str name="targetField">title_vector</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
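Before documents go through this chain, targetField has to exist in the schema as a knn_vector field, and update requests have to select the chain. A minimal sketch, assuming Solr 9 or later at localhost:8983 with a managed schema that the Schema API can modify. add_vector_field and index_with_embeddings are hypothetical helpers, the cosine similarity is an illustrative choice, and the vector dimension must be whatever the TorchServe model actually outputs.

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming Solr 9+ at localhost:8983 with a Schema API-editable schema.
# add_vector_field and index_with_embeddings are hypothetical helpers.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Add a knn_vector field type (solr.DenseVectorField) plus a field of that type to use
# as targetField. vectorDimension must equal the embedding length of the TorchServe
# model; "cosine" is illustrative. add-field-type fails if knn_vector already exists.
add_vector_field() {
  local collection="$1" field="$2" dim="$3"
  curl -sS -X POST -H 'Content-Type: application/json' \
    "$SOLR_URL/$collection/schema" \
    --data-binary "{
  \"add-field-type\": {\"name\": \"knn_vector\", \"class\": \"solr.DenseVectorField\",
                     \"vectorDimension\": $dim, \"similarityFunction\": \"cosine\"},
  \"add-field\": {\"name\": \"$field\", \"type\": \"knn_vector\", \"indexed\": true, \"stored\": true}
}"
}

# Post a file containing a JSON array of documents through the get-embeddings chain.
# update.chain selects the chain, so the URP fills targetField from sourceField.
index_with_embeddings() {
  local collection="$1" json_file="$2"
  curl -sS -X POST -H 'Content-Type: application/json' \
    "$SOLR_URL/$collection/update?update.chain=get-embeddings&commit=true" \
    --data-binary "@$json_file"
}
```

For the configuration above this would be add_vector_field covid title_vector DIM (with DIM replaced by the model's real output size), then index_with_embeddings covid docs.json, where docs.json is a JSON array of documents that carry a title field.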
hostName and portNo can be omitted; they default to localhost and 8080, respectively.
modelName must match a model deployed in TorchServe.
sourceField is the text field whose content is vectorized (the field to compute embeddings for).
targetField is a field of type knn_vector that holds the computed embeddings.
A custom Solr SearchComponent that computes the embedding of a query and performs a dense vector search.
The Search component is configured in the collection's solrconfig.xml
as follows:
Component definition:
<searchComponent name="dense-search"
                 class="com.rondhuit.solr.handler.component.DenseVectorSearchComponent">
  <str name="hostName">host.docker.internal</str>
  <int name="portNo">8080</int>
  <str name="modelName">sonoisa</str>
</searchComponent>
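Since modelName must name a model deployed in TorchServe, it can help to confirm that before Solr starts sending requests. A minimal sketch, assuming TorchServe's default ports: 8080 for the inference API (matching the portNo default) and 8081 for the management API. torchserve_ready is a hypothetical helper. host.docker.internal is how a Solr container reaches the Docker host, so when running this from the host itself the address is usually localhost.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that TorchServe is up and that a given model is registered.
# Assumes TorchServe's default ports: 8080 (inference API) and 8081 (management API).
# torchserve_ready is a hypothetical helper.

torchserve_ready() {
  local host="${1:-localhost}" model="$2"
  local inference_port="${3:-8080}" management_port="${4:-8081}"

  # Inference API health check: GET /ping.
  if ! curl -sf -o /dev/null "http://$host:$inference_port/ping"; then
    echo "TorchServe is not reachable at $host:$inference_port" >&2
    return 1
  fi
  # Management API: GET /models/<name> returns an HTTP error if the model is unknown.
  if ! curl -sf -o /dev/null "http://$host:$management_port/models/$model"; then
    echo "Model '$model' is not registered in TorchServe" >&2
    return 1
  fi
  echo "ok"
}

# Example: torchserve_ready localhost sonoisa
```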
hostName and portNo can be omitted; they default to localhost and 8080, respectively.
modelName must match a model deployed in TorchServe.
Search handler definition:
DenseVectorSearchComponent should run before QueryComponent. To achieve this, register dense-search in the first-components list of the SearchHandler:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
  <arr name="first-components">
    <str>dense-search</str>
  </arr>
</requestHandler>
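Edits to solrconfig.xml take effect only after the core is reloaded or Solr is restarted. A minimal sketch, assuming standalone Solr at localhost:8983 where the collection is a single core; reload_core is a hypothetical helper, and in SolrCloud the Collections API RELOAD action would be used instead.

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming standalone Solr at localhost:8983 (one core per collection).
# reload_core is a hypothetical helper; in SolrCloud use
# admin/collections?action=RELOAD&name=... instead.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Reload the core so the get-embeddings chain, the dense-search component and the
# /select handler definition are applied. If a newly copied JAR is not picked up
# by a reload, restart Solr instead.
reload_core() {
  local core="$1"
  # -f makes curl return non-zero when Solr answers with an HTTP error status.
  curl -sS -f "$SOLR_URL/admin/cores?action=RELOAD&core=$core"
}

# Example: reload_core covid
```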
Query parameters for a KS dense vector search:
Parameter | Description | Default | Required |
---|---|---|---|
_ks.dense | Flag that activates DenseVectorSearchComponent (set to true). | false | yes |
_ks.dense.model | Model to use; must match a model deployed in TorchServe. | The modelName specified for DenseVectorSearchComponent | no |
_ks.dense.field | Schema field of type knn_vector that holds the embeddings. | (none) | yes |
_ks.dense.k | Number of nearest neighbors (k) to return; equivalent to topK in Solr's knn query parser. | 100 | no |
Example GET requests (the second one also restricts the returned fields with fl):
http://localhost:8983/solr/covid/select?q=covid%20incubation%20period&_ks.dense=true&_ks.dense.field=answer_embeddings
http://localhost:8983/solr/covid/select?q=covid%20incubation%20period&_ks.dense=true&_ks.dense.field=answer_embeddings&fl=title%2C%20answer
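To build URLs like the two above for arbitrary query text, including Japanese, the query has to be percent-encoded byte by byte in UTF-8. A minimal sketch; urlencode and dense_search_url are hypothetical helpers, and the base URL defaults to the covid collection used in the examples.

```shell
#!/usr/bin/env bash
# Minimal sketch; urlencode and dense_search_url are hypothetical helpers.

# Percent-encode a string byte by byte; RFC 3986 unreserved characters pass through.
urlencode() {
  local h out=""
  for h in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$h" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        out+="$(printf "\\x$h")" ;;                        # A-Z a-z 0-9 - . _ ~
      *)
        out+="%$(printf '%s' "$h" | tr 'abcdef' 'ABCDEF')" ;;  # any other byte
    esac
  done
  printf '%s' "$out"
}

# Build a GET URL for a KS dense vector search: query text, knn_vector field,
# optional k, optional collection base URL.
dense_search_url() {
  local query="$1" field="$2" k="${3:-}"
  local base="${4:-http://localhost:8983/solr/covid}"
  local url
  url="$(printf '%s/select?q=%s&_ks.dense=true&_ks.dense.field=%s' \
    "$base" "$(urlencode "$query")" "$(urlencode "$field")")"
  # _ks.dense.k is optional; when omitted the component uses its default.
  if [[ -n "$k" ]]; then
    url+="&_ks.dense.k=$k"
  fi
  printf '%s\n' "$url"
}
```

For example, dense_search_url 'covid incubation period' answer_embeddings reproduces the first URL above, and urlencode 'title, answer' yields the title%2C%20answer used for fl in the second.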
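The same kind of search sent as a JSON request body (Solr's JSON Request API), here also selecting the model and k explicitly:
curl -X POST "http://localhost:8983/solr/covid/select" -d '
{
  "query": "covid incubation time",
  "fields": "title,answer",
  "params": {
    "_ks.dense": "true",
    "_ks.dense.model": "sonoisa",
    "_ks.dense.field": "answer_embeddings",
    "_ks.dense.k": 10
  }
}'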