Solr 9.7 Load Test Report: Evaluation of Semantic Search Performance Improvement through SIMD Optimization

Author: Mingchun Zhao

     

Posted: October 9, 2024  Updated: October 16, 2024

 
  • Solr
  • Semantic Search
  • SIMD Optimization
  • Vector Search
  • Performance

Introduction

Apache Solr 9.7.0, released on September 9, 2024, includes performance improvements for semantic search (vector search). The Solr 9.7.0 Release Highlights describe the change as follows:

Apache Lucene upgraded to 9.11.1 introducing tremendous performance improvements when using Java 21 for vector search among other things.

These performance enhancements are made possible through the integration of the Incubating Panama Vector API, enabling SIMD optimizations for vector calculations in Java 20 and Java 21. To enable this feature by default, the option --add-modules jdk.incubator.vector has been added to the Solr Java command.

What is SIMD Optimization

SIMD (Single Instruction, Multiple Data) is a technique that applies the same operation to multiple data points simultaneously, increasing computational efficiency. It is particularly effective for workloads that process large datasets, such as scientific computing and image or audio processing.

Key Points of SIMD Optimization

  • Hardware Support

    • Requires CPUs or GPUs that support dedicated SIMD instruction sets (e.g., Intel’s SSE, AVX, ARM’s NEON).
  • Vectorization of Algorithms and Data Structures

    • Data must be vectorized to enable efficient parallel processing, which can complicate implementation and debugging.
  • Performance Improvement through Data Parallel Processing

    • Running the same operation on multiple data points simultaneously reduces processing time.
    • This makes more effective use of system resources and can significantly improve application performance.
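To make the idea of data-parallel processing concrete, here is a minimal Python sketch that contrasts a scalar dot product with one organised into fixed-width lanes. Pure Python does not emit SIMD instructions, so this only models the loop structure. The lane count of 4 (four 32-bit floats in a 128-bit register) and the function names are assumptions chosen for illustration.

```python
LANES = 4  # assumed lane count: four 32-bit floats in one 128-bit register


def dot_scalar(a, b):
    """Scalar version: one multiply-add per loop iteration."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_lanes(a, b, lanes=LANES):
    """Model of a SIMD loop: each iteration updates `lanes` partial sums."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = [0.0] * lanes                     # one accumulator per lane
    full = len(a) - len(a) % lanes          # prefix divisible by the lane count
    for i in range(0, full, lanes):
        for lane in range(lanes):           # SIMD hardware does this inner loop at once
            acc[lane] += a[i + lane] * b[i + lane]
    # Elements that do not fill a whole register are handled one by one.
    tail = sum(a[i] * b[i] for i in range(full, len(a)))
    return sum(acc) + tail                  # horizontal reduction across lanes


if __name__ == "__main__":
    a = [float(i) for i in range(10)]
    b = [2.0] * 10
    print(dot_scalar(a, b), dot_lanes(a, b))
```

A 768-dimensional vector divides evenly into 4-wide lanes, so in this model the scalar tail would be empty for the vectors used in this report.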

Benefits of Utilizing SIMD Optimization in Solr

In Apache Solr, SIMD optimization can substantially enhance search performance, especially in semantic searches. It enables fast and efficient searches for complex query processing and large datasets.

  • Improved Text Analysis Performance

    • Tokenization and normalization processes are accelerated, leading to overall performance gains.
  • Enhanced Query Performance

    • Vector calculations are expedited, improving the performance of semantic searches.
    • Scoring and filtering can be processed in parallel, reducing the response time for search queries.
  • Improved Scalability

    • Performance improves with large datasets suitable for SIMD, enhancing scalability.
  • Better User Experience

    • Increased performance allows for faster retrieval of relevant results, leading to higher user satisfaction.

For more information on semantic search and its configuration, please refer to the "What is Semantic Search?" section of the documentation for our cloud-based search engine service, KandaSearch.

Objective

Evaluate the impact of SIMD optimization on vector calculation performance

Specifically, the focus is on comparing performance with and without SIMD optimization, using short test cases that can be executed quickly to allow for a broader range of performance comparisons. The tests are designed around the following aspects:

  • How much does the performance of Solr vector calculations improve when SIMD optimization is applied in Java 20 and Java 21?
  • Does upgrading from Java 11 to Java 21 improve the performance of Solr vector calculations?

The tests increase the number of concurrent executions and the query load on Solr while measuring response times and resource utilization (CPU, memory, and disk I/O), so that the performance differences can be quantified and visualized.

Conclusion

The performance of semantic search was measured under the following three Java configurations, and the results were compared:

  • Case 1: Running Solr on Java 11.
  • Case 2: Running Solr on Java 21 (without vector optimization).
    • The --add-modules jdk.incubator.vector option is removed from the Java command.
  • Case 3: Running Solr on Java 21 (with vector optimization).
    • The Java command includes the --add-modules jdk.incubator.vector option by default in Solr.

The results indicated:

  • Enabling SIMD optimization in Solr’s semantic search resulted in a performance improvement of 10% to 30%.
  • Upgrading the Java version used by Solr from Java 11 to Java 21 did not show any performance improvement.

EC2 Instance Specifications and Solr Heap Size Used for Testing

EC2 Instance  Processor                                    vCPU  Memory  Storage   Network Performance  SOLR_JAVA_MEM         On-Demand Hourly Rate
t4g.small     Arm-based AWS Graviton2                      2     2 GiB   EBS Only  Up to 5 Gigabit      "-Xms512M -Xmx925M"   $0.0168
t4g.medium    Arm-based AWS Graviton2                      2     4 GiB   EBS Only  Up to 5 Gigabit      "-Xms512M -Xmx1920M"  $0.0336
t4g.large     Arm-based AWS Graviton2                      2     8 GiB   EBS Only  Up to 5 Gigabit      "-Xms512M -Xmx3909M"  $0.0672
t4g.xlarge    Arm-based AWS Graviton2                      4     16 GiB  EBS Only  Up to 5 Gigabit      "-Xms4G -Xmx8G"       $0.1344
c7g.large     Arm-based AWS Graviton3 (Compute-Optimized)  2     4 GiB   EBS Only  Up to 12500 Megabit  "-Xms512M -Xmx1920M"  $0.0723
c7g.xlarge    Arm-based AWS Graviton3 (Compute-Optimized)  4     8 GiB   EBS Only  Up to 12500 Megabit  "-Xms512M -Xmx3909M"  $0.1445

Response Time: The time taken from issuing a request to receiving the response (in seconds)

  • The following conditions were applied during the evaluation:

    • The semantic search queries used 7,367 distinct input vectors.
    • JMeter's loop count was set to 7,367.
    • JMeter's concurrent user count ranged up to 20.
    • TopK was set to 1000.
  • Comparison of Response Times (sec) with SIMD Optimization Enabled and Disabled

[Figure: Comparison of Response Times (sec) with SIMD On or Off for All EC2 Instances]

  • Response Times Sorted Across All EC2 Instance Types

[Figure: Sorting of Response Times (sec) for All EC2 Instances with SIMD On or Off and Time Reduction Rate By Optimization]

  • Discussion

    • As the number of CPU cores increases, the degree of parallelism in vector calculations also increases, improving the performance of semantic search.
      • For the same number of requests, the compute-optimized c7g.xlarge instance had the shortest response time.
      • Compared with t4g.large (vCPU 2/MEM 8 GiB), t4g.xlarge (vCPU 4/MEM 16 GiB) halved the response time.
    • EC2's compute-optimized instances exhibited superior processor performance compared to general-purpose instances.
      • Compared with t4g.large (vCPU 2/MEM 8 GiB), c7g.large (vCPU 2/MEM 4 GiB) reduced the response time by 27%. The two have the same number of vCPUs and t4g.large has double the memory, yet the c7g.large processor proved faster.
        • The network performance of c7g.large is also higher than that of t4g.large (up to 12.5 Gigabit versus up to 5 Gigabit); however, this made no difference at this workload's network transfer volume.
      • Compared with t4g.xlarge (vCPU 4/MEM 16 GiB), c7g.xlarge (vCPU 4/MEM 8 GiB) reduced the response time by 17%.
    • Memory capacity and Solr heap size also contributed to semantic search performance, but not to the same extent as the CPU.
      • Compared with t4g.medium (vCPU 2/MEM 4 GiB), t4g.large (vCPU 2/MEM 8 GiB) improved the response time by only 2%.
      • Compared with t4g.small (vCPU 2/MEM 2 GiB), t4g.medium (vCPU 2/MEM 4 GiB) improved the response time by 15%, suggesting that 2 GiB of memory and about 1 GiB of Solr heap are insufficient to handle the query load.
  • Comparison of Response Time Reduction Rates Achieved by SIMD Optimization

[Figure: Comparison of the Rate of Response Time Reduction Achieved by SIMD]

  • Discussion

    • The application of SIMD optimization confirmed an improvement in vector calculation performance.
      • When comparing Java 21 (with optimization enabled) to Java 21 (with optimization disabled), the response time was reduced by 10% to 30% (with an average reduction of 16%).
  • Comparison of Response Times (sec) with TopK

[Figure: Comparison of Response Times (sec) with TopK]

  • Discussion

    • As the topK value in semantic search increased, the response time also increased.
      • This suggests that a higher topK value leads to an increase in vector calculation load.
  • Comparison of Response Times (sec) with Concurrency

[Figure: Comparison of Response Times (sec) with Concurrency]

  • Discussion
    • As the number of concurrent users increased, the performance improvement became more significant.
      • The response time reduction for a single user executing on t4g.xlarge (vCPU 4/MEM 16 GiB) was 10%, while for five concurrent users, it was 15%.
        • Comparing the CPU usage of the Solr process using the top command, it was observed that only 1.5 vCPUs were used for one user, whereas all 4 vCPUs were utilized with five users.
        • The increase in parallel vector calculations for five users compared to one user likely contributed to the greater performance improvement.

QPS (Queries/sec): The number of requests processed per second

  • The following conditions were applied:

    • The semantic search queries used 7,367 distinct input vectors.
    • JMeter's loop count was set to 7,367.
    • JMeter's number of concurrent users ranged up to 20.
    • TopK was set to 1000.
  • Comparison of QPS with SIMD Optimization Enabled and Disabled

[Figure: Comparison of QPS with SIMD On or Off for All EC2 Instances]

  • QPS Sorted Across All EC2 Instance Types

[Figure: Sorting of QPS for All EC2 Instances with SIMD On or Off]

  • Discussion
    • As the number of concurrent executions increased, QPS also increased; however, once a maximum value was reached, it did not change further.
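The QPS and response-time figures in the result matrix appear to be linked by QPS ≈ concurrent users × loop count / response time. This would mean the reported response time is the elapsed time for each JMeter thread to finish all 7,367 loops. That reading is inferred from the published numbers and is not stated in the report, so treat the sketch below as a consistency check under that assumption. The three sample rows are the Java 21, SIMD-on, 20-user rows of the matrix.

```python
LOOP_COUNT = 7367  # JMeter loop count used in the report


def estimated_qps(users, elapsed_sec, loops=LOOP_COUNT):
    """QPS implied by `users` threads each finishing `loops` queries in `elapsed_sec`."""
    return users * loops / elapsed_sec


# (instance, users, reported QPS, reported response time in seconds)
SAMPLES = [
    ("t4g.small", 20, 215, 686),
    ("t4g.xlarge", 20, 518, 285),
    ("c7g.xlarge", 20, 849, 174),
]

if __name__ == "__main__":
    for instance, users, reported_qps, elapsed in SAMPLES:
        estimate = estimated_qps(users, elapsed)
        print(f"{instance}: reported {reported_qps}, estimated {estimate:.1f}")
```

For these rows the estimate lands within 1% of the reported QPS. Small gaps are expected because the table values are rounded averages of two runs.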

Differences in Semantic Search Performance by Java Version

  • No differences in vector calculation performance were observed based on Java version.
    • When comparing Java 21 (with vector optimization disabled) to Java 11, the response times were nearly identical.

System Resources

CPU Usage

  • Comparison of CPU Usage with SIMD Optimization Enabled/Disabled

[Figure: Comparison of CPU Usage with SIMD On or Off]

  • Discussion

    • There was little increase in CPU usage due to SIMD optimization.
  • Comparison of CPU Usage by Concurrency

[Figure: Comparison of CPU Usage with Concurrency]

  • Discussion
    • With one concurrent user, Solr used a maximum of 1.5 vCPUs.
    • As concurrency increased, CPU usage rose, reaching nearly 100% as concurrency matched the number of vCPUs.
    • If the CPU usage of the Solr process reaches 100%, it indicates that all CPU cores are fully utilized for vector parallel calculations.

Memory Usage

  • Comparison of Memory Usage with SIMD Optimization Enabled/Disabled

[Figure: Comparison of Memory Usage with SIMD On or Off]

  • Discussion

    • There was little increase in memory usage due to SIMD optimization.
  • Comparison of Memory Usage by Concurrency

[Figure: Comparison of Memory Usage with Concurrency]

  • Discussion
    • There was little increase in memory usage due to concurrency.
    • Significant performance degradation can occur due to swapping caused by memory shortages, so workloads were adjusted to prevent depletion of OS buffer cache during load testing.

Disk I/O Usage

  • Discussion
    • Throughout the load tests, there were no I/O wait states due to disk busy conditions, and I/O load did not become a bottleneck for search performance.

Considerations

Confirming that the Verification Server's Hardware and OS Kernel Support SIMD Optimization

In this verification, Amazon EC2's t4g.small instance was used. Below, I share the steps to confirm the support for SIMD optimization at the hardware and OS kernel level, using t4g.small as an example.

Confirming Hardware Support for SIMD Optimization

  • Processor information was checked using the lscpu command:
    • CPU architecture: aarch64
    • Vendor ID: ARM
    • Model name: Neoverse-N1
    • Flags: asimd (Advanced SIMD) was set.
  • The Arm documentation "About the Advanced SIMD and Floating Point Support" confirms that the Neoverse-N1 model supports SIMD:
  The Neoverse™ N1 core supports the Advanced SIMD and scalar floating-point
  instructions in the A64 instruction set and the Advanced SIMD and 
  floating-point instructions in the A32 and T32 instruction sets.
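The same flag check can be scripted. The sketch below parses text in the format of /proc/cpuinfo and reports whether the asimd flag is present. It assumes that aarch64 Linux kernels list CPU flags on a line labelled "Features" (x86 kernels use "flags"); the function names and the sample text are illustrative only.

```python
def cpu_features(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # aarch64 kernels label the line "Features"; x86 kernels use "flags".
        if sep and key.strip().lower() in ("features", "flags"):
            features.update(value.split())
    return features


def has_advanced_simd(cpuinfo_text):
    """True when the Arm Advanced SIMD flag `asimd` is listed as its own token."""
    return "asimd" in cpu_features(cpuinfo_text)


if __name__ == "__main__":
    # Illustrative sample; on a real server, pass the contents of /proc/cpuinfo.
    sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes pmull\n"
    print("asimd supported:", has_advanced_simd(sample))
```

Matching whole tokens avoids a false positive from longer flag names that merely begin with "asimd".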

Confirming OS (Kernel) Support for SIMD Optimization

A sample C program that multiplies float arrays was compiled with gcc using the options -O3 (optimization level) and -S (assembly code generation). The generated assembly code contained SIMD multiply instructions operating on vector registers:

fmul    v2.4s, v2.4s, v6.4s
... ...
fmul    v0.4s, v0.4s, v4.4s

About the Input Vector Set Used in Semantic Search Queries

Pre-preparation of Input Vectors and Queries

To accurately evaluate the performance improvement of Solr semantic search due to SIMD optimization, it is necessary to ensure that other unrelated processes do not consume resources during load measurement.

The 7,367 input vectors included in the semantic search queries were pre-computed (rather than calculated on the fly during semantic search) and used as queries for JMeter load testing.

Variations of Input Vector Arrays Used in Queries

The variations and data distribution of vector datasets significantly affect the effectiveness of SIMD optimization. When the dataset is evenly distributed, the benefits of parallel processing are maximized. Conversely, if the data distribution is skewed, processing may concentrate on specific elements, making overall performance improvement challenging.

As a test, I first prepared only 100 distinct vectors (768 dimensions) and used them repeatedly in semantic searches. However, the performance measurements were unstable and difficult to evaluate, so I ultimately used 7,367 distinct vectors (768 dimensions) for load testing.

Test Execution Environment and Method

  • The load testing tool (JMeter) and Solr were placed on two separate instances within the same Amazon VPC subnet to minimize the impact of network load.
    • During semantic searches, a large amount of vector data is sent to Solr over the network, so network bandwidth could become a bottleneck and affect performance metrics such as QPS and response time.
  • Load tests were conducted while varying the execution order of the test cases.
    • The execution order may influence performance measurements.
  • Solr was restarted each time a test case was executed.
    • This prevents reuse of Solr caches across test cases from distorting the performance evaluation.
  • Important notes for using burstable performance instances on Amazon EC2:
    • Burstable performance instances have a limited CPU burst allowance, so if high load keeps CPU utilization above the baseline for an extended period, the CPU may be throttled.
      • If the CPU credits run out during verification, bursting (CPU usage above the baseline) is no longer available, potentially degrading Solr's search performance.
    • It is necessary to monitor the CPU credit usage and balance of the EC2 instances to ensure that the balance is not depleted.
      • CPU credit usage: The number of CPU credits spent during the measurement period.
      • CPU credit balance: The number of CPU credits that an instance has accrued. This balance is depleted when the CPU bursts and CPU credits are spent more quickly than they are earned.
    • Alternatively, the CPU credit mode of a burstable performance instance can be set to Unlimited.
      • The Unlimited mode allows the burst state to be maintained even when CPU credits are depleted, but additional costs are incurred per vCPU hour.
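To judge whether a planned test run fits within the available CPU credits, a rough estimate can help. The sketch below assumes the AWS definition that one CPU credit equals one vCPU running at 100% utilization for one minute. The function name is hypothetical, and the balance and earn rate in the example are placeholders rather than the actual figures for any instance type. It also ignores the cap on the maximum credit balance.

```python
def minutes_until_depleted(balance, earned_per_hour, vcpus, utilization):
    """Estimate how long a credit balance lasts under sustained load.

    balance          current CPU credit balance
    earned_per_hour  credits the instance earns per hour (instance-type specific)
    vcpus            number of vCPUs being driven
    utilization      average utilization per vCPU as a fraction (0.0 to 1.0)

    Returns None when credits are earned at least as fast as they are spent.
    """
    spent_per_minute = vcpus * utilization      # 1 credit = 1 vCPU-minute at 100%
    earned_per_minute = earned_per_hour / 60
    net_spend = spent_per_minute - earned_per_minute
    if net_spend <= 0:
        return None
    return balance / net_spend


if __name__ == "__main__":
    # Placeholder numbers: 90 credits, 30 credits earned per hour, 2 vCPUs at 100%.
    print(minutes_until_depleted(90, 30, 2, 1.0))
```

With the real earn rate for the instance type (from the AWS documentation) and the balance reported by CloudWatch, this gives a quick upper bound on how long a fully loaded test case can run before throttling begins.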

Test Results

Explanation of Various Items in the Result Matrix

  • Users: The number of concurrent executions (simultaneous users) indicated by JMeter's Num_threads value.
  • QPS: The number of queries processed per unit time, as indicated by JMeter's QPS value.
  • Response Time: The time taken from issuing a query to receiving a response (in seconds).
  • %CPU: The system's CPU usage percentage (measured by the vmstat command).
  • %MEM: The system's memory usage percentage (measured by the vmstat command).
  • %I/O: The system's Disk I/O usage percentage (measured by the iostat command).

Search Performance

Load Measurement Method

  • The measurements used 7,367 distinct vector values in the JMeter queries.
  • The loop count for JMeter queries was fixed at 7,367.
  • Each combination of concurrency level and Java configuration was measured twice, and the average was calculated.
  • There are 72 verification patterns (6 instance types × 3 Java configurations × 4 concurrency levels).
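The 72 patterns are the Cartesian product of the instance types, Java configurations, and concurrency levels that appear in the result matrix. A minimal sketch that enumerates them in the same order as the Id column (the function and constant names are illustrative):

```python
from itertools import product

INSTANCES = ["t4g.small", "t4g.medium", "t4g.large", "t4g.xlarge",
             "c7g.large", "c7g.xlarge"]
JAVA_CONFIGS = [("Java11", "off"), ("Java21", "off"), ("Java21", "on")]
CONCURRENCY = [1, 5, 10, 20]


def build_matrix():
    """Return (id, instance, java_version, simd, users) for every test pattern."""
    rows = []
    combos = product(INSTANCES, JAVA_CONFIGS, CONCURRENCY)
    for idx, (instance, (java, simd), users) in enumerate(combos, start=1):
        rows.append((idx, instance, java, simd, users))
    return rows


if __name__ == "__main__":
    matrix = build_matrix()
    print(len(matrix), "patterns")
    print(matrix[0], matrix[-1])
```

Because each pattern was measured twice and, as noted earlier, the execution order was varied, a list like this can also be shuffled to generate a run order.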

Performance Comparison Across EC2 Instances/Java Versions/Concurrent Executions (Resource Usage and Response Time Reduction Rate)

Id EC2Instance JavaVer SIMDOptimization Concurrency QPS ResponseTime(sec) Total%CPUMax Solr%CPUMax Total%MEMMax Solr%MEMMax Total%I/OMax ResponseTimeReductionRate(%)ByNumberOfConcurrentUsers ResponseTimeReductionRate(%)ByVectorOptimization ResponseTimeReductionRate(%)ByEc2InstanceWithSmall ResponseTimeReductionRate(%)ByEc2InstanceWithCPUCores ResponseTimeReductionRate(%)ByJavaVersion
1 t4g.small Java11 off 1 103 72 75 74 57 40 1 0 0 0 0 0
2 t4g.small Java11 off 5 174 212 100 99 60 44 1 41 0 0 0 0
3 t4g.small Java11 off 10 175 421 100 99 60 44 1 42 0 0 0 0
4 t4g.small Java11 off 20 179 823 100 99 61 45 4 43 0 0 0 0
5 t4g.small Java21 off 1 102 73 71 70 63 44 1 0 0 0 0 -1
6 t4g.small Java21 off 5 171 215 100 99 64 44 1 41 0 0 0 -1
7 t4g.small Java21 off 10 179 413 100 99 63 44 2 43 0 0 0 2
8 t4g.small Java21 off 20 180 821 100 99 65 44 15 44 0 0 0 0
9 t4g.small Java21 on 1 111 67 68 67 60 44 0 0 9 0 0 0
10 t4g.small Java21 on 5 201 184 100 99 60 44 0 45 15 0 0 0
11 t4g.small Java21 on 10 209 353 100 99 60 43 1 47 15 0 0 0
12 t4g.small Java21 on 20 215 686 100 100 63 45 48 48 16 0 0 0
13 t4g.medium Java11 off 1 110 67 79 79 28 20 1 0 0 7 7 0
14 t4g.medium Java11 off 5 173 214 100 99 29 21 0 36 0 -1 -1 0
15 t4g.medium Java11 off 10 177 418 100 99 31 23 0 38 0 1 1 0
16 t4g.medium Java11 off 20 177 834 100 99 33 25 1 38 0 -1 -1 0
17 t4g.medium Java21 off 1 108 69 77 76 29 21 2 0 0 5 5 -3
18 t4g.medium Java21 off 5 172 214 100 99 29 21 2 38 0 0 0 0
19 t4g.medium Java21 off 10 176 420 100 100 29 21 2 39 0 -2 -2 0
20 t4g.medium Java21 off 20 176 838 100 100 29 21 1 39 0 -2 -2 0
21 t4g.medium Java21 on 1 134 55 75 73 29 21 0 0 20 17 17 0
22 t4g.medium Java21 on 5 231 160 100 99 29 21 0 42 25 13 13 0
23 t4g.medium Java21 on 10 247 299 100 99 30 21 0 46 29 15 15 0
24 t4g.medium Java21 on 20 253 583 100 99 30 21 0 47 30 15 15 0
25 t4g.large Java11 off 1 127 58 77 77 15 11 4 0 0 19 13 0
26 t4g.large Java11 off 5 202 182 100 99 16 12 1 37 0 14 15 0
27 t4g.large Java11 off 10 209 353 100 99 17 12 1 39 0 16 16 0
28 t4g.large Java11 off 20 214 687 100 99 23 19 1 41 0 17 18 0
29 t4g.large Java21 off 1 130 57 76 75 14 11 1 0 0 22 17 2
30 t4g.large Java21 off 5 199 185 100 99 16 12 1 35 0 14 14 -2
31 t4g.large Java21 off 10 204 361 100 100 17 12 7 37 0 13 14 -2
32 t4g.large Java21 off 20 215 684 100 100 16 12 26 40 0 17 18 0
33 t4g.large Java21 on 1 143 52 76 75 15 10 1 0 9 22 5 0
34 t4g.large Java21 on 5 241 154 100 100 16 12 1 41 17 16 4 0
35 t4g.large Java21 on 10 247 299 100 100 16 12 77 43 17 15 0 0
36 t4g.large Java21 on 20 259 570 100 100 17 12 80 45 17 17 2 0
37 t4g.xlarge Java11 off 1 126 59 38 37 22 20 1 0 0 18 -2 0
38 t4g.xlarge Java11 off 5 401 92 100 99 22 20 1 69 0 57 49 0
39 t4g.xlarge Java11 off 10 417 177 100 100 22 20 1 70 0 58 50 0
40 t4g.xlarge Java11 off 20 443 333 100 100 23 20 1 72 0 60 52 0
41 t4g.xlarge Java21 off 1 126 59 39 38 31 29 1 0 0 19 -4 0
42 t4g.xlarge Java21 off 5 405 91 100 99 31 29 0 69 0 58 51 1
43 t4g.xlarge Java21 off 10 425 174 100 100 31 29 0 71 0 58 52 2
44 t4g.xlarge Java21 off 20 431 343 100 100 31 29 0 71 0 58 50 -3
45 t4g.xlarge Java21 on 1 140 53 40 39 31 29 0 0 10 20 -2 0
46 t4g.xlarge Java21 on 5 482 77 99 99 31 29 0 71 15 58 50 0
47 t4g.xlarge Java21 on 10 506 146 100 100 31 29 0 72 16 59 51 0
48 t4g.xlarge Java21 on 20 518 285 100 100 31 29 1 73 17 58 50 0
49 c7g.large Java11 off 1 187 39 100 99 33 25 3 0 0 46 33 0
50 c7g.large Java11 off 5 322 115 100 99 28 21 1 41 0 46 37 0
51 c7g.large Java11 off 10 324 227 100 99 34 28 1 42 0 46 36 0
52 c7g.large Java11 off 20 336 439 100 99 34 28 1 44 0 47 36 0
53 c7g.large Java21 off 1 73 46 72 71 29 21 82 0 0 37 19 -18
54 c7g.large Java21 off 5 312 118 100 100 28 21 1 49 0 45 36 -3
55 c7g.large Java21 off 10 337 219 100 100 28 21 1 52 0 47 39 4
56 c7g.large Java21 off 20 344 429 100 100 29 21 1 53 0 48 37 2
57 c7g.large Java21 on 1 187 39 70 69 30 21 1 0 15 41 25 0
58 c7g.large Java21 on 5 348 106 100 99 30 21 1 46 10 42 31 0
59 c7g.large Java21 on 10 373 198 100 100 30 21 1 49 10 44 34 0
60 c7g.large Java21 on 20 383 385 100 100 30 22 2 51 10 44 32 0
61 c7g.xlarge Java11 off 1 203 37 39 38 16 12 1 0 0 49 37 0
62 c7g.xlarge Java11 off 5 625 59 99 99 16 12 1 68 0 72 36 0
63 c7g.xlarge Java11 off 10 677 109 100 99 16 12 1 71 0 74 38 0
64 c7g.xlarge Java11 off 20 708 208 99 99 18 15 1 72 0 75 38 0
65 c7g.xlarge Java21 off 1 206 36 99 99 17 13 1 0 0 51 39 3
66 c7g.xlarge Java21 off 5 638 58 99 99 18 13 2 68 0 73 36 2
67 c7g.xlarge Java21 off 10 669 110 100 99 17 13 2 69 0 73 37 -1
68 c7g.xlarge Java21 off 20 693 213 100 99 18 13 1 70 0 74 38 -2
69 c7g.xlarge Java21 on 1 231 32 39 38 15 11 7 0 11 52 40 0
70 c7g.xlarge Java21 on 5 766 48 99 99 18 14 5 70 17 74 38 0
71 c7g.xlarge Java21 on 10 812 91 100 99 18 14 4 72 17 74 38 0
72 c7g.xlarge Java21 on 20 849 174 100 99 18 14 4 73 18 75 39 0
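The "By Vector Optimization" reduction rates can be reproduced from the response times in the matrix. The sketch below assumes the rate is computed as (time without SIMD − time with SIMD) / time without SIMD × 100. The report does not state the formula, but this definition matches the published values for the rows checked here, which are the Java 21 rows at 20 concurrent users.

```python
def reduction_rate(baseline_sec, improved_sec):
    """Percentage by which the response time shrank relative to the baseline."""
    return (baseline_sec - improved_sec) / baseline_sec * 100


# (instance, Java 21 SIMD off, Java 21 SIMD on): response times in seconds, 20 users
ROWS_20_USERS = [
    ("t4g.small", 821, 686),
    ("t4g.medium", 838, 583),
    ("t4g.large", 684, 570),
    ("t4g.xlarge", 343, 285),
    ("c7g.large", 429, 385),
    ("c7g.xlarge", 213, 174),
]

if __name__ == "__main__":
    for instance, off_sec, on_sec in ROWS_20_USERS:
        print(f"{instance}: {reduction_rate(off_sec, on_sec):.1f}%")
```

Rounded to whole percentages, these come to 16, 30, 17, 17, 10, and 18, which agree with the corresponding entries in the matrix.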

Verification Environment

Amazon EC2 Instances

To prevent competition for resources between JMeter and Solr, JMeter is installed on a separate server from Solr. However, to minimize the impact of network load on verification results, JMeter and Solr are placed on EC2 instances within the same VPC subnet.

  • EC2 instance for JMeter server:

    • t4g.small
  • EC2 instances for Solr server:

    • t4g.small
    • t4g.medium
    • t4g.large
    • t4g.xlarge
    • c7g.large
    • c7g.xlarge
  • OS: Ubuntu 20.04.6 LTS

Verification Procedure

Environment Preparation

Install Java 11 and Java 21 on the Solr instance and set it up for switching

$ wget -O- https://apt.corretto.aws/corretto.key | sudo apt-key add -
$ sudo add-apt-repository 'deb https://apt.corretto.aws stable main'

$ sudo apt-get update; sudo apt-get install -y java-11-amazon-corretto-jdk
$ wget -O - https://apt.corretto.aws/corretto.key | sudo gpg --dearmor -o /usr/share/keyrings/corretto-keyring.gpg && \
echo "deb [signed-by=/usr/share/keyrings/corretto-keyring.gpg] https://apt.corretto.aws stable main" | sudo tee /etc/apt/sources.list.d/corretto.list

$ sudo apt-get update; sudo apt-get install -y java-21-amazon-corretto-jdk
  • Switching Java Version
$ sudo update-alternatives --config java

There are 3 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                           Priority   Status
------------------------------------------------------------
* 0            /usr/lib/jvm/java-21-amazon-corretto/bin/java   12100004  auto mode
  1            /usr/lib/jvm/java-11-amazon-corretto/bin/java   11100024  manual mode
  2            /usr/lib/jvm/java-21-amazon-corretto/bin/java   12100004  manual mode

Press <enter> to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/lib/jvm/java-11-amazon-corretto/bin/java to provide /usr/bin/java (java) in manual mode

$ java --version
openjdk 11.0.24 2024-07-16 LTS
OpenJDK Runtime Environment Corretto-11.0.24.8.1 (build 11.0.24+8-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.24.8.1 (build 11.0.24+8-LTS, mixed mode)

Install the iostat command on the Solr instance

$ sudo apt install sysstat

Install the load testing tool JMeter on a separate server from Solr

  • Download JMeter
$ wget https://dlcdn.apache.org//jmeter/binaries/apache-jmeter-5.6.3.tgz
  • Unzip

    $ tar zxvf apache-jmeter-5.6.3.tgz
    
  • For testing, run the jmeter command with a report

    ./apache-jmeter-5.6.3/bin/jmeter -n -t test.jmx -Jusers=$1 -l jmeter-query.log -e -o ./jmeter-report
    
  • For testing, run the jmeter command without a report

    ./apache-jmeter-5.6.3/bin/jmeter -n -t test.jmx -Jusers=$1 -j jmeter-`date +'%Y%m%d%H%M%S'`.log
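The two command lines above take the user count as $1, which suggests they live in a small wrapper script. A hedged Python equivalent is sketched below. It assumes that test.jmx reads the thread count from the `users` JMeter property passed with -J, and the function names are illustrative. It only builds and runs the JMeter command; restarting Solr between test cases is not handled here.

```python
import subprocess
from datetime import datetime

JMETER = "./apache-jmeter-5.6.3/bin/jmeter"


def jmeter_command(users, test_plan="test.jmx", with_report=False, now=None):
    """Build the non-GUI JMeter command line for a given concurrent user count."""
    cmd = [JMETER, "-n", "-t", test_plan, f"-Jusers={users}"]
    if with_report:
        # Write the sample log and generate the HTML report after the run.
        cmd += ["-l", "jmeter-query.log", "-e", "-o", "./jmeter-report"]
    else:
        # Timestamped JMeter log file, as in the backtick `date` expression.
        stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
        cmd += ["-j", f"jmeter-{stamp}.log"]
    return cmd


def run_all(user_counts=(1, 5, 10, 20)):
    """Run one JMeter test per concurrency level; raises if JMeter exits non-zero."""
    for users in user_counts:
        subprocess.run(jmeter_command(users), check=True)


if __name__ == "__main__":
    print(" ".join(jmeter_command(5, with_report=True)))
```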
    

Solr Configuration

It is necessary to add field types and fields for semantic search in the Solr schema definition.

Two fields of the field type solr.DenseVectorField for 768-dimensional vectors were added.

  <fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="dot_product"/>
  <field name="body_vector" type="knn_vector" indexed="true" stored="true"/>
  <field name="title_vector" type="knn_vector" indexed="true" stored="true"/>
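One point worth checking with similarityFunction="dot_product": the Solr reference guide describes dot_product as an optimized form of cosine similarity that expects all document and query vectors to be of unit length. This report does not say whether the Livedoor embeddings are already normalized, so the sketch below is only a way to check and, if needed, normalize a vector. The function names and tolerance are illustrative.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def normalize(vec):
    """Scale a vector to unit length."""
    norm = l2_norm(vec)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def is_unit_length(vec, tol=1e-6):
    """True when the vector's length is within `tol` of 1.0."""
    return math.isclose(l2_norm(vec), 1.0, abs_tol=tol)


if __name__ == "__main__":
    v = [3.0, 4.0]
    print(is_unit_length(v), is_unit_length(normalize(v)))
```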

Since the variety of input vectors used in queries also affects the effectiveness of SIMD optimization, I pre-calculated 7,367 distinct vectors and prepared 7,367 queries from them.

For more details on the Solr settings required for semantic search, please refer to "Semantic search features of Apache Solr" in the documentation of our cloud-based search engine service, KandaSearch.

Solr Heap Size

The Solr heap size was set according to the specifications of each verification server. For example, on t4g.small:

SOLR_JAVA_MEM="-Xms512M -Xmx925M"

Solr Log Settings

Since excessive logging during load testing adds system load, logging was reduced to the minimum necessary.

  • Set a smaller rotation size and generation count in: /var/solr/log4j2.xml

    • <SizeBasedTriggeringPolicy size="1 MB"/>
    • <DefaultRolloverStrategy max="1"/>
  • Change the log level to error and disable request logging in: /var/solr/solr.in.sh

    • Change log level: SOLR_LOG_LEVEL=ERROR
    • Disable request logging: SOLR_REQUESTLOG_ENABLED=false

Preparing Data for Indexing

We will use the Livedoor News Corpus (embeddings) data provided by the KandaSearch extension library. The 7367 documents already have 768-dimensional vectors assigned.

For details on semantic search using the Livedoor News Corpus (embeddings), please refer to "Semantic search feature of KandaSearch" in the documentation for our cloud-based search engine service, KandaSearch.

Creating Solr Collection

We will create a Solr collection using the Livedoor News Corpus (embeddings) configuration provided by the KandaSearch extension library.

Indexing Vector Data

We will POST the vectorized Livedoor News Corpus (embeddings) data to Solr.

$ /opt/solr/bin/solr post -c livedoor ./data/livedoor_embeddings.json

Posting files to [base] url http://localhost:8983/solr/livedoor/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file livedoor_embeddings.json (application/json) to [base]/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/livedoor/update...
WARNING: URLs provided to this tool needn't include Solr's context-root (e.g. "/solr"). Such URLs are deprecated and support for them will be removed in a future release. Correcting from [http://localhost:8983/solr] to [http://localhost:8983/].
Time spent: 0:01:30.047

Preparing Vector Queries

The input vectors for the semantic search queries are extracted from all documents in the aforementioned Livedoor News Corpus (embeddings) data, giving a total of 7,367 vectors of 768 dimensions each (query creation may take several hours).

  • Extract the 7367 body_vector values from livedoor_embeddings.json.
$ for i in {0..7366}; do command="jq -r '[.[] | .body_vector] | nth($i)' ../livedoor_embeddings.json > vector-$i.json" && eval $command ;done
  • Use the above vectors to create 7,367 queries with the q parameter.
$ for i in {0..7366}; do echo "{!knn f=body_vector topK=10}"$(cat vector-$i.json | tr -d '\n' | tr -d ' ') >> vector-query-7367.csv ;done
  • Here’s a sample of the created queries.
{!knn f=body_vector topK=10}[-0.02621307782828808,-0.06952571868896484,-0.0034800975117832422,...(768 dimensions)]
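The shell loop re-parses the whole JSON file once per vector, which is likely why query creation takes so long. As an alternative, here is a minimal single-pass sketch in Python. It assumes livedoor_embeddings.json is a JSON array of documents that each carry a body_vector list, which is what the jq expression above implies. The function names are illustrative, and the printed digits of each float may differ slightly from jq's output.

```python
import json


def build_knn_queries(docs, field="body_vector", top_k=10):
    """Return one {!knn ...} query string per document, with no spaces in the vector."""
    queries = []
    for doc in docs:
        vector = json.dumps(doc[field], separators=(",", ":"))
        queries.append(f"{{!knn f={field} topK={top_k}}}{vector}")
    return queries


def write_query_file(src_path, dst_path, **kwargs):
    """Read the embeddings file once and write one query per line."""
    with open(src_path, encoding="utf-8") as src:
        docs = json.load(src)
    queries = build_knn_queries(docs, **kwargs)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(queries) + "\n")
    return len(queries)


if __name__ == "__main__":
    sample_docs = [{"id": "a", "body_vector": [0.5, -0.25]}]
    print(build_knn_queries(sample_docs)[0])
```

Calling write_query_file("../livedoor_embeddings.json", "vector-query-7367.csv") would produce a file with the same line structure as the shell pipeline, reading the source file only once.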

Verifying Semantic Search Functionality

  • Issue a vector query.
$ curl -g 'http://localhost:8983/solr/livedoor/query?q={!knn%20f=body_vector%20topK=10}[-0.02621307782828808,-0.06952571868896484,-0.0034800975117832422,...(768 dimensions)]'
  • The following response will be returned from Solr.
{
  "responseHeader":{
    "status":0,
    "QTime":11,
    "params":{
      "q":"{!knn f=body_vector topK=10}[-0.02621307782828808,-0.06952571868896484,...
    }
  },
  "response":{
    "numFound":10,
    "start":0,
    "numFoundExact":true,
    "docs":[{
      "id":"movie-enter-5978741.txt",
      "url":"http://news.livedoor.com/article/detail/5978741/",
      "category":"movie-enter",
      "title":"【DVDエンター!】... ...",
      "title_exact":"【DVDエンター!】... ...",
      "title_2g":"【DVDエンター!】......",
  ... ...
  • From JMeter, specify different simultaneous user counts and loop counts to measure performance metrics such as QPS, Response Time, and system resource usage.

  • Performance measurements will be conducted for three Java configurations: Java 11, Java 21 (vector optimization disabled), and Java 21 (vector optimization enabled).

  • To disable vector optimization for Java 21, the --add-modules jdk.incubator.vector option was commented out in the Solr startup command /opt/solr/bin/solr:

#  if [[ "$JAVA_VER_NUM" -ge "20" ]] && [[ "$JAVA_VER_NUM" -le "21" ]] ; then
#    SCRIPT_SOLR_OPTS+=("--add-modules" "jdk.incubator.vector")
#    echo "Java $JAVA_VER_NUM detected. Incubating Panama Vector APIs have been enabled"
#  fi
  • To ensure that previous test case executions do not impact the current run (due to resource consumption or Solr cache), Solr will be restarted at the beginning of each new test case.

Reviewing Test Results

JMeter Performance Measurement Results

Resource Measurement Results from vmstat/iostat/top Commands

Checking Solr Logs

Verify that no errors occurred during the load testing period, particularly looking for OutOfMemory (OOM) errors or process kills due to memory shortages.

Example command to check for errors in Solr logs:

$ sudo grep -i err /var/solr/logs/* | grep -v " INFO "

Checking System Logs

Verify that no errors occurred during the load testing period, particularly looking for OutOfMemory (OOM) errors or process kills due to memory shortages.

Example command to check for errors in syslog:

$ sudo grep -i err /var/log/syslog
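Grepping for "err" catches many unrelated lines. If the main concern is memory exhaustion, a narrower filter may be easier to review. The sketch below looks for Java's OutOfMemoryError and for kernel OOM-killer messages. The patterns are assumptions about typical log wording, not an exhaustive list, and the function name is illustrative.

```python
import re

# Assumed typical wording: the Java exception name, the kernel's
# "Out of memory: Killed process ..." line, and "oom-kill" / "oom_kill" markers.
OOM_PATTERN = re.compile(r"OutOfMemoryError|out of memory|oom[-_ ]kill", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that mention an out-of-memory condition."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    sample = [
        "2024-10-09 12:00:00 INFO  Solr started\n",
        "java.lang.OutOfMemoryError: Java heap space\n",
        "kernel: Out of memory: Killed process 1234 (java)\n",
    ]
    for hit in find_oom_lines(sample):
        print(hit)
```

Since the function accepts any iterable of lines, an open file handle for a Solr log or /var/log/syslog can be passed to it directly.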

In conclusion

We conducted load testing using Apache Solr 9.7.0 and confirmed that SIMD optimization improved the performance of semantic search (vector search), reducing response times by 10% to 30%. We hope the verification procedures, precautions, and observations shared here will be useful in your own work.
