Reciprocal Rank Fusion (RRF) is coming to Solr!

Author: Elpidio Gonzalez Valbuena

  

Posted: November 27, 2025

    
  • Solr

Hybrid search combines traditional keyword search with vector-based semantic search in a single retrieval and ranking pipeline. Solr supports both sides: BM25 and other classical scoring models for lexical matching, and knn queries over solr.DenseVectorField for vector similarity. Hybrid retrieval aims to take the best of both: the precision that users expect from exact keyword matching, and the robustness and semantic generalization provided by modern embedding models.

There are many patterns and trade-offs involved in doing this well in production. Rather than repeat them here, readers who want a deeper introduction can refer to our previous post, "Combining Keyword and Semantic Searches", where we walk through the motivations, basic architectures, and some common pitfalls. That post focused on what hybrid retrieval looks like conceptually in Solr and why it matters for real applications. This one picks up the story from there and looks at what has changed around hybrid search in Solr itself.

Revisiting Solr's Current Features and Limitations

In his article "Hybrid Search with Apache Solr", Alessandro Benedetti stresses that Solr already supports hybrid search in practice. His recipe is to retrieve two candidate sets, one from lexical queries and one from knn vector queries, and then combine them inside Solr using existing features: the Boolean Query Parser to union or intersect candidate sets, function queries to normalize and add or multiply scores, and Learning To Rank (LTR) models when you want a data-driven combination of lexical and vector features. This shows that hybrid retrieval in Solr is not just hypothetical: it can be used in real systems today.

At the same time, the process still involves manual score normalization, additional function queries, and in some cases external training of LTR models. These are powerful tools, but they also raise the bar for teams that want a straightforward way to combine lexical and vector rankings without designing a bespoke fusion strategy for every use case.

In the earlier blog post, we described several limitations in Solr and Lucene around "out of the box" support for advanced hybrid retrieval. The key points were:

  • There is no native score fusion mechanism to intelligently merge lexical and vector rankings.
  • BM25 scores and vector similarity scores live on different scales, which makes naive score summation unreliable.
  • Without a first-class fusion component, it is harder to understand and explain why documents were ranked in a particular order.
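To make the scale mismatch in the list above concrete, here is a minimal sketch in Python. The document IDs and score values are made up for illustration and do not come from a real Solr index; the helper names naive_sum and ranking are likewise hypothetical.

```python
# Hypothetical scores for illustration only; not taken from a real Solr index.
bm25 = {"doc_a": 12.4, "doc_b": 9.8, "doc_c": 3.1}      # unbounded lexical scores
cosine = {"doc_a": 0.62, "doc_b": 0.58, "doc_c": 0.97}  # similarities in [0, 1]


def naive_sum(lexical, vector):
    """Add raw scores per document, with no normalization at all."""
    docs = set(lexical) | set(vector)
    return {d: lexical.get(d, 0.0) + vector.get(d, 0.0) for d in docs}


def ranking(scores):
    """Return document IDs ordered by descending score (ties broken by ID)."""
    return sorted(scores, key=lambda d: (-scores[d], d))


summed = naive_sum(bm25, cosine)

print(ranking(bm25))    # ['doc_a', 'doc_b', 'doc_c']
print(ranking(cosine))  # ['doc_c', 'doc_a', 'doc_b']
print(ranking(summed))  # ['doc_a', 'doc_b', 'doc_c']
```

With these numbers the summed ranking is identical to the BM25 ranking: the vector side's strongest match, doc_c, stays in last place because its similarity of 0.97 is tiny next to BM25 values around 10.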

Reciprocal Rank Fusion is, in many ways, the community’s bet to address these shortcomings: a simple, well-understood fusion strategy that Solr can offer natively so that hybrid retrieval does not always require custom score engineering.

Reciprocal Rank Fusion (RRF) is a simple but powerful way to merge multiple ranked result lists into a single, coherent ranking. In the Solr context, you can think of those lists as coming from different "experts" in your search stack: a lexical expert (BM25 or other keyword queries), and one or more vector experts (k-nearest neighbor queries). Instead of trying to compare raw scores that live on incompatible scales, RRF ignores the score values and focuses on ranks in each list. Documents that appear high in several rankings are rewarded, while documents that show up only once, or far down the list, are naturally down-weighted. This is why RRF has become a standard tool in information retrieval for combining rankings from multiple systems and has been shown to work well in large-scale evaluations such as TREC [1].

Formally, each document receives a contribution of 1 / (k + rank) from every list where it appears, where rank is its position in that list (counting from 1) and k is a small constant that controls how quickly the influence of lower ranks decays (usually k = 60). Summing these contributions across all lists produces a single RRF score that captures agreement between the different retrieval strategies.

This approach is attractive for Solr because it does not require any score normalization between BM25 and vector similarities. It only needs the ranked positions, which Solr already provides for each query.
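The scoring rule described above can be sketched in a few lines of Python. This is a standalone illustration of the formula, not Solr code: the function name rrf_fuse and the document IDs are hypothetical, ranks are assumed to start at 1, and k defaults to the commonly used value of 60.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every document it contains,
    where rank is the 1-based position of the document in that list.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: only the order matters, not the raw scores.
lexical = ["doc_a", "doc_b", "doc_c"]
vector = ["doc_c", "doc_a", "doc_d"]

fused = rrf_fuse([lexical, vector])
print([doc_id for doc_id, _ in fused])  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Here doc_a and doc_c appear in both lists and rise to the top, while doc_b and doc_d, each found by only one "expert", follow behind.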

Implementing this in Solr is a first step toward overcoming the hybrid search shortcomings explained in our previous article on combining keyword and semantic searches. For users of hybrid retrieval, RRF promises more stable relevance, fewer surprises when score distributions change, and a more transparent story about why a particular document surfaced near the top.

Early Attempts and Current Status

Our previous post noted that "development of that contribution is currently inactive and would require renewed community or commercial support to proceed." That line referred to Alessandro Benedetti’s first attempt to bring RRF into Solr under SOLR-17319, implemented as PR #2489. It was the initial concrete step toward native RRF support and showed that the idea could work inside Solr’s existing JSON Request API.

At a high level, Alessandro’s patch wired RRF into Solr by letting users send multiple sub-queries in a single JSON request and have Solr combine the result lists into one ranking. Functionally this was attractive, but it lived directly inside core query handling code and was explicitly targeted at single-shard scenarios, with only basic support for distributed search. That raised concerns among Solr maintainers about long-term maintenance and where experimental features should live.

Alessandro agreed that moving toward a cleaner design would require substantial additional work that he did not have the time or funding to complete, and he later described the effort in conference material as "work in progress - paused, waiting for funding/time" [2].

With no further updates, the PR went stale and was automatically closed by the GitHub bot in March 2025.

In other words, the first RRF implementation shaped the discussion and exposed important edge cases, but it never reached the level of architectural consensus and sustained maintainer attention needed to ship.


The next serious push toward native RRF support comes from Sonu Sharma’s "combined query" work in PR #3418. The goal is to let a single JSON request describe several sub-queries of different kinds (for example, a BM25 keyword query plus one or more knn vector queries), have Solr execute them across all shards, and then merge the result lists using a configurable algorithm (such as RRF, which is the only one currently implemented).

To achieve this goal, the patch introduces a dedicated CombinedQueryComponent and a matching CombinedQueryResponseBuilder, rather than adding more logic into the existing QueryComponent. These new pieces know how to fan out the sub-queries, collect their results, and then apply the fusion step, while the JSON Query DSL gains an explicit "combined query" entry point to trigger this behavior.

Architecturally, the work went through a couple of iterations on how to handle distribution. The initial design experimented with two strategies: one that applied RRF independently on each shard as results came back, and another that deferred fusion until the coordinating node had a global view of all shard results. A configuration parameter was used to choose between them.

During review, David Smiley argued that the per-shard approach would quickly degrade into "shard interleaving" as the shard count grows, because the coordinator would no longer have a meaningful global ranking signal to work with, even when one document was clearly best overall. Sonu responded by simplifying the implementation, removing the per-shard mode and keeping only the coordinator-level fusion so that the final RRF ranking is computed once, with all shards taken into account.
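The "shard interleaving" concern can be illustrated with a toy model in Python. This is not the code from PR #3418: the shard layout, document IDs, scores, and function names are all hypothetical, and the model assumes the coordinator can see every document's raw score, which simplifies what a real distributed request does.

```python
def rrf_scores(ranked_lists, k=60):
    """Sum 1 / (k + rank) per document over all lists (1-based ranks)."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def order_by(scores):
    """Document IDs by descending score, ties broken by ID."""
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical data: shard -> doc -> (lexical_score, vector_score).
# shard1 holds the strong matches, shard2 only weak ones.
shards = {
    "shard1": {"s1_best": (9.0, 0.95), "s1_good": (8.0, 0.90)},
    "shard2": {"s2_weak": (2.0, 0.30), "s2_poor": (1.0, 0.20)},
}


def per_shard_fusion(shards):
    """Fuse on each shard using local ranks, then merge by fused score."""
    merged = {}
    for docs in shards.values():
        lexical = order_by({d: s[0] for d, s in docs.items()})
        vector = order_by({d: s[1] for d, s in docs.items()})
        merged.update(rrf_scores([lexical, vector]))
    return order_by(merged)


def coordinator_fusion(shards):
    """Build global lexical and vector rankings first, then fuse once."""
    all_docs = {d: s for docs in shards.values() for d, s in docs.items()}
    lexical = order_by({d: s[0] for d, s in all_docs.items()})
    vector = order_by({d: s[1] for d, s in all_docs.items()})
    return order_by(rrf_scores([lexical, vector]))


print(per_shard_fusion(shards))    # ['s1_best', 's2_weak', 's1_good', 's2_poor']
print(coordinator_fusion(shards))  # ['s1_best', 's1_good', 's2_weak', 's2_poor']
```

In the per-shard variant every shard's local number one receives the same fused score, so the weak s2_weak ties with s1_best and outranks s1_good; the merged list simply alternates between shards. Fusing once at the coordinator keeps both strong documents ahead of the weak ones.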

Sonu’s PR comes with unit tests for the RRF logic, distributed tests for SolrCloud, and is wired into the usual ./gradlew check workflow. It has already seen detailed reviews from Solr committers Christine Poerschke and David Smiley, among others, and continues to evolve in response to their feedback.

At the time of writing it remains open, but it represents a much more modular and distribution-aware design than the first attempt and has become the focal point of current community discussion on how RRF should land in Solr. With SOLR-17319 sitting in “Patch Available” status and listed on the Solr dense vector focus group’s roadmap, this work now looks less like an experiment and more like a feature waiting for its final round of review and polish before it can be merged.

At KandaSearch, we see this work as strategically important for making hybrid search more accessible to Solr users, and we want to contribute in concrete ways rather than just cheering from the sidelines. Our development team has started applying the patch, running Combined Query + RRF against realistic workloads, and preparing to share observations back with the community. In the next article, I will walk through that process step by step: how we built and ran the branch locally, the types of hybrid queries we issued, and what we learned from the early experiments. The goal is to turn this promising design into something tangible that Solr users can evaluate and, ultimately, help move it closer to inclusion in a future Solr release.
