Blog Post - Remko Strengholt, May 25 2018

Introducing Solr - and why you need it

“Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™.” Don’t take it from us: these are the words of Solr’s own developers at Apache!

OK, so Solr (pron. ‘solar’) is more than a search engine, but viewed within the context of a search engine, are we talking sunny side up or dark side of the moon? Let’s take a closer look.

What’s wrong with searching a conventional database?

Most of us understand the basics of a search engine: you type in your query, hit enter, and up pop the best-matching results in descending order of relevance. How the strength of each match is calculated depends on a number of factors, but generally a higher score means the document is more relevant to the query.

In an SQL query to a relational database, a row either matches or it doesn’t, and the results are sorted on one or more columns. This model is static and simple: the database returns every match, and the user has to decide which result best fits their query.

You might be thinking that a relational database could easily return the same results using an SQL query, and for a simple query that is true. The key difference is that Solr ranks results by their relevance to the query, whereas database results can only be sorted by one or more of the table columns.
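To make the difference concrete, here is a minimal Python sketch that contrasts the two models on a hypothetical four-item catalogue. `sql_style` mimics a SQL filter that returns every row containing any query term, ordered by ID; `tf_idf_rank` scores each document with a simplified TF-IDF formula (term frequency times a smoothed inverse document frequency). The formula is an illustrative assumption, not Solr’s actual scoring: recent Solr versions use Lucene’s BM25 similarity by default.

```python
import math
from collections import Counter

# Hypothetical product catalogue, for illustration only.
DOCS = {
    1: "red shoes",
    2: "travel bag",
    3: "red leather bag",
    4: "leather wallet",
}


def sql_style(query: str) -> list[int]:
    """Mimic a SQL filter: a row matches (any query term present) or not, ordered by id."""
    terms = query.split()
    return sorted(
        doc_id for doc_id, text in DOCS.items()
        if any(term in text.split() for term in terms)
    )


def tf_idf_rank(query: str) -> list[tuple[int, float]]:
    """Rank documents by a simplified TF-IDF score (not Solr's actual BM25 formula)."""
    counts = {doc_id: Counter(text.split()) for doc_id, text in DOCS.items()}
    n_docs = len(DOCS)
    scores: dict[int, float] = {}
    for term in query.split():
        df = sum(1 for c in counts.values() if term in c)  # document frequency
        if df == 0:
            continue
        idf = math.log(n_docs / df) + 1.0  # rarer terms weigh more
        for doc_id, c in counts.items():
            if c[term]:
                scores[doc_id] = scores.get(doc_id, 0.0) + c[term] * idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(sql_style("red bag"))    # [1, 2, 3]: every match, no notion of "best"
print(tf_idf_rank("red bag"))  # document 3, which matches both terms, comes first
```

The SQL-style function leaves the user to work out that item 3 is the best fit; the ranked version puts it at the top.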

So how does Solr help?

There are two sides to consider when working with Solr: the indexing process and the query. On both sides, Solr provides smart, helpful features that match documents to queries more successfully.

Solr uses an inverted index. While a traditional database representation of multiple documents would contain a document’s ID mapped to one or more content fields containing all of the words/terms in that document, an inverted index inverts this model and maps each word/term in the corpus to all the documents in which it appears. Beyond this basic mapping, many more text transformations are possible during the content-analysis process: terms can be modified, added, or removed before they are written to the index.
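The core idea of an inverted index fits in a few lines of Python. This is a minimal sketch, assuming plain lowercasing and whitespace tokenization in place of Solr’s configurable analysis; the function names `build_inverted_index` and `search_all` are hypothetical, not part of Solr or Lucene.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs in which it appears."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive analysis: lowercase + split on whitespace
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents that contain every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    # Intersecting the per-term document sets avoids scanning every document.
    return set.intersection(*postings)


docs = {
    1: "The red leather bag",
    2: "A travel bag",
    3: "Red running shoes",
}
index = build_inverted_index(docs)
print(sorted(index["red"]))                      # [1, 3]
print(sorted(search_all(index, "red bag")))      # [1]
```

Looking up a term is a single dictionary access, which is why a search engine can answer keyword queries without reading every document.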

On the other side is the query. Solr helps users find what they are looking for, and relevant results must be returned quickly, within a second or less in most cases. Users often make mistakes, so spelling correction is needed when a query term is misspelled. Synonyms of query terms must be recognized so that all relevant items are found, and documents containing linguistic variations of query terms must also be matched. Finally, words that carry little meaning for the search, such as “a”, “an”, “of” and “the”, need to be filtered out.
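Solr provides a spell-checking component for this. As a hedged illustration of the underlying idea only (not of how that component is implemented or configured), a common approach is to suggest the indexed term with the smallest Levenshtein edit distance to the misspelled one. The vocabulary and the `max_distance` threshold below are assumptions.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb
            ))
        previous = current
    return previous[-1]


def suggest(term: str, vocabulary: set[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest indexed term within max_distance, or None if nothing is close."""
    candidates = [(edit_distance(term, word), word) for word in vocabulary]
    candidates = [c for c in candidates if c[0] <= max_distance]
    # min() on (distance, word) picks the nearest term, breaking ties alphabetically.
    return min(candidates)[1] if candidates else None


vocabulary = {"leather", "bag", "wallet", "shoes"}  # hypothetical indexed terms
print(suggest("lether", vocabulary))  # leather
```

Restricting suggestions to terms that actually exist in the index means a correction always leads to results.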

Search engines like Solr shine in solving such problems. Solr is able to perform text analysis on content and on search queries to:

  • determine textually similar words
  • understand and match on synonyms
  • remove unimportant words like “a”, “the” and “of”
  • score each result based upon how well it matches the incoming query to ensure the best results are returned first, and that your customers do not have to page through countless, less relevant results to find the content they were expecting.
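The first three steps can be pictured as a toy analysis chain. In real Solr, a field type chains a tokenizer with filters such as `solr.LowerCaseFilterFactory`, `solr.StopFilterFactory`, `solr.SynonymGraphFilterFactory` and `solr.PorterStemFilterFactory`, configured in the schema. The Python below is only a minimal sketch under that idea: the stop-word list, the synonym map and the crude plural-stripping `naive_stem` are all hypothetical stand-ins.

```python
import re

STOP_WORDS = {"a", "an", "of", "the"}
# Hypothetical synonym map; in Solr, synonyms typically live in a synonyms file.
SYNONYMS = {"case": ["bag"], "bag": ["case"]}


def naive_stem(term: str) -> str:
    """Strip a plural 's' (a crude stand-in for a real stemmer such as Porter)."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def analyze(text: str, expand_synonyms: bool = False) -> list[str]:
    tokens = re.findall(r"[a-z0-9]+", text.lower())      # tokenize and lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]   # drop unimportant words
    tokens = [naive_stem(t) for t in tokens]              # "bags" -> "bag"
    if expand_synonyms:
        tokens = tokens + [s for t in tokens for s in SYNONYMS.get(t, [])]
    return tokens


print(analyze("The Bags of a traveller"))                # ['bag', 'traveller']
print(analyze("leather case", expand_synonyms=True))     # ['leather', 'case', 'bag']
```

Because the same kind of analysis runs on both documents and queries, a search for “bags” can match a product described as a “leather case”.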

Solr accomplishes all of this by using an index that maps content to documents instead of mapping documents to content as in a traditional database model. This inverted index is at the heart of how search engines work.

For eCommerce solutions in particular, Solr brings even more functionality:

  • Pagination and sorting
  • Faceting
  • Autosuggest
  • Linguistic variations, such as “bags” versus “bag”
  • Synonyms, such as “bag” and “case”
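
Pagination, sorting and faceting map onto standard parameters of Solr’s `/select` request handler: `q`, `start`, `rows`, `sort`, `facet` and `facet.field`. The sketch below builds such a request URL; the host (Solr’s default port 8983), the `products` collection and the `name`, `price`, `category` and `brand` fields are hypothetical, and autosuggest is left out because it usually requires configuring a suggester first.

```python
from urllib.parse import urlencode


def build_select_url(base: str, collection: str, query: str, page: int,
                     page_size: int = 10, sort: str = "score desc",
                     facet_fields: tuple[str, ...] = ()) -> str:
    """Build a Solr /select URL; page is 0-based."""
    params: list[tuple[str, str]] = [
        ("q", query),
        ("start", str(page * page_size)),  # pagination: offset of the first result
        ("rows", str(page_size)),          # pagination: results per page
        ("sort", sort),                    # e.g. relevance ("score desc") or "price asc"
    ]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to count values for.
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base}/solr/{collection}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8983", "products", "name:bag",
                       page=2, sort="price asc", facet_fields=("category", "brand"))
print(url)
```

Keeping `start` and `rows` in the request lets Solr return only the requested page, while the facet counts drive the filter menus on a product listing.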

Advantages of Solr

Search engines like Solr are not general-purpose data storage and processing solutions; instead, they are built to power keyword search, ranked retrieval, and information discovery. They don’t replace your conventional database but enhance it, especially when users need to query thousands or even millions of documents and you want to give them an optimal search experience they can’t live without. Of course, Solr works with its standard implementation, but there are numerous ways to improve and optimize your Solr engine. We will talk about this in a later post.