ftopia Blog

Find out what's been going on with us and cloud storage

Switching to Solr

This week we migrated our search engine from Sphinx to Solr. The main reasons for doing this are:

Both Sphinx and Solr are full-text search engines. Sphinx is designed for performance, relevance, and ease of integration. It is written in C++ and runs on most systems. Solr is built on the Lucene engine; its main strengths are relevance and extensibility.

Sphinx and Thinking Sphinx

The Ruby gem that we have been using with Sphinx is Thinking Sphinx. Thinking Sphinx has many upsides:

However, some of these advantages turn out to be not so cool when the data volume or load increases:

Solr and Sunspot

Solr relies on the Apache Lucene engine and is written in Java. This might not sound that cool to a number of coders, but Solr includes scripts and tutorials that make it a breeze to install the server. In a development environment, a simple script is enough to boot the server.

The Ruby client library for Solr is Sunspot. The pair is very similar to Sphinx and Thinking Sphinx, but without Sphinx’s main drawbacks:

Sunspot also has a nice upside for the developer. In a development environment, classes are reloaded on each request, so the code that describes the index is executed on each request too. In our app, this code takes more than 1000 ms to run with Thinking Sphinx; with Sunspot it is much faster, about 40 ms with the same indexes! In production, classes are cached after the first execution, so there’s almost no difference between the two search engines.

It’s also worth mentioning that Sunspot differs slightly in the way a search is coded: instead of a traditional method call that takes search params, Sunspot uses a Ruby block with an elegant DSL. Much better for code readability:

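Post.search do
  fulltext 'best pizza'                        # keyword query against the full-text fields
  with :blog_id, 1                             # only posts from blog 1
  with(:published_at).less_than Time.now       # only posts already published
  order_by :published_at, :desc                # newest first
  paginate :page => 2, :per_page => 15         # second page, 15 results per page
  facet :category_ids, :author_id              # facet counts by category and author
end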
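
To give a feel for how a block-based DSL like this can work, here is a toy sketch in plain Ruby. It is not Sunspot's implementation: `ToySearchBuilder` and `toy_search` are hypothetical names made up for this illustration, and the sketch only collects the criteria into a hash rather than talking to Solr. The idea is simply that the block is evaluated against a builder object, so each line of the block becomes a method call on that builder.

```ruby
# Toy illustration only: NOT how Sunspot is implemented.
# The block is evaluated against a builder, which records each criterion.
class ToySearchBuilder
  attr_reader :criteria

  def initialize
    @criteria = { keywords: nil, filters: {}, order: nil, page: 1, per_page: 30 }
  end

  # Records the keyword query.
  def fulltext(keywords)
    @criteria[:keywords] = keywords
  end

  # Records an equality filter on a field.
  def with(field, value)
    @criteria[:filters][field] = value
  end

  # Records the sort field and direction.
  def order_by(field, direction = :asc)
    @criteria[:order] = [field, direction]
  end

  # Records the pagination settings.
  def paginate(page:, per_page:)
    @criteria[:page] = page
    @criteria[:per_page] = per_page
  end
end

# Hypothetical entry point: runs the block with the builder as `self`
# and returns the collected criteria.
def toy_search(&block)
  builder = ToySearchBuilder.new
  builder.instance_eval(&block)
  builder.criteria
end

criteria = toy_search do
  fulltext 'best pizza'
  with :blog_id, 1
  order_by :published_at, :desc
  paginate page: 2, per_page: 15
end

p criteria
```

A real client would turn the collected criteria into a Solr request; the point here is only that a block reads more naturally than one long hash of options.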

Performance and next steps

Compared to Sphinx, Solr’s main drawback is search speed. However, Solr’s performance can be greatly improved by clustering Solr servers.
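
As a rough illustration of one way a cluster can help, the sketch below spreads read queries over several Solr servers in round-robin order. This is an assumption for the sake of the example, not a description of our setup: the `RoundRobinPool` class and the server URLs are hypothetical, and in practice a load balancer in front of the Solr servers would usually do this job.

```ruby
# Hypothetical sketch: pick a Solr server for each query in round-robin order.
class RoundRobinPool
  def initialize(urls)
    raise ArgumentError, 'need at least one server' if urls.empty?

    @urls = urls
    @index = 0
    @lock = Mutex.new # keeps the counter consistent across threads
  end

  # Returns the next server URL, wrapping around at the end of the list.
  def next_url
    @lock.synchronize do
      url = @urls[@index]
      @index = (@index + 1) % @urls.size
      url
    end
  end
end

# Made-up server addresses, for illustration only.
pool = RoundRobinPool.new(%w[
  http://solr1.example:8983/solr
  http://solr2.example:8983/solr
])

3.times { puts pool.next_url }
```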

We are much more confident about the scalability of our architecture now that the search engine runs independently of our asynchronous queue management system.

The next step is to exploit more of Solr’s capabilities and apply full-text indexing to all the textual content stored on ftopia.
