zen of coding

algolia search

Case study: building a better search with Algolia and Lumen PHP

I can’t believe it’s been more than seven years since I last wrote about building a search feature.

Needless to say that over the years the landscape of available tools and technologies has changed quite a bit. Hands down one of the major players today is Elasticsearch or the entire ELK stack. It has been powering many applications and services in the indexing and search space.
It is often used to collect log files via Logstash and visualize the data using Kibana. However, the stack is quite capable of other feats, such as text search via Lucene as well as geo search capabilities.

Side note

In my other post I talked about the destruction of the MVC monolith.

Today, as part of this case study, I’d like to show another example. Two needs drive such decoupling: improving the underlying platform (application) as well as ever-changing and growing business requirements.
As an architect or developer you should look for solutions that will fit your needs. At the same time simplifying the application logic and code complexity. Ensuring separation of concerns to keep your foundation as clean and efficient as possible.
From the business perspective your goal is to lower development costs and speed up time to market.

Back to the future

Recently I was working on a project where the application was already built and the search was “working”, so we already had some specific, established requirements (not that different from many other applications):

  • Filtering – based on product attributes
  • Sorting – based on price, for example
  • Geo-search – based on warehouse location
  • Text search – based on product description or name

The application used MySQL as the main storage engine. To deal with each piece of logic we had to create various PHP classes and services.
Unfortunately, the code was messy, the functions were gigantic and it was not unusual to see things like the code snipper below:

// WTF is this?
if (isset($params['northeastlat']) && isset($params['northeastlon']) && isset($params['southwestlat']) && isset($params['southwestlon']) && $params['northeastlat'] != "" && $params['northeastlon'] != "" && $params['southwestlat'] != "" && $params['southwestlon'] != "" && !empty($params['northeastlat']) && !empty($params['northeastlon']) && !empty($params['southwestlat']) && !empty($params['southwestlon']) && $params['northeastlat'] != "undefined" && $params['northeastlon'] != "undefined" && $params['southwestlat'] != "undefined" && $params['southwestlon'] != "undefined") {
            $this->view->northeast = array('lat' => $params['northeastlat'], 'lon' => $params['northeastlon']);
            $this->view->southwest = array('lat' => $params['southwestlat'], 'lon' => $params['southwestlon']);
            if (isset($params['latMapCenterValue']) && isset($params['lngMapCenterValue']) && $params['latMapCenterValue'] != "" && $params['lngMapCenterValue'] != "" && !empty($params['latMapCenterValue']) && !empty($params['lngMapCenterValue'])) {
                $this->view->latMapCenterValue = $params['latMapCenterValue'];
                $this->view->lngMapCenterValue = $params['lngMapCenterValue'];
                $this->view->midpoint = array("lat" => $params['latMapCenterValue'], "lon" => $params['lngMapCenterValue']);
            } else {
                $this->view->latMapCenterValue = '';
                $this->view->lngMapCenterValue = '';
            }

(Ouch).

Spread throughout the controllers, services and models this kind of horror made the application unmaintainable.

Yes, there was MVC in place. No, it was not a proper implementation of the pattern nor SOLID principles by any means.

The code which dealt with filters was not prettier. There was front-end (Javascript) logic to toggle various UI elements, as well as server-side classes to handle filter persistence in the session and to modify the query parameters based on certain selections made by the user.

We are talking about roughly 20,000 (yes, twenty thousand) lines of code, which comprised the search capability.

For geo location searches, a good ol’ formula (which calculates distance based on earth’s diameter) was used…

if ($radiusSearch == TRUE && isset($criteria['locSearch']['city']) && is_numeric($criteria['locSearch']['city']) && $criteria['searchBy'] != 'products') {
            $order = ", (CASE WHEN ad.stateName LIKE :product THEN 2
                WHEN ad.cityName LIKE :product THEN 1
                WHEN ad.zip LIKE :product THEN 0
                ELSE 3
                END) AS HIDDEN ORD "
;
        }
$distance = ", (
            3964 * ACOS(
              COS(RADIANS(:lat)) * COS(RADIANS(ad.lat)) * COS(
                RADIANS(ad.lng) - RADIANS(:lng)
              ) + SIN(RADIANS(:lat)) * SIN(RADIANS(ad.lat))
            )
          ) AS HIDDEN distance"
;

I’ve counted approximately seventy (yes, 70) of such if/else statements in a single function.

As one might suspect, at the end of all the “iffin’ and elsin'” and looping and grinding, this magic box produced results in about 6 seconds … on a good day. Over twenty thousand lines of code, and not much to show for it except unmaintainable application and horrible user experience.

Use the right tool for the right job

MySQL is a solid RDBMS, but it’s not suited to handle complex searches, gigantic JOIN’s, distance calculation on the fly, that is, if you would like to get your query results to return reasonably fast.

Elasticsearch was something that pretty much immediately came to mind, as I’ve mentioned in the introduction. While all of the requirements for speed, geolocation and indexing could be tackled with this particular product, we still had to rewrite a lot custom business logic in order to accomplish filtering, querying by various attributes, geo-search logic, bounding boxes, etc., etc.

Even though with ES we would achieve much better performance, the necessary business logic still had to be written or refactored. Whatever algorithms and rules we’d use to get the results still had to be imagined.

Additionally we faced the same challenges as you would when building out any solution. Infrastructure had to be rolled out and scalability would be important as the service grew in popularity. Failover, security, monitoring, would also need to be implemented.

Drumroll… Enter Algolia

All the above seemed like a necessary and noble effort. Yet, my desire was to ease some of the work by using a hosted (or SaaS-like) ELK stack. During the research of various providers I was pleasantly surprised by quite a few players out there that offered such a service, but most were either very expensive, had limits on data retention or focused on log processing and visualisation as the core of their service.

In my research I had, thankfully, come across a truly awesome platform called Algolia.

Here’s the summary from their site:

The Easiest Way To Build All Kinds of Search Experiences

Couldn’t have said it better. When looking for a new search solution a tagline like this would pique one’s interest.

The big picture
algolia, lumen php, integration, workflow

  1. Whenever product record is created or updated an API endpoint is called and product ID (123) is sent to the API.
    Example of how it’s done in ZF2 MVC:

    $this->doRequest('POST', sprintf(self::INDEX_PRODUCT, $id), []);
  2. Based on the received ID, the product record and any related attributes are retrieved from MySQL. Then converted into JSON format expected by Algolia and indexed via Algolia API. The whole code snippet to handle the interaction is pasted below and can be understood even by a novice programmer.
  3. The search results now come from the Algolia data store which is very fast and scalable. The queries are made via AJAX using the Algolia API.

Initial steps

After spending some time with the docs and looking at the examples, it was clear that this service could solve our needs and then some. Still, I had to see it for myself. I needed to test drive this thing, aka roll up my sleeves and write some code to see what’s possible.

First, it was necessary to index the product data from our MySQL store to Algolia. This was actually as easy as copying and pasting a tiny PHP script from their web site and adjusting the query to my needs.

I only needed a few hundred products to get the idea of what’s possible.

About ten minutes later the data was in Algolia’s index.

I could immediately search for the imported data via the Algolia dashboard/UI.
Numeric attributes like “length” were auto-magically treated as filter widgets. Bad-ass. Not only is this a very cool feature, but it immediately proved its value. One of the imported products had a length of 460 feet (which is rather impossible for a pencil).

Since the location of each product could be in a different warehouse I had to test the geosearch feature. For this purpose I created an app.html file in my local environment, copied the JS code from one of their examples, replaced application token/keys/names for my own… and voila! Merveilleux! (Algolia is based out of Paris).

Within an hour or so I had an albeit rudimentary, but nonetheless functional search engine.

Here’s the app.html, which I have borrowed.

And here is the Javascript that drives most of the logic.

There are a few interesting points to note. The initial setup requires binding between the DOM elements and our JS variables.

  // DOM and Templates binding
  var $map = $('#map');
  var $hits = $('#hits');
  var $searchInput = $('#search-input');
  var hitsTemplate = Hogan.compile($('#hits-template').text());
  var noResultsTemplate = Hogan.compile($('#no-results-template').text());

  // Map initialization
  var map = new google.maps.Map($map.get(0), {
    streetViewControl: false,
    mapTypeControl: false,
    zoom: 4,
    minZoom: 3,
    maxZoom: 12,
    styles: [{stylers: [{hue: '#3596D2'}]}]
  });
  var fitMapToMarkersAutomatically = true;
  var markers = [];
  var boundingBox;
  var boundingBoxListeners = [];

A wealth of attributes is available to tweak the map-based search for your specific needs.
Although not a part of this example, they also have many UI widgets to allow for filtering (or faceting).

Here’s what a color filter setup might look like:

search.addWidget(
  instantsearch.widgets.refinementList({
    container: '#colors',
    attributeName: 'colors',
    operator: 'or',
    limit: 10,
    templates: {
      item: facetTemplateColors,
      header: '<div class="facet-title">Colors</div class="facet-title">'
    }
  })

This snippet uses Algolia’s instantsearch.js library.

Building out the platform

By now it was a no-brainer, we had to replace our ugly search mechanism with the Algolia service. The basics were simple:

  • Index or sync the data when product model/record changes
  • Replace search and filtering logic by the UI widgets and pure front-end/JS code

For syncing our data with Algolia we’ve decided to use Lumen PHP (a micro-framework sister of Laravel’s MVC). The sync required only some JSON data traveling back and forth between our system and Algolia, so Lumen seemed like a great choice.
We did need some DB interaction. An excellent and lightweight implementation of Active Record via Laravel’s Eloquent ORM seemed to be what we needed.

    /**
    * @param $id int record to index
    * @return Bool
    */

    public function indexRecord($id)
    {
        $product = $this->product->getObjectById($id);

        if (!empty($product)) {
            if (isset($product['status']) && $product['status'] === 1) {
                return $this->AlgoliaIndex->saveObject($boat);
            } else {
                return $this->AlgoliaIndex->deleteObject($id);
            }
        }

        return json_encode(['error' => 'No records found for indexing.']);
    }

The

indexRecord()

method is the controller action, which gets triggered by the POST to the end point in step 1 of the “big picture”.

getObjectById()

is a Model method, which performs the query using Eloquent to get all the product data. In the end it returns the data in a JSON format which Algolia expects.
Install the Algolia client via “composer”, making the

AlgoliaIndex::saveObject() and AlgoliaIndex::deleteObject()

methods available. They have libraries available for many languages and platforms.

Front end

In the examples of the javascript code, I showed that initial setup and DOM mapping is rather simple. Yet, it took a fair amount of work to convert the whole search UI into a SPA (single page application).
This part was a little more difficult to be fair. We shifted away all the query-processing logic from our application and database to Algolia. Yet, there was a need to refactor the filtering and UI capabilities from the ground up.

With a set of simpler business requirements we could’ve done it with Algolia’s native JS widget/search library.
(By the way since I first started using this service about a year ago, Algolia introduced some awesome UI tools and their eco system of integrations and plugins has grown quite a bit).

Was it worth it?

Well, we’ve reduced 20K and 3K of PHP and Javascript code to about 2K of JS code in total.

Prior, at least two developers had to maintain the search feature. Now we’ve shifted vast majority of the responsibility to a single front-end engineer.

All these stats might be impressive to me as a developer or a product owner, but it was the end user experience, where the most dramatic shift took place.

The results are now coming back in milliseconds. Some issues pop-up here and there, but people can actually find what they are looking for. This, of course, means higher conversion ratios for the business.

I’d say it’s a win-win situation.

%d bloggers like this: