On User Preferences

Users often expect an application to adapt to their preferences, and a search engine is no exception. Search preferences can be set manually by the user or calculated automatically by the application. In either case, the adaptation should improve the user experience and simplify the search process.

Solution

We will look at two types of search user preferences: manual and automatic.

Manual preferences are configuration options that the user defines explicitly and that the application then has to take into account during the search. The typical options are listed below.

View configuration covers the look and feel of the search results page. It may include the page layout (grid or table, image size), the default language, and additional information shown for each result.
Boosting preferred content makes sense when the user knows what they are looking for and wants to see that content at the top of the list. The boost may be based on category, content type, tags, date, or custom parameters.
Excluding useless content is the opposite of boosting preferred content. The user manually selects the parameters (content type, category, date, tags) of the content they do not want to see.

Automatic preferences are configuration options that the application calculates from user behavior, view history, and similar signals. Here are typical examples of such preferences.

Boosting most viewed content can improve the user experience. If a user views an article often, it may make sense to boost it the next time they search for something similar.
Boosting content based on parameters is similar to boosting preferred content. The significant difference is that the list of preferred parameters (categories, tags, etc.) is calculated automatically from user behavior and view history.
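
As an illustration of how such a list of preferred parameters could be calculated, here is a minimal Python sketch that counts the tags of the documents a user has viewed and keeps the most frequent ones. The view_history structure and the top_viewed_tags function are hypothetical names chosen for this example; a real application would read the history from its own storage and might weight recent views differently.

```python
from collections import Counter


def top_viewed_tags(view_history, limit=3):
    """Return up to `limit` tags that occur most often in the viewed documents."""
    counts = Counter()
    for viewed in view_history:
        # Every view counts, so repeated views of one document weigh more.
        counts.update(viewed["tags"])
    return [tag for tag, _ in counts.most_common(limit)]


# Hypothetical view history: the user opened document 1 twice and document 3 once.
view_history = [
    {"id": "1", "tags": ["apple", "fruit", "sweet"]},
    {"id": "3", "tags": ["banana", "fruit", "sweet"]},
    {"id": "1", "tags": ["apple", "fruit", "sweet"]},
]

preferred_tags = top_viewed_tags(view_history, limit=2)
print(preferred_tags)
```

The resulting tags can then be used in the same way as manually selected ones.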

One more important thing to remember is that manual preferences must always override automatic ones. Enabling automatic preferences by default is a good way to improve the user experience. Still, once the user has manually changed their preferences, the application has to skip the automatic preferences and use only the manual ones.
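
The override rule can be sketched in a few lines of Python. The Preferences class and the effective_preferences function are hypothetical and exist only for this illustration; the sketch assumes that the application stores None for a user who has never changed the settings manually.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Preferences:
    """Hypothetical container for search preferences."""
    boost_tags: List[str] = field(default_factory=list)
    exclude_tags: List[str] = field(default_factory=list)


def effective_preferences(manual: Optional[Preferences],
                          automatic: Preferences) -> Preferences:
    """Use manual preferences when they exist, otherwise the automatic ones."""
    # Assumption: `manual` is None until the user changes something by hand.
    return manual if manual is not None else automatic


automatic = Preferences(boost_tags=["fruit", "sweet"])
manual = Preferences(exclude_tags=["sour"])

# A user without manual settings gets the calculated preferences.
print(effective_preferences(None, automatic))
# Once manual settings exist, the automatic ones are skipped entirely.
print(effective_preferences(manual, automatic))
```

Note that the manual settings replace the automatic ones as a whole rather than being merged with them, which matches the rule described above.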

Implementation

Let us quickly go through what we can do to apply user preferences to search engine behavior.

Boosting some content (preferred, most viewed, recommended) is essential for the search engine. This topic is already covered in the Boosting Search Results article, where you can find detailed examples of boosting content based on custom parameters. In our case, the parameters are the manual and automatic user preferences.
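
For orientation only, here is a minimal Python sketch of one possible way to turn preferred tags into a boost: a bool query whose must clause runs the full-text search and whose should clauses raise the score of documents carrying a preferred tag. This is not necessarily the approach used in the Boosting Search Results article. The boosted_query function is a hypothetical helper, the field names data and tags are taken from the example index used later in this article, and the sketch assumes that tags are indexed as exact values so that a term query can match them.

```python
import json


def boosted_query(text, boost_tags, boost=2.0):
    """Build a search body that prefers documents tagged with `boost_tags`."""
    should = [
        # A matching should clause adds to the score; it never filters results.
        {"term": {"tags": {"value": tag, "boost": boost}}}
        for tag in boost_tags
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"data": text}}],
                "should": should,
            }
        }
    }


body = boosted_query("fruit", ["sweet", "yellow"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the _search endpoint as a request body.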

View configuration is not a part of the search engine, but you can select the data you need from the search engine and show it to the user. The easiest way to do that is the fields option: you tell the search engine which fields to return together with the search results.

Excluding useless content is the third option, and one that we have not covered before. It is just an additional condition that filters out unwanted results. There are two typical ways to organize it: a term query or a match query, in both cases wrapped in a bool query for negation.
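
On the application side, the exclusion condition can be generated from the list of tags the user does not want to see. The Python sketch below is one possible way to do it; exclusion_query is a hypothetical helper, and it uses a terms query so that several excluded tags fit into a single must_not clause. It assumes the tags field holds exact values and the data field holds the searchable text.

```python
import json


def exclusion_query(excluded_tags, text=None):
    """Build a search body that filters out documents with any excluded tag."""
    bool_query = {}
    if text:
        # Optional full-text condition on the `data` field.
        bool_query["must"] = [{"match": {"data": text}}]
    if excluded_tags:
        # One terms query matches a document if it has any of the listed tags.
        bool_query["must_not"] = [{"terms": {"tags": list(excluded_tags)}}]
    if not bool_query:
        # Nothing to search for and nothing to exclude: return everything.
        return {"query": {"match_all": {}}}
    return {"query": {"bool": bool_query}}


print(json.dumps(exclusion_query(["sour", "bitter"], text="fruit"), indent=2))
```

When no text is passed, the body contains only the must_not clause, so all documents except the excluded ones are returned.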

Let us create a simple index with an appropriate mapping. The mapping contains three fields: title, data, and tags. The title field demonstrates how to retrieve data from the index together with the results, data is used for full-text search, and tags is used to exclude useless content. We also add some sample data to demonstrate these features.

curl -X PUT "localhost:9200/user-preferences"
curl -X PUT "localhost:9200/user-preferences/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "english"
    },
    "data": {
      "type": "text",
      "analyzer": "english"
    },
    "tags": {
      "type": "keyword"
    }
  }
}
'
curl -X PUT "localhost:9200/user-preferences/_doc/1" -H 'Content-Type: application/json' -d'
{
  "title": "Apple",
  "data": "Apple An apple is an edible fruit produced by an apple tree (Malus domestica). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus. fruit sweet red yellow",
  "tags": ["apple", "fruit", "red", "yellow", "sweet", "tree"]
}
'
curl -X PUT "localhost:9200/user-preferences/_doc/2" -H 'Content-Type: application/json' -d'
{
  "title": "Orange",
  "data": "Orange The orange is the fruit of various citrus species in the family Rutaceae (see list of plants known as orange); it primarily refers to Citrus × sinensis, which is also called sweet orange, to distinguish it from the related Citrus × aurantium, referred to as bitter orange. fruit sour orange",
  "tags": ["orange", "fruit", "sour", "tree"]
}
'
curl -X PUT "localhost:9200/user-preferences/_doc/3" -H 'Content-Type: application/json' -d'
{
  "title": "Banana",
  "data": "Banana A banana is an elongated, edible fruit – botanically a berry – produced by several kinds of large herbaceous flowering plants in the genus Musa. In some countries, bananas used for cooking may be called \"plantains\", distinguishing them from dessert bananas. fruit sweet yellow",
  "tags": ["banana", "fruit", "sweet", "yellow", "palm"]
}
'

Now we can write a query. Two things are important here. First, the bool query with a must_not clause excludes documents tagged as sour. Second, the fields section retrieves the title and tags from the search index together with the results, while "_source": false suppresses the full document source.

curl -X POST "localhost:9200/user-preferences/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool" : {
      "must_not" : [
        { "term" : { "tags" : "sour" } }
      ]
    }
  },
  "fields": [
    "title",
    "tags"
  ],
  "_source": false
}
'

Let us look at the results. The orange has been excluded because it is marked with the sour tag, and each result contains the additional fields title and tags. The scores are 0.0 because must_not clauses run in filter context and the query has no scoring clause.

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "user-preferences",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "fields" : {
          "title" : [
            "Apple"
          ],
          "tags" : [
            "apple",
            "fruit",
            "red",
            "yellow",
            "sweet",
            "tree"
          ]
        }
      },
      {
        "_index" : "user-preferences",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "fields" : {
          "title" : [
            "Banana"
          ],
          "tags" : [
            "banana",
            "fruit",
            "sweet",
            "yellow",
            "palm"
          ]
        }
      }
    ]
  }
}
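
Note that the fields option returns every value as an array, even for a single-valued field such as title. Before the results are shown on the page, they usually need to be flattened. Here is a minimal Python sketch of that step; hits_to_rows is a hypothetical helper, and the response dictionary is a shortened copy of the output above rather than a live search result.

```python
def hits_to_rows(response):
    """Convert search hits into simple rows for the results page."""
    rows = []
    for hit in response["hits"]["hits"]:
        fields = hit.get("fields", {})
        titles = fields.get("title", [])
        rows.append({
            "id": hit["_id"],
            # `fields` values are always lists; take the first title if present.
            "title": titles[0] if titles else None,
            "tags": fields.get("tags", []),
        })
    return rows


# Shortened copy of the response shown above.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "fields": {"title": ["Apple"],
                                    "tags": ["apple", "fruit", "red"]}},
            {"_id": "3", "fields": {"title": ["Banana"],
                                    "tags": ["banana", "fruit", "sweet"]}},
        ]
    }
}

rows = hits_to_rows(response)
for row in rows:
    print(row["id"], row["title"], ", ".join(row["tags"]))
```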