Horizontal Blog

Configuring Solr to provide search suggestions

I needed to provide search term suggestions based on the characters a user has typed into the search box. This is pretty easy to do with Solr, the open source enterprise search platform written in Java and built on Apache Lucene.

If you’re using a version prior to 4.8, this can be accomplished using the SpellCheckComponent. See this document for details.

As of 4.8 a new component is available, the solr.SuggestComponent. This post goes through the steps to configure an index to provide search suggestions using this component. In my case I created a separate index for this, but the configuration could be combined into an existing index such as sitecore_web_index (or any other custom index you use), depending on your needs.

Define the schema for the index:

To keep the documents small, I trimmed the fields down to the bare minimum. This is done in schema.xml.

[code language="xml"]
<fields>
  <field name="_content" type="text_general" indexed="true" stored="false" />
  <field name="_database" type="string" indexed="true" stored="true" />
  <field name="_uniqueid" type="string" indexed="true" stored="true" required="true" />
  <field name="_name" type="text_general" indexed="true" stored="true" />
  <field name="_indexname" type="string" indexed="true" stored="true" />
  <field name="_version" type="string" indexed="true" stored="true" />
  <field name="_version_" type="long" indexed="true" stored="true" />
</fields>
[/code]
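The trimmed schema should also tell Solr which field identifies a document. A minimal sketch, assuming _uniqueid plays that role (as it does in the schema Sitecore generates), is a single uniqueKey element placed in schema.xml as a sibling of the fields block:

[code language="xml"]
<!-- assumption: _uniqueid is the document key, as in Sitecore's generated schema -->
<uniqueKey>_uniqueid</uniqueKey>
[/code]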

Then I added two fields for the suggester to use: one holds the suggestion text and the other holds the weight of that suggestion. The suggestion field should be a text type and the weight field should be a float type, and both need to be stored in the index. In this case the fields get their values from corresponding fields in our Sitecore instance; how you populate them depends on your specific indexing strategy.

[code language="xml"]
<field name="term" type="text_general" indexed="true" stored="true" />
<field name="weight" type="float" indexed="true" stored="true" />
[/code]
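For illustration, a document carrying both fields could look roughly like this in Solr's XML update format, sent to the index's /update handler. All values, including the _uniqueid, are hypothetical; in a Sitecore setup they would come from whatever your indexing strategy maps into term and weight.

[code language="xml"]
<!-- hypothetical document: every value below is made up for illustration -->
<add>
  <doc>
    <field name="_uniqueid">hypothetical-item-1</field>
    <field name="_name">Solr Configuration Guide</field>
    <!-- the text offered as a suggestion -->
    <field name="term">Solr Configuration Guide</field>
    <!-- higher weights rank the suggestion higher -->
    <field name="weight">2.5</field>
  </doc>
</add>
[/code]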

Define a custom field type for the suggest component:

Next we need a new field type that the suggester will use to analyze the suggestion field. This type replaces every non-alphanumeric character with a space, splits the result into tokens on whitespace, and lowercases each token so matching is case-insensitive. A custom type is not strictly necessary; existing types may be used instead. Again, this is done in schema.xml.

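[code language="xml"]
<types>
  <fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- replace every character that is not a letter or digit with a space -->
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
      <!-- split the cleaned text into tokens on whitespace -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- lowercase each token so matching is case-insensitive -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
[/code]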

Define the suggest component for the index:

Now that we have the schema set up, we need to define a searchComponent that will do the suggesting. This is done in solrconfig.xml.

Add the following to the <config> node:

[code language="xml"]
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">fuzzySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="storeDir">fuzzy_suggestions</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">term</str>
    <str name="weightField">weight</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">infix_suggestions</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">term</str>
    <str name="weightField">weight</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
[/code]

lookupImpl

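In this case we’re setting up a suggest component that has two suggester data sources available to it. FuzzyLookupFactory matches the start of a suggestion while tolerating small typos in what the user has typed, and AnalyzingInfixLookupFactory matches the typed text at the start of any word in the suggestion, so using the two together gives more complete results.

Additional suggester implementations are available. See the Suggester Documentation for more details on the different types of Lookup Implementations; each has properties unique to its implementation.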
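As an example of another implementation, here is a sketch of a third suggester using BlendedInfixLookupFactory, which matches like the infix suggester but also factors in where in the suggestion the match occurs. The name blendedSuggester and the indexPath value are hypothetical; the block would go inside the existing searchComponent, and the suggester would only be queried once its name is added as a suggest.dictionary value on the request handler.

[code language="xml"]
<!-- hypothetical third suggester; place inside the existing "suggest" searchComponent -->
<lst name="suggester">
  <str name="name">blendedSuggester</str>
  <str name="lookupImpl">BlendedInfixLookupFactory</str>
  <!-- position_linear (the default) ranks matches near the start of the suggestion higher -->
  <str name="blenderType">position_linear</str>
  <!-- like AnalyzingInfixLookupFactory, it keeps its data in a separate Lucene index -->
  <str name="indexPath">blended_suggestions</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">term</str>
  <str name="weightField">weight</str>
  <str name="suggestAnalyzerFieldType">suggestType</str>
  <str name="buildOnStartup">false</str>
  <str name="buildOnCommit">false</str>
</lst>
[/code]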

storeDir and indexPath

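These parameters define where the suggester structure is stored after it’s built: FuzzyLookupFactory uses storeDir, while AnalyzingInfixLookupFactory uses indexPath because it keeps its suggestions in a separate Lucene index. Set them so the data is available on disk without rebuilding.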

field

The field to get the suggestions from. This could be a computed or a copy field.
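If the suggestion text is simply another field’s value, a copyField directive in schema.xml can fill the suggestion field at index time. A sketch, assuming the item name in _name is what should be suggested:

[code language="xml"]
<!-- hypothetical: copy the raw item name into the suggestion field before analysis -->
<copyField source="_name" dest="term" />
[/code]

Use this only if your indexing strategy does not already send a term value; otherwise the document would carry two values for term, which the single-valued field defined above does not accept.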

weightField

As of Solr 5.1 this field is optional; in earlier versions it is required. If no meaningful weight value is available, a workaround is to define a float field in your schema and point weightField at it. Even if that field is never added to a document, the suggester will compensate.

threshold (not used in this example)

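The minimum fraction of documents (a value between 0 and 1) that a term must appear in before it is added to the suggester. This can be useful for reducing garbage suggestions caused by misspellings if you haven’t scrubbed the input. Note that threshold is a parameter of HighFrequencyDictionaryFactory, not of the DocumentDictionaryFactory used above.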
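A sketch of a suggester that does use threshold, assuming the hypothetical name frequentTermSuggester and store directory frequent_suggestions. HighFrequencyDictionaryFactory reads the indexed (analyzed) terms of the field, so suggestions are individual lowercased words rather than whole phrases, and each term’s weight comes from how many documents contain it, so no weightField is needed.

[code language="xml"]
<!-- hypothetical suggester: only terms found in at least 0.5% of documents are suggested -->
<lst name="suggester">
  <str name="name">frequentTermSuggester</str>
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="storeDir">frequent_suggestions</str>
  <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
  <str name="field">term</str>
  <!-- declared with <float> because the factory reads threshold as a float value -->
  <float name="threshold">0.005</float>
  <str name="suggestAnalyzerFieldType">suggestType</str>
  <str name="buildOnStartup">false</str>
  <str name="buildOnCommit">false</str>
</lst>
[/code]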

suggestAnalyzerFieldType

This parameter is set to the fieldType that will process the information in the defined ‘field’. I suggest starting simple and adding complexity as the need arises.

buildOnStartup and buildOnCommit

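Building the suggester data involves re-reading and decompressing every document and adding its field value to the suggester, which can be slow on a large index. For that reason both settings should generally be set to “false”. buildOnStartup rebuilds every time Solr is started; buildOnCommit rebuilds every time documents are committed. For a smaller list of potential suggestions, building on commit is acceptable. With both set to “false”, build the suggesters on demand by sending suggest.build=true to the request handler, for example http://localhost:8983/solr/index_name/suggest?suggest.build=true.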

Define a requestHandler for the Suggest Component

[code language="xml"]
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <!-- each value must match the name of a suggester in the "suggest" searchComponent -->
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.dictionary">fuzzySuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
[/code]

The “name” of the requestHandler defines the URL path used to request suggestions. In this case it will be http://localhost:8983/solr/index_name/suggest, where index_name is the name of your index. Your host and port may be different.

The requestHandler definition contains two parts:

defaults

These are settings that you would like to apply to every request. They can be overridden in the query string when a request needs different values.

Multiple “suggest.dictionary” values may be used, and each one gets its own section of results. The values are the names of the suggesters defined in the Suggest Component.

components

The name of the Suggest Component is set here. This connects the handler to the component.

See the documentation for more details on configuring search components and request handlers.

Actually getting suggestions

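Once all of this is set up, using it is very simple. Assuming a Solr index URL like http://localhost:8983/solr/index_name, pass the characters the user has typed in the suggest.q parameter (wt=json requests the JSON response format), for example:
http://localhost:8983/solr/index_name/suggest?suggest.q=sol&wt=json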

Response Format:

[code language="js"]
{
  suggest: {
    suggester_name: {
      suggest_query: { numFound: .., suggestions: [ {term: .., weight: .., payload: ..}, .. ]}
    }
  }
}
[/code]

I hope you find this information useful. See the Suggester documentation for more details.

Thanks for reading!
