
Full-text Search with Solr

1. Overview

In this article, we’ll explore a fundamental concept in the Apache Solr search engine: full-text search.

Apache Solr is an open-source framework designed to handle millions of documents. We’ll go through its core capabilities with examples that use its Java client library, SolrJ.

2. Maven Configuration

Since Solr is open source, we can simply download the binary and run the server separately from our application.

To communicate with the server, we’ll define the Maven dependency for the SolrJ client:

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>6.4.2</version>
</dependency>

You can find the latest dependency here.

3. Indexing Data

To index and search data, we first need a core; we’ll create one named item.

Documents then have to be indexed into that core on the server before they become searchable.

There are many different ways we can index data. We can use data import handlers to import data directly from relational databases, upload data with Solr Cell using Apache Tika or upload XML/XSLT, JSON and CSV data using index handlers.

3.1. Indexing Solr Document

We can index data into a core by creating a SolrInputDocument. First, we populate the document with our data, and only then call SolrJ’s API to index it:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", id);
doc.addField("description", description);
doc.addField("category", category);
doc.addField("price", price);
solrClient.add(doc);
solrClient.commit();

Note that id should be unique for each item. Indexing a document whose id matches an already indexed document updates that document.

3.2. Indexing Beans

SolrJ provides APIs for indexing Java beans. To index a bean, we need to annotate it with the @Field annotations:

public class Item {

    @Field
    private String id;

    @Field
    private String description;

    @Field
    private String category;

    @Field
    private float price;
}

Once we have the bean, indexing is straightforward:

solrClient.addBean(item); 
solrClient.commit();

4. Solr Queries

Searching is the most powerful capability of Solr. Once we have the documents indexed in our repository, we can search for keywords, phrases, date ranges, etc. The results are sorted by relevance (score).

4.1. Basic Queries

The server exposes an API for search operations. We can call either the /select or the /query request handler.

Let’s do a simple search:

SolrQuery query = new SolrQuery();
query.setQuery("brand1");
query.setStart(0);
query.setRows(10);

QueryResponse response = solrClient.query(query);
List<Item> items = response.getBeans(Item.class);

SolrJ sends this as the main query parameter q in its request to the server. Here we ask for the first 10 records, starting at offset zero; these are also the defaults when start and rows are not specified.

The search query above will look for any documents that contain the complete word “brand1” in any of its indexed fields. Note that simple searches are not case sensitive.
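
To make the request more concrete, here is a minimal sketch of the HTTP parameters that such a query translates to. It uses only the JDK; SelectUrlSketch is a hypothetical helper and not part of SolrJ, and the host, port and core name in the base URL are assumptions for a local installation with a core named item:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of SolrJ): builds the URL of a /select request by hand.
class SelectUrlSketch {

    // Assumption: a local Solr server with a core named "item".
    static final String BASE_URL = "http://localhost:8983/solr/item";

    static String selectUrl(String q, int start, int rows) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("start", String.valueOf(start));
        params.put("rows", String.valueOf(rows));

        StringJoiner url = new StringJoiner("&", BASE_URL + "/select?", "");
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.add(param.getKey() + "=" + encode(param.getValue()));
        }
        return url.toString();
    }

    // URL-encodes a parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("brand1", 0, 10));
    }
}
```

In practice SolrJ builds and sends this request for us; the sketch only shows what travels over the wire.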

Let’s look at another example. Suppose we want to match any word that contains “rand”, preceded by any number of characters and followed by exactly one character. We can use the wildcard characters * and ? in our query:

query.setQuery("*rand?");

Solr queries also support boolean operators, much like SQL:

query.setQuery("brand1 AND (Washing OR Refrigerator)");

The keyword operators must be in all caps; the operators supported by the query parser are AND, OR, NOT, + and -.

What’s more, if we want to search on specific fields instead of all indexed fields, we can specify these in the query:

query.setQuery("description:Brand* AND category:*Washing*");
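
When the field values come from user input, characters such as : or ( would be read as query syntax. SolrJ ships ClientUtils.escapeQueryChars for this purpose; the following JDK-only sketch imitates the idea so that it can be run without a server. FieldQuerySketch and its methods are hypothetical names, and the character list is our assumption of what the standard query parser treats as special:

```java
// Hypothetical helper (not part of SolrJ): builds field-qualified clauses from raw input.
class FieldQuerySketch {

    // Characters assumed to have a special meaning in the standard query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Prefixes every special character and every whitespace character with a backslash.
    static String escape(String raw) {
        StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    // Builds a "field:value" clause with the value escaped.
    static String fieldClause(String field, String rawValue) {
        return field + ":" + escape(rawValue);
    }

    // Joins clauses with the upper-case AND operator.
    static String allOf(String... clauses) {
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(allOf(
            fieldClause("description", "Brand1"),
            fieldClause("category", "Home Appliances")));
    }
}
```

Note that this escapes * and ? as well, so it suits literal values only; wildcard patterns like the one above have to be written by hand.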

4.2. Phrase Queries

Up to this point, our code looked for keywords in the indexed fields. We can also do phrase searches on the indexed fields:

query.setQuery("Washing Machine");

When we have a phrase like “Washing Machine”, Solr’s standard query parser parses it to “Washing OR Machine”. To search for the whole phrase, we simply enclose the expression in double quotes:

query.setQuery("\"Washing Machine\"");

We can use proximity search to find words within a specific distance of each other. If we want to find the words when they are at most two words apart, we can use the following query:

query.setQuery("\"Washing equipment\"~2");
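
Getting the nested quotes right by hand is error-prone, so a small helper can build these strings. This is only a sketch with hypothetical names (PhraseQuerySketch, phrase, proximity), none of which exist in SolrJ; the result is meant to be passed to query.setQuery:

```java
// Hypothetical helper (not part of SolrJ): builds phrase and proximity query strings.
class PhraseQuerySketch {

    // Wraps the text in double quotes, escaping backslashes and embedded quotes first.
    static String phrase(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // Appends "~N" so that the words may be up to N positions apart.
    static String proximity(String text, int maxDistance) {
        if (maxDistance < 0) {
            throw new IllegalArgumentException("maxDistance must not be negative");
        }
        return phrase(text) + "~" + maxDistance;
    }

    public static void main(String[] args) {
        System.out.println(phrase("Washing Machine"));
        System.out.println(proximity("Washing equipment", 2));
    }
}
```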

4.3. Range Queries

Range queries return documents whose field values fall within a specific range.

Let’s say we want to find items whose price is between 100 and 300:

query.setQuery("price:[100 TO 300]");

The query above will find all the items whose price is between 100 and 300, inclusive. We can use “{” and “}” instead of the square brackets to exclude an end point:

query.setQuery("price:{100 TO 300]");
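
The bracket rules can be captured in a small helper. The sketch below uses only the JDK; RangeQuerySketch and range are hypothetical names, and the bounds are passed as strings so that * can stand for an open end:

```java
// Hypothetical helper (not part of SolrJ): builds range query strings.
class RangeQuerySketch {

    // Square brackets include an end point, curly braces exclude it.
    static String range(String field, String lower, String upper,
                        boolean includeLower, boolean includeUpper) {
        String open = includeLower ? "[" : "{";
        String close = includeUpper ? "]" : "}";
        return field + ":" + open + lower + " TO " + upper + close;
    }

    public static void main(String[] args) {
        System.out.println(range("price", "100", "300", true, true));   // both ends included
        System.out.println(range("price", "100", "300", false, true));  // lower end excluded
        System.out.println(range("price", "100", "*", true, true));     // no upper limit
    }
}
```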

4.4. Filter Queries

Filter queries restrict the superset of results that can be returned. A filter query does not influence the score:

SolrQuery query = new SolrQuery();
query.setQuery("price:[100 TO 300]");
query.addFilterQuery("description:Brand1", "category:\"Home Appliances\"");

Generally, the filter query will contain commonly used queries. Since they’re often reusable, they are cached to make the search more efficient.

5. Faceted Search

Faceting arranges search results into groups with counts. We can facet on fields, queries, or ranges.

5.1. Field Faceting

For example, suppose we want the aggregated counts of categories in the search result. We can add the category field as a facet field in our query:

query.addFacetField("category");

QueryResponse response = solrClient.query(query);
List<Count> facetResults = response.getFacetField("category").getValues();

The facetResults will contain counts of each category in the results.

5.2. Query Faceting

Query faceting is very useful when we want to bring back counts of subqueries:

query.addFacetQuery("Washing OR Refrigerator");
query.addFacetQuery("Brand2");

QueryResponse response = solrClient.query(query);
Map<String,Integer> facetQueryMap = response.getFacetQuery();

As a result, the facetQueryMap will have counts of facet queries.

5.3. Range Faceting

Range faceting is used to get the range counts in the search results. The following query will return the counts of price ranges between 100 and 275, gapped by 25:

query.addNumericRangeFacet("price", 100, 275, 25);

QueryResponse response = solrClient.query(query);
List<RangeFacet.Count> rangeCounts = response.getFacetRanges().get(0).getCounts();

Apart from numeric ranges, Solr also supports date ranges, interval faceting, and pivot faceting.
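
To see which buckets a start of 100, an end of 275 and a gap of 25 produce, we can compute them locally. This is a toy model with hypothetical names, not Solr’s implementation; it assumes that each bucket includes its lower bound and excludes its upper bound, and that the last bucket keeps the full gap width, which we believe matches Solr’s default settings:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Solr code): computes range facet buckets and counts locally.
class RangeBucketsSketch {

    // Lists the buckets as "[lower, upper)" labels, each exactly one gap wide.
    static List<String> buckets(int start, int end, int gap) {
        if (gap <= 0) {
            throw new IllegalArgumentException("gap must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int lower = start; lower < end; lower += gap) {
            result.add("[" + lower + ", " + (lower + gap) + ")");
        }
        return result;
    }

    // Counts how many values fall into each bucket; values outside all buckets are ignored.
    static int[] counts(double[] values, int start, int end, int gap) {
        int bucketCount = buckets(start, end, gap).size();
        int[] counts = new int[bucketCount];
        double upperLimit = start + (double) bucketCount * gap;
        for (double value : values) {
            if (value >= start && value < upperLimit) {
                counts[(int) ((value - start) / gap)]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(buckets(100, 275, 25));
        double[] prices = {100, 300, 200, 250};
        System.out.println(java.util.Arrays.toString(counts(prices, 100, 275, 25)));
    }
}
```

With these parameters there are seven buckets, from [100, 125) to [250, 275); a price of 300 falls outside all of them.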

6. Hit Highlighting

We may want the keywords in our search query to be highlighted in the results. This will be very helpful to get a better picture of the results. Let’s index some documents and define keywords to be highlighted:

itemSearchService.index("hm0001", "Brand1 Washing Machine", "Home Appliances", 100f);
itemSearchService.index("hm0002", "Brand1 Refrigerator", "Home Appliances", 300f);
itemSearchService.index("hm0003", "Brand2 Ceiling Fan", "Home Appliances", 200f);
itemSearchService.index("hm0004", "Brand2 Dishwasher", "Washing equipments", 250f);

SolrQuery query = new SolrQuery();
query.setQuery("Appliances");
query.setHighlight(true);
query.addHighlightField("category");
QueryResponse response = solrClient.query(query);

Map<String, Map<String, List<String>>> hitHighlightedMap = response.getHighlighting();
Map<String, List<String>> highlightedFieldMap = hitHighlightedMap.get("hm0001");
List<String> highlightedList = highlightedFieldMap.get("category");
String highLightedText = highlightedList.get(0);

We’ll get the highLightedText as “Home <em>Appliances</em>”. Notice that the search keyword Appliances is wrapped in <em> tags. The default highlighting tag used by Solr is <em>, but we can change this by setting the pre and post tags:

query.setHighlightSimplePre("<strong>");
query.setHighlightSimplePost("</strong>");
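
On the client side we sometimes need the highlighted terms themselves, or the snippet without markup. A minimal JDK-only sketch follows; HighlightSketch and its methods are hypothetical names, and the pre and post tags are passed in so that they can match whatever was configured on the query:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of SolrJ): post-processes highlighted snippets.
class HighlightSketch {

    // Returns the text found between each pre/post tag pair, in order of appearance.
    static List<String> highlightedTerms(String snippet, String pre, String post) {
        Pattern pattern = Pattern.compile(
            Pattern.quote(pre) + "(.*?)" + Pattern.quote(post), Pattern.DOTALL);
        Matcher matcher = pattern.matcher(snippet);
        List<String> terms = new ArrayList<>();
        while (matcher.find()) {
            terms.add(matcher.group(1));
        }
        return terms;
    }

    // Removes the tags and leaves the plain text.
    static String stripTags(String snippet, String pre, String post) {
        return snippet.replace(pre, "").replace(post, "");
    }

    public static void main(String[] args) {
        String snippet = "Home <em>Appliances</em>";
        System.out.println(highlightedTerms(snippet, "<em>", "</em>"));
        System.out.println(stripTags(snippet, "<em>", "</em>"));
    }
}
```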

7. Search Suggestions

Suggestions are another important feature of Solr. If the keywords in the query contain spelling mistakes, or if we want to offer completions for a partially typed keyword, we can use the suggestion feature.

7.1. Spell Checking

The standard search handler does not include a spell checking component; it has to be configured manually. There are three ways to do it. You can find the configuration details in the official wiki page. In our example, we’ll use IndexBasedSpellChecker, which uses indexed data for keyword spell checking.

Let’s search for a keyword with a spelling mistake:

query.setQuery("hme");
query.set("spellcheck", "on");
QueryResponse response = solrClient.query(query);

SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();
Suggestion suggestion = spellCheckResponse.getSuggestions().get(0);
List<String> alternatives = suggestion.getAlternatives();
String alternative = alternatives.get(0);

The expected alternative for our keyword “hme” is “home”, since our index contains the term “home”. Note that spellcheck has to be activated before executing the search.
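
To get an intuition for why “home” is the natural correction, we can compare edit distances locally. The following is a toy illustration only, with hypothetical names and a hand-written term list; it is not how Solr’s spell checkers are implemented internally:

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration (not Solr code): picks the indexed term closest to a misspelled word.
class SpellingSketch {

    // Classic Levenshtein distance: minimum number of single-character edits.
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                    Math.min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // Returns the first term with the smallest distance, or null for an empty list.
    static String closest(String word, List<String> indexedTerms) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String term : indexedTerms) {
            int distance = editDistance(word, term);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = term;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumption: a few lower-cased terms like those in our sample documents.
        List<String> terms = Arrays.asList("home", "appliances", "washing", "machine");
        System.out.println(closest("hme", terms));
    }
}
```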

7.2. Auto Suggesting Terms

We may want to get the suggestions of incomplete keywords to assist with the search. Solr’s suggest component has to be configured manually. You can find the configuration details in its official wiki page.

We have configured a request handler named /suggest to handle suggestions. Let’s get suggestions for the keyword “Hom”:

SolrQuery query = new SolrQuery();
query.setRequestHandler("/suggest");
query.set("suggest", "true");
query.set("suggest.build", "true");
query.set("suggest.dictionary", "mySuggester");
query.set("suggest.q", "Hom");
QueryResponse response = solrClient.query(query);
        
SuggesterResponse suggesterResponse = response.getSuggesterResponse();
Map<String,List<String>> suggestedTerms = suggesterResponse.getSuggestedTerms();
List<String> suggestions = suggestedTerms.get("mySuggester");

The suggestions list should contain the words and phrases that match “Hom”. Note that we have configured a suggester named mySuggester in our configuration.
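
What the suggester returns depends on how it is configured on the server. As a rough mental model only, a prefix-based suggester behaves like the toy sketch below; PrefixSuggestSketch is a hypothetical name, the entries are hand-written, and matching here is simply a case-insensitive starts-with check:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model (not Solr code): case-insensitive prefix matching over known entries.
class PrefixSuggestSketch {

    // Returns the entries that start with the given prefix, in their original order.
    static List<String> suggest(List<String> entries, String prefix) {
        String needle = prefix.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String entry : entries) {
            if (entry.toLowerCase(Locale.ROOT).startsWith(needle)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumption: the category values of our sample documents.
        List<String> entries = Arrays.asList("Home Appliances", "Washing equipments");
        System.out.println(suggest(entries, "Hom"));
    }
}
```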

8. Conclusion

This article is a quick intro to the capabilities and features of the Solr search engine.

We touched on many features, but these are of course just scratching the surface of what we can do with an advanced and mature search server such as Solr.

The examples used here are available, as always, over on GitHub.
