Java and J2EE Tutorials, Jsp and Servlet Tutorials, Spring MVC, Solr, XML, JSON Examples, Hibernate & Struts 2 Hello World projects



Friday, 25 January 2013

How to Integrate Highlighting in Search Results Using Apache Solr 4 and Apache Tomcat

In our previous discussions we covered what Solr is and the features of Solr 4, and then saw how to integrate Solr with the Apache Tomcat server. If you are new to Solr, I recommend going through those discussions (listed at the end of this post) before we start.
In this post we will look at a very useful feature of Solr 4: highlighting the search keywords in the search results.
In Solr 4, highlighting can be configured in the request URL as well as in solrconfig.xml. If you have indexed your data correctly and your facets are correctly placed, you will be able to see your search snippets in XML, JSON, or whichever response format you have selected.

[Image: search result in Solr 4]

The next step is to highlight the search keywords in your search results. If you can already see your search results, highlighting is essentially set up; you only need to add some parameters to your search URL.
This URL gives you search results for the search keyword:
http://localhost:8080/solr/select?q=body:<search keywords>&wt=xml
To highlight the data, you only need to add a few parameters to it.
The URL below returns highlighted data snippets:
http://localhost:8080/solr/select?q=body:<search keywords>&wt=xml&hl=true&hl.fl=*&hl.simple.pre=<em>&hl.simple.post=</em>&hl.snippets=5
Here, hl=true tells the Solr server to turn on highlighting for your search.
hl.fl=* tells Solr which fields to highlight; to highlight all fields, use *.
hl.simple.pre=<em> and hl.simple.post=</em> tell Solr to enclose each matched search keyword in these tags.
hl.snippets=5 tells Solr the maximum number of highlighted snippets to return per field in the result XML; with a value of 5, Solr returns up to five snippets containing the search keyword.
With these settings, each highlighted keyword is enclosed in <em> and </em> tags.
The highlighted data, with up to the specified number of snippets, is returned under the <lst name="highlighting"> element along with the search data. As shown below, the search keyword sachin is enclosed in <em> tags and 2 snippets are shown:
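When the highlighting URL is built in code rather than typed into a browser, values such as <em> and </em> should be URL-encoded. Here is a minimal sketch using only the JDK. The class name HighlightUrlBuilder and its build method are hypothetical names chosen for illustration, and the base URL, the body field and the keyword sachin are taken from the examples in this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Solr): builds the highlighting URL
// from this post with every parameter name and value URL-encoded.
final class HighlightUrlBuilder {

    static String build(String baseUrl, String field, String keywords, int snippets) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", field + ":" + keywords);
        params.put("wt", "xml");
        params.put("hl", "true");             // turn highlighting on
        params.put("hl.fl", "*");             // highlight all fields
        params.put("hl.simple.pre", "<em>");  // tag placed before a match
        params.put("hl.simple.post", "</em>"); // tag placed after a match
        params.put("hl.snippets", String.valueOf(snippets));

        StringBuilder url = new StringBuilder(baseUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/solr/select", "body", "sachin", 5));
    }
}
```

In the generated URL, <em> appears as %3Cem%3E and </em> as %3C%2Fem%3E, while the parameter names stay unchanged.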
<lst name="highlighting"><lst name="player"><arr name="content">

<str>He placed his bats on a plastic moulded chair near the net. On <em>sachin</em>'s attempt, they did not stand together</str>

<str>meticulousness has won him admirers the <em>sachin</em> tendulkar values his bats and his meticulousness has won him </str>

</arr></lst></lst>

You will get a number of snippets depending on your query and data, as in the example above.
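To use the snippets in an application, the highlighting section of the response has to be read back out of the XML. The sketch below does this with the JDK's DOM and XPath APIs. The class name HighlightSnippetReader and the sample response string are hypothetical. The sample also assumes that the <em> markers arrive XML-escaped (as &lt;em&gt;) inside each <str> element, which may differ from what a browser displays.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the text of every <str> snippet found
// under <lst name="highlighting"> in a Solr XML response.
final class HighlightSnippetReader {

    static List<String> readSnippets(String responseXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            // highlighting -> one <lst> per document -> one <arr> per field -> <str> snippets
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//lst[@name='highlighting']/lst/arr/str", doc, XPathConstants.NODESET);
            List<String> snippets = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                snippets.add(nodes.item(i).getTextContent());
            }
            return snippets;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse the response XML", ex);
        }
    }

    public static void main(String[] args) {
        // Shortened, made-up response modelled on the example in this post.
        String sample = "<response><lst name=\"highlighting\"><lst name=\"player\">"
                + "<arr name=\"content\">"
                + "<str>On &lt;em&gt;sachin&lt;/em&gt;'s attempt, they did not stand together</str>"
                + "<str>the &lt;em&gt;sachin&lt;/em&gt; tendulkar values his bats</str>"
                + "</arr></lst></lst></response>";
        for (String snippet : readSnippets(sample)) {
            System.out.println(snippet);
        }
    }
}
```

Each returned string is one snippet, so the size of the list is bounded by the hl.snippets value for each highlighted field.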
Note: If you can see your search results correctly, then to implement highlighting you only need to add the highlighting parameters, and the highlighted snippets will appear in the output. If not, make sure your data is fully indexed and your facets are correctly placed.
That covers basic highlighting; other configuration and custom settings are explained below.
If you want to hard-code highlighting in your configuration files, you can do so in solrconfig.xml. The solrconfig.xml file ships as part of the Solr package, and highlighting settings can be added to it like this.

The highlight component and its attributes can be configured for the fields to be highlighted as follows.
<!-- Highlighting defaults -->

       <str name="hl">on</str>
       <str name="hl.fl">*</str>
       <str name="hl.encoder">html</str>
       <str name="hl.simple.pre"><![CDATA[<em>]]></str>
       <str name="hl.simple.post"><![CDATA[</em>]]></str>
       <str name="f.title.hl.fragsize">0</str>
       <str name="f.title.hl.alternateField">title</str>
       <str name="f.name.hl.fragsize">0</str>
       <str name="f.name.hl.alternateField">name</str>
       <str name="f.content.hl.snippets">3</str>
       <str name="f.content.hl.fragsize">200</str>
       <str name="f.content.hl.alternateField">content</str>
       <str name="f.content.hl.maxAlternateFieldLength">750</str>

By default, matched terms are highlighted using <em></em> tags.
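The settings above that start with f. are per-field overrides: f.<field>.hl.<param> applies a highlighting parameter to one field only, and the same names can also be sent as request parameters. As a small illustration, the sketch below builds a few of those names in code. The class PerFieldHighlightParams and its methods are hypothetical, and the values simply mirror the configuration shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds per-field highlighting parameter names
// of the form f.<field>.hl.<param>, as used in the configuration above.
final class PerFieldHighlightParams {

    static String name(String field, String param) {
        return "f." + field + ".hl." + param;
    }

    static Map<String, String> defaults() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("hl", "on");
        params.put("hl.fl", "*");
        // fragsize 0: do not fragment, use the whole field value
        params.put(name("title", "fragsize"), "0");
        // up to 3 snippets of roughly 200 characters for the content field
        params.put(name("content", "snippets"), "3");
        params.put(name("content", "fragsize"), "200");
        return params;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : defaults().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```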

[Image: highlighting in Solr 4]

Some of the commonly used highlighting parameters of Solr 4 are:
hl=true Enables generation of highlighted snippets in the search results. A blank, missing, or "false" value disables highlighting.
hl.fl=* Enables highlighting in all fields. To highlight only selected fields, give a list of field names separated by commas (,).
hl.snippets=5 Accepts a number that sets the maximum number of highlighted snippets returned per field in a query response. The default value is 1.
hl.requireFieldMatch Accepts true or false. If true, a field is highlighted only if the query matched in that particular field.
The default value is "false".
hl.maxAnalyzedChars Sets how many characters into a document Solr should consider for highlighting.
The default value is "51200".
That is all about configuring and implementing search keyword highlighting in Solr 4.

In upcoming posts we will see how to query the Solr server and cover other important topics in Solr 4.0.0.

Introduction to Apache Solr 4.0 with Apache Tomcat
Apache Solr 4.0 with Apache Tomcat 7 in Windows 7
Apache Solr 4.0 with Apache Tomcat 7 in Ubuntu Linux










Thanks for reading !
Being Java Guys Team



1 comment:

  1. So Solr still only returns snippets instead of supporting option to embed <em> and </em> directly into field values in query result.

