Monday, August 29, 2011

Apache Solr: Initial Encounter.

In this post, I will explain how to set up Solr on Tomcat and import data from a relational database (Oracle in my case; the example below uses MySQL). This guide is more of a cookbook; more details can be found at the solr-tomcat link and the data import handler link. I am assuming Tomcat is already set up and running, so I won't go into the details of its setup, but the link can be found here.

Configuring Solr.
1) Set up the Solr directory and the $SOLR_HOME variable.
  a) Download the Apache Solr source from link (click on Resources in the left-hand tab and download the latest version, apache-solr-3.3 at the time of writing this post).
  b) Unzip/untar the archive. Copy the following into a separate directory (I copied it to /home/amrut/solr):
  • <SRC_DIR>/dist/apache-solr-3.3.war to /home/amrut/solr (cp ~/Downloads/apache-solr-3.3/dist/apache-solr-3.3.war /home/amrut/solr/solr.war)
  • Contents of <SRC_DIR>/example/solr/ to /home/amrut/solr (cp -r ~/Downloads/apache-solr-3.3/example/solr/. /home/amrut/solr/)
  c) Add the environment variable $SOLR_HOME to .bashrc (for unix)/ .profile (for mac): export SOLR_HOME=/home/amrut/solr
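
Before moving on, it can help to confirm that the copy steps left the expected files in place. The following is a minimal Python sketch, not part of Solr itself: the function name missing_solr_files and the EXPECTED list are my own assumptions, based on the layout described above (solr.war plus the conf/ directory that comes from example/solr).

```python
import os

# Files this walkthrough expects under the Solr home directory.
# This list is an assumption derived from the copy steps, not an official Solr check.
EXPECTED = [
    "solr.war",
    os.path.join("conf", "schema.xml"),
    os.path.join("conf", "solrconfig.xml"),
]


def missing_solr_files(solr_home):
    """Return the expected relative paths that do not exist under solr_home."""
    return [
        rel for rel in EXPECTED
        if not os.path.exists(os.path.join(solr_home, rel))
    ]
```

Calling missing_solr_files(os.environ.get("SOLR_HOME", "/home/amrut/solr")) should return an empty list if the directory is laid out as in this post.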

2) Set up the Tomcat context fragment.
vi $CATALINA_HOME/conf/Catalina/localhost/solr.xml
Paste the following into solr.xml (the context fragment is an XML file named after the context path, not the war itself):
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/home/amrut/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/home/amrut/solr/" override="true"/>
</Context>
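
If the paths differ on your machine, the fragment can be generated rather than edited by hand. This is only a sketch under my own assumptions: context_fragment and check_fragment are hypothetical helper names, and the attribute set simply mirrors the fragment shown above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def context_fragment(war_path, solr_home):
    """Build a Tomcat context fragment like the one above for the given paths."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<Context docBase=%s debug="0" crossContext="true">\n'
        '  <Environment name="solr/home" type="java.lang.String" '
        'value=%s override="true"/>\n'
        '</Context>\n' % (quoteattr(war_path), quoteattr(solr_home))
    )


def check_fragment(xml_text):
    """Parse the fragment and return (docBase, solr/home value).

    Parsing raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    env = root.find("Environment")
    return root.get("docBase"), env.get("value")
```

quoteattr escapes characters such as & in the paths, so the result stays well-formed even for unusual directory names.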

3) Restart Tomcat and monitor the logs for errors.
Type the following at the command prompt: $CATALINA_HOME/bin/shutdown.sh; $CATALINA_HOME/bin/startup.sh; tail -f $CATALINA_HOME/logs/catalina.out

4) Solr should be available at: http://localhost:8080/solr.
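
For a scripted check instead of opening the browser, something like the following should do. It is a rough sketch: solr_is_up is a name I made up, and it only tests that the URL answers without an HTTP error, which does not prove that Solr loaded its configuration correctly.

```python
import urllib.error
import urllib.request


def solr_is_up(base_url="http://localhost:8080/solr", timeout=5.0):
    """Return True if base_url answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP 4xx/5xx response.
        return False
```

If this returns False, catalina.out is the first place to look.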

Importing data from a relational database.
Consider a scenario where a table in MySQL is to be imported into Solr. To try this out, download the torrent dataset at link and load it into a table called torrent with the fields idtorrent, category, size, seeders, and lechers (spelled this way in the configuration below). More information on how to load a csv file into a database can be found at link.
After loading the data into the torrent table, we must modify the schema.xml and solrconfig.xml files.
1) schema.xml: Configure fields in Solr.

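<field indexed="true" name="seeders" stored="true" type="string" />
<field indexed="true" name="lechers" stored="true" type="string" />
<field indexed="true" name="category" stored="true" type="text_general" />
<!-- size is selected and mapped in torrent-data-config.xml, so it needs a field here too -->
<field indexed="true" name="size" stored="true" type="string" />
<field indexed="true" name="id" required="true" stored="true" type="string" />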

2) solrconfig.xml: Register the data import handler and point it at its configuration file (torrent-data-config.xml)
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/opt/solr/example/torrent-data-config.xml</str>
  </lst>
</requestHandler>

3) torrent-data-config.xml: contains the source information (in our case, the torrent table information)

<dataConfig>
  <!-- The MySQL JDBC driver jar must be on Solr's classpath -->
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/torrent_schema" user="mysql" password="mysql" />
  <document name="torrent">
    <!-- field elements belong inside the entity whose query produces the columns -->
    <entity name="item" query="select idtorrent as id, category, size, seeders, lechers from torrent">
      <field column="id" name="id" />
      <field column="category" name="category" />
      <field column="size" name="size" />
      <field column="seeders" name="seeders" />
      <field column="lechers" name="lechers" />
    </entity>
  </document>
</dataConfig>
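
Since the field names have to line up between schema.xml and torrent-data-config.xml, a small cross-check can catch a mismatch before the import fails. The sketch below is my own illustration (undeclared_fields is a hypothetical helper): it only compares explicit field elements and ignores dynamicField patterns and copyField rules.

```python
import xml.etree.ElementTree as ET


def schema_field_names(schema_xml):
    """Collect the name attribute of every <field> declared in schema.xml."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field")}


def undeclared_fields(schema_xml, data_config_xml):
    """Return sorted Solr field names used in the data config but not in the schema."""
    declared = schema_field_names(schema_xml)
    root = ET.fromstring(data_config_xml)
    mapped = {f.get("name") for f in root.iter("field")}
    return sorted(name for name in mapped if name not in declared)
```

Pass in the contents of the two files as strings; an empty list means every mapped field has a matching schema entry.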

Finally, start the import by opening this link: http://localhost:8080/solr/dataimport?command=full-import . Monitor catalina.out while it runs; the import status can also be checked at http://localhost:8080/solr/dataimport?command=status.
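
The same two URLs can be driven from a script. This is a minimal sketch, assuming the status response is XML containing a str element named "status" whose value is "busy" while the import runs and "idle" afterwards; that is how I remember the handler responding, so verify it against your own output. The helper names are mine.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET


def parse_dih_status(xml_text):
    """Return the text of <str name="status"> in a status response, or None."""
    root = ET.fromstring(xml_text)
    for el in root.iter("str"):
        if el.get("name") == "status":
            return el.text
    return None


def fetch(url, timeout=10.0):
    """GET the URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def run_full_import(handler_url="http://localhost:8080/solr/dataimport",
                    poll_seconds=2.0):
    """Trigger a full import, then poll until the handler is no longer busy."""
    fetch(handler_url + "?command=full-import")
    while True:
        status = parse_dih_status(fetch(handler_url + "?command=status"))
        if status != "busy":  # assumed values: "busy" while running, "idle" when done
            return status
        time.sleep(poll_seconds)
```

run_full_import() blocks until the import finishes; the status page also lists how many rows were fetched and documents processed, which is worth a look afterwards.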
