Wednesday, July 8, 2015

Basic Tips to Prevent Solr Downtime

If you've followed my series on installing Solr for Sitecore, then you should have a shiny new Solr instance somewhere in your environment, happily indexing Sitecore data and returning results to queries. Hopefully that never changes, but we all know that hiccups can happen. This post suggests a few things you can do to mitigate or prevent downtime.

Logging

If you find yourself troubleshooting, you'll be very glad to have Solr-specific logs to refer to. Given how easy this is to configure, you owe it to yourself to do so. Assuming you have the downloaded .zip from Solr:
  1. Copy the JAR files from solr/example/lib/ext to Tomcat's lib/ folder.
  2. Copy the properties file from solr/example/resources to Tomcat's lib/ folder.
All done! You will find your new Solr logs in the install path of Tomcat in the logs/ folder.
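
If you would rather script the two copy steps, here is a minimal sketch. It assumes `solr_dir` is the folder you extracted the Solr .zip into (the one containing example/) and `tomcat_dir` is Tomcat's install path; the function name and both parameters are hypothetical and only serve the illustration.

```python
import shutil
from pathlib import Path


def install_solr_logging(solr_dir, tomcat_dir):
    """Copy Solr's logging jars and properties file into Tomcat's lib/ folder.

    Both arguments are assumptions about your layout: solr_dir is the
    extracted Solr download, tomcat_dir is Tomcat's install path.
    """
    solr = Path(solr_dir)
    tomcat_lib = Path(tomcat_dir) / "lib"
    tomcat_lib.mkdir(parents=True, exist_ok=True)

    copied = []
    # Step 1: the jar files under example/lib/ext
    for jar in sorted((solr / "example" / "lib" / "ext").glob("*.jar")):
        shutil.copy2(jar, tomcat_lib / jar.name)
        copied.append(jar.name)
    # Step 2: the properties file(s) under example/resources
    for props in sorted((solr / "example" / "resources").glob("*.properties")):
        shutil.copy2(props, tomcat_lib / props.name)
        copied.append(props.name)
    return copied
```

Calling `install_solr_logging(r"C:\solr", r"C:\tomcat")` (both paths made up) returns the list of file names it copied, which is a quick way to confirm nothing was missed. Tomcat will need a restart to pick up the new files.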

RAM

When dealing with Solr, there are two kinds of RAM to consider. The first is the RAM dedicated to the Java heap and the second is the OS disk cache. While I can't tell you exactly how much RAM to devote to each, I can offer some general guidance.

Java Heap

Setting the Java heap size is a pretty straightforward matter once you understand the implementation details of Tomcat for your machine. Mainly, this means knowing which version of Tomcat you are running and which OS you use. I'll be covering Tomcat 8 as a Windows service. If you differ from me in one or more regards, don't despair. Most of what I say still applies to you, although you'll probably need to look around a little to find the equivalent places to make your setting changes.

First, let's review the four memory management parameters you can control:
  • Xms - The initial (minimum) size of your heap
  • Xmx - The maximum heap size
  • XX:PermSize - The initial size of the permanent generation (PermGen), allocated when the JVM starts
  • XX:MaxPermSize - The maximum size to which the permanent generation is allowed to grow

Most likely, you won't need to worry about XX:PermSize and XX:MaxPermSize unless you see errors like java.lang.OutOfMemoryError: PermGen space. (On Java 8 and later, the permanent generation no longer exists and the JVM ignores both options.) Much more likely, you will want to control the bounds of your heap through Xms and Xmx. If you are running Tomcat as a Windows service, this is as simple as filling in a text box. For example:


The above screenshot shows the equivalent of setting -Xms256m and -Xmx512m (note that these two flags take no equals sign). Additionally, I elected to specify the XX:PermSize as 128MB.
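
The flag syntax is easy to get wrong, so here is a small sketch that builds the option strings from sizes in megabytes. The function name and parameters are hypothetical; the values in the usage line are the ones discussed above. The heap flags take the size directly after the flag name, while the two -XX flags use an equals sign.

```python
def jvm_memory_options(xms_mb, xmx_mb, perm_mb=None, max_perm_mb=None):
    """Build JVM memory flags from sizes given in megabytes."""
    if xms_mb > xmx_mb:
        raise ValueError("initial heap (Xms) cannot exceed maximum heap (Xmx)")
    # Heap flags: no '=' between the flag and its value
    options = [f"-Xms{xms_mb}m", f"-Xmx{xmx_mb}m"]
    # PermGen flags: '-XX:Name=value' form (only meaningful before Java 8)
    if perm_mb is not None:
        options.append(f"-XX:PermSize={perm_mb}m")
    if max_perm_mb is not None:
        options.append(f"-XX:MaxPermSize={max_perm_mb}m")
    return options


print(" ".join(jvm_memory_options(256, 512, perm_mb=128)))
# prints: -Xms256m -Xmx512m -XX:PermSize=128m
```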

As a final note on heap size, be aware that for heap sizes greater than 2GB, garbage collection can cause performance problems. Symptoms are occasional pauses in program execution during a full GC. This can be mitigated through GC tuning of your JVM or electing to use a commercial JVM.

Disk Cache

For disk cache, you would ideally have enough RAM to hold the entire index in memory. Whatever memory remains unused once the OS, running programs, and the Java heap have been satisfied is fair game for disk cache. Thus, if 12GB of RAM is unused, you could potentially fit 12GB of index data into memory before the OS is forced to start paging. In practice, you must use trial and error to find the right memory fit for your data and usage patterns.
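
To put rough numbers on this, the sketch below estimates how much RAM is left over for disk cache and whether an index of a given size would fit. All names are hypothetical, and the inputs are your own estimates; treat the result as a starting point for the trial and error mentioned above, not as a sizing rule.

```python
import os


def index_size_bytes(index_dir):
    """Sum the sizes of all files under an index directory."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(index_dir):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def disk_cache_headroom_gb(total_ram_gb, os_and_programs_gb, java_heap_gb):
    """RAM left for the OS disk cache after everything else is satisfied."""
    return max(0.0, total_ram_gb - os_and_programs_gb - java_heap_gb)


def index_fits_in_cache(index_gb, total_ram_gb, os_and_programs_gb, java_heap_gb):
    """True if the whole index could sit in the leftover RAM."""
    headroom = disk_cache_headroom_gb(total_ram_gb, os_and_programs_gb, java_heap_gb)
    return index_gb <= headroom
```

For example, a machine with 16GB of RAM that spends 3GB on the OS and other programs and 1GB on the Java heap has 12GB of headroom, so `index_fits_in_cache(10, 16, 3, 1)` is true while a 14GB index would not fit.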

Secondary Cores

Given that you have elected to use Solr, you probably treat search as a first-class citizen in your environment. If you aren't using secondary cores to provide data continuity during an index rebuild, you're simply doing it wrong. It helps that the process for configuring secondary cores is easy to follow.

Note: every time a rebuild occurs, the name values in the core.properties files for the two related cores will swap. This is normal behavior, of course, but it can be horribly confusing if you aren't aware of it. In other words, don't just assume that the name of the core you are viewing matches the core's folder name in your Solr home directory!
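
A quick way to see which name currently lives in which folder is to read every core.properties file under your Solr home. The sketch below is an illustration with hypothetical function names. It assumes each core sits in a folder directly under the Solr home, and that a core without a name property is known by its folder name.

```python
from pathlib import Path


def read_properties(path):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def core_names_by_folder(solr_home):
    """Map each core folder to the core name its core.properties declares."""
    mapping = {}
    for props_file in sorted(Path(solr_home).glob("*/core.properties")):
        folder = props_file.parent.name
        # Assumption: with no explicit name, the folder name is used
        mapping[folder] = read_properties(props_file).get("name", folder)
    return mapping
```

After a rebuild, any entry in the returned dictionary whose key and value differ is a core whose name has swapped away from its folder.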

Replication

This topic is actually quite broad and probably deserves a blog post (or several) of its own. Nevertheless, we can at least consider the base case of provisioning a second Solr instance that is slaved to a master instance. Fail-over will not be automatic, although you could script it.
  1. Modify the core.properties file in your cores to set whether the core is a master or a slave.
    • On Master
      • enable.master=true
      • enable.slave=false
    • On Slave
      • enable.master=false
      • enable.slave=true
  2. Modify the conf/solrconfig.xml file in each core to include a request handler for replication. Below is a snippet of XML you can use. Simply replace "remote_host" and "core_name" in the snippet's XML with your environment's values. Note: the way I have constructed this snippet means you can apply it "as is" to any core on your master OR your slave instance. The trick is to tie the "enable" property of the master and slave elements to the enable.master and enable.slave values in the core's core.properties file, which you should have set in step 1. This makes your bookkeeping duties a little less painful, especially if you ever find yourself swapping the master and slave around.
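
The original XML is not reproduced here, so what follows is a minimal sketch based on the standard Solr ReplicationHandler configuration rather than the exact snippet described in step 2. It renders the handler for a given host and core and checks that the result is well-formed. The function name, the port default of 8080, the replicateAfter triggers, and the poll interval are all assumptions to adjust for your environment.

```python
import xml.etree.ElementTree as ET

# ${enable.master:false} and ${enable.slave:false} are read by Solr from
# core.properties, so the same XML works on either instance.
REPLICATION_HANDLER_TEMPLATE = """\
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${{enable.master:false}}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
  <lst name="slave">
    <str name="enable">${{enable.slave:false}}</str>
    <str name="masterUrl">http://{remote_host}:{port}/solr/{core_name}/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
"""


def render_replication_handler(remote_host, core_name, port=8080):
    """Fill in the template and verify the XML is well-formed."""
    xml = REPLICATION_HANDLER_TEMPLATE.format(
        remote_host=remote_host, core_name=core_name, port=port
    )
    ET.fromstring(xml)  # raises ParseError if the result is malformed
    return xml
```

Printing `render_replication_handler("remote_host", "core_name")` gives XML you can paste inside the config element of solrconfig.xml. On the slave, masterUrl must point at the master's copy of the same core.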
What should you do if your master goes down? Edit the master/slave properties in the core.properties files on the surviving instance so that it becomes the master, and change the ServiceBaseAddress used by Solr in the Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config file to point at it. You should also (as soon as time allows) edit the replication handler XML appropriately: either change the URL or comment it out entirely.
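
Since the properties edit is the part most easily scripted, here is a minimal sketch of it. The function name is hypothetical, and it only rewrites the two enable.* lines in a single core.properties file. It does not touch the Sitecore config or the replication handler, and Solr will still need to pick up the change (I would assume a restart).

```python
from pathlib import Path


def set_replication_role(core_properties_path, role):
    """Rewrite enable.master / enable.slave so the core takes the given role."""
    if role not in ("master", "slave"):
        raise ValueError("role must be 'master' or 'slave'")
    wanted = {
        "enable.master": "true" if role == "master" else "false",
        "enable.slave": "true" if role == "slave" else "false",
    }
    path = Path(core_properties_path)
    lines, seen = [], set()
    for line in path.read_text(encoding="utf-8").splitlines():
        key = line.partition("=")[0].strip()
        if key in wanted:
            # Replace an existing setting in place
            lines.append(f"{key}={wanted[key]}")
            seen.add(key)
        else:
            lines.append(line)
    # Append whichever settings the file did not already contain
    for key, value in wanted.items():
        if key not in seen:
            lines.append(f"{key}={value}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Running it with "master" against each core.properties on the former slave flips that instance's role; running it with "slave" later reverses the change.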


2 comments:

  1. Great post, Patrick! I would add that in a CM/CD environment we need to disable the automatic index update strategy on the CD servers by changing it to manual. Index updates should be triggered only from the CM server.

  2. Hi, Ahmed. Thanks! :) Good advice about the index strategy.
