Logging
If you find yourself troubleshooting, you'll be very glad to have Solr-specific logs to refer to. Given how easy this is to configure, you owe it to yourself to do so. Assuming you have the downloaded .zip from Solr:
- Copy the JAR files from solr/example/lib/ext to Tomcat's lib/ folder.
- Copy the properties file from solr/example/resources to Tomcat's lib/ folder.
RAM
When dealing with Solr there are two kinds of RAM to consider. One is the amount of RAM dedicated to the Java heap and the other is the OS disk cache. While I can't give specific guidance on how much RAM you should devote and where, I will provide some general advice.
Java Heap
Setting the Java heap size is a pretty straightforward matter once you understand the implementation details of Tomcat for your machine. Mainly, this means which version of Tomcat you are running and which OS you use. I'll be covering Tomcat 8 as a Windows service. If you differ from me in one or more regards, don't despair. Most of what I say still applies to you; you'll probably just need to look around a little to find the equivalent spots to make your setting changes.
First, let's review the four memory management parameters you may control:
- Xms - The initial (and minimum) size of your heap
- Xmx - The maximum heap size
- XX:PermSize - The initial size of the permanent generation (PermGen), the memory region that holds class metadata
- XX:MaxPermSize - The maximum size the permanent generation is allowed to grow to
Most likely, you won't need to worry about XX:PermSize and XX:MaxPermSize unless you see errors like java.lang.OutOfMemoryError: PermGen space. (On Java 8 and later the permanent generation no longer exists and these two options are ignored.) Much more likely, you will want to control the bounds on your heap through Xms and Xmx. If you are running Tomcat as a Windows service then this is as simple as filling in a text box. For example:
The above screenshot shows the equivalent of setting -Xms256m and -Xmx512m. Additionally, I elected to set -XX:PermSize to 128MB.
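For reference, the same settings written out as raw JVM options might look like the sketch below. I'm assuming the usual layout of the Tomcat service configuration dialog here: the heap bounds normally go into the memory pool text boxes as plain numbers of megabytes, while extra flags such as -XX:PermSize go into the Java Options box, one per line. If you start Tomcat from a script instead, the same flags can be passed on the java command line.

```text
-Xms256m
-Xmx512m
-XX:PermSize=128m
```

Note that -Xms and -Xmx take their value directly, with no equals sign, while the -XX options do use one.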
As a final note on heap size, be aware that for heap sizes greater than 2GB, garbage collection can cause performance problems. Symptoms are occasional pauses in program execution during a full GC. This can be mitigated through GC tuning of your JVM or electing to use a commercial JVM.
Disk Cache
For disk cache, you would ideally have enough RAM to hold the entire index in memory. Whatever memory remains unused once the OS, running programs, and the Java heap have been satisfied is fair game for disk cache. Thus, if 12GB of RAM is unused, you could potentially fit 12GB of index data into memory before the OS is forced to start paging. In practice, you must use trial and error to find the right memory fit for your data and usage patterns.
Secondary Cores
Given that you have elected to use Solr, you probably treat search as a first-class citizen in your environment. If you aren't using secondary cores to provide data continuity during an index rebuild, you're simply doing it wrong. It helps that the process for configuring secondary cores is easy to follow.
Note: every time a rebuild occurs, the name values in the core.properties files for the two related cores will swap. This is normal behavior, of course, but can be horribly confusing if you aren't aware of it. In other words, don't just assume that the name of the core you are viewing matches the core's folder name in your Solr home directory!
Replication
This topic is actually quite broad and probably deserves a blog post or several of its own. Nevertheless, we can at least consider the base case of provisioning a second Solr instance that is slaved to a master instance. Fail-over will not be automatic, although you could script it.
- Modify the core.properties file in your cores to set whether the core is a master or a slave.
  - On Master
    - enable.master=true
    - enable.slave=false
  - On Slave
    - enable.master=false
    - enable.slave=true
- Modify the conf/solrconfig.xml file in each core to include a request handler for replication. Below is a snippet of XML you can use. Simply replace the "remote_host" and "core_name" in the snippet's XML with your environment's values. Note: the way I have constructed this snippet means you can apply it "as is" to any core on your master OR your slave instance. The trick I used was to tie the "enable" property of the master and slave elements to the values of the enable.master and enable.slave properties from the core's core.properties file, which you should have set in the previous step. This makes your bookkeeping duties a little less painful, especially if you ever find yourself swapping the master and slave around.
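A minimal sketch of such a request handler, assuming Solr's standard solr.ReplicationHandler, could look like the following. The port (8080), the replicateAfter triggers, and the poll interval are illustrative assumptions rather than required values, so adjust them for your environment.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- Active only when enable.master=true in this core's core.properties;
       defaults to false if the property is missing. -->
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <!-- Assumed triggers: publish a new index version after startup and each commit. -->
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <!-- Active only when enable.slave=true in this core's core.properties. -->
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <!-- Replace remote_host and core_name; 8080 is an assumed Tomcat port. -->
    <str name="masterUrl">http://remote_host:8080/solr/core_name/replication</str>
    <!-- Assumed polling interval, in HH:mm:ss format. -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Because both sections read their "enable" value from core.properties, the same block can sit unchanged in every core, and swapping roles only means flipping the two properties.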
Further Reading
Great post, Patrick! I would add that in a CM/CD environment we need to disable the automatic index update strategy on the CD servers by changing it to manual. Index updates should be triggered only from the CM server.
Hi, Ahmed. Thanks! :) Good advice about the index strategy.