Zero-downtime deployments are very valuable to enterprise clients. Since blue-green deployments minimize downtime (perhaps even to zero), it behooves us as Sitecore architects to use the strategy. Of course, if you have any familiarity with the complexity of a Sitecore environment, blue-green deployments may seem like an unattainable goal.
My motivation for this blog post is to sketch out an approach for blue-green deployments with Sitecore. I'd like to think through the problem and demonstrate it is possible insofar as we can trust a thought-experiment.
The Challenge
The primary problem posed by Sitecore with blue-green deployments is the database layer. Since the database is a shared resource amongst all Sitecore servers in an environment, any change there can affect the entire environment. Additionally, since we are dealing with a CMS, we must expect that authors regularly introduce changes to the database.
This database challenge is exacerbated by two factors:
- We cannot control the schema. Sitecore must own that.
- We must actually think about two database layers (SQL and Mongo), in which some databases are interrelated, as well as their attendant search indexes.
The second point probably requires a little more explanation. The databases in Mongo function as a very large "net" that captures all interaction data with visitors to the site. Mongo is organized in such a way as to make writes very fast. The Reporting database in SQL holds data from Mongo that has been reorganized to support efficient reads, so that report performance is optimized. The Analytics index lets us query Mongo data from the API efficiently. Thus, Mongo is the source of truth for visitor interaction data, and the Reporting database and the Analytics index are coupled to it. This means any changes introduced to the data in Mongo must also be represented in SQL and in the Analytics index. We must treat those three systems as a unit.
A Solution
Clearly, we must mitigate the problems posed by the database layers. I believe we can, but let's first pose a few assumptions:
- Content Authors will be inactive during deployments.
- We cannot use InProc mode for session state on CD servers.
- Sticky sessions (server affinity) should be disabled.
- The load balancer supports configuration changes through scripting.
Step 0: Initial State
Step 1: Synchronize Content
Step 2: Deploy New Version
Update: There is nothing, I believe, about this strategy that requires indexes to be rebuilt. There could be something about your particular solution that needs an index to be rebuilt. If so, this is the correct step to perform that work.
Step 3: Testing
Step 4: Change Connection Strings and Analytics Index
First, let's take a moment to think through analytics data. It all starts with the four Mongo databases. From the Mongo databases we have the derived data in the SQL Reporting database and the Analytics search index. So, really we need to think of these three subsystems as a unit. Another important consideration is that we never want to mix the live and non-live analytics data. For example, we don't want to dirty visitor interaction data with clicks generated by smoke-tests performed during a deployment. We also need to be careful with session data. Live users will be transitioned from the green environment to the blue environment. All of their serialized session data must remain valid and coherent.
Great...how do we do this?
We change the connection strings for the Mongo, Reporting, and Session databases for all Sitecore servers in the blue environment. We also modify the analytics index config to use the 'analytics_live' Solr core, which is replicated from a corresponding 'analytics_live' Solr core in the green environment. In this way, we guarantee that all analytics and session data for a blue Sitecore server corresponds to live data.
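To make the connection-string half of this step a bit more concrete, here is a minimal Python sketch. It assumes the usual `<connectionStrings>` / `<add name="..." connectionString="..." />` layout of a ConnectionStrings.config file. The function name `point_at_live`, the entry names in the sample, and the server names are all hypothetical placeholders for illustration; substitute whatever your own environment uses. It does not cover the Solr core change, which lives in a separate config.

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def point_at_live(config_xml: str, live_values: Mapping[str, str]) -> str:
    """Return config_xml with the named entries repointed at live data.

    live_values maps a connection-string name to its live connection string.
    Entries not mentioned in live_values are left untouched.
    """
    root = ET.fromstring(config_xml)
    not_found = set(live_values)
    for entry in root.iter("add"):
        name = entry.get("name")
        if name in live_values:
            entry.set("connectionString", live_values[name])
            not_found.discard(name)
    if not_found:
        # Fail loudly rather than leave a server half-switched.
        raise KeyError(f"entries not present in config: {sorted(not_found)}")
    return ET.tostring(root, encoding="unicode")


# Hypothetical blue-environment config; names and hosts are placeholders.
blue_config = """<connectionStrings>
  <add name="web" connectionString="Data Source=blue-sql;Database=Sitecore_Web" />
  <add name="reporting" connectionString="Data Source=blue-sql;Database=Sitecore_Reporting" />
  <add name="analytics" connectionString="mongodb://blue-mongo/analytics" />
</connectionStrings>"""

switched = point_at_live(
    blue_config,
    {
        "reporting": "Data Source=live-sql;Database=Sitecore_Reporting",
        "analytics": "mongodb://live-mongo/analytics",
    },
)
```

One caveat with this approach: ElementTree drops XML comments when it parses, so if your config files carry comments you care about, a different XML tool or a config transform may be a better fit.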
Step 5: Make Blue the Live Environment
Step 6: Finish Retiring the Green Environment
Rollbacks
While steps 4-6 are listed above as discrete steps to help illustrate the idea, as a practical matter we should think of them as a single continuous step. Even better, if we automate 4-6 as a single operation, the process of rolling back from blue to green becomes much easier.
Imagine you finish step 6 and proudly watch traffic seamlessly flow to the blue servers, only to discover that despite all the testing in step 3, an issue pops up related to the code just deployed. You could roll back to the green environment by inverting steps 4-6. That's a simple proposition if you took the time to automate 4-6 as a single operation.
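One way to picture "automate 4-6 as a single operation" is to give every step an explicit inverse and run them as a group. The Python sketch below is only an illustration of that idea under my own assumptions: `Step`, `run_cutover`, and `rollback` are hypothetical names, and the real work of each step (editing configs, scripting the load balancer) is stood in for by placeholder callables.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]   # performs the step
    revert: Callable[[], None]  # undoes the step


def rollback(done: List[Step]) -> None:
    """Undo completed steps in reverse order (6, then 5, then 4)."""
    for step in reversed(done):
        step.revert()


def run_cutover(steps: List[Step]) -> List[Step]:
    """Apply steps in order; if one fails, undo the ones already applied."""
    done: List[Step] = []
    try:
        for step in steps:
            step.apply()
            done.append(step)
    except Exception:
        rollback(done)
        raise
    return done


def make_logged_step(name: str, log: List[str]) -> Step:
    """Placeholder step that only records what would have happened."""
    return Step(
        name,
        lambda: log.append(f"apply {name}"),
        lambda: log.append(f"revert {name}"),
    )


# Usage: cut over to blue, then roll back to green after finding an issue.
log: List[str] = []
steps = [make_logged_step(n, log) for n in ("step 4", "step 5", "step 6")]
completed = run_cutover(steps)
rollback(completed)
```

Keeping the list of completed steps around is what makes the later rollback cheap: reverting is just walking that list backwards, whether it happens because a step failed midway or because an issue surfaced after go-live.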
We did something similar, but only with the web database, documentation is not quite there yet, but it works with enabling a config file. We toggle between green and blue as production server. Code is here: https://github.com/luuksommers/sitecore-staging-database