Thursday, May 21, 2015

Create a Reverse Proxy Controlled By Sitecore

Reverse proxies can be an incredibly useful technology in your Sitecore implementation depending upon your needs. The basic idea is that a reverse proxy forwards requests on to other servers on behalf of the requesting client, sort of like a traffic cop. The responses from the servers behind the reverse proxy are then returned to the requesting client. This can be done in such a way that is completely transparent to the end-user.

The Use Case

So why bother? Well, as Grant Killian suggests over on his blog at least two scenarios come to mind (I'm sure all you very smart folks could undoubtedly name more!) I want to focus on the case of a reverse proxy sitting between the Internet and a set of web servers that includes one or more legacy web servers and a Sitecore instance.

I've kept the conceptual diagram above simple (no load-balanced servers, firewalls, cache servers, etc.) but the technique readily applies to an enterprise ecosystem. The basic strategy is as follows:

  1. A user tries to browse a page (perhaps one they have bookmarked) e.g. http://company.com/foobar.php
  2. The reverse proxy receives the request and "asks" Sitecore where to route the request
  3. Sitecore tells the reverse proxy if it can handle the request and, if so, what the URL should be.
  4. The reverse proxy rewrites the request and forwards it. For example, if Sitecore responded positively to the reverse proxy our URL might be transformed to http://sitecorecd.company.com/foo/bar
  5. Sitecore or the legacy server responds to the page request
  6. The reverse proxy rewrites the response so that the end-user is unaware the page they see was came from a different server than the one they contacted.

The payoff with this scenario is we can now manage incremental content migrations from legacy servers to Sitecore servers without any disruption to end-user experience. Bookmarks, campaign emails, RSS feeds, Google search result rankings....all of it will happily continue on as always regardless of whether the legacy web server or Sitecore actually answers the HTTP request. Powerful stuff! This technique is especially useful for clients that have a very large inventory of content and cannot or are unwilling to migrate everything all at once.

The Solution

The first order of business is setting up a reverse proxy in IIS. The goal is to have a dedicated web site in IIS as the reverse proxy. To do that we need to install the Application Request Routing (ARR) extension. Once ARR is installed we'll need to do perform the following configuration steps

  1. Open IIS Manager and select the server node. Double click on the Application Request Routing Cache icon.

  2. In the right-hand pane click Server Proxy Settings.

  3. Check the Enable proxy setting and uncheck the Reverse rewrite host in response headers option.

  4. Set the Response buffer and Response buffer threshold values for 8092 and then click Apply. The reason for this I discovered through the school of hard knocks: some pages were mysteriously causing YSODs. After digging through logs (more on that later) we found that the response from the server was literally truncated. The page was large enough that it was overflowing the response buffer and causing it to flush with only a part of the overall page.


Now that ARR is installed and configured at the server level we need to turn our attention to the reverse proxy site. Here is the secret sauce of our solution: rather than merely write in some rules for routing in the web.config we are going to create our own custom Rewrite Provider. This will allow us to execute our own code during the runtime of the reverse proxy. I followed this guide to develop my own custom provider; it should get you up and running.

So what does my rewrite provider do? At its heart it's just a very simple URL resolver. The provider is rather dumb (and it should be!) We want the reverse proxy to do as little processing as possible.

  1. First make a request to Sitecore to see if the requested page (rewritten with Sitecore's host header) can be served, i.e. does the web request return with a response status code < 400.
  2. If that fails, the reverse proxy contacts a web service that knows how to map a legacy URL onto a Sitecore URL. Thus a URL like /news/article.php?id=foobar in the legacy system can be mapped onto /news/articles/foobar for example.
  3. If steps 1 and 2 fail, then the request is routed to the legacy server.

You may be wondering if all of that still sounds like too much work given that every request passes through the reverse proxy. Fortunately ARR has very good built-in caching, so in practice, your most requested pages (and resources) will not be processed in code continuously.

Closing Thoughts

Beyond what's already been covered, I recommend you consider the following:

  1. You need some kind of logging strategy. I write to the event logs from the reverse proxy and Sitecore's logs during the runtime of Sitecore (for example, the URL mapping web service.)
  2. Perform load testing. ARR is remarkably good OOTB with its cache settings, but better to test and know than simply assume.
  3. Think ahead about sessions and how you will deal with them.
  4. Redundancy. Our solution uses more than one reverse proxy. As a side-note with my implementation, if Sitecore itself goes down, the reverse proxy will continue to function. All requests would simply go to the legacy server. Eventually this behavior may become undesirable, but early in a project's lifetime this can be a real selling point.
  5. You definitely need to create some outbound rewrite rules in your reverse proxy's web.config to deal with:
    1. Rewriting relative links in the response HTML
    2. Rewriting the Location in the response Header when the status code is a 3XX (a redirect) and the host name is your backend server. This will prevent the end-user's browser being redirected to http://legacy.company.com/foobar rather than http://company.com/foobar
  6. You should absolutely turn on Failed Request Tracing Rules. This is the logging function I discussed earlier that proved invaluable in diagnosing and resolving issues during development

3 comments:

  1. Nice post...I look forward to reading more, and getting a more active part in the talks here, whilst picking up some knowledge as well..
    access Bomb-mp3 in UK

    ReplyDelete
  2. Thanks for the article, great concept! Did you run into any issues with IIS recognizing your custom provider? I have deployed the dll to the GAC, adjusted the app pool and tried rebuilding to different frameworks; still the Managed Types dropdown in empty when trying to add a provider.

    ReplyDelete
    Replies
    1. Hi Mike, make sure your DLL is signed and built to the 2.0 framework.

      Delete