Cache and workflow

Choosing a strategy for content population during a high-traffic period

Based on a true story

Imagine you are running a Dancing Goat coffee business.. 😉

And have a wonderful Kentico 12 MVC website. The business grows, the website gets bigger, and you now have hundreds of blog articles, thousands of news pages and plenty of other content. One day you decide to run a big week-long event. Your marketing team is prepared to work almost 24x7. Google Analytics shows a record - more than 1,000 users are currently viewing your website. Not life but a fairy tale! What could go wrong here?! Well..

Content editing and caching

Let's take a step back and look closer at the Dancing Goat MVC website. If you are going to deal with higher traffic, you must have a caching strategy. Dancing Goat actually has two levels of cache:

  • Data layer cache (caching the data returned by IRepository implementations)
  • Presentation layer cache (caching the HTML, output cache)

The greatest thing about these caches is that they use cache dependencies for invalidation! You can read more about Kentico cache dependencies in the documentation article. Basically, when you publish a new article, these cache dependencies invalidate the relevant parts of the cache so that they are refilled. As a result, the articles landing page automatically shows the newly published article in the listing.

Output cache invalidation via dependencies is implemented in the OutputCacheDependencies.cs class. For example, the code below uses a dummy key in the following format:

nodes|<site name>|<page type code name>|all

public void AddDependencyOnPages<T>() where T : TreeNode, new()
{
    if (!mCacheEnabled)
    {
        return;
    }

    var className = mContentItemMetadataProvider.GetClassNameFromPageRuntimeType<T>();
    var dependencyCacheKey = String.Format("nodes|{0}|{1}|all",
                                 SiteContext.CurrentSiteName.ToLowerInvariant(),
                                 className);

    AddCacheItemDependency(dependencyCacheKey);
    AddCacheItemDependency("cms.adhocrelationship|all");
    AddCacheItemDependency("cms.relationship|all");
}

The same cache dependency key format is used in CachingRepositoryDecorator.cs to invalidate data caches:

private string GetDependencyCacheKeyForPage(Type type)
{
    return String.Format("nodes|{0}|{1}|all",
        SiteContext.CurrentSiteName.ToLowerInvariant(),
        mContentItemMetadataProvider.GetClassNameFromPageRuntimeType(type));
}

But what happens if this cache invalidation is triggered on a page under high traffic? Right! The website will hang for a few seconds while the cache is refreshed, and then everything will be fine again. You will probably see a small CPU, memory and database spike on the monitoring dashboard, and that is it. But it may become a huge problem in the following scenario:

Every time editors click the "Save" button in the Kentico admin, hundreds of output and data caches are destroyed, and the system refills these caches while hundreds of requests wait in the application queue. This may produce a snowball effect where your website gets stuck at 100% CPU and everything just hangs and stops working. The greatest marketing event is under threat from a website that does not work!

Workflow to the rescue!

Have you ever watched the content editors in their day-to-day job? 🤔 We are developers, right? We always have some new technology to play with instead. 🤖 But does it even matter how editors put the content in the CMS? Well, apparently it does in some cases..

When editors are in a hurry (it is our biggest event, remember?) they tend to make a few more typos, miss paragraphs, and constantly save and preview pages before finally approving them. The more they click the "Save" button, the more cache invalidations happen, and ultimately the more performance problems the website has.

The very first thing to consider in this scenario is enabling workflow, so that editors have a draft/edit version of the page where they can check and fix everything, and publish the page only when they are finally happy with it. Kentico allows configuration of the following workflow types:

In this case the cache invalidation will happen only when the page is actually published. Or will it not?? 😲

The final problem

I was so convinced (and I still don't know why) that the dummy key "nodes|<site name>|<page type code name>|all" would only be touched when a page is published (or just saved, if there is no workflow defined). In reality, this key is touched whenever anything changes in any page of the specified page type:

  • Create a draft version of an article - cache invalidation
  • Submit it for approval - yet another one
  • Reject changes
  • Delete a draft version that has never even been published
  • Publish the page, copy the page, move the page..

Unfortunately, there is no default dummy cache key that is touched only when a page is published. But can we fix it somehow? Yes, we can! 💪

Kentico has Global Events in the system, and there is a Publish event in the WorkflowEvents category. It means that we can create our own dummy cache key that is touched only on the final publishing of a page. The format of the dummy key is the same, with just a "|published" part added:

nodes|<site name>|<page type code name>|all|published

We will need the following custom module, but only in the Kentico admin (also known as "mother") solution:

using CMS;
using CMS.DataEngine;
using CMS.DocumentEngine;
using CMS.Helpers;

[assembly: RegisterModule(typeof(CustomPublishCacheModule))]

public class CustomPublishCacheModule : Module
{
    public CustomPublishCacheModule()
        : base("CustomPublishCache")
    {
    }

    protected override void OnInit()
    {
        base.OnInit();

        // Raised after a page under workflow has been published
        WorkflowEvents.Publish.After += OnAfterDocumentPublish;
    }

    private void OnAfterDocumentPublish(object sender, WorkflowEventArgs e)
    {
        var node = e.PublishedDocument;
        if (node != null && node.IsPublished)
        {
            // Touch the custom dummy key so that only caches depending on it are invalidated
            CacheHelper.TouchKey($"nodes|{node.NodeSiteName}|{node.ClassName}|all|published".ToLowerInvariant());
        }
    }
}
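To get a feeling for how much the extra key can save, here is a toy in-memory model of an editing session. It is only a sketch: it does not call any Kentico API, and the PageAction and TouchCounter names are made up for illustration. It assumes, as described above, that every action touches the built-in "...|all" key, while only publishing touches the custom "...|all|published" key.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: it imitates how often each dummy key is touched
// and does not use any Kentico API. All names here are hypothetical.
public enum PageAction { SaveDraft, SubmitForApproval, Reject, DeleteDraft, Publish }

public static class TouchCounter
{
    // Returns how many times each dummy key would be touched for the given actions,
    // assuming every action touches "...|all" and only Publish touches "...|all|published".
    public static Dictionary<string, int> Count(
        string siteName, string className, IEnumerable<PageAction> actions)
    {
        var allKey = $"nodes|{siteName}|{className}|all".ToLowerInvariant();
        var publishedKey = allKey + "|published";

        var touches = new Dictionary<string, int>
        {
            [allKey] = 0,
            [publishedKey] = 0
        };

        foreach (var action in actions)
        {
            // Any change to a page of this type touches the built-in key
            touches[allKey]++;

            // Only the final publish touches the custom key
            if (action == PageAction.Publish)
            {
                touches[publishedKey]++;
            }
        }

        return touches;
    }
}
```

For a session of three draft saves, a submit, a reject, one more save, another submit and the final publish (eight actions), this model counts eight touches of the built-in key and a single touch of the custom key.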

From now on, cache dependencies can be changed slightly in the MVC solution. In the OutputCacheDependencies.cs code file, we need to change the AddDependencyOnPages<T>() method by adding "|published" to the key format:

public void AddDependencyOnPages<T>() where T : TreeNode, new()
{
    if (!mCacheEnabled)
    {
        return;
    }

    var className = mContentItemMetadataProvider.GetClassNameFromPageRuntimeType<T>();
    var dependencyCacheKey = String.Format("nodes|{0}|{1}|all|published",
                                 SiteContext.CurrentSiteName.ToLowerInvariant(),
                                 className);

    AddCacheItemDependency(dependencyCacheKey);
    AddCacheItemDependency("cms.adhocrelationship|all");
    AddCacheItemDependency("cms.relationship|all");
}
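The admin module and the MVC site have to produce exactly the same key string, otherwise the touch will never reach the cached items. One way to reduce the risk of a mismatch is to keep the format in a single shared helper. The sketch below is hypothetical (PageDependencyKeys is not part of Kentico or the Dancing Goat sample). It lowercases the whole key on both sides, because this article does not establish whether dummy keys are matched case-insensitively.

```csharp
using System;

// Hypothetical helper (not part of Kentico or the Dancing Goat sample)
// that keeps the custom dummy key format in one place.
public static class PageDependencyKeys
{
    // Builds "nodes|<site name>|<page type code name>|all|published".
    // The whole key is lowercased, mirroring the custom module above.
    public static string Published(string siteName, string className)
    {
        if (string.IsNullOrWhiteSpace(siteName))
        {
            throw new ArgumentException("Site name is required.", nameof(siteName));
        }

        if (string.IsNullOrWhiteSpace(className))
        {
            throw new ArgumentException("Page type code name is required.", nameof(className));
        }

        return $"nodes|{siteName}|{className}|all|published".ToLowerInvariant();
    }
}
```

With the example values "DancingGoatMvc" and "DancingGoatMvc.Article", PageDependencyKeys.Published returns "nodes|dancinggoatmvc|dancinggoatmvc.article|all|published". The same call could then be used both in the module's TouchKey call and when building dependencyCacheKey in the MVC project.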

And now we can do some testing:

  1. Attach the VS debugger to the Dancing Goat MVC website
  2. Put a breakpoint in the ArticlesController Index() method
  3. Open the "/en-US/Articles" page and make sure the debugger stops at the breakpoint
  4. Refresh the page and make sure that the breakpoint is not hit this time (the page is output cached)
  5. Go to the Kentico "mother" application and start amending the page "/Articles/Articles under workflow/Donate with us"
  6. Click "Save" (the page should go to the "Edit" step of the workflow)
  7. Repeat step 4 - the controller is not hit, which means that the output cache is still there!
  8. Now publish the "Donate with us" page, refresh again, and you will see that the ArticlesController Index() method is hit because the cache has been invalidated

One last thing worth mentioning: the "Publish" event is raised only for pages under workflow. This means that if you use the "Publish From" date for pages without workflow, the approach described in this article will not work and the cache will not be invalidated at all! It requires at least Versioning Without Workflow to be enabled for the pages.

Conclusion

Editing content while the website is under high traffic can become much more difficult and requires us, developers, to think it through carefully, because it affects how we implement caching. To summarize, here is a list of recommendations that can help you with this:

  • Reduce the number of cache invalidations by refilling the cache only when a page is actually published
  • Enable workflow so that content editors can work on draft/edit versions of a page without triggering cache invalidations and affecting performance
  • Use scheduled publishing with workflow: if some of the content can be scheduled for publishing during a "quiet" traffic period (during the night, for example) - go for it!
    • Please note the link below if you see the scheduled content not working properly
  • Educate your content editors and explain that "just creating or editing a page" may have a negative impact on website performance

Useful links

Below you can find a list of very useful blog articles about cache in Kentico: