SharePoint: Isolating Test Data

If you extract the change sets described in the previous installment of this series into a separate installer class, you can use it to create data structures without installing WSPs. This enables automated integration tests both for the installer itself and for code that works on the resulting data structures. Automated tests should be reproducible: run them twice without changing the code in between, and each run should yield the same result. If you run tests against the SharePoint object model, you will notice that SharePoint persists your changes between test runs. Two consecutive runs can therefore differ if the second run evaluates data left behind by the first. Looking to SQL for help doesn’t work here. Most SQL servers support transactions, the basic isolation tool for integration tests against databases: you start a transaction in the test setup and roll it back in the teardown method, leaving the database effectively untouched. SharePoint does not support transactions, so this approach is out of reach.
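For comparison, this is roughly what the transaction-based pattern looks like for SQL-backed integration tests; a minimal sketch assuming NUnit and System.Transactions, with the fixture name made up:

using System.Transactions;
using NUnit.Framework;

[TestFixture]
public class SqlIntegrationTests
{
  private TransactionScope _transaction;

  [SetUp]
  public void BeginTransaction()
  {
    // Everything the test changes is enlisted in this ambient transaction.
    _transaction = new TransactionScope();
  }

  [TearDown]
  public void RollbackTransaction()
  {
    // Disposing without calling Complete() rolls the transaction back,
    // leaving the database effectively untouched.
    _transaction.Dispose();
  }
}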

A common solution is to rely on mock objects. If you run your tests against mocks, your data never reaches SharePoint and is never persisted. This is feasible when the system under test is the business logic. But in many cases, the integration with SharePoint is more critical than the business logic itself. The object model exhibits some strange behavior that you probably won’t mirror in your mock objects. Take the SPWeb class as an example. Create an instance, add a new user-defined field type, and then look at the field types exposed by your SPWeb instance: you will see the old list, without your new type. Somewhere deep inside SPWeb this list is cached, and you cannot influence it. Similar behavior can be observed for the Properties list. This can result in hard-to-find bugs. The second important source of errors hidden by mocking is the invisible dependency chain. Switch on forms-based authentication, and SPWeb.EnsureUser will actually require a web.config with the appropriate settings for System.Web.Security.Roles.Providers. Although this is reasonable given the nature of forms-based authentication, it is a source of confusion, since the same code runs fine in a web context and fails in console applications or automated tests. Given these drawbacks, mocking the SharePoint object model should be handled with care.
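To illustrate the kind of caching behavior meant here, consider the property bag: two SPWeb instances opened from the same site don’t necessarily see each other’s changes. A simplified sketch (the URL and property name are made up):

using (var site = new SPSite("http://localhost"))
using (var firstWeb = site.OpenWeb())
using (var secondWeb = site.OpenWeb())
{
  firstWeb.AllProperties["SomeKey"] = "SomeValue";
  firstWeb.Update();

  // secondWeb was opened before the update and may still serve cached data,
  // so the new value is not necessarily visible here.
  var value = secondWeb.AllProperties["SomeKey"];
}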

Unit tests are another source of inspiration. Main memory doesn’t support transactions either, yet unit tests run isolated from each other due to the lack of persistence: new main memory is allocated for each test run, and even if the physical bytes are reused, they are logically unused. You can mirror this by creating a new environment for each test run. Like the memory of a unit test, this environment is used once and then discarded. SharePoint provides different levels of isolation: you can create a new farm, a new web application, or a new site collection. A new farm provides perfect isolation, but takes so many resources that it is not feasible in practice. A new site collection isolates lists and content types, but shares installed solutions, user-defined field types, and the like; web applications fall somewhere in between. We prefer one site collection per test, since site collections are relatively cheap to create and sufficient in many cases, while creating a new web application is orders of magnitude slower.
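A straightforward, if slow, version of this idea simply creates and deletes the site collection around each test. Here is a minimal sketch assuming NUnit; the web application URL, the owner account, and the fixture name are made up:

using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Administration;
using NUnit.Framework;

[TestFixture]
public class InstallerIntegrationTests
{
  private SPSite _site;

  [SetUp]
  public void CreateTestSiteCollection()
  {
    // Create a fresh, empty site collection for every single test.
    var webApplication = SPWebApplication.Lookup(new Uri("http://localhost"));
    var url = "/sites/test-" + Guid.NewGuid().ToString("N");
    _site = webApplication.Sites.Add(url, @"DOMAIN\testowner", "testowner@example.com");
  }

  [TearDown]
  public void DeleteTestSiteCollection()
  {
    // Discard the environment so the next test starts from scratch.
    _site.Delete();
    _site.Dispose();
  }
}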

You gain another order of magnitude in execution speed when you pre-allocate the test site collections. A Windows service running in the background can ensure that there are always a few dozen site collections ready to be used as testing environments. Each test run then takes one of them (if available), marks it as used, and deletes it when it’s done:

public SPSite GetSite()
{
  // Take a pre-allocated site collection if one is available, otherwise create one.
  var site = UnusedSites.FirstOrDefault() ?? CreateSite();
  site.RootWeb.AllProperties["IsUsed"] = true;
  site.RootWeb.Update();
  return site;
}

private static SPSite CreateSite()
{
  var site = WebApplicationHelper.CreateSite();
  // Mark the site collection as fully provisioned and ready to be handed out.
  site.RootWeb.AllProperties["IsReady"] = true;
  site.RootWeb.Update();
  return site;
}

It is not trivial to determine whether there is a usable site collection. SharePoint likes throwing exceptions when you access a site collection during its creation or deletion:

public static IEnumerable<SPSite> UnusedSites
{
  get { return TryFilter(s => !IsUsed(s) && IsReady(s)); }
}

private static IEnumerable<SPSite> Sites
{
  get { return WebApplicationHelper.WebApplication.Sites.Cast<SPSite>(); }
}

private static IEnumerable<SPSite> TryFilter(Func<SPSite, bool> filter)
{
  foreach (var site in Sites)
  {
    try
    {
      // The inner try/catch cannot contain the yield return, hence the nesting.
      try
      {
        // Skip site collections that don't match, or that throw because they
        // are currently being created or deleted.
        if (!filter(site))
          continue;
      }
      catch
      {
        continue;
      }
      yield return site;
    }
    finally
    {
      // Runs once the caller has moved past this site collection.
      site.Dispose();
    }
  }
}

private static bool? TryParseBool(object value)
{
  if (value == null)
    return null;
  bool result;
  if (bool.TryParse(value.ToString(), out result))
    return result;
  return null;
}

private static bool IsReady(SPSite site)
{
  return TryParseBool(site.RootWeb.AllProperties["IsReady"]) ?? false;
}

private static bool IsUsed(SPSite site)
{
  return TryParseBool(site.RootWeb.AllProperties["IsUsed"]) ?? false;
}

If a test run fails in a way that the teardown method is never reached, for example because you stop the run in the debugger, the site collection won’t get deleted. You can add a garbage collector to the Windows service to remove these zombie site collections:

public static IEnumerable<SPSite> ZombieSites(TimeSpan timeOutReady, TimeSpan timeOutUsed)
{
  return TryFilter(site => IsZombie(site, timeOutReady, timeOutUsed));
}

private static bool IsZombie(SPSite site, TimeSpan timeOutReady, TimeSpan timeOutUsed)
{
  // A site collection is a zombie if its provisioning never finished,
  // or if a test run claimed it but never got around to deleting it.
  var age = DateTime.UtcNow - site.LastContentModifiedDate;
  var isZombie = (!IsReady(site) && age > timeOutReady) ||
                 (IsUsed(site) && age > timeOutUsed);
  return isZombie;
}
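The Windows service itself then only has to run this garbage collection and keep the pool filled. A minimal sketch of such a maintenance step; the method name, pool size, and timeouts are assumptions, not part of our actual implementation:

private static readonly TimeSpan TimeOutReady = TimeSpan.FromMinutes(30);
private static readonly TimeSpan TimeOutUsed = TimeSpan.FromHours(2);
private const int MinimumPoolSize = 24;

public static void MaintainPool()
{
  // Collect the URLs first so no site collection is deleted while it is being enumerated.
  var zombieUrls = ZombieSites(TimeOutReady, TimeOutUsed)
    .Select(site => site.ServerRelativeUrl)
    .ToList();
  foreach (var url in zombieUrls)
    WebApplicationHelper.WebApplication.Sites.Delete(url);

  // Top up the pool so test runs rarely have to create a site collection themselves.
  var missing = MinimumPoolSize - UnusedSites.Count();
  for (var i = 0; i < missing; i++)
    CreateSite();
}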

Using this cache reduces the per-test-run overhead to about 300 ms. That is still huge compared to unit tests, but fast enough to encourage developers to write and run a few tests for the code they are working on, perhaps even to work test-driven.

We developed and implemented the site collection cache at adesso. Whether we will publish the implementation, as open or closed source, is still undecided. If you are interested, please leave a comment. I am not the one who decides, but we are actively collecting opinions, so your feedback will actually influence the outcome.

Using .less in ASP.NET MVC4

Update [2012-06-02]: Visual Studio 2012 RC includes a new bundle mechanism that supports chained transforms. Scott Hanselman posted an example that includes .less.

ASP.NET MVC 4 (beta) supports post-processed bundles, which are especially useful for CSS style sheets and JavaScript. MVC 4 ships with minifiers for CSS and JavaScript, but you can also plug in your own post-processor. This is where .less comes in, via the dotlessClientOnly NuGet package:

using System;
using System.Web.Optimization;

public class LessMinify : IBundleTransform
{
  private readonly CssMinify _cssMinify = new CssMinify();

  public void Process(BundleContext context, BundleResponse response)
  {
    if (context == null)
      throw new ArgumentNullException("context");
    if (response == null)
      throw new ArgumentNullException("response");

    // Compile .less to CSS first, then hand the result to the default CSS minifier.
    response.Content = dotless.Core.Less.Parse(response.Content);
    _cssMinify.Process(context, response);
  }
}

Now you can create bundles with your own minifier:

var styles = new Bundle("~/Content/myStyle", new LessMinify());
styles.AddFile("~/Content/someStyle.less");
BundleTable.Bundles.Add(styles);
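In the view or layout you then reference the bundle rather than the individual .less files. A sketch assuming the beta-era ResolveBundleUrl API:

// Resolves to a versioned URL for the compiled and minified bundle output,
// which can then be emitted as a <link> tag in the layout.
var bundleUrl = BundleTable.Bundles.ResolveBundleUrl("~/Content/myStyle");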

SharePoint: Updating Data Structures

One of the most problematic aspects of change requests in SharePoint projects is the update mechanism. You have WebParts written in C# and the accompanying .webpart files. These work on data stored in lists that are defined by a declarative XML specification. The lists themselves refer to ContentTypes, also specified in XML. The ContentTypes rely on Site Columns, found in a third kind of XML file. The data is formatted using CSS files, and all of this is activated according to an XML-based Feature. Your solution is already deployed, and your users have filled it with actual production data. And then they notice that something has to change. You have to update the whole solution.

SharePoint ships with an update mechanism for solutions, accessible via the PowerShell cmdlet Update-SPSolution. How does it work? I can’t tell for sure. Handling all the different kinds of declarative data structures is difficult, and sometimes counterintuitive. Take feature definitions: when you add a new feature to your solution, it is simply ignored by Update-SPSolution; you have to add a new solution package instead. This leads to solutions with technically named features such as “Web” and “Site”, which contain everything used in that scope. Most users don’t like these features; they prefer names from their domain language over obscure technical terms. This is something many developers have learned to work around. But there are more complex problems. Given a ContentType with a field called A, suppose this field has to be renamed to B. If you update the respective XML files, how does Update-SPSolution handle this? Does it rename the field? Does it remove A and add B? Does it do anything at all? It’s this kind of uncertainty that leads many developers to one-shot solutions, devoid of any future updates. You could look up the behavior for any specific case, but in general you won’t find a satisfying answer. Retracting and redeploying the solution is no alternative, because that would delete all the existing user data.

SharePoint is not the only platform that has to deal with mutable data structures. Let’s have a look at SQL. SharePoint isn’t a relational database management system, but SharePoint lists and SQL tables are conceptually close enough to draw inspiration from. It is important to note that the declarative list definitions are roughly comparable to CREATE TABLE, while the SharePoint object model can also be used to mimic ALTER TABLE. Tools like LiquiBase and dbDeploy help SQL developers update data structures without losing user data. Their basic concept is the change set. A change set contains all changes from one version of the database to the next. Applied incrementally to an empty database, the change sets eventually produce the current data structure. You can also use them to update existing databases: if your production database runs at version 15 and you install an update including the change sets up to version 22, only the sets 16 to 22 are applied. Change sets explicitly specify the required change, not the resulting structure. This makes it possible to rename fields without losing data, because the change set states that A is renamed to B, in contrast to removing A and adding B. While the resulting structures would be the same, the first case retains the user data while the second one yields an empty column.

The concept of change sets can also be implemented in SharePoint. The object model provides everything you need. The most comfortable way for developers is to add extension methods to SPList and friends, using a fluent syntax:

SPList list;
SPContentType contentType; 
list.Edit(0).AddContentType(contentType).Apply();

Edit(0) specifies that the following change set is only to be applied if the current version of the list equals 0. AddContentType specifies an operation in this change set. You could also chain operations. The final Apply() then checks whether the current version of the list is actually 0. If it is, all operations are applied and the version is incremented by one. When you run the same code a second time, Apply() detects that the change set at hand has already been applied and ignores it. This allows for incremental updates:

list.Edit(0).AddContentType(contentType).Apply();
list.Edit(1).RemoveChildContentTypesOf(contentType).Apply();

In this case, the previously added content type is removed in the next step. You might think that you could also delete the first line instead of adding a second one, so that it isn’t even added in the first place. But the point of this mechanism is to deal with already deployed data structures, so you can only append operations, never change the ones already deployed on production systems. The change set implementation could look like this:

using System;
using System.Collections.Generic;

public abstract class ChangeSet<T> : IChangeSet
{
  private readonly List<Action<T>> _changes = new List<Action<T>>();
  private readonly T _entity;
  private readonly int _fromVersion;
  private readonly string _contextId;

  protected ChangeSet(T entity, int fromVersion, string contextId)
  {
    _entity = entity;
    _fromVersion = fromVersion;
    _contextId = contextId;
  }

  protected ChangeSet(ChangeSet<T> changeSet)
  {
    _entity = changeSet._entity;
    _fromVersion = changeSet._fromVersion;
    _contextId = changeSet._contextId;
    _changes = new List<Action<T>>(changeSet._changes);
  }

  protected ChangeSet(ChangeSet<T> changeSet, Action<T> change) : this(changeSet)
  {
    _changes.Add(change);
  }

  protected abstract Uri WebUrl { get; }
  protected T Entity { get { return _entity; } }

  public void Apply()
  {
    var versionAccess = new VersionAccess(WebUrl);
    var entityVersion = versionAccess.GetVersion(_entity, _contextId);
    // Older change sets are missing: fail loudly instead of skipping versions.
    if (entityVersion < _fromVersion)
      throw new MissingChangeSetsException(_entity, entityVersion, _fromVersion, _contextId);
    // This change set has already been applied: nothing to do.
    if (entityVersion > _fromVersion)
      return;

    foreach (var change in _changes)
      change(_entity);
    OnPostChanges();
    versionAccess.SetVersion(_entity, _fromVersion + 1, _contextId);
  }

  protected abstract void OnPostChanges();
}
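The fluent syntax from the earlier examples can then be built on top of this base class. Here is a sketch of what an SPList-specific change set and its Edit extension method might look like; the class name, the default context id, and the exact set of operations are assumptions:

public class ListChangeSet : ChangeSet<SPList>
{
  public ListChangeSet(SPList list, int fromVersion, string contextId)
    : base(list, fromVersion, contextId) { }

  private ListChangeSet(ListChangeSet changeSet, Action<SPList> change)
    : base(changeSet, change) { }

  protected override Uri WebUrl
  {
    get { return new Uri(Entity.ParentWeb.Url); }
  }

  public ListChangeSet AddContentType(SPContentType contentType)
  {
    // Each operation returns a new change set containing the additional action.
    return new ListChangeSet(this, list => list.ContentTypes.Add(contentType));
  }

  protected override void OnPostChanges()
  {
    Entity.Update();
  }
}

public static class ListChangeSetExtensions
{
  public static ListChangeSet Edit(this SPList list, int fromVersion, string contextId = null)
  {
    return new ListChangeSet(list, fromVersion, contextId ?? list.ID.ToString());
  }
}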

You should store the version numbers in a hidden list. Unlike web properties, two different items in a list can be updated concurrently without conflicts, while the web properties are always persisted as a whole.
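VersionAccess is not shown above; a simplified sketch of how it could read and write versions from such a hidden list follows. The list title, the field names, and the way entities are keyed are assumptions:

public class VersionAccess
{
  private const string ListTitle = "ChangeSetVersions";
  private readonly Uri _webUrl;

  public VersionAccess(Uri webUrl)
  {
    _webUrl = webUrl;
  }

  public int GetVersion(object entity, string contextId)
  {
    using (var site = new SPSite(_webUrl.ToString()))
    using (var web = site.OpenWeb())
    {
      var item = FindItem(EnsureList(web), KeyFor(entity, contextId));
      return item == null ? 0 : Convert.ToInt32(item["Version"]);
    }
  }

  public void SetVersion(object entity, int version, string contextId)
  {
    using (var site = new SPSite(_webUrl.ToString()))
    using (var web = site.OpenWeb())
    {
      var list = EnsureList(web);
      var key = KeyFor(entity, contextId);
      var item = FindItem(list, key) ?? list.Items.Add();
      item["Title"] = key;
      item["Version"] = version;
      item.Update();
    }
  }

  private static string KeyFor(object entity, string contextId)
  {
    // Simplification: a real implementation would derive a stable identifier,
    // e.g. the list or content type id, instead of relying on ToString().
    return contextId + "/" + entity;
  }

  private static SPList EnsureList(SPWeb web)
  {
    var list = web.Lists.TryGetList(ListTitle);
    if (list != null)
      return list;
    // Create the hidden bookkeeping list on first use.
    list = web.Lists[web.Lists.Add(ListTitle, "Change set versions", SPListTemplateType.GenericList)];
    list.Hidden = true;
    list.Fields.Add("Version", SPFieldType.Integer, false);
    list.Update();
    return list;
  }

  private static SPListItem FindItem(SPList list, string key)
  {
    return list.Items.Cast<SPListItem>().FirstOrDefault(i => (string)i["Title"] == key);
  }
}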

Since you usually don’t have a central point of data access in SharePoint projects, you also don’t have a central point to manage all changes in the data structures. Therefore multiple modules in your code might want to apply change sets to the same object, for example the root web. You can deal with this by storing the version together with a context id:

list.Edit(0, contextId).AddContentType(contentType).Apply();

Given these ids, the order in which your features are activated and the updates are applied becomes irrelevant. Feature event receivers, by the way, are a good place to apply change sets. Since you no longer have declarative data structures, your data structures won’t be removed when you retract your features. This allows you to use the retract/redeploy mechanism to install updates, with all the bells and whistles of a fresh deployment, such as adding new features. The feature activated event is then used to perform the update.
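A sketch of such a feature event receiver; DataStructureInstaller stands in for whatever class collects your change sets:

using Microsoft.SharePoint;

public class DataStructureFeatureReceiver : SPFeatureReceiver
{
  public override void FeatureActivated(SPFeatureReceiverProperties properties)
  {
    var web = properties.Feature.Parent as SPWeb;
    if (web == null)
      return;

    // The installer applies all change sets; already applied sets are skipped,
    // so activating the feature again after an update only applies the new ones.
    new DataStructureInstaller().Install(web);
  }
}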

We developed and implemented the change set concept at adesso. Whether we will publish the implementation, as open or closed source, is still undecided. If you are interested, please leave a comment. I am not the one who decides, but we are actively collecting opinions, so your feedback will actually influence the outcome.