Getting Closer to Achieving Cross-Table Consistency with Azure Table Storage

If you have ever attempted to implement an application that uses Microsoft Azure Table Storage and goes beyond a certain degree of “Hello World” complexity, you may have come across the need to use multiple tables to persist your data structures. A common example is the requirement to overlay a composite type over two or more tables, for instance, storing the invoice header in one table and keeping the invoice line items in another. The other canonical example is the use of multiple Azure tables to work around the lack of support for secondary indexes. In this case, one or more auxiliary tables are needed to maintain the secondary index information.

What the two examples above have in common is the need to maintain data consistency when two or more Azure tables are engaged in a composite multi-step operation. In this blog post, I would like to provide one particular perspective on achieving cross-table consistency in such a scenario with minimal effort.

This topic might be useful for Azure developers who care (and worry) about maintaining consistency across multiple tables and implementing a programmatic client-side compensation for failed storage transactions. The non-prescriptive recipe provided in this article is based upon a successful real-life customer project, and it can be broken down into three main ingredients: Why, What and How.

The “Why” Ingredient

Let me start with a very short remark as to why a tailor-made solution for cross-table consistency is required in the first place.

To cut a long series of blog posts, corridor conversations and engineering meetings short, the Microsoft Azure platform does not currently support the notion of distributed transactions. The primary reasons behind this have been discussed multiple times, with some industry-recognized experts articulating the architectural challenges and possible workarounds here and there. It would be unwise to argue with their viewpoint, so I will just respectfully echo their perspective.

As there is no platform-level support for coordination across multiple transaction-aware resources, Azure developers are naturally dragged into implementing a custom solution that ensures a basic level of data consistency when storage operations span multiple tables. Such an implementation can take many forms, and one of them renders itself below.

The “What” Ingredient

In our customer project, the technical approach to achieving cross-table consistency in Azure Table Storage consisted of bespoke client-side code responsible for observing, coordinating, and orchestrating the individual actions that form a multi-step operation with data spanning two or more Azure tables. It provides the ability to undo all data modifications should any autonomous step of the entire operation fail repeatedly and irrecoverably.

To work around the lack of distributed transaction coordination in the Azure Storage service, we exercised the core principles of the Compensating Transaction pattern.

The following key criteria and design considerations essentially shaped the implementation:

  1. The code must reside on the client and be able to support all operations with Azure tables whereby a data modification occurs, such as Insert, InsertOrMerge, InsertOrReplace, Merge, Replace, and Delete.
  2. The code must be implemented in a generic fashion so that it can be plugged into an existing solution that extensively leverages Azure Table Storage. The code must be capable of transparently collecting sufficient information to undo the effects of each step of an unsuccessful end-to-end operation with Azure table entities.
  3. The code must not introduce any new APIs or programming paradigms, so that it does not upset the harmony of the place where Azure developers feel safe and familiar with the existing platform APIs and client SDKs.
  4. The final deliverable must adhere to the KISS principle, strictly avoiding all unnecessary complexity, as the code will need to be supportable and maintainable by ordinary humans from the same planet as the author.
  5. In line with the simplicity requirement, all data modifications of a given Azure table entity are assumed to be made by a single thread at a time. Concurrent changes to a single table entity are undesirable and must be avoided.

It is worth noting that the compensation logic may impose a significant latency tax when it kicks in; therefore, it may not be suitable in low-latency scenarios. This is mainly due to the additional recovery actions and storage requests associated with rolling changes back. In addition, the undo steps in a compensating transaction may need to be governed by a stricter (heavier) retry policy in order to roll back all successful data modifications in the event of an intermittent service (or connection) unavailability. This may add even more latency to the bottom line.

The old cliché that “a picture is worth ten thousand words” can be put to the test to visualize what a compensating transaction really does in the context of a multi-step operation with Azure tables when all autonomous operations succeed:

[Figure: HappyPath — all autonomous operations in the multi-step sequence complete successfully]

If any given autonomous operation fails at any point, the compensating transaction attempts to rewind all the steps that have completed successfully:

[Figure: UnhappyPath — a failed step triggers the compensating transaction, which rewinds the previously completed steps]

As you can see, our custom implementation of the compensating transaction pattern simply returns the data to its initial state in the event of a failure at any step. Although this offers a fair amount of coverage for most scenarios with Azure Table Storage, such an approach cannot always be applied blindly. In fact, compensation logic does not always imply rewinding to the original state, so please do not treat the implementation included herewith as “all singing, all dancing”.

The “How” Ingredient

The final deliverable manifests itself as a shareable .NET component called CompensationScope. It represents a scoped context object that can be used by consuming applications to track changes made in Azure Table Storage so that these changes can be automatically undone if the entire multi-step operation fails to complete successfully.

The usage of the CompensationScope class involves instantiating it and then routing individual storage operations (or a batch of operations) through its Execute() method. At the very end of the multi-step operation, the developer should call the Complete() method provided by CompensationScope. This method marks the entire set of operations as successful. If the Complete() method does not get called, the compensation logic will kick in and attempt to intelligently roll back all changes made to all Azure tables involved. This behavior essentially guarantees that the compensation logic will always run in the event of a failure, as long as the scope instance is properly disposed (for example, via a using block).

Let me show you how you can integrate the CompensationScope class into an existing solution. Here is some existing code from our sample Azure project:
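The snippet below is a representative sketch rather than the verbatim project code: a hypothetical SaveInvoice method that writes an invoice header into one table and the corresponding line items into another, using the classic Microsoft.WindowsAzure.Storage table client (the entity types and table names are illustrative):

```csharp
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Illustrative entity types standing in for the real project's model.
public class InvoiceEntity : TableEntity
{
    public double Total { get; set; }
}

public class InvoiceLineEntity : TableEntity
{
    public string Description { get; set; }
    public double Amount { get; set; }
}

public class InvoiceRepository
{
    private readonly CloudStorageAccount storageAccount;

    public InvoiceRepository(CloudStorageAccount storageAccount)
    {
        this.storageAccount = storageAccount;
    }

    public void SaveInvoice(InvoiceEntity header, IEnumerable<InvoiceLineEntity> lines)
    {
        CloudTableClient tableClient = this.storageAccount.CreateCloudTableClient();
        CloudTable headerTable = tableClient.GetTableReference("InvoiceHeaders");
        CloudTable lineTable = tableClient.GetTableReference("InvoiceLineItems");

        // Step 1: insert the invoice header into the first table.
        headerTable.Execute(TableOperation.Insert(header));

        // Step 2: insert each line item into the second table. If any of
        // these inserts fails part-way through, the two tables are left
        // in an inconsistent state with no automatic cleanup.
        foreach (InvoiceLineEntity line in lines)
        {
            lineTable.Execute(TableOperation.Insert(line));
        }
    }
}
```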

After you introduce the CompensationScope class into this method, the result will look as follows (the main differences are the using block around the storage calls, the scope.Execute(...) routing, and the final scope.Complete() call):
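The sketch below assumes an Execute(CloudTable, TableOperation) overload on the scope; the exact signature in the shipped component may differ:

```csharp
public void SaveInvoice(InvoiceEntity header, IEnumerable<InvoiceLineEntity> lines)
{
    CloudTableClient tableClient = this.storageAccount.CreateCloudTableClient();
    CloudTable headerTable = tableClient.GetTableReference("InvoiceHeaders");
    CloudTable lineTable = tableClient.GetTableReference("InvoiceLineItems");

    // The scope tracks every data modification routed through it.
    using (var scope = new CompensationScope())
    {
        // Storage operations go through the scope instead of hitting
        // the CloudTable directly.
        scope.Execute(headerTable, TableOperation.Insert(header));

        foreach (InvoiceLineEntity line in lines)
        {
            scope.Execute(lineTable, TableOperation.Insert(line));
        }

        // Mark the multi-step operation as successful. If an exception is
        // thrown before we reach this line, Complete() never runs and the
        // scope undoes all tracked changes when it is disposed.
        scope.Complete();
    }
}
```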

As a side note, we implemented the CompensationScope class with support for all existing storage operations, including some “tricky” ones like InsertOrMerge or InsertOrReplace. The class recognizes all known operations and performs the correct undo action should it be required. However, the current implementation does not place a global lock on the entity being updated, which means that if two or more processes attempt to update the same entity at exactly the same time and one of the update operations fails, stale entity data may be restored during the undo phase. In other words, please take extra care when enabling CompensationScope for entities that may be modified concurrently from multiple threads or role instances.

Let’s quickly zoom into the details and review the actual work performed behind the scenes by the CompensationScope class. For the sake of brevity, I will just use the Insert operation to walk you through the bits.

Each table storage operation is “hijacked” by the class’ Execute() method, where the operation is compared against the list of compensation-aware operations and turned into the respective undo action. The table operation is then relayed to the Azure table client and executed. If it succeeds and no exception is thrown, the undo action is put on a FIFO queue.
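A simplified sketch of what this looks like inside the class (member names and signatures here are assumptions based on the description, not the verbatim component code):

```csharp
using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage.Table;

public sealed class CompensationScope : IDisposable
{
    private readonly Queue<Action> undoActions = new Queue<Action>();
    private bool completed;

    public TableResult Execute(CloudTable table, TableOperation operation)
    {
        // Compare the operation against the compensation-aware operation
        // types and turn it into the respective undo action up front.
        // Note: TableOperation.OperationType and TableOperation.Entity are
        // public only in later versions of the classic storage client;
        // older versions would need a small wrapper or reflection.
        Action undoAction = this.CreateUndoAction(table, operation);

        // Relay the original table operation to the Azure table client.
        TableResult result = table.Execute(operation);

        // No exception was thrown, so remember how to undo this step.
        this.undoActions.Enqueue(undoAction);

        return result;
    }

    // CreateUndoAction, Complete and Dispose are shown further below.
}
```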

The ultimate knowledge of how to rewind a storage operation resides inside the CreateUndoAction method, and its implementation gravitates towards simple decision-making based on the original operation’s type:
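A sketch of that decision-making, assuming the undo work for each family of operations is factored into helper methods (the helper names are illustrative, and only the insert helper is shown further below):

```csharp
private Action CreateUndoAction(CloudTable table, TableOperation operation)
{
    switch (operation.OperationType)
    {
        case TableOperationType.Insert:
            // Undo an insert by deleting the newly inserted entity.
            return this.CreateInsertUndoAction(table, operation.Entity);

        case TableOperationType.Merge:
        case TableOperationType.Replace:
        case TableOperationType.InsertOrMerge:
        case TableOperationType.InsertOrReplace:
            // Capture the entity's pre-image first so it can be put back;
            // if the entity did not exist yet, the undo becomes a delete.
            return this.CreateUpdateUndoAction(table, operation.Entity);

        case TableOperationType.Delete:
            // Capture the entity before it disappears so the undo action
            // can re-insert it.
            return this.CreateDeleteUndoAction(table, operation.Entity);

        default:
            throw new NotSupportedException(
                "Operation is not compensation-aware: " + operation.OperationType);
    }
}
```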

Below is an example of an undo action for insert operations into an Azure table:
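In sketch form (again, the method name is illustrative), compensating a successful insert boils down to deleting the entity that was just inserted:

```csharp
private Action CreateInsertUndoAction(CloudTable table, ITableEntity entity)
{
    return () =>
    {
        // A wildcard ETag makes the delete unconditional, so the undo
        // succeeds regardless of the concurrency metadata stamped on the
        // entity by the preceding insert.
        entity.ETag = "*";
        table.Execute(TableOperation.Delete(entity));
    };
}
```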

On the happy path, the client successfully invokes all storage operations and finally calls the Complete() method. This method simply marks the entire scope as “completed successfully”, with no further actions required.

However, if the unhappy path is triggered, the Complete() method will not be called, leading to additional work kicking off during the Dispose phase of the compensation scope lifecycle. That work involves dequeuing and executing all the undo actions that the compensation scope was tracking up until it was disposed.
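In sketch form, the whole lifecycle fits into a few lines (continuing the assumed shape of the class from above):

```csharp
public void Complete()
{
    // Mark the entire multi-step operation as successful; Dispose()
    // will then have nothing to undo.
    this.completed = true;
}

public void Dispose()
{
    if (this.completed)
    {
        return; // happy path: nothing to compensate
    }

    // Unhappy path: replay every tracked undo action. As noted earlier,
    // a stricter retry policy may be warranted around these calls.
    while (this.undoActions.Count > 0)
    {
        Action undo = this.undoActions.Dequeue();
        undo();
    }
}
```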

That’s pretty much it! For those who want to explore this implementation deeper, the source code for CompensationScope is available on GitHub, along with a set of unit tests, because we care.

Hope you will find this post useful. Thanks for reading and following along!