Summary of CRDTs

In this article I am going to introduce you to CRDTs, or Conflict-Free Replicated Data Types. You will see under what conditions CRDTs might be useful, why you might want to use such a data type, what kinds of CRDTs are available, and how you can use them.

Think of having a Shopping Cart persisted by a multi-node replicated store, and then having one of the nodes become unavailable. If you have at least 3 nodes in the store, which of the remaining nodes contains the most recent version of a given user’s Shopping Cart? You need to display the most recent one. How would you know? What can you do to resolve this problem when the user is just ready to check out and pay?

Semantic Resolution

One typical approach is to use what is called Semantic Resolution. In this case you take all copies of the Shopping Cart from each of the remaining storage replicas and apply some “business logic” to the situation. The business says that we want to make the maximum sale, so the “logical” thing to do is merge the two remaining Shopping Carts so that the “official” Shopping Cart has the maximum number of products, each with the maximum count of items, from both. However, if the user has just recently removed an item from their Shopping Cart, then merging that item back into their possession looks like an error, and is at least annoying. In the worst case the user ends up not noticing the results of the merge, buys the extra item, and then returns it for their money back. This is probably much more annoying than noticing the extra item in the Shopping Cart and removing it before final checkout.

Isn’t there a better way?

Introducing CRDTs

A CRDT is literally a data type, and there are two basic ways to implement one. Both approaches fit the CRDT acronym because both names start with the letter C:

  1. Convergent Replicated Data Types: This is where data converges, or merges. You focus on the data on each of the storage replicas and merge the data changes between each of the replicas in order to bring each replica up to date. Obviously to make this work well you must in some way time stamp the data so you can merge in a proper way. You could use a simple time stamp, but you may have to use Vector Clocks or some similar approach to get this to work out well.
  2. Commutative Replicated Data Types: Maybe you remember Commutative Properties from mathematics. For example, you can take two numbers, such as 2 and 3, and whichever way you add them together, the result is always 5. That is, 2+3 yields the same value as 3+2. In this case we are more focused on the operations rather than the data, and we communicate the operations to each of the replicas rather than merging data. Regardless of the data they have, applying the operations they receive will result in obtaining the correct state. Commutative types work well when your messaging supports at-least-once delivery, such as is the case with Akka Persistence.

In the following examples we will discuss both Convergent and Commutative types.

One Convergent-Only Replicated Data Type

Here I will discuss only one Convergent-only type, the Register. A Register type has only two operations that you can perform on it: Set and Get. Because the only modification that you can make to a Register is to set its current value, and two concurrent Set operations do not commute, it cannot be managed as a Commutative type.

There are two ways to implement a Register:

  1. Last-Write-Wins (LWW): Here you keep the value and a timestamp of when the value was set on a given replica. The timestamp allows the value to be merged on any given replica by comparing the timestamp of the local replica with that of the one to be merged. If the timestamp of the local value is older than the sent value that is to be merged, then the local value is updated.
  2. Multi-Value: Here you may keep multiple copies of the same conceptual value, but maintain information that indicates when the value came to be set. It’s possible to hand to a client multiple values of the same thing and allow the client to apply conflict resolution to determine which of the multiple values it shall choose as “current.”

There are advantages to both of these approaches. Multi-Value is the most versatile, but also the most complex, approach. Although a Multi-Value Register may seem to have the same nature as the poor merging example given under Semantic Resolution, the data that indicates the differences can be based on alternative criteria such as Version Vectors or Vector Clocks.
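To make the Last-Write-Wins approach concrete, here is a minimal state-based sketch in Python. The class and method names are illustrative, not any particular library’s API, and a caller-supplied integer timestamp stands in for whatever clock (physical or logical) you actually use:

```python
from dataclasses import dataclass

@dataclass
class LWWRegister:
    """Last-Write-Wins Register: holds a value plus the timestamp of its
    most recent set. Illustrative names; the timestamp source is assumed
    to be supplied by the caller."""
    value: object = None
    timestamp: int = 0

    def set(self, value, timestamp):
        # Accept the write only if it is newer than what we hold locally.
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp

    def merge(self, other):
        # Convergent merge: the replica with the newer timestamp wins.
        if other.timestamp > self.timestamp:
            self.value = other.value
            self.timestamp = other.timestamp
```

Because `merge` simply keeps the newer of the two (value, timestamp) pairs, applying merges in any order on any replica converges to the same result.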

Some Commutative Replicated Data Types

Here we are going to discuss several CRDTs:

  1. Counters: G-Counters and PN-Counters
  2. Sets: G-Sets, 2P-Sets, U-Set, LWW-Element-Sets, and OR-Sets
  3. Graphs

Let’s take a look at these one by one.

Counters

There are two types of counters:

  1. G-Counter: This is a grow-only counter, one that only understands addition. The implementation of a G-Counter basically implements a Version Vector or Vector Clock.
  2. PN-Counter: This is a positive-negative counter, one that understands how to both add and subtract. To support this there are actually two G-Counters that are maintained: a Positive G-Counter and a Negative G-Counter.

To explain how G-Counter works, let’s consider its operations on two replicas:

  • R1 increments the G-Counter A, yielding a value of 1
  • R1 increments the G-Counter A again, yielding a value of 2
  • R2 increments the G-Counter A for the first time, yielding a value of 1

Now the increment operations are sent from R1 to R2, and from R2 to R1:

  • R2 receives an increment notification from R1 and increments G-Counter A from 1 to 2
  • R2 receives an increment notification from R1 and increments G-Counter A from 2 to 3
  • R1 receives an increment notification from R2 and increments G-Counter A from 2 to 3
  • Both R1 and R2 are now up to date with an A value of 3
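The walkthrough above can be sketched in Python. This is the state-based form of a G-Counter, where each replica keeps one slot per replica identifier (the Version Vector the text mentions) and merging takes the per-slot maximum; the names are illustrative:

```python
class GCounter:
    """Grow-only counter: one slot per replica identifier.
    The counter's value is the sum of all slots."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments made at that replica

    def increment(self):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-slot maximum: commutative, associative, and idempotent,
        # so replicas converge no matter how merges are ordered or repeated.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)
```

Replaying the scenario: R1 increments twice (value 2), R2 increments once (value 1), and after merging in both directions both replicas report 3.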

Now consider an example of how the PN-Counter works between two replicas:

  • R1 increments PN-Counter B, yielding a PN value of { 1, 0 }
  • R1 increments PN-Counter B, yielding a PN value of { 2, 0 }
  • R2 increments PN-Counter B, yielding a PN value of { 1, 0 }
  • R2 decrements PN-Counter B, yielding a PN value of { 1, 1 }

Now the increment and decrement operations are sent from R1 to R2 and from R2 to R1:

  • R2 receives an increment notification from R1 and increments its P counter from 1 to 2, yielding { 2, 1 }
  • R2 receives an increment notification from R1 and increments its P counter from 2 to 3, yielding { 3, 1 }
  • R1 receives an increment notification from R2 and increments its P counter from 2 to 3, yielding { 3, 0 }
  • R1 receives a decrement notification from R2 and increments its N counter from 0 to 1, yielding { 3, 1 }
  • Both R1 and R2 are now up to date with a B value of { 3, 1 }. The actual value of the PN-Counter B is yielded by subtracting the N value from the P value. So 3-1 is 2, which is the actual value of PN-Counter B.

You can use a PN-Counter to hold the count of each of the product items that are currently in the Shopping Cart. In case the counter yields a value less than 0, you can always evaluate it as 0 and not show the product in the Shopping Cart.
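A minimal state-based sketch of the PN-Counter, self-contained for clarity: two grow-only maps, P for increments and N for decrements, each with one slot per replica. The names are illustrative:

```python
class PNCounter:
    """Positive-Negative counter: internally two grow-only counters,
    P (increments) and N (decrements), one slot per replica."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.p = {}  # replica id -> increments made at that replica
        self.n = {}  # replica id -> decrements made at that replica

    def increment(self):
        self.p[self.replica_id] = self.p.get(self.replica_id, 0) + 1

    def decrement(self):
        self.n[self.replica_id] = self.n.get(self.replica_id, 0) + 1

    def value(self):
        # Actual value = total increments minus total decrements.
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Per-slot maximum on both internal grow-only counters.
        for rid, c in other.p.items():
            self.p[rid] = max(self.p.get(rid, 0), c)
        for rid, c in other.n.items():
            self.n[rid] = max(self.n.get(rid, 0), c)
```

Replaying the walkthrough: R1 increments twice, R2 increments once and decrements once; after merging in both directions each replica holds { 3, 1 } and reports the value 2.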

There are other types of Counters available, but not covered here.

Sets

A G-Set, or a grow-only set, is implemented very much like a G-Counter. Again, we will examine the use of a G-Set on two replicas:

  • R1 adds the value 1 to G-Set C, yielding a set of { 1 }
  • R1 adds the value 2 to G-Set C, yielding a set of { 1, 2 }
  • R2 adds the value 3 to G-Set C, yielding a set of { 3 }

Now the add operations are sent from R1 to R2, and from R2 to R1:

  • R2 receives an [ add 1 ] notification from R1 and R2 adds the value 1 to G-Set C, yielding { 1, 3 }
  • R2 receives an [ add 2 ] notification from R1 and R2 adds the value 2 to G-Set C, yielding { 1, 2, 3 }
  • R1 receives an [ add 3 ] notification from R2 and R1 adds the value 3 to G-Set C, yielding { 1, 2, 3 }
  • Both R1 and R2 are now up to date with a set of C as { 1, 2, 3 }
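The G-Set walkthrough is the simplest CRDT to sketch: in state-based form, merge is just set union, which is commutative, associative, and idempotent. Again, the names are illustrative:

```python
class GSet:
    """Grow-only set: elements can only be added, never removed."""
    def __init__(self):
        self.elements = set()

    def add(self, element):
        self.elements.add(element)

    def contains(self, element):
        return element in self.elements

    def merge(self, other):
        # Set union: order of merges never matters, repeats are harmless.
        self.elements |= other.elements
```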

There are other kinds of sets supported, but I don’t cover them extensively here.

There is a 2P-Set, which is a 2-phase set. It is managed much the same way as a PN-Counter; that is, there is a G-Set of added elements and a G-Set of removed elements, which are merged on each replica to yield the actual up-to-date set. You cannot add an element to the removed set that does not exist locally in your added set, which effectively means you cannot remove an element that has not already been added. The removed set functions as a tombstone set, which prevents removed items from ever being added back into the merged set. Be careful what you remove, because it can never come back.
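The tombstone behavior can be sketched like so (illustrative names; the key point is that an element in the removed set stays removed even after merging with a replica that added it again):

```python
class TwoPSet:
    """2P-Set: a grow-only 'added' set plus a grow-only 'removed'
    (tombstone) set. Once removed, an element can never return."""
    def __init__(self):
        self.added = set()
        self.removed = set()

    def add(self, element):
        self.added.add(element)

    def remove(self, element):
        # Only elements that exist locally in the added set can be removed.
        if element in self.added:
            self.removed.add(element)

    def contains(self, element):
        return element in self.added and element not in self.removed

    def merge(self, other):
        # Both internal G-Sets merge by union; tombstones always win.
        self.added |= other.added
        self.removed |= other.removed
```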

In a U-Set each element is added with a unique identity, and you can only remove an element by using its unique identity. A remove must be performed as a causal operation, as it can only follow the add of the element with the same identity.

There is also the LWW-Element-Set, where each item added to the set and each item removed from the set has a timestamp. The current set is yielded by keeping each element whose latest add timestamp is newer than its latest remove timestamp. This allows removed elements to be added back to the LWW-Element-Set, which is not possible with the 2P-Set.
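A minimal sketch of the LWW-Element-Set, keeping only the latest add and remove timestamp per element. Ties here are biased toward the add; that bias is an assumption of this sketch, since either bias is a valid design choice:

```python
class LWWElementSet:
    """LWW-Element-Set: per element, track the latest add timestamp and
    the latest remove timestamp; the newer one decides membership.
    Add-biased on ties (a design assumption of this sketch)."""
    def __init__(self):
        self.adds = {}     # element -> latest add timestamp
        self.removes = {}  # element -> latest remove timestamp

    def add(self, element, ts):
        self.adds[element] = max(self.adds.get(element, 0), ts)

    def remove(self, element, ts):
        self.removes[element] = max(self.removes.get(element, 0), ts)

    def contains(self, element):
        # Present if added, and not removed more recently (ties favor add).
        return (element in self.adds
                and self.adds[element] >= self.removes.get(element, 0))

    def merge(self, other):
        # Keep the newest timestamp seen anywhere for each add and remove.
        for e, ts in other.adds.items():
            self.adds[e] = max(self.adds.get(e, 0), ts)
        for e, ts in other.removes.items():
            self.removes[e] = max(self.removes.get(e, 0), ts)
```

Unlike the 2P-Set, a later add simply outbids the earlier remove, so elements can come back.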

An OR-Set is an observed-remove set, where each element added to the set is assigned a unique identity. The set is kept up to date by applying causal consistency. If the same element is added twice, with different identities, both additions are recorded, although a get (look up) of the element will yield only one element, as the duplicate is masked as required by the definition of a set. When a remove of a specific element is requested, all identities of that element are matched at the given source replica, and those identities are then used to remove the element from all replicas. A remove will only be performed if it historically and causally follows an add of the same element (with the same identity), which conforms to the sequential specification of a set. In other words, there must be an element matched at the source replica for this operation to work across all replicas.
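A simplified operation-based sketch of the OR-Set, assuming operations are delivered causally as the text requires. The tag scheme and method names are illustrative; the point to notice is that a remove carries only the tags observed at the source replica, so a concurrent add of the same element (with a different tag) survives:

```python
import uuid

class ORSet:
    """Observed-Remove Set: every add gets a unique tag; a remove
    deletes only the tags the source replica had observed."""
    def __init__(self):
        self.entries = {}  # element -> set of unique tags

    def add(self, element):
        tag = uuid.uuid4().hex
        self.entries.setdefault(element, set()).add(tag)
        return ("add", element, tag)       # operation to broadcast

    def remove(self, element):
        observed = self.entries.pop(element, set())
        return ("remove", element, observed)  # remove only observed tags

    def apply(self, op):
        kind, element, payload = op
        if kind == "add":
            self.entries.setdefault(element, set()).add(payload)
        else:
            # Tags added concurrently elsewhere are not in the payload
            # and therefore survive the remove.
            remaining = self.entries.get(element, set()) - payload
            if remaining:
                self.entries[element] = remaining
            else:
                self.entries.pop(element, None)

    def contains(self, element):
        return element in self.entries
```

In a concurrent add/remove race the add wins, which matches how most users expect a set to behave.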

You could use a 2P-Set to hold the products that are currently in the Shopping Cart, but the problem here is that if the user removed a product and then wanted to add it back, it would not be possible. For that reason a U-Set, LWW-Element-Set, or an OR-Set would be better.

Graphs

Although you might think of graphs as being very powerful, there are problems with applying Graph CRDTs in a distributed system. For this reason I suggest referring to the paper below if you think you are interested in using the Graph type.

Yeah, But…

You ask: “Why wouldn’t you use a Value Object with a built in time stamp for the Shopping Cart scenario described above? After all, it’s just a matter of figuring out which replicated Shopping Cart has the most recent data.”

First of all, modeling this as a Value Object with a time stamp would work. It’s just that there may be advantages in using CRDTs. Consider some of the following reasons.

In the case of the Shopping Cart the main advantage is in redundant data suppression. Only the changes made to the Shopping Cart must be copied to the other nodes, and because most of the data does not make each notification journey, bringing all nodes up to date is much faster and more likely to succeed quickly. Then if the node hosting the most up-to-date Shopping Cart is lost, chances are better that the smaller amount of data to replicate will allow the other nodes to be brought up to date sooner.

On the other hand, the shopping cart is probably not the best example to reason about CRDTs. The limitation of the Shopping Cart example is that it is owned by only one user, and most of the changes will probably be saved to the same node first, then propagated to other nodes. However, if you think of an Aggregate that can be used and modified by multiple users simultaneously, it’s more likely that multiple nodes will be modified separately at the same time. Chances are good that if non-conflicting changes are made by each user, then each of the changes can be replicated to multiple nodes without any of the multiple users being dinged for a conflict (a typical optimistic concurrency failure). The same thing can, by the way, be accomplished with Aggregates and Event Sourcing when events are exchanged between nodes.

The problem is easier to reason about if you simply think about many users hitting the same URL, with a counter maintained for that URL and partitioned across many nodes. As each of many users hits the URL, the counter is incremented on any one of the multiple nodes. What’s the actual total? Well, each time the counter is incremented on any given node, it will take some time before the other nodes are also incremented for that same URL access. Yet, within some period of time it is possible that one or more nodes will reflect the true total number of hits, that is, if the URL isn’t pounded every few milliseconds 24×7. In any case, any one of the nodes will show at least a close approximation of the actual total.

References

The official paper on CRDTs is provided by INRIA. You can see a good overview presentation by Sean Cribbs.