Making the right design decision is a mix of experience, knowledge, and a bit of art. The phrase “right design decision” itself requires understanding that design is not an isolated abstraction; it has a context that spans the requirements, the environment, and the people who are actually about to write the software.
Service Fabric employs a complex set of procedures to ensure that your data stays consistent and resilient (e.g., if a cluster node goes down).
Understanding what goes on under the hood of your application platform is key to making better design decisions. Today I want to describe how state is managed in Service Fabric, the different choices you have, and their implications for the performance and availability of your Service Fabric application.
I strongly recommend going through Service Fabric: Partitions before reading this one.
Replication Process & Copy Process
Service Fabric employs two distinct processes used to ensure consistency across a replica set (the number of replicas in a given partition for a stateful service).
Replication Process: used when writing state data in your service.
Copy Process: used to build a new replica when an existing Active Secondary fails. Also used to recover an Active Secondary that was down for a while.
Both processes are described below.
So your Primary goes online, starts listening to requests, and is about to write data using one of the reliable collection classes. Here is a breakdown of the events:
- The Primary wraps the write operation (which can be an add, update, or delete) and puts it in a queue. This operation is synchronized with a transaction. At the other end of the queue, your Active Secondary(s) pump the operations out as described in the next step. No data has been written yet to the Primary data store.
- The Active Secondary(s) pick up the operation(s), apply them locally, and acknowledge the Replicator one by one.
- The Replicator delivers a callback to the Primary once the write has succeeded.
- The Primary can then apply the data locally and return to the original request caller.
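The four steps above can be sketched as a simplified model. To be clear, this is not the Service Fabric API; the `Primary`/`Secondary` classes and the in-process "network" are invented purely for illustration:

```python
# Illustrative model of the write path described above -- NOT the actual
# Service Fabric API. Class names and the synchronous loop are assumptions.

class Secondary:
    def __init__(self):
        self.store = {}

    def apply(self, op):
        # Step 2: apply the operation locally, then acknowledge.
        _, key, value = op
        self.store[key] = value


class Primary:
    def __init__(self, secondaries, quorum):
        self.secondaries = secondaries
        self.quorum = quorum
        self.store = {}

    def write(self, key, value):
        op = ("update", key, value)
        # Step 1: the operation is queued; the Primary's store is untouched.
        acks = 1  # the Primary itself counts toward the quorum
        for secondary in self.secondaries:
            secondary.apply(op)
            acks += 1
            # Step 3: the replicator calls back once a quorum has acknowledged.
            if acks >= self.quorum:
                break
        # Step 4: only now does the Primary apply locally and answer the caller.
        self.store[key] = value
        return "ok"
```

In the real system the secondaries pull operations asynchronously; the loop here only captures the ordering of events, not the concurrency.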
Diving Even Deeper
- The Replicator will not acknowledge the Primary with a successful write until the operation has been “Quorum Written”. A quorum is floor(n/2) + 1, where n is the number of replicas in the set; in other words, more than half of the replicas in the set have successfully applied it.
- Each operation is logically sequenced. The replication process is out of band (from the perspective of the Primary). That is, at any point in time the Primary maintains an internal list of operations waiting to be sent to the replicator, operations sent to the replicator but not yet acknowledged, and operations sent to the replicator and acknowledged.
- There is no relation between the data structure semantics (queue vs. dictionary) and the actual storage/replication. All data is replicated and stored the same way.
- Given the following pseudo code:
Create transaction TX
Update dictionary D (key = K to value “1”)
Create another transaction TX1
Read dictionary D (key = K)
What value should you expect when you read the dictionary in the second transaction? Most probably whatever value K had before the update, because changes are not applied locally until they have been replicated to the quorum.
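To make the quorum math and the read behavior concrete, here is a small Python sketch. The committed/staged split is a simplified illustration of the transaction semantics described above, not the actual reliable collection implementation:

```python
import math

def quorum_size(replica_count):
    # A quorum is floor(n/2) + 1: more than half of the replica set.
    return math.floor(replica_count / 2) + 1

assert quorum_size(3) == 2  # a 3-replica set needs 2 acknowledgements
assert quorum_size(5) == 3  # a 5-replica set needs 3

# Why TX1 sees the old value: TX's update stays staged until the quorum
# acknowledges it; other transactions read only committed state.
committed = {"K": "0"}   # state visible to other transactions
staged = {"K": "1"}      # TX's pending write, awaiting quorum acks

def read(key):
    return committed.get(key)

def commit():
    # Runs only after the quorum acknowledges the replicated operation.
    committed.update(staged)

assert read("K") == "0"  # before the quorum ack: the pre-update value
commit()
assert read("K") == "1"  # after the quorum ack the write becomes visible
```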
Let us bend our minds even more
- Active Secondary(s) keep pumping data out of the replication queue. They don’t depend on the replicator to notify them of new data, which means different Active Secondary(s) can be at different stages of replication. In the ideal case, a quorum is aligned with the Primary. This affects how you do your reads, as we will describe later.
- Service Fabric maintains a configuration version (of sorts) for each replica set. This version changes as the formation of the replica set changes (i.e., which replica is currently the Primary). This information is communicated to the replicas.
- You can run into a situation where an Active Secondary is ahead of the Primary: for example, when a Primary fails to apply changes locally before dying, and thus never notifies the caller of a successful write. Or worse, Service Fabric loses the write quorum (for example, when many cluster nodes fail at the same time) and the surviving replicas (one of which will be elected Primary) are the slow ones that were not catching up. The details of how this is recovered and communicated to your code (and of the control you will have over recovery) are a part the Service Fabric team is working on. Rolling back, loss acceptance, and restoring from backup are all techniques that can be used for this.
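The idea that replicas can sit at different stages of replication can be pictured with logical sequence numbers. This is a hypothetical illustration (the replica names and LSN values are invented), not Service Fabric's actual bookkeeping:

```python
# Last-applied logical sequence number (LSN) per replica; values are invented.
lsns = {"primary": 100, "s1": 100, "s2": 98, "s3": 95}

def quorum_lsn(lsns):
    # The highest LSN that more than half of the replicas have applied:
    # sort descending and read the value at the quorum boundary.
    quorum = len(lsns) // 2 + 1
    return sorted(lsns.values(), reverse=True)[quorum - 1]

# Only writes up to LSN 98 are quorum-committed here: s1 keeps pace with
# the Primary, but the quorum (3 of 4) is held back by the next-slowest
# replica. If the two fastest replicas died, the survivors would be behind.
assert quorum_lsn(lsns) == 98
```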
What About My Reads?
The Copy Process
The Fabric Replicator
So What Does This Mean For Me?
How Fast Are My Writes?
- S/A is the time of serialization & local calls to the replicator at the Primary.
- The slowest network trip to a replica (among those in the quorum; i.e., the slowest of the fastest responders, which happen to be the most up-to-date secondary(s)).
- D/A is the time of deserialization & calls to the replicator at the secondary.
- Ack time (the way back).
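The components above can be combined into a rough write-latency model. This is illustrative only: the numbers are made up, and in reality these stages overlap and pipeline rather than adding up cleanly:

```python
def write_latency_ms(serialize, quorum_network_trips, deserialize, ack):
    # S/A at the Primary + slowest network trip among the quorum members
    # + D/A at the secondary + the acknowledgement trip back.
    return serialize + max(quorum_network_trips) + deserialize + ack

# With quorum network trips of 1ms, 2ms, and 5ms, the slowest quorum
# member (5ms) dominates the replication step.
assert write_latency_ms(0.5, [1, 2, 5], 0.5, 1) == 7.0
```

The takeaway: a single slow quorum member sets the floor for your write latency, which is why replica placement and replica count matter.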
Large # of Replicas vs Small # of Replicas
But Really, When Should I Go With a Larger # of Replicas in My Replica Set?
Large Objects (Data Fragment) vs. Small Objects
As you can tell, the mechanics of Service Fabric internals are quite complex, yet understanding how it works internally is important to building high-performance Service Fabric applications.
till next time