Using Service Fabric Partitioning to Concurrently Receive From Event Hub Partitions

Prelog

The classic tale is: you have an Azure Event Hub with a number of partitions, and you run your receiver process in a singleton Azure Web Job or a single-instance Azure Worker Role cloud service; both will use the "Event Processor Host" to pump events into your process. The host externalizes offset state storage to Azure Blob Storage (to survive processor crashes and restarts). This works fine, yet it has the following problem: you cannot scale it out easily. You have to create more separate cloud services or web jobs, each handling some of your event hub partitions, which complicates your upgrade and deployment processes.

This post is about solving this problem using a different approach.

Service Fabric Azure Event Hub Listener

Azure Service Fabric partitions are, in effect, logical processes (for the purists, I am talking about primary replicas). They can each exist separately on a server, or coexist on a node, depending on how big your cluster is and your placement approach.

Using this fundamental knowledge we can create a service that receives events from Azure Event Hub as follows:

  • If the number of Event Hub partitions is greater than the number of Service Fabric partitions, then each Service Fabric partition should receive from multiple Event Hub partitions at once.
  • If the number of Service Fabric partitions equals the number of Event Hub partitions, then each Service Fabric partition should receive from one Event Hub partition.

Because we have absolute control over how the above mapping is done, we can also add a 1:1 mapping (irrespective of the number of partitions on either side), or even custom funky logic where some Service Fabric service partitions do the receiving while the rest do something else. A rough sketch of such a mapping follows.
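As an illustration only (none of these names come from the repo), the mapping can be reduced to a simple ordinal calculation: given the ordered list of Event Hub partition ids and this replica's position among the Service Fabric partitions, each replica picks up an even slice of the Event Hub partitions.

// illustrative sketch: assign Event Hub partitions to a Service Fabric partition.
// "eventHubPartitionIds" and the partition index/count are assumed inputs; the real
// listener resolves these from the Event Hub runtime information and the service's partition info.
static IEnumerable<string> PartitionsForReplica(
    IReadOnlyList<string> eventHubPartitionIds,
    int serviceFabricPartitionCount,
    int thisServiceFabricPartitionIndex)
{
    for (int i = 0; i < eventHubPartitionIds.Count; i++)
    {
        // round robin: Event Hub partition i goes to Service Fabric partition (i % count)
        if (i % serviceFabricPartitionCount == thisServiceFabricPartitionIndex)
            yield return eventHubPartitionIds[i];
    }
}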

The second piece of the new tale is state. Service Fabric has its own state via stateful services, so I can save a few network I/Os by using them to store the offsets – reliably – inside the cluster via a Reliable Dictionary. The state management for our receiver is pluggable; you can replace it with whatever custom storage you want (one of the scenarios I had in mind is running the listener in a stateless service and using another, backing, stateful service).

The third piece of the new tale is usage. Event Processor Host is great, but it requires a few interfaces to be implemented (because it is a general-purpose component). Ours is a highly specialized, single-purpose component, hence we can simplify the entire process to implementing just one interface (that represents what happens to events and when to store the offset).

The last piece of our new tale is that the entire code is wrapped in a Service Fabric ICommunicationListener implementation, allowing you to use it just like any other Service Fabric listener. A sketch of the wiring is below.
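A minimal sketch of how this could be wired up, assuming hypothetical names (IEventDataHandler and EventHubCommunicationListener are stand-ins for whatever the repo actually exposes); the only Service Fabric piece shown is the standard CreateServiceReplicaListeners override:

// hypothetical single handler interface: process a batch of events; the listener
// decides when to checkpoint the offset (the real interface lives in the repo below).
public class MyEventsHandler // : IEventDataHandler
{
    public Task ProcessEventsAsync(string eventHubPartitionId, IEnumerable<EventData> events)
    {
        // your processing logic for the Event Hub partitions owned by this replica
        return Task.FromResult(0);
    }
}

// inside the stateful service: the listener plugs in like any other Service Fabric listener.
protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
{
    yield return new ServiceReplicaListener(
        initParams => new EventHubCommunicationListener(
            /* event hub connection string, consumer group, handler, state manager */),
        "EventHubListener");
}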

The code for the above is on github here https://github.com/khenidak/service-fabric-eventhub-listener

till next time

@khnidk

Redirecting nodejs stdout to Azure Storage

I am working on a platform that will host foreign (i.e. untrusted) code as one of its primary functions. The code will be in either .NET/CLR or nodejs.

Aside from the security concerns, I needed whoever decided to host the code to be able to see whatever goes into their stdout, redirected to Azure Storage in both buffered and un-buffered (out-of-band) fashion, allowing them to view it later on.

My guess is there will be a lot of demand for this feature, hence I decided to publish it as a stand-alone component: https://github.com/khenidak/node-stdout-AzureStorage

 

On Application Routers/Gateways

Prelog

The discussion below is about routing & application gateways. It is about a component I wrote the code for, published here https://github.com/khenidak/Router along with documentation on design & how to use it. The below is a rant on why things in this component are made in a certain way.

“Application Gateways are hard to build, but should not be hard to use”

Why Build an Application Gateway or a Router?

First: Why Application Routers/Gateways?

Your application will have requests coming to your external (or even internal) endpoints. These requests are not meant to be fulfilled by the application that received them, but rather by a backend of some sort. Most large-scale applications (and especially those built for hyper-scale) have this requirement. Take the following examples:

  1. A SaaS application where users' requests are routed to specific tenants. These requests might be web pages, Web Sockets, REST API calls or even TCP/IP packets.
  2. An application that distributes loads/request types among multiple backend systems.
  3. An application that went through a massive transformation (take M&A scenarios, for example) where the external APIs have changed; you will need a gateway in the middle to upgrade or bridge the protocol without disrupting your existing external clients.
  4. Application Zoning/Containment/Partitioning/Segregation, where connections from the external world are authenticated, authorized, then terminated at the gateway. The gateway then establishes connections to the backend systems.
  5. A microservices application deployed on top of a cluster managed by a cluster management platform such as Apache Mesos or Azure Service Fabric will need a gateway. Some samples use 2 tiers (a Web UX deployed on all nodes routing to compute in the backend, such as Azure Service Fabric's WordCount: https://github.com/Azure/servicefabric-samples/tree/master/samples/Services/VS2015/WordCount), but this approach does not deal with situations where the Web UX itself might be subject to per-request routing (take a SaaS application, A/B testing, or an application that provisions Mesos jobs or Azure Service Fabric app instances to scale to meet load or isolation requirements).
  6. Specific platform requirements, such as offering the Azure Service Fabric Actor framework to applications external to the cluster (where Actor clients cannot be used).

And many others.

Is Routing a New Thing?

No. As a matter of fact, every time you use a web server, internal routing happens on the OS – typically by a kernel component – to route requests to the different processes responsible for different addresses. At a higher level, frameworks such as ASP.NET Web API or WCF perform per-request routing (dispatching in the case of WCF) to the target controller/method (service/interface in the case of WCF).

But That Is at the Platform Level; What about the Application Level? Are These Requirements New? Is It a Cloud Thing?

No & no. Application gateways have been a thing since forever; I remember building an HTTP application gateway in 1998. And there are 10+ year-old products out there that perform various parts of application gateway logic. The cloud came in to fulfill "hyper-scale" requirements, where the application itself can be provisioned multiple times to support load or isolation (hence the stress on routing).

The Problem With Routing

If your requirements are about getting from point A to point B, both ends have the same semantics (say Web API), and the URLs are fairly static, then you don't need a custom gateway; I would strongly recommend looking at existing solutions such as Azure API Management (https://azure.microsoft.com/en-us/documentation/articles/api-management-get-started/), or building a single-purpose gateway. The problem is most requirements come in the following forms:

  • Routing in multitenant solutions

If the request is on https://<some-tenant>.<host>.com/crm, route it to http://node:port/<some-tenant>/crmapp (node is a backend server) and add a custom authorization header; then get the response and put it in the original downstream payload while adding a custom header.

  • Routing in multi version and or A/B testing scenarios

If the request is on https://www.<host>.com/api/customer and the authorization header claims contain usergroup = "default" or usertype != "admin", then route it to http://node:port/api-v2/customer, else route to http://node:port/api/customer.

  • Routing in microservices like environments

If the request is on https://www.<host>.com/api/customer, then resolve the microservice's address list and perform round-robin load balancing between the addresses; however, if the request type is Post, Put or Delete, then route only to the primary partition (in the case of Service Fabric).

  • Routing in protocol bridges

If the request came in on the Web Sockets address ws://www.host.com/sockets/customer, then route it to http://node:port/<some-tenant>/api/customer and set the HTTP method based on messagePayload.Type (MT): if MT = "add" then Method = Post, if MT = "update" then Method = Put, etc.

Sound complex enough? Those are typical requirements for an application gateway.

The problem is scary enough, but relatively easy to solve if you split representing the logic from actually executing it.

Routing Logic, Simplifying the Complex

You can easily represent the logic as a linked list where each node represents a condition and/or logic and is only executed if node.next executed successfully. In my code I called the nodes Matchers (not the sexiest name, I know). Consider representing them as the following:

 

//pseudo code
// If the request is on bing then route it as Get to http://www.microsoft.com and add a custom header "CUSTOM_HEADER" with value "Hello, World!"

var head = new SetMethod("Get");

head.chain(new SetAddress("http://www.microsoft.com"),
           new AddHeader("CUSTOM_HEADER", "Hello, World!"),
           new MatchAddress("bing", MatchType.MatchHostNameOnly));

// for more concrete samples and implementation check https://github.com/khenidak/Router

The above code describes routing and processing logic in an easy-to-understand fashion and, more importantly, an easy-to-extend framework. You can extend it to add whatever matcher types you want. You will end up with something that looks like this (all images below are from the repo):

[Image: matching-frx]

But what about my ANDs and ORs? These can also be represented by matchers; consider the following:


// pseudo code
// If the request is on bing and the user type is "dev" then route it as Get to http://www.msdn.com,
// else to http://www.microsoft.com, and add a custom header "CUSTOM_HEADER" with value "Hello, World!"

var head = new SetMethod("Get");

var msdn = new IsUserType("Dev");
msdn.chain(new SetAddress("http://www.msdn.com"));

head.chain(
   new OrMatcher(
              msdn,
              new SetAddress("http://www.microsoft.com")
             ),
   new AddHeader("CUSTOM_HEADER", "Hello, World!"),
   new MatchAddress("bing", MatchType.MatchHostNameOnly));

// for more concrete samples and implementation check https://github.com/khenidak/Router

Because of this type of branching the matchers are represented in memory as a tree not just a linked list.

[Image: matching-frx-tree]

 

Because Matcher is a .NET type, you can subclass it into new types of matching that suit your application (the code published here https://github.com/khenidak/Router contains most of the common stuff), or you can extend the existing ones with new capabilities specific to your application. A sketch of what a custom matcher might look like is below.
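A minimal sketch, assuming a hypothetical base-class shape (the actual Matcher base class, its execution method and the RoutingContext members are defined in the repo and may well differ):

// hypothetical custom matcher: picks a tenant-specific backend based on a request header.
// "Matcher", "ExecuteAsync" and "RoutingContext" are assumptions about the repo's shapes.
public class MatchTenantHeader : Matcher
{
    private readonly string m_HeaderName;

    public MatchTenantHeader(string headerName)
    {
        m_HeaderName = headerName;
    }

    public override Task<bool> ExecuteAsync(RoutingContext context)
    {
        string tenant;
        if (!context.RequestHeaders.TryGetValue(m_HeaderName, out tenant))
            return Task.FromResult(false); // no tenant header, this branch does not match

        // rewrite the backend address per tenant, then let the rest of the chain run
        context.TargetAddress = "http://node:port/" + tenant + "/api";
        return Task.FromResult(true);
    }
}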

Executing the logic then becomes a matter of mechanics, as described in the documentation here https://github.com/khenidak/Router (the basic idea is that a "Routing Context" is pushed through the linked list).

 

Epilog

I have chosen to use this space to describe what application routers/gateways are and why we need them (the what and how, along with source code, is published on GitHub). I have also chosen to cover just one aspect of the complexity that usually comes with building them. Check the documentation to get an idea about the rest of the problems that a typical application router/gateway has to solve and how they were solved.

till next time

@khnidk

 

On October & Azure Service Fabric IoT Sample

Azure Service Fabric public preview is out today; check the announcement here http://blogs.msdn.com/b/azureservicefabric/archive/2015/11/18/service-fabric-enters-public-preview.aspx. Over most of October I have been busy – as in, you wouldn't get my usual ranting – putting together a reference architecture and sample for IoT on top of Service Fabric. The sample is published here https://github.com/Azure-Samples/service-fabric-dotnet-iot
The sample in brief – quoting the docs – is:
“This sample is a reference architecture & implementation for hyper scale IoT event processing. The sample uses compute and storage on Azure Service Fabric and integrates with Azure Storage, Azure Power BI, and Event Hubs.”
Follow the link above for the sample. For the rest of the samples check http://aka.ms/servicefabricsamples and be sure to check the "Party Clusters" 🙂
Till next time
@khnidk

The Case on Latency, Fairness & Throughput for Connected Clients

Prelog

The rant below covers some of the challenges and possible solutions that you will encounter when developing a server-side application that offers services to connected clients. While I will talk mostly about stateful/session connections (as in TCP/IP sockets and the like), the below to a high degree also applies to session-less connections (such as REST/Web API types of implementation).

A couple of things we need to note before going forward. Performance & scale are different things. Performance is how fast you can respond to a request; scale is how many of those you can respond to concurrently. They are related because, typically, at a large number of concurrent connections & requests, your response times tend to increase.

For the sake of discussion, we will assume that we are building a game server that powers an action game such as Halo or the likes.

What is Under the Hood? Threads, Thread Pools and .NET’s TPL

So clients' connections come in, and in your code you need to assign compute resources to respond to each connection's requests. Irrespective of Windows, Linux or other systems, there will be a message loop in place for each client that picks fragments (typically a byte array) out of the wire buffer into your code. Each client gets a copy of that loop (or one loop goes through the clients). Fragments are picked, routed into your code, some execution happens, then a response is sent to the original client.

    • Threads: Each client connection gets a thread; the thread performs the message pump for that particular client and you are done. As the client connects the thread is created; as the client disconnects the thread is destroyed. Easy? Yes. Too easy that it makes it questionable. Here is why:

Threads are an expensive compute resource; you don't get a lot of them per process (it obviously depends on your CPU/memory and how big your kernel memory is, since it carries the handle table, including thread handles). The maximum number of connected clients will be tied to the maximum number of threads you can create in-process (and this usually tends to be a small number).

The other problem is that your clients will come in all shapes and colors; some will be active, some will not be very active or will be idle. Not-very-active clients will have their idle threads (which execute just the message loop against an empty buffer) eating away your compute resources.

It cannot be all that wrong, right? Yes. Threads are really good if you are trying to achieve the fastest possible response times for an expected – better, fixed – number of clients. Example: 2 servers sharing data (backup servers, warm standby ones, etc.). But they are problematic if you want to support the maximum number of connected clients, fairly (more on this later).

The best way to think about it: if your topology is more like a snowflake (few clients connected to each server, each server connected to one or more servers) then threads are better; if you are doing a star-like topology they are not.

  • Thread Pools (and .NET TPL): Because .NET TPL is built on top of the thread pool, we will group the 2 approaches into one discussion. Here is how it works: as connections come in, you create the connection object and then string the pulls (aka BCL's/.NET Socket.Receive or the win32 recv function) into a continuous chain of calls, where one call upon finishing queues the next call into the thread pool (via a direct call or via the await construct inside some loop). This will scale well. Resources are distributed well across connected clients (and are not mapped or dedicated to them). A rough sketch of such a receive loop is below.
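A minimal sketch of such a pull/continuation loop, assuming a plain TCP connection and a hypothetical processFrameAsync callback (a real pump would also deal with framing, timeouts and partial reads):

// one pump per connected client; each awaited ReadAsync continuation is queued back
// onto the thread pool, so no thread sits parked on an idle connection.
static async Task PumpAsync(TcpClient client, Func<byte[], int, Task> processFrameAsync)
{
    var buffer = new byte[4 * 1024];
    using (NetworkStream stream = client.GetStream())
    {
        while (client.Connected)
        {
            int read = await stream.ReadAsync(buffer, 0, buffer.Length);
            if (read == 0)
                break; // the client closed the connection

            await processFrameAsync(buffer, read); // route the fragment into your code
        }
    }
}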

On Fairness Across Connected Clients

Are We Fair?

Fair is when all connected clients’ requests are treated equally, irrespective of how active they are.

The thread-per-connected-client approach is fairer because each client gets a dedicated thread; the OS kernel will ensure equal time is scheduled per thread. It does not scale well, but it is fair. The thread pool approach, while scalable, is not fair and will yield skewed performance numbers: some clients will get faster responses, others won't.

Here is why (hint: The keyword here is “queue”):

Thread pools offer a queue (one and only one) where all your work items are queued; threads on the other end of the queue pick them up one by one and execute them, with a callback to notify completion (or the await construct in TPL). Active clients will be served faster than *not very* active clients because they will have more work items in that queue. You might have heard of the "the harder you hit it, the faster it will respond across all your calls" behavior in some cloud PaaS services on AWS or Azure; it is largely attributed to this. Everything is placed in one queue irrespective of fairness to connected clients.

Additionally, it has a phantom lock/release behavior. Consider this: one active client with 20 not-very-active clients on a 20-thread thread pool. There will be a condition where all threads in the thread pool are busy executing reads from inactive clients while we have one client ready with a request waiting at the end of the queue. You will see this as idle CPU + longer response times rather than busy CPU with lower response times, even when the total number of requests did not change.

Typically, we solve these problems by using a very short timeout on the receive call, or by using smaller I/O frames. Those solutions will not stop the condition above from happening, but they will resolve it much faster than a longer timeout on receive calls would.

Do I Need Fairness Across Connected Clients?

If the situation above applies to the server-side application you are developing, then start by asking yourself: is it really worth it? The answer really is in what you promised your clients; if the maximum response time (for the most unfairly treated connected client) is within your SLA, then you don't need fairness. Ensuring that you are within SLA is a bit of art and a bit of science (also known as performance testing). As you can tell, you will have to test multiple load patterns (not just stress the system until it fails).

As an analogy, this is a lot like waiting a few minutes (beyond the minimum standard wait time) to get your espresso because the shop is full of other clients (who are not ordering anything). Some applications with very tight latency requirements just cannot afford that; gaming is one of them.

Where Can I Apply Fairness?

If you are reading this far, I assume you still need fairness and you are looking for possible solutions. Let us start. A typical application in this context performs 3 major things:

  • Receive Requests
  • Execute Requests (i.e. route data fragments into your code)
  • Send Responses.

Each of these areas (can/will) need fairness, depending on your requirements. I am obviously assuming that we will go the thread pool route, not the thread-per-connected-client route.

Using Time Slices

Each execution unit (thread pool work item) gets a specified amount of time that it is allowed to execute in; if it does not finish in time, it should time out. It is important to note that this is not a razor-sharp time allocation; it will vary, as we will discuss. This is fairly simple to implement; the complexity is in leaving in-memory objects in a non-corrupt state, and some APIs are easier than others when it comes to that. For example, Socket.Receive can be called with a timeout; when it returns, copy the received bytes into an array outside the about-to-be-terminated task's scope. Some will require additional work; for example, consider ExecuteRequestAsync below (a sample of the "Execute Requests" step above):

async Task ExecuteRequestAsync(byte[] frame)
{
    // do something that takes a long time.
}

Such an opaque method is very hard to put a timeout on, because if you terminate the call you might be in a situation where your data structures are in a corrupt state.

A better approach is:

async Task ExecuteRequestAsync(byte[] frame, CancellationToken ct)
{
    // step 0: execute any previously uncompleted work.

    if (ct.IsCancellationRequested)
    {
        // retain state for the uncompleted work (maybe an external queue)
        return; // don't throw; from the perspective of the caller, the call did succeed.
    }

    // do step 1

    if (ct.IsCancellationRequested)
    {
        // retain state for the uncompleted work (maybe an external queue)
        return; // don't throw; from the perspective of the caller, the call did succeed.
    }

    // do step 2

    if (ct.IsCancellationRequested)
    {
        // retain state for the uncompleted work (maybe an external queue)
        return; // don't throw; from the perspective of the caller, the call did succeed.
    }

    // do step 3
}

And here is how I can call it

ExecuteRequestAsync(data, new CancellationTokenSource(timeoutInMilliseconds).Token); // the token is created to be canceled after the timeout

This way the method chooses when and how it can return (leaving the in-memory objects in a non-corrupt state). Most executions will use a little more time than the allocated timeout, yet overall it is easier to implement in a safe fashion. As you do your checks for IsCancellationRequested, you can sign off the rest of the method to another Task (or use Task.ContinueWith calling another async method), which the thread pool will put at the end of the queue (keep in mind that this does not mean in-order execution). The most difficult challenge is to ensure that ExecuteRequestAsync is called accumulatively (i.e. all state changes are applied in an accumulative, in-order fashion).

An alternative to this is the deferred execution approach. A deferred execution is represented by an object that has a queue and an execution loop; instead of calling ExecuteRequestAsync directly, you en-queue the call to the deferred execution. Each connected client is represented by a deferred execution instance; a timeout can still be used, with retry (assuming that ExecuteRequestAsync is idempotent). A rough sketch is below.
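A minimal sketch of such a per-client deferred executor; the names here are illustrative, not from any library, and retry is left out (it could be layered on top since the work is assumed idempotent):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// one instance per connected client: callers en-queue work instead of executing it directly,
// and a single loop drains the queue, applying a timeout per item.
public class DeferredExecutor
{
    private readonly ConcurrentQueue<Func<CancellationToken, Task>> m_Queue =
        new ConcurrentQueue<Func<CancellationToken, Task>>();
    private readonly int m_ItemTimeoutMs;

    public DeferredExecutor(int itemTimeoutMs) { m_ItemTimeoutMs = itemTimeoutMs; }

    public void Enqueue(Func<CancellationToken, Task> workItem)
    {
        m_Queue.Enqueue(workItem);
    }

    // typically started once per client and kept running for the lifetime of the connection
    public async Task RunAsync(CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            Func<CancellationToken, Task> workItem;
            if (!m_Queue.TryDequeue(out workItem))
            {
                await Task.Delay(10); // idle, nothing to execute for this client
                continue;
            }

            using (var cts = new CancellationTokenSource(m_ItemTimeoutMs))
                await workItem(cts.Token); // the item cooperatively observes the token (as in ExecuteRequestAsync above)
        }
    }
}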

Using A Quantum

So far we have referred to work items and execution as roughly the same thing; this approach depends on separating them. A work item is a task that a connected client needs executed (i.e. one of the receive, execute, or send tasks). A quantum represents the number of work items that you will execute for a certain connected client before moving to the next. This assumes that all work items execute in roughly the same time. This fairness model depends on having a queue per connected client (outside the queue which the thread pool controls). The easiest way to do this in the .NET world is to implement your own TaskScheduler; for native code you will have to implement the queue and the scheduler from scratch. I have implemented the same pattern in .NET for a web socket server (that ensures fairness on the send side only) here; I also ranted about this here. A rough sketch of the round-robin/quantum idea is below.
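A minimal sketch of the quantum idea (per-client queues drained round-robin, at most a quantum's worth of items per client per pass); this is illustrative only and far simpler than a real custom TaskScheduler:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// round robin over per-client queues: each pass executes at most "quantum" work items
// for a client before moving on, so one chatty client cannot starve the rest.
public class QuantumScheduler
{
    private readonly ConcurrentDictionary<string, ConcurrentQueue<Func<Task>>> m_ClientQueues =
        new ConcurrentDictionary<string, ConcurrentQueue<Func<Task>>>();
    private readonly int m_Quantum;

    public QuantumScheduler(int quantum) { m_Quantum = quantum; }

    public void Enqueue(string clientId, Func<Task> workItem)
    {
        m_ClientQueues.GetOrAdd(clientId, _ => new ConcurrentQueue<Func<Task>>()).Enqueue(workItem);
    }

    // one pass over all connected clients
    public async Task RunOnePassAsync()
    {
        foreach (var entry in m_ClientQueues)
        {
            for (int executed = 0; executed < m_Quantum; executed++)
            {
                Func<Task> workItem;
                if (!entry.Value.TryDequeue(out workItem))
                    break; // this client has nothing pending, move to the next one

                await workItem();
            }
        }
    }
}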

Epilog – Fixed Time Slice and Fixed Quantum are Bad

In a typical solution like the above you will start with a value (for either the time slice or the quantum) and then improve on it. If you see too many timeouts reported on your tasks (or, for the second approach, the number of remaining tasks in a queue is too high), then you need to increase your value. Remember that a timeout (or quantum expiry) is somewhat like a context switch: it requires dropping/acquiring locks and unwinding stacks. You don't want too many of these happening in your process; they consume CPU and do not directly contribute to responding to connected clients. If you are feeling brave, you can use a watchdog that dynamically adjusts the value at runtime and/or per connected client.

till next time @khnidk

Stuff I Read

Over the past week I have been wading through some of my to-read list. I thought I’d share some of it

  • Re-read 1978, 1979 papers on Events/Clocks & Reliable Distributed Multi-Processor Systems:
  1. “Time, Clocks, and the Ordering of Events in a Distributed System” here: http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf
  2. “The Implementation of Reliable Distributed Multiprocess Systems” found here: http://research.microsoft.com/en-us/um/people/lamport/pubs/implementation.pdf

Note: While I love Leslie Lamport and his ideas, I still believe these are some of the driest reads out there.

  • Created a new Linux dev box for Docker, followed the basic instructions here http://docs.docker.com/linux/started/, created a few containers based on existing images, and created a sample test image. The entire process is easy. Next on the list is to manually set up a Mesosphere cluster; I am probably gonna go with this https://open.mesosphere.com/getting-started/install/ as a starting point.

till next time

@khnidk

On Train Rides and Building Histograms

So you are developing your server application, congrats! No server application is complete without some sort of live telemetry. It usually comes in the form of:

  1. [Moving/Rolling] Count of event X, or Count of X where X > Y over past T period.
  2. [Moving/Rolling] Sum of X over the past T period.
  3. [Moving/Rolling] Sum/Average/Total/Count/StdDev of X (where X > Y) over past T period.

Sounds familiar? Yes it does, because we have all faced these requirements one way or another. The "live" part here is because you need to answer with data either from memory or from some persisted store. Data is usually presented to the user plotted on a histogram-like diagram.

The requirements themselves are not that hard, but when you consider locking and memory management you will see the challenge that usually comes with them. Add to that that the histogram is usually peripheral to the actual server application (hence it shouldn't eat away memory or CPU from the actual server application logic).

In a lot of ways you are trying to build a train-conductor-like component – obviously a more elaborate one. Remember that guy/lady that walks around with a clicker in hand and clicks it for every person on the train?

Enter Clicker

The basic idea is a linked list that gets automatically trimmed after a defined period (the click keep period). The linked list shouldn't be locked under any condition; each *click* adds a node at the head (see the sketch after the list below). The list supports:

  1. Count()
  2. Count(in a timespan): where the timespan is shorter than or equal to the click keep period.
  3. General-purpose Do() & Do(in a timespan): the do function takes a function pointer and executes it, passing it a copy of the list; in your function you can do max, average, StdDev or whatever you imagine.
  4. Clicks are events; by default each has a Value, but you can extend them as you like (to include other aggregates).
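A minimal sketch of the lock-free click, assuming a simplified node type (the actual Clicker in the gist linked below differs); new nodes are pushed onto the head with Interlocked.CompareExchange instead of a lock:

using System;
using System.Threading;

// simplified click node: a timestamp, a value, and a link to the previous head.
public class ClickNode
{
    public DateTime When = DateTime.UtcNow;
    public long Value;
    public ClickNode Next;
}

public class TinyClicker
{
    private ClickNode m_Head; // most recent click first
    private readonly TimeSpan m_KeepPeriod;

    public TinyClicker(TimeSpan keepPeriod) { m_KeepPeriod = keepPeriod; }

    // lock-free push at the head
    public void Click(long value)
    {
        var node = new ClickNode { Value = value };
        ClickNode oldHead;
        do
        {
            oldHead = m_Head;
            node.Next = oldHead;
        }
        while (Interlocked.CompareExchange(ref m_Head, node, oldHead) != oldHead);
    }

    // walk from the head (newest first) and cut the list at the first node older than the keep period
    public void Trim()
    {
        var curr = m_Head;
        while (curr != null && curr.Next != null)
        {
            if (DateTime.UtcNow - curr.Next.When > m_KeepPeriod)
            {
                curr.Next = null; // everything older falls off and is collected by the GC
                break;
            }
            curr = curr.Next;
        }
    }
}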

Chained Clickers, Calculation & Trimming

Because of the way it is implemented, I can chain clickers (one for the last minute, one for the last hour, and so on) like this:

// this keeps the clicks for 1 minute, then removes anything older
private Clicker<MyClickType> m_MinuteClicker = new Clicker<MyClickType>(TimeSpan.FromMinutes(1));
// this keeps the clicks for 1 hour, then removes anything older
private Clicker<MyClickType> m_HourClicker = new Clicker<MyClickType>(TimeSpan.FromHours(1));

// chain minute clicker into hour clicker
m_MinuteClicker.OnTrim = (head) => OnMinuteClickerTrim(head);


// later in the code
// whenever I need to record an event
m_MinuteClicker.Click(new MyClickType(Added)); // or Processed

// let us say I want to know the average of added clicks per minute for the last hour
return m_HourClicker.Do(head =>
{
    int sum = 0;
    int count = 0;
    var curr = head;
    while (curr != null)
    {
        if (curr.ClickType == Added)
        {
            sum += (int)curr.Value;
            count++;
        }
        curr = (MyClickType)curr.Next;
    }
    return count == 0 ? 0 : sum / count;
});


// roll up totals into minutes, saved for a maximum of 1 hour
void OnMinuteClickerTrim(MyClickType head)
{
    // my click type is either an add event or a processed event
    var totalAdd = 0;
    var totalProcessed = 0;

    var curr = head;
    while (curr != null)
    {
        if (curr.ClickType == Added)
            totalAdd++;
        else
            totalProcessed++;

        curr = (MyClickType)curr.Next;
    }

    // harvest
    m_HourClicker.Click(new MyClickType() { ClickType = Added, Value = totalAdd });
    m_HourClicker.Click(new MyClickType() { ClickType = Processed, Value = totalProcessed });
}

The code is published here https://gist.github.com/khenidak/e78a0a3fd6fa071a4308

till next time

@khnidk

Azure Service Fabric: Multiple Communication Listeners Per Service

So I was off for a week, then got elbow-deep in one of the projects. Slowly recovering to a normal work rhythm. Until the work I am doing is completed – expected by end of October, stay tuned for future announcements 🙂 – I thought I'd share something of interest.

I am working with a Service Fabric service, where each service is expected to have:

  1. 0..n event hub listeners. The number of event hubs can change, as in add or remove hubs, etc. Needless to say, the service doesn't need to stop, start, reset or upgrade when the event hub list changes.
  2. 1 – Only one – REST control endpoint, where you can interact with the service to do:
    • Add Hubs.
    • Remove Hubs.
    • Get # of processed messages and other relevant telemetry data.
    • Stop/Start/Pause processing.

Service Fabric allows you to create one listener per service (for purist folks, per replica); more details here (http://henidak.com/2015/07/service-fabric-partitions/). In order to allow the above to happen, I created a composite listener: it basically sits on top of a list of listeners, and listeners can be added and removed without affecting the service lifecycle. The composite listener ensures that a newly added listener's status matches the composite listener's; as in, if the composite listener has been opened-async then the newly added listeners will be opened-async-ed as well.

The composite listener can work with any class that implements the ICommunicationListener interface (including those provided in the public samples). A rough sketch of the shape is below.
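A minimal sketch of the idea (the full implementation, including opening listeners that are added late so their status matches, is in the gist linked below); the Initialize member of the era's interface is omitted, and names/namespaces may differ by SDK version:

using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// wraps a set of inner listeners behind a single ICommunicationListener
// (from the Service Fabric Services package); Open/Close/Abort are fanned out to every inner listener.
public class CompositeCommunicationListener : ICommunicationListener
{
    private readonly ConcurrentDictionary<string, ICommunicationListener> m_Listeners =
        new ConcurrentDictionary<string, ICommunicationListener>();

    public void AddListener(string name, ICommunicationListener listener)
    {
        // the real implementation also open-asyncs the listener here if the composite is already open
        m_Listeners[name] = listener;
    }

    public async Task<string> OpenAsync(CancellationToken cancellationToken)
    {
        foreach (var listener in m_Listeners.Values)
            await listener.OpenAsync(cancellationToken);

        return string.Empty; // the real implementation composes and publishes the listening addresses
    }

    public async Task CloseAsync(CancellationToken cancellationToken)
    {
        foreach (var listener in m_Listeners.Values)
            await listener.CloseAsync(cancellationToken);
    }

    public void Abort()
    {
        foreach (var listener in m_Listeners.Values)
            listener.Abort();
    }
}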

I published it as a gist on github here: https://gist.github.com/khenidak/21f8de349a460ab90408

Till next time @khnidk

 

Azure for SaaS Providers

So I kicked off yet another side project. Azure for SaaS Providers is a project that aims to build reference literature for those who are using Azure to build SaaS solutions. This is a long-term project (probably ending by Feb 2016); I will be adding to this content frequently.

While the primary target audience is SaaS providers, the information can also be used for solutions that need to support massive scale, and the core concepts can be used to support private cloud hosting or even different cloud providers.

The content is in OneNote format hosted here http://1drv.ms/1Ni2geR (no you don’t need to install OneNote)

Below is the content introduction page for a quick read:

What is this?

This is a collection of literature and reference materials to help those planning or currently using Azure to build or host SaaS solutions. While the primary audience is those who are planning to use Azure in a SaaS context, there is nothing stopping you from using this for other applications, specifically applications with high scale requirements – typically applications that require partitioning, isolation and containerization.

Why?

Frankly speaking, because there is not a lot of literature out there covering this topic, and what does cover it is either too narrow, too old, or both. The purpose is to liberate SaaS vendors' developers to build more great features and fewer platform components (for architects, this represents a wealth of options to consider while thinking about your SaaS solutions running on Azure).

What should I expect here?

Design options with tradeoffs, diagrams, code samples, code components and reference to external sources.

Why using this format?

  1. SaaS vendors come in all shapes and colors and there is no one solution that fits all. Hence the recipes approach. You mix, match and modify recipes as you see fit.
  2. Microsoft Azure is currently (and will always be) in a fluid state. New services/components will be added and existing ones will be updated. This format allows us to revisit each recipe without having to change a lot of other recipes.
  3. OneNote is easy to use, free, and has a web frontend which can be viewed irrespective of the device you choose to use to view this content.

What if I want to contribute?

Please do reach out and let us have a discussion on which areas you can cover.

Questions/comments

@khnidk

 

On Service Fabric & Multiple Wcf Endpoints

Service Fabric allows you to listen on arbitrary endpoints (irrespective of the protocol, messaging framework, etc.) by funneling everything through the CreateCommunicationListener method, which returns a listener (discussed at great length here). The listener itself has basic life-cycle control primitives exposed to you as the Initialize, OpenAsync, CloseAsync & Abort methods.

The libraries included with Service Fabric further extend this into Wcf-specific listening with a 1:1:1 mapping (one service replica : one Wcf host : one Wcf endpoint), which works well in a lot of situations.

The problem happens when this mapping does not hold, for example:

  1. Legacy services that were built to have 1 host : multiple endpoints, and you need to migrate them as-is (some refer to this as inter-contract shared state in the Wcf world).
  2. Legacy services that use *weird* routing such that having 2 contracts in the same service host won't work (for those Wcf fans out there, you know what I am talking about); they have to live on different Wcf hosts.

While helping migrate an existing Wcf system last week I encountered some of the above. Enter the Wcf multi-point listener, which allows you to do the following (a rough sketch of the core idea follows the list):

1- Map multiple Wcf endpoints/contracts/hosts to a single replica.
2- Migrate Wcf services as-is to Service Fabric based hosting.
3- Migrate Wcf services (and use Service Fabric state) in them.
4- Control over how listening addresses are created.
5- Control over how bindings are assigned.
6- Control over Wcf hosts and Wcf endpoints as they are being created, to add behaviors, dispatchers, etc.
7- Support for the ICommunicationXXXX Service Fabric interfaces, which recycle Wcf channels between callers; a maximum of 1 channel per channel type per host/endpoint is created at any single time.
8- The Service Fabric client implementation implements the disposable pattern and will abort all open Wcf channels upon Dispose or GC.
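As a rough, illustrative sketch only (the actual listener in the repo handles addressing, bindings, behaviors and channel recycling), the core idea of hosting multiple Wcf hosts/endpoints behind one replica's listener might look like this; CrmService/BillingService and their contracts are placeholders, not anything from the repo:

using System;
using System.ServiceModel;
using System.Threading;
using System.Threading.Tasks;

// illustrative only: one listener owning several Wcf ServiceHosts, each with its own endpoint.
public class MultiWcfListener // would implement Service Fabric's ICommunicationListener
{
    private readonly ServiceHost[] m_Hosts;

    public MultiWcfListener()
    {
        var crmHost = new ServiceHost(typeof(CrmService), new Uri("net.tcp://localhost:9001/crm"));
        crmHost.AddServiceEndpoint(typeof(ICrmService), new NetTcpBinding(), string.Empty);

        var billingHost = new ServiceHost(typeof(BillingService), new Uri("net.tcp://localhost:9002/billing"));
        billingHost.AddServiceEndpoint(typeof(IBillingService), new NetTcpBinding(), string.Empty);

        m_Hosts = new[] { crmHost, billingHost };
    }

    public Task<string> OpenAsync(CancellationToken cancellationToken)
    {
        foreach (var host in m_Hosts)
            host.Open();

        // a real listener would publish all listening addresses back to Service Fabric
        return Task.FromResult("net.tcp://localhost:9001/crm;net.tcp://localhost:9002/billing");
    }

    public Task CloseAsync(CancellationToken cancellationToken)
    {
        foreach (var host in m_Hosts)
            host.Close();
        return Task.FromResult(0);
    }

    public void Abort()
    {
        foreach (var host in m_Hosts)
            host.Abort();
    }
}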

The code is open source and published on GitHub here: https://github.com/khenidak/ServiceFabricMultiPointWCFListener

Feel free to clone, use, change.

till next time

@khnidk