Complex Event Processing (CEP) with Azure Stream Analytics : JOIN with Reference Data

Part 1 of this series : Getting started with Azure Stream Analytics.

In this post we will combine reference data from Azure Storage with the streaming input, process the query, and write the output events to a Service Bus queue. These events will be picked up by an Azure Logic App, and the person responsible for maintaining the pipe will be notified.

Let’s begin by adding the second input to the Job Topology. The second input is reference data: a CSV file stored in Azure Blob Storage.

Unfortunately, testing the CSV input in the query window produced an error – this link

But this is not a blocker for continuing the experiment. After adding the CSV file as an input, we can add the Service Bus queue as the output.

After wiring up the inputs and the output, we can write the CEP query.


The reference data input is named ‘maintenance’. You can download the sample file from this location.

Let’s write the query for the CEP. In the query blade we can specify a simple JOIN with the reference data. Note that the streaming data is in JSON format and the reference data is in CSV format, but Stream Analytics can still join them and produce the output.

Query in text

SELECT T.PipeCode, M.Owner, AVG(T.Temperature) AVGTemp
INTO Spikes
FROM TemperatureInput T
JOIN Maintenance M ON T.PipeCode = M.PipeCode
GROUP BY T.PipeCode, M.Owner, TumblingWindow(second, 120)
HAVING AVGTemp >= 50
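For reference, the maintenance CSV only needs to carry the columns the query uses (PipeCode and Owner); the actual sample file linked above may contain more. A few hypothetical rows (all values made up):

```csv
PipeCode,Owner
1,anna@contoso.com
2,bill@contoso.com
7,carol@contoso.com
```

Stream Analytics treats the header row as the column names, so PipeCode and Owner become directly usable in the JOIN and SELECT.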

We have also specified that the output should go to the Service Bus connection defined above, so matching events are posted as messages to the Service Bus queue.

In order to make the sample more interactive, let’s create an Event Hubs simulator which posts messages (sensor readings) to Event Hubs. You can access the code for the simulator from this link.
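The linked repository holds the actual simulator; the following is only a minimal sketch of the same idea (the connection string, event shape, timing and use of the Microsoft.Azure.EventHubs package are assumptions, not the repo's code):

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Newtonsoft.Json;

class Simulator
{
    static async Task Main()
    {
        // assumption: the connection string includes EntityPath=temperatureinput
        var client = EventHubClient.CreateFromConnectionString("<event-hubs-connection-string>");
        var random = new Random();

        for (var i = 0; i < 100; i++)
        {
            // one anonymous reading per second, matching the fields used in the query
            var reading = new { PipeCode = random.Next(1, 17), Temperature = random.Next(40, 100) };
            var json = JsonConvert.SerializeObject(reading);

            await client.SendAsync(new EventData(Encoding.UTF8.GetBytes(json)));
            await Task.Delay(1000);
        }

        await client.CloseAsync();
    }
}
```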

Run the simulator from the local machine and run the job in the portal. You will notice the Service Bus queue receiving messages from the Stream Analytics job. We can visualize the live job status in the Job Diagram blade in the portal.


Plug an Azure Logic App into the Service Bus queue to listen for messages and send the alert to the owner. Here the owner’s email address comes from the reference data.


In the next post we will see how to add custom functions to the Stream Analytics CEP query.





Complex Event Processing (CEP) with Azure Stream Analytics : Getting Started with Streaming Data


Azure Stream Analytics is a Complex Event Processing (CEP) Platform as a Service (PaaS) offering in Azure. In this series of blog posts we’ll examine a scenario which helps us understand the CEP capabilities of Azure Stream Analytics.

Before we begin, let’s get a basic understanding of what CEP is. CEP is any technology that can process streams of events along with static data. The ability to process both streaming events and static data in real time, combining them in some sort of computation, is the key to CEP. In addition to this core idea, commercial CEP tools have auxiliary capabilities such as storage and event publishing.

Almost all the major public cloud vendors provide CEP as a service under different product names. CEP platforms require large amounts of memory and computational power, which are abundant in the cloud thanks to the economies of scale of data center infrastructure. This allows public cloud vendors to offer CEP as a service at a lower cost than on-premises implementations.

Case introduction to the series

Azure Stream Analytics is a managed service with a simple processing model: input, query processing and output pipelining.

Let’s take the case of a nuclear reactor: there are pipes which carry the water used to cool down the reactor. Assume one particular nuclear plant has 16 pipes, and each pipe has sensors which detect the temperature of its inner shell.

These sensors send temperature data to the cloud, where we analyze it against reference data holding more details of the pipes, such as the pipe location and the owner responsible for maintenance. When the average temperature of a pipe goes above 85 degrees Celsius within a 300-second window, the pipe's maintenance owner should be alerted.

Getting started with Stream Analytics

Stream Analytics gets the stream of events from Event Hubs and the static reference data from Storage. When the specified condition is met, it posts a message to the Service Bus queue, which can be consumed by any other application. (I originally planned to use serverless Functions here, but had a real mess with the VS tooling – read more about that here – so I am removing Functions from this blog post series and using Logic Apps instead.)

In order to do this, first we have to set up the following Azure resources.

  • Event Hubs
  • Azure Storage account
  • Azure Stream Analytics job
  • Service Bus
  • Azure Logic Apps

The below image describes the high level picture of the explained setup.


In this part of the series, let’s plug in the streaming data input and try the CEP query in a minimal way. Navigate to the Stream Analytics service and add the Event Hub (temperatureinput) as an input. You can do this by clicking on the Input box and selecting the source in the relevant form in the Azure portal.

After adding the input, select the Query from the Job Topology section.

Before testing on live event data we can upload a sample file which represents the events and try the query. Assume the below JSON object is a sample event representing a single reading from a sensor.
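The original post showed the event as an image. Based on the fields the query uses, a single reading would look roughly like this (the timestamp field name and values are assumptions):

```json
{
  "PipeCode": 7,
  "Temperature": 86,
  "EventTime": "2017-10-01T10:15:00Z"
}
```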


We will feed a sample set of such events from a file and try the CEP query. The sample file has 100 readings like the one above. You can download the sample file from this link.

Click on Query in the Job Topology, where you can upload the sample file as the input feed. Then we can write the query to express our condition. Click the Test button to run the query.


SELECT T.PipeCode, AVG(T.Temperature) AVGTemp
FROM TemperatureInput T
GROUP BY T.PipeCode, TumblingWindow(second, 300)
HAVING AVGTemp >= 85

The query is similar to SQL syntax. The key aspect is the windowing concept: since we’re dealing with a stream of data, we have to specify a window to get a result over a specific time frame. Read more about windowing from this link.
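As an illustration, TumblingWindow(second, 300) carves the stream into fixed, non-overlapping 5-minute buckets. If we wanted overlapping windows instead, the same aggregation could use a hopping window (the 30-second hop size here is an arbitrary choice), which emits a result every 30 seconds over the last 300 seconds:

```sql
SELECT T.PipeCode, AVG(T.Temperature) AVGTemp
FROM TemperatureInput T
GROUP BY T.PipeCode, HoppingWindow(second, 300, 30)
HAVING AVGTemp >= 85
```

A tumbling window is just the special case of a hopping window where the hop size equals the window size.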

In the result you can see that one pipe, with pipe code 7, has an average temperature greater than 85; this is shown in the portal.

The next post will cover the full case: dealing with the simulated streaming data, delivering the results to the Service Bus, and alerting the owner using the reference data.


ASP.NET Core Dependency Injection


ASP.NET Core comes with an inbuilt Dependency Injection (DI) module, and we can register custom DI containers as well. This post explains the fundamentals of the inbuilt DI module available in ASP.NET Core.

Exploring deeper into the Service Registrations

Get the code for the below experiment from my GitHub.

ASP.NET Core provides 3 generic service registration types for custom services.

  • Singleton – One instance of the dependency serves all requests.
  • Transient – A different instance for each dependent call, thus creating different instances of the injected service even within a single request's call flow.
  • Scoped – A single instance of the dependency per request; within one request the same instance is used throughout the call flow.

Apart from the above 3 generic service registrations, ASP.NET Core provides other inbuilt service registration methods. Let’s see how these three generic service registration types work and how the lifecycles of the instances are handled.


Let’s have one common interface IMyService and create 3 different interfaces derived from it, one for each service registration type.

public interface IMyService
{
    Guid Id { get; set; }
}

public interface IMySingeltonService : IMyService { }

public interface IMyTransientService : IMyService { }

public interface IMyScopedService : IMyService { }

Then let’s implement the above interfaces with three different classes. All these classes create a new Guid in the constructor.

public class MySingletonService : IMySingeltonService
{
    public Guid Id { get; set; }

    public MySingletonService()
    {
        Id = Guid.NewGuid();
    }
}

public class MyTransientService : IMyTransientService
{
    public Guid Id { get; set; }

    public MyTransientService()
    {
        Id = Guid.NewGuid();
    }
}

public class MyScopedService : IMyScopedService
{
    public Guid Id { get; set; }

    public MyScopedService()
    {
        Id = Guid.NewGuid();
    }
}
In the constructors of the implementations we generate a Guid, and we’ll print it in the view to see how many times the service instances are instantiated. In order to do that, let’s register each service with the generic registration method matching its implementation name. (The generics syntax < > was not formatted correctly by the WordPress code tag, so the original snippet was pasted as an image.)


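Reconstructing that image, the registrations in Startup.ConfigureServices would look like this (the surrounding method and the AddMvc call are assumptions):

```csharp
public void ConfigureServices(IServiceCollection services)
{
    services.AddMvc();

    // each service registered with the lifetime matching its name
    services.AddSingleton<IMySingeltonService, MySingletonService>();
    services.AddTransient<IMyTransientService, MyTransientService>();
    services.AddScoped<IMyScopedService, MyScopedService>();
}
```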

We can inject the services into the HomeController with the following constructor, and we will print the Id of each service in the view.

private readonly IMySingeltonService _singletonService;
private readonly IMyTransientService _transientService;
private readonly IMyScopedService _scopedService;

public HomeController(IMySingeltonService singletonService, IMyTransientService transientService,
    IMyScopedService scopedService)
{
    _singletonService = singletonService;
    _transientService = transientService;
    _scopedService = scopedService;
}

public IActionResult Index()
{
    ViewBag.Singleton = _singletonService.Id;
    ViewBag.Transient = _transientService.Id;
    ViewBag.Scoped = _scopedService.Id;

    return View(ViewBag);
}
When we run the application we get the below results; two different requests are compared.


You can note that the Singleton Id is the same across different requests: only one instance of a service registered as Singleton is available across all requests.

The above implementation does not give the full picture for comparing Transient and Scoped registrations, as they both produce different instances in different requests. In order to understand their behavior we need to implement another service.

public interface IMyAnotherService
{
    Guid SingletonId { get; set; }
    Guid TransientId { get; set; }
    Guid ScopedId { get; set; }
}

public class MyAnotherService : IMyAnotherService
{
    private readonly IMySingeltonService _singletonService;
    private readonly IMyTransientService _transientService;
    private readonly IMyScopedService _scopedService;

    public Guid SingletonId { get; set; }
    public Guid TransientId { get; set; }
    public Guid ScopedId { get; set; }

    public MyAnotherService(IMySingeltonService singleton, IMyTransientService transient, IMyScopedService scoped)
    {
        _singletonService = singleton;
        _transientService = transient;
        _scopedService = scoped;

        SingletonId = singleton.Id;
        TransientId = transient.Id;
        ScopedId = scoped.Id;
    }
}

Make the required changes in the controller to accept IMyAnotherService.

private readonly IMySingeltonService _singletonService;
private readonly IMyTransientService _transientService;
private readonly IMyScopedService _scopedService;
private readonly IMyAnotherService _anotherService;

public HomeController(IMySingeltonService singletonService, IMyTransientService transientService,
    IMyScopedService scopedService, IMyAnotherService anotherService)
{
    _singletonService = singletonService;
    _transientService = transientService;
    _scopedService = scopedService;
    _anotherService = anotherService;
}

public IActionResult Index()
{
    ViewBag.Singleton = _singletonService.Id;
    ViewBag.Transient = _transientService.Id;
    ViewBag.Scoped = _scopedService.Id;

    ViewBag.AnotherSingleton = _anotherService.SingletonId;
    ViewBag.AnotherTransient = _anotherService.TransientId;
    ViewBag.AnotherScoped = _anotherService.ScopedId;

    return View(ViewBag);
}
Now we can register MyAnotherService with the different modes and check the instance outputs. The below figure explains the instance lifetimes; the same color indicates the same instance.


In simpler form, we can summarize this by how many times a constructor is called.

  • Singleton – Once in the application lifetime.
  • Transient – Every time the instance is requested, regardless of the request.
  • Scoped – Once per request, regardless of how many services use it.


When IMyAnotherService is added as a Scoped service, the below image shows two different requests.

The Singleton service remains the same across all requests.

The Transient service changes between HomeController and MyAnotherService even within the same request.

The Scoped service does not change within a request, since the same instance serves both HomeController and MyAnotherService, but it changes between requests.



Interesting scenario: IHttpContextAccessor

In the ASP.NET Core DI model the framework also provides additional registration methods for some known scenarios, like registering an EF DbContext using the AddDbContext method, which registers the DbContext as Scoped by default.

But the interesting scenario is registering IHttpContextAccessor as a Singleton, as shown below.

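The registration shown as an image in the original post is the one from the official documentation of that era:

```csharp
// IHttpContextAccessor is registered with a Singleton lifetime
services.AddSingleton<IHttpContextAccessor, HttpContextAccessor>();
```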

This service is used to access the HttpContext of the current request, so registering it as a Singleton, as the official documentation shows, seems to collide with the experiment we did above: a Singleton registration would not appear to give per-request access to the HttpContext.

But the framework handles it, and this is explained well in this blog post.


We now have an understanding of DI in ASP.NET Core and of some special inbuilt framework DI methods.

For business logic services it’s usually good to register them as Scoped, unless we have a generic, request-independent implementation of some function, like sending email.

API Architecture – Distinguish between Web Application and API

A RESTful service is a common element in any modern system, but not all RESTful service implementations are valid Web APIs. This first post of the API architecture series covers the idea of separating an API from web-application-bound RESTful implementations.

Often we see a figure analogous to the one below in many diagrams. Regardless of what the diagram is called, the direct meaning this figure gives is the separation of the API and its clients.

client and api

In web development, a Javascript client talks to RESTful endpoints designed specifically for it. There’s nothing wrong with this implementation; it is perfectly fine. But the concern is that such a system does not have an API: a mere technical implementation of a RESTful service does not create an API.

In order to claim an API-based architecture, the RESTful service implementation should have the following characteristics.


#1 Individually deployable

API assemblies should be deployable to a separate instance without the consuming client. Often Javascript clients are deployed together with the RESTful service, because these clients are developed along with the API. A separate deployment model gives the benefit of scaling the API independently when required.

#2 Stateless or centralized state management

APIs should either be stateless, getting state information from the client request, or use centralized state persistence. A centralized state cache is often an in-memory caching implementation like Redis.

#3 Avoid Client Commanded Implementations

I strongly advocate this point. Developers sometimes tend to do this when they find that performing the required view model construction in the client is not preferable. It is hard to draw the boundary, because some aggregate operations are better performed on the server, but some cases are pure violations. For example, assume an endpoint has a response body like the below JSON payload.


{
  "id": 1,
  "firstName": "Thurupathan",
  "lastName": "Vijayakumar"
}


Having another endpoint, just for the purpose of another view, with a response like the following JSON payload is a violation.


{
  "id": 1,
  "fullName": "Thurupathan Vijayakumar"
}


The above example is a very simple one, but I have seen worse implementations, where API responses contained color codes and CSS classes purely because the developers did not want to write code in Javascript. I would call such cases client commanded implementations.

BFF (Backend For Frontend) model implementations have different response messages based on the client, but that is different from view model construction. Likewise, features like sorting and pagination are not part of view model construction.

#4 No cookies

A RESTful service which is exposed as an API should NOT accept any information in cookies. This again happens due to tight coupling of a web application with the RESTful service: developers create a cookie, and subsequent requests from the Javascript client to the RESTful service send information from those cookies. All data from the client to the API should be in the query string, the request body, or HTTP headers.

#5 Documentation

This is not a must, but a near-must for any API implementation. There are many documentation libraries available, like Swagger and TRex, and API Management gateways also provide good API documentation services.

#6 Authentication

Most public developer APIs are secured by API keys (e.g. Google Maps). The key is often used not only for authentication but also for monitoring request rates. If the API is private (either to a single organization or a few accepted parties), the recommended practice is to have proper authentication. Mostly this is OAuth 2 based, and there are several token formats, JWT being the best known.
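As an illustration (the endpoint, key and token values are all hypothetical), the credential simply travels in a header; an API-key request and an OAuth 2 bearer-token request differ mainly in which header carries it:

```http
# Public developer API: identified by an API key
GET /v1/pipes/7/readings HTTP/1.1
Host: api.example.com
x-api-key: d41d8cd98f00b204e9800998ecf8427e

# Private API: OAuth 2 access token (here a JWT)
GET /v1/pipes/7/readings HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJ0aHVydSJ9.c2ln
```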

With those concerns addressed, we can have an API implemented for our systems. There are many readings on the different aspects of APIs, like documentation, hypermedia, versioning, authentication, caching and scopes, and you can find an ample amount of resources on the web. This post gives the fundamental idea: the separation of the API from the web application implementation. In the next post we will discuss implementing an API using the ASP.NET Core and EF Core stack.

Azure Elastic Pool and EF Core Architecture and Developer Workflow

Introduction to the Azure Elastic Pools and Elastic Jobs

Azure Elastic Database Pool is a service which shares database resources (DTUs) across many databases and makes managing them easy. It is the tool for solutions that follow a dedicated-database tenant isolation strategy. Apart from the resource elasticity between databases, Elastic Pools offer Elastic Jobs, a separate configuration which allows managing the databases from a single interface.

Having different databases for different tenants, or for any other isolation, comes with the need for a shared index of those databases. The application uses this shared index to know which database to connect to. Often this index is known as the master database.

There can be situations where a master database is not required for determining which database to connect to, when the application gets this information from other sources, such as a cache, configuration or request claims.

The Azure Elastic Database tools include a component known as the Shard Map Manager (a special database), which works along with the Elastic Database Client Library. The Elastic Database Client Library has good support for Entity Framework.

When it comes to EF Core, support in the Elastic Database Client Library does not seem to be actively available, and EF Core has its own differences compared to Entity Framework as well. This blog post addresses these issues.

Architecture Pattern for using Elastic Database Pool with EF Core

Assume we have multiple tenants with different dedicated databases, configured in a pool, and that Elastic Jobs are configured in the subscription. (A subscription can have only one Elastic Jobs interface, and multiple pools can use the same interface.) Read more about Elastic Jobs and how to configure them here.

The below figure shows the architecture model for the Azure Elastic Pool development.

Elastic Database Pool Architecture

The below points summarize the idea.

  1. The developer generates the code-first migrations from the IDE (typically the VS Package Manager Console) pointing to the Developer Db.
  2. The Developer Db can be any valid SQL instance – either an Azure SQL instance or a local SQL Server on the developer machine.
  3. The EF Core DbContext parameter-less constructor should point to the Developer Db.
  4. In the pool, apart from the tenant databases, create a database which holds only the schema – this database is known as the Zero Db.
  5. The Zero Db will be used to generate the delta of the schema. Developers can use a VS Database Project to generate the TSQL of the delta.
  6. The generated delta script will be executed via the Elastic Jobs interface, either through some sort of automation or using the Azure portal itself.


Developers can generate the scripts without having the Zero database, using the EF Core commands, but I highly recommend having a schema-only database in the pool, for the following reasons.

  1. The Zero Db can be used to generate the delta TSQL.
  2. At any given point we have a schema-only version of the production schema.
  3. Creating a new tenant database is easy at any point, as we can simply copy the Zero Db.
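Reason 3 works because Azure SQL Database supports creating a database as a copy of another in a single statement; a sketch with hypothetical database names:

```sql
-- Run in the context of the master database on the target server
CREATE DATABASE NewTenantDb AS COPY OF ZeroDb;
```

The copy is asynchronous; the new database appears immediately but the copy completes in the background.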


In order to operate an enterprise-scale solution, in addition to working code, the entire development and deployment flow should be in place.

In order to make developer migrations work seamlessly with the existing VS PMC based experience, and also to build the right database connection for the right tenant, we need a DbContext implementation along the following lines.

public class MassRoverContext : DbContext
{
    public MassRoverContext() { }

    public MassRoverContext(DbContextOptionsBuilder<MassRoverContext> optionsBuilder)
        : base(optionsBuilder.Options) { }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // when not configured (e.g. during migrations), point to the Developer Db here
        if (optionsBuilder.IsConfigured == false)
        {
        }
    }

    public DbSet<RoverEquipment> RoverEquipment { get; set; }
}

The parameter-less constructor of the DbContext is used for the migrations.

The constructor that takes a DbContextOptionsBuilder<MassRoverContext> is used in the application, which points it to the right database as determined by some logic in the application.

The below code snippet shows a common service which builds the correct DbContext for the application in ASP.NET Core.

public class ContextService : IContextService
{
    private readonly HttpContext _httpContext;
    private readonly ConnectionSettings _connectionSettings;

    public ContextService(IHttpContextAccessor httpContextAccessor, IOptions<ConnectionSettings> settings)
    {
        _httpContext = httpContextAccessor.HttpContext;
        _connectionSettings = settings.Value;
    }

    public MassRoverContext MassRoverContext
    {
        get
        {
            var sqlConnectionBuilder = new SqlConnectionStringBuilder(_connectionSettings.ZeroDbConnectionString);

            // determine the mapping based on some value. In this sample the database
            // is determined by the value of the tenantid header.
            sqlConnectionBuilder.Add("Database", _httpContext.Request.Headers["tenantid"].ToString());

            var dbOptionsBuilder = new DbContextOptionsBuilder<MassRoverContext>();
            dbOptionsBuilder.UseSqlServer(sqlConnectionBuilder.ConnectionString);

            return new MassRoverContext(dbOptionsBuilder);
        }
    }
}


The shared index data source which holds the tenant-to-database mapping is not shown in the above figure, since this shared index can be anything. In our example, for simplicity, assume the tenant id arrives with the request (here as a header) and holds the name of the database. This SHOULD NOT be used in production.

The registration of the services is not included in this post, since the entire flow of how the EF Core project and the ASP.NET Core project connect is outside its scope. But the full code is available in this repo.

Continuous Deployment & Automation

In this section let’s see how continuous deployment can be performed. We have the delta TSQL, generated either by comparing the schema between the Developer Db and the Zero Db or by some other mechanism.

Now we should prepare an automation script or a small program to submit the delta TSQL to the Elastic Jobs interface. The Elastic Job will then run the script across the databases in the pool.

I created a simple console application which submits the task to the Elastic Jobs interface. Access the code from this repo.

The Elastic Jobs SDK is still in preview at the time of this writing. I will be continuously updating this console application and will post a separate blog on how to plug this into VSTS Build and Release definitions.




Understanding GDPR and personal data

The General Data Protection Regulation (GDPR) is a law that imposes new rules on companies, government agencies, non-profits, and other organizations that offer goods and services to people in the European Union (EU), or that collect and analyze data tied to EU residents. The GDPR applies no matter where you are located.

I have been reading about the key aspects of the GDPR from the official site, and thought to summarize the points I came across, in order to understand what GDPR is and how we can make systems compliant with it.

  1. GDPR is a regulation that will take effect in May 2018.
  2. It gives EU citizens more control over their personal data.
  3. GDPR enforces controls over data collection, usage, storage, breach handling, transparency and expiry.
  4. GDPR imposes non-technical requirements as well in a few scenarios, e.g. a mandatory data protection officer in an organization in certain cases.
  5. It does not explicitly mandate encryption or end-to-end data protection, but the regulatory requirements will push applications dealing with personal data to use those technologies.
  6. Strict security audits, logs and documentation should be in place in order to be compliant with the GDPR.

GDPR defines ‘personal data’ as any information related to a natural person or ‘data subject’ that can be used to directly or indirectly identify the person.

According to this definition, a name, address, credit card number, social security number, bank account number, telephone number, license number, vehicle number or any other explicit identification information is personal data. But the following types of data also fall under the personal data category.

  1. Calls to customer care services, or any other voice-based services, where the voice of the user is recorded. Even if the recording takes place without a name and with full anonymity, the voice data itself should be treated as personal data.
  2. Any video surveillance recordings (CCTV) or other visual recordings should be treated as personal data (both at the edge and in cloud storage).
  3. Any form of biometric data should be treated as personal data.
  4. Drawings which represent a real scene, like a portrait of a person or a family, or anything which exhibits the socio-cultural behavior of a family or an individual, can be considered personal data.
  5. The value of an asset – this is just a number, but it is still considered personal data when it is linked to a person’s profile and can be used to estimate the person’s economic state and obligations.
  6. Call logs and any other data usage logs.
  7. Real-time geo-location monitoring and geo-location data, e.g. Uber drivers.
  8. Meeting minutes and similar official records with any relatable links or traceable information to a person.
  9. Any sort of medical imagery with traceable information.
  10. Photos, videos and voice recordings of any person, in any form, are personal data.
  11. Any non-aggregated data which reveals consumer patterns in goods and services.
  12. IP addresses – including dynamic IPs.

The above list is not exhaustive, but it briefly summarizes personal data as described at the above link. In my opinion, almost any data is personal data as long as it can be traced and tracked back to a person.

In certain scenarios GDPR requires organizations to meet non-technical compliance requirements as well, such as having a data protection officer.

Encrypting personal data is one aspect of GDPR, covered by the clause on ‘pseudonymous data’. This alone does not make a solution GDPR-compliant, because encrypted personal data is still considered personal data; but it gives some relaxation regarding security breaches and how a breach should be handled and notified.

In summary, all solutions should address the following technical and non-technical aspects:

  1. Why we collect and store the personal data
  2. How the personal data is used
  3. Transparency of the usage and sharing policies of the personal data
  4. Store personal data as pseudonymous data
  5. Continuous security auditing and monitoring
  6. Notification to the users upon breaches and policy changes

Azure B2C with custom attributes with predetermined values

Azure B2C is a large membership database which also provides tokens, sessions and the membership/authentication experience (sign-up, sign-in, forgot password, etc.). But there are some scenarios which are a little tricky, depending on how the entire solution is handled. Let me explain one such use case and describe different ways to handle it in B2C.

Case: You have an application which is a reselling portal, where a user can be either a seller or a buyer. During the registration/sign-up process the user type is detected automatically by the application, so the user does not need to select a type. The below diagram explains the case.

Figure 1


Question: In this case, why can’t we pass the parameter from step 1 which holds the user type value, populate that value in a hidden field in the custom.html (or the rendered mobile view) in step 2, and thus straightforwardly persist that information in B2C?

Answer: Since the rendering is controlled by B2C, no script execution is allowed in that context (I’m not sure whether there is any way to do this). Also, since the same HTML view is rendered in mobile and other native clients, passing the information from step 1 to step 2 by any means is not a safe option, even if it were possible.

So we end up with the problem of passing the user type information from step 1 to step 2 and instructing B2C to persist that information.


There are different solutions. The trade-off is always between how much control we take away from B2C and how much control we let B2C keep. This comes with a corresponding cost in development effort.

Solution 1: Let B2C handle the case as much as possible, and save application-controlled fields, like the user type, in a custom database column. Optionally, update the B2C custom attribute using the Graph API. Figure 2 explains this.


Figure 2


In this way we get the benefits of the B2C policies and of the auxiliary authentication services B2C offers, like password management and profiles. Most applications follow this approach without updating B2C back via the Graph API.

Solution 2: Take control from B2C into the custom application and use B2C as a membership database.

Figure 3

In this model, in step 1 some custom attribute values are determined (e.g. the user type) and passed to step 2, which is a view controlled purely by the developer. Step 2 passes the information to the application API in step 3. The server application updates B2C in step 4 and receives the JWT token in step 5. Finally, the application updates the database with the oid and other parameters in step 6.

What B2C could do in the future

In its current state, B2C has applications and policies, and they can be combined in many ways. One application can have many policies of the same type with different settings, and one policy can be used across many applications as well.

In custom rendering, B2C should allow hiding attributes and giving them default values. That way, to model the above scenario, we could have different policies with different settings and default values.

The primary critical requirements, shown in the screen below, that B2C should add are:

  1. Show/hide fields
  2. Set default values


Place your business right – Cost Effective Architecture

Recently, I have been working on building a product for a 'pure startup'. What I mean by 'pure startup' is not about how novel the concept is, but that development of the product needs to start from ground zero.

This scenario often requires the technical stakeholders to take part in shaping the product features and feature release plans. The technical stakeholders share, and believe in (to some extent), the vision of the entrepreneur.

Generally my role is to facilitate this process from the technical point of view, whilst aiding the entrepreneur in reaching his goals with his product.

Technical stakeholders tend to throw in all the great new stuff and deliver the product as if for a billion-dollar business; entrepreneurs are often occupied by the big dream of the product and can easily be misled away from the business goals.

The argument is not about avoiding great technologies and all the buzzwords; the argument is mostly about what we require at which stage, how big the idea is, how fast it can grow to hit the first million users, whether the current architecture would support that, and so on.

This confusion comes from a real experience: one recent startup wanted to go with one of the big cluster management / microservices platforms in Azure for a relatively simple operational business.

The technical stakeholders of the project felt that the microservices approach was not required at the beginning and suggested a simpler solution. The first question that popped up from the business stakeholders was: "Fine, but how much effort is required to move to a microservices model at a later stage, if it's required?"

The answer is simple: "WE DON'T KNOW, but we'll guide you through any technology issues you face in the product and make sure it works" – that's the commitment, and that's the way things work.

That does not mean we should never start with a complex architecture and heavyweight technologies; there's always a balance, and at the core we should discover what is right for the business. To help with this decision I coined the term Cost Effective Architecture.

As businesses grow and change, agility in the technology and the chosen platform becomes essential. Moving the business forward with the right technological decisions for the existing context, whilst considering future change in a cost-effective manner, is the core idea of Cost Effective Architecture (CEA).

Let's first categorize businesses into two categories based on the target users.

1. Internal systems – The user base is known before enrollment. These are often multi-tenant systems, where a tenant can be an organization, a department, or any such group. Sudden usage spikes and user base expansion are highly unlikely. We can call them tenant-based products as well, though they are not necessarily tenant-based. Intranet systems automatically fall under this category (I wonder whether anyone is still developing intranet solutions now). Good examples are Office 365 and Salesforce.

2. Public systems – Targeted at the general public; the user base is not known before enrollment. These are mostly user-based systems, where the concept of tenants is rarely observed. Sudden usage spikes and user base expansion are expected, depending on the business model. Any public content application falls under this category. Good examples are Spotify and LinkedIn.

Another main categorization of products is based on the current user base.

1. Existing systems – These are systems that have already gone to production and have decent battle time in the live environment. Usage patterns and loads are fairly predictable.

2. Startup systems – These are either not launched or not yet developed. They may have a solid business model/requirement, or they may simply be great evolving ideas from entrepreneurs.

These two categorizations leave us with the following 2×2 model.


Business Categorization of CEA

C1 – Internal systems with existing users. Internal systems can be more complex than some public systems, because they can have a large number of users and many integrations with legacy and non-legacy systems. Development changes occur due to technology invalidation, cost of maintenance, new feature requirements, or scale-out/up limitations for the predicted usage.

Since new user enrollment is mostly predictable and the existing usage of the systems is well known, design and technology decisions have the luxury of in-depth comparison and analysis. These systems often carry a bulkiness that makes them hard to move, which is another reason for slow development. Often collateral systems (APIs, mobile apps) are developed around C1 systems, while the core solution of the C1 system moves slowly towards change.

C2 – Public systems with existing users; the usage is somewhat known but unpredictable. Development of new features and changes is frequent. Technology decisions are based on the principle of keeping enough headroom for the next wave of new active users, so an abundance of computing resources is justifiable as long as the growth is predictable. Downtime is critical since these are public systems. Agility and DevOps are key survival factors.

C3 – Internal startup systems. Go-to-market or launch is a critical factor. More often than not, development of these systems begins with a fixed number of promised users. The business model is validated through implementation. Since these are internal systems, usage spikes are unlikely, but that should not lead to poor technology selection. The subsystems are somewhat understood and discovered.

C4 – Systems with the intention of reaching the market with possible business cases. Mostly a hyper spike is not expected, and market adaptability and usage need to be discovered and validated. Mission-critical infrastructure and advanced design decisions can be put on hold to save time and money. This does not mean implementing bad practices, but rather initiating development in a way that can be scaled to larger scenarios with significantly less effort; in the cloud, PaaS offerings help to achieve this. The primary focus is to go to market as soon as possible.

Note :

Though CEA has the above categorizations, it does not force ideas into a specific category just because a business requirement or case theoretically maps to that scenario. For example, a novel idea can be considered under C2 rather than C4 from the very first day of system design and development, as long as the user base and market anticipation are clear and well understood. In such cases, systems can start with costly, reliable, mission-critical infrastructure, and major design decisions and strategies can be settled upfront.