How the different DNAs of Amazon, Microsoft and Google influence their Cloud Platforms.

Disclaimer: This is an opinionated post. The views and opinions are solely based on my own experience and observations.

AAG – AWS, Azure & GCP

AWS, Azure and GCP come from Amazon, Microsoft, and Google respectively. These companies have different roots, values, strengths and weaknesses. Each of them has a different DNA, which influences their cloud services in diverse ways. This article offers a different perspective on how the DNAs of these organizations have been shaping their cloud businesses.

AWS – The Retail DNA

AWS carries the retail DNA rooted in Amazon’s e-commerce business culture. A few notable traits of the retail DNA are shipping fast to be first, focusing on volume rather than margins, and packaging goods under one’s own brand, known as private labeling.

Amazon launched AWS in 2006. Early adopters and open-source folks went ahead with AWS; this includes many of today’s successful startups that were scaling up during 2008-2013. Although Microsoft launched Azure within that period (it became generally available in 2010), it was less mature, and Microsoft did not have a good rapport with the open-source community at the time. Being first to market without serious competition, AWS took full advantage of that period.

AWS follows a continuous innovation cycle and keeps releasing new services even when they are less popular or useful only to a small set of customers. AWS does this to be first in the market, without worrying about the bottom line.

Another interesting trait of the retail DNA is private labeling. Private labeling is a technique retail players use to package common goods from suppliers under their own labels with some value additions. AWS uses this technique very cleverly. AWS has an inherent weakness of not having any established software or operating systems of its own. This does not play well for AWS when it comes to cloud lock-in or giving customers generous discounts on software licenses. However, using private labeling, AWS has been successfully battling this challenge by creating its own services: Aurora, a private-label take on MySQL/PostgreSQL, and Redshift are two successful examples.

Azure – The Modern Enterprise DNA

Azure has the DNA of a modern enterprise. The modern enterprise DNA combines old traits such as bottom-line focus, a partner ecosystem and speaking the corporate lingo with modern traits such as innovation, openness and platform strategy.

Azure is not a laggard when it comes to innovation; it has its own share of innovative services, with more focus on developer productivity and enterprise adoption. Azure Active Directory, Azure Cosmos DB, Azure Functions and Azure Lighthouse are a few of these enterprise-focused innovations.

Generally, Azure targets its innovations at stable markets where it anticipates greater adoption; it does not invest much in niche areas just to appear cool. This may stem from the traditional bottom-line-focused business orientation. Because of this trait, we sometimes see Azure terminate services at the preview stage without ever reaching General Availability, favoring stable, high-reach, bottom-line-focused innovations over the diversity of its service portfolio.

Having a rich partner ecosystem is another key strength of Microsoft. This has given Microsoft an unbeatable position in the hybrid cloud market with its Azure Stack suite. Azure Stack is a portfolio of products that extends Azure capabilities to any environment. It has three products: Azure Stack Edge, Azure Stack HCI and Azure Stack Hub. In other words, Azure Stack is Azure in different versions, loaded onto different hardware and bundled for customers with different hybrid cloud demands. This is possible for Microsoft only because of its long-standing partner ecosystem and OEM partner network.

GCP – Internet Services DNA

GCP has the DNA of an Internet services company; no surprise, as it comes from Google. Google leads in Internet-based consumer services; we all use Google services in our day-to-day lives. The Internet services DNA prioritizes individual services over a whole platform, and it prioritizes B2C over B2B.

GCP is the third-largest cloud provider by revenue, but the gap between GCP and Azure is big. GCP also faces serious competition from Alibaba Cloud.

GCP has all the required foundational building blocks of a modern cloud, but it lacks the rich portfolio of services that AWS and Azure have. GCP tries to sell the same thing under different packaging; for example, its API management service is listed both as ‘New Business Channels using APIs’ and ‘Unlocking Legacy Applications using APIs’. Those are two different use cases of the same product, not two different services. Some may argue this is an approach to attract customers with two different needs, but other cloud providers do not play the same trick in their product listings.

Google is a successful Internet services company, and it should have been the leader in cloud computing. Ironically, it did not happen, because Google did not believe in enterprise business. It was so focused on Internet-based services and on generating revenue through content advertising that individual users mattered more than big businesses. When Google realized that big corporates are the big customers for the cloud business, it was a bit too late, and it had to bring in leadership from outside to instill that thinking.

Google’s Internet services DNA has made GCP fragmented; the perception of GCP as one solid platform is largely missing. Most of us use GCP services without paying much attention to the whole platform. We use Google Maps in applications, Firebase has become a necessity for mobile development, and we use Google search APIs, but we see them as individual services, not as a single cloud platform. Single-platform thinking is essential to win enterprise customers, and the lack of that perception is a major downside for GCP.

However, it is not all bad for GCP; despite these odds, Google seems happy with what it is doing. Revenue is trending upward, and GCP has recently won a few notable enterprise customers.

Azure Lighthouse – A Cloud Native Managed Services Model for Service Providers

Microsoft recently announced a service called ‘Azure Lighthouse’. It allows managed service providers and customers to manage tenant access and delegation from a single interface, right in the Azure Portal. With some marketing garnish, I would like to call it a Cloud Native Managed Services Model. Let me take you through the fundamentals of Azure Lighthouse.

Before proceeding further, this post assumes you’re familiar with AAD concepts like tenants/directories, object IDs, service principals, RBAC, etc. They are not elaborated here.

Before diving in, let’s look at how existing managed service providers access their customer tenants. Generally, they use one of the following approaches.

  1. The service provider accesses the customer tenant as a guest user.
  2. The service provider accesses the customer tenant with a customer tenant user account.

Consider this example: Aventude Digital, with its own Azure tenant, is looking for a partner to manage its Azure resources. MassRover is a managed service provider; Aventude Digital reaches out to MassRover and requests its services. Bob, a support engineer from MassRover with the UPN bob@massrover.onmicrosoft.com, should gain access to the Aventude Digital tenant.

Scenario #1

Bob gets access to the Aventude Digital tenant as a guest user. In this case the Aventude Digital administrator, Linda, should invite Bob to her tenant with the required RBAC permissions. Once Bob accepts the invitation, he can access the Aventude Digital directory. When Bob logs in using his own UPN (bob@massrover.onmicrosoft.com), he sees two directories in Azure: the MassRover directory, where he is a direct member, and the Aventude Digital directory, where he is a guest user.

Bob can switch between them, access resources as per the granted permissions, and continue his support work. The invitation process is manual and repetitive. The image below shows how Bob accesses different tenants as a guest user.

aventude guest directories

Scenario #2

Bob gets a user account in the Aventude Digital tenant. The Aventude Digital administrator creates a user account in their directory for Bob, something like bob_ext@aventudedigital.onmicrosoft.com. Bob must use this account to access the Aventude Digital tenant. This becomes a mess when Bob manages many customers, because he has to switch between different tenants using different UPNs and the related passwords. Bob ends up maintaining a table of UPNs and passwords for each tenant he works for.

In short, guest access is the more common approach, but it is still only an AAD-level delegation. It is manual, and when Bob switches between directories, re-authentication takes place, so the experience is not smooth.

How Azure Lighthouse Improves This

Azure Lighthouse offers service providers a single control plane to view and manage Azure across all their customers with higher automation, scale, and enhanced governance. With Azure Lighthouse, service providers can deliver managed services using comprehensive and robust management tooling built into the Azure platform. This offering can also benefit enterprise IT organizations managing resources across multiple tenants.

At the core of Azure Lighthouse is Azure Delegated Resource Management; on top of that sits the Azure Portal based cross-tenant management experience. In addition, there are extended scenarios such as Marketplace managed service offers and managed applications.

The rest of this post covers the technical implementation of Azure Delegated Resource Management and the cross-tenant management experience.

Delegated access can be set up in two ways: by manually deploying the Azure Delegated Resource Management ARM template in the customer subscription, or by the customer accepting a Managed Service offer published in the Marketplace. This post covers the manual approach.

First, as a service provider, we should create the ARM template required to obtain Azure Delegated Resource Management permissions from the customer tenant. These permissions can be obtained at the subscription level or at the resource group level. The service provider prepares the required ARM template, and it should be executed against the customer subscription.

Below is the ARM template and the associated parameter file.

{
  "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "mspOfferName": {
      "type": "string",
      "metadata": {
        "description": "Specify the name of the offer from the Managed Service Provider"
      }
    },
    "mspOfferDescription": {
      "type": "string",
      "metadata": {
        "description": "Name of the Managed Service Provider offering"
      }
    },
    "managedByTenantId": {
      "type": "string",
      "metadata": {
        "description": "Specify the tenant id of the Managed Service Provider"
      }
    },
    "authorizations": {
      "type": "array",
      "metadata": {
        "description": "Specify an array of objects, containing tuples of Azure Active Directory principalId, a Azure roleDefinitionId, and an optional principalIdDisplayName. The roleDefinition specified is granted to the principalId in the provider's Active Directory and the principalIdDisplayName is visible to customers."
      }
    }
  },
  "variables": {
    "mspRegistrationName": "[guid(parameters('mspOfferName'))]",
    "mspAssignmentName": "[guid(parameters('mspOfferName'))]"
  },
  "resources": [
    {
      "type": "Microsoft.ManagedServices/registrationDefinitions",
      "apiVersion": "2019-06-01",
      "name": "[variables('mspRegistrationName')]",
      "properties": {
        "registrationDefinitionName": "[parameters('mspOfferName')]",
        "description": "[parameters('mspOfferDescription')]",
        "managedByTenantId": "[parameters('managedByTenantId')]",
        "authorizations": "[parameters('authorizations')]"
      }
    },
    {
      "type": "Microsoft.ManagedServices/registrationAssignments",
      "apiVersion": "2019-06-01",
      "name": "[variables('mspAssignmentName')]",
      "dependsOn": [
        "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
      ],
      "properties": {
        "registrationDefinitionId": "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
      }
    }
  ],
  "outputs": {
    "mspOfferName": {
      "type": "string",
      "value": "[concat('Managed by', ' ', parameters('mspOfferName'))]"
    },
    "authorizations": {
      "type": "array",
      "value": "[parameters('authorizations')]"
    }
  }
}
{
  "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "mspOfferName": {
      "value": "Aventude Ops Service"
    },
    "mspOfferDescription": {
      "value": "Aventude Ops Service for Azure Managed Customers Tier1"
    },
    "managedByTenantId": {
      "value": "261e3bf5-f768-49cc-a8bb-ab7dcc73817c"
    },
    "authorizations": {
      "value": [
        {
          "principalId": "6665e9a2-e27a-42f0-8ce1-203c03255695",
          "principalIdDisplayName": "Individual User",
          "roleDefinitionId": "b24988ac-6180-42a0-ab88-20f7382dd24c"
        },
        {
          "principalId": "52f00b53-e404-4b0e-9564-ffb8388702cd",
          "principalIdDisplayName": "User Group Id (recommended)",
          "roleDefinitionId": "b24988ac-6180-42a0-ab88-20f7382dd24c"
        }
      ]
    }
  }
}

The ARM template expects certain metadata, such as the managed service offering name and description, and most importantly the required delegated permissions (as authorizations). These authorizations are AAD principals (users / groups / service principals) paired with RBAC roles. The values are fed to the ARM template using the corresponding parameter file.

AAD principal IDs can be found in the relevant AAD blades (we need to use the respective Object IDs), and RBAC role IDs can be obtained from this link.
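If you prefer scripting the lookup over the portal, the IDs can also be pulled with the Azure CLI. This is only a sketch; Bob’s UPN is the example account from earlier, and on newer CLI versions the user’s object ID is exposed as id rather than objectId.

# Object ID of the AAD principal (Bob) in the service provider tenant
az ad user show --id bob@massrover.onmicrosoft.com --query objectId --output tsv

# GUID of the built-in Contributor role definition
az role definition list --name "Contributor" --query "[0].name" --output tsv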

Example: Bob’s object ID in the MassRover (service provider) tenant is 6665e9a2-e27a-42f0-8ce1-203c03255695, and we’re requesting Contributor permission for this user. The Azure RBAC ID for the Contributor role is b24988ac-6180-42a0-ab88-20f7382dd24c (obtained from the above link). This combination, along with a display name we provide, makes up one authorization record for delegated access management, as shown below.

azure lighthouse authorization snippet

We can add many different authorizations.

parameter file with different authorizations

Once the ARM template and the associated parameter file are complete, the customer should execute them in their subscription. In order to do this, a non-guest user from the customer tenant with Owner permissions on the subscription is required.

PS C:\Windows\system32> az deployment create --name AzureLightHouseDeployment1 --location southeastasia --template-file "C:\Users\Thuru\Desktop\light house blog\json\al.json" --parameters "C:\Users\Thuru\Desktop\light house blog\json\alparam.json" --verbose

It takes some time, and the CLI will output a JSON result.

I used two tenants for this testing: MassRover (service provider) and Aventude Digital (customer). The above command is executed against the Aventude Digital subscription, and the template was prepared with the parameters from MassRover (Bob is in the MassRover tenant).

After execution, in the MassRover tenant’s Lighthouse blade, under the My customers section, we can see Aventude Digital.

In the Aventude Digital tenant’s Lighthouse blade, under the Service providers section, we can see MassRover.
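The same registration can also be checked from the command line. A rough sketch, assuming the az managedservices command group available in recent CLI versions:

# Run in the customer (Aventude Digital) subscription context
az managedservices definition list --output table
az managedservices assignment list --output table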

This covers the basics of Azure Lighthouse, but it has some limitations at this point. One key limitation is that if Databricks is provisioned in a tenant, Azure Delegated Resource Management fails; there are some other limitations too.

If you’re a service provider, Azure Lighthouse gives you greater visibility by placing you in the Marketplace; this requires additional setup via the partner portal. Also, using service principal delegation, service providers can programmatically automate management tasks. Customers can view their service providers in one place, including the access permissions granted to them.
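As a sketch of what that automation could look like (the tenant and subscription IDs and environment variables below are placeholders, not values from this setup), a service principal from the provider tenant can work directly against a delegated customer subscription:

# Sign in with a service principal that was included in the authorizations
$secret = ConvertTo-SecureString $env:SP_CLIENT_SECRET -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential($env:SP_CLIENT_ID, $secret)
Connect-AzAccount -ServicePrincipal -Credential $cred -Tenant "<provider-tenant-id>"

# Delegated customer subscriptions appear alongside the provider's own subscriptions
Get-AzSubscription

# Switch to a delegated customer subscription and manage its resources, no guest invitation needed
Set-AzContext -Subscription "<customer-subscription-id>"
Get-AzResource | Select-Object Name, ResourceGroupName, ResourceType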

In this post I have covered only one path of Azure Lighthouse (subscription-level delegated resource management). Let me know your experience with Azure Lighthouse and any interesting combinations.

Enterprise data life cycle management using Azure Storage

Storage is a critical component in the enterprise world. Managing data and its life cycle matters in many respects, such as optimizing storage usage, managing cost, adhering to compliance and archival requirements, and security.

Primarily, data is stored in database systems (relational and non-relational sources) and as files (including data lakes and blobs); in addition, data resides in other systems like email servers, document systems, file shares, event and messaging pipes, logs and caching systems.

Laying out a comprehensive data strategy for an organization is a complex process. However, in most cases the data eventually lands in flat storage as its final destination, so managing that storage and its life cycle is an important task.

Let’s consider a simple backup storage scenario.

A relational data source, say a SQL Server VM, has the following backup requirement.

| Frequency   | Backup Type | # backups | Access Frequency |
|-------------|-------------|-----------|------------------|
| 4 hours     | Incremental | 42        | Medium           |
| Daily       | Full        | 30        | High             |
| Weekly      | Full        | 12        | High             |
| Monthly     | Full        | 12        | Low              |
| Semi-annual | Full        | 6         | Very Low         |
| Yearly      | Full        | 8         | Very Low         |

At any given time (assuming a complete 8-year span) there should be 110 backups maintained. Those 110 backups should be kept in the right storage based on access frequency and retention period.

Azure Storage provides access tiers which help us determine and automatically manage these storage requirements. Azure Storage (general-purpose v2) lets us define life cycle policies at the blob level.

The below diagram depicts this

storage tiers

As shown in the illustration, there are three access tiers: hot, cool and archive. The hot and cool access tiers can be set at the storage account level, while the archive tier is set at the individual blob level.
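As a quick sketch with the Azure CLI (the account, container and blob names below are placeholders): the default tier is set on the account, and archive is applied per blob.

# Set the default access tier of the account
az storage account update --name <account> --resource-group <rg> --access-tier Cool

# Archive an individual blob
az storage blob set-tier --account-name <account> --container-name backups --name annual/2018-full.bak --tier Archive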

We can define life cycle policies in which the movement of blobs between tiers, from hot to archive and all the way to deletion, is automated to match our requirements.

Sample life cycle policy of a blob.


{
  "rules": [
    {
      "enabled": true,
      "name": "yearly backup rule",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToCool": {
              "daysAfterModificationGreaterThan": 30
            },
            "tierToArchive": {
              "daysAfterModificationGreaterThan": 60
            },
            "delete": {
              "daysAfterModificationGreaterThan": 370
            }
          }
        },
        "filters": {
          "blobTypes": [
            "blockBlob"
          ],
          "prefixMatch": [
            "backups/annual"
          ]
        }
      }
    }
  ]
}

You can see that under the filters section we can specify the path prefix where the rule should be applied. In this way we can have more than one rule for a storage account, addressing different paths.
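Such a policy can be applied to the account with the Azure CLI as well. A sketch, assuming the JSON above is saved as policy.json and the account/resource group names are placeholders:

az storage account management-policy create --account-name <account> --resource-group <rg> --policy @policy.json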

Among the different options in Azure Storage, we need a standard general-purpose v2 account in order to get the access tier capability (a standard Blob storage account also supports access tiers). Standard storage is backed by magnetic drives.

Premium storage, on the other hand, is backed by SSDs but does not offer access tiers. Premium storage is intended for page blobs, such as virtual machine disks. In addition to page blobs, we can use premium storage for block blobs and file shares.

In summary, this is the high-level view of the available options in Azure Storage.

storage summary view

 

Deep Dive into Azure Managed Identities – Behind the scenes

Introduction

Some time back, when it was in preview, I posted an article on Azure Managed Service Identity (MSI) and how we can use it to eliminate storing credentials in code while avoiding the bootstrap problem. Read the linked article for more details.

This post is about Managed Identity; in short, Managed Identity is the new name for Managed Service Identity. Though the purpose and functionality stay the same, Managed Identities provide more granular control, Azure Portal options and improved SDK support, which convinced me to write a post.

Managed Identities is a feature of Azure Active Directory (free to use) that helps eliminate storing credentials in code. Since Managed Identities is a feature of AAD, it can be used to authenticate to any Azure service that supports AAD authentication. Let’s start from AAD and drill down into Managed Identities.

AAD Principals

AAD has two different kinds of principals: user principals and service principals. A user principal is a user object, and a service principal is an instance of an application registration.


So what is an application in AAD? An application is a global template for a service principal. The directory (AAD tenant) in which the application is registered is known as its home directory. When permissions/consent have been granted to an application, the service principal object is created.

Other than in the creation and configuration phases, what we deal with is a service principal. I recommend using the terms user principal, service principal and application precisely, to keep communication clear. You can read more about applications and service principals from this link.

Managed Identities

Managed Identities are a special type of service principal, and they come in two types.

User assigned Managed Identity – Available as a standalone Azure resource. It should be created manually; when created, a corresponding AAD application is registered (more details below). One Azure resource can have many user assigned managed identities. The life cycle of a user assigned managed identity is independent of the resource life cycle, meaning a user assigned managed identity can exist without being attached to any resource.

System assigned Managed Identity – These are created by Azure when enabling the Managed Identity for a service. The lifetime is scoped to the lifetime of the resource. One service can have only one system assigned Managed Identity.

The image below summarizes this. Mind that this is a specific diagram I have created to illustrate the AAD principals; AAD is not limited to the context below.

aad managed identities full.PNG

Enabling Managed Identities to a Service (App Service)

Let’s take a simple example of how we can use Managed Identities to access an Azure Key Vault that contains secrets. This article covers the creation and assignment of Managed Identities to an App Service.

app service capture

The System Assigned identity has to be enabled in the first tab, and in the second tab you can see the User Assigned identity (still in preview at the time of writing).

Enabling System Assigned Managed Identity 

Switch the status to ON and this will create a system assigned managed identity. Just to explain what is happening behind the scenes.

Before enabling it, run this PowerShell command (you need Global Administrator permissions in the tenant) to see the number of service principals in the AAD.


(Get-AzureRmADServicePrincipal).Count


This gives you the number of service principals in the AAD; after enabling the System Assigned Managed Identity, running the same command shows the count increased by one. Also, in the portal you can see the object ID of the service principal.


Executing the below will give the details


Get-AzureRmADServicePrincipal -ObjectId db9c6f9e-bea0-4325-b18c-dcd6eda668af
ServicePrincipalNames : {98b1ebaf-b6b2-4368-ba53-c36ae0551b90, https://identity.azure.net/4zuSVB9vvyfEk5wvTupj9aFQnGVY0bvqMPfQ9bTKrwk=}
ApplicationId : 98b1ebaf-b6b2-4368-ba53-c36ae0551b90
DisplayName : chimp01
Id : db9c6f9e-bea0-4325-b18c-dcd6eda668af
Type : ServicePrincipal

Behind the scenes, Azure has created a service principal for us. In the portal, under Enterprise Applications, search for the display name retrieved from PowerShell and you will see the associated service principal (make sure you have selected All Applications in the drop-down).

But this is a special kind of service principal, on which we cannot configure any explicit permissions. If you navigate to the Permissions section, you will notice that.
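As a side note, the portal is not the only way to enable this; a CLI sketch (the app name and resource group are placeholders) looks like the following.

# Enable the system assigned managed identity on an App Service
az webapp identity assign --name <app-name> --resource-group <rg>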

Enabling User Assigned Managed Identity

This is still in preview and sits in the second tab of the Identity blade. A user assigned Managed Identity is a standalone Azure resource; in this tab you can add an existing user assigned Managed Identity to the service, as shown below.

Screen Link 001


In order to create a User Assigned Managed Identity,  you can add it in the portal, as a separate resource. Search for User Assigned Managed Identity, and click create.


This is like any other Azure resource creation: fill in the details and create it.
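The same can be scripted as well; a sketch with placeholder names:

# Create a user assigned managed identity as a standalone resource
az identity create --name <identity-name> --resource-group <rg> --location southeastasia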


After creating the User Assigned Managed Identity, run the count script above again; you will see one more service principal in the AAD tenant.

Also, if you search for the resource name under Enterprise Applications (with All Applications selected), you will see the service principal.

Additionally, we can see the created Managed Identity as a resource in the specified Resource Group.


Now go back to the screen link 001, and you can add the created user assigned Managed Identity.


As you can see, we can add more than one user assigned managed identity to an Azure service.
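For completeness, attaching an existing user assigned identity from the CLI is a one-liner in recent CLI versions; a sketch with placeholder names and the identity’s full resource ID:

az webapp identity assign --name <app-name> --resource-group <rg> --identities /subscriptions/<sub-id>/resourcegroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>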

Continuation

We have created and assigned Managed Identities to our service; the next article will explain how to use them, both in production and in development.

 

 

Democratizing Enterprise Cloud in Azure

Cloud is the new normal; almost all enterprises are going through, or at least planning, their cloud adoption. Gone are the days when enterprise IT dealt with big chunks of metal.

Though cloud adoption is at its peak, I rarely see democratized cloud adoption in enterprises; cloud is often used as a centralized IT hosting solution. In this article, let’s analyze the reasons for this and the options available in Azure to enable democratized cloud adoption with enterprise governance.

It is predicted that 83% of workloads will be running in some form of cloud by 2020, with 41% on public cloud.

where IT workloads will run in 2020 : aventude

https://www.logicmonitor.com/wp-content/uploads/2017/12/LogicMonitor-Cloud-2020-The-Future-of-the-Cloud.pdf

Cloud is not only the successor of on-premises IT assets and management; it has evolved to provide agility and innovation at scale. These aspects have been changing the way organizations deal with technology, along with other techno-cultural and techno-commercial shifts like DevOps, PaaS and Opex.

public cloud drivers aventude

https://www.logicmonitor.com/wp-content/uploads/2017/12/LogicMonitor-Cloud-2020-The-Future-of-the-Cloud.pdf

As per the above graph, the key motives are agility, DevOps and innovation.

In order to leverage the full potential of the cloud, enterprise IT must deliver cloud in its real essence. This helps cloud adoption without putting those key motives under threat.

If your enterprise has cloud but still requires calls, emails and tickets to spin up a resource or make a change, it kills the agility the cloud naturally offers. It’s like buying a Ferrari and restricting it to 20 km/h.

Once the agility is killed, innovation is blocked, and soon the cloud becomes a mere hosting solution.

A successful enterprise cloud adoption is not just having things in the cloud; it should be democratized with proper governance, so that agility is preserved while control is maintained.

What keeps enterprises from democratizing their cloud adoption?

In most enterprises, cloud adoption is strictly controlled by IT, often hampering business agility and the cadence of digital transformation. There are several reasons for this.

  • Cloud Sprawl – Organizations fear cloud sprawl, the unwanted/uncontrolled cloud footprint that leads to unnecessary cost.
  • Security – Concerns about security implementations: how resources should be created, linked, managed and monitored. This knowledge mostly stays with IT teams and is often sensitive, which leads IT to keep management to itself.
  • Governance and Policies – Organizational policies on access levels and governance must be adhered to; this is internal organizational knowledge that often remains tacit. Examples: organizational policies on firewall settings, patch administration and so on.
  • Unified tools and licenses – Larger enterprises, especially those with a complex IT structure, need to leverage the maximum return on the investments they have made in tools and licenses. So certain tools and licenses are commonly used and certain things are prohibited (partner relationships also play a significant role here). Historically, IT has held the knowledge and the relationship management around these tools and license offerings, which creates a dependency on IT to decide on them. Examples: which licenses to bring to the cloud, what is available, whether there are alternative in-house tools and so on.
  • Lack of cloud knowledge – Business stakeholders often get confused about cloud offerings and compare things in the wrong ways; this kind of experience leads IT to keep the cloud as much of a black box as possible and forces IT to manage it centrally.
  • Centralized culture – Enterprises have cultural problems that create authority and knowledge pools, which block the democratization of technology and decision making.

With all these challenges, finding the right balance between autonomy and governance is the key.

What Azure has in place?

Earlier, Azure subscriptions were part of a tenant, and under a subscription we had resource groups and then the resources. This hierarchy is very basic, and it does not have the flexibility to govern and manage enterprise complexity.

Azure now has new hierarchical elements, management groups, for structuring the enterprise cloud footprint closer to the organizational structure.

The figure below shows the new structure.

azure management group hierarchty

These management groups can have policies attached to ensure governance. Policies can be set at any level, and by default assignments are inherited by the levels below.

Policies can be very granular, such as restricting resource types, SKUs and locations, or enforcing security aspects like patching and endpoint controls.
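As a rough sketch of how this looks in practice (the management group name is made up, and the policy definition GUID shown is the built-in ‘Allowed locations’ policy; verify it in your tenant with az policy definition list):

# Create a management group and move a subscription under it
az account management-group create --name hr-mg --display-name "HR"
az account management-group subscription add --name hr-mg --subscription <subscription-id>

# Assign a built-in policy (e.g. restrict deployment locations) at the management group scope
az policy assignment create --name allowed-locations --scope "/providers/Microsoft.Management/managementGroups/hr-mg" --policy "e56962a6-4747-49cd-b67b-bf8b01975c4c" --params '{ "listOfAllowedLocations": { "value": [ "southeastasia" ] } }'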

Use Cases and structuring

There’s no hard and fast rule on how to structure management groups and subscriptions, but it is often better to follow the organizational decision tree. Below are some common structuring approaches.

One organization with departmental separation

aventude: departmental management group structure

Global organization with geographic footprint

aventude : global management group structure

Conglomerates

aventude : conglomerate management group structure :

 

Once the right policies are in place, IT can take a relaxed approach: a development team simply cannot create that oversized VM you were always afraid of.

Though the hierarchical approach above gives lots of flexibility, in certain cases you may still find it challenging, especially in groups of companies where each company has its own CIO office while some policies are controlled centrally. When these business units use different tenants, it adds even more complexity to the picture.

Regardless of the tools, the key point I want to stress in this article is that in enterprise cloud adoption, IT teams and management should focus on democratizing IT as much as possible while keeping the governance policies intact. Too much control in a central place hampers the agility of the cloud and kills the momentum of digital transformation.

 

 

 

Optimizing Web delivery of the modern front end Applications

JavaScript-based front end frameworks have achieved unprecedented dominance in application development, even beyond the web. Single Page Application (SPA) delivery is one big aspect of modern software development.

Though the engineering side has changed over time with many frameworks and tools, the underlying fact that the output is a set of static files hasn’t changed. This suggests serving that static content from locations closer to the consumer, rather than from a remote web server.

This gives high performance by reducing network latency. In this article, let’s see how to deliver a SPA, or any static content, with Azure and Azure DevOps in an optimal setup in terms of performance and cost.

Approach

We will follow the approach below:

  • Enabling and hosting the static website in Blob Storage
  • Setting up Azure DevOps pipeline
  • Setting Custom Domain
  • Optimization with edge using CDN and enable SSL
  • Azure DevOps considerations in CDN delivery

Enabling & hosting static website in Blob

Create a standard blob storage account in Azure. The static website feature is available by default, but you have to enable it before use.

Static website hosting has a special container named $web, which is the www root of the static website.

Normally, Blob storage does not allow us to create containers with non alphanumeric characters, but this is a special container created for static website hosting. 

You will get two endpoints, primary and secondary; both point to the index document. You can upload the index document, and optionally configure an error document as well.

In this case index.html is used for both. For testing purposes, just upload a simple HTML file named index.html, then browse either of those endpoints and you should see the uploaded index.html.

This confirms that Blob storage static website hosting is enabled and working properly.
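The enable-and-test steps can also be scripted; a sketch with a placeholder account name (note the single quotes around $web so PowerShell doesn’t treat it as a variable):

# Enable static website hosting and set the index/error documents
az storage blob service-properties update --account-name <account> --static-website --index-document index.html --404-document index.html

# Upload a test page into the special $web container and read the web endpoint
az storage blob upload --account-name <account> --container-name '$web' --name index.html --file ./index.html --content-type "text/html"
az storage account show --name <account> --resource-group <rg> --query "primaryEndpoints.web" --output tsv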

Setting Azure DevOps Pipeline

Now we have to set up the DevOps pipeline for Continuous Integration and Continuous Deployment. Regardless of the framework you use for development (React, Angular, Vue, WebAssembly or whatever came out this morning), at the end of the day the build artifacts should be bundled as static files.

Different frameworks require different build steps, and the steps vary with the project context as well. Once the build is completed, the artifacts should be uploaded to the $web container of the Blob storage account.

In Azure DevOps you can use the Azure File Copy build step to achieve this. It copies the published artifacts to the specified container.

Note: use version 2.* of the task (still in preview as of this writing); the previous versions complain that a container name containing the $ character cannot be validated.
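An alternative, if you prefer a CLI step in the pipeline, is uploading the build output with the Azure CLI; a sketch assuming the bundled files sit in ./dist:

az storage blob upload-batch --account-name <account> --destination '$web' --source ./dist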

Custom Domain in Azure Blob static website hosting

Let’s set up a custom domain for our static website; this is one important step you will need to complete for production.

You can use your DNS management or migrate your domain to Azure DNS Zone.

I have used an Azure DNS zone. Go to your DNS settings and create a CNAME record pointing to one of the endpoints, as below.

You cannot create a DNS ‘A’ record here, because Storage doesn’t provide an IP address.

Because we do not have an ‘A’ record mapping in the DNS, the downside is that we can browse http://www.28368833.com but we CANNOT resolve http://28368833.com.

Optimization with edge using CDN and enable SSL

You can further optimize delivery by bringing the content files to a CDN; configure a CDN endpoint in front of the Azure Blob storage.

Delivering via CDN also allows SSL to be enabled.

Create a CDN endpoint in a CDN profile.

Select the Custom Origin and enter the static website host name of the Blob storage. DO NOT select the Storage as origin type.

Now if you browse the CDN endpoint (https://aventude-spa.azureedge.net) you will see the web page (the change needs some time to propagate).

Since we have changed the delivery address to the CDN, we now have to map the domain to the CDN endpoint. This is as straightforward as the previous step: create a CNAME entry pointing to the CDN endpoint.

Once done, you can navigate to the custom domains in the CDN endpoint and enable the custom domain HTTPS.

Azure DevOps Considerations in CDN Delivery

When delivery is optimized via CDN, whenever we publish artifacts to the Blob storage we either have to purge the CDN or wait for the content to propagate.

In most cases, purging is the recommended approach. Azure DevOps has a handy Purge Azure CDN Endpoint build step.

This step triggers the purge operation, and changes are available as soon as the purge is completed.
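The same purge can be done from a CLI step as well; a sketch with placeholder profile and resource group names (the endpoint name matches the example above):

az cdn endpoint purge --resource-group <rg> --profile-name <cdn-profile> --name aventude-spa --content-paths '/*'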

Azure CDN also provides DSA (Dynamic Site Acceleration) delivery; in that case a purge is not required, because the content is not cached in the CDN.

In DSA mode, what the CDN does is optimize the route from the caller to the origin in the best possible way. If your static content changes frequently, this approach is recommended over purging.

DSA should be enabled at the time of CDN endpoint creation.

Following drawing summarizes the whole idea here.

You can choose the delivery at the Blob storage level or at the CDN level. The mechanism from the Blob storage to CDN changes based on the update frequency of the content.

Also note, as stated earlier, that neither the Blob storage endpoint nor the CDN allows an A record mapping in the DNS. This may be a drawback, but in most practical cases front end applications are delivered on a subdomain URL like app.<domain>.com.

In case you need to resolve an A record like https://28368833.com, you should do a URL rewrite from the destination IP mapped to the A record.

Service mesh in Service Fabric

Introduction

Microservices are here to stay, and we can witness the increasing popularity and the maturing technology stack that facilitates them. This great article, which explains the maturity of microservices and the 2.0 stack, mentions three key aspects.

  1. Service mesh
  2. Matured orchestrators
  3. RPC based service protocols.

This post focuses on the communication infrastructure in Service Fabric; a service mesh is about the communication infrastructure in a microservices / distributed system platform.

First, let’s look at what a service mesh is. In the simplest explanation, a service mesh is all about service-to-service communication. Say service A wants to talk to service B: service A would then need all the network and communication functionality and the corresponding implementations, in addition to its business logic. Implementing that network functionality makes service development complex and unnecessarily big.

Service mesh abstracts all or the majority of the networking and communications functionality from a service by providing a communication infrastructure, allowing the services to remain clean with their own business logic.

So, with that high-level understanding, if we do some googling and summarize the results, we arrive at a definition of a service mesh with these two key attributes.

  • Service mesh is a network infrastructure layer
  • Primary (or the sole) purpose is to facilitate the service to service communication in cloud native applications.

Cloud native?? (wink) Do not worry too much about that; for the sake of this article, it is safe to read it as a distributed system’s service communication.

imgpsh_fullsize

Modern service mesh implementations are proxies that run as sidecars for the services. Generally an agent runs on each node, the services running on the node talk to the proxy, and the proxy does the service resolution and performs the communication.

When Service A wants to talk to Service B

  1. Service A calls its local proxy with the request.
  2. The local proxy performs service resolution and makes the request to Service B.
  3. Service B replies to the proxy running in Container 1.
  4. Service A receives the response from its local proxy.
  5. Service B’s local proxy is NOT used in this communication; only the caller needs a proxy, not the respondent.
  6. Service A is NOT aware of the service resolution, resiliency and other network functionality required to make this call.

There are notable service mesh implementations in the market: Linkerd and Istio are quite famous, Conduit is another, and there are many more. This is a good article explaining the different service mesh technologies.

The mentioned service mesh implementations are known in the Kubernetes and Docker based microservices world, but what about a service mesh in Service Fabric?


Service mesh is inherent in Service Fabric

Service Fabric has a proxy-based communication system. Whether to call it a service mesh depends on the agreed definition of a service mesh; typically there should be a control plane and a data plane in a service mesh implementation. Before diving into those details, let’s see the proxy-based communication setup available in Service Fabric.

Reverse Proxy for HTTP Communication

SF has a Reverse Proxy implementation for HTTP communication. When enabled, this proxy runs as an agent on each node. The reverse proxy handles service discovery and resiliency in HTTP-based service-to-service communication. If you want to read about the more practical aspects of the Reverse Proxy implementation, this article explains service communication and the SF reverse proxy.

The Reverse Proxy runs on port 19081 by default and can be configured in the ClusterConfig.json.


{
  ............
  "reverseProxyEndpointPort": "19081"
  ............
}

In the local development machine this is configured in the clusterManifest.xml

<HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />

When Service A wants to call Service B’s APIs, it calls its local reverse proxy with the following URL structure.

http://localhost:{port}/{application name}/{service name}/{api action path}

There are several variations of the reverse proxy URL, depending on the kind of service being called. This is a detailed article about the Service Fabric Reverse Proxy.
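As a small sketch (the application, service and action names below are made up for illustration), a call through the local reverse proxy from PowerShell looks like this:

# Call Service B's API via the node-local reverse proxy on the default port 19081
Invoke-RestMethod -Uri "http://localhost:19081/MyApp/ServiceB/api/values" -Method Get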

RPC Communication in Service Fabric

RPC Communications in Service Fabric are facilitated by the Service Fabric Remoting SDK. The SDK has the following ServiceProxy class.

Microsoft.ServiceFabric.Services.Remoting.Client.ServiceProxy

The ServiceProxy class creates a lightweight local proxy for RPC communication and is provided by the factory implementation in the SDK. Since we use the SDK to create the RPC proxy, in contrast to the HTTP reverse proxy, it has an application-defined lifespan and there is no agent running on each node.

Regardless of the implementation, both HTTP and RPC communication are natively well supported by Service Fabric, with a sidecar-style proxy model.


Data Plane and Control Plane in Service Fabric

From the web-inferred definition of a service mesh, it has two key components (note, now we’re getting into the details of service mesh) known as the data plane and the control plane. I recommend reading this article, which explains the data plane and the control plane in a service mesh.

The in-built sidecar-style communication proxies in Service Fabric form the network communication infrastructure, which represents the data plane component of the service mesh.

The control plane is generally a bit confusing to understand, but in short, it is safe to assume the control plane holds the policies to manage and orchestrate the data plane of the service mesh.

In Service Fabric, a control plane as per the complete definition in the above article is not available. Most of the control plane functions are application-model specific and implemented by developers, and some are built into the communication and federation subsystems of Service Fabric. The key missing piece in the control plane of Service Fabric is a unified UI to manage the communication infrastructure (the data plane).

The communication infrastructure cannot be managed separately from the application infrastructure; thus a complete control plane is not available in Service Fabric.

With those observations, we can conclude:

Service Fabric’s service mesh is a sidecar proxy based network communication infrastructure, which leans heavily towards the data plane attributes of a service mesh.

Service Fabric placement constraints and cluster planning : Virtual Clusters

Introduction

This article explains how to achieve the right service placement strategy and Service Fabric (SF) cluster capacity planning. I have written this post as a continuation of the previous article; continuing it allows me to extend the same contextual problem and find solutions.

According to the previous article, we should place WFE services on a certain set of nodes exposed to the load balancer (LB), and internal services on a different set of nodes that are not exposed to the LB and that optionally have access to the backend database infrastructure.

In fact, what I have tried to achieve is a typical infrastructure setup with a DMZ and a non-DMZ; the difference is that I have used a single SF cluster to hold both.

SF is such a powerful and flexible platform that you can map many scenarios like this onto it. In SF, we can achieve these logical splits using placement constraints. In their simplest form, placement constraints work based on the properties we set on the nodes.

Node properties are key-value pairs used to tag nodes. Through the application, we then instruct SF to place certain services on the nodes that satisfy the placement constraint rules.

A placement constraint is a logical composition of node properties which yields a Boolean value at run time.

NodeProperty1 == "super" && NodeProperty2 == "nvidGPU"

SF will find the nodes that meet these criteria and place the service on them. We decorate the nodes with node properties, reference them from the application, and put placement constraints on the services.

You can configure the node properties in the Azure portal under the node types; if you’re running an on-premises setup, you can configure them in the ClusterConfig.json. Like any configuration, placement constraints can also be parameterized in the ApplicationManifest.xml using the corresponding parameters XML file. This article describes it very clearly.
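For a quick sketch of how this looks from the Service Fabric PowerShell module (the cluster endpoint, application and service names here are illustrative), you can inspect node properties and apply a constraint outside the manifest as well:

# Connect to the cluster (local dev cluster shown)
Connect-ServiceFabricCluster -ConnectionEndpoint "localhost:19000"

# Inspect nodes, their types and fault/upgrade domains
Get-ServiceFabricNode | Select-Object NodeName, NodeType, FaultDomain, UpgradeDomain

# Update the placement constraint of a running stateless service
Update-ServiceFabricService -Stateless -ServiceName fabric:/MyApp/WfeService -PlacementConstraints "NodeType == ex"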

Virtual Clusters

Let’s see how to set up the cluster. In a sample setup with 6 nodes and FD:UD = 6:6, the DMZ and non-DMZ split is made as below; here the DMZ has 2 nodes and the non-DMZ has 4 nodes.

FD : Fault Domain, UD : Update Domain

cluster setup - virtual clusters

Nodes are marked with NodeType property ex or nex. WFE services have the placement constraint  (NodeType == ex) and internal services have the placement constraint (NodeType == nex).

Node properties create the logical idea of a DMZ; the infrastructure and network configuration give the real separation. In this case we placed the ex nodes and nex nodes in different networks and additionally configured a software firewall between the two subnets.

So this placement strategy creates two virtual clusters inside the real cluster. WFE services are placed in the DMZ (red box) and internal services are placed in non DMZ (yellow box).

Dive Deeper

The above virtual cluster setup creates some challenges in cluster planning. For example, though we have FD:UD = 6:6, by imposing the constraint, WFE services effectively get a FD:UD = 2:2 cluster and internal services get a FD:UD = 4:4 cluster.

So overall cluster planning, and how SF makes placement decisions, is better understood when simulated. Before diving in, I highly recommend reading this article.

So we know that when setting up the cluster we have to specify the FDs and UDs; in fact, it is the most important step.

In its simplest form the FD:UD ratio is a 1:1 setup, which serves the majority of scenarios.

I have played around with this 1:1 mode, and I don’t think I will look into other ratios unless there’s a quirky requirement. Also, if you’re using an Azure cluster this is the default setup, and I’m not sure whether you can change it. 😉

Though we can have any number of nodes in the cluster, the placement of a service is decided by the availability of FDs/UDs; just increasing the number of nodes in the cluster will not result in a capacity increase.

First, let’s look at how SF places services when no placement constraints are defined. The default placement approach of SF is the adaptive approach, a mix of two approaches known as Maximum Difference and Quorum Safe.

  • Maximum Difference is a highly safe placement approach where no two replicas of a single partition are placed in the same FD/UD.
  • Quorum Safe is a minimal-safety approach, chosen when specific conditions are met. Here SF tries to be economical with node capacity: the replicas of a single partition that make up the quorum are treated in the Maximum Difference way, and the others may be placed in the same FD/UD.

Instance / Replica: The term instance refers to copies of a stateless service and replica refers to copies of a stateful service, but in this article I use the term replica for both.

Quorum: A quorum in a stateless service is the number of requested (instance count) replicas, and a quorum in a stateful service is the number of requested minimum replica set size.

If you have read the recommended article, we can summarize the placement approach of SF with a simple pseudo code like below.

rs: replica size, fd: fault domain, ud: update domain, n: number of nodes

if ( rs % fd == 0 && rs % ud == 0 && n <= (fd * ud) )
        return "quorum safe"
else
       return "maximum difference"

SF deciding on an approach does not by itself yield a successful placement; this is only the decision about the placement strategy. Once the decision is made, SF looks for available nodes that meet the placement criteria.

If there are not enough nodes to place the services, SF will raise either an error or a warning, depending on the situation.

FD:UD = 1:1 Case with Virtual Cluster

The below table shows the cluster simulation. I created this Excel sheet to understand the cluster and added some functions to simulate it; I have translated the high-level logical decisions SF makes into simple Excel functions.

Download from : Cluster Visualization Excel

a1

The first section of this report shows the scenario without any placement constraints, so all FDs/UDs and all nodes are available to all the services.

The replica minimum is the must-have replica count for a partition of a stateful service, and the target replica count is the desired number of replicas for the partition. Stateless services have the replica minimum equal to the target number of replicas, because there is no notion of a minimum replica count for stateless services.

Observations

  1. Rows #15 and #16 – The stateless service replica count is greater than the available FDs/UDs. Though they use different approaches, the bottom line is that the cluster does not have enough FDs/UDs; SF reports an ERROR.
  2. Row #9 – The stateful service minimum replica set size is greater than the available FDs/UDs; SF reports an ERROR. This is very similar to the case above.
  3. Row #10 – The stateful service minimum replica set size is lower than the available FDs/UDs, but the target replica size is higher; SF reports a WARNING.
  4. Row #16 – The stateless service replica count is greater than the available FDs/UDs. Obviously, increasing the number of nodes does not help, and SF will not use them as long as the FDs/UDs are not expanded. In row #17 the same scale is achieved with the optimal setup.
  5. Rows #22 and #23 – These look the same but use different approaches. Both run in the warning state because both approaches meet the minimum replica size but not the target replica size.

The second section shows the cluster implementation with the placement constraints, so the report is filled with FD:UD = 2:2 for ex and FD:UD = 4:4 for nex, visualizing them as two different clusters.

Summary

Here I’ve summarized things for quick decision making.

Rule #1: In stateless services, replicas CANNOT scale beyond the number of valid fault domains in the cluster. Trying to do so will cause an error.

Rule #2: In stateful services, the configured minimum replica count of a partition (which cannot be lower than 3) CANNOT scale beyond the number of valid fault domains in the cluster. Trying to do so will cause an error.

Rule #3: Whenever possible, SF tries to be economical in its placement decisions and does not use all nodes. Consider rows #18 and #19: in #19, SF has 4 nodes in four different FDs/UDs but still decides Quorum Safe.

Like static node properties, there can be dynamic node properties which are also considered in decision making and influence the available FDs/UDs; I have not covered those cases in this article.

In fact, to summarize the bottom line:

If you’re going to scale your service (stateless or stateful) to x copies, then you should have a minimum of x FDs satisfying all specified placement constraints of that service.

It sounds very analogous to a typical stateless web application scale out. 😉