How the different DNAs of Amazon, Microsoft and Google influence their Cloud Platforms.

Disclaimer: This is an opinionated post. The views and opinions are solely based on my own experience and observations.

AAG – AWS, Azure & GCP

AWS, Azure and GCP come from Amazon, Microsoft, and Google respectively. These companies have different roots, values, strengths and weaknesses. Each of them has a different DNA, which influences their cloud services in diverse ways. This article offers a different perspective on how the DNAs of these organizations have been shaping their cloud platforms.

AWS – The Retail DNA

AWS has the retail DNA, rooted in Amazon’s e-commerce business culture. A few notable traits of the retail DNA are shipping fast to be first, focusing more on volume than on margins, and packaging goods under its own brands, a practice known as private labeling.

Amazon launched AWS in 2006. Early adopters and open-source folks went ahead with AWS; this includes many of today’s successful startups that were coming up during 2008-2013. Although Microsoft launched Azure later, in 2010, it was less mature, and Microsoft did not have a good rapport with open-source communities then. Being first to market without serious competition, AWS took full advantage of the situation during that period.

AWS follows a continuous innovation cycle and keeps releasing new services even when they are less popular or useful only to a small set of customers. AWS does this to be first in the market, without worrying about the bottom line.

Another interesting trait of the retail DNA is private labeling. Private labeling is a business technique retail players use to package common goods from suppliers under their own labels with some value additions. AWS uses this technique very cleverly. AWS has an inherent weakness of not having any established software or operating systems of its own. This does not play well for AWS when it comes to cloud lock-in or giving customers generous discounts on software licenses. However, using private labeling, AWS has been successfully battling this challenge by creating its own services: Aurora, a private label of MySQL/PostgreSQL, and Redshift are two successful examples.

Azure – The Modern Enterprise DNA

Azure has the DNA of a modern enterprise. The modern enterprise DNA combines old traits, such as a bottom-line focus, a partner ecosystem and speaking the corporate lingo, with modern traits such as innovation, openness and a platform strategy.

Azure is not a laggard when it comes to innovation; Azure has its own share of innovative services, with more focus on developer productivity and enterprise adoption. Azure Active Directory, Azure Cosmos DB, Azure Functions and Azure Lighthouse are a few of its many enterprise-focused innovations.

Generally, Azure targets its innovations at stable markets where it anticipates greater adoption; it does not invest much in niche market areas just to appear cool. This may be because of its traditional bottom-line-focused business orientation. Because of this trait, we sometimes see Azure terminate services at the beta stage without ever releasing them to General Availability, favoring stable, high-reach, bottom-line-focused innovations over diversity of the service portfolio.

Having a rich partner ecosystem is another key strength of Microsoft. This has given Microsoft an unbeatable position in the hybrid cloud market with its Azure Stack suite. Azure Stack is a portfolio of products that extends Azure capabilities to any environment. It has three products: Azure Stack Edge, Azure Stack HCI and Azure Stack Hub. In other words, Azure Stack is Azure in different versions, loaded onto different hardware and bundled for customers with different hybrid cloud demands. This is possible for Microsoft only because of its long-standing partner ecosystem and OEM partner network.

GCP – Internet Services DNA

GCP has the DNA of an Internet services company; in fact, there is no surprise there, as it comes from Google. Google leads Internet-based consumer services; we all use Google services in our day-to-day lives. The Internet services DNA prioritizes individual services over a whole platform, and it prioritizes B2C over B2B.

GCP is the third largest cloud provider by revenue, but the gap between GCP and Azure is big. GCP also faces serious competition from Alibaba Cloud.

GCP has all the required foundational building blocks of a modern cloud, but it lacks the rich portfolio of services that AWS or Azure has. GCP tries to sell the same thing under different packaging; for example, its API management service is listed as both ‘New Business Channels using APIs’ and ‘Unlocking Legacy Applications using APIs’. Those are two different use cases of the same product, not two different services. Though some may debate this as an approach to attract customers with two different needs, other cloud providers do not play the same trick in their product listings.

Google is a successful Internet services company, and it should have been the leader in cloud computing. Ironically, that did not happen, because Google did not believe in enterprise business. It was so focused on Internet-based services and on generating revenue from content advertising that individual users were more important than big businesses. When Google realized that big corporations are the big customers for the cloud business, it was a bit too late, and it had to bring in leadership from outside to get that thinking.

Google’s Internet services DNA has made GCP fragmented; the perception of GCP as one solid platform is largely missing. Most of us use GCP services without paying much attention to the whole platform. We use Google Maps in applications, Firebase has become a necessity for mobile development, and we use Google search APIs, but we see them as individual services, not as a single cloud platform. Single-platform thinking is essential to win enterprise customers, and not having that perception is a major downside of GCP.

However, it is not all bad for GCP; amid these odds, Google seems happy with what it is doing. It is showing an upward trend in revenue and has recently won a few notable enterprise customers.

Service Fabric placement constraints and cluster planning: Virtual Clusters

Introduction

This article explains how to achieve the right service placement strategy and Service Fabric (SF) cluster capacity planning. I have written this post as a continuation of the previous article, which allows me to extend the same contextual problem and find solutions.

As per the previous article, we should place WFE services in a certain set of nodes exposed to the load balancer (LB), and internal services in a different set of nodes that are not exposed to the LB and, optionally, have access to the back-end database infrastructure.

In fact, what I have tried to achieve is a typical infrastructure setup with a DMZ and a non-DMZ. The difference is that I have used a single SF cluster to hold both the DMZ and the non-DMZ.

SF is such a powerful and flexible platform that you can map many kinds of scenarios like this onto it. In SF, we can achieve these logical splits using placement constraints. In their simplest form, placement constraints work based on the properties we set on the nodes.

Node properties are key-value pairs used to tag nodes. Through the application, we then instruct SF to place certain services on the nodes that satisfy the placement constraint rules.

A placement constraint is a logical composition of node properties that yields a Boolean value at runtime.

NodeProperty1 == "super" && NodeProperty2 == "nvidGPU"

SF finds the nodes that meet this criterion and places the service on them. We decorate the nodes with these node properties, reference them in the application, and put placement constraints on services.

You can configure the node properties in the Azure portal under the node types. If you are running an on-premises setup, you can configure them in ClusterConfig.json. Like any configuration, placement constraints can also be parameterized in the ApplicationManifest.xml using the corresponding parameters XML file. This article describes it very clearly.
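Placement constraints can also be set programmatically when creating a service. The snippet below is a minimal sketch using FabricClient from the System.Fabric SDK; the application and service names are hypothetical, and the constraint expression matches the ex/nex node properties used later in this post.

using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

public static class ServicePlacementSample
{
    public static async Task CreateWfeServiceAsync()
    {
        var client = new FabricClient();

        var description = new StatelessServiceDescription
        {
            ApplicationName = new Uri("fabric:/MyApp"),            // hypothetical application
            ServiceName = new Uri("fabric:/MyApp/WfeService"),     // hypothetical service
            ServiceTypeName = "WfeServiceType",
            InstanceCount = 2,
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription(),
            PlacementConstraints = "(NodeType == ex)"              // place only on the 'ex' (DMZ) nodes
        };

        // Creates the service; the cluster resource manager honors the constraint when placing instances
        await client.ServiceManager.CreateServiceAsync(description);
    }
}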

Virtual Clusters

Let’s see how to set up the cluster. In a sample setup with 6 nodes and FD:UD = 6:6, the DMZ and non-DMZ split is made as shown below. Here the DMZ has 2 nodes and the non-DMZ has 4 nodes.

FD: Fault Domain, UD: Update Domain

cluster setup - virtual clusters

Nodes are marked with the NodeType property set to either ex or nex. WFE services have the placement constraint (NodeType == ex) and internal services have the placement constraint (NodeType == nex).

Node properties create only the logical idea of a DMZ; the infrastructure and network configuration provide the real separation. In this case we placed the ex nodes and nex nodes in different networks and additionally configured a software firewall between the two subnets.

So this placement strategy creates two virtual clusters inside the real cluster. WFE services are placed in the DMZ (red box) and internal services are placed in the non-DMZ (yellow box).

Dive Deeper

The above virtual cluster setup creates some challenges in cluster planning. For example, although we have FD:UD = 6:6, by imposing the constraints, WFE services effectively get an FD:UD = 2:2 cluster and internal services get an FD:UD = 4:4 cluster.

So overall cluster planning and how SF makes placement decisions are better understood and simulated before committing to a design. Before diving in, I highly recommend reading this article.

So we know that when setting up the cluster we have to specify the FDs and UDs; in fact, it is the most important step.

In its simplest form, the FD:UD ratio is a 1:1 setup. It serves the majority of scenarios.

I have played around with this 1:1 mode, and I don’t think I will look into other ratios unless there’s a quirky requirement. Also, if you’re using an Azure cluster this is the default setup, and I’m not sure whether you can change it. 😉

Though we can have any number of nodes in the cluster, the placement of a service is decided by the availability of FDs/UDs. Just increasing the number of nodes in the cluster will not result in a capacity increase.

First, let’s look at how SF places services when no placement constraints are defined. The default placement approach of SF is the adaptive approach. It is a mix of two approaches known as Maximum Difference and Quorum Safe.

  • Maximum difference is a highly safe placement approach where no two replicas of a single partition will be placed in the same FD/UD.
  • Quorum safe is a minimal-safety approach, chosen when specific conditions are met. Here SF tries to be economical with node capacity: the replicas of a single partition that form the quorum are treated in the maximum-difference way, and the others may be placed in the same FD/UD.

Instance / Replica: the term instance refers to the copies of a stateless service and replica refers to the copies of a stateful service, but in this article I use the term replica for both.

Quorum: for a stateless service, the quorum is the requested number of instances (instance count); for a stateful service, it is the requested minimum replica set size.

If you have read the recommended article, we can summarize the placement approach of SF with simple pseudo code like below.

rs: replica size, fd: fault domains, ud: update domains, n: number of nodes

if (rs % fd == 0 && rs % ud == 0 && n <= (fd * ud))
    return "quorum safe"
else
    return "maximum difference"

SF deciding on an approach does not by itself yield a successful placement, because this is just a decision about the placement strategy; once the decision is made, SF looks for available nodes that meet the placement criteria.

If there are not enough nodes to place the services, SF will raise either an error or a warning, depending on the situation.

FD:UD = 1:1 Case with Virtual Cluster

The below table shows the cluster simulation. I created this Excel sheet to understand the cluster and added some functions to simulate it. I have translated the high-level logical decisions SF makes into simple Excel functions.

Download from : Cluster Visualization Excel

a1

The first section of this report shows the scenario without any placement constraints, so all FDs/UDs and all nodes are available to all the services.

The replica minimum is the must-have replica count of a partition of a stateful service. The target replica count is the desired number of replicas for the partition. For stateless services the replica minimum equals the target number of replicas, because there is no notion of a minimum replica count in stateless services.

Observations

  1. Rows #15 and #16 – The stateless service replica count is greater than the available FDs/UDs. Though they use different approaches, the bottom line is that the cluster does not have enough FDs/UDs. SF reports an ERROR.
  2. Row #9 – The stateful service minimum replica set size is greater than the available FDs/UDs. SF reports an ERROR. This is a very similar case to the above.
  3. Row #10 – The stateful service minimum replica set size is lower than the available FDs/UDs but the target replica count is higher. SF reports a WARNING.
  4. Row #16 – The stateless service replica count is greater than the available FDs/UDs. Obviously, increasing the number of nodes does not help, and SF will not use them as long as the FDs/UDs are not expanded. In Row #17 the same scale is achieved with the optimal setup.
  5. Rows #22 and #23 – They look the same but use different approaches. Both run in the warning state because both approaches meet the minimum replica count but not the target replica count.

The second section has the cluster implementation with the placement constraints, so the report is filled with FD:UD = 2:2 for ex and FD:UD = 4:4 for nex, visualizing them as two different clusters.

Summary

Here I’ve summarized things for quick decision making.

Rule #1: For stateless services, the replica count CANNOT scale beyond the number of valid fault domains in the cluster. Trying to do so will cause an error.

Rule #2: For stateful services, the configured minimum replica count of a partition (which cannot be lower than 3) CANNOT scale beyond the number of valid fault domains in the cluster. Trying to do so will cause an error.

Rule #3: Whenever possible, SF tries to be economical in its placement decisions and does not use all nodes. Consider Rows #18 and #19: in #19, SF has 4 nodes in four different FDs/UDs but still decides on Quorum Safe.

Like the static node properties, there can be dynamic node properties that are also considered in decision making and influence the available FDs/UDs. I have not covered those cases in this article.

In fact, if we are to summarize the ultimate rule:

If you want to scale your service (stateless or stateful) to x copies, then you should have a minimum of x FDs satisfying all specified placement constraints of that service.

It sounds very analogous to a typical stateless web application scale-out. 😉

Deep dive into Azure Cosmos Db Change Feed

Azure Cosmos Db has an impressive feature called the change feed. It captures the changes in the data (inserts and updates) and provides a unified API to access those captured change events. The change event feed can be used as an event source in applications. You can read an overview of this feature from this link.

From an architecture point of view, the change feed can be used as an event sourcing mechanism. Applications can subscribe to the change event feed; the change feed is enabled by default in Cosmos Db, and there are 3 different ways to subscribe to it.

  1. Azure Functions – Serverless Approach
  2. Using Cosmos SQL SDK
  3. Using Change Feed Processor SDK

Using Azure Functions

Setting up the change feed using Azure Functions is straightforward; this is a trigger-based mechanism. We can configure an Azure Function from the portal by navigating to the Cosmos Db collection and clicking ‘Add Azure Function’ in the blade. This will create an Azure Function with the minimum required template to subscribe to the change feed. The gist below shows a mildly altered version of the auto-generated template.


using Microsoft.Azure.Documents;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static async Task Run(IReadOnlyList<Document> input, TraceWriter log)
{
    foreach (var changeInput in input)
    {
        if (changeInput.GetPropertyValue<string>("city") == "colombo")
        {
            log.Verbose("Something has happened in Colombo");
        }
        else
        {
            log.Verbose("Something has happened in somewhere else");
        }
    }

    log.Verbose("Document count " + input.Count);
    log.Verbose("First document Id " + input[0].Id);
}

The above function gets triggered when a change occurs in the collection (insertion of a new document or an update to an existing document). One change event trigger may contain more than one changed document; the IReadOnlyList parameter receives the list of changed documents, and the business logic is implemented in a loop.

In order to get the feed from the last checkpoint, the serverless function needs to persist checkpoint information. So when we create the Azure Function to capture the changes, it creates a Cosmos Db document collection to store the checkpoint information. This collection is known as the lease collection. The lease collection stores the continuation information per partition and helps coordinate multiple subscribers per collection.

The below is a sample lease collection document.


{
    "id": "applecosmos.documents.azure.com_BeRbAA==_BeRbALSrmAE=..0",
    "_etag": "\"2800a558-0000-0000-0000-5b1fb9180000\"",
    "state": 1,
    "PartitionId": "0",
    "Owner": null,
    "ContinuationToken": "\"19\"",
    "SequenceNumber": 1,
    "_rid": "BeRbAKMEwAADAAAAAAAAAA==",
    "_self": "dbs/BeRbAA==/colls/BeRbAKMEwAA=/docs/BeRbAKMEwAADAAAAAAAAAA==/",
    "_attachments": "attachments/",
    "_ts": 1528805656
}

In practical implementations we would not worry much about the lease collection structure, as it is used by the Azure Function to coordinate the work and subscribe to the right change feed from the right checkpoint. The serverless implementation abstracts away lots of details, and this is the recommended option as per the documentation from Microsoft.
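For comparison, when the function lives in a precompiled project rather than the portal, the same subscription can be declared with the Cosmos DB trigger binding. This is a minimal sketch assuming the Microsoft.Azure.WebJobs.Extensions.CosmosDB package and ILogger-based logging; the database, collection and connection setting names are hypothetical.

using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class PeopleChangeFeedListener
{
    [FunctionName("PeopleChangeFeedListener")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "appdb",                      // hypothetical database
            collectionName: "people",                   // hypothetical monitored collection
            ConnectionStringSetting = "CosmosDbConnection",
            LeaseCollectionName = "leases",             // checkpoint/lease collection
            CreateLeaseCollectionIfNotExists = true)]
        IReadOnlyList<Document> changedDocuments,
        ILogger log)
    {
        // Each invocation delivers a batch of changed documents
        log.LogInformation($"Received {changedDocuments.Count} changed documents");
    }
}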

Using Cosmos SQL SDK

We can use the Cosmos SQL SDK to query the change events from Cosmos Db. Add the Cosmos SQL SDK via its NuGet package.

Install-Package Microsoft.Azure.DocumentDB

This SDK provides methods to subscribe to the change feed. In this mode, developers should implement custom checkpoint logic and persist the checkpoint data for continuation. The gist below shows a sample that describes how to subscribe to the changes per logical partition.


using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace SQLSDK
{
    public class ChangeFeedSQLSDKProvider
    {
        private readonly DocumentClient _documentClient;
        private readonly Uri _collectionUri;

        public ChangeFeedSQLSDKProvider()
        {
        }

        public ChangeFeedSQLSDKProvider(string url, string key, string database, string collection)
        {
            _documentClient = new DocumentClient(new Uri(url), key,
                new ConnectionPolicy { ConnectionMode = ConnectionMode.Direct, ConnectionProtocol = Protocol.Tcp });
            _collectionUri = UriFactory.CreateDocumentCollectionUri(database, collection);
        }

        public async Task<int> GetChangeFeedAsync(string partitionName)
        {
            // Alternative: subscribe at the partition key range level instead of a logical partition.
            //var partionKeyRangeReponse = await _documentClient.ReadPartitionKeyRangeFeedAsync(_collectionUri, new FeedOptions
            //{
            //    RequestContinuation = await GetContinuationTokenForPartitionAsync(partitionName),
            //    PartitionKey = new PartitionKey(partitionName)
            //});
            //var partitionKeyRanges = new List<PartitionKeyRange>();
            //partitionKeyRanges.AddRange(partionKeyRangeReponse);

            var changeFeedQuery = _documentClient.CreateDocumentChangeFeedQuery(_collectionUri, new ChangeFeedOptions
            {
                StartFromBeginning = true,
                PartitionKey = new PartitionKey(partitionName),
                RequestContinuation = await GetContinuationTokenForPartitionAsync(partitionName),
            });

            var changeDocumentCount = 0;
            while (changeFeedQuery.HasMoreResults)
            {
                // DeveloperModel is the application's document model (Id, Name, Skill)
                var response = await changeFeedQuery.ExecuteNextAsync<DeveloperModel>();
                foreach (var document in response)
                {
                    // TODO :: process changes here
                    Console.WriteLine($"changed for id - {document.Id} with name {document.Name} and skill {document.Skill}");
                }

                await SetContinuationTokenForPartitionAsync(partitionName, response.ResponseContinuation);
                changeDocumentCount++;
            }

            return changeDocumentCount;
        }

        private async Task<string> GetContinuationTokenForPartitionAsync(string partitionName)
        {
            // TODO :: retrieve the continuation token for this partition from the persistence store
            return null;
        }

        private async Task SetContinuationTokenForPartitionAsync(string partitionName, string lsn)
        {
            // TODO :: persist the continuation token for this partition
        }
    }
}

The commented-out section shows the mechanism of subscribing at the partition key range level. In my opinion, keeping the subscription at the logical partition level makes sense in most business cases, which is what the above code shows; the logical partition name is passed as a parameter.

When the change feed is read, Cosmos Db returns a continuation token for the specified change feed option (partition key range or partition key). The developer should store this explicitly in order to retrieve it later and resume consuming the change feed from the point where it was left off.

In the code you can notice that the checkpoint information is stored against each partition; a minimal sketch of such a checkpoint store follows.
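For illustration, the two checkpoint methods could be backed by a store like the one sketched below. This only shows the shape of the contract using a thread-safe in-memory dictionary; a real implementation would persist the tokens in a durable key-value store or another Cosmos Db collection, and the class name is hypothetical.

using System.Collections.Concurrent;
using System.Threading.Tasks;

public class InMemoryContinuationTokenStore
{
    // partition name -> last continuation token returned by the change feed
    private readonly ConcurrentDictionary<string, string> _tokens =
        new ConcurrentDictionary<string, string>();

    public Task<string> GetContinuationTokenForPartitionAsync(string partitionName)
    {
        _tokens.TryGetValue(partitionName, out var token);
        return Task.FromResult(token);   // null on the first read, so the feed starts from the configured position
    }

    public Task SetContinuationTokenForPartitionAsync(string partitionName, string continuationToken)
    {
        _tokens[partitionName] = continuationToken;
        return Task.CompletedTask;
    }
}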

Using the Change Feed Processor Library

Cosmos Db has a dedicated Change Feed Processor library, which eases change subscription in custom applications. This library can be used in advanced subscription scenarios, as developers do not need to manage the partition and continuation token logic themselves.

Install-Package Microsoft.Azure.DocumentDB.ChangeFeedProcessor

The Change Feed Processor library handles a lot of the complexity of coordinating subscribers. The gist below shows sample code for the library; the change feed subscription is made per partition key range.


using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.ChangeFeedProcessor;
using Microsoft.Azure.Documents.ChangeFeedProcessor.FeedProcessing;

public class ChangeFeedProcessorSDK
{
    private readonly DocumentCollectionInfo _monitoredCollection;
    private readonly DocumentCollectionInfo _leaseCollection;

    public ChangeFeedProcessorSDK(DocumentCollectionInfo monitorCollection, DocumentCollectionInfo leaseCollection)
    {
        _monitoredCollection = monitorCollection;
        _leaseCollection = leaseCollection;
    }

    public async Task<int> GetChangesAsync()
    {
        var hostName = $"Host - {Guid.NewGuid()}";
        var builder = new ChangeFeedProcessorBuilder();
        builder
            .WithHostName(hostName)
            .WithFeedCollection(_monitoredCollection)
            .WithLeaseCollection(_leaseCollection)
            .WithObserverFactory(new CustomObserverFactory());

        var processor = await builder.BuildAsync();
        await processor.StartAsync();
        Console.WriteLine($"Started host - {hostName}");
        Console.WriteLine("Press any key to stop");
        Console.ReadKey();
        await processor.StopAsync();
        return 0;
    }
}

public class CustomObserverFactory : IChangeFeedObserverFactory
{
    public IChangeFeedObserver CreateObserver()
    {
        return new CustomObserver();
    }
}

public class CustomObserver : IChangeFeedObserver
{
    public Task CloseAsync(IChangeFeedObserverContext context, ChangeFeedObserverCloseReason reason)
    {
        Console.WriteLine($"Closing the listener to the partition key range {context.PartitionKeyRangeId} because {reason}");
        return Task.CompletedTask;
    }

    public Task OpenAsync(IChangeFeedObserverContext context)
    {
        Console.WriteLine($"Opening the listener to the partition key range {context.PartitionKeyRangeId}");
        return Task.CompletedTask;
    }

    public Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> docs, CancellationToken cancellationToken)
    {
        foreach (var document in docs)
        {
            // TODO :: processing logic
            Console.WriteLine($"Changed document Id - {document.Id}");
        }

        return Task.CompletedTask;
    }
}

In the above code, the monitored collection and the lease collection are given, and the change feed processor builder is built with the minimum required details. As a minimum requirement you should pass an IChangeFeedObserverFactory to the builder; the Change Feed Processor library manages the rest, such as sharing the leases of different partitions between different subscribers. This library also has features for implementing custom partition processing and load-balancing strategies, which are not addressed here.
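A usage sketch for the class above could look like the following; the account endpoint, key, database and collection names are placeholders, and DocumentCollectionInfo comes from the Change Feed Processor library.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.ChangeFeedProcessor;

public static class Program
{
    public static async Task Main()
    {
        var monitoredCollection = new DocumentCollectionInfo
        {
            Uri = new Uri("https://mycosmosaccount.documents.azure.com:443/"),  // placeholder endpoint
            MasterKey = "<primary-key>",
            DatabaseName = "appdb",
            CollectionName = "people"
        };

        var leaseCollection = new DocumentCollectionInfo
        {
            Uri = new Uri("https://mycosmosaccount.documents.azure.com:443/"),
            MasterKey = "<primary-key>",
            DatabaseName = "appdb",
            CollectionName = "leases"   // stores checkpoints and coordinates hosts
        };

        var subscriber = new ChangeFeedProcessorSDK(monitoredCollection, leaseCollection);
        await subscriber.GetChangesAsync();
    }
}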

Summary

Cosmos Db change feed is a powerful feature to subscribe to the changes. There are three different ways to do this as mentioned above.

The below table summarizes the options and features.

cosmos change feed summary table

 

Thick API Gateways

I came across the term ‘overambitious API gateways’ in the ThoughtWorks Tech Radar. The point is: is it good or bad to have business logic in API gateways? Since a gateway is not a functional requirement and serves the purpose of a reverse proxy, it is quite obvious that including business logic in an API gateway is NOT a good design. But the idea behind overambitious API gateways seems to be a finger pointed at the API gateway vendors, rather than at the solution design and development and how API gateways should be used.

I prefer the term ‘Thick API Gateways’ over overambitious API gateways, because the implementation is up to the developer regardless of what the tool can offer; it is the implementation that creates the anti-pattern.

With the advent of microservices architecture, API gateways gained an additional boost in the developer toolbox compared to other traditional integration technologies.


Microservices favor patterns like API composition (aggregating results from multiple services) and Saga (orchestration of services with compensation) at the API gateway. API gateways also host other business logic such as authorization and model transformation, resulting in thick API gateway implementations.

Having said that, though a thick API gateway is a bad design and brings an awkward feeling at night when you sleep, in a few cases it is quite inevitable: when you are building a solution from different systems, orchestrating the business flows is easy and fast at the API gateway. In some cases it is impossible to change all the back-end services, so custom code has to be injected between the services and the API gateway to achieve this, which brings other challenges.

At the same time, as developers, when we get a new tool we are excited about it and often fall into the ‘if all you have is a hammer, everything looks like a nail’ paradigm. It is better to avoid this.


Let’s see some practical stuff: what kind of business logic can modern API gateways include? For example, the gateway service offered in Azure API Management (APIM) is enriched with a highly programmable request/response pipeline.

In the APIM policy below, I have provided an authorization template based on role claims.

The API gateway decides authorization to the endpoints based on role claims. The sections are commented: first the policy validates the incoming JWT, then it sets the role claim in a context variable, and finally it handles authorization to the endpoints based on the role claim.


<policies>
    <inbound>
        <!-- validates the RS256 JWT token -->
        <validate-jwt header-name="massrover_token" failed-validation-httpcode="401" failed-validation-error-message="Unauthorized"
                      require-expiration-time="true" require-signed-tokens="true">
            <audiences>
                <audience>audience id</audience>
            </audiences>
            <issuers>
                <issuer>issuer id</issuer>
            </issuers>
            <required-claims>
                <claim name="role" match="any">
                    <value>admin</value>
                    <value>moderator</value>
                    <value>reader</value>
                </claim>
            </required-claims>
            <openid-config url="https://massrover.identityserver/.well-known/openid-configuration" />
        </validate-jwt>
        <!-- sets the role claim to the context variable -->
        <set-variable name="massrover_role"
                      value="@(context.Request.Headers["massrover_token"].First().Split(' ')[1].AsJwt()?.Claims["role"].FirstOrDefault())" />
        <!-- performs authorization based on the role claim and allowed http method -->
        <choose>
            <when condition="@(context.Variables.GetValueOrDefault<string>("massrover_role") == "admin")">
                <forward-request />
            </when>
            <when condition="@(context.Variables.GetValueOrDefault<string>("massrover_role") == "moderator")">
                <choose>
                    <when condition="@(context.Request.Method.Equals("delete", StringComparison.OrdinalIgnoreCase))">
                        <return-response>
                            <set-status code="403" reason="Forbidden" />
                            <set-body>Moderators cannot perform delete action</set-body>
                        </return-response>
                    </when>
                    <otherwise>
                        <forward-request />
                    </otherwise>
                </choose>
            </when>
            <when condition="@(context.Variables.GetValueOrDefault<string>("massrover_role") == "reader")">
                <choose>
                    <when condition="@(context.Request.Method.Equals("get", StringComparison.OrdinalIgnoreCase))">
                        <forward-request />
                    </when>
                    <otherwise>
                        <return-response>
                            <set-status code="403" reason="Forbidden" />
                            <set-body>Readers have only read access</set-body>
                        </return-response>
                    </otherwise>
                </choose>
            </when>
            <otherwise>
                <return-response>
                    <set-status code="405" reason="Not Allowed" />
                    <set-body>Invalid role claim</set-body>
                </return-response>
            </otherwise>
        </choose>
        <base />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

Note: This is a thick API gateway implementation, and its pros and cons depend on the problem at hand. The above is a practical illustration of one thick API gateway implementation.

WADLogsTable missing in Cloud Service project

This is observed with the Visual Studio Cloud Service template (environment: VS 2015 Enterprise Update 3 and Azure SDK 2.9.6 with .NET 4.6). It can probably be observed in most other versions as well, likely Azure SDK 2.4 and above, but I have stated my working environment since sooner or later this issue will be resolved.

Quick read and the reason: the template has Trace.TraceInformation as the logging code line, but the configuration is set to log only Errors by default. So when you run the application, the service has nothing to log and it does not create the WADLogsTable. Changing the code to Trace.TraceError, or changing the configuration to log Information/Verbose, solves the issue.

Analysis

Mostly beginners run into this issue, and it is a fair reason to panic, because when they create a fresh Azure Cloud Service out of the box from the available template, it does not work as expected.

Go to the Worker Role properties, where you can change the application log level setting to log Information-level logs.

default trace log setting - error

or, change the code to this

image
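Since the screenshot is not reproduced here, the code change is sketched below against a bare worker role entry point; the log messages are illustrative and the rest of the template code is omitted.

using System.Diagnostics;

public class WorkerRole : Microsoft.WindowsAzure.ServiceRuntime.RoleEntryPoint
{
    public override void Run()
    {
        // The default template logs at Information level, which the default 'Error'
        // log level filters out, so nothing is written and WADLogsTable is never created:
        // Trace.TraceInformation("WorkerRole is running");

        // Error-level entries pass the default filter, so the table gets created:
        Trace.TraceError("WorkerRole is running");

        while (true)
        {
            System.Threading.Thread.Sleep(10000);
            Trace.TraceError("Working");
        }
    }
}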

The application creates the table only when there is information to persist, so the default template does not create the WADLogsTable until you make either of the suggested changes.

Controlling access to your Azure resources using RBAC

Being part of a software services company, I often get asked by customers how to restrict access to Azure resources. It is understandable that no organization would prefer to give a person full rights over the organizational Azure subscription.

In the classic Azure model, the only way to give access to the Azure portal is to add the user as a co-administrator of the subscription. This gives the user all permissions within the subscription except managing the administrators.

The newer Role Based Access Control (RBAC) model helps solve this problem. Using RBAC, we can scope permissions to subscriptions, resource groups or individual resources.

Permissions at the top-level scope are automatically inherited by the levels below, meaning subscription-level users have the same permissions on the resource groups, and resource-group-level users have the same permissions on the individual resources within the resource group.

RBAC has several roles; read more about the different roles.

Here I explain the flow of adding a new user to an Azure resource group and what his/her experience accessing Azure via the portal looks like. Assume the user does not have any permissions in Azure and is just a developer with a Gmail account.

First, a subscription admin logs in to the portal and adds this user to the Azure Active Directory of the specific subscription.

1

Note that at this point developer1 does not have a Microsoft account. She clicks on the link in the email she received and is directed to create a Microsoft account with the specified email address. (If there is already a Microsoft account, this step is not required.)

2

After creating the Microsoft account (entering a new password and completing the sign-up), she can log in to the Azure portal at https://portal.azure.com. But within the portal this user cannot create any resources. If she tries to create anything or perform any action, she will get the message below. This is very similar to the old grey error box in the classic portal: the user exists in the Azure Active Directory but does not have a subscription, and in this case does not have any resources.

3

Now let the admin assign a resource group to the user. Assume you have a resource group DevelopmentRG; in the resource group's IAM settings, add the user (developer1) as a Contributor.

4

Contributor is a predefined role in Azure which has create/edit/delete permissions on the resources within the specified scope. In this case developer1 has those permissions within the resource group DevelopmentRG.

5

After setting developer1 as a Contributor, you can notice that the access type of the user is set to Assigned, because this is an assigned permission. Also note that the subscription admins have Inherited permission on the resource group.

6

Now developer1 logs in to the portal, and she will see the assigned resource group. Developer1 can perform actions within this resource group.

7

Also note that since developer1 has access only to the specified resource group, she cannot create a new resource group or exercise any permission outside the scope of that resource group.

8

RBAC provides more granular permissions with the various roles businesses require; this helps organizations carefully delegate permissions to people without exposing the entire Azure subscription.

The ability to limit or set quotas for a resource group is a feature that has been requested by the community.

Distributed Transactions in Azure SQL Databases – Azure App Service and EF

Are you handling more than one SQL database in Azure for your application? Most of the time the answer is YES. In multi-tenant systems with dedicated databases, at the very least you have customer information in a master database and a dedicated application database for each customer. Some CRUD operations need to touch both the master and the customer-specific databases.

We need MSDTC (Microsoft Distributed Transaction Coordinator) for distributed transactions in on-premises systems, but in Azure, SQL Database has the elastic distributed transaction feature enabled, and from .NET 4.6.1 onward we can use it via the TransactionScope class from System.Transactions.

This link explains how it works, but I wanted to test it with EF and Azure App Service, as Azure App Service offers .NET 4.6 as the target platform option and not 4.6.1.

I created two logical Azure SQL servers in two different regions and enabled the transaction communication link between them using PowerShell.

2016-08-27_18-13-09

Then I created a small Web API project using .NET 4.6.2 (which is higher than the required version) and tested the app from the local machine, and things worked well. I deployed the same code, and things worked fine in Azure as well.
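The test logic was essentially a TransactionScope spanning two EF contexts, along the lines of the sketch below; MasterDbContext, TenantDbContext and their entity sets are hypothetical stand-ins for the databases on the two logical servers.

using System.Transactions;

public class ProvisioningService
{
    public void CreateCustomer(string customerName)
    {
        // Both SaveChanges calls commit atomically; enlisting the second connection
        // escalates the transaction to an elastic (distributed) transaction in Azure SQL Database.
        using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            using (var master = new MasterDbContext())      // hypothetical EF context on server 1
            using (var tenant = new TenantDbContext())      // hypothetical EF context on server 2
            {
                master.Customers.Add(new Customer { Name = customerName });
                tenant.AuditEntries.Add(new AuditEntry { Message = $"Provisioned {customerName}" });

                master.SaveChanges();
                tenant.SaveChanges();
            }

            scope.Complete();
        }
    }
}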

Even though the target platform in the Azure App Service is .NET 4.6, when we deploy projects built against .NET 4.6.1 or .NET 4.6.2, the required assemblies of the respective platform version are referenced.

But my Swagger endpoint behaved strangely and did not output the results; I have no idea why and need to launch another investigation into that.

You can reference the test project from my GitHub.

Conclusion: we can use distributed transactions in Azure SQL Database using EF and deploy projects written in .NET 4.6.1/4.6.2 to the Azure App Service platform targeting .NET 4.6.

Directory contains one or more applications that were added by a user or administrator.

This post summarizes and lists the information you need to resolve the specific error that occurs when you try to delete an Azure Active Directory (AAD).

There are plenty of articles on this, and I recommend reading the link below, which explains the most frequent AAD error messages and their fixes.

https://support.microsoft.com/en-us/kb/2967860

Also read this blog for a detailed description.

http://alexmang.com/2015/06/directory-contains-one-or-more-applications-that-were-added-by-a-user-or-administrator-solution/

Some useful information

In order to manage the AAD using PowerShell we need to install two components.

  1. AAD Module for PowerShell
  2. Microsoft Online Services Sign-in Assistant

See the below article on how you can accomplish this.

https://onlinehelp.coveo.com/en/ces/7.0/administrator/installing_the_windows_azure_ad_module_for_windows_powershell.htm

Quick Steps

  1. Log in to the Azure portal using your service administrator account.
  2. Make sure there are no users or other external applications. (If you find any, delete them.)
  3. Create a Global Admin user in the AAD you want to delete.
  4. In PowerShell, log in as the created Global Admin user.
  5. Run the script (shown in the image).
  6. You will get error messages as mentioned in Alex Mang's blog, but you can simply ignore them.
  7. Then go to the portal and delete the created Global Admin user.
  8. And finally delete the AAD.