Understanding GDPR and personal data


The General Data Protection Regulation (GDPR) imposes new rules on companies, government agencies, non-profits, and other organizations that offer goods and services to people in the European Union (EU), or that collect and analyze data tied to EU residents. The GDPR applies no matter where your organization is located.

I have been reading the key aspects of the GDPR from the official site, and thought I would summarize the points I came across in order to understand what GDPR is and how we can make our systems compliant with it.

  1. GDPR is a regulation that takes effect in May 2018.
  2. It gives EU citizens more control over their personal data.
  3. GDPR enforces controls over data collection, usage, storage, breach handling, transparency and expiry.
  4. GDPR imposes non-technical requirements as well in some scenarios, e.g. certain organizations must appoint a data protection officer.
  5. It does not explicitly mandate encryption or end-to-end data protection, but the regulatory requirements will effectively force applications dealing with personal data to use those technologies.
  6. Strict security audits, logs and documentation should be in place in order to be compliant with the GDPR.

GDPR defines ‘personal data’ as any information related to a natural person or ‘data subject’ that can be used to directly or indirectly identify the person.

According to this definition, name, address, credit card number, social security number, bank account number, telephone number, license number, vehicle number and any other explicit identifier are personal data. But the following types of data also fall under the personal data category.

  1. Calls to customer care services or any other voice-based services where the user's voice is recorded. Even if the recording takes place without a name and in full anonymity, the voice data itself should be treated as personal data.
  2. Any video surveillance recordings (CCTV) or other visual recordings should be treated as personal data (both at the edge and in cloud storage).
  3. Any form of biometric data should be treated as personal data.
  4. Drawings that represent a real subject, such as a portrait of a person or a family, or anything that exhibits the socio-cultural behavior of a family or an individual, can be considered personal data.
  5. Value of an asset – this is a number, but it is still considered personal data when linked to a person's profile, since it can be used to estimate the economic state and obligations of the person.
  6. Call logs and any other data usage logs.
  7. Real-time geo-location monitoring and geo-location data, e.g. tracking Uber drivers.
  8. Meeting minutes and similar official records that contain relatable links or traceable information about a person.
  9. Any sort of medical imagery with traceable information.
  10. Photos, videos and voice recordings of any person, of any sort, are personal data.
  11. Any non-aggregated data which reveals consumer patterns on goods and services.
  12. IP addresses – including dynamic IPs.

The above list is not exhaustive, but summarizes personal data in short. In my opinion, any data is personal data as long as it can be traced and tracked back to a person.

In certain scenarios GDPR also requires organizations to meet non-technical obligations, such as appointing a data protection officer.

Encrypting personal data is one aspect of GDPR, covered by the clause on “pseudonymous data”. This alone does not make a solution compliant with the GDPR, because encrypted personal data is still considered personal data, but it gives some relaxation on security breaches and on how a breach should be handled and notified.
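To make this concrete, below is a minimal sketch (my own illustration, not something prescribed by the regulation) of pseudonymizing an identifier with a keyed hash in C#. Records can still be correlated through the hash without storing the raw value, but whoever holds the key can re-link the data – which is exactly why pseudonymized data remains personal data.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Pseudonymizer
{
    // Replaces a direct identifier (e.g. an email address) with a keyed hash.
    // The secret key must be stored separately from the pseudonymized data;
    // anyone holding the key can re-link the records, so this is
    // pseudonymization, not anonymization.
    public static string Pseudonymize(string identifier, byte[] secretKey)
    {
        using (var hmac = new HMACSHA256(secretKey))
        {
            byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(identifier));
            return Convert.ToBase64String(hash);
        }
    }
}
```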

In summary, all solutions should address the following technical and non-technical aspects:

  1. Why we collect and store the personal data
  2. How the personal data is used
  3. Transparency of the usage and sharing policies of the personal data
  4. Storing personal data as pseudonymous data
  5. Continuous security auditing and monitoring
  6. Notifying users upon breaches and policy changes

Azure B2C with custom attributes and predetermined values


Azure B2C is a large membership database which also provides tokens, sessions and the membership/authentication experience (sign-up, sign-in, forgot password, etc.). But some scenarios are a little tricky depending on how the entire solution is handled. Let me explain one such use case and describe different ways to handle it in B2C.

Case: You have an application which is a reselling portal, where a user can either be a seller or a buyer. During the registration/sign-up process the user type is automatically detected by the application, so the user does not need to select the type. The diagram below explains the case.

Figure 1

 

Question: Why can't we pass the parameter from step 1, which holds the user type value, and populate it in a hidden field in the custom HTML or the rendered mobile view in step 2? That way it would be straightforward to persist that information in B2C.

Answer: Since the rendering is controlled by B2C, script execution is not allowed in that context (I'm not sure whether there is any way to do this). Also, since the same HTML view is rendered on mobile and other native clients, passing the information from step 1 to step 2 by any means is not a safe option even if it were possible.

So we end up with the problem of passing the user type information from step 1 to step 2 and instructing B2C to persist it.

Solutions:

There are different solutions. The trade-off is always between how much control we take from B2C and how much control we let B2C keep. This comes with a corresponding cost in development effort.

Solution 1: Let B2C handle the case as much as possible, and save application-controlled fields like user type in a custom database column, optionally updating the B2C custom attribute using the Graph API. Figure 2 explains this.

 

Figure 2

 

In this way, we get the benefits of the B2C policies and of the auxiliary authentication services offered by B2C, such as password management and profiles. Most applications follow this approach without updating B2C back through the Graph API.
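For the optional write-back, a rough sketch of updating a B2C custom attribute through the Azure AD Graph API is shown below. The extension attribute name and the token acquisition are assumptions on my part; B2C exposes custom attributes as extension properties (extension_{appId}_{attributeName}) registered under the tenant's b2c-extensions-app.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class B2CGraphClient
{
    private static readonly HttpClient Http = new HttpClient();

    // tenant: e.g. "contoso.onmicrosoft.com"; accessToken: a Graph API token
    // acquired for an app with directory write permissions (not shown here).
    public static async Task SetUserTypeAsync(
        string tenant, string userObjectId, string accessToken, string userType)
    {
        var url = $"https://graph.windows.net/{tenant}/users/{userObjectId}?api-version=1.6";

        // The attribute name must match the registered extension property,
        // e.g. extension_<b2c-extensions-app-id>_UserType (hypothetical here).
        var body = "{ \"extension_<appId>_UserType\": \"" + userType + "\" }";

        var request = new HttpRequestMessage(new HttpMethod("PATCH"), url)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

        var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();
    }
}
```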

Solution 2: Take control from B2C into the custom application and use B2C as a membership database.

Figure 3

In this model, in step 1 some custom attribute values are determined (e.g. user type) and passed to step 2, which is a view controlled purely by the developer. Step 2 then passes the information to the application API in step 3. The server application updates B2C in step 4 and receives the JWT token in step 5. The application updates the database with the oid and other parameters in step 6.

What B2C could do in the future

In its current state, B2C has applications and policies, which can be combined in many ways. One application can have many policies of the same type with different settings, and one policy can be used across many applications as well.

In the custom rendering, B2C should allow hidden attributes with default values. That way, when modeling the above scenario, we could have different policies with different settings and default values.

The primary critical requirements B2C should add, shown in the screen below, are:

  1. show/hide fields
  2. set default values

image

Place your business right – Cost Effective Architecture


Recently I have been working on building a product for a ‘pure startup’. What I mean by ‘pure startup’ is not about how novel the concept is, but that development of the product needs to start from ground zero.

This scenario often requires the technical stakeholders to take part in shaping the product features and feature release plans. The technical stakeholders share, and believe in (to some extent), the vision of the entrepreneur.

Generally my role is to facilitate this process from the technical point of view, whilst helping the entrepreneur reach his goals with his product.

Technical stakeholders tend to throw in all the great new stuff and deliver the product for a billion-dollar business; entrepreneurs are often occupied by the big dream of the product and can easily be misled away from the business goals.

The argument is not about avoiding great technologies and all the buzzwords; the argument is mostly about what we require at which stage, how big the idea is, how fast it can grow to hit the first million users, whether the current architecture supports that, and so on.

The confusion comes from a real experience: one of the recent startups wanted to go with one of the big cluster management / microservices platforms in Azure for a relatively simple operational business.

The technical stakeholders of the project felt that the microservices approach was not required at the beginning and suggested a simpler solution. The first question that popped up from the business stakeholders was: “Fine, but how much effort is required to move to a microservices model at a later stage if it's required?”

The answer is simple: “WE DON'T KNOW, but we'll guide you on any technology issues you face in the product and make sure it works.” That's the commitment and that's the way things work.

That does not mean we should never start with a complex architecture or heavyweight technologies; there's always a balance, and at the core we should discover what is right for the business. In order to help this decision I coined the term Cost Effective Architecture.

As businesses grow and change, agility in the technology and the chosen platform becomes essential. Moving the business forward with the right technology decisions for the existing context, whilst considering future change in a cost-effective manner, is the core idea of Cost Effective Architecture (CEA).

Let's first categorize systems into two categories based on the target users.

1. Internal systems – The user base is known before enrollment. These are multi-tenant systems where a tenant can be an organization, a department or any such group. Sudden usage spikes and user base expansion are highly unlikely. We can call them tenant-based products as well, though they are not necessarily tenant-based. Intranet systems automatically fall under this category (I wonder whether anyone is developing intranet solutions now). Good examples are Office 365 and Salesforce.

2. Public systems – Targeted at the general public, so user enrollment is not known in advance. Mostly these are user-based systems where the concept of tenants is rarely observed. Sudden usage spikes and user base expansion are expected, depending on the business model. Any public content application falls under this category. Good examples are Spotify and LinkedIn.

Another main categorization of products is based on the current user base.

1. Existing systems – These are systems which have already gone to production and have decent battle time in the live environment. Usage patterns and loads are fairly predictable.

2. Startup systems – These are either not launched or not yet developed. They may have a solid business model/requirement, or they may simply be great evolving ideas from entrepreneurs.

These two categorizations leave us with the following 2×2 model.

Picture1

Business Categorization of CEA

C1 – Internal systems with existing users. Internal systems can be more complex than some public systems, because they can have a large number of users and many integrations with legacy and non-legacy systems. Development changes occur due to technology invalidation, cost of maintenance, new feature requirements or scale-out/up limitations against the predicted usage.

Since new user enrollment is mostly predictable and the existing usage of the system is well known, design and technology decisions have the luxury of in-depth comparison and analysis. These systems often carry a bulkiness that makes them slow to move, which is another reason for slow development. Often collateral systems (APIs, mobile apps) are developed around C1 systems, while the core of the C1 solution moves slowly towards change.

C2 – These are public systems with existing users; the usage is somewhat known but unpredictable. Development of new features and changes is frequent. Technology decisions are based on the principle of keeping enough headroom for the next wave of new active users, so an abundance of computing resources is justifiable as long as the growth is predictable. Downtime is critical since these are public systems. Agility and DevOps are key survival factors.

C3 – Internal startup systems. Go-to-market or launch is a critical factor. More often the development of these systems begins with a fixed number of promised users. The business model is validated through implementation. Since these are internal systems, usage spikes are unlikely, but that should not be an excuse for bad technology selection. Subsystems are somewhat understood and discovered.

C4 – Systems with the intention of reaching the market with possible business cases. Mostly a hyper spike is not expected, and market adaptability and usage need to be discovered and validated. Mission-critical infrastructure and advanced design decisions can be put on hold in order to save time and money. That does not mean implementing bad practices, but initiating the development in a way that can be scaled to larger scenarios with significantly less effort. In the cloud, PaaS offerings help achieve this. The primary focus is to go to market as soon as possible.

Note:

Though CEA has the above categorizations, it does not force ideas into a specific category just because a business requirement or case theoretically maps to one of those scenarios. Example: a novel idea can be considered directly under C2 rather than C4 from the very first day of system design and development, as long as the user base and market anticipation are clear and well understood. In such cases, systems can be started with costly, reliable, mission-critical infrastructure, and major design decisions and strategies can be considered upfront.

Multi-tenancy – A myth in the cloud


Multi-tenancy is a popular buzzword, often heard coupled with cloud computing terminology. This has created a dogma: some believe that multi-tenancy can only be related to the cloud and cloud-based SaaS services.

Based on the definition from Wikipedia:

The term “software multitenancy” refers to a software architecture in which a single instance of software runs on a server and serves multiple tenants. A tenant is a group of users who share a common access with specific privileges to the software instance….

The core concern is: pragmatically, it is hard to define what a single software instance is.

If multi-tenancy is all about handling groups of users (as tenants), we've been doing that with our good old role-based software systems; that's nothing new.

If we consider the single instance of a software system, there is confusion about how to treat scale-out scenarios and replications (not forgetting DR plans with read access). Within software instances the confusion extends to granular programmatic constructs as well – are we talking about application-level mutexes or object-level singletons?

Handling different groups of users in a piece of software is a very common scenario. We see this problem in different layers of a software solution – in terms of data partitioning, security, noisy neighbor issues and much more.

But taking a SaaS solution and labeling the whole thing as a multi-tenant or non-multi-tenant solution is technically wrong, though it has some conceptual truth in it.

In particular, it becomes really annoying when someone says that the cloud supports multi-tenancy and that moving to the cloud will give the ability to support multiple tenants.

Let's see how things get complicated. Consider a simple scenario where organizations (the tenants, for ease of understanding) can upload videos, convert them to desired formats and stream them online. Look at the diagram below for a simple overview. (Note: the connections and directions do not represent exact data flows.)

Figure: simple overview of the video platform

Here, the WFE is a web application running on a single web server (multi-tenant?), and the service layer runs on two machines behind a load balancer (multi-tenant? or not?). The web application accesses the cache, which is a single instance serving as a common cache for all tenants (multi-tenant?). The messaging queue, storage and transcoding services also run as single instances (multi-tenant?). Each tenant has a dedicated database (single-tenant?).
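To make the cache question concrete, here is a minimal sketch of what that single shared cache instance typically looks like inside: one physical instance, with tenant isolation achieved logically through key partitioning (the names are illustrative).

```csharp
using System.Collections.Concurrent;

// One physical cache instance shared by every tenant; tenant isolation is
// purely logical, achieved by prefixing each key with the tenant id.
public class TenantScopedCache
{
    private readonly ConcurrentDictionary<string, object> _store =
        new ConcurrentDictionary<string, object>();

    private static string MakeKey(string tenantId, string key) => tenantId + ":" + key;

    public void Set(string tenantId, string key, object value) =>
        _store[MakeKey(tenantId, key)] = value;

    public bool TryGet(string tenantId, string key, out object value) =>
        _store.TryGetValue(MakeKey(tenantId, key), out value);
}
```

So is this component single-tenant or multi-tenant? It is one instance, yet it serves every tenant – which is exactly the ambiguity discussed above.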

So in the entire system, each layer of the software infrastructure has its own number of instances – which makes it almost impossible to call the entire system multi-tenant from a technical perspective.

As a business unit, the system supports groups of users (organizations in the above example), and regardless of whether it runs in the cloud or not, it is a multi-tenant system.

Hopefully this clears up the misconception of believing or arguing that a SaaS is multi-tenant by definition, or that merely moving a solution to the cloud makes it multi-tenant.

WADLogsTable missing in Cloud Service project


This is observed in the Visual Studio Cloud Service template (environment: VS 2015 Enterprise Update 3 and Azure SDK 2.9.6 with .NET 4.6). It can probably be observed in most other versions – likely Azure SDK 2.4 and above – but I state my working environment because sooner or later this issue will be resolved.

Quick read and the reason: The template has Trace.TraceInformation as the logging line, but the configuration is set to log Errors by default. So when you run the application, the service has nothing to log and it doesn't create WADLogsTable. Changing the code to Trace.TraceError, or changing the configuration to log at Information/Verbose level, solves the issue.

Analysis

Mostly beginners bounce into this issue, and it's a fair reason to panic, because when they create a fresh Azure Cloud Service out of the box from the available template, it doesn't work as expected.

Go to the Worker Role properties, where you can change the application log level settings to log Information-level logs.

default trace log setting - error
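If you prefer editing the configuration directly, the same setting lives in the role's diagnostics.wadcfgx file. A trimmed sketch with the filter raised to Information (surrounding elements and quotas may differ by SDK version):

```xml
<DiagnosticsConfiguration xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration">
  <PublicConfig>
    <WadCfg>
      <DiagnosticMonitorConfiguration overallQuotaInMB="4096">
        <!-- The stock template ships with scheduledTransferLogLevelFilter="Error" -->
        <Logs scheduledTransferPeriod="PT1M" scheduledTransferLogLevelFilter="Information" />
      </DiagnosticMonitorConfiguration>
    </WadCfg>
  </PublicConfig>
</DiagnosticsConfiguration>
```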

Or, change the code to log at the Error level instead.

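The original post showed this change as a screenshot; a minimal sketch of the worker role with the trace call switched to the Error level (the loop roughly follows the stock template) looks like this:

```csharp
using System.Diagnostics;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    public override void Run()
    {
        // The stock template calls Trace.TraceInformation here, which the
        // default configuration (log level filter: Error) silently drops.
        Trace.TraceError("WorkerRole is running");

        while (true)
        {
            Thread.Sleep(10000);
            // Logged at Error level so it passes the default filter and
            // forces the creation of WADLogsTable.
            Trace.TraceError("Working");
        }
    }
}
```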

The table is created when, and only when, information needs to be persisted, so the available template does not create WADLogsTable until you make either of the suggested changes.

Controlling access to your Azure resources using RBAC


Being part of a software services company, I often get asked by customers how to restrict access to Azure resources. It is understandable that no organization would want to give full rights to its Azure subscription to one person.

In the classic Azure model, the only way to give access to the Azure portal is to add the user as a co-admin of the subscription. This gives that user all permissions within the subscription except managing the administrators.

But the new Role-Based Access Control (RBAC) helps solve this problem. Using RBAC we can scope permissions to subscriptions, resource groups or individual resources.

Permissions at a higher-level scope are automatically inherited by the levels below – meaning subscription-level users have the same permissions on the resource groups, and resource-group-level users have the same permissions on the individual resources within the resource group.

RBAC has several roles – read more about the different roles.

Here I explain the flow of adding a new user to an Azure resource group and what her experience of accessing Azure via the portal looks like. Assume the user doesn't have any permissions in Azure, and she's just a developer with a Gmail account.

First, a subscription admin logs in to the portal and adds this user to the Azure Active Directory of the specific subscription.

1

Note that at this point developer1 does not have a Microsoft account. She clicks on the link in the email she received and will be directed to create a Microsoft account with the specified email address. (If there's already a Microsoft account available, this step is not required.)

2

After creating the Microsoft account (entering a new password and completing the account creation), she can log in to the Azure portal at https://portal.azure.com. But within the portal this user cannot create any resources; if she tries to create anything or perform any action, she will get the message below. This is very similar to the old grey error box in the classic portal, as the user exists in the Azure Active Directory but does not have a subscription – in this case, does not have access to any resource.

3

Now let the admin assign a resource group to the user. Assume you have a resource group DevelopmentRG; in the resource group's IAM settings, add the user (developer1) as a Contributor.

4

Contributor is a predefined role in Azure which has create/edit/delete permissions on the resources within the specified scope. In this case developer1 has those permissions within the resource group DevelopmentRG.

5

After setting developer1 as a Contributor, you can see that the access type of the user is set to Assigned, because this is an assigned permission. Also note that the subscription admins have Inherited permission to the resource group.

6

Now developer1 logs in to the portal and sees the assigned resource group. She can perform actions within this resource group.

7

Also note that since developer1 has only the specified resource group, she cannot create a new resource group or exercise any permission outside the scope of that resource group.

8

RBAC provides more granular permissions through the various roles businesses require; this helps organizations carefully delegate permissions to people without exposing the entire Azure subscription.

The ability to limit or set quotas for a resource group is a feature that has been requested by the community.

Using Akka.NET with ASP.NET Core – Creating a Quiz API


This is a template and quick-start guide for Akka.NET with ASP.NET Core. You can grab the concepts of using Akka.NET with ASP.NET Core, and see how the Akka.NET actor model can be used in a simple quiz or survey based scenario.

At the same time, this post does not cover all the fundamentals of actor model programming or Akka.NET. It assumes that you already understand the actor model and reactive programming basics, along with some practical experience with the concepts of Akka.NET.

Scenario: A quiz engine has many quizzes and users can attend them. Each user can attend as many quizzes as she likes at the same time, so each user session is associated with a quiz, and one user can have many quiz sessions at the same time. The simplest session key is a combination of quiz Id and user Id; this combination is unique and is referred to as the session Id. Each session is an actor.

A template actor provides the quiz templates during session creation. Each session actor gets a fresh copy of the quiz when the session is created.

The below diagram shows the actor system used in this scenario.

Akka.NET actor model for quiz engine

Step by step explanation

  • In the ASP.NET Core Startup class the actor system (QuizActorSystem) is instantiated.
  • QuizMasterActor is created in the context of QuizActorSystem, and the QuizActorSystem is added to the ASP.NET Core services collection to be consumed by the controllers.
  • QuizMasterActor creates QuizSessionCoordinatorActor and QuizTemplateActor under its context.
  • For simplicity, the QuizController of ASP.NET Core has two actions.
    1. GetQuestion – This takes a session Id and a question Id. The controller asks QuizSessionCoordinatorActor for the session actor. If the session actor already exists it is returned; otherwise QuizSessionCoordinatorActor creates a new session actor under its context. QuizSessionActor loads the quiz from QuizTemplateActor on initial creation, gets a fresh copy of the quiz and returns the requested question. Subsequent requests are served directly by the QuizSessionActor.
    2. GetAnswer – This action method takes the session Id and the answer to the question, and passes it to the right QuizSessionActor for the update.

The entire QuizSessionActor tree is created upon the first request for a question under a specific session, and this is quite safe and straightforward.
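Below is a minimal sketch of the wiring described above. The message, reply and actor bodies here are hypothetical stand-ins for the real implementations in the linked repo; the point is the actor system creation, the DI registration and the ask pattern in the controller.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical message and reply types; the real ones live in the repo.
public sealed class GetQuestionMessage
{
    public GetQuestionMessage(string sessionId, int questionId)
    {
        SessionId = sessionId;
        QuestionId = questionId;
    }

    public string SessionId { get; }
    public int QuestionId { get; }
}

public sealed class QuestionReply
{
    public QuestionReply(string text) { Text = text; }
    public string Text { get; }
}

// Stub actors showing only the wiring; session state and the template
// actor are omitted here.
public class QuizSessionActor : ReceiveActor
{
    public QuizSessionActor()
    {
        Receive<GetQuestionMessage>(msg =>
            Sender.Tell(new QuestionReply($"Question {msg.QuestionId} for {msg.SessionId}")));
    }
}

public class QuizSessionCoordinatorActor : ReceiveActor
{
    public QuizSessionCoordinatorActor()
    {
        Receive<GetQuestionMessage>(msg =>
        {
            // Reuse the session actor if it exists, otherwise create it.
            var name = "session-" + msg.SessionId;
            var session = Context.Child(name);
            if (session.Equals(ActorRefs.Nobody))
                session = Context.ActorOf(Props.Create<QuizSessionActor>(), name);
            session.Forward(msg);
        });
    }
}

public class QuizMasterActor : ReceiveActor
{
    public QuizMasterActor()
    {
        Context.ActorOf(Props.Create<QuizSessionCoordinatorActor>(), "sessionCoordinator");
    }
}

// Startup wiring: create the actor system once and expose it through DI.
public partial class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddMvc();
        var system = ActorSystem.Create("QuizActorSystem");
        system.ActorOf(Props.Create<QuizMasterActor>(), "quizMaster");
        services.AddSingleton(system);
    }
}

[Route("api/[controller]")]
public class QuizController : Controller
{
    private readonly ActorSystem _system;

    public QuizController(ActorSystem system) { _system = system; }

    [HttpGet("{sessionId}/questions/{questionId}")]
    public async Task<IActionResult> GetQuestion(string sessionId, int questionId)
    {
        var coordinator = await _system
            .ActorSelection("/user/quizMaster/sessionCoordinator")
            .ResolveOne(TimeSpan.FromSeconds(3));

        var reply = await coordinator.Ask<QuestionReply>(
            new GetQuestionMessage(sessionId, questionId), TimeSpan.FromSeconds(5));

        return Ok(reply);
    }
}
```

Note that the controller awaits the Ask call, so the request thread is released while the actor system processes the message.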

You can download the source code from this Github repo.