Contact Tracing Apps – A holistic perspective

Contact Tracing Apps are one of the most argued topics these days. Several countries are trying to implement contact tracing apps. Google & Apple announced a joint partnership in enabling contact tracing; it is a two-step approach – first it will be released as an interoperability API, later as a platform level functionality. At the same time, countries like China and Singapore have implemented contact tracing apps including location-based services, this is proven to be effective, compared to Bluetooth based tracing. However, location-based tracing is not widely accepted due to the obvious privacy concerns.

Bluetooth based contact tracing

These applications use Bluetooth to detect who’s around you. Gathered information is then processed either in real time or based on an action. In Sri Lanka there are several projects emerging from different individual developers. Also, there are entities who are trying to implement this solution for the government. We were asked to provide clarity and some working building blocks, to understand the internals of a typical tracing app. This post contains some of the observations and concerns of a contact tracing app from a general perspective.

These kinds of apps trigger concerns of data privacy and related issues (more on this below), but first thing came to my mind was how to do this in iPhone, as iPhone has restrictions in unpaired Bluetooth communication – it requires the app to be in foreground for a successful handshake – you can read more about the limitation from this link

After a quick Internet research, we understood, prevailing COVID tracing apps do have the foreground limitation in iPhone. Also, we came across this app TraceCovid : Fight COVID-19 Together, from the health department of Abu Dhabi.

Here are the screen shots of the app in an iPhone (click to enlarge the images). It is obvious, the app should run in foreground to function properly.

Minimal PII Footprint Implementation

Second concern is data, mainly the PII – Personally, Identifiable Information. PII has a broader data coverage, addition to the obvious data like phone number, email, name, IP address etc.

PII classification and severity vary broadly, which makes it hard to comprehend at times. Couple of interesting examples of PII are, In EU under GDPR, a drawing of a child is a PII as it may reveal the social and environmental impact of a kid’s surrounding. An advertisement put out to sell a car is a PII – not only because you forgot to mask the number plate, because the selling price can be used to inference the financial status of the person at a given time.

With that, note, let us look at how we can implement a contact tracing app with minimal PII footprint. In fact, we can have a contact tracing app with zero PII stored in the systems. Initial validations require a phone number or an email.

User installs the app, enters mobile number, receives one-time password and register. The mobile number is not stored in the systems, it is used to send the one-time password and then wiped off. System generates an id (UUID) like a username and the push notification id will be mapped against this id.  For the system UUID is the user, it has no meaningful mapping to the real person of the UUID. The below deck illustrates how the contact tracing can happen in such case.

This is a fair solution in terms of data privacy, as no PII is persisted in the contact tracing app provider’s systems. However, reaching people is challenging as system relies only on push notifications. Storing mobile number removes this obstacle and eases up the process.

In mapping the UUID to the real person, phone number is preferred because phone communication is more effective in reaching a person at emergency over email. Also, with established policies, telecommunication services should provide APIs to related authorities to retrieve more data about a person based on the phone number.

Data related Concerns

An app of this nature, is a natural victim for concerns related to data, some key concerns would be

  • Data privacy – Discussing this limiting to this context, data privacy is about who will access my data and how they will use it. Will they use this to other purposes than tracing the infection. Will it be shared with others? In case of any findings related to me, will it be shared with others? if so with whom and how they will use it? As you see, data privacy is about how the data is used.
  • Data residency – The geographical location where data is stored. Public cloud or a private datacenter. Within the country or outside. Within a geopolitical region or outside. Within a specific standard datacenter or outside.
  • Data handling – This is very important aspect but often missed. This is the most crucial piece of all. This is about the policies and procedures of the authorized stakeholders who handle the data. This includes screening of such individuals/entities, tools and services used to process the data, the ways data will be processed, what are the data protection facilities of used tools and services etc. This a mix of both technical and processes.

Summary

Plotting the effectiveness of tracing and the privacy, we will end like below.

contact tracing app effectivness vs privacy
contact tracing app effectivness vs privacy

Since the tracing is about finding a specific id and its trails, data analytical component does not require meaningful data, it can work on the anonymized data (as described in the slides) and later be mapped to the real data. Or the entire data set can be processed encrypted using homomorphic encryption.

This allows some freedom in the data residency as well. Anonymized data can be kept in public cloud platform leveraging cheap and scalable infrastructure for real time lambda architecture-based analytics and later brought down to be mapped with the meaningful data.

However, Bluetooth tracing remains obstructive in iPhone.