A couple of weeks back I posted an article on how to design an online forum application on top of Azure Table Storage. This post is about how to design the same application using Azure Document DB. As in the previous article, I want to stress that the way we design an application, and the thinking behind the design, differ completely based on the NoSQL technology we select.
These are the basic requirements / functionalities of the application.
- Forum members can post questions under categories.
- Forum members can reply to posts.
- Users have points based on their forum contribution.
Document-type NoSQL databases are handy for storing data as documents, and most modern document databases support JSON as the document storage format.
I also assume that you understand the basics of Azure Document DB: indexing, consistency levels, and how it is structured into databases, collections and documents.
Based on the above requirements, if we design a single document it would look similar to this.
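As a rough illustration, here is a minimal sketch of such a single document, written as a Python dictionary and serialized to JSON. All field names (`postedBy`, `repliedBy`, `points` and so on) and the sample values are assumptions made for this example, not a schema prescribed by Document DB.

```python
import json

# Hypothetical single-document shape: one post with its category,
# its author and all of its replies embedded in the same document.
post_document = {
    "id": "post-1",
    "category": "Azure",
    "title": "How do I model a forum in Document DB?",
    "body": "I am moving a forum from Table Storage and need advice.",
    "postedBy": {"userId": "user-1", "name": "Alice", "points": 120},
    "replies": [
        {
            "body": "Start with one collection per entity.",
            "repliedBy": {"userId": "user-2", "name": "Bob", "points": 45},
        }
    ],
}

# Document DB stores documents as JSON, so this is what would be persisted.
print(json.dumps(post_document, indent=2))
```

Note that each user's name and points are copied into every post and reply that user writes.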
As you can see, a single document structure can cover everything the application requires, but it has some drawbacks too.
The main one is that user data is redundant: if we want to update a user's points, we have to go through every document that embeds that user and update it. In large-scale document-type implementations, we use data operations such as map reduce to perform these updates.
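To make the cost of that redundancy concrete, the following sketch walks an in-memory list of documents and rewrites every embedded copy of a user's points. The document shape and the helper name `update_user_points` are hypothetical, and a real implementation would read and replace documents through the database rather than a Python list.

```python
def update_user_points(documents, user_id, new_points):
    """Rewrite every embedded copy of a user's points; return how many copies changed."""
    touched = 0
    for doc in documents:
        # A user can appear as the author of the post and as the author of any reply.
        embedded_users = [doc["postedBy"]] + [r["repliedBy"] for r in doc["replies"]]
        for user in embedded_users:
            if user["userId"] == user_id:
                user["points"] = new_points
                touched += 1
    return touched


documents = [
    {"id": "post-1", "postedBy": {"userId": "user-1", "points": 120},
     "replies": [{"repliedBy": {"userId": "user-2", "points": 45}}]},
    {"id": "post-2", "postedBy": {"userId": "user-2", "points": 45},
     "replies": [{"repliedBy": {"userId": "user-1", "points": 120}}]},
]

copies = update_user_points(documents, "user-1", 130)
print(copies)  # 2 copies rewritten for a single logical change
```

Even with only two posts, one change to a user's points touches two documents, and the number grows with every post or reply that user makes.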
Design for Azure Document DB
It is recommended and straightforward to have a dedicated collection for each identified entity. On that basis, we would require four main collections: users, categories, posts and replies.
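A minimal sketch of what one document in each of the four collections might look like is shown here, using plain Python dictionaries in place of real collections. The field names and the reference fields (`categoryId`, `userId`, `postId`) are assumptions for illustration only.

```python
# Each key stands for one collection; each list holds that collection's documents.
collections = {
    "users": [{"id": "user-1", "name": "Alice", "points": 120}],
    "categories": [{"id": "cat-1", "name": "Azure"}],
    "posts": [
        {"id": "post-1", "categoryId": "cat-1", "userId": "user-1",
         "title": "How do I model a forum in Document DB?"}
    ],
    "replies": [
        {"id": "reply-1", "postId": "post-1", "userId": "user-1",
         "body": "Start with one collection per entity."}
    ],
}

# Points now live in exactly one place, so an update touches one document.
collections["users"][0]["points"] += 10
print(collections["users"][0]["points"])
```

Posts and replies refer to users by id instead of embedding them, which removes the redundancy at the price of an extra lookup when displaying a post.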
This design is easy and highly scalable, but expensive. Document DB databases are containers for collections, and collections are containers for documents. A single collection is also the billing entity: if you create a database with two collections in a tier priced at $25 per month, you will be billed $50 per month because you have two collections.
This design will have four collections.
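The billing arithmetic can be sketched in a couple of lines. The $25 figure is only the example tier price used above, not a current Azure price, and `monthly_cost` is a hypothetical helper.

```python
# Example tier price taken from the text above; treat it as an assumption.
PRICE_PER_COLLECTION_USD = 25


def monthly_cost(collection_count, price=PRICE_PER_COLLECTION_USD):
    """Each collection is billed separately, so the cost scales linearly."""
    return collection_count * price


print(monthly_cost(2))  # two collections, as in the example above
print(monthly_cost(4))  # users, categories, posts and replies
```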
But this would not be the ideal design in terms of the tradeoff between cost and solution design.
A dedicated collection for categories is not necessary, because we can simply store the category as an attribute of the posts. A dedicated collection for users might also sound like too much. I am not totally against it, because sometimes it is a good idea to have a dedicated collection for users, especially when the number of users grows large.
Also remember the design using Azure Table Storage, where we used bucketing strategies to partition the users. We can use the same strategy here if we have millions of users, putting them in different collections rather than keeping them all in one.
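One way to bucket users across collections is to hash the user id and map it to a collection name, as sketched below. This hash-based scheme, the `users-N` naming and the default bucket count are assumptions for illustration; the bucketing strategy from the Table Storage article may differ.

```python
import hashlib


def user_collection_name(user_id: str, bucket_count: int = 4) -> str:
    """Map a user id to one of `bucket_count` hypothetical user collections."""
    # A cryptographic digest is used because it is stable across processes,
    # unlike Python's built-in hash(), which is randomized for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % bucket_count
    return f"users-{bucket}"


for uid in ["user-1", "user-2", "user-3"]:
    print(uid, "->", user_collection_name(uid))
```

The same id always maps to the same collection, so a lookup by user id only has to query one collection. Changing `bucket_count` later would move users between buckets, so it is worth choosing with growth in mind.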
But say we have only a few hundred users and a few categories. Then we do not want a separate collection for each, so we need a mechanism to put them in the same collection and still query them.
The idea is simple. Again, this is not about the technology itself but about the design decisions we make on top of the technologies we use.
Either have two documents whose ids represent the entities, or add an attribute called type that identifies what each document represents.
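Both options can be sketched with plain dictionaries. This is one reading of the idea, and the document ids, the `items` array and the `type` values are hypothetical names chosen for the example.

```python
# Option A: two documents whose ids name the entity they hold.
by_id = [
    {"id": "users", "items": [{"userId": "user-1", "name": "Alice", "points": 120}]},
    {"id": "categories", "items": [{"name": "Azure"}, {"name": "NoSQL"}]},
]

# Option B: one document per user or category, tagged with a "type" attribute.
by_type = [
    {"id": "user-1", "type": "user", "name": "Alice", "points": 120},
    {"id": "cat-1", "type": "category", "name": "Azure"},
    {"id": "cat-2", "type": "category", "name": "NoSQL"},
]

# Option A is read by id; option B is read by filtering on the type attribute.
categories_a = next(d for d in by_id if d["id"] == "categories")["items"]
categories_b = [d for d in by_type if d["type"] == "category"]
print(len(categories_a), len(categories_b))
```

Option A keeps each entity in one document, which only works while the lists stay small. Option B keeps documents small and lets each user or category be updated on its own.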
When you change the design like this, your queries have to change significantly too: since documents of different kinds now share a collection, each query has to select the kind it wants.
But again, the idea here is to show you the possibilities for designing the solution on top of Document DB.
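As a sketch of that change, assume every document in the shared collection carries a hypothetical `type` attribute. A query for users then needs an extra filter on it. The query string below follows Document DB's SQL dialect but is not executed here; the Python function applies the same filter to an in-memory list.

```python
# Query text one might send to a collection that mixes users and categories.
# "c" is just an alias for the collection; "type" is our own attribute.
USERS_QUERY = 'SELECT * FROM c WHERE c.type = "user"'


def where_type(documents, doc_type):
    """In-memory equivalent of filtering on the type attribute."""
    return [d for d in documents if d.get("type") == doc_type]


shared_collection = [
    {"id": "user-1", "type": "user", "name": "Alice", "points": 120},
    {"id": "cat-1", "type": "category", "name": "Azure"},
    {"id": "user-2", "type": "user", "name": "Bob", "points": 45},
]

users = where_type(shared_collection, "user")
print([u["name"] for u in users])
```

With a dedicated users collection, `SELECT * FROM c` would have been enough. In a shared collection, forgetting the type filter silently returns documents of the wrong kind.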
As for posts and replies, the better practice is to keep them in separate collections, as designed earlier. Not only can you scale them individually, but it is also not a best practice to have an unbounded attribute in a document, meaning an attribute whose values have theoretically no limit. Replies is an unbounded array, so we will have a dedicated collection for it.
This is the second post in the series on designing applications on Azure NoSQL offerings. One of the main points I want to make clear is that the design decisions we make vary based on the NoSQL technology we pick.
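A minimal sketch of that split follows, with each reply stored as its own document that points back to its post. The `postId` reference field and the helper `replies_for` are hypothetical names used for illustration.

```python
posts = [
    {"id": "post-1", "category": "Azure", "userId": "user-1",
     "title": "How do I model a forum in Document DB?"},
]

# Each reply is a separate document, so a post never grows with its replies.
replies = [
    {"id": "reply-1", "postId": "post-1", "userId": "user-2",
     "body": "Start with one collection per entity."},
    {"id": "reply-2", "postId": "post-1", "userId": "user-1",
     "body": "Thanks, that worked."},
    {"id": "reply-3", "postId": "post-9", "userId": "user-2",
     "body": "This one belongs to a different post."},
]


def replies_for(post_id, reply_documents):
    """Return the replies that reference the given post."""
    return [r for r in reply_documents if r["postId"] == post_id]


print(len(replies_for("post-1", replies)))
```

Adding a reply inserts one small document into the replies collection and leaves the post document untouched, however many replies accumulate.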