AWS Serverless and DynamoDb Single Table Design using .Net 6 – Part 2

Introduction

This is a continuation of the previous article AWS Serverless and DynamoDb Single Table Design using .Net 6 – Part 1. In this part we’re going to create a sample Serverless application using DynamoDb and deploy that on AWS Lambda.

Tools

Configure

Configure the AWS Toolkit using the link. While creating the IAM user make sure to attach the below policies

AWS Policies
Image: 1 

We’ll be using this user for creating the serverless application and deploying the same from Visual Studio or dotnet tools command line interface.

Why Serverless

Serverless solutions offer technologies for running code, managing data, and integrating applications, all without managing servers. Serverless technologies feature automatic scaling, built-in high availability, and a pay-for-use billing model to increase agility and optimize costs. These technologies also eliminate infrastructure management tasks like capacity provisioning and patching, so you can focus on writing code that serves your customers. Some of the popular serverless solutions are AWS Lambda and Azure Functions.

Development

AWS Toolkit for Visual Studio provides many built-in templates for creating AWS based serverless applications quickly.

Create a new project from Visual Studio and type ‘serverless’ in the search box and select `AWS Serverless Application (.NET Core – C#).

Image: 2

Enter the project name and continue.

Image: 3

Then select the ASP.NET Core Web API blueprint from the selection and click Finish

Image: 4

Once the project is ready in Visual Studio, you can see a file called serverless.template. This is AWS CloudFormation Serverless Application Model template file for declaring your Serverless functions and other AWS resources. Make sure to add two policies( AWSLambda_FullAccess and AmazonDynamoDBFullAccess) as shown below: These permissions are required for the Lambda to Read and Write to DynamoDb

Image: 5

Then add the below Nuget packages:

AWSSDK.DynamoDBv2Newtonsoft.JsonSwashbuckle.AspNetCore.SwaggerGenSwashbuckle.AspNetCore.SwaggerUI

Create an Interface called IEmployeeDb to define the methods

public interface IEmployeeDb{   Task<IEnumerable<EmployeeModel>> GetAllReporteesAsync(string empCode);   Task<EmployeeModel> GetEmployeeAsync(string empCode);   Task SaveAsync(EmployeeModel model);   Task SaveBatchAsync(List<EmployeeModel> models);}

Create a class to implement the IEmployeeDb interface. The constructor would look like the below:

public EmployeeDb(ILogger<EmployeeDb> logger, IWebHostEnvironment configuration){   //Comment out the below four line if you're not using the DynamoDb local instance.   if (configuration.IsDevelopment())   {      _clientConfig.ServiceURL = "http://localhost:8000";   }   _client = new AmazonDynamoDBClient(_clientConfig);   _context = new DynamoDBContext(_client);   _logger = logger;}

We configured the ServiceURL to point the localhost in case we’re using DynamoDb local instance. We also initialized the AmazonDynamoDBClient and DynamoDBContext. We’ll be mainly using the Highlevel API called DynamoDBContext for reading and writing data from DynamoDb.

The below methods are responsible for writing/saving the data:

public async Task SaveAsync(EmployeeModel model){   await SaveInDbAsync(GetUserModelForSave(PrepareEmpModel(model)));   await SaveInDbAsync(GetReporteeModelForSave(PrepareEmpModel(model)));}private async Task SaveInDbAsync(EmployeeModel model){   await _context.SaveAsync(model);   _logger.LogInformation("Saved {} successfully!", model.EmployeeCode);}private EmployeeModel PrepareEmpModel(EmployeeModel model){   model.EmployeeCode = model.EmployeeCode?.ToUpper();   model.ReportingManagerCode = model.ReportingManagerCode?.ToUpper();   return model;}

When saving a record, this method will actually insert two objects, one for user type and the other for reportee type. We discussed the reason and logic for creating two entries in the previous part.

In the below method we implemented the logic for fetching the employee by EmployeeCode:

public async Task<EmployeeModel> GetEmployeeAsync(string empCode){   var result = await _context.LoadAsync<EmployeeModel>(empCode.ToUpper(), empCode.ToUpper());   if (result != null)      result.ReportingManagerCode = ""; //ReportingManagerCode was same as EmployeeCode, so just remove it   return result;}

Next method will cover the logic for fetching the reportees by EmployeeCode:

public async Task<IEnumerable<EmployeeModel>> GetAllReporteesAsync(string empCode){   var config = new DynamoDBOperationConfig   {      QueryFilter = new List<ScanCondition> {         new ScanCondition("Type", ScanOperator.Equal, "Reportee"),         new ScanCondition("LastWorkingDate", ScanOperator.IsNull)      }   };   var result = await _context.QueryAsync<EmployeeModel>(empCode.ToUpper(), config).GetRemainingAsync();   return PrepareReporteeReturnModel(result); //swap the EmployeeCode and  ReportingManagerCode and return}

All the other code fragments and complete solution can be downloaded from the GitHub repository.

Once you complete the development you need to create a DynamoDb table in your AWS account. There are many ways to create a service in AWS. You can use CLI, Console, SDK or even Visual Studio Toolkit. Below is the CLI command for creating the table and setting up the pk and sk.

aws dynamodb create-table --table-name employees --attribute-definitions AttributeName=EmployeeCode,AttributeType=S         AttributeName=ReportingManagerCode,AttributeType=S --key-schema AttributeName=EmployeeCode,KeyType=HASH AttributeName=ReportingManagerCode,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1 --table-class STANDARD

Now you can deploy the serverless application either using Visual Studio or dotnet tools. To deploy using Visual Studio, right click on the project and select the Publish to AWS Lambda button.

To deploy using dotnet tools you need to follow the below steps in the command line.

dotnet tool install -g Amazon.Lambda.Toolscd "AWSServerlessDynamoDb/AWSServerlessDynamoDb" #or whatever the folderdotnet lambda deploy-serverless

After successful deployment, you will get a Lambda endpoint(ApiURL)  as below:

Image: 6

You can access your SwaggerUI by adding /swagger in the above url and you can test the APIs.

Complete source code can be found here.

Happy coding!!

AWS Serverless and DynamoDb Single Table Design using .Net 6 – Part 1

Introduction

When developing a high performance scalable application everybody tends to use the below technologies.

  • Serverless Functions or Lambdas
  • Cloud managed NoSQL databases like DynamoDb or CosmosDb
  • Database design strategies like Single Table Design

In this article we’ll cover about Single Table Design. Next part we’ll create a Serverless application using .Net 6 and DynamoDb.

Use Case

Recently we worked on a Social Networking platform and we used Single Table Design in that project. That use case is very complex and overwhelming for a beginner, so let’s consider an imaginary use case (this may not exactly fit the Single Table Design). But let’s consider an Employee REST API which will help us to design a basic Single Table Design. Here are the features of the API:

  • User will be able to add a Employee
  • User will be able to fetch the Employee details with the EmployeeCode
  • User will be able to fetch the immediate Reportees (for the sake of simplicity) of the Employee/Manager

Schema

EmployeeCodeEmailIdFirstNameLastNameReportingManagerCode

In the RDBMS world EmployeeCode will be the Primary Key and ReportingManagerCode will be the foreign key pointing to the same table using self join.

Single Table Design

In RDBMS, we use multiple tables in a database, and that tables may be interrelated with foreign keys and we tend to normalize the tables up to a certain level and avoid duplicate storage as far as we can. In the NoSQL world(especially in DynamoDb), there are no foreign keys and joins(and there is a reason for that), and do not care about duplicacy. In Single Table Design, we put all the entities(eg: Post, User, Comment, Follower etc.) in a single table and may use the ‘Type’ attribute to identify the entities.

Why

In a Read Heavy database, many(millions of) users will be accessing the different content at the same time. So you have to fetch the data as fast as possible. If you want to return the data quickly you have to minimize the database requests for a single API call. In RDBMS even though we’re making a single call most of the queries will have complex joins and involve multiple tables, as the data size increases these queries take more time.

If you have to fetch the Posts of all the users who I’m following, then in SQL-based databases you have to join ‘Users’, ‘Followers’, ‘Posts’, ‘Comments’ etc. If you store the entities in separate DynamoDb tables then you’ve to make multiple calls from your backend to DynamoDb and do some JSON manipulations and return that to the Frontend. We cannot afford that many Db calls from the backend, so we need to get all the data in a single Db request.

How

In DynamoDb within each table, you must have a partition key, which is a string, numeric, or binary value. This key is a hash value used to locate items in constant time regardless of table size. It is conceptually different to an ID or primary key field in a SQL-based database and does not relate to data in other tables. When there is only a partition key, these values must be unique across items in a table.

Each table can optionally have a sort key. This allows you to search and sort within items that match a given primary key. While you must search on exact single values in the partition key, you can pattern search on sort keys. It’s common to use a numeric sort key with timestamps to find items within a date range, or use string search operators to find data in hierarchical relationships.

With only partition keys and sort keys, this limits the possible types of query without duplicating data in a table(even though there is no harm in duplicating the data as storage cost is very less, but modifying the multiple copies is another headache). To solve this issue, DynamoDB also offers two types of indexes ie: Local secondary indexes (LSIs) and Global secondary indexes (GSIs). We can discuss these topics in a separate session.

Single Table Design is not ‘Agile’, you have to identify the Data Access Patterns in the beginning of the project, otherwise you may identify a use case later which may require an entire redesign of the data structure. So let’s identify our access patterns first.

Data Access Patterns

Our patterns are the GET API responses we discussed earlier. In our case it’s simple as of now.

  • User will be able to fetch the Employee details with the EmployeeCode
  • User will be able to fetch the immediate Reportees of the Employee/Manager
User will be able to add an Employees

Let’s create the table as follows. Partition Key is EmployeeCode, and Sort Key is ReportingManagerCode

Table: 1
Table: 1

Now things look simple, we can get the entity based on pk, let’s evaluate the next access pattern and come back here if required.

User will be able to fetch the immediate Reportees of the Employee/Manager

Suppose we need to fetch all the direct reportees of user 11. We can see if we can query using the sort key then we could have fetched all the reportees of 11 in a single query, but here your challenges start. You cannot query a DynamoDb table without providing a pk equals statement. So if you query pk=11 then you’ll get only one record.

Now the next step is to evaluate whether we can use LSI or GSIs to solve the problem or not. LSI is just another sort key and in query we need to provide pk then LSI won’t work here. If we use GSI then you can create one more pk and sk, but then you can’t use the main pk or sk.

Next option is to duplicate the data, let’s think about that. In the above table structure one record was self-sufficient(it had both EmployeeCode and ReportingManagerCode), but in the below format.we separated the entity. We also added a type attribute to identify the type of entity.

Table: 2
Table: 2

In user entities both pk and sk are the same (ie EmployeeCode). We duplicated the same entities and swapped the pk and sk and assigned the type as ‘reportee’. Now let’s evaluate the query. If we ran a query as pk=11 we will get three records

Table: 3
Table: 3

One record is for the Manager and multiple records for the reportees, we can filter out the user type by filter expressions if required. So the second access pattern has been solved, but we’ve a problem in the first access pattern now. Our first query was pk=11, but now that will return 3 records so we need to fix that. We can use pk=11 and sk=11. Solved!

Conclusions

The use case we discussed here was very basic one, but still we had to take care of multiple things and we also had to duplicate the data. Duplicating the data will further complicate things like updation and deletions etc. You may need to implement Message Queues like SQS and BackgroundService to solve that issue.

In the next article, we will cover the actual practical implementation with code samples.