At HealthKart, we use a lambda architecture to build our real-time analytics pipeline. The most critical part of this setup is picking frameworks that are extensible and do not take a heavy toll on infrastructure cost.
Keeping these things in mind, AWS was the most viable option for implementing a lambda architecture for real-time analytics on the HealthKart platform. Below is the architectural diagram of our setup, which comprises multiple frameworks; each one is explained below.
- AWS Pinpoint – AWS Pinpoint is primarily a mobile analytics framework that also offers a JS SDK and REST APIs. It provides an API for firing pre-built and custom events from the client side, which are then stored in S3 buckets in JSON format. Because a client SDK is available, the Pinpoint dashboard provides many pre-built client metrics such as session time, DAU/MAU, and geographical information. On top of that, the first 100 million events are free, and each additional million events costs $1. This makes it cost-effective if your event volume is in the few hundred millions per month.
- S3 Bucket – All the event data fired from the client side is stored in an S3 bucket, which is scalable and easy to integrate with other AWS services.
- Kinesis Stream – Amazon Kinesis makes it easy to collect, process, and analyze real-time streaming data so you can get timely insights and react quickly to new information. It offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. We use Kinesis to push all event data received from our app in real time.
- Application Groups Listener – These are Kinesis clients that listen to the Kinesis stream and power parallel processing of the streaming data in real time. Multiple application groups can run in parallel to process large amounts of data. We use the processed stream to determine which products are trending in real time, maintain each user's recently viewed history, create personalized results in listings, send real-time push notifications based on event data rules, and so on.
- Redis Cluster – The application group listener prepares the required data for trending, viewing history, personalization, etc., and puts it in the Redis cluster. Our platform uses the data stored in the Redis cluster to show these results to users on the app and web in real time. Since Redis supports multiple data structures beyond plain key-value pairs, it is easy to serve different kinds of pre-built data in real time as needed.
- Redshift – Amazon Redshift powers analytics workloads on petabyte-scale data. We also pass the S3 event data to Redshift so that on-demand and ad hoc analytical queries can be processed faster for in-house reporting.
- Qlik Sense – Qlik Sense is a BI reporting tool that is integrated with the Redshift columnar database to power our business reporting.
- Athena – Athena can also be used to run SQL queries directly on the data stored in S3 in JSON format for analytical and reporting purposes.
- QuickSight – Amazon QuickSight is a fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization. As a fully managed service, QuickSight lets you easily create and publish interactive dashboards.
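To make the listener-to-Redis step more concrete, here is a minimal sketch of how a batch of stream events could be turned into "trending" and "recently viewed" data. It is not our production code: the event fields (`event_type`, `user_id`, `product_id`), the `RECENT_LIMIT` cap, and the `InMemoryStore` class are all assumptions made for illustration. The store is an in-memory stand-in so the example runs without a Redis server; the comments note which Redis structures would typically play the same role.

```python
from collections import Counter, deque

RECENT_LIMIT = 3  # hypothetical cap on per-user viewing history


class InMemoryStore:
    """Stand-in for the Redis cluster so the sketch runs without a server."""

    def __init__(self):
        # Plays the role of a Redis sorted set (score = view count, e.g. ZINCRBY).
        self.trending = Counter()
        # Plays the role of one capped Redis list per user (e.g. LPUSH + LTRIM).
        self.recent = {}

    def record_view(self, user_id, product_id):
        self.trending[product_id] += 1
        history = self.recent.setdefault(user_id, deque(maxlen=RECENT_LIMIT))
        if product_id in history:
            history.remove(product_id)  # keep each product once
        history.appendleft(product_id)  # most recent first

    def top_trending(self, n):
        return [product for product, _ in self.trending.most_common(n)]

    def recently_viewed(self, user_id):
        return list(self.recent.get(user_id, ()))


def process_events(events, store):
    """Handle one batch of stream records; only product views are used here."""
    for event in events:
        if event.get("event_type") == "product_view":
            store.record_view(event["user_id"], event["product_id"])
```

In a real deployment the batch would come from a Kinesis consumer and the store would be backed by Redis; keeping the processing logic behind a small store interface like this is one way to test it without either service.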
We also use the same setup to power real-time user engagement, since the architecture is extensible and follows the open/closed principle: new consumers can be added to the stream without modifying the existing ones. Our user journey workflow system listens to the same stream to send personalized push notifications to users in real time based on their actions. We use the Flowable workflow engine, integrated with Kinesis application groups, for this purpose.
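As a rough illustration of rule-based notifications driven by stream events, the sketch below matches each event against a list of rules and returns the messages to send. It is a plain-Python stand-in and does not use Flowable or any AWS API; the rule names, event fields, and message texts are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """One engagement rule: when it applies and what message it produces."""
    name: str
    matches: Callable[[Event], bool]
    message: Callable[[Event], str]


# Hypothetical rules; adding a new one does not change the dispatcher below.
RULES: List[Rule] = [
    Rule(
        name="cart_reminder",
        matches=lambda e: e.get("event_type") == "add_to_cart",
        message=lambda e: f"{e['product_id']} is waiting in your cart",
    ),
    Rule(
        name="order_thanks",
        matches=lambda e: e.get("event_type") == "purchase",
        message=lambda e: "Thanks for your order!",
    ),
]


def notifications_for(event: Event, rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (user_id, message) pairs for every rule the event satisfies."""
    return [
        (event["user_id"], rule.message(event))
        for rule in rules
        if rule.matches(event)
    ]
```

Because the dispatcher only iterates over whatever rules it is given, new engagement rules can be added by extending the list rather than editing existing logic, which is the same extensibility idea described above.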
The above is based on our experience and work here at HealthKart. Please feel free to comment with your thoughts.