Data Platform in AWS
The company has close to 100 years’ experience crafting the world’s smoothest, most delicious ice cream. It wanted to build a modern cloud data platform in AWS that would give it access to data of the highest quality and accuracy. With this platform, the company can make fast, data-driven decisions and gain a clear overview of KPIs across the entire organization (e.g., Product, Operations and Sales), so it can understand how the business is performing and act accordingly.
At the start of the project, the company had several types of business data in its existing NAV system but no data platform for analytics, so all data analysis was done manually.
The challenge was to set up a scalable data platform for the existing on-premises environment that could accommodate growing data volumes, together with an analytical visualization layer that could keep pace with growing data volumes and KPI needs. According to the customer’s requirements, data volumes were expected to grow into gigabytes and terabytes, so the company needed a highly scalable, secure data platform built as a data lake and an analytical layer (data warehouse) in AWS, with a visualization layer in BI tools such as QuickSight.
Today, a traditional on-premises data platform is not sufficient to meet modern demands such as scalability, performance and the ability to deliver global applications, so a cloud solution was deemed necessary. The cloud-based data platform was designed to meet the company’s requirements, offering scalable ingestion, processing and analytical layers.
Initial efforts consisted of setting up the base data platform, comprising the ingestion, transformation, analytical and visualization layers. Azure AD, integrated with AWS IAM, serves as the authentication and authorization layer for accessing AWS resources.
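The source does not say how the Azure AD integration was configured. One common approach is SAML 2.0 federation: Azure AD is registered as a SAML identity provider in IAM, and each IAM role that users may assume carries a trust policy like the one below. This is a minimal sketch under that assumption; the account ID and the provider name `AzureAD` are hypothetical placeholders.

```python
import json

# Hypothetical values: replace with the real AWS account ID and the name
# given to the Azure AD SAML identity provider registered in IAM.
ACCOUNT_ID = "123456789012"
PROVIDER_NAME = "AzureAD"


def saml_trust_policy(account_id: str, provider_name: str) -> dict:
    """Build an IAM role trust policy that lets users federated through a
    SAML identity provider (assumed here to be Azure AD) assume the role."""
    provider_arn = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithSAML",
                # Only accept SAML assertions addressed to the AWS sign-in endpoint.
                "Condition": {
                    "StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(saml_trust_policy(ACCOUNT_ID, PROVIDER_NAME), indent=2))
```

The resulting JSON can be supplied as the `AssumeRolePolicyDocument` when creating the role (for example with boto3's `iam.create_role`). What the role may actually do, such as reading the curated S3 data, is defined separately in its permission policies.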
A number of AWS services and supporting tools were critical to creating the solution, including:
AWS Glue jobs: A serverless data ingestion and processing layer that can connect to different data sources and ingest data into a data lake such as Amazon S3. Glue jobs also provide the transformation layer for data processing.
Amazon S3: The data lake, storing both raw data ingested from on-premises systems and curated data produced by the transformation layer.
Amazon Redshift: The data warehouse, storing fact and dimension tables to which a BI layer such as Power BI can connect to create visualizations.
AWS Site-to-Site VPN: A secure, managed, encrypted connection from the on-premises network to the cloud.
Microsoft on-premises data gateway: Connects the Power BI service, which runs on the internet, to Redshift in a private subnet of the VPC.
Amazon QuickSight: A fast, cloud-powered business intelligence service that delivers insights to everyone in the organization.
AWS Lambda: An event-driven, serverless compute service that runs code in response to events and automatically manages the compute resources that code requires.
Amazon Athena: Provides a simple, flexible way to analyze petabytes of data where it lives; here it was used to query the S3 data lake.
Amazon CloudWatch Logs & Analytics: Used to identify anomalous behaviour by collecting logs, events and metrics, and to provide a unified view of all resources.
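The services above are listed individually. One common way to combine S3, Lambda and Glue is event-driven ingestion: each new file landing in the raw zone of the data lake triggers a Lambda function that starts a Glue job run. The sketch below assumes that wiring, which the source does not confirm. The job name `raw-to-curated` and the `--source_bucket`/`--source_key` arguments are hypothetical; `start_job_run` is the boto3 Glue client call, and the client is injectable so the logic can be exercised without AWS credentials.

```python
from urllib.parse import unquote_plus

# Hypothetical name of the Glue job that processes newly landed raw files.
GLUE_JOB_NAME = "raw-to-curated"


def build_job_arguments(event: dict) -> list:
    """Turn an S3 ObjectCreated notification into one set of Glue job
    arguments per new object. Object keys arrive URL-encoded in S3 events."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        # Glue job arguments are strings keyed with a "--" prefix
        # (hypothetical argument names).
        runs.append({"--source_bucket": bucket, "--source_key": key})
    return runs


def lambda_handler(event, context, glue_client=None):
    """Lambda entry point: start one Glue job run per new raw object."""
    if glue_client is None:
        import boto3  # bundled with the Lambda Python runtime
        glue_client = boto3.client("glue")
    run_ids = []
    for args in build_job_arguments(event):
        response = glue_client.start_job_run(JobName=GLUE_JOB_NAME, Arguments=args)
        run_ids.append(response["JobRunId"])
    return {"started": run_ids}
```

Injecting the client is a design choice for testability; in the deployed function, Lambda calls `lambda_handler(event, context)` and the default boto3 client is used.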
A Solution that creates Value & Benefits
The migration started with phase 1, setting up the base platform, which will be extended in phase 2 with multiple data sources incorporating different business rules.
- To validate the solution, we started with 20 to 30 tables from a single data source: setting up the ingestion layer, implementing 10 to 15 business rules for data transformation, and setting up the analytics layer integrated with the BI visualization layer.
- The solution improves disaster recovery options and availability and, most importantly, delivers enhanced performance, with data encrypted both at rest and in transit.
- The company achieved a faster overall turnaround time for producing KPIs, with zero downtime.
- The solution also delivers scalability from day one, optimising both utilisation and cost.
- QloudX will continue to monitor application performance to consistently guarantee an optimal end-user experience.
- Similarly, security issues and cyber-attacks, which have plagued the industry in recent years, can now be identified and tackled more readily.
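Phase 1 validated the transformation layer with 10 to 15 business rules, which the source does not describe. A minimal sketch of one way to organise such rules: each rule is a small named function applied in order, and any record a rule rejects is kept with that rule's name for data-quality review instead of being silently dropped. The three rules and the field names (`item_no`, `currency`, `quantity`, `unit_price`, `discount`) are hypothetical. In an actual Glue job the same idea would typically be expressed as Spark DataFrame transformations; plain Python is used here only so the sketch runs anywhere.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A rule takes a record and returns the transformed record, or None to reject it.
Rule = Callable[[dict], Optional[dict]]


@dataclass
class RuleResult:
    curated: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (rule name, original record)


def require_item_no(rec: dict) -> Optional[dict]:
    """Hypothetical rule: records without an item number are rejected."""
    return rec if rec.get("item_no") else None


def normalise_currency(rec: dict) -> Optional[dict]:
    """Hypothetical rule: trim and upper-case the currency code."""
    return {**rec, "currency": rec.get("currency", "").strip().upper()}


def derive_net_amount(rec: dict) -> Optional[dict]:
    """Hypothetical rule: net amount = quantity * unit price - discount."""
    net = rec["quantity"] * rec["unit_price"] - rec.get("discount", 0)
    return {**rec, "net_amount": round(net, 2)}


RULES = [
    ("require_item_no", require_item_no),
    ("normalise_currency", normalise_currency),
    ("derive_net_amount", derive_net_amount),
]


def apply_rules(records, rules=RULES) -> RuleResult:
    """Apply each rule in order; the first rule that rejects a record stops it."""
    result = RuleResult()
    for original in records:
        rec = original
        for name, rule in rules:
            rec = rule(rec)
            if rec is None:
                result.rejected.append((name, original))
                break
        else:
            # Every rule passed, so the record goes to the curated zone.
            result.curated.append(rec)
    return result
```

Keeping rules as an ordered list of named functions makes it straightforward to add the phase 2 rules for new data sources without touching the driver loop.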
In short, the company has gained greater control over, and transparency into, its data, while the solution provides fast, consistent and reliable response times for a better end-user experience, backed by a solid set of business and technical KPIs.