Overview
The Fortune 100 firm set out to migrate its enterprise data lake from a
Cloudera platform to a new data platform, primarily to:
- reduce costs,
- adopt modern data architectures, and
- importantly, make the data lake easy to govern and consume.
Business Challenges:
The data lake at this customer had been in development for many years. After years of iterative development and on-the-fly data governance, the lake had become very difficult to consume, lacking proper metadata management, access provisioning, and cataloging. The lake was built on a Hadoop cluster, and with costs continuing to grow, the customer needed to take a fresh look at newer technologies and platforms.
Solutions Delivered:
Quadratic Systems was the primary partner for designing and implementing the data lake solution on
a new platform comprising:
- On-prem S3 object store (Scality) for data storage (replacing HDFS)
- Spark/Scala on Kubernetes containers (CaaS platform)
- Dremio as the query tool (replacing Hive/Impala)
Scality was chosen for data storage and a CaaS platform (Kubernetes) for compute, together replacing the existing Hadoop environment.
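To illustrate what replacing HDFS with an on-prem S3 object store can look like from the Spark side, here is a minimal sketch of the settings that point Spark's s3a connector at an S3-compatible endpoint. It is not the customer's actual configuration: the endpoint URL, the environment variable names, and the S3aSettings helper are hypothetical and used only for illustration. The fs.s3a.* keys are standard Hadoop S3A properties, passed to Spark through the spark.hadoop. prefix.

```scala
// Illustrative sketch only: endpoint and credential values are hypothetical.
object S3aSettings {
  // Build Spark configuration entries that route s3a:// paths to an
  // S3-compatible on-prem endpoint instead of the default AWS endpoint.
  def sparkConf(endpoint: String, accessKey: String, secretKey: String): Map[String, String] =
    Map(
      "fs.s3a.endpoint"          -> endpoint,
      "fs.s3a.path.style.access" -> "true", // on-prem stores often lack per-bucket DNS names
      "fs.s3a.access.key"        -> accessKey,
      "fs.s3a.secret.key"        -> secretKey
    ).map { case (key, value) => s"spark.hadoop.$key" -> value }

  def main(args: Array[String]): Unit = {
    val conf = sparkConf(
      endpoint  = "https://s3.example.internal", // hypothetical endpoint
      accessKey = sys.env.getOrElse("S3_ACCESS_KEY", "<access-key>"),
      secretKey = sys.env.getOrElse("S3_SECRET_KEY", "<secret-key>")
    )
    // Print the settings as spark-submit flags, masking the secret.
    conf.toSeq.sortBy(_._1).foreach { case (k, v) =>
      val shown = if (k.endsWith("secret.key")) "****" else v
      println(s"--conf $k=$shown")
    }
  }
}
```

With settings along these lines, Spark jobs running in Kubernetes containers could read and write s3a:// paths where they previously used hdfs:// paths, provided the hadoop-aws connector is on the classpath.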