The Fortune 100 firm set out to migrate its Enterprise Data Lake from a Cloudera platform to a new data platform, primarily to:
- reduce cost,
- adopt modern data architectures, and
- make the data lake easier to govern and consume.
The data lake at this customer had been in development for many years. With iterative development and on-the-fly data governance, the lake had over time become very difficult to consume, lacking proper metadata management, access provisioning, and cataloging. The lake was built on a Hadoop cluster, and with costs continuing to grow,
there was a need to take a fresh look at newer technologies and platforms.
Quadratic Systems was the primary partner for designing and implementing the data lake solution on
a new platform comprising:
- On-prem S3 object store (Scality) for data storage (replacing HDFS)
- Spark/Scala on Kubernetes containers (CaaS platform)
- Dremio as a query tool (replacing Hive/Impala)
Scality was chosen for data storage and a CaaS platform (Kubernetes) for compute, together replacing the existing Hadoop environment.
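
To make the storage and compute split concrete, the following is a minimal sketch of the Spark properties such a setup typically needs, assuming Spark reaches the object store through the Hadoop S3A connector. The property keys are standard Spark and Hadoop settings; the endpoint, API server URL, namespace, and image name are hypothetical placeholders, not values from this engagement. Credentials are deliberately left out and would be supplied separately.

```scala
// Sketch only: all hostnames, the namespace, and the image are hypothetical placeholders.
object LakeSparkConf {

  // Properties that point the S3A connector at an on-prem, S3-compatible endpoint.
  // Path-style access is commonly required by on-prem object stores (assumption).
  def s3aProps(endpoint: String, pathStyle: Boolean = true): Map[String, String] = Map(
    "spark.hadoop.fs.s3a.endpoint" -> endpoint,
    "spark.hadoop.fs.s3a.path.style.access" -> pathStyle.toString,
    "spark.hadoop.fs.s3a.connection.ssl.enabled" -> endpoint.startsWith("https").toString
  )

  // Properties that tell Spark to schedule executors on a Kubernetes cluster.
  def k8sProps(apiServer: String, namespace: String, image: String): Map[String, String] = Map(
    "spark.master" -> s"k8s://$apiServer",
    "spark.kubernetes.namespace" -> namespace,
    "spark.kubernetes.container.image" -> image
  )

  def main(args: Array[String]): Unit = {
    val props =
      s3aProps("https://s3.lake.example.internal") ++
        k8sProps(
          "https://k8s.example.internal:6443",
          "data-lake",
          "registry.example.internal/spark:latest"
        )

    // Print one "key value" pair per line, the layout used by spark-defaults.conf.
    props.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k $v") }
  }
}
```

With settings along these lines, Spark jobs address data as `s3a://bucket/path` rather than `hdfs://` paths, which is what lets the object store stand in for HDFS without changes to the job logic itself.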