Data is at the core of every organizational strategy. Most technology-leading organizations now base their decisions on data-driven insights. Data architecture is critical to the efficiency, scalability, and usability of that data, so the architecture ultimately shapes the insights an organization can draw from it.
Over the past years, organizations have adopted and scaled various data architectures, most notably the monolithic data architecture (the data lake approach). But as organizations expand across domains, centralized data platforms like the monolith struggle to deliver data with the speed and flexibility that a scaling organization needs.
What next?
Organizations are progressively switching to a more scalable form of data architecture called Distributed Data Mesh. It is a highly decentralized data architecture with a focus on data as a product.
How does a Data Mesh work?
A data mesh architecture rests on four principles, as shown in the figure and explained below:
- Domain-Oriented Approach: With a data mesh, a mass of data is broken down into domain-specific streams, which can be transformed to create a joint aggregate view of the business domain. These data streams are owned by independent teams or users, who in turn work with the business experts who analyze the data for insights.
- Data as a Product: The owning team sees the data as its product and is responsible for processing it into a ready-to-use unit.
- Self-serve data infrastructure as a platform: An automated platform (infrastructure as a platform) lets the product team control the data lifecycle, from the developer at the source, through connecting the data product, to running semantic queries at the mesh level. The platform provides storage, pipelines, a data catalog, and access control to the domains. It reduces duplicated effort and hence increases the efficiency of data handling.
- Federated Governance: Finally, the implementation of a data mesh architecture is guided by federated governance, with standards and interoperability as the primary architectural guidelines. It emphasizes that each domain's data should be discoverable, addressable, self-describing, secure, trustworthy, and interoperable.
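To make the four principles a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and only for illustration: the `DataProduct` and `MeshCatalog` classes, the `mesh://` address scheme, and the two governance rules are assumptions of mine, not part of any real data mesh product or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor that a domain team publishes for its data."""
    domain: str   # owning business domain, e.g. "sales"
    name: str     # product name, unique within the domain
    owner: str    # accountable team (data as a product)
    schema: dict  # column name -> type, so the product is self-describing

    @property
    def address(self) -> str:
        # A stable, unique address makes the product addressable.
        return f"mesh://{self.domain}/{self.name}"


class MeshCatalog:
    """Hypothetical self-serve platform catalog shared by all domains."""

    def __init__(self) -> None:
        self._products = {}

    def register(self, product: DataProduct) -> str:
        # Federated governance: a few global rules apply to every domain.
        if not product.owner:
            raise ValueError("every data product needs an accountable owner")
        if not product.schema:
            raise ValueError("every data product must describe its schema")
        if product.address in self._products:
            raise ValueError(f"{product.address} is already registered")
        self._products[product.address] = product
        return product.address

    def discover(self, domain: str) -> list:
        # Discoverability: consumers can list what a domain offers.
        return sorted(a for a, p in self._products.items() if p.domain == domain)


catalog = MeshCatalog()
orders = DataProduct(
    domain="sales",
    name="orders",
    owner="sales-data-team",
    schema={"order_id": "string", "amount": "decimal"},
)
print(catalog.register(orders))
print(catalog.discover("sales"))
```

In this sketch the domain team owns the descriptor, the catalog stands in for the shared platform, and the checks inside `register` are the only centrally agreed rules. A real platform would also need to cover security, trust, and interoperability, which are left out here.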
Why is this important?
The data mesh architecture aims to handle data more reliably and to make it available closer to real time. It also addresses several challenges of the monolithic data architecture. A few of its benefits are listed below:
- It brings ownership, and with it the accountability and responsibility for sharing that data as a product.
- It improves quality, because data is cleaned at the source and is ready to use as an independent unit.
- With the data cleaned at the source, responsibilities are shared across domains, which supports organizational scaling.
- It increases efficiency by providing data infrastructure as a platform.
- It can handle large volumes of data, because the data is segregated by domain for the respective domain experts to work on.
- The data can be fed directly into data analytics tools, making it more useful operationally.
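The idea of cleaning data at the source can be sketched in a few lines of Python. This is only an illustration under assumptions: the `SALES_CHECKS` rules, the `publish` function, and the sample records are all made up for this post and do not come from any specific tool.

```python
from typing import Callable, Dict, List, Tuple

Record = dict
Check = Callable[[Record], bool]

# Hypothetical quality rules owned by the "sales" domain team.
SALES_CHECKS: Dict[str, Check] = {
    "has_order_id": lambda r: bool(r.get("order_id")),
    "amount_is_non_negative": lambda r: (
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
    ),
}


def publish(
    records: List[Record], checks: Dict[str, Check]
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into those fit to publish and those rejected at the source."""
    accepted, rejected = [], []
    for record in records:
        # Collect the names of every rule this record fails.
        failed = [name for name, check in checks.items() if not check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected


raw = [
    {"order_id": "A1", "amount": 20.0},
    {"order_id": "", "amount": 5.0},
    {"order_id": "A3", "amount": -1},
]
accepted, rejected = publish(raw, SALES_CHECKS)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Because the domain team that produces the data also owns the rules, bad records are caught before they reach consumers, and each rejection names the rule that failed so the owner can act on it.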
How is it different from monolithic architecture?
Moving from a monolithic architecture to a data mesh requires a mindset shift, which the list below makes apparent (each item reads as a shift from the monolith mindset to the data mesh mindset):
- Centralized ownership to decentralized ownership
- Pipelines as a priority to domain data as a first-class concern
- Data as a by-product to data as a product
- A siloed data engineering team to cross-functional domain-data teams
- A centralized data lake/warehouse to an ecosystem of data products
With this transition to the next generation of data architecture, the world of data science looks forward to relying more on AI-powered insights, and on the decisions they drive.
Interested in learning more about Data Mesh?
Here are a few easy-to-understand links that I loved, out of many:
- Read what Zhamak Dehghani says about Moving Beyond a Monolithic Data Lake to a Distributed Data Mesh
- Data mesh blogs by Thoughtworks
- A blog on How Not to Mesh it (Data Mesh) Up by Monte Carlo
Thanks for reading this till the end! Don’t forget to add your feedback in the comments!
Let’s connect on Linkedin.com/in/jhakamal to collaborate on projects!
Originally published at https://www.linkedin.com.