The centralization of data has been a core design concept of enterprise data management. Data platforms like data lakes and data warehouses try to pool data from dispersed systems into a central access point so that all data can be easily searched, explored, retrieved, and standardized.
Now some argue that a centralized view of data has flaws. Growing data volumes make centralized management hard, and as the number of use cases for the data grows, the number of transformations running on the central server begins to snowball.
Sri Ambati, the CEO of H2O.ai, said that “going back to the 1990s, companies with the best warehouses were able to mine it and get the most insights. I think there’s always going to be value for centralization. The data lakes of the bygone big data era, if you will, have become data swamps, because people have not been able to use that effectively.”
In addition, the growing popularity of edge computing is at odds with the centralized approach, and practitioners are increasingly looking for better decentralized alternatives.
For example, Zhamak Dehghani, Director of Next Tech Incubation, said that “in order to decentralize the monolithic data platform, we need to reverse how we think about data, its locality and ownership. Instead of flowing the data from domains into a centrally owned data lake or platform, domains need to host and serve their domain datasets in an easily consumable way.”
Dehghani and her team at Thoughtworks promote the concept of a data mesh, which connects datasets that are stored in different locations and owned by different parts of an organization.
Dehghani said that “it is up to the engineers and leaders in organizations to realize that the existing paradigm of big data and one true big data platform or data lake, is only going to repeat the failures of the past, just using new cloud based tools. This paradigm shift requires a new set of governing principles accompanied with a new language.”
The four principles of a data mesh are:
- Domain-oriented decentralized data ownership and architecture
- Data as a product
- Self-serve data infrastructure as a platform
- Federated computational governance
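The four principles are organizational as much as technical, but how they fit together can be sketched in code. The Python below is a minimal, hypothetical illustration. It is not Thoughtworks' reference design or any real product's API, and `DataProduct`, `SelfServeCatalog`, `has_description`, and `rows_match_schema` are invented names. In this sketch each domain owns and serves its own dataset, a shared catalog lets domains publish without going through a central data team, and governance rules are plain functions that the platform runs automatically whenever a product is published.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical illustration only: these classes and functions are not from any real data mesh tool.


@dataclass
class DataProduct:
    """A dataset owned and served by one domain team (principles 1 and 2)."""
    name: str
    owner_domain: str
    schema: Dict[str, type]
    rows: List[dict] = field(default_factory=list)
    description: str = ""

    def serve(self) -> List[dict]:
        # Consumers read through the product's own interface instead of
        # copying raw tables into a central lake; return copies of the rows.
        return [dict(r) for r in self.rows]


# A governance policy inspects a product and returns a list of violations (principle 4).
Policy = Callable[[DataProduct], List[str]]


def has_description(p: DataProduct) -> List[str]:
    return [] if p.description else [f"{p.name}: missing description"]


def rows_match_schema(p: DataProduct) -> List[str]:
    errors = []
    for i, row in enumerate(p.rows):
        if set(row) != set(p.schema):
            errors.append(f"{p.name}: row {i} has columns {sorted(row)}")
            continue
        for col, typ in p.schema.items():
            if not isinstance(row[col], typ):
                errors.append(f"{p.name}: row {i} column {col!r} is not {typ.__name__}")
    return errors


class SelfServeCatalog:
    """Shared platform where domains publish their own products (principle 3)."""

    def __init__(self, policies: List[Policy]):
        self.policies = policies
        self.products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Globally agreed rules run automatically at publish time,
        # while the data itself stays owned by the publishing domain.
        violations = [v for policy in self.policies for v in policy(product)]
        if violations:
            raise ValueError("; ".join(violations))
        self.products[f"{product.owner_domain}.{product.name}"] = product

    def discover(self, domain: Optional[str] = None) -> List[str]:
        # List published products, optionally filtered to one domain.
        return sorted(k for k in self.products if domain is None or k.startswith(domain + "."))

    def read(self, key: str) -> List[dict]:
        return self.products[key].serve()


# Usage: the sales domain publishes its own "orders" product.
catalog = SelfServeCatalog(policies=[has_description, rows_match_schema])
orders = DataProduct(
    name="orders",
    owner_domain="sales",
    schema={"order_id": int, "amount": float},
    rows=[{"order_id": 1, "amount": 9.5}],
    description="Confirmed orders, one row per order",
)
catalog.publish(orders)
print(catalog.discover())  # ['sales.orders']
```

Expressing governance as ordinary functions is meant to mirror the "computational" part of federated computational governance: the rules are agreed on globally but executed by the platform on every publish, rather than enforced by a central team reviewing each dataset by hand.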