Mike Stonebraker, PhD: “Make cleaning data everyone’s job”
Guest post below by my trusted colleague, friend, and Co-Founder @ Tamr, Mike Stonebraker, PhD. Great advice:
The cornerstone of data-driven enterprises is clean, integrated data. It’s just that simple. And as we head into 2023, here is my advice to you.
1. Move data, starting with decision support data, to the cloud as aggressively as possible
Starting with decision support is easier than starting with online transaction processing, which is why I suggest you start there. But instead of taking the traditional “lift and shift” approach, I strongly suggest that you incorporate data integration as part of your data movement to the cloud. Restructure, reformat, and integrate your data along the way so it is prepared for consumers of data within the enterprise.
2. Create an enterprise-wide strategy for managing the idiosyncrasy created by data silos
All enterprises today have many data silos, and they are not going away. But creating the infrastructure required to eliminate the idiosyncrasy created by the silos generates huge potential business value. That’s why businesses need to develop an enterprise-wide strategy for mastering data across data silos. Currently, this occurs on an ad hoc basis for individual projects; the emergence of “self-service data prep” 10 years ago is a reflection of this problem. But when you implement a systematic way of mastering your data across your entire organization, you’ll realize greater benefits.
3. Put a Chief Data Integration Officer in charge of creating enterprise best practices
I believe this is the most important thing an enterprise can do. By documenting a consistent set of best practices for how an organization integrates and masters its data, organizations can apply learnings from the past to future projects.
4. Make cleaning data everyone’s job
If clean, integrated data is the goal, then cleaning data must be everyone’s responsibility. Whoever creates the data should feel obligated to make it as clean as possible at the point of entry and to curate it at the point of creation. This is the aspirational goal, and it will help organizations move towards a state where their data is cleaner and more integrated than before. Additionally, much as Google Knowledge Graph, Apple Knowledge Graph, and other consumer data organizations curate the world’s consumer data, the internal data organization inside each company will become responsible for engaging creators and consumers of data in the curation process.
By applying this advice, you’ll accelerate your ability to become data-driven and provide the clean, integrated data your organization needs to succeed.