Digital transformation is more important than ever. But as countless enterprise leaders have learned, transformation efforts that don't build a data-driven culture centered on managing data as an asset are hardly transformative. At Tamr, we've seen companies make significant strides over the past decade, but many still have a long road ahead. The proverbial "long pole in the tent" for becoming data-driven (having clean, curated, and continuously updated data for many users to consume) is a problem as old as the enterprise itself. The solution, as this post from McKinsey argues, is Data Products. We've been through decades of attempts to deliver on this dream: data warehouses in the 1990s, data lakes in the 2000s, and most recently, cloud data platforms.
After years of prediction, the shift to the cloud is here. Every organization I know is migrating its infrastructure to the cloud, some rapidly, some gradually. And it's clear that as enterprises adopt cloud data platforms, they'll continue to live in a hybrid state of cloud and on-prem for at least the next two decades. The growing pains from these migrations can be substantial, but so is the scale of the change: the shift from on-prem to cloud is no less significant than the shift from minicomputers to personal computers in the 1980s and 1990s. The cloud offers real opportunity, but many legacy on-prem requirements will persist for decades to come. Hybrid, while not the preference, will be the reality for the foreseeable future. The organizations and vendors that embrace this reality are the ones that will succeed.
Cloud data platforms may seem like just the latest in many generations of shiny new data technologies, but the true bottleneck to their success is, once again, the problem we've had since we started automating business processes in the 1970s: resolving the idiosyncrasies of data from many operational systems into tables of clean, curated, continuously updated data, organized as required for consumption. Solving this age-old problem requires modern technology that harnesses the power of probabilistic modeling and uses human expertise very efficiently. Over the coming year, many cloud data platforms will embrace next-generation data mastering as a keystone of the modern enterprise cloud data platform.
We're also seeing the boundary between structured and unstructured data disappear. To borrow the words of my trusted partner and Tamr Chief Product Officer, Anthony Deighton, "there is no such thing as unstructured data. Only data that is yet to be structured." Anthony is exactly right, and increasingly we're seeing tooling and infrastructure orient around this fact.
Finally, the distinction between internal and external data is far less pronounced than it was in the past. In reality, much of the data in a company's enterprise systems is incorrect or incomplete, and internal sources alone can't fix that. That's why data enrichment is becoming increasingly important, and necessary, to deliver clean, curated data. Companies are finally realizing that they must use external data to enrich and complete their internal data.
As we head into 2023, I'll leave you with two final thoughts: anything that feels like a panacea is probably wrong, and your biggest barrier to becoming data-driven isn't technology. It's people. We've reached the point where technology is no longer the issue. The most significant barrier we face is people's ability to consume, organize, and understand the data available to them.