“One Size Does Not Fit All in Database Systems”: true in the early 2000s, and now we have the opposite problem.
More than 20 years ago, my partner Mike Stonebraker (MIT) and his academic colleague Uğur Çetintemel (Brown) published a paper entitled “‘One Size Fits All’: An Idea Whose Time Has Come and Gone”. The abstract from their paper is below:
“The last 25 years of commercial DBMS development can be summed up in a single phrase: “One size fits all”. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimized for business data processing) has been used to support many data-centric applications with widely varying characteristics and requirements. In this paper, we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser. We use examples from the stream-processing market and the data warehouse market to bolster our claims. We also briefly discuss other markets for which the traditional architecture is a poor fit and argue for a critical rethinking of the current factoring of systems services into products.”
They were calling for a new era of innovation in database systems that was sorely overdue after 20+ years in which most of the industry focused on adopting the database systems originally designed and built in the late 1970s and 1980s: Oracle, IBM DB2, Microsoft SQL Server, Sybase, Teradata et al. All of these systems shared the same fundamental design pattern, the result of the key business challenge at the time: OLTP (Online Transaction Processing), essentially storing the data generated by a person sitting at a terminal running some application (mainframe, client/server or other).

During the '80s and '90s the need emerged for database systems optimized for read-oriented applications, often referred to as OLAP (Online Analytical Processing). The key difference was the need to optimize for querying data regardless of how the data arrived in the system. The existing database vendors, whose underlying engines were optimized for write/OLTP workloads, repurposed those existing engines (they didn't want to create a new code line) to satisfy read-oriented workloads. Essentially the vendors were leading their customers down the path of using one engine for all their workloads. This was obviously beneficial for the vendors: it created more lock-in and let them deliver value for the newer workloads through extensions to the same fundamental engine. This resulted in features such as materialized views in Oracle, essentially caching mechanisms that created copies of the data optimized for reads.
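The materialized-view idea is easy to sketch: precompute a query's result and store it as its own table, so read-heavy workloads hit the cached answer instead of re-scanning the base data. Here is a minimal illustration using SQLite (not Oracle's actual `CREATE MATERIALIZED VIEW` syntax, and the table and column names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP-style base table: optimized for many small writes.
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
cur.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                [("east", 10.0), ("east", 20.0), ("west", 5.0)])

# A "materialized view" is precomputed query output stored as its own table,
# so analytic reads hit the cached aggregate instead of the base rows.
# (In a real system the engine also keeps this copy refreshed on writes.)
cur.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total FROM orders GROUP BY region
""")

rows = cur.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(rows)  # [('east', 30.0), ('west', 5.0)]
```

The catch, and the reason this is a suboptimization inside an OLTP engine, is that the copy has to be kept consistent with every write to the base table.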
When the systems traditionally built for OLTP were repurposed for read-oriented workloads, all kinds of suboptimizations began to occur. What was incredible was how accommodating most database professionals were in the interest of making the engine they were familiar with work for new workloads. This was most acutely obvious when you had large amounts of data and a need for fast queries; many of us, including the folks at Netezza, Vertica and many others, rode this suboptimization of the old vendors during the early 2000s to build new systems and companies.
Over the 20 years since Mike and Uğur published their paper, many people in academic research and in commercial companies have driven an incredible phase of new innovation in database systems, best illustrated by the (admittedly superficial) chart below from Gartner Group.
What is remarkable about this chart is that if you rolled the same data all the way back to 2000, the list would have been even smaller: Oracle, IBM (DB2), Microsoft (SQL Server), Sybase and Teradata. It's shocking how quickly the industry responded to Mike and Uğur's call to innovation and to customers' need for database systems whose underlying design pattern better matched their specific workloads.
Mike and I have worked closely together on a number of innovative projects: Vertica (a read-oriented columnstore that started as the academic project C-Store @ MIT), Volt (high-performance OLTP) and Paradigm4 (array-native). Our colleagues in academic and commercial database system research have driven a wide variety of fundamental new design patterns into commercial practice. I might also add that there have been many people duplicating the same fundamental design patterns, some doing better implementations of known patterns, some just poor mimics.
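The columnstore pattern behind C-Store/Vertica can be sketched in a few lines: store each attribute contiguously so an analytic query reads only the columns it touches, rather than every field of every row. A toy illustration in Python (the data and names are invented for the example; a real columnstore adds compression, sort orders and vectorized execution on top of this layout):

```python
# Row-oriented layout: each record stored whole, ideal for OLTP-style
# "fetch/update one record" access.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "east", "amount": 20.0},
    {"id": 3, "region": "west", "amount": 5.0},
]

# Column-oriented layout: one contiguous array per attribute.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "east", "west"],
    "amount": [10.0, 20.0, 5.0],
}

# An OLAP query like SUM(amount) must touch every field of every row in
# the row layout, but only the single 'amount' array in the column layout.
row_store_total = sum(r["amount"] for r in rows)
col_store_total = sum(columns["amount"])
print(row_store_total, col_store_total)  # 35.0 35.0
```

Same answer either way; the point is how much data each layout forces the query to read, which is exactly the workload mismatch the repurposed OLTP engines suffered from.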
The cloud data platforms have amplified the number of options for storing data; AWS alone now has more than 13 different types of data storage services. Snowflake is a truly amazing system, credit to Thierry Cruanes, Benoit Dageville and Marcin Zukowski: an awesome product and an amazing company.
IMHO we are now at a point in database systems where we have too many options, and the bottleneck for people using them is the proverbial selection of the right tool for the job. In short, we need to make it easier for consumers/users/data professionals to select the tools that match their workloads. I see a few key leaders driving this: Andi Gutmans @ Google with Spanner and Colin Mahony @ AWS with Aurora and Redshift, just to name a few.
There have been some innovations that took a winding path. For example, the broad adoption of JSON via document databases (MongoDB, CouchDB et al) was a fantastic step forward, although I would argue that the “NoSQL” mantra of these companies was a massive distraction for our entire industry. All the products/companies that claimed “NoSQL” have since added some sort of declarative language on top of their systems to make them more accessible, and what do you know: many of them embraced SQL. NoSQL was a huge distraction, imho, but I do appreciate having JSON.
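The way this converged is visible even in SQLite: JSON documents land in an ordinary relational table, and plain declarative SQL reaches into them. A minimal sketch, assuming a SQLite build with the JSON1 functions (`json_extract`) compiled in; the table and documents are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Documents stored as JSON text, document-database style, but inside a
# relational table rather than a separate engine.
cur.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
cur.executemany("INSERT INTO docs (body) VALUES (?)", [
    ('{"user": "ada", "visits": 3}',),
    ('{"user": "grace", "visits": 7}',),
])

# The declarative layer the NoSQL vendors ended up re-adding: ordinary SQL
# reaching into the JSON payload via SQLite's JSON1 json_extract function.
result = cur.execute("""
    SELECT json_extract(body, '$.user')
    FROM docs
    WHERE json_extract(body, '$.visits') > 5
""").fetchall()
print(result)  # [('grace',)]
```

The schema flexibility of JSON survived; the query-by-hand-rolled-API part did not.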
Very interested to hear what others think.