What is HTAP?

I’ve begun seeing the database-related acronym HTAP thrown about more and more, so I did a little research to understand its meaning and implications.
The acronym HTAP was established by Gartner and is defined (via Wikipedia) as follows:
Hybrid transaction/analytical processing (HTAP) is an emerging application architecture that “breaks the wall” between transaction processing and analytics. It enables more informed and “in business real time” decision making.
So, yeah, that doesn’t really add too much to our understanding. Like many other categorizations in the world of databases, the precise definition of what HTAP is, what it isn’t, etc. is relatively loose and informal. Its applicability to any one database system is potentially disputable and arguable on a case-by-case basis.
But there is something noteworthy here. It is (and I will try not to gag when I write this) a paradigm shift. To see why, it might help to briefly remind ourselves of the distinction that is made between OLTP and OLAP databases.
An Online Transaction Processing (OLTP) database is the system that receives the initial writes from the source, with write volume usually measured in the thousands of inserts per second. Traditionally, such databases are built to handle a high volume of writes, each carrying a relatively small amount of data. An OLTP database is often considered the system of record or source of truth. It is the canonical source of the information for the organization, so the database is entrusted with making sure the data is accurate, consistent, up-to-date, and so on. Users of the database can rely on the data to have these characteristics and may even return to the database to retrieve updates or corrections as needed. Reads are usually designed by the user to be very specific: they use as few tables as possible, return as little data as possible, and take advantage of indexes.
An Online Analytical Processing (OLAP) database is where data is analyzed after being collected by the OLTP database. OLAP databases are “read-mostly” and are focused on being flexible enough to handle a myriad of query access patterns, all while keeping the latency of query results to a minimum. OLAP databases generally receive data in batch or bulk, both from the upstream OLTP database and potentially from other, complementary sources of data. OLAP databases employ their own user-level data organization techniques (star schemas, wide tables, fact tables, etc.), which require them to organize the data internally very differently from OLTP databases so that ad hoc queries perform more efficiently.
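To make the difference in query shape concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the two helper functions are made up for illustration; SQLite is only standing in for both kinds of system here, and a real OLAP store would lay the data out quite differently.

```python
import sqlite3

# Hypothetical schema, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# OLTP-style writes: many small inserts, each touching one row.
rows = [
    ("alice", "east", 20.0),
    ("bob", "west", 35.5),
    ("alice", "east", 12.5),
    ("carol", "west", 40.0),
]
with conn:
    conn.executemany(
        "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)", rows
    )


def oltp_lookup(conn, order_id):
    """OLTP-style read: one table, one row, found through the primary key."""
    return conn.execute(
        "SELECT customer, amount FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


def olap_revenue_by_region(conn):
    """OLAP-style read: scans every row and aggregates across the table."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
```

The lookup touches a single row no matter how large the table grows, while the aggregate has to read all of it, which is why the two workloads have traditionally been tuned (and hosted) separately.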
But as regards HTAP, the most critical point to note about OLTP and OLAP systems is… they’re separate systems. They each manage data separately. Getting data from the OLTP system to the OLAP system necessitates a copy of the data. This copying step is time-consuming to write, introduces points of failure and error, and is time-consuming to run, which leads to stale data and to edge cases where the two systems are out of sync. This is all in addition to the need to purchase and maintain two separate systems to begin with.
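The staleness problem is easy to reproduce in miniature. The sketch below uses two in-memory SQLite databases as stand-ins for the OLTP and OLAP systems and a hypothetical `run_etl` function as the copy step; the table and function names are invented for the example, and a real pipeline would also transform the data along the way.

```python
import sqlite3


def make_db():
    """Create a stand-in database with a single, made-up sales table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
    return db


def record_sale(oltp, amount):
    """A small transactional write against the OLTP side."""
    with oltp:
        oltp.execute("INSERT INTO sales (amount) VALUES (?)", (amount,))


def run_etl(oltp, olap):
    """Batch-copy rows the OLAP side has not seen yet; returns rows copied."""
    last_id = olap.execute("SELECT COALESCE(MAX(id), 0) FROM sales").fetchone()[0]
    new_rows = oltp.execute(
        "SELECT id, amount FROM sales WHERE id > ?", (last_id,)
    ).fetchall()
    with olap:
        olap.executemany("INSERT INTO sales (id, amount) VALUES (?, ?)", new_rows)
    return len(new_rows)


def total(db):
    return db.execute("SELECT COALESCE(SUM(amount), 0) FROM sales").fetchone()[0]


oltp, olap = make_db(), make_db()
record_sale(oltp, 10.0)
record_sale(oltp, 5.0)
run_etl(oltp, olap)

record_sale(oltp, 7.0)         # arrives after the last batch ran
stale_total = total(olap)      # analytics still sees only the first two sales
fresh_total = total(oltp)      # the system of record already has all three

run_etl(oltp, olap)            # only the next batch closes the gap
caught_up_total = total(olap)
```

Between the two `run_etl` calls the analytical copy disagrees with the system of record, and that window is exactly as long as the batch interval.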
HTAP is an attempt to bring those two systems together into one and eliminate those (very real) issues. HTAP is the Great Conjunction of database systems. HTAP seeks to:
  • Eliminate the ETL (extract, transform, load) process
  • Provide drill-down from analytical results that can point to recent data
  • Eliminate multiple copies of data, multiple servers, and their operational complexity
How is this hybrid accomplished? Basically, an HTAP system is created with the following characteristics:
  • Data is kept primarily in memory
  • Data is distributed, allowing scale out of the data to allow for more processing power
  • Some peers are dedicated to executing analytical queries
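As a rough mental model of those three characteristics, here is a toy sketch in plain Python. `Peer` and `ToyHtapCluster` are hypothetical names invented for this example and do not correspond to any vendor's design: data lives in dictionaries (in memory), keys are spread over partitions (distributed), and each partition has a second peer that serves only analytical reads. The sketch assumes integer keys and applies each write to both peers in the same call, whereas a real system would replicate between nodes and have to deal with failures and consistency.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """One node's share of the data, held entirely in memory."""
    rows: dict = field(default_factory=dict)


class ToyHtapCluster:
    def __init__(self, partitions=3):
        # One transactional peer and one analytics-only peer per partition.
        self.tx_peers = [Peer() for _ in range(partitions)]
        self.analytic_peers = [Peer() for _ in range(partitions)]

    def _partition(self, key):
        # Assumes integer keys; real systems use a proper hashing scheme.
        return key % len(self.tx_peers)

    def write(self, key, amount):
        """Transactional write, routed to the partition that owns the key."""
        p = self._partition(key)
        self.tx_peers[p].rows[key] = amount
        # Simplification: the analytics peer is updated in the same call,
        # so there is no batch window in which it can fall behind.
        self.analytic_peers[p].rows[key] = amount

    def read(self, key):
        """Point read served by a transactional peer."""
        return self.tx_peers[self._partition(key)].rows.get(key)

    def total(self):
        """Analytical query: scatter to the analytics peers, then gather."""
        return sum(sum(peer.rows.values()) for peer in self.analytic_peers)


cluster = ToyHtapCluster()
for key, amount in [(1, 10.0), (2, 5.0), (3, 7.0)]:
    cluster.write(key, amount)
```

A write made through `cluster.write` shows up in the very next `cluster.total()` call with no separate copy job in between, which is the property HTAP systems are after; the hard engineering lies in everything this toy leaves out.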
As usual, existing vendors are retooling their systems to provide an HTAP experience while new vendors may proclaim themselves as “HTAP native” from the start.
As with all things in life, YMMV in terms of the benefits you can expect to see from adopting techniques or systems from the HTAP world. According to the latest Gartner Hype Cycle for Data Management, HTAP is “Climbing the Slope” and will likely soon be on the cusp of the “Peak of Inflated Expectations”, so be pragmatic about its benefits, don’t get oversold on the hype, and get management’s buy-in for PoCs before writing any checks.