RDBMS is dead, long live the RDBMS!
Posted on January 20, 2017 by TheStorageChap in Big Data and Analytics, Exasol, RDBMS
So, after over 12 years at (Dell) EMC, I recently made the decision to leave and join EXASOL.
When I told my friends and family, they gasped with amazement. “But you had such a great job,” they said. “You worked with talented colleagues; you got to travel across EMEA; you talked with lots of different people; you stayed in (sometimes) nice hotels; and when you were in the UK, much of the time you got to work from home!” And they were right. So the answer to the inevitable question, “Why are you leaving?”, was something that I pondered for some time before I accepted the role at EXASOL.
Over the past few years, I have spent a lot of time with a wide range of enterprise and mid-market customers talking about subjects including analytics, digital transformation, enterprise data warehouse modernisation, and the misnomer that is ‘big data.’ Ultimately, in all of these cases, customers of all sizes want to be able to make data-driven decisions that help them to:
- Increase sales
- Decrease cost
- Reduce risk
In all of these conversations there continue to be three common themes:
- How to simplify technology to accelerate business outcomes
- How to increase analytical performance to accelerate insights
- How to reduce the total cost of ownership of solutions
Enterprises have been told that Hadoop is the answer to their ‘big data’ problem. Whilst ‘big’ data can indeed be big, most organisations only need to analyse a small proportion of that data. I am not disputing that the amount of data (both structured and unstructured) a company will need to store and analyse will continue to grow, but in terms of analysis, the data is still relatively small. In fact, the latest KDnuggets poll continues to show that the majority of the largest data sets analysed are still in the GB range, rather than TB or PB.
This survey relates to the type of analytics completed specifically by data scientists, which might be different to the analytics being completed by the general business, but even here the raw data is normally in the low 100s of TBs.
What organisations do have is a performance problem. To gain competitive advantage they need to process the data that they have much more quickly, in some cases in real time. The major enterprise data warehouse (EDW) vendors will tell you that this problem can be solved by adding lots more resources, and in some cases it can. But there’s a catch: those additional resources require additional database licenses.
The word ‘enterprise’ has become synonymous with expensive. Organisations already had issues with the amount of money, time and resources they had to invest in their ‘enterprise’ data warehouse even before they needed to process increasing amounts of data ever faster.
The growing requirement to analyse more of the unstructured ‘big data’ has contributed to organisations undertaking EDW modernisation or optimisation projects. The ultimate aim of these projects is to reduce the cost of the EDW by archiving cold data into Hadoop and moving ETL processes outside of the database, scaling back the amount of compute power and storage, as well as the number of ‘core’ licenses, required for the database. But this can add huge amounts of complexity and can see the hardware (both storage and compute) requirements simply shifted to Hadoop.
Hadoop has gained popularity because of its distributed processing capabilities, the ability to use technologies such as Hive to enable data warehousing, and technologies like HAWQ and Impala that then add SQL language support. In the pursuit of performance and reduced cost, customers have effectively taken the Hadoop ecosystem of technologies and tried to build an MPP database with SQL support!
But this is not what Hadoop was designed for. Customers who have tried this have ended up with hundreds of Hive nodes, only to find that they could have done far more, more cost-effectively and efficiently, using a purpose-built, in-memory MPP solution. Hadoop provides a great mechanism to store large amounts of unstructured data cost-effectively. In the real world that data then needs to be wrangled and, in many cases, transformed into something more structured that can be analysed better and more quickly outside of Hadoop.
With all of the hype surrounding big data it is easy to forget that most organisations still run their business on structured data. What they want is something that provides best-of-breed analytics performance at a reasonable price point, reduces the total cost of ownership and can closely integrate with the ecosystem of data management (including Hadoop) and visualisation tools to deliver business insights quickly and effectively.
This is why I chose to join EXASOL, a relatively small in-memory analytic database vendor that has spent the last ten years focused on building from scratch an easy-to-use, in-memory, massively parallel processing database. EXASOL has been quietly accruing an impressive customer base and now has the potential to dethrone the so-called market leaders of OLAP database solutions thanks to its ability to provide unrivalled performance at a price point that makes the analysis of business data, trend analysis and data modelling available to any organisation, large or small, either on-premises or in the cloud.
Personally, I could not think of a more exciting time to join an organisation. EXASOL is at the point of crossing Geoffrey Moore’s chasm (some may argue, given the customer base it already has, that it has already crossed) and I look forward to being part of the journey as we build on the success of the early adopters and continue to put in place the necessary people, processes and technology to move to the next level.
Want to test it yourself? Check out the Free Small Business Edition of EXASOL and download a copy at www.exasol.com/download