Exposé has a new capability
Exposé: RETAIL Data Exposed. View the attached brochure to see what all the fuss is about
I am increasingly asked by customers – Is the Data Warehouse dead?
In technology terms, 30 years is a long time, and that is how old the Data Warehouse is – which makes it an old-timer. Should we consider it a mature yet productive worker, or one gearing up for its pension?
I come from the world of Data Warehouse architecture, and in the mid to late noughties (2003 to 2010), whilst working for various high-profile financial services institutions in the London market, Data Warehouses were considered all-important and companies spent a lot of money on their design, development and maintenance. The prevailing consensus was that you could not get meaningful, validated and trusted information to business users for decision support without a Data Warehouse (whether it followed an Inmon or a Kimball methodology – the pros and cons of which are not under the spotlight here). The only alternative for companies without the means to commit to the substantial investment typically associated with a Data Warehouse was to allow Report Writers to develop code directly against the source systems' databases (or a landed version thereof), but this, of course, led to a proliferation of reports, caused a massive maintenance nightmare and went against every notion of a single trusted source of the truth.
Jump ahead to 2011, and businesses started showing a reluctance to invest in Data Warehouses – a trend that has accelerated ever since. The reasons I observed ranged from the cost involved, the lack of a quick ROI, a low take-up rate and the difficulty of aligning the Data Warehouse with ongoing business change to, more recently, a change in the variety, volume and velocity of data that businesses are interested in.
In a previous article “From watercooler discussion to corporate Data Analytics in record time” (https://exposedata.wordpress.com/2016/09/01/from-watercooler-discussion-to-corporate-data-analytics-in-record-time/) I stated that the recent acceleration of changes in the technology space, “…now allows for fast response to red-hot requirements…” and how the “…advent of a plethora of services in the form of Platform-, Infrastructure- and Software as a Service (PaaS, IaaS and SaaS)…are proving to be highly disruptive in the Analytics market, and true game changers.”
Does all of this mean the Data Warehouse is dead or dying? Is it an old-timer getting ready for its pension, or does it still have years of productive contribution to the corporate data landscape left?
My experience across the Business Intelligence and Data Analytics market, spanning multiple industries and technologies, has taught me that:
A Data Warehouse is no longer a must-have for getting meaningful, validated and trusted information to business users for decision support. As explained in the previous article, the PaaS, SaaS and IaaS services that focus on Data Analytics (for example the Cortana Intelligence Suite in Azure (https://www.microsoft.com/en-au/cloud-platform/cortana-intelligence-suite) or the Amazon Analytics Products (https://aws.amazon.com/products/analytics/)) allow for modular solutions that can be provisioned as required, collectively answer the Data Analytics challenges, and ensure data gets to users fast, validated and in a business-friendly format – no matter where it originates, or what its format, velocity or volume.
That said, these modular Data Platforms, built from a clever mix of PaaS, SaaS and IaaS services, cannot always easily provide some of the fundamental capabilities typically associated with a Data Warehouse, such as slowly changing dimensions (e.g. SCD2), facts materialized through complex business rules, hierarchies, and entities maintained for regulatory purposes.
Where these requirements exist, a Data Warehouse – or, more accurately, the components typically associated with one – is still required. But even in those cases it will merely form part of a much larger Data Analytics landscape: it will perform the workloads described above, while the larger data story is delivered by complementary services.
In the past, Data Warehouses were key to delivering optimized analytical models that normally manifested themselves in materialized Data Mart star schemas (the end result of a series of layers such as ODS, staging, etc.). Such optimized analytical models are now instead handled by business-friendly metadata layers (e.g. Semantic Models) that source data from any appropriate source of information, bringing fragmented sources together in models that are quick to develop and easy for the business to consume. These sources include the objects typically associated with a Data Warehouse/Mart (for example materialized SCD2 dimensions, materialized facts resulting from complex business rules, entities created for regulatory purposes, etc.), blended with data from a plethora of additional sources. The business user still experiences that clean, easy-to-consume star-schema-like model. The business-friendly metadata layer becomes the Data Mart, but it is easier to develop, provides a quicker ROI and is much more responsive to business change.
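To make the idea of such a blending metadata layer concrete, here is a minimal sketch in Python with pandas. It is purely illustrative and not tied to any specific product; every table, column and value is invented. It shows a materialized SCD2 dimension (as it might come from the warehouse) being blended with a feed from outside the warehouse, while the consumer still sees one clean, star-schema-like shape.

```python
import pandas as pd

# Hypothetical extract of a materialized SCD2 customer dimension from the warehouse.
dim_customer = pd.DataFrame({
    "customer_key": [101, 102],
    "customer_name": ["Acme Ltd", "Bravo Pty"],
    "segment": ["Corporate", "SMB"],
    "valid_from": pd.to_datetime(["2015-01-01", "2016-03-01"]),
    "valid_to": pd.to_datetime(["9999-12-31", "9999-12-31"]),
})

# Hypothetical feed from a source outside the warehouse (e.g. a SaaS export).
web_visits = pd.DataFrame({
    "customer_key": [101, 101, 102],
    "visit_date": pd.to_datetime(["2016-10-01", "2016-10-02", "2016-10-02"]),
    "page_views": [12, 7, 3],
})

# The "semantic model": a star-schema-like view that blends both sources,
# so the business still consumes one clean fact-plus-dimension shape.
blended = web_visits.merge(dim_customer, on="customer_key", how="left")
summary = blended.groupby(["segment", "visit_date"], as_index=False)["page_views"].sum()
print(summary)
```

The point is not the tooling but the shape of the outcome: the warehouse-style objects and the external feed are stitched together in the metadata layer rather than in yet another materialized mart.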
The Data Warehouse is not dead, but its primary role as we knew it is fading. It is becoming complementary to the larger Data Analytics Platforms we see evolving. Some of its components will continue to fulfil a central role, but they will be surrounded by all manner of services, and collectively these will fulfil the organisation's data needs.
In addition, we see the evolution of Data Warehouse as a Service (DWaaS). This is not a Data Warehouse in the typical sense of the word as discussed in this article, but rather a service optimized for analytical workloads. Can it serve those requirements typically associated with a Data Warehouse, such as SCD2, materialization due to complex rules, hierarchies or regulatory requirements? Absolutely. But its existence does not change the need for modular, targeted architectures and a much larger Data Analytics landscape using a variety of PaaS (including DWaaS), IaaS and SaaS. It merely makes hosting typical DW workloads much simpler, better performing and more cost-effective. Examples of DWaaS are Microsoft's Azure SQL DW and Amazon's Redshift.
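As a rough illustration of how lightweight consuming a DWaaS offering can be, the sketch below queries a hosted warehouse from Python over a standard ODBC connection. The server name, credentials and tables are placeholders, not a real environment, and this is only one of several connection routes (Redshift, for instance, is commonly reached through a PostgreSQL driver such as psycopg2 instead).

```python
import pyodbc  # generic ODBC route; details vary by DWaaS provider

# All connection details below are placeholders for illustration only.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:example-dwaas.database.windows.net,1433;"
    "Database=ExampleDW;Uid=example_user;Pwd=example_password;Encrypt=yes;"
)

# A typical analytical workload: aggregate a (hypothetical) materialized fact
# table hosted in the DWaaS, exactly as a conventional Data Warehouse would serve it.
cursor = conn.cursor()
cursor.execute("""
    SELECT d.calendar_year, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_date  AS d ON f.date_key = d.date_key
    GROUP BY d.calendar_year
    ORDER BY d.calendar_year;
""")
for year, total_sales in cursor.fetchall():
    print(year, total_sales)
conn.close()
```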
Just some highlights from 2016. Thank you to all our customers, and our awesome team.
This video shows a comprehensive solution geared around disruption in the Banking market, from transactions through to advanced analytics.
The viewer meets three different customers, their challenges, and how the bank responds to them.
A solution that helps the aftermarket service department of a manufacturer respond to machine issues proactively rather than reactively.
For those who have been involved in conventional Business Intelligence projects, you will be all too familiar with the likely chain of events and the likely outcome. It typically goes something like this: the idea is incubated by someone (very often within ICT), a business case for a project is written, someone holding the purse strings approves it, suitable business and functional requirements are gathered, written and approved, the solution is architected, developed, tested and signed off, the solution is implemented, and voilà, the business MIGHT use the solution.
These conventional solutions are problematic for a number of reasons:
Requirements are lukewarm at best – If the idea was not incubated by someone experiencing a real business pain point, or wanting to exploit a real opportunity, then the “requirements” gathered are always going to be skewed towards “what do you think you need” rather than “what do you know you need right now”.
Suboptimal outcome – Lukewarm requirements lead to solutions that do not necessarily add value to the business, as they are not based on real pain points or specific opportunities that need to be exploited. So when the formal SDLC process concludes, the business may, in the spirit of trying to contribute to a successful project outcome, try to use the solution as is, or send it back for rework so that it at least satisfies some of their needs.
Low take-up rate – When the business accepts a suboptimal solution so as to be a good team player, usage will, over time, drop off because the solution is not really serving a real need.
Costly rework and an expensive project – These issues mean that the business often only gets to see a final product late in the project lifecycle, and often only then starts thinking about what it could do for them if changes were applied to meet their real needs. The solution is sent back for changes, and that is very costly.
Technology has finally caught up with what business users have been doing for a long time.
For many years, given the costly and problematic outcomes of conventional BI, users have often preferred access to the data they need, rather than fancy reports or analytics. They would simply download the data into Excel and create their own really useful BI (see the related blog here: https://exposedata.wordpress.com/2016/07/02/power-bi-and-microsoft-azure-whats-all-the-fuss-about/). This and the advent of a plethora of services in the form of Platform-, Infrastructure- and Software as a Service (PaaS, IaaS and SaaS), and more recently BI-specific services such as Data Warehouse as a Service (DWaaS), are proving to be highly disruptive in the Analytics market, and true game changers.
In the end, the basic outcome is still the same: getting data from some kind of source to the business, converted into useful information in a format they can consume.
But technology and clever architectures now allow for fast response to red-hot requirements. This is game-changing, as solutions are now nimble and responsive and can therefore respond to requirements that are often raised informally when pain points and/or opportunities are highlighted in the course of the normal business day – in meetings, over drinks with colleagues and at the "watercooler". The trick is to recognize these "requirements" and to relate them back to the opportunities that the new world of data and analytics provides. If that can be done, the resulting solutions typically respond much better to these organic requirements than solutions built from requirements incubated and elicited in a much more formal way.
I use this term to describe informal discussions around the organization about pain points or opportunities in the business. These pain points or opportunities represent organic requirements that should be responded to quickly if they hold real value. This DOES NOT mean that formal requirements are no longer valid – not at all – but it means we need to recognize that real requirements manifest themselves in informal ways too. Here is an example of how a requirement can originate both in a formal and in an informal way:
City planning realizes that there are issues with parking availability, as they receive hundreds of calls each month from irate commuters stuck driving around looking for parking. It seems as if commuters are abandoning the city in favor of suburban shopping centers where ample parking is provided. This is not good for businesses in the city center, and not good for the city council.
The head of planning realizes that information (in the form of data) will be key to any decision he takes to deal with the problem, so he requests reports on traffic volumes, finances and works management planning. It seems this data is not in the data warehouse, so a business analyst is employed to elicit the requirements for the reports, and so the long and costly process starts.
The Head of Planning tells a colleague that he wishes he could expedite getting his hands on the information he needs, but he has a limited budget, so he cannot employ more resources to move his reports along more quickly.
This is overheard by the Data Analytics consultant, who realizes that in order to properly support such an important decision, the Head of Planning will have to look at the issue from multiple angles, which such formal reports alone are unlikely to provide. The data he needs, i.e. traffic volumes, finances and works management planning, must be blended with other contextual data such as weather, events, and date and time of day.
The consultant knows that:
He calls a meeting with the Head of Planning, who is intrigued by the idea and the quick return on investment (ROI) at a fraction of the cost, and commissions the consultant to provide a proof of concept (POC) on the matter.
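To illustrate the kind of contextual blending the consultant has in mind, here is a minimal, hypothetical sketch in Python with pandas. The sample data and column names are invented and no particular cloud service is assumed; the idea is simply that operational readings gain meaning once weather and event context are joined onto them by time.

```python
import pandas as pd

# Hypothetical traffic sensor readings (vehicles counted per 15-minute interval).
traffic = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-10-03 08:00", "2016-10-03 08:15", "2016-10-03 08:30"]),
    "carpark": ["CBD-North", "CBD-North", "CBD-North"],
    "vehicles": [180, 240, 310],
}).sort_values("timestamp")

# Hypothetical hourly weather observations and a city events calendar.
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-10-03 08:00", "2016-10-03 09:00"]),
    "rain_mm": [0.0, 4.2],
}).sort_values("timestamp")

events = pd.DataFrame({
    "event_date": pd.to_datetime(["2016-10-03"]),
    "event": ["Food festival"],
})

# Blend: attach the most recent weather observation to each traffic reading,
# then add the day's events as further context for a predictive model.
blended = pd.merge_asof(traffic, weather, on="timestamp", direction="backward")
blended["event_date"] = blended["timestamp"].dt.normalize()
blended = blended.merge(events, on="event_date", how="left")
print(blended)
```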
In my example (which is based on one of our real-world examples using Microsoft Azure), the city has an existing investment in a cloud service. Also note that I provide a high-level resource mapping of the POC and the solution at the end of this blog for both Amazon Web Services (AWS)® and Microsoft Azure®.
The POC architecture involves the following resources (both Azure and AWS are shown):
Please note that the diagram below by no means implies a detailed design, but is a true representation of the high-level architecture we used to achieve the specific solution.
o A1: Into storage;
o A2: Into predictive analytics where it is blended with B and C;
o A3: Directly into real-time visualizations;
o B1: Into predictive analytics where it is blended with A and C;
o C1: Into predictive analytics where it is blended with A and B;
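The flows above amount to a simple routing rule per stream: stream A fans out to storage, predictive analytics and real-time visualization, while streams B and C feed predictive analytics only. The sketch below illustrates that rule in plain Python with in-memory stand-ins; in the actual POC these sinks were cloud services (storage, a hosted predictive model and a streaming dashboard), so every name here is hypothetical.

```python
from collections import defaultdict

# Stand-ins for the real sinks (cloud storage, predictive analytics, live dashboard).
storage, model_inputs, live_dashboard = [], defaultdict(list), []

def route(stream_id, message):
    """Route one message according to the A1/A2/A3, B1 and C1 flows above."""
    if stream_id == "A":
        storage.append(message)                  # A1: persist the raw message
        model_inputs[stream_id].append(message)  # A2: feed predictive analytics
        live_dashboard.append(message)           # A3: push straight to real-time visuals
    elif stream_id in ("B", "C"):
        model_inputs[stream_id].append(message)  # B1 / C1: feed predictive analytics

def score(inputs):
    """Placeholder for the predictive model that blends streams A, B and C."""
    return {stream: len(messages) for stream, messages in inputs.items()}

# Example messages flowing through the pipeline.
route("A", {"carpark": "CBD-North", "occupancy": 0.92})
route("B", {"rain_mm": 4.2})
route("C", {"event": "Food festival"})
print(score(model_inputs))
```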
The result of this real-world example POC was the business's realization that very deep insights can be achieved by leveraging the appropriate data wherever it exists, and that by cleverly architecting solutions from components and services within easy reach, superior outcomes can be achieved fast.
The building blocks created in the POC were adopted and extended into a full production solution, and they set the direction for future data analytics workloads.