18/6/2018
Data Warehousing in the era of Big Data | LinkedIn Search
Paul Needleman
Specialist Master at Deloitte
Data Warehousing in the era of Big Data
Published on June 12, 2018
Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or future employers.
Over the past decade, there has been a fundamental shift in the way we interact with and use technology. It's almost hard to remember a time without smartphones, GPS, Facebook and Twitter, and always-connected devices such as Philips Hue light bulbs and Amazon Alexa. This "always-connected" world is creating data at an exponential rate. According to an insideBIGDATA article, data will grow by 50x from 2010 to 2020. The question is: how will organizations make use of the exponentially growing operational and third-party data to make better-informed decisions?
In the past, the solution was the centralized Data Warehouse. This solution was elegant, efficient, and provided insights into organization data for business managers and executives. The Data Warehouse is still an effective platform to analyze organization data, but it lacks the flexibility to support the large and varied data sets being created through new media. Businesses want to ask any question, with any data, at any time to gain analytical insights. Because a Data Warehouse is highly structured, it cannot support new queries without significant data modeling and data integration efforts.
To support the on-demand needs of business users, Data Lakes, such as those built upon Hadoop, are being established to support analysis of raw data that is structured (e.g., relational databases), semi-structured (e.g., XML, log files), or unstructured (e.g., emails, PDFs). Hadoop is the platform of choice for the Data Lake due to its distributed file system (HDFS), which allows for the storage of any data type and efficient processing of large volumes of data.
https://www.linkedin.com/pulse/data-warehousing-era-big-paul-needleman-1/
The Hadoop Data Lake enables self-service analytics, in which end users, rather than IT departments, are responsible for data preparation and structuring. This platform puts business analysts and data scientists in control of data sets for purposes that could include open-ended data discovery, data mining, statistical analysis, or data visualization. New tools and techniques such as data blending and data wrangling have been established to support these users. The end result: users are able to access more data, more quickly than before.
In 2008, the typical flow to build an analysis went like this:
1. Business users have a question to be answered or an analytic need
2. Business analysts gather the requirements
3. IT staff build, test, and deploy the solution
4. End users use the results for reporting or data analysis
Today, it looks more like this:
1. IT provides access to raw data
2. Business analysts/data scientists prepare and analyze data, as needed
Today's process is much more efficient since it provides users with the data they need without IT departments being an impediment.
The question that then comes to mind: is the last decade's investment in the traditional Data Warehouse for naught? The answer is no. The Data Warehouse and the Data Lake serve different purposes. The table below highlights some of the differences.
So what's one to make of all this? How does an organization determine what data should go where? How does it grapple with the new and varied data and technologies? These are the questions that organizations will need to answer to achieve organizational efficiency through data-driven decision making.
Here is my take: it depends on the use case. Take the following two use cases and try to fit the technology to the use case, not the other way around.
Use Case 1: I would like to understand how my organization's sales information is affected by weather patterns. I also would like to use social media and sentiment analysis to understand how weather affects moods of my customers.
Use Case 2: I would like to understand sales by product line and have the ability to analyze by different attributes such as date, region, and customer. From there, I would like to drill into details to research anomalies or to determine trends or patterns.
Use Case 1 is open ended since it's not known exactly how weather patterns may or may not affect sales. It also has the potential for semi-structured data from third-party weather organizations (NOAA) and social media sites (Facebook and Twitter). It wouldn't be efficient to structure and model this data in the Data Warehouse, just to find out that weather isn't a good indicator of sales for my product.
Use Case 2 has a defined question that's being asked. Yes, there is still some open-ended analysis that can be done to determine trends and patterns, but this can be done by looking at the data in a structured manner, using a predefined model. It would take a data scientist a long time to relate sales, customer, product, location, and time information just for this analysis. A predefined structure makes this relatively easy for a business user. Also, if in Use Case 1 a correlation was determined between weather and sales, then it would make sense to add weather information to a structured model for future use.
Data Warehouses will continue to be an important part of an organization's analytic strategy. And as we continue to obtain more and more data, I believe the line between the Data Warehouse and the Data Lake will become more and more blurred. Just as Use Case 1 requires weather data in the Data Lake, that data could also become an important part of how decisions are made using the Data Warehouse. Likewise, relationships that are predetermined within the Data Warehouse could be leveraged to accelerate discovery within the Data Lake environment. As these technologies converge, technologists and business users will start to have more overlap in their daily duties.
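As a rough illustration of how the two use cases differ in practice, here is a minimal Python sketch. Everything in it is an assumption made for the example: the sample data, the field names (temp_f, product_line, and so on), and the function names are hypothetical and do not come from any real system. The first part treats weather as raw, semi-structured data that is only interpreted at read time (Use Case 1); the second part aggregates and drills into sales that already conform to a predefined structure (Use Case 2).

```python
import json
from collections import defaultdict
from math import sqrt

# All data below is invented for illustration only.

# Use Case 1 input: raw, semi-structured weather records, stored as-is and
# only interpreted when someone reads them.
RAW_WEATHER_JSON = """
[
  {"date": "2018-06-01", "obs": {"temp_f": 68}},
  {"date": "2018-06-02", "obs": {"temp_f": 75}},
  {"date": "2018-06-03", "obs": {"temp_f": 83}},
  {"date": "2018-06-04", "obs": {"temp_f": 90}}
]
"""

# Use Case 2 input: sales facts that already follow a predefined structure.
SALES_FACTS = [
    {"date": "2018-06-01", "region": "East", "product_line": "Beverages", "amount": 100.0},
    {"date": "2018-06-01", "region": "West", "product_line": "Snacks", "amount": 50.0},
    {"date": "2018-06-02", "region": "East", "product_line": "Beverages", "amount": 120.0},
    {"date": "2018-06-02", "region": "West", "product_line": "Beverages", "amount": 60.0},
    {"date": "2018-06-03", "region": "East", "product_line": "Snacks", "amount": 40.0},
    {"date": "2018-06-03", "region": "West", "product_line": "Beverages", "amount": 170.0},
    {"date": "2018-06-04", "region": "East", "product_line": "Beverages", "amount": 200.0},
    {"date": "2018-06-04", "region": "West", "product_line": "Snacks", "amount": 30.0},
]


def daily_sales(facts):
    """Total sales amount per date."""
    totals = defaultdict(float)
    for row in facts:
        totals[row["date"]] += row["amount"]
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def weather_sales_correlation(raw_weather_json, facts):
    """Use Case 1: parse raw weather at read time and correlate it with sales."""
    temps = {rec["date"]: rec["obs"]["temp_f"] for rec in json.loads(raw_weather_json)}
    sales = daily_sales(facts)
    shared_dates = sorted(temps.keys() & sales.keys())
    return pearson([temps[d] for d in shared_dates], [sales[d] for d in shared_dates])


def sales_by(facts, *attributes):
    """Use Case 2: sum sales over any combination of predefined attributes."""
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[a] for a in attributes)] += row["amount"]
    return dict(totals)


def drill_down(facts, **filters):
    """Use Case 2: return the detail rows behind an aggregate."""
    return [row for row in facts if all(row[k] == v for k, v in filters.items())]


if __name__ == "__main__":
    print("weather vs. sales correlation:", weather_sales_correlation(RAW_WEATHER_JSON, SALES_FACTS))
    print("sales by product line:", sales_by(SALES_FACTS, "product_line"))
    print("West / Beverages detail:", drill_down(SALES_FACTS, region="West", product_line="Beverages"))
```

The exploratory half commits to nothing up front: if the correlation turns out to be weak, no modeling effort was wasted. The structured half is only this short because the attributes were agreed on in advance, which is the trade-off described above.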
Organizations that are able to use their data effectively will have a competitive advantage in the big data era.
Thank you to Mary Kate Sternitzke for your help with this article.
17 Comments
Sibusiso Shabangu BI Consultant
Nice article...I'm so motivated to work on the practicality of the two scenarios
Leonardo Sagum, Jr Data Management Operator/Data Analyst at Formsexpress
Thanks for sharing. Started my career with Data Warehouse as the foundation to reporting and analytics and I agree that this will prevail along side with new technologies.