Get ready for more real-time, non-relational and unstructured data, along with a growing need for self-service tools, data governance and data quality
The year 2016 was an important one in the world of big data. What used to be hype became the norm as more businesses realized that data and the related infrastructure, in all forms and sizes, are critical to making the best possible decisions. In 2017, we will see continued growth of systems that support massive volumes of non-relational or unstructured data, as well as a move towards processing data closer to real time. Data Governance and Data Quality will grow in importance as organizations bring additional data sources into decision making. These systems will evolve and mature to operate well within enterprise IT systems and standards. This will enable both business users and data scientists to fully realize the value of big data and move to Advanced Analytics. Here are the top big data trends for the coming year:
- Hadoop projects mature! Enterprises continue their move from Hadoop Proofs of Concept to Production: In a recent survey of 2,200 Hadoop customers, only three per cent of respondents anticipate they will be doing less with Hadoop in the next 12 months. Seventy-six per cent of those who already use Hadoop plan on doing more within the next 3 months, and almost half of the companies that haven't deployed Hadoop say they will within the next 12 months. As further evidence of the growing trend of Hadoop becoming a core part of the enterprise IT landscape, we'll see investment grow in the components that surround enterprise systems, such as security.
- End users, meet Big Data: Options expand to access data in Hadoop: With Hadoop gaining more traction in the enterprise, we see a growing demand from end users for the same fast data access capabilities they've come to expect from traditional data warehouses. To meet that demand, we see growing adoption of technologies that enable business users to access data directly from Hadoop, further blurring the line between "traditional" BI concepts and the world of "Big Data". However, this is one area where current technology does not yet fully meet the querying needs of end users, and we expect vendors to make significant improvements in the coming year.
- The number of options for end users to prepare and discover all forms of data grows: Self-service data preparation tools are exploding in popularity. This is partly due to the shift toward data discovery tools driven by business users, which considerably reduce the time needed to analyze data. Business users also want to reduce the time and complexity of preparing data for analysis, something that is especially important in the world of big data, where they must deal with a variety of data types and formats.
- Data Warehouse growth is heating up… in the cloud: The "death" of the data warehouse has been overhyped for some time now, but it's no secret that growth in this segment of the market has been slowing. Now, however, we see a major shift in the application of this technology to the cloud, where Amazon led the way with an on-demand cloud data warehouse (Amazon Redshift). Analysts estimate that 90 per cent of companies that have adopted Hadoop will also keep their data warehouses, and with these new cloud offerings, those customers can dynamically scale the storage and compute resources of the data warehouse up or down relative to the larger amounts of information stored in their Hadoop data lake.
- The buzzwords converge! IoT, Cloud and Big Data come together: The technology is still in its early days, but data from devices in the Internet of Things will become one of the "killer apps" for the cloud and a driver of the petabyte-scale data explosion. For this reason, we see leading cloud and data companies such as Google, Amazon Web Services, IBM and Microsoft bringing Internet of Things services to life, allowing the data to move seamlessly to their cloud-based analytics engines.
- Data Governance and Data Quality continue to gain prominence: Organizations that are leveraging data for competitive advantage have realized that data quality is key to making the right decisions. Given this, we see organizations investing in Data Governance initiatives: establishing a Center of Excellence (CoE) with Data Stewards, "mastering" key data elements such as Customer and Product, "profiling" data in Data Lakes as it lands, allowing business users to create rules and set alerts, focusing on end-to-end data lineage, and managing metadata (business and technical) better. Some of this investment is driven by regulation, especially in the banking and financial services industry, and some by a need to have more faith in the data itself.
- Organizations have too much of the same data: In a traditional Data Warehousing architecture, where data from the source is moved to a staging area, then to a 3NF data warehouse and finally to a reporting-oriented Data Mart, there is too much duplication (sometimes for good reason) and too much lag between the time data is created and the time it is consumed. However, there was no better way to do this short of reporting directly off the source, which in turn spawned "Reporting Marts". With advances in Data Virtualization technology, it is possible to create a "logical" view of data that resides across multiple source systems and Data Warehouse type databases without having to push the data into an integrated data storage structure. This reduces duplication and improves business agility while helping to keep costs down.
- Advanced Analytics goes mainstream: Analytics has mainly been restricted to Descriptive Analytics, with some organizations, or certain functions within an organization, leveraging Predictive Analytics. Quant-oriented industries have been leveraging Advanced Analytics for many years. With the advent of Big Data technologies, traditional models have become more sophisticated, with additional variables. The main challenge in moving along the continuum from descriptive to predictive analytics has been the inability to scale computing power and storage at the rate at which data is exploding.
We also expect unsupervised learning and advanced machine learning techniques, such as deep learning with neural networks, to become more mainstream in 2017 because of their ability to automate feature selection, extraction and engineering. Cognitive APIs such as those from IBM Watson will also see higher adoption, especially as they become more industry-specific.
The article was originally published on CIOL and is re-posted here by permission.