"There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days."
- Eric Schmidt, Google (in 2010).
Handling huge data sets in applications poses many challenges. By now, you have probably heard or made this statement in every data-oriented team discussion. Big Data is no longer a growing trend. Last year it vanished from the list of emerging technologies in Gartner's chart; it last featured on the chart in 2013, at the "peak of inflated expectations" (see below for the comparison). In fact, Big Data is now time-tested and accepted, with a sound, secure, stable architecture. Organizations around the globe, from data-oriented start-ups to big technology giants, use it to generate efficient analytics from their information.
"So, does that mean I should start thinking about Big Data solutions for my existing/new systems?".
Well, (obviously) it depends. Given the plethora of tools on display on the Internet, there is much more to it than "big data" alone. If, in essence, you hold a large "variety" of data, present in large "volume", from which you need to generate answers with optimum "velocity", you might be looking at the opening gates known as "the three V's" of big data. This might mean that you are ready to look at better solutions for your data needs. But that, again, is not enough. Do you need to move your existing application assets to a bigger store for a performance gain that is only theoretically possible? How big are your existing data security concerns, and how would you deal with them on the new infrastructure? Is your data "big" enough for these solutions?
These questions bring you to a point where you need to consider a few important things. You realize that before you think about building an efficient big data infrastructure, you first need to decide whether to introduce one at all.
Understanding the 'Why?'
Microsoft's MSDN puts it in a very "simplistic" fashion:
"Organizations need a big data services to enable them to survive in a rapidly expanding and increasingly competitive market where the sources and the requirements to store data are growing at exponential rate."
Moreover, these organizations are looking at solutions to store complex unstructured data that does not have a predetermined schema. Big data solutions do not force a schema onto the stored data. Rather, you can store almost any type of structured, semi-structured, or unstructured data and then apply a suitable schema when you query it. Because the data is stored in its raw format and a schema is applied only when the data is read, all the information within the data is preserved. This is in direct contrast to the way a traditional database works, where data must fit the schema before it is written.
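To make the idea of applying a schema only at read time more concrete, here is a minimal sketch in Python. It is not how any particular big data product works. The sample records, the RAW_STORE name and the read_with_schema helper are all hypothetical and exist only for illustration.

```python
import json

# Raw records kept exactly as they arrived; nothing was validated on write.
# (Hypothetical sample data, for illustration only.)
RAW_STORE = [
    '{"user": "alice", "age": "34", "city": "Pune"}',
    '{"user": "bob", "clicks": 12}',
    '{"user": "carol", "age": 41, "tags": ["vip"]}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select the requested fields and cast them.

    `schema` maps a field name to a callable used to convert the raw value.
    Fields missing from a record come back as None instead of failing.
    """
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row


# Two queries can apply two different schemas to the same raw data.
age_view = list(read_with_schema(RAW_STORE, {"user": str, "age": int}))
click_view = list(read_with_schema(RAW_STORE, {"user": str, "clicks": int}))
```

Note that the raw store is never modified. Fields that a given query does not ask for, such as "city" or "tags", stay available for a later query with a different schema.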
Data magnitude determination
Now, let’s determine if your data is big enough. In 1997, the first documented use of the term “big data” appeared in a paper by scientists at NASA, describing the problem they had with visualization (i.e. computer graphics) as one that “provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data. When data sets do not fit in main memory (in core), or when they do not fit even on local disk, the most common solution is to acquire more resources.” That was a rather vague definition, and it has gradually given way to the more refined one found on Wikipedia today, but the reasons those scientists gave for acquiring additional resources should still be the primary basis of your problem statement.
The next most important factor is what you wish to do with the data you have. The problem with big data solutions is that they are numerous. You get a Swiss Army knife when all you want is a screwdriver. Getting the right tool for the right job is often a big challenge once cost, efficiency and delivery constraints come into play. Understanding these factors is a core part of defining your big data infrastructure requirements.
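The NASA description suggests a simple first test: does the data set fit in main memory, and if not, does it fit on local disk? The sketch below encodes only that test. The function name, the labels it returns and the example sizes are assumptions made for illustration. A real assessment would also weigh growth rate, access patterns and cost.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def classify_dataset(dataset_bytes, memory_bytes, local_disk_bytes):
    """Place a data set in one of three tiers, following the 1997 description.

    All three arguments are sizes in bytes. The returned labels are
    illustrative, not an industry standard.
    """
    if dataset_bytes <= memory_bytes:
        return "fits in main memory"
    if dataset_bytes <= local_disk_bytes:
        return "fits on local disk"
    return "needs more resources"


# Hypothetical machine with 16 GiB of RAM and a 500 GiB local disk.
small = classify_dataset(8 * GIB, 16 * GIB, 500 * GIB)
medium = classify_dataset(200 * GIB, 16 * GIB, 500 * GIB)
large = classify_dataset(4000 * GIB, 16 * GIB, 500 * GIB)
```

Only the last case clearly calls for additional resources. The first two can often be served by tuning what you already have.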
Need for Advanced Analytics
A very interesting case that gives an idea of what analytics can do is that of a financial services firm that turned to big data to better identify which new client opportunities warrant the most investment. The company supplemented its customer demographic data with third-party data purchased from eBureau (a provider of predictive analytics and information solutions). The data service provider appended consumer occupations, incomes, ages, retail histories and related factors to the sales lead opportunities. The enhanced data set is then fed to an algorithm that identifies which new client leads should receive additional investment and which should not. The result has been an 11 percent increase in new client win rates, while at the same time the firm has lowered sales-related expenses by 14.5 percent.
Getting answers to complex business problems, analyzing existing values to make faster and better business decisions, creating cost-effective ways to bring in more customers, and exploiting machine learning analytics to build self-learning systems: these are some of the benefits that big data analytics brings to the table. If you need such answers quickly and your traditional databases take a significant amount of time to work them out or, worse, cannot arrive at them at all, then it’s time you start browsing through the inevitable. Today, the ability to remain agile in the market with these benefits gives organizations with DevOps services a competitive edge that they didn't have before.
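The general shape of that pipeline (enrich the leads with purchased attributes, then let a rule or model decide which ones get further investment) can be sketched as follows. This is not the firm's or eBureau's actual method, which the case study does not describe. The field names, the sample values and the income threshold are placeholders invented for illustration.

```python
# Hypothetical in-house leads and purchased third-party attributes.
leads = [
    {"lead_id": 1, "name": "Lead A"},
    {"lead_id": 2, "name": "Lead B"},
    {"lead_id": 3, "name": "Lead C"},
]
third_party = {
    1: {"occupation": "engineer", "income": 95000, "age": 42},
    2: {"occupation": "student", "income": 18000, "age": 21},
}


def enrich(leads, extra_by_id):
    """Append third-party attributes to each lead, matched on lead_id."""
    return [{**lead, **extra_by_id.get(lead["lead_id"], {})} for lead in leads]


def warrants_investment(lead, min_income=50000):
    """Placeholder decision rule; a real system would use a trained model."""
    return lead.get("income", 0) >= min_income


enriched = enrich(leads, third_party)
priority_ids = [lead["lead_id"] for lead in enriched if warrants_investment(lead)]
```

Leads with no third-party match, such as lead 3 here, pass through unchanged and fall below the threshold by default. Whether that is the right default is a business decision rather than a technical one.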
Right time for data transition and security concerns
A data-driven transition must begin with your business goals and objectives. Once you understand your business objectives, you are ready to create a roadmap for leveraging new data sources to help you achieve them. Jeff Hunter, vice president of the NA Insights & Data practice at Capgemini, rightly says: “By proper alignment of business and technology, firms can start to systematically go through business process and business models and start to ascertain whether a process contains qualitative elements that could be replaced by quantitative elements.”
Also, technology alone is not enough to transform your organization into a data-driven one. Creating a culture that understands data, how to secure it and how to use it is just as important. In 2011, Sony suffered a public relations nightmare in the form of a data breach in its PlayStation Network that exposed the personal information of 77 million users of its cloud-based systems. Along with many other examples like this one, the challenge of detecting and preventing advanced persistent threats has brought the importance of security responsibilities to light.
Hence, where you start in terms of both data security and technology will dictate the course of your data journey.
Big Data is the right direction to look in, provided you know why you need it. Data is evolving, and so is the outlook of the organizations managing it. Several technology verticals, such as IoT applications, web and cloud analytics, image processing and data science, have grown to realize the potential of data mining and the ‘magic’ answers it brings to the table. No doubt, the number of technology solutions in this field is growing at a fast pace, but catching the right fish on your application rod is a challenge in itself. There is no debating the fact that big data technologies are evolving rapidly.
Hence, the sooner you adapt, the better the answers you reach.