A viable data collection strategy is determined by the technologies already in use in your organization. To that end, the strategies of the other departments are reassessed, and the right foundational infrastructure is put in place so that the benefits of the data collection strategy reach the entire organization.
Capabilities to process data, even at petabyte scale, are identified and introduced. This includes offline batch processing at the highest possible throughput and the largest possible scale. It also includes real-time stream processing, so that actionable Business Intelligence insights can be derived from the data as it arrives.
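To make the difference between batch and real-time processing concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any particular framework works internally: the event shape, the `batch_totals` and `windowed_totals` helpers, and the sample data are all hypothetical, and the streaming half assumes events arrive in timestamp order.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical event shape: (timestamp_seconds, key, value)
Event = Tuple[int, str, float]


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: scan the complete data set once, then return totals per key."""
    totals: Dict[str, float] = defaultdict(float)
    for _, key, value in events:
        totals[key] += value
    return dict(totals)


def windowed_totals(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Stream style: emit per-key totals for each tumbling window once it closes.

    Assumes events arrive in timestamp order (an assumption of this sketch).
    """
    current_window: Optional[int] = None
    totals: Dict[str, float] = defaultdict(float)
    for timestamp, key, value in events:
        # Start of the fixed-size window this event falls into.
        window = timestamp - (timestamp % window_seconds)
        if current_window is not None and window != current_window:
            # The previous window is complete, so its result can be published.
            yield current_window, dict(totals)
            totals = defaultdict(float)
        current_window = window
        totals[key] += value
    if current_window is not None:
        # Flush the last, still-open window when the input ends.
        yield current_window, dict(totals)


# Made-up sample events for illustration.
events: List[Event] = [
    (0, "checkout", 20.0),
    (30, "checkout", 5.0),
    (65, "refund", 7.5),
    (70, "checkout", 10.0),
]

print(batch_totals(events))
for window_start, window_totals in windowed_totals(events, window_seconds=60):
    print(window_start, window_totals)
```

The batch function only answers once all the data has been read, while the windowed function publishes a result every time a 60-second window closes. That trade-off between completeness and latency is the one that matters when batch and streaming capabilities are planned.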
Depending on the type of data being accumulated and the insights that need to be derived from it, one of the available data analysis techniques is chosen. These range from predictive analytics and data mining to textual and statistical analysis. Technologies such as Hadoop, YARN, NoSQL, Spark, Pig, Hive, and MapReduce are used to support this analysis.
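As a small illustration of textual analysis in the MapReduce style, the sketch below counts words across a few documents using only the Python standard library. It mimics the map, shuffle, and reduce phases in a single process and does not use Hadoop itself; the function names and the sample sentences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for raw_word in document.lower().split():
        word = raw_word.strip(".,;:!?\"'()")  # drop surrounding punctuation
        if word:
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


# Made-up sample documents for illustration.
documents = [
    "Spark reads the data.",
    "Hive queries the data; Spark transforms it.",
]
counts = word_count(documents)
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on many machines and the shuffle would move data between them. The division of work is the same, which is why the pattern scales to large text collections.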
A data execution framework is selected here from the many that are available. Candidates include frameworks from the Hadoop ecosystem such as Storm, Samza, Spark, Flink, and Tez. We choose the framework that is best suited to the issues you currently face.