Skip to content
Cuelogic
  • Services
    • Services

      Build better software and explore engineering excellence with our industry-leading tech services.

      • Product Engineering
        • Product Engineering
          • Product Development
          • UX Consulting
          • Application Development
          • Application Modernization
          • Quality Assurance Services
          Menu
          • Product Development
          • UX Consulting
          • Application Development
          • Application Modernization
          • Quality Assurance Services
          Migrating application and databases to the cloud, moving from legacy technologies to a serverless platform for a FinTech organization.
          Download ❯
      • Cloud Engineering
        • Cloud Engineering
          • Cloud Services
          • DevOps Services
          • Cloud Migration
          • Cloud Optimization
          • Cloud Computing Services
          Menu
          • Cloud Services
          • DevOps Services
          • Cloud Migration
          • Cloud Optimization
          • Cloud Computing Services
          Building end-to-end data engineering capabilities and setting up DataOps for a healthcare ISV managing sensitive health data.
          Download ❯
      • Data & Machine Learning
        • Data & Machine Learning
          • Big Data Services
          • AI Consulting
          Menu
          • Big Data Services
          • AI Consulting
          Setting up a next-gen SIEM system, processing PB scale data with zero lag, and implementing real-time threat detection.
          Download ❯
      • Internet of Things
        • Internet of Things
          • IoT Consulting
          • IoT App Development
          Menu
          • IoT Consulting
          • IoT App Development
          Building a technically robust IoT ecosystem that was awarded the best implementation in Asia Pacific for a new age IoT business.
          Download ❯
      • Innovation Lab as a Service
        • Innovation Lab as a Service
          • Innovation Lab as a Service
          Menu
          • Innovation Lab as a Service
          Establishing an Innovation Lab for the world’s largest Pharma ISV, accelerating product innovation & tech research while ensuring BaU.
          Download ❯
      • Cybersecurity Services
        • Cybersecurity Services
          • Cybersecurity Services
          Menu
          • Cybersecurity Services
          Big Data Engineering at scale for IAC’s SIEM system, processing PB scale data to help brands like Tinder, Vimeo, Dotdash, etc.
          Download ❯
      • Healthcare IT Services
        • Healthcare IT Services
          • Healthcare IT Services
          Menu
          • Healthcare IT Services
          Upgrading a platform for patients to access doctors via chat or video consultation, modernizing design, & migrating infra to the cloud.
          Download ❯
  • Company
    • Company

      Find out why Cuelogic, a world-leading software product development company, is the best fit for your needs. See how our engineering excellence makes a difference in the lives of everyone we work with.

    • about usAbout

      Discover how Cuelogic is as a global software consultancy and explore what makes us stand apart.

    • CultureCulture

      Read about our free and open culture, a competitive edge that helps clients and employees thrive.

    • Current openingCurrent Openings

      Want to join us? Search current openings, check out the recruitment process, or email your resume.

  • Insights
  • Tell Us Your Project
Tell Us Your Project  ❯
Big Data  5 Mins Read  June 13, 2017  Cuelogic Insights

SMACK (Spark, Mesos, Akka, Cassandra and Kafka) & Fast Data

Share Via –
Share on facebook
Share on twitter
Share on linkedin

Home > SMACK (Spark, Mesos, Akka, Cassandra and Kafka) & Fast Data

As enterprises strive to realize and identify the value in Big Data, many now seek more agile and capable analytic systems. Some of the key factors supporting this include improving customer retention, influence product development and quality and increasing operational efficiencies among others.

Organizations are looking to enhance their analytics workloads to drive several real-time cases such as recommendation engines, fraud detection and advertising analytics. In this whitepaper, I will try to draw your attention toward the applications of SMACK and how it has emerged as a crucial open source technology to handle high volume and velocity of “Big Data”.

Big Data is not only Big it’s Fast also

The way that big data gets big is through a continuous and constant steam of incoming data. In high-volume environments, that information/data arrives at incredible rates, yet still needs to be stored and analyzed.

Less than a dozen years ago, it was nearly impossible to analyze and process those large chunks of data using commodity hardware. Today, Hadoop clusters built from thousands of nodes are almost everywhere to analyze petabytes of historical data.

Apache Hadoop is a software framework that is being adopted by many enterprises as a cost-effective analytics platform for Big Data analytics. Hadoop is an ecosystem of several services rather than a single product, and is designed for storing and processing petabytes of data in a linear scale-out model. Each service in the Hadoop ecosystem may use different technologies in the processing pipeline to ingest, store, process and visualize data.

What’s next After Big Data?

What Hadoop Can’t Do

The big data movement was pretty much driven by the demand for scale in volume, variety and velocity of data. This is often supported by data that is generated at high speeds, such as financial ticker data, click-stream data, sensor data or log aggregation. Certainly, Apache Hadoop is one of the best technologies to analyze all those large amount of data.

However, the technology has a limitation as it cannot be used when the data is moving or in motion. Some of the areas where the use of Hadoop is not recommended are transactional data. Transactional data, by its very nature, is highly complex, as a transaction on an ecommerce site can generate many steps that all have to be implemented quickly. That scenario is not at all ideal for Hadoop.

Nor would it be optimal for structured data sets that require very minimal latency, like when a Web site is served up by a MySQL database in a typical LAMP stack. That's a speed requirement that Hadoop would poorly serve.

Fast Data – Analyze data when they are in motion or in real-time

SMACK stands for Spark, Mesos, Akka, Cassandra and Kafka - a combination that's being adopted for 'fast data'.

The SMACK stack (Spark, Mesos, Akka, Cassandra and Kafka) is known to be as the ideal platform for constructing “fast data” applications. The term fast data underlines how big data applications and architectures are evolving to be stream oriented, so that the data is extracted as quickly as possible from incoming data, while still supporting traditional data scenarios such as batch processing, data warehousing, and interactive exploration.

The SMACK Stack

The SMACK Stack architecture pattern consists of following technologies:

  • Apache Spark (Processing Engine) - Fast and general engine for distributed, large-scale data processing. Often, used to implement ETL, queries, aggregations and applications of machine learning etc. Spark is well-suited to machine learning algorithms, and exposes its API in Scala, Python, R and Java. The approach of Spark is to provide a unified interface that can be used to mix SQL queries, machine learning, graph analysis, and streaming (micro-batched) processing.
  • Apache Mesos (The Container) – A flexible cluster resource management system that provides efficient resource isolation and sharing across distributed applications. Mesos uses a two-level scheduling mechanism where resource offers are made to frameworks (applications that run on top of Mesos).The Mesos master node decides how many resources to offer each framework, while each framework determines the resources it accepts and what application to execute on those resources. This method of resource allocation allows near-optimal data locality when sharing a cluster of nodes amongst diverse frameworks.
  • Akka (The Model) - Microservice development with high scalability, durability, and low-latency processing. A toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM (Java Virtual Machine).The actor model in computer science is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.
  • Apache Cassandra (The Storage) - Scalable, resilient, distributed database designed for persistent, durable storage across multiple data centers. Most environments will also use a distributed file system like HDFS or S3.
  • Apache Kafka (The Broker) - A high-throughput, low-latency distributed messaging system designed for handling real-time data feeds. In Kafka, data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers.

The SMACK stack helps in simplifying the processing of both real-time and batch workloads as single data stream in real or near real time.

SMACK-Architecture

SMACK stack: How to put the pieces together

Capturing value in fast data

The best way to capture the value of incoming data is to react to it the instant it arrives. Processing of incoming data in batches always takes lot of time which ultimately takes away the value of that data.

Basically, to handle or process data which is arriving at thousands to millions of events per second, two technologies are required i.e. a streaming system capable of delivering events as fast as they come in and a data store capable of processing each item as fast as it arrives.

Delivering the fast data 

In order to deliver fast data, two popular streaming systems i.e. Apache Storm and Apache Kafka have evolved in the market over the past few years. Apache Storm (developed by Twitter) reliably processes unbounded streams of data at rates of millions of messages per second.

On the other hand, Kafka (developed by LinkedIn), is a high-throughput distributed message queue system. Both streaming systems address the need of processing fast data. Kafka, however, stands apart.

Processing the fast data 

It is noteworthy that Hadoop MapReduce has very well solved the batch processing needs of customers across the industries. However, the introduction of more flexible developed big data tools for fast data processing paved the demand of big data darling Apache Spark.

Owing to its fast data processing power, the technology is setting fire across the entire big data world. The technology has actually taken over Hadoop in terms of fast data processing of interactive data mining algorithms and iterative machine learning algorithms.

Lambda Architecture is data processing architecture and has both stream and batch processing capabilities. However, most of the lambdas solutions cannot serve the two needs at the same time:

Handles a massive data stream in real time.

Handles different and multiple data models from multiple data source.

To easily meet both of these needs, we have Apache Spark which is mainly responsible for real-time analysis of both recent and historical information. The analysis results are often persisted in Apache Cassandra which helps the user to extract the real-time information anytime in case of failure.

Spark is increasingly adopted as an alternate processing framework to MapReduce, due to its ability to speed up batch, interactive and streaming analytics. Spark enables new analytics use cases like machine learning and graph analysis with its rich and easy to use programming libraries.

Recommended Content
Data Mesh – Rethinking Enterprise Data Architecture ❯
When and How to Leverage Lambda Architecture in Big Data ❯
Big Data Frameworks ❯
Go Back to Main Page ❯
Tags
big data hadoop SMACK fast data Spark Mesos Akka Cassandra Kafka Lambda Architecture
Share This Blog
Share on facebook
Share on twitter
Share on linkedin

Leave a Reply Cancel reply

People Also Read

Product Development

Low Code Platform: The Future of Software Development

8 Mins Read
Quality Engineering

BDD vs TDD : Highlighting the two important Quality Engineering Practices

8 Mins Read
DevOps

Getting Started With Feature Flags

10 Mins Read
Subscribe to our Blog
Subscribe to our newsletter to receive the latest thought leadership by Cuelogic experts, delivered straight to your inbox!
Services
Product Engineering
  • Product Development
  • UX Consulting
  • Application Development
  • Application Modernization
  • Quality Assurance Services
Menu
  • Product Development
  • UX Consulting
  • Application Development
  • Application Modernization
  • Quality Assurance Services
Data & Machine Learning
  • Big Data Services
  • AI Consulting
Menu
  • Big Data Services
  • AI Consulting
Innovation Lab as a Service
Cybersecurity Services
Healthcare IT Solutions
Cloud Engineering
  • Cloud Services
  • DevOps Services
  • Cloud Migration
  • Cloud Optimization
  • Cloud Computing Services
Menu
  • Cloud Services
  • DevOps Services
  • Cloud Migration
  • Cloud Optimization
  • Cloud Computing Services
Internet of Things
  • IoT Consulting
  • IoT App Development
Menu
  • IoT Consulting
  • IoT App Development
Company
  • About
  • Culture
  • Current Openings
Menu
  • About
  • Culture
  • Current Openings
We are Global
India  |  USA  | Australia
We are Social
Facebook
Twitter
Linkedin
Youtube
Subscribe to our Newsletter

We don't spam!

cuelogic

We are Hiring!

Blogs

Recent Posts

  • Low Code Platform: The Future of Software Development
  • BDD vs TDD : Highlighting the two important Quality Engineering Practices
  • Getting Started With Feature Flags
  • Data Mesh – Rethinking Enterprise Data Architecture
  • Top Technology Trends for 2021
cuelogic

We are Hiring!

Blogs

Recent Posts

  • Low Code Platform: The Future of Software Development
  • BDD vs TDD : Highlighting the two important Quality Engineering Practices
  • Getting Started With Feature Flags
  • Data Mesh – Rethinking Enterprise Data Architecture
  • Top Technology Trends for 2021
We are Global
India  |  USA  | Australia
We are Social
Facebook
Twitter
Linkedin
Youtube
Subscribe to our Newsletter

We don't spam!

Services
Product Engineering

Product Development

UX Consulting

Application Development

Application Modernization

Quality Assurance Services

Cloud Engineering

Cloud Services

DevOps Services

Cloud Migration

Cloud Optimization

Cloud Computing Services

Data & Machine Learning

Big Data Services

AI Consulting

Internet of Things

IoT Consulting

IoT Application Services

Innovation Lab As A Service
Cybersecurity Services
Healthcare IT Services
Company

About

Culture

Current Openings

Insights
Privacy Policy  
All Rights Reserved © Cuelogic 2021

Close

Do you have an app development challenge? We'd love to hear about it!

By continuing to use this website, you consent to the use of cookies in accordance with our Cookie Policy.