Based on statistics published by Databricks, the top three applications of Apache Spark are the following.
Business Intelligence (BI) is the process of deriving presentable, actionable information that helps corporate executives, business managers, and other stakeholders make informed business decisions. The benefits of BI include faster and better decisions and the discovery of new opportunities. Customer Intelligence (CI) is the information derived from customer data that an organization can use to understand customer needs and serve customers better.
Before Spark, accessing a few days of data took 24 hours. With Spark, a year’s worth of data can be processed in a 10-minute coffee break.
Because Spark makes BI and CI so much faster, businesses gain a real competitive edge over their rivals. The simplest example: understanding your customers earlier than your rivals do is an unparalleled advantage.
Traditional data warehouses are great for structured data. But today's data is characterized by the 4 V’s (Volume, Velocity, Variety, and Veracity). Data comes from various sources such as smartphones, sensors, social media, logs, and transactions. Your competitive edge lies in processing it faster. Data warehouses built using Spark SQL can address the 4 V’s and give you an edge over competitors.
Organizations receive data in real time from various sources such as sensors, mobile and IoT devices, Twitter, and online transactions. All of this data needs to be monitored and processed, so the need of the hour is a large-scale, real-time data processing capability. Streaming ETL: data is continuously cleaned and aggregated before it is pushed to data stores. Spark Streaming is used by companies like Pinterest to provide live insight into how users are engaging with Pins across the world. Based on this, Pinterest’s recommendation engine shows more related Pins.
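The streaming ETL idea described above (clean and aggregate incoming data before it reaches a store) can be illustrated with a small sketch. This is plain Python, not the Spark Streaming API; the function names, the event fields (`user`, `action`), and the dictionary standing in for a data store are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def micro_batches(events, batch_size):
    """Group an incoming event stream into small batches."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def clean(event):
    """Normalize one raw event; return None if it is unusable."""
    user = (event.get("user") or "").strip().lower()
    action = (event.get("action") or "").strip().lower()
    if not user or not action:
        return None
    return {"user": user, "action": action}


def run_streaming_etl(events, store, batch_size=3):
    """Clean and aggregate each batch, then push the result to the store."""
    for batch in micro_batches(events, batch_size):
        cleaned = [e for e in (clean(raw) for raw in batch) if e is not None]
        counts = defaultdict(int)
        for e in cleaned:
            counts[e["action"]] += 1
        # "Push" the per-batch aggregate into the (hypothetical) store.
        for action, n in counts.items():
            store[action] = store.get(action, 0) + n


# Hypothetical raw events, some of them dirty or incomplete.
raw_events = [
    {"user": " Alice ", "action": "Pin"},
    {"user": "bob", "action": "view"},
    {"user": "", "action": "view"},      # dropped: no user
    {"user": "carol", "action": "PIN "},
    {"user": "dave", "action": None},    # dropped: no action
]

store = {}
run_streaming_etl(iter(raw_events), store)
print(store)
```

In a real deployment the batching, fault tolerance, and distribution across machines would be handled by the streaming engine; the sketch only shows the order of the steps: clean, aggregate, then store.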
Other applications that use Apache Spark include recommendation engines, log processing, user-facing services, and fraud detection.
Big Data is a problem statement: the size of the data being processed has grown to hundreds of petabytes (1 PB = 1000 TB). Yahoo Mail generates some 40-50 PB of data every day. Yahoo has to read those 40-50 PB of data and filter out spam. E-commerce w...
Data needs computation before any information can be extracted from it. Because the data can be really huge, it is broken down into chunks that are stored across different systems.
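As a rough illustration of the chunking idea, the sketch below splits a small dataset into chunks, computes a partial result on each chunk, and merges the partial results. It is plain Python running on one machine, not Spark code; the function names and the sample lines are assumptions made for the example. In a real cluster each chunk would live on, and be processed by, a different system.

```python
from collections import Counter


def split_into_chunks(records, chunk_size):
    """Break a list of records into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """Partial computation that one system would run on its own chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """Combine the partial results from all chunks into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical sample data standing in for a huge dataset.
lines = ["spark is fast", "big data is big", "spark handles big data"]

chunks = split_into_chunks(lines, chunk_size=2)
partials = [count_words(chunk) for chunk in chunks]
word_counts = merge_counts(partials)
print(word_counts.most_common(3))
```

The merged result is the same as counting over the whole dataset at once, which is what makes it safe to process the chunks independently.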