Data aggregations using Spark and Python

Intro

Aggregations can be broadly categorized into two kinds: totals, which reduce a whole data set to a single value, and by-key aggregations, which produce one result per key. Spark brings the power of in-memory processing as well as rich APIs for aggregating data sets. In this blog I will discuss how to implement aggregations using different actions and transformations.
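To make the distinction between totals and by-key aggregations concrete, here is a minimal plain-Python sketch. It does not use Spark: the list of lists `partitions`, the sample `(order_id, revenue)` pairs, and the helpers `total` and `reduce_by_key` are hypothetical. They only mimic how Spark's `rdd.reduce(add)` (an action that returns a single value) and `rdd.reduceByKey(add)` (a transformation that returns one value per key) combine data, first within each partition and then across partitions.

```python
from functools import reduce
from operator import add

# Hypothetical (order_id, revenue) pairs split into two lists that stand in
# for the partitions of a Spark RDD. This is a plain-Python model, not Spark.
partitions = [
    [("o1", 10.0), ("o2", 5.0), ("o1", 2.0)],
    [("o2", 7.0), ("o3", 4.0)],
]


def total(parts, func):
    """Model of rdd.reduce(func): reduce each non-empty partition locally,
    then reduce the per-partition results into a single value."""
    partials = [reduce(func, part) for part in parts if part]
    return reduce(func, partials)


def reduce_by_key(parts, func):
    """Model of rdd.reduceByKey(func): combine values per key inside each
    partition first, then merge the per-partition results by key."""
    merged = {}
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        for key, value in local.items():
            merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Total: drop the keys and add up all revenues.
revenues = [[value for _, value in part] for part in partitions]
print(total(revenues, add))            # 28.0
# By key: one revenue sum per order_id.
print(reduce_by_key(partitions, add))  # {'o1': 12.0, 'o2': 12.0, 'o3': 4.0}
```

Because partial results from different partitions are merged in no fixed order, Spark expects the function passed to `reduce` and `reduceByKey` to be associative and commutative, which addition is.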

  • Pre-requisites
  • Spark Introduction
  • Aggregations – totals
  • Aggregations – by key
  • Differences between groupByKey, reduceByKey, aggregateByKey, etc.
  • Computing by key averages
  • Computing by key min/max
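The by-key topics in the outline can be previewed with a similar plain-Python model (again not Spark; the data and the helper names `group_by_key` and `aggregate_by_key` are hypothetical). `group_by_key` mimics `rdd.groupByKey()`, which collects every value for a key without any partition-local pre-aggregation. `aggregate_by_key` mimics `rdd.aggregateByKey(zeroValue, seqFunc, combFunc)`, whose accumulator may have a different type than the values, unlike `reduceByKey`, whose function must return the same type as its inputs. That flexibility is what allows by-key averages (a `(sum, count)` accumulator) and by-key min/max (a `(min, max)` accumulator) in a single pass.

```python
# Hypothetical (order_id, revenue) pairs split into two lists that stand in
# for RDD partitions. This is a plain-Python model, not Spark itself.
partitions = [
    [("o1", 10.0), ("o2", 5.0), ("o1", 2.0)],
    [("o2", 7.0), ("o3", 4.0), ("o1", 6.0)],
]


def group_by_key(parts):
    """Model of rdd.groupByKey(): every value is collected per key,
    with no partition-local pre-aggregation."""
    groups = {}
    for part in parts:
        for key, value in part:
            groups.setdefault(key, []).append(value)
    return groups


def aggregate_by_key(parts, zero, seq_op, comb_op):
    """Model of rdd.aggregateByKey(zeroValue, seqFunc, combFunc).

    seq_op folds a value into a per-partition accumulator that starts at
    `zero`; comb_op merges accumulators from different partitions.
    `zero` should be immutable (a tuple here) because it is reused.
    """
    merged = {}
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = seq_op(local.get(key, zero), value)
        for key, acc in local.items():
            merged[key] = comb_op(merged[key], acc) if key in merged else acc
    return merged


print(group_by_key(partitions))
# {'o1': [10.0, 2.0, 6.0], 'o2': [5.0, 7.0], 'o3': [4.0]}

# By-key average: accumulate (sum, count), then divide.
sums_counts = aggregate_by_key(
    partitions,
    (0.0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
averages = {k: s / c for k, (s, c) in sums_counts.items()}
print(averages)  # {'o1': 6.0, 'o2': 6.0, 'o3': 4.0}

# By-key min and max in one pass: accumulate (min, max).
min_max = aggregate_by_key(
    partitions,
    (float("inf"), float("-inf")),
    lambda acc, v: (min(acc[0], v), max(acc[1], v)),
    lambda a, b: (min(a[0], b[0]), max(a[1], b[1])),
)
print(min_max)  # {'o1': (2.0, 10.0), 'o2': (5.0, 7.0), 'o3': (4.0, 4.0)}
```

Grouping and then summing each list yields the same per-key totals as `reduceByKey`, but `groupByKey` shuffles every individual value across the cluster, which is why `reduceByKey` or `aggregateByKey` is usually preferred for large data sets.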
 