Data aggregations using Spark and Python


Aggregations can be broadly categorized into totals and by-key aggregations. Spark brings the power of in-memory computing as well as rich APIs for aggregating data sets. In this blog I will discuss the implementation of aggregations using different actions and transformations.

  • Pre-requisites
  • Spark Introduction
  • Aggregations – totals
  • Aggregations – by key
  • Differences between groupByKey, reduceByKey, and aggregateByKey
  • Computing by-key averages
  • Computing by-key min/max

