spark2_dfanalysis

Dataframe analysis in PySpark


Let’s explore Spark DataFrames


Configs: Initial configuration and Spark settings for your Jupyter notebook!

Basics: Read, write, and generate a sample DataFrame! If you don’t have large data sets on hand, build some quickly. Let’s get some standard reading, writing, and partitioning examples down.

Analysis: Let’s do some common DataFrame manipulations. I’ll show what I expect are the most common column formatting issues, counts of value occurrences, and operations you’re likely to see. We’ll cover aggregations, grouping, and ordering.

Transformations: (coming soon!)

Joins: Combine DataFrames with joins!