spark2_dfanalysis

Dataframe analysis in PySpark

User Defined Functions for Date, Datetime

Keywords: join, udf, lambda, date, datetime, withColumn, drop, concat, unique identifier

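# UDFs with lambdas!

import re
from pyspark.sql.functions import col, concat, lit, udf
from pyspark.sql.types import IntegerType, StringType

# functions:
def nohyphens(z):
    # strip every hyphen from the string form of z, e.g. "2018-01-31" -> "20180131"
    return re.sub("-", "", str(z))

# udf's:
udf_nohyphen_str = udf(lambda z: nohyphens(z), StringType())
udf_nohyphen_int = udf(lambda z: int(nohyphens(z)), IntegerType())

# remove the hyphens!
df_nohyphen = df_dated2\
.withColumn("int-date", udf_nohyphen_int(col("date-hyphen")))

# unique identifier: make, vin and year joined with underscores
unique_id = [col('make'), lit('_'), col('vin'), lit('_'), col('year')]

df_id = df_sales.withColumn('key', concat(*unique_id))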
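
The functions wrapped by the UDFs above are ordinary Python, so they can be checked without a Spark session before they are applied with withColumn. The sketch below does that. The names nohyphens_int_safe and make_key and the sample values are hypothetical and not part of the original code. Spark passes SQL NULL to a Python UDF as None, and int("None") raises a ValueError, so the None check shown here is one possible guard. make_key is only a plain-Python stand-in for the concat(*unique_id) expression. Like Spark's concat, it returns None if any part is None.

```python
import re


def nohyphens(z):
    """Same helper as above: drop every hyphen from the string form of z."""
    return re.sub("-", "", str(z))


def nohyphens_int_safe(z):
    """Hypothetical None-safe variant of the integer conversion.

    Spark hands SQL NULL to a Python UDF as None; returning None keeps
    the result NULL instead of failing on int("None").
    """
    if z is None:
        return None
    return int(nohyphens(z))


def make_key(make, vin, year):
    """Plain-Python stand-in for concat(make, '_', vin, '_', year).

    Returns None when any part is None, as Spark's concat does.
    """
    parts = [make, vin, year]
    if any(p is None for p in parts):
        return None
    return "_".join(str(p) for p in parts)


# Sample values are made up for illustration.
print(nohyphens("2018-01-31"))
print(nohyphens_int_safe("2018-01-31"))
print(nohyphens_int_safe(None))
print(make_key("Honda", "1HGCM82633A004352", 2003))
```

If the guard is wanted in Spark, the safe function can be wrapped the same way as the originals, for example udf(nohyphens_int_safe, IntegerType()). A yyyymmdd value such as 20180131 fits in Spark's 32-bit IntegerType.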

(coming soon!)