Basic / Routine / Common operations!

Use DataFrameBuild.build_array to get some sample string, integer, or double columns

mystr = DFB.build_array("string",num=12,width=8)
myint = DFB.build_array("integer",num=12,nrange=(0,4))    # inclusive on range
mydoub = DFB.build_array("double",num=12,nrange=(10,10.1))

Example DataFrames

Some sample dataframes to copy.

Read/Write

User Defined Functions for Date, Datetime join, udf, lambda, date, datetime, withColumn, drop, concat, unique identifier

# UDF's with lambdas!

# functions:
def nohyphens(z):
    return re.sub("-","",str(z))

# udf's:
udf_nohyphen_str = udf(lambda z: nohyphens(z),StringType())
udf_nohyphen_int = udf(lambda z: int(nohyphens(z)),IntegerType())

# remove the hyphens!
df_nohyphen = df_dated2\
.withColumn("int-date",udf_nohyphen_int(col("date-hyphen")))\

unique_id = [col('make'),lit('_'),col('vin'), lit('_'), col('year')]

df_id = df_sales.withColumn('key',concat(*unique_id))

(coming soon!)

Read in dataframe, do a basic comparison.
Read/write for loop with some count/logical comparisons.

spark2_dfanalysis

Dataframe analysis in PySpark

Basic / Routine / Common operations!