I want to extend the read/write functionality of pyspark.sql.DataFrame for my own project's needs. To that end I created the following:
import pyspark.sql

class DataFrame(pyspark.sql.DataFrame):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def write(self, path, format="parquet", version=True):
        # shadows the inherited `write` property with my own method
        if format == "some_format":
            # do something
            pass
        super().write.format(format).save(path)
The trouble is that elsewhere in the code we instantiate the superclass object directly, like this: data = spark.range(0, 5), which returns a plain pyspark.sql.DataFrame rather than an instance of my subclass.
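To make the mismatch concrete, here is a minimal sketch of what I mean (assuming a local SparkSession and classic, non-Connect PySpark):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
data = spark.range(0, 5)
print(type(data))  # <class 'pyspark.sql.dataframe.DataFrame'>, not my subclass
# so data.write is still the built-in DataFrameWriter property,
# not my custom write() method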
How do I go about converting a Spark DataFrame so that it picks up my custom read/write methods, with minimal changes? Is this possible?
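One option I have considered is adding a classmethod to the subclass above that re-wraps an existing DataFrame, roughly like this (an untested sketch: it relies on the private _jdf attribute and on DataFrame.__init__ accepting a (jdf, session) pair as in recent Spark 3.x, both of which are assumptions on my part):

class DataFrame(pyspark.sql.DataFrame):
    @classmethod
    def from_spark(cls, df):
        # re-wrap the underlying JVM DataFrame in this subclass;
        # _jdf is private API and may differ across Spark versions
        return cls(df._jdf, df.sparkSession)

data = DataFrame.from_spark(spark.range(0, 5))

But I am not sure this is safe or idiomatic, hence the question.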