
I want to extend the read/write functionality of pyspark.sql.DataFrame for my own project's needs. To that end I created the following:

import pyspark.sql

class DataFrame(pyspark.sql.DataFrame):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def write(self, path, format="parquet", version=True):
        if format == "some_format":
            # do something project-specific, then delegate to the parent writer
            super().write.format(format).save(path)

The trouble is that elsewhere in the code we instantiate the parent-class object like this:

data = spark.range(0, 5)

How do I go about converting a Spark DataFrame so that it has my custom read/write methods, with minimal changes? Is this possible?

Fizi

1 Answer


You can change the class of an existing object like this:

data = spark.range(0, 5)
data.__class__ = DataFrame   # _your_ DataFrame

For straightforward extensions of the parent class, this ought to work fine. In general, there are all sorts of caveats to hacking class membership like this; for example, your own initializer will never have been called on the reassigned object.
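To make the idea concrete, here is a minimal runnable sketch, assuming a running Spark session; the method name save_versioned and the helper as_custom are hypothetical names for illustration, not part of the PySpark API:

import pyspark.sql
from pyspark.sql import SparkSession

class DataFrame(pyspark.sql.DataFrame):

    def save_versioned(self, path, format="parquet"):
        # Hypothetical extension: the parent class's write property
        # is still reachable through super(), so we can delegate to it.
        super().write.format(format).save(path)

def as_custom(df):
    # Hypothetical helper: re-tag an existing pyspark.sql.DataFrame
    # as the subclass in place. No data is copied and, per the caveat
    # above, no initializer runs.
    df.__class__ = DataFrame
    return df

spark = SparkSession.builder.getOrCreate()
data = as_custom(spark.range(0, 5))
data.save_versioned("/tmp/example_range")  # writes parquet via the parent writer

Since the __class__ reassignment mutates the object in place, as_custom returns the same object purely for convenience; the underlying JVM DataFrame is untouched.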

alexis