
I want to extend the read/write functionality of pyspark.sql.DataFrame for my own project's needs. To that end I created the following:

import pyspark.sql

class DataFrame(pyspark.sql.DataFrame):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def write(self, path, format="parquet", version=True):
        if format == "some_format":
            # do something project-specific, then delegate to the parent writer
            super().write.format(format).save(path)

The trouble is that elsewhere in the code we instantiate the parent-class object like this:

data = spark.range(0, 5)

How do I go about converting a Spark DataFrame so that it has my custom read/write methods, with minimal changes? Is this possible?

Fizi

1 Answer


You can change the class of an existing object like this:

data = spark.range(0, 5)
data.__class__ = DataFrame   # _your_ DataFrame

For straightforward extensions of the parent class, this ought to work fine. In general, there are all sorts of caveats to hacking class membership like this; for example, your own initializer will never have been called on the reassigned object.
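To make the idea concrete, here is a minimal runnable sketch, assuming a running Spark session; the method name save_versioned and the helper as_custom are hypothetical names for illustration, not part of the PySpark API:

import pyspark.sql
from pyspark.sql import SparkSession

class DataFrame(pyspark.sql.DataFrame):

    def save_versioned(self, path, format="parquet"):
        # Hypothetical extension: the parent class's write property
        # is still reachable through super(), so we can delegate to it.
        super().write.format(format).save(path)

def as_custom(df):
    # Hypothetical helper: re-tag an existing pyspark.sql.DataFrame
    # as the subclass in place. No data is copied and, per the caveat
    # above, no initializer runs.
    df.__class__ = DataFrame
    return df

spark = SparkSession.builder.getOrCreate()
data = as_custom(spark.range(0, 5))
data.save_versioned("/tmp/example_range")  # writes parquet via the parent writer

Since the __class__ reassignment mutates the object in place, as_custom returns the same object purely for convenience; the underlying JVM DataFrame is untouched.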

alexis