  • For example, I have two objects, ObjectA and ObjectB:
object ObjectA{

  def funcA1(a:String):String = "#" + a + "#"

  def funcA2(a:String, b: Int): String = a * b
}

object ObjectB{

  def funcB1(a:String):String = "&" + a + "&"

  def funcB2(a: String, b: Int): String = (a.sum + b).toString
}
  • I want to define a method somewhere else, with a signature like this:
def registeredAllMethod(className: String): Unit = {
    // How to implement this?
}

  • I want registeredAllMethod to take a class name and register all of the methods in that class as Spark UDFs. The usage would be as follows:
// If I call:
registeredAllMethod("ObjectA")
// then I can use the functions in Spark SQL like this:
sparkSession.sql("SELECT funcA1('test'), funcA2('test', 5)").show


// If I call:
registeredAllMethod("ObjectB")
// then I can use the functions in Spark SQL like this:
sparkSession.sql("SELECT funcB1('test'), funcB2('test', 5)").show

Thank you for reading this far. If you can solve this problem, I would be grateful!

NickWick

1 Answer


You can try to make registeredAllMethod a macro:

import scala.language.experimental.macros
import scala.reflect.macros.blackbox

object Macros {
  // The call is replaced at compile time by a block of register(...) calls.
  def registeredAllMethod(className: String): Unit = macro registeredAllMethodImpl

  def registeredAllMethodImpl(c: blackbox.Context)(className: c.Tree): c.Tree = {
    import c.universe._
    // The class name must be a compile-time constant so it can be evaluated here.
    val classNameStr = c.eval(c.Expr[String](className))

    // Look up the object by its (fully qualified) name.
    val moduleSymbol = c.mirror.staticModule(classNameStr)

    // Generate one sparkSession.udf.register(...) call per declared method
    // (constructors excluded); `sparkSession` must be in scope at the call site.
    val calls = moduleSymbol.typeSignature.decls.toList
      .filter(decl => decl.isMethod && !decl.isConstructor)
      .map(methodSymbol =>
        q"sparkSession.udf.register(${methodSymbol.name.toString}, $methodSymbol _)"
      )

    q"..$calls"
  }
}

https://gist.github.com/DmytroMitin/0f8d044d839756dd68ee901703e68ee6
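
For reference, a minimal sketch of how the macro could be called from another compilation unit (the Macros object has to be compiled before its callers). The App object and the SparkSession settings below are assumptions for illustration, but the name sparkSession matters, because the generated code refers to a value with exactly that name:

// Hypothetical caller; must live in a different compilation unit than Macros.
object App {
  val sparkSession = org.apache.spark.sql.SparkSession.builder()
    .master("local[*]")
    .appName("register-udfs")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    // Expands at compile time into sparkSession.udf.register("funcA1", ...) etc.
    Macros.registeredAllMethod("ObjectA")
    sparkSession.sql("SELECT funcA1('test'), funcA2('test', 5)").show()
  }
}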

Other options don't seem to work:

  • Scala toolbox produces java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF

https://gist.github.com/DmytroMitin/615e7420b7de5d209c0631f269129f9a

  • Real Scala compiler behaves similarly

https://gist.github.com/DmytroMitin/28936be58ba943d7771d7d4ede58abff

  • Java reflection (with LambdaMetafactory) produces org.apache.spark.SparkException: Task not serializable, Caused by: java.io.NotSerializableException: App$$$Lambda$994/768702707

https://gist.github.com/DmytroMitin/387e75ed39148fc8e70839584392d946

  • Scala reflection (with toolbox) also produces one of the two exceptions above, depending on whether we feed .register a lambda or an instance of an anonymous class

https://gist.github.com/DmytroMitin/2a292d35f3c3ac5cf96d22dd81721366

Something breaks in how Spark handles these reflectively created functions, so macros seem to be the best option.


Actually, I managed to fix the "Java reflection" approach, but it's not so easy:

https://gist.github.com/DmytroMitin/68909e971141f442f75fa09c46f69b16

The trick is to create new FunctionN with Serializable {...}. But I didn't manage to do this with runtime compilation (e.g. with the reflective toolbox; whatever I do, I receive a lambda rather than an instance of a class), only with bytecode manipulation (with Javassist).
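
For illustration, this is the shape the generated function object has to have; a hand-written sketch rather than the Javassist-generated class from the gist, using ObjectA and sparkSession from the question:

// A real class mixing Serializable into Function1, instead of a synthetic lambda:
val serializableFuncA1 = new (String => String) with Serializable {
  def apply(a: String): String = ObjectA.funcA1(a)
}
sparkSession.udf.register("funcA1", serializableFuncA1)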

Macros seem to be easier.


Also, you can turn the defs in your objects into vals, and then the serialization issues should disappear:

https://gist.github.com/DmytroMitin/4000bfc43cb1343578c4dc5d18acf6dc
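
A rough sketch of that variant, using ObjectA from the question (registration is written out directly here for brevity; see the gist for the full approach):

object ObjectA {
  // Function values instead of methods: each val already holds a Function1/Function2
  // instance created by normally compiled code, so nothing has to be eta-expanded
  // or synthesized reflectively at registration time.
  val funcA1: String => String = a => "#" + a + "#"
  val funcA2: (String, Int) => String = (a, b) => a * b
}

// Registering the vals directly:
sparkSession.udf.register("funcA1", ObjectA.funcA1)
sparkSession.udf.register("funcA2", ObjectA.funcA2)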

Dmytro Mitin
  • https://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser https://stackoverflow.com/questions/28079307/unable-to-deserialize-lambda https://stackoverflow.com/questions/25443655/possibility-to-explicit-remove-serialization-support-for-a-lambda – Dmytro Mitin Oct 08 '22 at 22:29