Spark Nuggets
REPL
Print the full launch command (including the classpath) used to start the shell:
SPARK_PRINT_LAUNCH_COMMAND=1 bin/spark-shell
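The printed command carries the classpath as one long -cp argument, which is hard to scan. A minimal sketch for listing it one entry per line, assuming a Unix-style colon-separated classpath; print_classpath_entries and the sample paths are hypothetical, not part of Spark:

```shell
# Hypothetical helper (not part of Spark): print each entry of a
# colon-separated classpath on its own line.
print_classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n'
}

# Made-up classpath for illustration; paste the real value that follows -cp.
print_classpath_entries "/opt/spark/conf:/opt/spark/lib/spark-assembly.jar"
```

Piping the result through sort or grep then makes it easier to spot duplicate or unexpected jars.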
Show the terms defined in the Scala REPL
$intp.definedTerms.foreach(println)
Enter paste mode
:paste
Misc
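User classpath (give user-supplied jars precedence over Spark's own)
spark.yarn.user.classpath.first=true
spark.files.userClassPathFirst=true
Both are deprecated from Spark 1.3 onward in favor of spark.driver.userClassPathFirst and spark.executor.userClassPathFirst.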
Building Spark via Maven
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
Other Spark Libraries
- http://spark-packages.org/
Scala - rename import (avoids the clash with Scala's built-in Vector)
import org.apache.spark.mllib.linalg.{Vector => SparkVector}