AnyBlox/spark-anyblox

Spark AnyBlox

This repository contains the AnyBlox plugin for Spark.

Building

You need Java 11, Maven 3.9, SBT 1.10, and Scala 2.12. We recommend SDKMAN! for managing those.

After that, run sbt package. The .jar file will be produced in target/scala-2.12.

Installation

The plugin needs to be registered with Spark in spark-defaults.conf:

spark.plugins                                           org.anyblox.spark.AnyBloxPlugin

You will also need the Arrow jars on the classpath (arrow-c-data and arrow-vector, version 18.1.0, in the example below).

You can then run spark-shell, passing the required packages and jars:

/opt/spark/bin/spark-shell --packages org.scala-lang:toolkit_2.12:0.1.7 --jars "/anyblox/anyblox-spark_2.12-0.1.0-SNAPSHOT.jar,/arrow/arrow-c-data-18.1.0.jar,/arrow/arrow-vector-18.1.0.jar"
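
For a standalone application, the same registration can probably be done in code instead of spark-defaults.conf. The following is a minimal sketch, not taken from this repository: the names AnyBloxSession, anyBloxConf, and start are hypothetical. It assumes the AnyBlox and Arrow jars are already on the driver and executor classpath (for example via --jars, as above). Spark reads spark.plugins when the SparkContext starts, so this only takes effect if no context exists yet; it does not apply inside an already running spark-shell.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object AnyBloxSession {
  // Plugin class name, as given in the spark-defaults.conf entry above.
  val PluginClass = "org.anyblox.spark.AnyBloxPlugin"

  // Hypothetical helper: a SparkConf that carries the plugin registration.
  def anyBloxConf(appName: String): SparkConf =
    new SparkConf().setAppName(appName).set("spark.plugins", PluginClass)

  // Assumption: the AnyBlox and Arrow jars are already on the classpath,
  // otherwise Spark cannot load the plugin class at startup.
  def start(appName: String): SparkSession =
    SparkSession.builder().config(anyBloxConf(appName)).getOrCreate()
}
```

The master URL is left unset here on the assumption that the application is launched through spark-submit.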

Usage

Open .any files as DataFrames using standard Spark syntax:

val df = spark.read.format("anyblox").load("/path/to/data.any")

You can use the DataFrame like any other Spark DataFrame, e.g. create a view and query it with SQL:

df.createTempView("myview")
spark.sql("SELECT * FROM myview").show
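
When re-running commands in the same spark-shell session, note that createTempView fails if a view with that name already exists. A slightly more defensive variant is sketched below; ViewQueries and countViaSql are hypothetical names, not part of the plugin, and the helper works on any DataFrame, including one loaded with format("anyblox").

```scala
import org.apache.spark.sql.DataFrame

object ViewQueries {
  // Hypothetical helper: register df under viewName and count its rows via SQL.
  def countViaSql(df: DataFrame, viewName: String): Long = {
    // Unlike createTempView, this replaces an existing view instead of throwing.
    df.createOrReplaceTempView(viewName)
    // Print the column names and types reported by the data source.
    df.printSchema()
    // Query through the session that owns the DataFrame; COUNT(*) is a BIGINT.
    df.sparkSession.sql(s"SELECT COUNT(*) AS n FROM $viewName").first().getLong(0)
  }
}
```

For the DataFrame loaded above, this would be called as ViewQueries.countViaSql(df, "myview").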
