
IDE sample of "unsupported sources"->DataFrame #1231


Draft: wants to merge 9 commits into base `master`
6 changes: 4 additions & 2 deletions README.md
@@ -11,14 +11,16 @@
Kotlin DataFrame aims to reconcile Kotlin's static typing with the dynamic nature of data by utilizing both the full power of the Kotlin language and the opportunities provided by intermittent code execution in Jupyter notebooks and REPL.

* **Hierarchical** — represents hierarchical data structures, such as JSON or a tree of JVM objects.
-* **Functional** — data processing pipeline is organized in a chain of `DataFrame` transformation operations. Every operation returns a new instance of `DataFrame` reusing underlying storage wherever it's possible.
+* **Functional** — the data processing pipeline is organized in a chain of `DataFrame` transformation operations.
+* **Immutable** — every operation returns a new instance of `DataFrame`, reusing the underlying storage wherever possible.
* **Readable** — data transformation operations are defined in a DSL close to natural language.
* **Practical** — provides simple solutions for common problems and the ability to perform complex tasks.
* **Minimalistic** — simple, yet powerful data model of three column kinds.
-* **Interoperable** — convertable with Kotlin data classes and collections.
+* **Interoperable** — convertible with Kotlin data classes and collections. This also means conversion to/from other libraries' data structures is usually quite straightforward (see the sketch after this list)!
* **Generic** — can store objects of any type, not only numbers or strings.
* **Typesafe** — on-the-fly generation of extension properties for type safe data access with Kotlin-style care for null safety.
* **Polymorphic** — type compatibility derives from column schema compatibility. You can define a function that requires a special subset of columns in a dataframe but doesn't care about other columns.
+  In notebooks this works out-of-the-box. In ordinary projects this requires casting (for now).
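A minimal interop sketch (a plain data class round-trip; `Person` is an illustrative name, not part of the library):

```kotlin
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.dataframe.api.toListOf

data class Person(val name: String, val age: Int)

fun main() {
    // collection -> DataFrame: one column per data class property
    val df = listOf(Person("Alice", 29), Person("Bob", 31)).toDataFrame()

    // DataFrame -> collection: rows are mapped back onto the data class
    val people: List<Person> = df.toListOf<Person>()
    println(people)
}
```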

Integrates with [Kotlin kernel for Jupyter](https://github.com/Kotlin/kotlin-jupyter). Inspired by [krangl](https://github.com/holgerbrandl/krangl), Kotlin Collections, and [pandas](https://pandas.pydata.org/).

2 changes: 1 addition & 1 deletion build.gradle.kts
@@ -196,7 +196,7 @@ allprojects {
logger.warn("Could not set ktlint config on :${this.name}")
}

-// set the java toolchain version to 11 for all subprojects for CI stability
+// set the java toolchain version to 21 for all subprojects for CI stability
extensions.findByType<KotlinJvmProjectExtension>()?.jvmToolchain(21)

// Attempts to configure buildConfig for each sub-project that uses it
36 changes: 19 additions & 17 deletions docs/StardustDocs/topics/overview.md
@@ -36,30 +36,32 @@ The goal of data wrangling is to assure quality and useful data.

## Main Features and Concepts

-* [**Hierarchical**](hierarchical.md) — the Kotlin DataFrame library provides an ability to read and present data from different sources including not only plain **CSV** but also **JSON** or **[SQL databases](readSqlDatabases.md)**.
-That’s why it has been designed hierarchical and allows nesting of columns and cells.
-
-* [**Interoperable**](collectionsInterop.md) — hierarchical data layout also opens a possibility of converting any objects
-structure in application memory to a data frame and vice versa.
-
-* **Safe** — the Kotlin DataFrame library provides a mechanism of on-the-fly [**generation of extension properties**](extensionPropertiesApi.md)
+* [**Hierarchical**](hierarchical.md) — the Kotlin DataFrame library provides the ability to read and present data from different sources,
+  including not only plain **CSV** but also **JSON** or **[SQL databases](readSqlDatabases.md)**.
+  This is why it was designed to be hierarchical and allows nesting of columns and cells.
+* **Functional** — the data processing pipeline is organized in a chain of [`DataFrame`](DataFrame.md) transformation operations.
+* **Immutable** — every operation returns a new instance of [`DataFrame`](DataFrame.md), reusing the underlying storage wherever possible.
+* **Readable** — data transformation operations are defined in a DSL close to natural language.
+* **Practical** — provides simple solutions for common problems and the ability to perform complex tasks.
+* **Minimalistic** — simple, yet powerful data model of three [column kinds](DataColumn.md#column-kinds).
+* [**Interoperable**](collectionsInterop.md) — convertible with Kotlin data classes and collections.
+  This also means conversion to/from other libraries' data structures is usually quite straightforward!
+  See our examples for some conversions between DataFrame and [Apache Spark](TODO), [Multik](TODO), and [JetBrains Exposed](TODO).
+* **Generic** — can store objects of any type, not only numbers or strings.
+* **Typesafe** — the Kotlin DataFrame library provides a mechanism of on-the-fly [**generation of extension properties**](extensionPropertiesApi.md)
that correspond to the columns of a data frame.
In interactive notebooks like Jupyter or Datalore, the generation runs after each cell execution.
In IntelliJ IDEA, there's a Gradle plugin that generates properties based on a CSV or JSON file.
Also, we’re working on a compiler plugin that infers and transforms [`DataFrame`](DataFrame.md) schema while typing.
You can now clone this [project with many examples](https://github.com/koperagen/df-plugin-demo) showcasing how it allows you to reliably use our most convenient extension properties API.
The generated properties ensure you'll never misspell a column name or mix up its type, and of course nullability is preserved as well.

-* **Generic** — columns can store objects of any type, not only numbers or strings.

* [**Polymorphic**](schemas.md) —
-  if all columns of [`DataFrame`](DataFrame.md) are presented in some other dataframes,
-  then the first one could be a superclass for latter.
-  Thus,
-  one can define a function on an interface with some set of columns
-  and then execute it in a safe way on any [`DataFrame`](DataFrame.md) which contains this set of columns.
-
-* **Immutable** — all operations on [`DataFrame`](DataFrame.md) produce new instance, while underlying data is reused wherever it's possible
+  if all columns of a [`DataFrame`](DataFrame.md) instance are present in another dataframe,
+  then the first one can be seen as a superclass of the latter.
+  This means you can define a function on an interface with some set of columns
+  and then execute it safely on any [`DataFrame`](DataFrame.md) which contains this same set of columns (see the sketch below).
+  In notebooks, this works out-of-the-box.
+  In ordinary projects, this requires casting (for now).
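A minimal sketch of what this polymorphism enables (assuming the extension properties generated from the `@DataSchema` declaration via the KSP plugin or a notebook; `Named`, `greetAll`, and `demo` are illustrative names, not part of the library):

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.forEach

@DataSchema
interface Named {
    val name: String
}

// requires only a `name: String` column; any extra columns in the frame are ignored
fun DataFrame<Named>.greetAll() = forEach { println("Hello, ${it.name}!") }

// in an ordinary project, cast the wider frame to the required schema (for now)
fun demo(fullDf: DataFrame<*>) = fullDf.cast<Named>().greetAll()
```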

## Syntax

2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/schemasInheritance.md
@@ -18,7 +18,7 @@ New schema interface for `filtered` variable will be derived from previously generated
interface DataFrameType1 : DataFrameType
```

-Extension properties for data access are generated only for new and overriden members of `DataFrameType1` interface:
+Extension properties for data access are generated only for new and overridden members of `DataFrameType1` interface:

```kotlin
val ColumnsContainer<DataFrameType1>.age: DataColumn<Int> get() = this["age"] as DataColumn<Int>
12 changes: 12 additions & 0 deletions examples/README.md
@@ -7,6 +7,18 @@
* [json](idea-examples/json) Using OpenAPI support in DataFrame's Gradle and KSP plugins to access data from [API guru](https://apis.guru/) in a type-safe manner
* [imdb sql database](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples) This project prominently showcases how to convert data from an SQL table to a Kotlin DataFrame
and how to transform the result of an SQL query into a DataFrame.
+* [unsupported-data-sources](idea-examples/unsupported-data-sources) Showcases how to use DataFrame with
+  (currently) unsupported data libraries, such as [Spark](https://spark.apache.org/) and [Exposed](https://github.com/JetBrains/Exposed).
+  The examples show how to convert between Kotlin DataFrame and each library's respective tables (see the sketch after this list).
+  * **JetBrains Exposed**: See the [exposed folder](./idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/exposed)
+    for an example of using Kotlin DataFrame with [Exposed](https://github.com/JetBrains/Exposed).
+  * **Apache Spark**: See the [spark folder](./idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/spark)
+    for an example of using Kotlin DataFrame with [Spark](https://spark.apache.org/).
+  * **Spark (with Kotlin Spark API)**: See the [kotlinSpark folder](./idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/kotlinSpark)
+    for an example of using Kotlin DataFrame with the [Kotlin Spark API](https://github.com/JetBrains/kotlin-spark-api).
+  * **Multik**: See the [multik folder](./idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/multik)
+    for an example of using Kotlin DataFrame with [Multik](https://github.com/Kotlin/multik).
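For a taste of what such a bridge can look like, here's a hand-rolled sketch in the spirit of the Multik example (it assumes every column holds `Double` values; `toMk2D` is a hypothetical helper, not part of either library):

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.multik.api.mk
import org.jetbrains.kotlinx.multik.api.ndarray

// flattens the cells row by row, then reshapes them into an (nrow x ncol) 2-D ndarray
fun AnyFrame.toMk2D() = mk.ndarray(
    (0 until rowsCount()).flatMap { row ->
        columns().map { col -> col[row] as Double }
    }
).reshape(rowsCount(), columnsCount())
```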


### Notebook examples

73 changes: 73 additions & 0 deletions examples/idea-examples/unsupported-data-sources/build.gradle.kts
@@ -0,0 +1,73 @@
plugins {
application
kotlin("jvm")

id("org.jetbrains.kotlinx.dataframe")

// only needed if `kotlin.dataframe.add.ksp=false` is set in gradle.properties
id("com.google.devtools.ksp")
}

repositories {
mavenLocal() // in case of local dataframe development
mavenCentral()
}

dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))

// exposed + sqlite database support
implementation(libs.sqlite)
implementation(libs.exposed.core)
implementation(libs.exposed.kotlin.datetime)
implementation(libs.exposed.jdbc)
implementation(libs.exposed.json)
implementation(libs.exposed.money)

// (kotlin) spark support
implementation(libs.kotlin.spark)
compileOnly(libs.spark)
implementation(libs.log4j.core)
implementation(libs.log4j.api)

// multik support
implementation(libs.multik.core)
implementation(libs.multik.default)
}

/**
* Runs the kotlinSpark/typedDataset example with java 11.
*/
val runKotlinSparkTypedDataset by tasks.registering(JavaExec::class) {
classpath = sourceSets["main"].runtimeClasspath
javaLauncher = javaToolchains.launcherFor { languageVersion = JavaLanguageVersion.of(11) }
mainClass = "org.jetbrains.kotlinx.dataframe.examples.kotlinSpark.TypedDatasetKt"
}

/**
* Runs the kotlinSpark/untypedDataset example with java 11.
*/
val runKotlinSparkUntypedDataset by tasks.registering(JavaExec::class) {
classpath = sourceSets["main"].runtimeClasspath
javaLauncher = javaToolchains.launcherFor { languageVersion = JavaLanguageVersion.of(11) }
mainClass = "org.jetbrains.kotlinx.dataframe.examples.kotlinSpark.UntypedDatasetKt"
}

/**
* Runs the spark/typedDataset example with java 11.
*/
val runSparkTypedDataset by tasks.registering(JavaExec::class) {
classpath = sourceSets["main"].runtimeClasspath
javaLauncher = javaToolchains.launcherFor { languageVersion = JavaLanguageVersion.of(11) }
mainClass = "org.jetbrains.kotlinx.dataframe.examples.spark.TypedDatasetKt"
}

/**
* Runs the spark/untypedDataset example with java 11.
*/
val runSparkUntypedDataset by tasks.registering(JavaExec::class) {
classpath = sourceSets["main"].runtimeClasspath
javaLauncher = javaToolchains.launcherFor { languageVersion = JavaLanguageVersion.of(11) }
mainClass = "org.jetbrains.kotlinx.dataframe.examples.spark.UntypedDatasetKt"
}
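These per-example tasks exist because Spark (as of 3.x) does not run on the Java 21 toolchain used by the rest of the build, so each main class is launched with a Java 11 launcher instead. Run one from the repository root with, for example, `./gradlew runSparkTypedDataset`.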
87 changes: 87 additions & 0 deletions examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/exposed/compatibilityLayer.kt
@@ -0,0 +1,87 @@
package org.jetbrains.kotlinx.dataframe.examples.exposed

import org.jetbrains.exposed.v1.core.BiCompositeColumn
import org.jetbrains.exposed.v1.core.Column
import org.jetbrains.exposed.v1.core.Expression
import org.jetbrains.exposed.v1.core.ExpressionAlias
import org.jetbrains.exposed.v1.core.ResultRow
import org.jetbrains.exposed.v1.core.Table
import org.jetbrains.exposed.v1.jdbc.Query
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.convertTo
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.dataframe.codeGen.NameNormalizer
import org.jetbrains.kotlinx.dataframe.impl.schema.DataFrameSchemaImpl
import org.jetbrains.kotlinx.dataframe.schema.ColumnSchema
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import kotlin.reflect.KProperty1
import kotlin.reflect.full.isSubtypeOf
import kotlin.reflect.full.memberProperties
import kotlin.reflect.typeOf

/**
* Retrieves all columns of any [Iterable][Iterable]`<`[ResultRow][ResultRow]`>`, like [Query][Query],
* from Exposed row by row and converts the resulting [Map] into a [DataFrame], cast to type [T].
*
* In notebooks, the untyped version works just as well due to runtime inference :)
*/
inline fun <reified T : Any> Iterable<ResultRow>.convertToDataFrame(): DataFrame<T> =
convertToDataFrame().convertTo<T>()

/**
* Retrieves all columns of any [Iterable][Iterable]`<`[ResultRow][ResultRow]`>`, like [Query][Query],
* from Exposed row by row and converts the resulting [Map] into a [DataFrame].
*/
@JvmName("convertToAnyFrame")
fun Iterable<ResultRow>.convertToDataFrame(): AnyFrame {
val map = mutableMapOf<String, MutableList<Any?>>()
for (row in this) {
for (expression in row.fieldIndex.keys) {
map.getOrPut(expression.readableName) {
mutableListOf()
} += row[expression]
}
}
return map.toDataFrame()
}

/**
* Retrieves a simple column name from [this] [Expression].
*
* Might need to be expanded with multiple types of [Expression].
*/
val Expression<*>.readableName: String
get() = when (this) {
is Column<*> -> name
is ExpressionAlias<*> -> alias
is BiCompositeColumn<*, *, *> -> getRealColumns().joinToString("_") { it.readableName }
else -> toString()
}

/**
* Creates a [DataFrameSchema] from the declared [Table] instance.
*
* @param columnNameToAccessor Optional [MutableMap] which will be filled with entries mapping
* the SQL column name to the accessor name from the [Table].
* This can be used to define a [NameNormalizer] later.
*/
@Suppress("UNCHECKED_CAST")
fun Table.toDataFrameSchema(columnNameToAccessor: MutableMap<String, String> = mutableMapOf()): DataFrameSchema {
val columns = this::class.memberProperties
.filter { it.returnType.isSubtypeOf(typeOf<Column<*>>()) }
.associate { prop ->
prop as KProperty1<Table, Column<*>>

// retrieve the actual column name
val columnName = prop.get(this).name
// store the actual column name together with the accessor name in the map
columnNameToAccessor[columnName] = prop.name

// get the column type from `val a: Column<Type>`
val type = prop.returnType.arguments.first().type!!

columnName to ColumnSchema.Value(type)
}
return DataFrameSchemaImpl(columns)
}
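A brief usage sketch of the helpers above (it assumes the `Customers` table defined in tables.kt and an open `db` connection; the printed mapping is illustrative):

```kotlin
import org.jetbrains.exposed.v1.jdbc.Database
import org.jetbrains.exposed.v1.jdbc.selectAll
import org.jetbrains.exposed.v1.jdbc.transactions.transaction
import org.jetbrains.kotlinx.dataframe.api.print

fun schemaDemo(db: Database) {
    // collect the SQL-name -> accessor-name mapping while building the schema
    val columnNameToAccessor = mutableMapOf<String, String>()
    val schema = Customers.toDataFrameSchema(columnNameToAccessor)
    println(schema)               // the inferred column schema, keyed by SQL column names
    println(columnNameToAccessor) // e.g. {CustomerId=customerId, ...} (illustrative)

    // any Query is an Iterable<ResultRow>, so it converts directly
    val df = transaction(db) {
        Customers.selectAll().convertToDataFrame()
    }
    df.print()
}
```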
@@ -0,0 +1,75 @@
package org.jetbrains.kotlinx.dataframe.examples.exposed

import org.jetbrains.exposed.v1.core.Column
import org.jetbrains.exposed.v1.core.StdOutSqlLogger
import org.jetbrains.exposed.v1.jdbc.Database
import org.jetbrains.exposed.v1.jdbc.SchemaUtils
import org.jetbrains.exposed.v1.jdbc.addLogger
import org.jetbrains.exposed.v1.jdbc.batchInsert
import org.jetbrains.exposed.v1.jdbc.deleteAll
import org.jetbrains.exposed.v1.jdbc.selectAll
import org.jetbrains.exposed.v1.jdbc.transactions.transaction
import org.jetbrains.kotlinx.dataframe.api.asSequence
import org.jetbrains.kotlinx.dataframe.api.count
import org.jetbrains.kotlinx.dataframe.api.describe
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.sortByDesc
import org.jetbrains.kotlinx.dataframe.size
import java.io.File

/**
* Describes a simple bridge between [Exposed](https://www.jetbrains.com/exposed/) and DataFrame!
*/
fun main() {
// defining where to find our SQLite database for Exposed
val resourceDb = "chinook.db"
val dbPath = File(object {}.javaClass.classLoader.getResource(resourceDb)!!.toURI()).absolutePath
val db = Database.connect(url = "jdbc:sqlite:$dbPath", driver = "org.sqlite.JDBC")

// let's read the database!
val df = transaction(db) {
addLogger(StdOutSqlLogger)

// tables in Exposed need to be defined, see tables.kt
SchemaUtils.create(Customers, Artists, Albums)

// Perform the specific query you want to read into the DataFrame.
// Note: DataFrames are in-memory structures, so don't make it too large if you don't have the RAM ;)
val query = Customers.selectAll() // .where { Customers.company.isNotNull() }

// read and convert the query to a typed DataFrame
// see compatibilityLayer.kt for how we created convertToDataFrame<>()
// and see tables.kt for how we created CustomersDf!
query.convertToDataFrame<CustomersDf>()
}

println(df.size())

    // now that we have a DataFrame, we can perform DataFrame operations,
// like seeing how often a country is represented
df.groupBy { country }.count()
.sortByDesc { "count"<Int>() }
.print(columnTypes = true, borders = true)

// or just general statistics
df.describe()
.print(columnTypes = true, borders = true)

// or make plots using Kandy! It's all up to you

// writing a DataFrame back into an SQL database with Exposed can also be done!
transaction(db) {
addLogger(StdOutSqlLogger)

// first delete the original contents
Customers.deleteAll()

// batch insert our rows back into the SQL database
Customers.batchInsert(df.asSequence()) { dfRow ->
for (column in Customers.columns) {
this[column as Column<Any?>] = dfRow[column.name]
}
}
}
}