Commit eddb3b7

committed
Add SQL-to-Kotlin DataFrame transition guide for backend developers
Includes a comprehensive guide to help SQL and ORM users adapt to Kotlin DataFrame. Covers key concepts, equivalents for SQL/ORM operations, and practical examples. Updated TOC to include the new guide.
1 parent bfc4a25 commit eddb3b7

2 files changed

+226
-0
lines changed

docs/StardustDocs/d.tree

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212
<toc-element topic="Kotlin-DataFrame-Features-in-Kotlin-Notebook.md">
1313
<toc-element topic="Trobleshooting.md"/>
1414
</toc-element>
15+
<toc-element topic="Guide-for-backend-SQL-developers.md"/>
1516
</toc-element>
1617

1718
<toc-element topic="Setup.md" accepts-web-file-names="gettingstarted">
Lines changed: 225 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,225 @@
1+
# Kotlin DataFrame for SQL & Backend Developers
2+
3+
<web-summary>
4+
Quickly transition from SQL to Kotlin DataFrame: load your datasets, perform essential transformations, and visualize your results — directly within a Kotlin Notebook.
5+
</web-summary>
6+
7+
<card-summary>
8+
Switching from SQL? Kotlin DataFrame makes it easy to load, process, analyze, and visualize your data — fully interactive and type-safe!
9+
</card-summary>
10+
11+
<link-summary>
12+
Explore Kotlin DataFrame as a SQL or ORM user: read your data, transform columns, group or join tables, and build insightful visualizations with Kotlin Notebook.
13+
</link-summary>
14+
15+
This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar SQL and ORM operations to DataFrame concepts.
16+
17+
We recommend starting with [**Kotlin Notebook**](SetupKotlinNotebook.md) — an IDE-integrated tool similar to Jupyter Notebook.
18+
19+
It lets you explore data interactively, render DataFrames, create plots, and use all your IDE features within the JVM ecosystem.
20+
21+
If you plan to work on a Gradle project without a notebook, we recommend installing the library together with our [**experimental Kotlin compiler plugin**](Compiler-Plugin.md) (available since version 2.2.*).
22+
This plugin generates type-safe schemas at compile time, tracking schema changes throughout your data pipeline.
23+
24+
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.guides.QuickStartGuide-->
25+
26+
## Quick Setup
27+
28+
To start working with Kotlin DataFrame in a Kotlin Notebook, run a cell with the following code:
29+
30+
```kotlin
%useLatestDescriptors
%use dataframe
```
34+
35+
This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame
36+
rendering. Learn more [here](SetupKotlinNotebook.md#integrate-kotlin-dataframe).
37+
38+
---
39+
40+
## 1. What is a DataFrame?
41+
42+
If you’re used to SQL, a **DataFrame** is conceptually like a **table**:
43+
44+
- **Rows**: ordered records of data
45+
- **Columns**: named, typed fields
46+
- **Schema**: a mapping of column names to types
47+
48+
Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md) — columns can contain *nested DataFrames* or *column groups*, allowing you to represent and transform tree-like structures without flattening.
49+
50+
Unlike a relational DB table:
51+
52+
- A DataFrame **lives in memory** — there’s no storage engine or transaction log
53+
- It’s **immutable** — each operation produces a *new* DataFrame
54+
- There is **no concept of foreign keys or relations** between DataFrames
55+
- It can be created from *any* [source](Data-Sources.md): [CSV](CSV-TSV.md), [JSON](JSON.md), [SQL tables](SQL.md), [Apache Arrow](ApacheArrow.md), in-memory objects
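
The immutability point can be checked with a minimal sketch. The sample data and the names `samplePeople`, `withAdultFlag`, and `isAdult` are made up for illustration, and columns are accessed through the String API so the snippet does not rely on generated extension properties.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; in a Kotlin Notebook, `%use dataframe` supplies these imports.
fun samplePeople(): DataFrame<*> =
    dataFrameOf("name", "age")(
        "Alice", 31,
        "Bob", 15,
    )

// Returns a NEW DataFrame with one extra column; the input is left untouched.
fun withAdultFlag(people: DataFrame<*>): DataFrame<*> =
    people.add("isAdult") { "age"<Int>() >= 18 }

fun main() {
    val people = samplePeople()
    val flagged = withAdultFlag(people)

    println(people.columnNames())  // [name, age]
    println(flagged.columnNames()) // [name, age, isAdult]
    println(flagged.schema())      // column names with their inferred types
}
```

Because `add` returns a new object, `people` can be reused safely in other pipelines.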
56+
57+
---
58+
59+
## 2. Reading Data From SQL
60+
61+
Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis.
62+
63+
| Approach | Example |
|----------|---------|
| **From a table** | `val df = DataFrame.readSqlTable(dbConfig, "customers")` |
| **From a SQL query** | `val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")` |
| **From a JDBC Connection** | `val df = connection.readDataFrame("SELECT * FROM orders")` |
| **From a ResultSet (extension)** | `val df = resultSet.readDataFrame(connection)` |
69+
70+
```kotlin
import java.sql.DriverManager
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.*

val dbConfig = DbConnectionConfig(
    url = "jdbc:postgresql://localhost:5432/mydb",
    user = "postgres",
    password = "secret"
)

// Table
val customers = DataFrame.readSqlTable(dbConfig, "customers")

// Query
val salesByRegion = DataFrame.readSqlQuery(dbConfig, """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
""")

// From an existing JDBC connection (opened here with the same credentials)
val connection = DriverManager.getConnection(dbConfig.url, dbConfig.user, dbConfig.password)
val orders = connection.readDataFrame("SELECT * FROM orders")

// From a ResultSet
val rs = connection.createStatement().executeQuery("SELECT * FROM orders")
val ordersFromResultSet = rs.readDataFrame(connection)
```
96+
97+
More information can be found [here](readSqlDatabases.md).
98+
99+
## 3. Why It’s Not an ORM
100+
101+
Frameworks like **Hibernate** or **Exposed**:
102+
- Map DB tables to Kotlin objects (entities)
103+
- Track object changes and sync them back to the database
104+
- Focus on **persistence** and **transactions**
105+
106+
Kotlin DataFrame:
107+
- Has no persistence layer
108+
- Doesn’t try to map rows to mutable entities
109+
- Focuses on **in-memory analytics**, **transformations**, and **type-safe pipelines**
110+
- The **main idea** is that the schema *changes together with your transformations* — and the [**Compiler Plugin**](Compiler-Plugin.md) updates the type-safe API automatically under the hood.
111+
- You don’t have to manually define or recreate schemas every time — the plugin infers them dynamically from data or transformations.
112+
- In ORMs, the mapping layer is **frozen** — schema changes require manual model edits and migrations.
113+
114+
Think of Kotlin DataFrame as a **data analysis/ETL tool**, not an ORM.
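
To illustrate how the schema changes together with the transformations, here is a small sketch. The data and the names `sampleSales`, `toReport`, `profit`, and `gross` are hypothetical, and the String API is used instead of the Compiler Plugin's generated properties so that the snippet stays self-contained.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sales data with three columns.
fun sampleSales(): DataFrame<*> =
    dataFrameOf("region", "revenue", "cost")(
        "North", 120, 80,
        "South", 90, 70,
    )

// Each step yields a DataFrame with a different schema; no model classes are edited.
fun toReport(sales: DataFrame<*>): DataFrame<*> =
    sales
        .add("profit") { "revenue"<Int>() - "cost"<Int>() } // schema gains `profit`
        .remove("cost")                                      // schema loses `cost`
        .rename("revenue").into("gross")                     // `revenue` becomes `gross`

fun main() {
    val sales = sampleSales()
    val report = toReport(sales)

    println(sales.columnNames())  // [region, revenue, cost]
    println(report.columnNames()) // [region, gross, profit]
}
```

With the Compiler Plugin enabled, the same steps would be written with property access (`revenue - cost`), and the new columns would be available as typed properties after each call.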
115+
116+
---
117+
118+
## 4. Key Differences from SQL & ORMs
119+
120+
| Feature / Concept | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…) | Kotlin DataFrame |
|-------------------|------------------------------------|---------------------------|------------------|
| **Storage** | Persistent | Persistent | In-memory only |
| **Schema definition** | `CREATE TABLE` DDL | Defined in entity classes | Derived from data or transformations |
| **Schema change** | `ALTER TABLE` | Manual migration of entity classes | Automatic via transformations + Compiler Plugin |
| **Relations** | Foreign keys | Mapped via annotations | Not applicable |
| **Transactions** | Yes | Yes | Not applicable |
| **Indexes** | Yes | Yes (via DB) | Not applicable |
| **Data manipulation** | SQL DML (`INSERT`, `UPDATE`) | CRUD mapped to DB | Transformations only (immutable) |
| **Joins** | `JOIN` keyword | Eager/lazy loading | `.join()` / `.leftJoin()` DSL |
| **Grouping & aggregation** | `GROUP BY` | DB query with groupBy | `.groupBy().aggregate()` |
| **Filtering** | `WHERE` | Criteria API / query DSL | `.filter { ... }` |
| **Permissions** | `GRANT` / `REVOKE` | DB-level permissions | Not applicable |
| **Execution** | On DB engine | On DB engine | In JVM process |
134+
135+
---
136+
137+
## 5. SQL → Kotlin DataFrame Cheatsheet
138+
139+
### DDL Analogues
140+
141+
| SQL DDL Command / Example | Kotlin DataFrame Equivalent |
|---------------------------|-----------------------------|
| **Create table:**<br>`CREATE TABLE person (name text, age int);` | `@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}` |
| **Add column:**<br>`ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;` | `.add("profit") { revenue - cost }` |
| **Rename column:**<br>`ALTER TABLE sales RENAME COLUMN old_name TO new_name;` | `.rename { old_name }.into("new_name")` |
| **Drop column:**<br>`ALTER TABLE sales DROP COLUMN old_col;` | `.remove { old_col }` |
| **Modify column type:**<br>`ALTER TABLE sales ALTER COLUMN amount TYPE numeric;` | `.convert { amount }.to<Double>()` |
148+
149+
### DDL Analogues (TODO: decide to remove first DDL section or this)
150+
151+
| SQL DDL Command | Kotlin DataFrame Equivalent |
|-----------------|-----------------------------|
| `CREATE TABLE` | Define `@DataSchema` interface or class <br>`@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}` |
| `ALTER TABLE ADD COLUMN` | `.add("newCol") { ... }` |
| `ALTER TABLE DROP COLUMN` | `.remove("colName")` |
| `ALTER TABLE RENAME COLUMN` | `.rename { oldName }.into("newName")` |
| `ALTER TABLE MODIFY COLUMN` | `.convert { colName }.to<NewType>()` |
158+
159+
---
160+
161+
### DML Analogues
162+
163+
| SQL DML Command / Example | Kotlin DataFrame Equivalent |
|---------------------------|-----------------------------|
| `SELECT col1, col2` | `df.select { col1 and col2 }` |
| `WHERE amount > 100` | `df.filter { amount > 100 }` |
| `ORDER BY amount DESC` | `df.sortByDesc { amount }` |
| `GROUP BY region` | `df.groupBy { region }` |
| `SUM(amount)` | `.aggregate { sum(amount) }` |
| `JOIN` | `.join(otherDf) { id match right.id }` |
| `LIMIT 5` | `.take(5)` |
| **Pivot:** <br>`SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);` | `.pivot(region, year) { sum(amount) }` |
| **Explode array column:** <br>`SELECT id, unnest(tags) AS tag FROM products;` | `.explode { tags }` |
| **Update column:** <br>`UPDATE sales SET amount = amount * 1.2;` | `.update { amount }.with { it * 1.2 }` |
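
As a rough illustration of the `JOIN` row, the sketch below joins two small in-memory frames on a shared key. The data and the names `sampleCustomers`, `sampleOrders`, and `customerId` are hypothetical, and the String-based overloads of `join` and `leftJoin` are assumed here instead of the `match` DSL shown in the table.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical lookup table: one row per customer.
fun sampleCustomers(): DataFrame<*> =
    dataFrameOf("customerId", "name")(
        1, "Alice",
        2, "Bob",
    )

// Hypothetical fact table; customer 3 has no matching row in `sampleCustomers`.
fun sampleOrders(): DataFrame<*> =
    dataFrameOf("customerId", "amount")(
        1, 100,
        1, 50,
        2, 75,
        3, 20,
    )

fun main() {
    val orders = sampleOrders()
    val customers = sampleCustomers()

    // Inner join (like SQL `JOIN ... USING (customerId)`): unmatched orders are dropped.
    val inner = orders.join(customers, "customerId")
    println(inner.rowsCount()) // 3

    // Left join: every order is kept, and `name` is null where no customer matches.
    val left = orders.leftJoin(customers, "customerId")
    println(left.rowsCount())  // 4
}
```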
175+
176+
177+
## 6. Example: SQL vs DataFrame Side-by-Side
178+
179+
**SQL (PostgreSQL):**
180+
```sql
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0
GROUP BY region
ORDER BY total DESC
LIMIT 5;
```
188+
189+
```kotlin
sales.filter { amount > 0 }
    .groupBy { region }
    .aggregate { sum(amount).into("total") }
    .sortByDesc { total }
    .take(5)
```
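
The DataFrame pipeline above relies on generated extension properties (`amount`, `region`, `total`). For trying the same query outside a notebook, here is a self-contained sketch that uses the String API. The sample rows and the helper names `sampleSales` and `topRegions` are invented for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sales rows; the negative amount is removed by the filter step.
fun sampleSales(): DataFrame<*> =
    dataFrameOf("region", "amount")(
        "North", 100,
        "South", 50,
        "North", 30,
        "East", -5,
        "South", 20,
    )

// WHERE amount > 0, GROUP BY region, SUM(amount) AS total, ORDER BY total DESC, LIMIT n
fun topRegions(sales: DataFrame<*>, limit: Int = 5): DataFrame<*> =
    sales
        .filter { "amount"<Int>() > 0 }
        .groupBy("region")
        .aggregate { sumOf { "amount"<Int>() } into "total" }
        .sortByDesc("total")
        .take(limit)

fun main() {
    val top = topRegions(sampleSales())
    top.print() // two rows: North first (100 + 30), then South (50 + 20)
}
```

Each call maps to one SQL clause, so the chain reads in execution order rather than in SQL's written order.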
196+
197+
## In conclusion
198+
199+
- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe** and fully integrated into Kotlin.
200+
- The main focus is **readability**, schema change safety, and evolving API support via the [Compiler Plugin](Compiler-Plugin.md).
201+
- It is neither a database nor an ORM — a DataFrame does not store data or manage transactions but works as an in-memory layer for analytics and transformations.
202+
- It does not provide some SQL features (permissions, transactions, indexes) — but offers convenient tools for working with JSON-like structures and combining multiple data sources.
203+
- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the JVM, while keeping your code easily refactorable and IDE-assisted.
204+
205+
## What's Next?
206+
If you're ready to go through a complete example, we recommend our [Quickstart Guide](quickstart.md):
you'll learn the basics of reading data, transforming it, and creating visualizations step by step.
208+
209+
Ready to go deeper? Check out what’s next:
210+
211+
- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
212+
API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
213+
214+
- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.
215+
216+
- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).
217+
218+
- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
219+
and make working with your data both convenient and type-safe.
220+
221+
- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
222+
for auto-generated column access in your IntelliJ IDEA projects.
223+
224+
- 📊 **Master Kandy** for expressive DataFrame visualizations by reading the
[Kandy Documentation](https://kotlin.github.io/kandy).
