lint dataframe access validation
This document was generated from 'src/documentation/print-linter-wiki.ts' on 2025-07-21, 11:41:52 UTC presenting an overview of flowR's linter (v2.2.16, using R v4.5.0). Please do not edit this file/wiki page directly.
Dataframe Access Validation [overview]
Validates the existence of accessed columns and rows of data frames.
This linting rule is implemented in src/linter/rules/dataframe-access-validation.ts.
Linting rules can be configured by passing a configuration object to the linter query as shown in the example below.
The dataframe-access-validation rule accepts the following configuration options:
- readLoadedData: Whether data frame shapes should be extracted from loaded external data files, such as CSV files (defaults to the option in the flowR config if undefined)
df <- data.frame(id = 1:5, name = 6:10)
df[6, "value"]
The linting query can be used to run this rule on the above example:
[ { "type": "linter", "rules": [ { "name": "dataframe-access-validation", "config": {} } ] } ]
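To set the option explicitly, the same query can also be assembled programmatically. The following TypeScript sketch only builds and serializes the query object. The type names `LinterRuleRequest` and `LinterQuery` and the helper `buildAccessValidationQuery` are hypothetical and not flowR's own typings, and treating `readLoadedData` as a boolean is an assumption based on the option's description.

```typescript
// Hypothetical types that mirror the JSON query shown above (not flowR's own typings).
interface LinterRuleRequest {
    name: string;
    config: Record<string, unknown>;
}

interface LinterQuery {
    type: 'linter';
    rules: LinterRuleRequest[];
}

// Assumption: `readLoadedData` is a boolean flag, as its description suggests.
// Leaving it undefined keeps the config empty, so the flowR config default applies.
function buildAccessValidationQuery(readLoadedData?: boolean): LinterQuery[] {
    const config: Record<string, unknown> = {};
    if (readLoadedData !== undefined) {
        config.readLoadedData = readLoadedData;
    }
    return [{ type: 'linter', rules: [{ name: 'dataframe-access-validation', config }] }];
}

const queryJson = JSON.stringify(buildAccessValidationQuery(false));
console.log(queryJson);
```

Calling the helper without an argument reproduces the query with the empty config used in the example above.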
Results (prettified and summarized):
Query: linter (4 ms)
   ╰ Dataframe Access Validation (dataframe-access-validation):
       ╰ definitely:
           ╰ Access of row 6 of df at 3.1-14
           ╰ Access of column "value" of df at 3.1-14
       ╰ Metadata: {"numOperations":1,"numAccesses":2,"totalAccessed":2,"searchTimeMs":0,"processTimeMs":4}
All queries together required ≈4 ms (1ms accuracy, total 8 ms)
Detailed results as JSON:
The analysis required 8.1 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
  "linter": {
    "results": {
      "dataframe-access-validation": {
        "results": [
          {
            "type": "row",
            "accessed": 6,
            "access": "[",
            "operand": "df",
            "range": [3, 1, 3, 14],
            "certainty": "definitely"
          },
          {
            "type": "column",
            "accessed": "value",
            "access": "[",
            "operand": "df",
            "range": [3, 1, 3, 14],
            "certainty": "definitely"
          }
        ],
        ".meta": {
          "numOperations": 1,
          "numAccesses": 2,
          "totalAccessed": 2,
          "searchTimeMs": 0,
          "processTimeMs": 4
        }
      }
    },
    ".meta": {
      "timing": 4
    }
  },
  ".meta": {
    "timing": 4
  }
}
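A consumer of this JSON could turn each finding back into the one-line summary shown in the prettified output. The sketch below is one way to do that. The `AccessFinding` interface is inferred from the JSON above and is not flowR's exported type. Reading `range` as `[startLine, startColumn, endLine, endColumn]` is an inference from the `3.1-14` notation, and the rendering chosen for multi-line ranges is an assumption.

```typescript
// Shape of one finding, inferred from the JSON above (not flowR's exported type).
interface AccessFinding {
    type: 'row' | 'column';
    accessed: number | string;
    access: string;
    operand?: string;
    // Assumed order: [startLine, startColumn, endLine, endColumn]
    range: [number, number, number, number];
    certainty: string;
}

// Renders a single-line range as "line.startCol-endCol", e.g. "3.1-14".
// The multi-line form is an assumption; the source only shows single-line ranges.
function formatRange(range: AccessFinding['range']): string {
    const [startLine, startCol, endLine, endCol] = range;
    return startLine === endLine
        ? `${startLine}.${startCol}-${endCol}`
        : `${startLine}.${startCol}-${endLine}.${endCol}`;
}

// Builds a summary line; string accesses (column names) are quoted, indices are not.
function describeFinding(f: AccessFinding): string {
    const accessed = typeof f.accessed === 'string' ? `"${f.accessed}"` : String(f.accessed);
    const target = f.operand === undefined ? '' : ` of ${f.operand}`;
    return `Access of ${f.type} ${accessed}${target} at ${formatRange(f.range)}`;
}

const findings: AccessFinding[] = [
    { type: 'row', accessed: 6, access: '[', operand: 'df', range: [3, 1, 3, 14], certainty: 'definitely' },
    { type: 'column', accessed: 'value', access: '[', operand: 'df', range: [3, 1, 3, 14], certainty: 'definitely' }
];
const summary = findings.map(describeFinding);
summary.forEach(line => console.log(line));
```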
These examples are synthesized from the test cases in: test/functionality/linter/lint-dataframe-access-validation.test.ts
We expect the linter to report an issue if a column that does not exist in the data frame is accessed by name via the $ operator.
Given the following input:
df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
print(df$skill)
We expect the linter to report the following:
[{ type: 'column', accessed: 'skill', access: '$', operand: 'df', range: [2, 7, 2, 14], certainty: LintingCertainty.Definitely }]
See here for the test-case implementation.
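The idea behind this check can be illustrated independently of flowR: once the set of column names of a data frame is known, a name-based access is suspicious whenever the name is not in that set. The sketch below is not flowR's implementation, which infers the shape statically from the R code. The helper `findMissingColumn` is hypothetical, and it uses exact matching, so it ignores the partial matching that R's `$` operator performs.

```typescript
// Hypothetical helper: returns the accessed name if it is not among the known columns,
// and undefined if the column exists. Exact matching only (no R-style partial matching).
function findMissingColumn(knownColumns: readonly string[], accessed: string): string | undefined {
    return knownColumns.indexOf(accessed) === -1 ? accessed : undefined;
}

// Columns of `df` in the test case above.
const knownColumns = ['id', 'name', 'score'];

const missingByName = findMissingColumn(knownColumns, 'skill'); // df$skill
const presentByName = findMissingColumn(knownColumns, 'score'); // df$score
console.log(missingByName, presentByName);
```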
We expect the linter to report an issue if a column that does not exist in the data frame is accessed by index via [ or [[.
Given the following input:
df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
print(df[3:4])
We expect the linter to report the following:
[{ type: 'column', accessed: 4, access: '[', operand: 'df', range: [2, 7, 2, 13], certainty: LintingCertainty.Definitely }]
See here for the test-case implementation.
We expect the linter to report an issue if a row that does not exist in the data frame is accessed by index via [ or [[.
Given the following input:
df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
print(df[[5, "score"]])
We expect the linter to report the following:
[{ type: 'row', accessed: 5, access: '[[', operand: 'df', range: [2, 7, 2, 22], certainty: LintingCertainty.Definitely }]
See here for the test-case implementation.
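Both index-based checks reduce to comparing the requested indices against the known number of rows or columns. The following sketch illustrates that comparison for the two test cases above. It is a simplification under stated assumptions, not flowR's implementation: `FrameShape`, `rRange`, and `outOfBounds` are hypothetical names, the shape is given as exact counts, and only positive 1-based indices are handled (no negative, zero, or logical indexing).

```typescript
// Hypothetical shape description: exact number of rows and columns of a data frame.
interface FrameShape {
    rows: number;
    cols: number;
}

// Expands an ascending R-style range `from:to` into its indices.
function rRange(from: number, to: number): number[] {
    const result: number[] = [];
    for (let i = from; i <= to; i++) {
        result.push(i);
    }
    return result;
}

// Returns every requested 1-based index that exceeds the given bound.
function outOfBounds(indices: readonly number[], bound: number): number[] {
    return indices.filter(i => i > bound);
}

// `df` from the test cases has three rows and three columns.
const shape: FrameShape = { rows: 3, cols: 3 };

const badColumns = outOfBounds(rRange(3, 4), shape.cols); // df[3:4]
const badRows = outOfBounds([5], shape.rows);             // df[[5, "score"]]
console.log(badColumns, badRows);
```

With these inputs, only column 4 and row 5 are flagged, which corresponds to the `accessed` values in the two expected reports.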
We expect the linter to report an issue if a column that does not exist in the data frame is used in a filter function call.
Given the following input:
df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
df <- dplyr::filter(df, level > 70)
We expect the linter to report the following:
[{ type: 'column', accessed: 'level', access: 'dplyr::filter', operand: 'df', range: [2, 7, 2, 35], certainty: LintingCertainty.Definitely }]
See here for the test-case implementation.
We expect the linter to report an issue if a column that does not exist in the data frame is selected or unselected in a select function call.
Given the following input:
df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
df <- dplyr::select(df, id, age, score)
We expect the linter to report the following:
[{ type: 'column', accessed: 'age', access: 'dplyr::select', operand: 'df', range: [2, 7, 2, 39], certainty: LintingCertainty.Definitely }]
See here for the test-case implementation.
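Selection and unselection can be treated alike for this check, since both refer to a column by name. The sketch below illustrates this under strong assumptions: the arguments of `select` are represented as plain strings, a leading `-` marks an unselected column, and selection helpers or ranges are not modeled. The helper `missingSelections` is hypothetical and not part of flowR or dplyr.

```typescript
// Hypothetical helper: strips a leading "-" (unselection) and returns all referenced
// column names that are not among the known columns.
function missingSelections(knownColumns: readonly string[], selections: readonly string[]): string[] {
    return selections
        .map(s => (s.charAt(0) === '-' ? s.slice(1) : s))
        .filter(name => knownColumns.indexOf(name) === -1);
}

// dplyr::select(df, id, age, score) on a data frame with columns id, name, score
const missingSelected = missingSelections(['id', 'name', 'score'], ['id', 'age', 'score']);

// select(-name) on a data frame with columns id, score
const missingUnselected = missingSelections(['id', 'score'], ['-name']);
console.log(missingSelected, missingUnselected);
```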
We expect the linter to report an issue for all non-existent columns accessed in transformation functions like filter, mutate, and select, as well as all non-existent columns and rows accessed via access operators like [.
Given the following input:
library(dplyr)

df1 <- data.frame(
    id = c(1, 2, 3, 4),
    score = c(65, 85, 40, 90)
)

df2 <- df1 %>%
    filter(age > 50) %>%
    mutate(level = skill^2) %>%
    select(-name)

print(df2[1:5, 3])
We expect the linter to report the following:
[{ type: 'column', accessed: 'age', access: 'filter', operand: 'df1', range: [9, 5, 9, 20], certainty: LintingCertainty.Definitely },
{ type: 'column', accessed: 'skill', access: 'mutate', range: [10, 5, 10, 27], certainty: LintingCertainty.Definitely },
{ type: 'column', accessed: 'name', access: 'select', range: [11, 5, 11, 17], certainty: LintingCertainty.Definitely },
{ type: 'row', accessed: 5, access: '[', operand: 'df2', range: [13, 7, 13, 17], certainty: LintingCertainty.Definitely },
{ type: 'column', accessed: 3, access: '[', operand: 'df2', range: [13, 7, 13, 17], certainty: LintingCertainty.Definitely }]
See here for the test-case implementation.
Currently maintained by Florian Sihler at Ulm University