lint dataframe access validation

This document was generated from 'src/documentation/print-linter-wiki.ts' on 2025-07-21, 11:41:52 UTC presenting an overview of flowR's linter (v2.2.16, using R v4.5.0). Please do not edit this file/wiki page directly.

Dataframe Access Validation ^[overview]

Validates the existence of accessed columns and rows of dataframes.
This linting rule is implemented in src/linter/rules/dataframe-access-validation.ts.
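
To make the idea concrete, here is a minimal TypeScript sketch of such a check, assuming the shape of a data frame (its column names and row count) is already known. The names `DataFrameShape`, `Finding`, `checkColumnByName`, and `checkRowByIndex` are hypothetical and chosen only for illustration. This is not flowR's implementation, which has to infer shapes from the R code first.

```typescript
// Hypothetical, simplified model of a data frame's known shape.
interface DataFrameShape {
    colnames: string[];
    rows: number;
}

// Hypothetical finding, loosely modeled on the result fields shown on this page.
interface Finding {
    type: 'column' | 'row';
    accessed: string | number;
    operand: string;
}

// Reports a finding if the named column is not part of the known shape.
function checkColumnByName(operand: string, shape: DataFrameShape, name: string): Finding[] {
    return shape.colnames.includes(name) ? [] : [{ type: 'column', accessed: name, operand }];
}

// Reports a finding if the row index exceeds the known row count.
// Only positive indices are modeled; R's zero and negative indices are ignored in this sketch.
function checkRowByIndex(operand: string, shape: DataFrameShape, index: number): Finding[] {
    return index > shape.rows ? [{ type: 'row', accessed: index, operand }] : [];
}

// Assumed shape of `data.frame(id = 1:5, name = 6:10)`: two columns, five rows.
const df: DataFrameShape = { colnames: ['id', 'name'], rows: 5 };

// Models the access `df[6, "value"]`: row 6 and column "value" are both missing.
const findings: Finding[] = [
    ...checkRowByIndex('df', df, 6),
    ...checkColumnByName('df', df, 'value')
];
console.log(findings);
```

The sketch returns a list of findings per check so that a single access can yield several reports, one for the row and one for the column.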

Configuration

Linting rules can be configured by passing a configuration object to the linter query as shown in the example below. The dataframe-access-validation rule accepts the following configuration options:

readLoadedData
Whether data frame shapes should be extracted from loaded external data files, such as CSV files. If undefined, this defaults to the corresponding option in the flowR config.
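
As a sketch, a query that sets this option explicitly could be built as follows. The query layout mirrors the linter query used in the examples on this page. The variable name `query` is only illustrative, and the value `true` is chosen for demonstration rather than as a recommended setting.

```typescript
// Linter query that sets `readLoadedData` for the dataframe-access-validation rule.
// The structure follows the linter query shown in this page's examples;
// `readLoadedData: true` is an illustrative value.
const query = [
    {
        type: 'linter',
        rules: [
            {
                name: 'dataframe-access-validation',
                config: { readLoadedData: true }
            }
        ]
    }
];

// Serialize to JSON, the form in which queries appear on this page.
const serialized: string = JSON.stringify(query);
console.log(serialized);
```

Omitting `readLoadedData` from `config` leaves the decision to the flowR config, as described above.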

Examples

df <- data.frame(id = 1:5, name = 6:10)
df[6, "value"]

The linting query can be used to run this rule on the above example:

[
  {
    "type": "linter",
    "rules": [
      {
        "name": "dataframe-access-validation",
        "config": {}
      }
    ]
  }
]

Results (prettified and summarized):

Query: linter (4 ms)
   ╰ Dataframe Access Validation (dataframe-access-validation):
       ╰ definitely:
           ╰ Access of row 6 of df at 3.1-14
           ╰ Access of column "value" of df at 3.1-14
       ╰ Metadata: {"numOperations":1,"numAccesses":2,"totalAccessed":2,"searchTimeMs":0,"processTimeMs":4}
All queries together required ≈4 ms (1ms accuracy, total 8 ms)

Show Detailed Results as Json

The analysis required 8.1 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "linter": {
    "results": {
      "dataframe-access-validation": {
        "results": [
          {
            "type": "row",
            "accessed": 6,
            "access": "[",
            "operand": "df",
            "range": [
              3,
              1,
              3,
              14
            ],
            "certainty": "definitely"
          },
          {
            "type": "column",
            "accessed": "value",
            "access": "[",
            "operand": "df",
            "range": [
              3,
              1,
              3,
              14
            ],
            "certainty": "definitely"
          }
        ],
        ".meta": {
          "numOperations": 1,
          "numAccesses": 2,
          "totalAccessed": 2,
          "searchTimeMs": 0,
          "processTimeMs": 4
        }
      }
    },
    ".meta": {
      "timing": 4
    }
  },
  ".meta": {
    "timing": 4
  }
}
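
The following TypeScript sketch shows one way to turn the detailed JSON results into the summary lines shown earlier. The interface `AccessResult` and the helpers `formatRange` and `describe` are hypothetical names that only mirror the fields visible in the JSON above; they are not flowR's own type definitions. The multi-line range format is an assumption, since the output on this page only contains single-line ranges.

```typescript
// Mirrors the fields visible in the JSON above (not flowR's own type definitions).
interface AccessResult {
    type: 'row' | 'column';
    accessed: string | number;
    access: string;
    operand?: string;
    range: [number, number, number, number];
    certainty: string;
}

// Formats a [startLine, startColumn, endLine, endColumn] range, e.g. "3.1-14".
// The multi-line form is an assumption made for this sketch.
function formatRange([sl, sc, el, ec]: [number, number, number, number]): string {
    return sl === el ? `${sl}.${sc}-${ec}` : `${sl}.${sc}-${el}.${ec}`;
}

// Renders one result as a summary line; column names are quoted, indices are not.
function describe(r: AccessResult): string {
    const accessed = typeof r.accessed === 'string' ? `"${r.accessed}"` : String(r.accessed);
    const operand = r.operand !== undefined ? r.operand : '<unknown>';
    return `Access of ${r.type} ${accessed} of ${operand} at ${formatRange(r.range)}`;
}

// The two results from the JSON above.
const results: AccessResult[] = [
    { type: 'row', accessed: 6, access: '[', operand: 'df', range: [3, 1, 3, 14], certainty: 'definitely' },
    { type: 'column', accessed: 'value', access: '[', operand: 'df', range: [3, 1, 3, 14], certainty: 'definitely' }
];

const lines: string[] = results.map(describe);
lines.forEach(line => console.log(line));
```

The `operand` field is treated as optional here because some of the expected results further down this page do not include it.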

Additional Examples

These examples are synthesized from the test cases in: test/functionality/linter/lint-dataframe-access-validation.test.ts

Test Case: Column access by name

We expect the linter to report an issue if a column that does not exist in the data frame is accessed by name via $.

Given the following input:

df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
print(df$skill)

We expect the linter to report the following:

[{ type: 'column', accessed: 'skill', access: '$', operand: 'df', range: [2, 7, 2, 14], certainty: LintingCertainty.Definitely }]

See here for the test-case implementation.

Test Case: Column access by index

We expect the linter to report an issue if a column that does not exist in the data frame is accessed by index via [ or [[.

Given the following input:

df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
print(df[3:4])

We expect the linter to report the following:

[{ type: 'column', accessed: 4, access: '[', operand: 'df', range: [2, 7, 2, 13], certainty: LintingCertainty.Definitely }]

See here for the test-case implementation.

Test Case: Row access by index

We expect the linter to report an issue if a row that does not exist in the data frame is accessed by index via [ or [[.

Given the following input:

df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
print(df[[5, "score"]])

We expect the linter to report the following:

[{ type: 'row', accessed: 5, access: '[[', operand: 'df', range: [2, 7, 2, 22], certainty: LintingCertainty.Definitely }]

See here for the test-case implementation.

Test Case: Filter access

We expect the linter to report an issue if a column that does not exist in the data frame is used in a filter function call.

Given the following input:

df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
df <- dplyr::filter(df, level > 70)

We expect the linter to report the following:

[{ type: 'column', accessed: 'level', access: 'dplyr::filter', operand: 'df', range: [2, 7, 2, 35], certainty: LintingCertainty.Definitely }]

See here for the test-case implementation.

Test Case: Select access

We expect the linter to report an issue if a column that does not exist in the data frame is selected or unselected in a select function call.

Given the following input:

df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"), score = c(90, 65, 75))
df <- dplyr::select(df, id, age, score)

We expect the linter to report the following:

[{ type: 'column', accessed: 'age', access: 'dplyr::select', operand: 'df', range: [2, 7, 2, 39], certainty: LintingCertainty.Definitely }]

See here for the test-case implementation.

Test Case: Code example

We expect the linter to report an issue for all non-existent columns accessed in transformation functions like filter, mutate, and select, as well as all non-existent columns and rows accessed via access operators like [.

Given the following input:

library(dplyr)

df1 <- data.frame(
    id = c(1, 2, 3, 4),
    score = c(65, 85, 40, 90)
)

df2 <- df1 %>%
    filter(age > 50) %>%
    mutate(level = skill^2) %>%
    select(-name)

print(df2[1:5, 3])

We expect the linter to report the following:

[{ type: 'column', accessed: 'age', access: 'filter', operand: 'df1', range: [9, 5, 9, 20], certainty: LintingCertainty.Definitely },
{ type: 'column', accessed: 'skill', access: 'mutate', range: [10, 5, 10, 27], certainty: LintingCertainty.Definitely },
{ type: 'column', accessed: 'name', access: 'select', range: [11, 5, 11, 17], certainty: LintingCertainty.Definitely },
{ type: 'row', accessed: 5, access: '[', operand: 'df2', range: [13, 7, 13, 17], certainty: LintingCertainty.Definitely },
{ type: 'column', accessed: 3, access: '[', operand: 'df2', range: [13, 7, 13, 17], certainty: LintingCertainty.Definitely }]

See here for the test-case implementation.

Currently maintained by Florian Sihler at Ulm University
