Setting a column to NULL in a schema should check for its existence #645

@hfrick

The "reference in new issue" feature doesn't seem to work right now 🦄, but here's the issue following on from this comment:
https://github.com/rstudio/pointblank/pull/642/files#r2293299337

Essentially, it'd be great if specifying b = NULL in a schema meant a check that column b exists, without any check on its type. Currently, you can skip the type check by setting is_exact = FALSE, but the validation then also passes when the column is missing altogether.

library(pointblank)

# baseline: passes
data.frame(a = 1:2) |>
  col_schema_match(col_schema(a = "integer"))
#>   a
#> 1 1
#> 2 2

# add b to data frame and to schema as NULL and strict check fails as it should
data.frame(a = 1:2, b = 1:2) |> 
  col_schema_match(col_schema(a = "integer", b = NULL))
#> Error: Failure to validate that column schemas match.
#> The `col_schema_match()` validation failed beyond the absolute threshold level (1).
#> * failure level (1) >= failure threshold (1)

# relaxing `is_exact` allows the check to pass
data.frame(a = 1:2, b = 1:2) |>
  col_schema_match(col_schema(a = "integer", b = NULL), is_exact = FALSE)
#>   a b
#> 1 1 1
#> 2 2 2

# but it still passes when b is missing from the data frame
# i.e. it's not a check for existence
data.frame(a = 1:2) |>
  col_schema_match(col_schema(a = "integer", b = NULL), is_exact = FALSE)
#>   a
#> 1 1
#> 2 2

Created on 2025-08-26 with reprex v2.1.1
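
Until something like this is supported, one possible workaround is to assert existence separately before the relaxed schema check. The sketch below assumes that `col_exists()` stops with an error when called directly on a data frame that lacks the column, and that passing the column name as a string is accepted; `check_b_present()` is a hypothetical helper written for illustration, not part of pointblank.

```r
library(pointblank)

# Schema from the reprex above: `a` must be integer, `b` has no type requirement
schema <- col_schema(a = "integer", b = NULL)

# Hypothetical helper (not part of pointblank): first assert that `b` exists,
# then run the relaxed schema check from the reprex
check_b_present <- function(df) {
  df |>
    col_exists(columns = "b") |>
    col_schema_match(schema, is_exact = FALSE)
}

# b is present: both steps are expected to pass and return the data
check_b_present(data.frame(a = 1:2, b = 1:2))

# b is missing: col_exists() is expected to fail before the schema check runs
try(check_b_present(data.frame(a = 1:2)))
```

This splits what the issue asks for into two validation steps, so it doesn't replace having `b = NULL` mean "exists, any type" inside the schema itself.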
