Arnej/xgboost ubj import #35508

arnej27959 · 2025-12-12T14:46:50Z

This commit adds the ability to import XGBoost models saved in Universal Binary JSON (.ubj) format, in addition to the existing JSON format support. Key changes:

- Add the ubjson library dependency for parsing the UBJ binary format
- Create XGBoostUbjParser to handle UBJ model files
- Extract common tree-to-expression logic into an AbstractXGBoostParser base class
- Convert the flat UBJ array representation to a hierarchical tree structure
- Extract base_score from the model metadata and apply the logit transformation
- Add a test case comparing JSON and UBJ model imports
- Add utility tools for UBJ-to-JSON conversion and debugging

Enables base score extraction with logistic transformation.
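The flat-to-hierarchical conversion can be illustrated with a minimal sketch. It assumes each tree is stored as parallel arrays named as in XGBoost's JSON/UBJ model schema (`left_children`, `right_children`, `split_indices`, `split_conditions`), that `-1` in `left_children` marks a leaf, and that a leaf's value sits in `split_conditions`. The class `FlatTreeSketch`, the record `FlatTree`, and the exact `if (...)` output shape are hypothetical, and missing-value routing (`default_left`) is deliberately ignored.

```java
import java.util.function.IntFunction;

public class FlatTreeSketch {
    // Hypothetical holder for one tree's flat node arrays, indexed by node id (root = 0).
    record FlatTree(int[] leftChildren, int[] rightChildren,
                    int[] splitIndices, double[] splitConditions) {}

    // Recursively rebuild the tree from node 'id' and emit it as a nested if-expression.
    static String toExpression(FlatTree t, int id, IntFunction<String> featureName) {
        int left = t.leftChildren()[id];
        if (left == -1) {
            // Leaf node: the leaf value is stored in split_conditions.
            return Double.toString(t.splitConditions()[id]);
        }
        int right = t.rightChildren()[id];
        String feature = featureName.apply(t.splitIndices()[id]);
        // XGBoost takes the left branch when the feature value is below the split condition.
        return "if (" + feature + " < " + t.splitConditions()[id] + ", "
                + toExpression(t, left, featureName) + ", "
                + toExpression(t, right, featureName) + ")";
    }

    public static void main(String[] args) {
        // Root splits on feature 2 at 0.5; nodes 1 and 2 are leaves.
        FlatTree tree = new FlatTree(
                new int[] {1, -1, -1},
                new int[] {2, -1, -1},
                new int[] {2, 0, 0},
                new double[] {0.5, -0.25, 0.75});
        // prints: if (xgboost_input_2 < 0.5, -0.25, 0.75)
        System.out.println(toExpression(tree, 0, i -> "xgboost_input_" + i));
    }
}
```

Recursing from the root keeps the code short; a very deep tree could instead be rebuilt with an explicit stack.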

Add the ubjson library (com.dev-smart:ubjson) to the allowed dependencies lists across all Maven enforcer configurations. This is required for the XGBoost UBJ format import feature added in the previous commit.

Add a probe method to validate UBJ file structure before parsing, and precompute the base_score logit transformation instead of generating it as a runtime expression string.
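A rough sketch of both ideas follows. The probe assumes a UBJ model file whose top level is an object starts with the UBJSON object marker byte `{`; the real probe may check more than this. The logit is computed once as ln(p / (1 - p)) so that the generated ranking expression carries a constant. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class UbjProbeSketch {
    // Assumption: a top-level UBJSON object begins with the '{' marker byte.
    static boolean looksLikeUbjObject(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read() == '{'; // read() returns -1 for an empty file
        }
    }

    // Precompute logit(p) = ln(p / (1 - p)) at import time instead of
    // emitting it as a formula in the ranking expression.
    static double logit(double p) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("base_score must be in (0, 1): " + p);
        }
        return Math.log(p / (1.0 - p));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("probe", ".ubj");
        Files.write(tmp, new byte[] {'{', '}'});
        System.out.println(looksLikeUbjObject(tmp)); // true
        System.out.println(logit(0.5));              // 0.0
        Files.delete(tmp);
    }
}
```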

Separates feature indices from feature name formatting to enable flexible feature naming in ranking expressions. This allows models to use meaningful feature names (e.g., "mean_radius") instead of generic indexed names, improving readability of generated ranking expressions.
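One way to picture this separation: the tree-walking code only knows integer split indices and asks a naming function how to render each one. The sketch below is hypothetical (`FeatureNamingSketch` and `namer` are not from the PR); only the `xgboost_input_X` default format comes from the PR text. Falling back to the default name for an index beyond the supplied list is a simplification chosen for this sketch.

```java
import java.util.List;
import java.util.function.IntFunction;

public class FeatureNamingSketch {
    // Default format used when no feature names are available.
    static String defaultName(int index) {
        return "xgboost_input_" + index;
    }

    // Map a split index to a display name: a supplied name if present, else the default.
    static IntFunction<String> namer(List<String> names) {
        return index -> (names != null && index < names.size())
                ? names.get(index)
                : defaultName(index);
    }

    public static void main(String[] args) {
        IntFunction<String> named = namer(List.of("mean_radius", "mean_texture"));
        IntFunction<String> generic = namer(null);
        System.out.println(named.apply(0));   // mean_radius
        System.out.println(generic.apply(0)); // xgboost_input_0
    }
}
```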

When loading an XGBoost UBJ model, the importer automatically checks for and loads feature names from an optional companion text file. For example, when reading "model.ubj", it looks for "model-features.txt" and uses those names if present. Key features:

- Automatically loads model-features.txt alongside model.ubj
- One feature name per line; # comments and blank lines are supported
- Feature names from the file override any names in the UBJ file
- Graceful fallback to the xgboost_input_X format if the file is missing or invalid
- The no-arg toRankingExpression() automatically uses the loaded names when they are valid

This enables easy customization of feature names without modifying model files, improving the readability of generated ranking expressions.
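The lookup and parsing rules above can be sketched as follows. The file-naming rule (`model.ubj` to `model-features.txt`) and the comment/blank-line handling follow the description; treating only lines that start with `#` as comments, trimming whitespace, and returning an empty `Optional` to trigger the fallback are assumptions of this sketch, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class CompanionFeaturesSketch {
    // "model.ubj" -> "model-features.txt" in the same directory.
    static Path companionPath(Path modelFile) {
        String name = modelFile.getFileName().toString();
        String stem = name.endsWith(".ubj") ? name.substring(0, name.length() - 4) : name;
        return modelFile.resolveSibling(stem + "-features.txt");
    }

    // One name per line; blank lines and lines starting with '#' are skipped.
    // An empty result signals the caller to fall back to xgboost_input_X names.
    static Optional<List<String>> loadFeatureNames(Path modelFile) {
        Path features = companionPath(modelFile);
        if (!Files.isRegularFile(features)) {
            return Optional.empty();
        }
        try {
            List<String> names = new ArrayList<>();
            for (String line : Files.readAllLines(features)) {
                String trimmed = line.strip();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
                names.add(trimmed);
            }
            return names.isEmpty() ? Optional.empty() : Optional.of(names);
        } catch (IOException e) {
            return Optional.empty(); // unreadable file: fall back as if missing
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("xgb");
        Path model = dir.resolve("model.ubj");
        Files.writeString(dir.resolve("model-features.txt"), "# names\nmean_radius\n\nmean_texture\n");
        System.out.println(loadFeatureNames(model).orElseThrow()); // [mean_radius, mean_texture]
    }
}
```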

The importer now extracts and tracks the model's objective function type (e.g., reg:squarederror, binary:logistic) to correctly handle base_score:

- Apply the logit transformation only for logistic objectives
- Use base_score directly for regression objectives
- Use objective-specific defaults (0.5 for logistic, 0.0 for regression)
- Relax feature name validation to require "at least N" names instead of "exactly N"
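The objective-dependent rules listed above can be summarized in a small sketch. The defaults (0.5 and 0.0) and the "at least N" check follow the description; the exact set of objectives treated as logistic is an assumption here, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

public class BaseScoreSketch {
    // Assumption: illustrative set of objectives treated as logistic.
    static final Set<String> LOGISTIC_OBJECTIVES = Set.of("binary:logistic", "reg:logistic");

    static boolean isLogistic(String objective) {
        return LOGISTIC_OBJECTIVES.contains(objective);
    }

    // Margin-space base score: logit for logistic objectives, raw value otherwise.
    // A null baseScore means the model did not specify one, so the objective default applies.
    static double baseMargin(String objective, Double baseScore) {
        if (isLogistic(objective)) {
            double p = (baseScore != null) ? baseScore : 0.5;
            return Math.log(p / (1.0 - p));
        }
        return (baseScore != null) ? baseScore : 0.0;
    }

    // Relaxed validation: the names list may be longer than the model's feature count.
    static boolean namesCoverModel(List<String> names, int numFeatures) {
        return names.size() >= numFeatures;
    }

    public static void main(String[] args) {
        System.out.println(baseMargin("binary:logistic", null));        // 0.0
        System.out.println(baseMargin("reg:squarederror", 2.5));        // 2.5
        System.out.println(namesCoverModel(List.of("a", "b", "c"), 2)); // true
    }
}
```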

arnej27959 added 9 commits December 12, 2025 14:05

Allow ubjson dependency in Maven enforcer configurations

79d595b

Improve XGBoost UBJ import

964646a

Clean up XGBoost feature filename handling

2425760

minimize visibility of ubjson library

92013a9

use standard mechanism for ubjson version management

b55f4f5

bjorncs approved these changes Dec 12, 2025
