arviz-devs · aloctavodia · Dec 20, 2025 · Dec 22, 2025
diff --git a/paper/figures/figure_0.png b/paper/figures/figure_0.png
diff --git a/paper/paper.md b/paper/paper.md
@@ -0,0 +1,117 @@
+---
+title: 'ArviZ: a modular and flexible library for exploratory analysis of Bayesian models'
+tags:
+  - Python
+  - Bayesian statistics
+  - Bayesian workflow
+authors:
+  - name: Osvaldo A Martin
+    orcid: 0000-0001-7419-8978
+    equal-contrib: true
+    corresponding: true
+    affiliation: 1
+  - name: Oriol Abril-Pla
+    orcid: 0000-0002-1847-9481
+    equal-contrib: true
+    corresponding: true 
+    affiliation: 2
+  - name: Jordan Deklerk
+    affiliation: 3
+  - name: Colin Carroll
+    orcid: 0000-0001-6977-0861
+    affiliation: 2
+  - name: Ari Hartikainen
+    orcid: 0000-0002-4569-569X
+    affiliation: 2
+  - name: Aki Vehtari
+    orcid: 0000-0003-2164-9469
+    affiliation: 1
+affiliations:
+ - name:  Aalto University, Espoo, Finland
+   index: 1
+ - name: arviz-devs
+   index: 2
+ - name: DICK's Sporting Goods, Coraopolis, Pennsylvania
+   index: 3
+date: 22 December 2025
+bibliography: references.bib
+---
+
+# Summary
+
+
+`ArviZ` [@Kumar_2019] is a Python package for exploratory analysis of Bayesian models that has been widely used in academia and industry since its introduction in 2019. Its goal is to integrate seamlessly with established probabilistic programming languages and statistical interfaces, such as PyMC [@Abril-pla_2023], Stan (via the cmdstanpy interface) [@stan], Pyro and NumPyro [@Bingham_2019; @Phan_2019], emcee [@emcee], and Bambi [@Capretto_2022], among others.
+
+`ArviZ` is part of the broader ArviZ-project, which develops tools for Exploratory Analysis of Bayesian Models. The organization also maintains other initiatives, including arviz.jl (for Julia), PreliZ [@icazatti_2023], educational resources [@eabm_2025], and additional packages that are still in an experimental phase.
+
+In this work, we present a redesigned version of `ArviZ` that emphasizes greater user control and modularity. This redesign delivers a more flexible and efficient toolkit for exploratory analysis of Bayesian models. With its renewed focus on modularity and usability, `ArviZ` is well-positioned to remain an essential tool for Bayesian modelers in both research and applied settings.
+
+# Statement of need
+
+Probabilistic programming has emerged as a powerful paradigm for statistical modeling, accompanied by a growing ecosystem of tools for model specification and inference. However, effective modeling requires robust support for sampling diagnostics, model comparison, and model checking [@Gelman_2020; @Martin_2024; @Guo_2024]. `ArviZ` addresses this gap by providing a unified, backend-agnostic library to perform these tasks.
+
+The methods implemented in `ArviZ` are grounded in well-established statistical principles and provide robust, interpretable diagnostics and visualizations [@Vehtari_2017; @Gelman_2019; @Paananen_2020; @Vehtari_2021; @Dimitriadis_2021; @Sailynoja_2022; @Kallioinen_2023; @Sailynoja_2025]. The redesigned version furthers these goals by introducing an easier-to-use interface for regular users and more powerful tooling for power users and developers of Bayesian tools. These updates align with recent developments in the probabilistic programming field. Additionally, the new design facilitates the use of components as modular building blocks for custom analyses. This frequent user request was difficult to accommodate under the old framework.
+
+# Description
+
+The redesigned `ArviZ` architecture lets users install and use only the components they need. The previous design divided the package into three submodules; these are now available as three independently installable packages, each with an improved design, as described next.
+
+General functionality, data processing, and data input/output have been streamlined and enhanced for greater versatility. Previously, `ArviZ` used the custom `InferenceData` class to organize and store the high-dimensional outputs of Bayesian inference in a structured, labeled format, enabling efficient analysis, metadata persistence, and serialization. This class has been replaced by the `DataTree` class from xarray [@Hoyer_2017]. Additionally, converters allow more flexibility in the dimensionality, naming, and indexing of their generated outputs.
+
+Statistical functions are now accessible through two distinct interfaces: 
+
+* A low-level array interface with minimal dependencies, intended for advanced users and developers of third-party libraries.
+* A higher-level xarray interface designed for end users, which simplifies usage by automating common tasks and handling metadata. 
+
+Plotting functions have also been redesigned to support modularity at multiple levels:
+
+* At a high level, `ArviZ` offers a collection of “batteries-included” plots. These are built-in plotting functions providing sensible defaults for common tasks like MCMC sampling diagnostics, predictive checks, and model comparison.
+* At an intermediate level, the API enables easier customization of batteries-included plots and simplifies the creation of new plots. This is achieved through the `PlotCollection` class, which enables developers and advanced users to focus solely on the plotting logic, without needing to handle faceting or aesthetics. 
+* At a lower level, we have improved the separation between computational and plotting logic, reducing code duplication and enhancing modular design. These changes also facilitate support for multiple plotting backends, improving extensibility and maintainability. Currently, `ArviZ` supports three plotting backends: matplotlib [@Hunter_2007], Bokeh [@Bokeh_2018], and plotly [@plotly_2015].
+
+
+## Examples
+
+For the first example, we construct an array resembling data from MCMC sampling, with 4 chains and 1000 draws for two posterior variables. We can compute the effective sample size for this array using the array stats interface. To do so, we need to specify which axis represents the chains and which represents the draws.
+
+    import numpy as np
+    from arviz import array_stats
+
+    rng = np.random.default_rng()
+    samples = rng.normal(size=(4, 1000, 2))
+    array_stats.ess(samples, chain_axis=0, draw_axis=1)
+
+We now contrast the array interface with the xarray interface. Here there is no need to specify the chain and draw axes, because this information is already encoded in the `DataTree` object.
+
+    import arviz as az
+    dt_samples = az.convert_to_datatree(samples)
+    az.ess(dt_samples)
+
+The only required argument for batteries-included plots is the input data, typically a `DataTree` (`dt`). In the following example we also apply optional customizations.
+
+    az.style.use('arviz-variat')
+    dt = az.load_arviz_data("centered_eight")
+    pc = az.plot_dist(
+        dt,
+        kind="hist",
+        visuals={"hist":{"alpha": 0.3}},
+        aes={"color": ["school"]}
+    );
+    pc.add_legend("school", loc="outside right upper")
+
+![plot_dist with color mapped to school dimension.](figures/figure_0.png "`plot_dist` is a built-in plot. Here we show an example of further customization. The color is mapped to the school dimension. A neutral color is automatically assigned to the variables without the school dimension (mu and tau). The histograms have been made translucent."){width=4.5in}
+
+We have shown two small examples. For a more comprehensive overview, see the [`ArviZ` documentation](https://python.arviz.org/en/latest/) and the [EABM guide](https://arviz-devs.github.io/EABM/) [@eabm_2025]. These resources include a wide range of examples designed for all types of users, from casual users to advanced analysts and developers looking to use `ArviZ` in their projects or libraries.
+
+
+# Acknowledgements
+
+We thank our fiscal sponsor, NumFOCUS, a nonprofit 501(c)(3) public charity, for their operational and financial support.
+
+This research was supported by:
+
+* The Research Council of Finland Flagship Program "Finnish Center for Artificial Intelligence" (FCAI)
+* Essential Open Source Software Round 4 grant by the Chan Zuckerberg Initiative (CZI)
+* ...
+
+# References
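A side note on the first example in `paper/paper.md`: the reason the array interface asks for `chain_axis` and `draw_axis` can be illustrated with plain NumPy. The sketch below is not ArviZ code. `basic_rhat` is a hypothetical helper that computes the classic (non rank-normalized) potential scale reduction statistic, chosen only because it is short. It assumes a raw array carries no labels, so the caller must say which axis holds chains and which holds draws.

```python
import numpy as np


def basic_rhat(samples, chain_axis=0, draw_axis=1):
    """Classic R-hat for an unlabeled array (hypothetical helper, not ArviZ's implementation)."""
    # Bring the chain and draw axes to the front; remaining axes index variables.
    x = np.moveaxis(np.asarray(samples, dtype=float), (chain_axis, draw_axis), (0, 1))
    n_draws = x.shape[1]
    # Between-chain variance, scaled by the number of draws per chain.
    between = n_draws * x.mean(axis=1).var(axis=0, ddof=1)
    # Mean of the within-chain variances.
    within = x.var(axis=1, ddof=1).mean(axis=0)
    # Pooled variance estimate, then the ratio on the standard-deviation scale.
    var_plus = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(var_plus / within)


rng = np.random.default_rng(0)
samples = rng.normal(size=(4, 1000, 2))  # (chain, draw, variable)
rhat = basic_rhat(samples, chain_axis=0, draw_axis=1)

# The same draws stored as (variable, draw, chain) give the same result,
# provided the axes are declared accordingly.
rhat_t = basic_rhat(samples.transpose(2, 1, 0), chain_axis=2, draw_axis=1)
```

With labeled containers such as a `DataTree`, the dimension names play the role of these two arguments, which is why the second example in the paper needs no axis information.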
diff --git a/paper/references.bib b/paper/references.bib
@@ -0,0 +1,260 @@
+@article{Kumar_2019,
+doi = {10.21105/joss.01143},
+url = {https://doi.org/10.21105/joss.01143},
+year = {2019}, publisher = {The Open Journal},
+volume = {4},
+number = {33},
+pages = {1143},
+author = {Ravin Kumar and Colin Carroll and Ari Hartikainen and Osvaldo Martin},
+title = {ArviZ a unified library for exploratory analysis of Bayesian models in Python},
+journal = {Journal of Open Source Software}
+} 
+
+@article{Abril-pla_2023,
+	title = {{PyMC}: a modern, and comprehensive probabilistic programming framework in {Python}},
+	volume = {9},
+	issn = {2376-5992},
+	shorttitle = {{PyMC}},
+	url = {https://peerj.com/articles/cs-1516},
+	doi = {10.7717/peerj-cs.1516},
+	language = {en},
+	urldate = {2023-10-26},
+	journal = {PeerJ Computer Science},
+	author = {Abril-Pla, Oriol and Andreani, Virgile and Carroll, Colin and Dong, Larry and Fonnesbeck, Christopher J. and Kochurov, Maxim and Kumar, Ravin and Lao, Junpeng and Luhmann, Christian C. and Martin, Osvaldo A. and Osthege, Michael and Vieira, Ricardo and Wiecki, Thomas and Zinkov, Robert},
+	month = sep,
+	year = {2023},
+	note = {Publisher: PeerJ Inc.},
+	pages = {e1516},
+}
+
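+@article{stan,
+  title = {{Stan}: A Probabilistic Programming Language},
+  author = {Carpenter, Bob and Gelman, Andrew and Hoffman, Matthew D. and Lee, Daniel and Goodrich, Ben and Betancourt, Michael and Brubaker, Marcus and Guo, Jiqiang and Li, Peter and Riddell, Allen},
+  journal = {Journal of Statistical Software},
+  volume = {76},
+  number = {1},
+  year = {2017},
+  doi = {10.18637/jss.v076.i01},
+  language = {en-US},
+  keywords = {Bayesian inference,algorithmic differentiation,probabilistic programming,Stan},
+}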
+
+@article{Phan_2019,
+  title={Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro},
+  author={Phan, Du and Pradhan, Neeraj and Jankowiak, Martin},
+  journal={arXiv preprint arXiv:1912.11554},
+  year={2019}
+}
+
+@article{Bingham_2019,
+  author    = {Eli Bingham and
+               Jonathan P. Chen and
+               Martin Jankowiak and
+               Fritz Obermeyer and
+               Neeraj Pradhan and
+               Theofanis Karaletsos and
+               Rohit Singh and
+               Paul A. Szerlip and
+               Paul Horsfall and
+               Noah D. Goodman},
+  title     = {Pyro: Deep Universal Probabilistic Programming},
+  journal   = {J. Mach. Learn. Res.},
+  volume    = {20},
+  pages     = {28:1--28:6},
+  year      = {2019},
+  url       = {http://jmlr.org/papers/v20/18-403.html}
+}
+
+@misc{Gelman_2020,
+      title={Bayesian Workflow}, 
+      author={Andrew Gelman and Aki Vehtari and Daniel Simpson and Charles C. Margossian and Bob Carpenter and Yuling Yao and Lauren Kennedy and Jonah Gabry and Paul-Christian Bürkner and Martin Modrák},
+      year={2020},
+      eprint={2011.01808},
+      archivePrefix={arXiv},
+      doi={10.48550/arXiv.2011.01808},
+      primaryClass={stat.ME}
+}
+
+@article{Sailynoja_2025,
+  title={Recommendations for visual predictive checks in Bayesian workflow},
+  author={S{\"a}ilynoja, Teemu and Johnson, Andrew R and Martin, Osvaldo A and Vehtari, Aki},
+  journal={arXiv:2503.01509},
+  doi={10.48550/arXiv.2503.01509},
+  year={2025},
+}
+
+@article{Vehtari_2017,
+  title={Practical {Bayesian} model evaluation using leave-one-out cross-validation and {WAIC}},
+  author={Vehtari, Aki and Gelman, Andrew and Gabry, Jonah},
+  journal={Stat Comp},
+  doi={10.1007/s11222-016-9696-4},
+  volume={27},
+  pages={1413--1432},
+  year={2017},
+}
+
+@article{Vehtari_2021,
+  title={Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of {MCMC}},
+  author={Vehtari, Aki and Gelman, Andrew and Simpson, Daniel and Carpenter, Bob and B{\"u}rkner, Paul Christian},
+  journal={Bayes Anal},
+  year={2021},
+  volume={16},
+  doi={10.1214/20-BA1221},
+ pages={667--718}
+}
+
+@article{Sailynoja_2022,
+	title = {Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison},
+	volume = {32},
+	issn = {1573-1375},
+	journal = {Stat Comp},
+	author = {Säilynoja, Teemu and Bürkner, Paul Christian and Vehtari, Aki},
+  doi = {10.1007/s11222-022-10090-6},
+	year = {2022}
+}
+
+@article{Kallioinen_2023,
+title = {Detecting and diagnosing prior and likelihood sensitivity with power-scaling},
+author = {Noa Kallioinen and Topi Paananen and Paul Christian Bürkner and Aki Vehtari},
+year = {2023},
+journal = {Stat Comp},
+volume = {34},
+issue = {57},
+doi = {10.1007/s11222-023-10366-5},
+encoding = {UTF-8},
+}
+
+@Article{Paananen_2020,
+  author = 	{Topi Paananen and Juho Piironen and Paul Christian B{\"u}rkner and Aki Vehtari},
+  title = {Implicitly adaptive importance sampling},
+  journal = {arXiv:1906.08850},
+  doi={10.1007/s11222-020-09982-2},
+  year = {2020}
+}
+
+@article{Gelman_2019,
+author = {Andrew Gelman and Ben Goodrich and Jonah Gabry and Aki Vehtari},
+title = {R-squared for {Bayesian} regression models},
+journal = {Am Stat},
+doi={10.1080/00031305.2018.1549100},
+volume = {73},
+number = {3},
+pages = {307--309},
+year  = {2019}
+}
+
+@article{Dimitriadis_2021,
+	title = {Stable reliability diagrams for probabilistic classifiers},
+	volume = {118},
+	issn = {0027-8424, 1091-6490},
+	url = {https://pnas.org/doi/full/10.1073/pnas.2016191118},
+	doi = {10.1073/pnas.2016191118},
+	language = {en},
+	number = {8},
+	urldate = {2023-04-12},
+	journal = {Proceedings of the National Academy of Sciences},
+	author = {Dimitriadis, Timo and Gneiting, Tilmann and Jordan, Alexander I.},
+	month = feb,
+	year = {2021},
+	pages = {e2016191118},
+}
+
+@article{Capretto_2022,
+ title={Bambi: A Simple Interface for Fitting Bayesian Linear Models in Python},
+ volume={103},
+ number={15},
+ journal={Journal of Statistical Software},
+ author={Capretto, Tomás and Piho, Camen and Kumar, Ravin and Westfall, Jacob and Yarkoni, Tal and Martin, Osvaldo A},
+ year={2022},
+ pages={1--29}
+}
+
+@article{Hoyer_2017,
+  title     = {xarray: {N-D} labeled arrays and datasets in {Python}},
+  author    = {Hoyer, S. and Hamman, J.},
+  journal   = {Journal of Open Research Software},
+  volume    = {5},
+  number    = {1},
+  year      = {2017},
+  publisher = {Ubiquity Press},
+  doi       = {10.5334/jors.148},
+  url       = {https://doi.org/10.5334/jors.148}
+}
+
+@article{Hunter_2007,
+  Author    = {Hunter, J. D.},
+  Title     = {Matplotlib: A 2D graphics environment},
+  Journal   = {Computing in Science \& Engineering},
+  Volume    = {9},
+  Number    = {3},
+  Pages     = {90--95},
+  abstract  = {Matplotlib is a 2D graphics package used for Python for
+  application development, interactive scripting, and publication-quality
+  image generation across user interfaces and operating systems.},
+  publisher = {IEEE COMPUTER SOC},
+  doi       = {10.1109/MCSE.2007.55},
+  year      = 2007
+}
+
+@manual{Bokeh_2018,
+title = {Bokeh: Python library for interactive visualization},
+author = {{Bokeh Development Team}},
+year = {2018},
+url = {https://bokeh.pydata.org/en/latest/},
+}
+
+@online{plotly_2015,
+author = {Plotly Technologies Inc.},
+title = {Collaborative data science},
+publisher = {Plotly Technologies Inc.},
+address = {Montreal, QC},
+year = {2015},
+url = {https://plot.ly},
+}
+
+@misc{Guo_2024,
+      title={VMC: A Grammar for Visualizing Statistical Model Checks}, 
+      author={Ziyang Guo and Alex Kale and Matthew Kay and Jessica Hullman},
+      year={2024},
+      eprint={2408.16702},
+      archivePrefix={arXiv},
+      primaryClass={cs.HC},
+      url={https://arxiv.org/abs/2408.16702}, 
+}
+
+@book{Martin_2024,
+    title = {Bayesian {Analysis} with {Python}: {A} {Practical} {Guide} to probabilistic modeling, 3rd {Edition}},
+    isbn = {978-1-80512-716-1},
+    shorttitle = {Bayesian {Analysis} with {Python}},
+    language = {English},
+    publisher = {Packt Publishing},
+    author = {Martin, Osvaldo A},
+    month = feb,
+    year = {2024},
+}
+
+
+@article{emcee,
+  doi = {10.21105/joss.01864},
+  url = {https://doi.org/10.21105/joss.01864},
+  year = {2019},
+  publisher = {The Open Journal},
+  volume = {4},
+  number = {43},
+  pages = {1864},
+  author = {Daniel Foreman-Mackey and Will M. Farr and Manodeep Sinha and Anne M. Archibald and David W. Hogg and Jeremy S. Sanders and Joe Zuntz and Peter K. G. Williams and Andrew R. J. Nelson and Miguel de Val-Borro and Tobias Erhardt and Ilya Pashchenko and Oriol Abril Pla},
+  title = {emcee v3: A Python ensemble sampling toolkit for affine-invariant MCMC},
+  journal = {Journal of Open Source Software}
+}
+
+
+@book{eabm_2025,
+  author       = {Osvaldo A Martin and Oriol Abril-Pla and Jordan Deklerk},
+  title        = {Exploratory analysis of Bayesian models},
+  month        = nov,
+  year         = 2025,
+  publisher    = {Zenodo},
+  version      = {v0.3.0},
+  doi          = {10.5281/zenodo.15127548},
+  url          = {https://doi.org/10.5281/zenodo.15127548}
+}
+
+
+
+@article{icazatti_2023,
+author = {Icazatti, Alejandro and Abril-Pla, Oriol and Klami, Arto and Martin, Osvaldo A},
+doi = {10.21105/joss.05499},
+journal = {Journal of Open Source Software},
+month = sep,
+number = {89},
+pages = {5499},
+title = {{PreliZ: A tool-box for prior elicitation}},
+url = {https://joss.theoj.org/papers/10.21105/joss.05499},
+volume = {8},
+year = {2023}
+}