pyGAM

Generalized Additive Models in Python.

🚀 Version 0.10.1 out now! See release notes here.

pyGAM is a package for building Generalized Additive Models in Python, with an emphasis on modularity and performance.

The API is designed for users of scikit-learn or scipy.

Documentation · Tutorials · Medium article
Open Source Apache 2.0 GC.OS Sponsored

Documentation

Installation

pip install pygam

scikit-sparse

To speed up optimization on large models with constraints, it helps to have scikit-sparse installed because it contains a slightly faster, sparse version of Cholesky factorization. The import from scikit-sparse references nose, so you'll need that too.

The easiest way is to use Conda:

conda install -c conda-forge scikit-sparse nose

scikit-sparse project
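
Since scikit-sparse is an optional speed-up, it can be handy to confirm that it is visible from the environment you run pyGAM in. The snippet below is a minimal sketch: `has_scikit_sparse` is a hypothetical helper name, not part of pyGAM, and it assumes only that scikit-sparse is imported under the package name `sksparse`.

```python
import importlib.util


def has_scikit_sparse() -> bool:
    """Return True if scikit-sparse (import name: sksparse) can be found.

    Hypothetical helper for illustration; it only looks the package up
    and does not import it or test the Cholesky routines.
    """
    return importlib.util.find_spec("sksparse") is not None


if __name__ == "__main__":
    if has_scikit_sparse():
        print("scikit-sparse found")
    else:
        print("scikit-sparse not found; see the conda command above")
```

This only reports whether the package is installed. It does not say anything about whether a given model will actually use the sparse factorization.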

Contributing - HELP REQUESTED

Contributions are most welcome!

You can help pyGAM in many ways including:

  • Working on a known bug.
  • Trying it out and reporting bugs or what was difficult.
  • Helping improve the documentation.
  • Writing new distributions and link functions.
  • If you need some ideas, please take a look at the issues.

To start:

  • fork the project and cut a new branch
  • install pygam, editable with developer dependencies (in a new python environment)
pip install --upgrade pip
pip install -e ".[dev]"

Make some changes and write a test...

  • Test your contribution (e.g. from the .../pyGAM directory): py.test -s
  • When you are happy with your changes, make a pull request into the master branch of the main project.

About

Generalized Additive Models (GAMs) are smooth semi-parametric models of the form:

$$g\left(\mathbb{E}[y|X]\right)=\beta_0+f_1(X_1)+f_2(X_2)+\dots+f_p(X_p)$$

where $X = [X_1, X_2, \dots, X_p]$ are independent variables, $y$ is the dependent variable, and $g$ is a link function that relates the predictor variables to the expected value of the dependent variable.

The feature functions $f_i$ are built using penalized B-splines, which allow us to automatically model non-linear relationships without having to manually try out many different transformations on each variable.
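
To make the idea of a penalized B-spline concrete, here is a small pure-Python sketch in the spirit of the P-splines of Eilers & Marx (reference 4). It is an illustration under stated assumptions, not pyGAM's internal implementation: the function names (`uniform_knots`, `bspline_basis`, `difference_penalty`) are hypothetical, the knots are assumed to be equally spaced, and the penalty is assumed to be the sum of squared second differences of the coefficients.

```python
def uniform_knots(lo, hi, n_segments, degree):
    """Equally spaced knots that extend `degree` steps beyond [lo, hi]."""
    h = (hi - lo) / n_segments
    return [lo + (i - degree) * h for i in range(n_segments + 2 * degree + 1)]


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion)."""
    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [
        1.0 if knots[i] <= x < knots[i + 1] else 0.0
        for i in range(len(knots) - 1)
    ]
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        new_basis = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (
                (knots[i + d + 1] - x) / right_den * basis[i + 1]
                if right_den > 0
                else 0.0
            )
            new_basis.append(left + right)
        basis = new_basis
    return basis


def difference_penalty(coefs, order=2):
    """Sum of squared `order`-th differences of the spline coefficients."""
    diffs = list(coefs)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return sum(d * d for d in diffs)


if __name__ == "__main__":
    knots = uniform_knots(0.0, 1.0, n_segments=5, degree=3)
    values = bspline_basis(0.5, knots, degree=3)
    # The feature function is a weighted sum of these basis values.
    print([round(v, 4) for v in values])
    # Smooth (linear) coefficients are not penalized; wiggly ones are.
    print(difference_penalty([0, 1, 2, 3, 4]))
    print(difference_penalty([0, 1, 0, 1, 0]))
```

A feature function is then $f(x) = \sum_j \beta_j B_j(x)$, and fitting trades off goodness of fit against a smoothing weight times the penalty on $\beta$. A larger weight pushes the coefficients toward a smoother curve.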

GAMs extend generalized linear models by allowing non-linear functions of features while maintaining additivity.

Since GAMs are additive, it is easy to examine the effect of each $X_i$ on $y$ individually while holding all other predictors constant.
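
The additivity argument can be checked with a toy model. The sketch below assumes two hand-written feature functions (`f1`, `f2`) as stand-ins for fitted splines, an intercept `BETA_0`, and a logit link. All of these names and values are hypothetical and chosen only for illustration.

```python
import math

BETA_0 = 1.0  # hypothetical intercept


def f1(x):
    """Hypothetical smooth effect of the first feature."""
    return math.sin(x)


def f2(x):
    """Hypothetical smooth effect of the second feature."""
    return 0.5 * x ** 2


def linear_predictor(x1, x2):
    """g(E[y|X]) = beta_0 + f_1(X_1) + f_2(X_2)."""
    return BETA_0 + f1(x1) + f2(x2)


def predict_mean(x1, x2):
    """E[y|X], obtained by applying the inverse logit link."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x1, x2)))


def effect_of_x1(a, b, x2):
    """Change in the linear predictor when X_1 moves from a to b."""
    return linear_predictor(b, x2) - linear_predictor(a, x2)


if __name__ == "__main__":
    # The effect of X_1 on the link scale does not depend on X_2.
    print(effect_of_x1(0.0, 1.0, x2=0.0))
    print(effect_of_x1(0.0, 1.0, x2=3.0))
    print(predict_mean(0.5, 1.0))
```

Note that the effect is constant on the scale of the link function. With a non-identity link such as the logit, the change in the expected value itself still depends on where the other predictors are held.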

As a result, GAMs are a very flexible and interpretable class of models that also make it easy to incorporate prior knowledge and control overfitting.

Citing pyGAM

Please consider citing pyGAM if it has helped you in your research or work:

Daniel Servén, & Charlie Brummitt. (2018, March 27). pyGAM: Generalized Additive Models in Python. Zenodo. DOI: 10.5281/zenodo.1208723

BibTeX:

@misc{daniel_serven_2018_1208723,
  author       = {Daniel Servén and
                  Charlie Brummitt},
  title        = {pyGAM: Generalized Additive Models in Python},
  month        = mar,
  year         = 2018,
  doi          = {10.5281/zenodo.1208723},
  url          = {https://doi.org/10.5281/zenodo.1208723}
}

References

  1. Simon N. Wood, 2006 Generalized Additive Models: an introduction with R

  2. Hastie, Tibshirani, Friedman The Elements of Statistical Learning https://www.sas.upenn.edu/~fdiebold/NoHesitations/BookAdvanced.pdf

  3. James, Witten, Hastie and Tibshirani An Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf

  4. Paul Eilers & Brian Marx, 1996 Flexible Smoothing with B-splines and Penalties https://sites.stat.washington.edu/courses/stat527/s14/readings/EilersMarx_StatSci_1996.pdf

  5. Kim Larsen, 2015 GAM: The Predictive Modeling Silver Bullet http://multithreaded.stitchfix.com/assets/files/gam.pdf

  6. Deva Ramanan, 2008 UCI Machine Learning: Notes on IRLS http://www.ics.uci.edu/~dramanan/teaching/ics273a_winter08/homework/irls_notes.pdf

  7. Paul Eilers & Brian Marx, 2015 International Biometric Society: A Crash Course on P-splines

  8. Keiding, Niels, 1991 Age-specific incidence and prevalence: a statistical perspective https://academic.oup.com/jrsssa/article-abstract/154/3/371/7106499