pyrxiv is a Python package for retrieving arXiv papers, storing their metadata in pydantic-like classes, and optionally keeping only those papers whose content matches a regex pattern.
While originally developed for the Strongly Correlated Electron Systems community in Condensed Matter Physics (cond-mat.str-el), it's designed to be flexible and applicable to any arXiv category.
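As a rough illustration of what storing paper metadata in a typed class can look like, here is a minimal sketch using the standard-library dataclasses module. The class name PaperMetadata, its fields, and the placeholder values are all hypothetical; they are not taken from pyrxiv, whose actual classes may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PaperMetadata:
    """Hypothetical container for one paper's metadata (not pyrxiv's real class)."""

    arxiv_id: str
    title: str
    authors: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)

    def in_category(self, category: str) -> bool:
        """Return True if the paper is listed under the given arXiv category."""
        return category in self.categories


# Placeholder values, for illustration only.
paper = PaperMetadata(
    arxiv_id="0000.00000",
    title="Placeholder title",
    authors=["A. Author"],
    categories=["cond-mat.str-el"],
)
print(paper.in_category("cond-mat.str-el"))
```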
Install the core package:
pip install pyrxiv
pyrxiv's main objective is to provide an easy command line interface (CLI) to search for and download arXiv papers whose content matches a regex pattern. After installing the package, you can print the available CLI options with:
pyrxiv --help
or directly:
pyrxiv search_and_download --help
For example:
pyrxiv search_and_download --category cond-mat.str-el --regex-pattern "DMFT|Hubbard" --n-papers 5
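To illustrate what a pattern such as "DMFT|Hubbard" selects, here is a minimal sketch using only Python's standard re module. It is not pyrxiv's actual implementation: the function name filter_texts and the sample strings are hypothetical, and the sketch assumes case-sensitive matching, which pyrxiv may or may not use.

```python
import re


def filter_texts(texts: list[str], pattern: str) -> list[str]:
    """Return the texts in which `pattern` matches at least once (hypothetical helper)."""
    compiled = re.compile(pattern)
    # re.search scans the whole string, so the pattern may match anywhere in the text.
    return [text for text in texts if compiled.search(text)]


# Made-up sample strings standing in for paper content.
samples = [
    "We solve the Hubbard model on a square lattice.",
    "DMFT calculations reveal a Mott transition.",
    "A study of classical spin ice.",
]

matches = filter_texts(samples, "DMFT|Hubbard")
for text in matches:
    print(text)
```

The "|" in the pattern is regex alternation, so a text is kept if it contains either term. Quoting the pattern on the command line, as in the example above, keeps the shell from interpreting "|" as a pipe.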
To contribute to pyrxiv or run it locally, follow these steps:
git clone https://github.com/JosePizarro3/pyrxiv.git
cd pyrxiv
Create and activate a virtual environment (we recommend Python ≥ 3.10):
python3 -m venv .venv
source .venv/bin/activate
Use uv (faster than pip) to install the package in editable mode with the dev extras:
pip install --upgrade pip
pip install uv
uv pip install -e '.[dev]'
Use pytest with verbosity to run all tests:
python -m pytest -sv tests
To check code coverage:
python -m pytest --cov=pyrxiv tests
We use Ruff for formatting and linting (configured via pyproject.toml).
Check linting issues:
ruff check .
Auto-format code:
ruff format .
Manually fix anything Ruff cannot handle automatically.