Python implementation of the IRanges Bioconductor package.
To get started, install the package from PyPI:
pip install iranges
# To install optional dependencies
pip install iranges[optional]
An IRanges holds a start position and a width, and is most typically used to represent coordinates along some genomic sequence. The interpretation of the start position depends on the application; for sequences, the start is usually a 1-based position, but other use cases may allow zero or even negative values.
Note: ends are expected to be inclusive, to be consistent with Bioconductor representations. If they are not, we recommend subtracting 1 from the ends.
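As a minimal sketch of the note above, the snippet below converts half-open `[start, end)` intervals into the start/width form that `IRanges` expects, along with the matching inclusive ends. The helper name `half_open_to_start_width` is hypothetical and not part of the package; it uses plain Python lists only.

```python
def half_open_to_start_width(starts, ends_exclusive):
    """Convert half-open [start, end) intervals to starts, widths and inclusive ends.

    Hypothetical helper for illustration; not part of the iranges package.
    """
    # The width of a half-open interval is simply end - start.
    widths = [e - s for s, e in zip(starts, ends_exclusive)]
    # Subtracting 1 turns an exclusive end into an inclusive one.
    ends_inclusive = [e - 1 for e in ends_exclusive]
    return list(starts), widths, ends_inclusive


starts, widths, ends = half_open_to_start_width([1, 2, 3], [5, 7, 9])
print(widths)  # [4, 5, 6]
print(ends)    # [4, 6, 8]
```

The resulting `starts` and `widths` could then be passed to `IRanges(starts, widths)`; with inclusive ends, each end equals `start + width - 1`.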
from iranges import IRanges
starts = [1, 2, 3, 4]
widths = [4, 5, 6, 7]
x = IRanges(starts, widths)
print(x)
## output
IRanges object with 4 ranges and 0 metadata columns
start end width
<ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0] 1 4 4
[1] 2 6 5
[2] 3 8 6
[3] 4 10 7
IRanges supports most interval-based operations. For example, to compute gaps:
x = IRanges([-2, 6, 9, -4, 1, 0, -6, 10], [5, 0, 6, 1, 4, 3, 2, 3])
gaps = x.gaps()
print(gaps)
## output
IRanges object with 2 ranges and 0 metadata columns
start end width
<ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0] -3 -3 1
[1] 5 8 4
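To make the gaps result easier to follow, here is a rough pure-Python sketch of the idea: sort the ranges, sweep across them while tracking the furthest covered position, and report each uncovered stretch in between. It is only an illustration, not the package's implementation; the function name `gaps_sketch` is hypothetical, and dropping zero-width ranges is an assumption made here because it agrees with the output shown above.

```python
def gaps_sketch(starts, widths):
    """Return (start, width) pairs for uncovered stretches between ranges.

    Hypothetical illustration; assumes inclusive ends (end = start + width - 1).
    """
    # Build inclusive (start, end) pairs, skipping zero-width ranges (assumption).
    intervals = sorted((s, s + w - 1) for s, w in zip(starts, widths) if w > 0)
    result = []
    covered_end = None  # furthest position covered so far
    for start, end in intervals:
        if covered_end is not None and start > covered_end + 1:
            # Positions covered_end + 1 .. start - 1 are not covered by any range.
            result.append((covered_end + 1, start - covered_end - 1))
        covered_end = end if covered_end is None else max(covered_end, end)
    return result


print(gaps_sketch([-2, 6, 9, -4, 1, 0, -6, 10], [5, 0, 6, 1, 4, 3, 2, 3]))
# [(-3, 1), (5, 4)]
```

The two pairs correspond to the start and width columns of the two ranges reported by `x.gaps()`.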
Or perform interval set operations:
x = IRanges([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = IRanges([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])
intersection = x.intersect(y)
print(intersection)
## output
IRanges object with 3 ranges and 0 metadata columns
start end width
<ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0] -2 2 5
[1] 6 8 3
[2] 14 17 4
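The intersection can be understood as a two-step process: merge each set of ranges into disjoint intervals, then walk both merged lists in parallel and keep the parts they share. The sketch below shows this in plain Python under the inclusive-end convention. The names `merge_sketch` and `intersect_sketch` are hypothetical, and this is not necessarily how the package computes the result.

```python
def merge_sketch(starts, widths):
    """Merge ranges into sorted, disjoint inclusive [start, end] intervals."""
    intervals = sorted((s, s + w - 1) for s, w in zip(starts, widths) if w > 0)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous interval.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def intersect_sketch(x, y):
    """Return (start, width) pairs covered by both x and y.

    x and y are (starts, widths) tuples; hypothetical illustration only.
    """
    a, b = merge_sketch(*x), merge_sketch(*y)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi - lo + 1))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


x = ([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = ([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])
print(intersect_sketch(x, y))
# [(-2, 5), (6, 3), (14, 4)]
```

Each pair is a (start, width) combination that matches the start and width columns of the `x.intersect(y)` output.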
IRanges uses LTLA/nclist-cpp under the hood to perform fast overlap and search-based operations. These methods typically return a hits-like BiocFrame.
subject = IRanges([2, 2, 10], [1, 2, 3])
query = IRanges([1, 4, 9], [5, 4, 2])
overlap = subject.find_overlaps(query)
print(overlap)
## output
BiocFrame with 3 rows and 2 columns
self_hits query_hits
<ndarray[int64]> <ndarray[int64]>
[0] 1 0
[1] 0 0
[2] 2 2
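For readers who want to check the hits by hand, the following brute-force sketch tests every pair of ranges for a shared position. It is deliberately not the nested containment list approach used by nclist-cpp, only a quadratic-time illustration of what counts as an overlap with inclusive ends. The function name `find_overlaps_naive` is hypothetical, and positive widths are assumed.

```python
def find_overlaps_naive(self_ranges, query_ranges):
    """Return sorted (self_index, query_index) pairs of overlapping ranges.

    Both arguments are (starts, widths) tuples. Hypothetical illustration;
    assumes positive widths and inclusive ends.
    """
    self_starts, self_widths = self_ranges
    query_starts, query_widths = query_ranges
    hits = []
    for qi, (qs, qw) in enumerate(zip(query_starts, query_widths)):
        q_end = qs + qw - 1
        for si, (ss, sw) in enumerate(zip(self_starts, self_widths)):
            s_end = ss + sw - 1
            # Two inclusive ranges overlap when each starts no later than the other ends.
            if qs <= s_end and ss <= q_end:
                hits.append((si, qi))
    return sorted(hits)


subject = ([2, 2, 10], [1, 2, 3])
query = ([1, 4, 9], [5, 4, 2])
print(find_overlaps_naive(subject, query))
# [(0, 0), (1, 0), (2, 2)]
```

These are the same (self, query) pairs as in the BiocFrame above, although the row order returned by the library may differ.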
Similarly, one can perform search operations such as follow, precede, or nearest:
query = IRanges([1, 3, 9], [2, 5, 2])
subject = IRanges([3, 5, 12], [1, 2, 1])
nearest = subject.nearest(query, select="all")
print(nearest)
## output
BiocFrame with 4 rows and 2 columns
query_hits self_hits
<ndarray[int64]> <ndarray[int64]>
[0] 0 0
[1] 0 1
[2] 1 1
[3] 2 2
This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.