
BiocPy/IRanges



Integer ranges in Python

Python implementation of the IRanges Bioconductor package.

To get started, install the package from PyPI:

pip install iranges

# To install optional dependencies
pip install "iranges[optional]"

IRanges

An IRanges object holds a start position and a width for each range, and is most typically used to represent coordinates along a genomic sequence. The interpretation of the start position depends on the application: for sequences, the start is usually a 1-based position, but other use cases may allow zero or even negative values.

Note

Ends are expected to be inclusive, consistent with Bioconductor's representation. If your ends are exclusive, subtract 1 from them before use.

from iranges import IRanges

starts = [1, 2, 3, 4]
widths = [4, 5, 6, 7]
x = IRanges(starts, widths)

print(x)
 ## output
 IRanges object with 4 ranges and 0 metadata columns
                start              end            width
      <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
 [0]                1                4                4
 [1]                2                6                5
 [2]                3                8                6
 [3]                4               10                7
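With inclusive ends, the end column follows from end = start + width - 1, and a half-open (exclusive) end becomes inclusive by subtracting 1, as the note above recommends. The plain-Python sketch below just checks that arithmetic; `inclusive_end` and `from_half_open` are hypothetical helper names, not part of the iranges API.

```python
from typing import Tuple

# Hypothetical helpers (not part of the iranges API) showing the
# start/width/end arithmetic for inclusive ends.


def inclusive_end(start: int, width: int) -> int:
    """Return the last position covered by a range of `width` positions."""
    return start + width - 1


def from_half_open(start: int, end_exclusive: int) -> Tuple[int, int]:
    """Convert a half-open [start, end) interval to (start, width).

    Subtracting 1 gives the inclusive end, so
    width = (end_exclusive - 1) - start + 1 = end_exclusive - start.
    """
    end_inclusive = end_exclusive - 1
    return start, end_inclusive - start + 1


starts = [1, 2, 3, 4]
widths = [4, 5, 6, 7]
ends = [inclusive_end(s, w) for s, w in zip(starts, widths)]
print(ends)  # [4, 6, 8, 10]

# A half-open interval [10, 15) covers positions 10..14: start 10, width 5.
print(from_half_open(10, 15))  # (10, 5)
```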

Interval Operations

IRanges supports most interval-based operations. For example, to compute gaps:

x = IRanges([-2, 6, 9, -4, 1, 0, -6, 10], [5, 0, 6, 1, 4, 3, 2, 3])

gaps = x.gaps()
print(gaps)
 ## output
 IRanges object with 2 ranges and 0 metadata columns
                start              end            width
      <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
 [0]               -3               -3                1
 [1]                5                8                4
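To make the semantics concrete, here is a plain-Python sketch of what gaps computes, assuming it mirrors Bioconductor: empty ranges are ignored, overlapping or adjacent ranges are merged, and the uncovered stretches between the merged blocks are returned. `reduce_ranges` and `gaps` are hypothetical illustrations rather than the package's implementation, and ranges are passed as parallel lists of starts and widths.

```python
# Plain-Python sketch of gaps() semantics (hypothetical helpers, not the
# iranges implementation). Ranges are parallel (starts, widths) lists
# with inclusive ends.


def reduce_ranges(starts, widths):
    """Merge overlapping or adjacent non-empty ranges.

    Returns a sorted list of (start, inclusive_end) tuples.
    """
    spans = sorted((s, s + w - 1) for s, w in zip(starts, widths) if w > 0)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gaps(starts, widths):
    """Return the uncovered stretches between merged blocks as (start, width)."""
    merged = reduce_ranges(starts, widths)
    result = []
    for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
        result.append((prev_end + 1, next_start - prev_end - 1))
    return result


print(gaps([-2, 6, 9, -4, 1, 0, -6, 10], [5, 0, 6, 1, 4, 3, 2, 3]))
# [(-3, 1), (5, 4)]
```

Merging adjacent blocks first guarantees that every reported gap has a width of at least 1.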

Or perform interval set operations such as intersection:

x = IRanges([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = IRanges([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])

intersection = x.intersect(y)
print(intersection)
 ## output
 IRanges object with 3 ranges and 0 metadata columns
                start              end            width
      <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
 [0]               -2                2                5
 [1]                6                8                3
 [2]               14               17                4
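Intersection can be sketched the same way, assuming it operates on the merged (reduced) form of each input: reduce both sets, then sweep through them with two pointers and keep every non-empty overlap. The helper names are hypothetical; this illustrates the idea and is not the library's code.

```python
# Plain-Python sketch of interval intersection (hypothetical helpers, not the
# iranges implementation). Ranges are parallel (starts, widths) lists.


def reduce_ranges(starts, widths):
    """Merge overlapping or adjacent non-empty ranges into sorted (start, end) pairs."""
    spans = sorted((s, s + w - 1) for s, w in zip(starts, widths) if w > 0)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(x, y):
    """Return the positions covered by both x and y as (start, width) pairs."""
    a, b = reduce_ranges(*x), reduce_ranges(*y)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi - lo + 1))
        # Advance whichever block ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


x = ([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = ([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])
print(intersect(x, y))
# [(-2, 5), (6, 3), (14, 4)]
```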

Overlap operations

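IRanges uses LTLA/nclist-cpp under the hood to perform fast overlap and search operations. These methods typically return a hits-like BiocFrame: self_hits holds indices into the object the method is called on, and query_hits holds indices into the query.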

subject = IRanges([2, 2, 10], [1, 2, 3])
query = IRanges([1, 4, 9], [5, 4, 2])

overlap = subject.find_overlaps(query)
print(overlap)
 ## output
 BiocFrame with 3 rows and 2 columns
           self_hits       query_hits
      <ndarray[int64]> <ndarray[int64]>
 [0]                1                0
 [1]                0                0
 [2]                2                2
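nclist-cpp is built on a nested containment list, which avoids scanning every range for each query. For checking small examples by hand, a brute-force O(n*m) comparison yields the same set of hits, though not necessarily in the same row order. The sketch below is such a naive stand-in, not the nclist algorithm; `find_overlaps_naive` is a hypothetical name, and skipping zero-width ranges is an assumption of this sketch.

```python
# Brute-force overlap search: an illustrative stand-in for the nclist-based
# find_overlaps, NOT the package's algorithm. Ranges are given as parallel
# (starts, widths) lists with inclusive ends.


def find_overlaps_naive(subject, query):
    """Return (self_hit, query_hit) index pairs for overlapping ranges.

    Zero-width ranges are skipped (an assumption made for this sketch).
    """
    s_starts, s_widths = subject
    q_starts, q_widths = query
    hits = []
    for qi, (qs, qw) in enumerate(zip(q_starts, q_widths)):
        if qw == 0:
            continue
        q_end = qs + qw - 1
        for si, (ss, sw) in enumerate(zip(s_starts, s_widths)):
            if sw == 0:
                continue
            s_end = ss + sw - 1
            # Two inclusive ranges overlap when each starts before the other ends.
            if ss <= q_end and qs <= s_end:
                hits.append((si, qi))
    return hits


subject = ([2, 2, 10], [1, 2, 3])
query = ([1, 4, 9], [5, 4, 2])
print(find_overlaps_naive(subject, query))
# [(0, 0), (1, 0), (2, 2)]
```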

Similarly, one can perform search operations such as follow, precede, or nearest.

query = IRanges([1, 3, 9], [2, 5, 2])
subject = IRanges([3, 5, 12], [1, 2, 1])

nearest = subject.nearest(query, select="all")
print(nearest)
 ## output
 BiocFrame with 4 rows and 2 columns
           query_hits        self_hits
      <ndarray[int64]> <ndarray[int64]>
 [0]                0                0
 [1]                1                0
 [2]                1                1
 [3]                2                2
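In Bioconductor, nearest first prefers subject ranges that overlap the query; if none do, it picks the subject ranges at the smallest distance, where adjacent ranges have distance 0. Assuming the Python package follows the same rule, the brute-force sketch below implements it with select="all" semantics (all ties kept). `nearest_all` and `distance` are hypothetical helpers, not the package's API.

```python
# Brute-force sketch of nearest(select="all") under Bioconductor's rule
# (an assumption about the Python package, not its implementation).
# Ranges are parallel (starts, widths) lists with inclusive ends.


def distance(a, b):
    """Positions strictly between two (start, end) ranges; 0 if they overlap or touch."""
    return max(max(a[0], b[0]) - min(a[1], b[1]) - 1, 0)


def nearest_all(subject, query):
    """Return (query_hit, self_hit) pairs, keeping all ties."""
    s_ranges = [(s, s + w - 1) for s, w in zip(*subject)]
    q_ranges = [(s, s + w - 1) for s, w in zip(*query)]
    hits = []
    if not s_ranges:
        return hits
    for qi, q in enumerate(q_ranges):
        # Step 1: overlapping subject ranges take priority.
        candidates = [
            si for si, s in enumerate(s_ranges) if s[0] <= q[1] and q[0] <= s[1]
        ]
        # Step 2: otherwise keep every subject range at the minimum distance.
        if not candidates:
            dists = [distance(q, s) for s in s_ranges]
            best = min(dists)
            candidates = [si for si, d in enumerate(dists) if d == best]
        hits.extend((qi, si) for si in candidates)
    return hits


query = ([1, 3, 9], [2, 5, 2])
subject = ([3, 5, 12], [1, 2, 1])
print(nearest_all(subject, query))
# [(0, 0), (1, 0), (1, 1), (2, 2)]
```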

Further Information

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold, see https://pyscaffold.org/.
