RXN utilities package
This repository contains general Python utilities commonly used in the RXN universe.
For utilities related to chemistry, see our other repository rxn-chemutils.
System Requirements
This package is supported on all operating systems. It has been tested on the following systems:
macOS: Big Sur (11.1)
Linux: Ubuntu 18.04.4
A Python version of 3.6 or greater is recommended.
Installation guide
The package can be installed from PyPI:
pip install rxn-utils
For local development, the package can be installed with:
pip install -e ".[dev]"
Package highlights
Stable shuffling
For reproducible shuffling, or for shuffling two files of identical length so that the same permutation is obtained, one can use the stable_shuffle function.
The executable rxn-stable-shuffle is also provided for this purpose.
Both also work with CSV files if the appropriate flag is provided.
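The idea behind applying the same permutation to two inputs can be sketched with the standard library alone. This is only an illustration of the principle, not the implementation of stable_shuffle: the helper name seeded_shuffle and the sample data are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def seeded_shuffle(items: Sequence[T], seed: int) -> List[T]:
    """Return a shuffled copy of items (hypothetical helper).

    The resulting order depends only on len(items) and the seed.
    """
    shuffled = list(items)
    # A private Random instance leaves the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    return shuffled


sources = ["src 1", "src 2", "src 3", "src 4", "src 5"]
targets = ["tgt 1", "tgt 2", "tgt 3", "tgt 4", "tgt 5"]

shuffled_sources = seeded_shuffle(sources, seed=42)
shuffled_targets = seeded_shuffle(targets, seed=42)

# Entries that were paired before shuffling are still paired afterwards.
pairs_aligned = all(
    s.split()[1] == t.split()[1]
    for s, t in zip(shuffled_sources, shuffled_targets)
)
print(pairs_aligned)
```

Because the permutation depends only on the seed and the number of items, two inputs of identical length stay aligned line by line.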
chunker and remove_duplicates
For batching an iterable into lists of a specified size, chunker comes in handy, and it does so in a memory-efficient way.
>>> from rxn.utilities.containers import chunker
>>> for chunk in chunker(range(1, 10), chunk_size=4):
...     print(chunk)
[1, 2, 3, 4]
[5, 6, 7, 8]
[9]
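One way such memory-efficient batching could work is to pull items from an iterator one chunk at a time. The sketch below uses a hypothetical helper named chunked and assumes nothing about how chunker is implemented in the package.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(iterable: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Lazily yield lists of at most chunk_size items (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice consumes only the next chunk_size items from the iterator.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


chunks = list(chunked(range(1, 10), chunk_size=4))
print(chunks)
```

Only one chunk is materialized at a time, so the input can be a generator or a file that is too large to load at once.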
remove_duplicates (or iterate_unique_values, its memory-efficient variant) removes duplicates from a container, possibly based on a callable instead of the values:
>>> from rxn.utilities.containers import remove_duplicates
>>> remove_duplicates([3, 6, 9, 2, 3, 1, 9])
[3, 6, 9, 2, 1]
>>> remove_duplicates(["ab", "cd", "efg", "hijk", "", "lmn"], key=lambda x: len(x))
['ab', 'efg', 'hijk', '']
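A lazy variant of this deduplication can be sketched as a generator that remembers which keys it has already seen. The function unique_values below is a hypothetical stand-in written for illustration; the signature of iterate_unique_values itself is not shown here and may differ.

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional, Set, TypeVar

T = TypeVar("T")


def unique_values(
    values: Iterable[T], key: Optional[Callable[[T], Hashable]] = None
) -> Iterator[T]:
    """Yield each value whose key has not been seen yet (hypothetical helper)."""
    seen: Set[Hashable] = set()
    for value in values:
        # Without a key function, the value itself decides uniqueness.
        marker = value if key is None else key(value)
        if marker not in seen:
            seen.add(marker)
            yield value


by_value = list(unique_values([3, 6, 9, 2, 3, 1, 9]))
by_length = list(unique_values(["ab", "cd", "efg", "hijk", "", "lmn"], key=len))
print(by_value)
print(by_length)
```

Only the set of seen keys is kept in memory, and the first occurrence of each key is the one that is yielded.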
Regex utilities
regex.py provides a few functions that make it easier to build regex strings (considering whether segments should be optional, capturing, etc.).
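To give an idea of what such building blocks can look like, here is a minimal sketch with three small helpers. The names capturing, non_capturing, and optional are assumptions made for this example and are not necessarily the functions exposed by regex.py.

```python
import re


def capturing(pattern: str) -> str:
    """Wrap a pattern in a capturing group (hypothetical helper)."""
    return f"({pattern})"


def non_capturing(pattern: str) -> str:
    """Wrap a pattern in a non-capturing group (hypothetical helper)."""
    return f"(?:{pattern})"


def optional(pattern: str) -> str:
    """Make a whole pattern optional (hypothetical helper)."""
    return f"{non_capturing(pattern)}?"


# An integer, optionally followed by a decimal part.
# Only the integer part is captured.
number_regex = capturing(r"\d+") + optional(r"\.\d+")

match = re.fullmatch(number_regex, "12.5")
integer_part = match.group(1) if match else None
print(number_regex)
print(integer_part)
```

Composing the pattern from named pieces keeps the grouping explicit, which is harder to see in a hand-written regex string.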
Others
A custom, more general enum class, RxnEnum.
remove_prefix, remove_postfix.
Initialization of loggers, in a logging-compatible way: logging.py.
sandboxed_random_context and temporary_random_seed, to create a context with a specific random state that will not have side effects. Especially useful for testing purposes (unit tests).
… and others.
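The notion of a random context without side effects can be illustrated with a small context manager that saves and restores the state of Python's random module. The name fixed_seed is hypothetical, and this sketch makes no claim about how sandboxed_random_context or temporary_random_seed are implemented; in particular, it does not touch other random generators such as NumPy's.

```python
import random
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def fixed_seed(seed: int) -> Iterator[None]:
    """Seed the global generator inside the block, then restore the previous state."""
    saved_state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        # Restore the state even if the block raised an exception.
        random.setstate(saved_state)


# Reference: the first value of the outer sequence without any seeded block.
random.seed(123)
expected_next = random.random()

random.seed(123)
with fixed_seed(42):
    first_draw = random.random()
with fixed_seed(42):
    second_draw = random.random()
# The outer sequence continues as if the seeded blocks had never run.
actual_next = random.random()

print(first_draw == second_draw)
print(actual_next == expected_next)
```

In a unit test, this kind of context makes the code under test deterministic without changing the random numbers seen by other tests.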