RXN utilities package


This repository contains general-purpose Python utilities commonly used in the RXN universe. For chemistry-related utilities, see our other repository, rxn-chemutils.


System Requirements

This package is supported on all operating systems. It has been tested on the following systems:

  • macOS: Big Sur (11.1)

  • Linux: Ubuntu 18.04.4

Python 3.6 or later is recommended.

Installation guide

The package can be installed from PyPI:

pip install rxn-utils

For local development, the package can be installed with:

pip install -e ".[dev]"

Package highlights

Stable shuffling

To shuffle a file reproducibly, or to shuffle two files of identical length with the same permutation, use the stable_shuffle function. The executable rxn-stable-shuffle is also provided for this purpose.

Both also support CSV files when the appropriate flag is provided.
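The idea behind stable shuffling can be illustrated with a minimal sketch in plain Python. It is not the rxn-utils implementation: stable_shuffle_lines is a hypothetical helper that works on in-memory lists rather than files. Because it shuffles indices with a seeded, local random.Random instance, the permutation depends only on the seed and the number of lines, so two files of identical length get the same permutation.

```python
import random
from typing import List, Sequence


def stable_shuffle_lines(lines: Sequence[str], seed: int) -> List[str]:
    """Shuffle lines reproducibly: same seed and same length give the same permutation.

    Hypothetical helper for illustration; not the rxn-utils stable_shuffle API.
    """
    # Shuffle indices rather than the lines themselves, so that the permutation
    # depends only on the seed and the number of lines, not on their content.
    indices = list(range(len(lines)))
    rng = random.Random(seed)  # local RNG: the global random state is untouched
    rng.shuffle(indices)
    return [lines[i] for i in indices]


# Two "files" of identical length, e.g. aligned source and target lines.
src = ["a1", "a2", "a3", "a4", "a5"]
tgt = ["b1", "b2", "b3", "b4", "b5"]
shuffled_src = stable_shuffle_lines(src, seed=42)
shuffled_tgt = stable_shuffle_lines(tgt, seed=42)
print(list(zip(shuffled_src, shuffled_tgt)))  # pairs stay aligned: a_i with b_i
```

Using a dedicated random.Random instance instead of the module-level functions keeps the global random state untouched.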

chunker and remove_duplicates

chunker splits an iterable into lists of a specified size (the last list may be shorter), and does so in a memory-efficient way.

>>> from rxn.utilities.containers import chunker
>>> for chunk in chunker(range(1, 10), chunk_size=4):
...     print(chunk)
[1, 2, 3, 4]
[5, 6, 7, 8]
[9]
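A memory-efficient chunker can be built on itertools.islice, which pulls at most chunk_size items from the iterator at a time. The sketch below is a hypothetical re-implementation (chunker_sketch) meant to show the technique; the actual chunker in rxn.utilities.containers may differ in its details.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunker_sketch(iterable: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items.

    Hypothetical re-implementation for illustration; not the rxn-utils code.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice consumes at most chunk_size items, so only one chunk is in memory at a time.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


print(list(chunker_sketch(range(1, 10), chunk_size=4)))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```

Since only the current chunk is materialized, this approach also works on very large or even infinite iterators.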

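remove_duplicates (or iterate_unique_values, its memory-efficient variant) removes duplicates from a container while preserving the order of first occurrences. Optionally, a key callable, instead of the values themselves, determines which items count as duplicates: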

>>> from rxn.utilities.containers import remove_duplicates
>>> remove_duplicates([3, 6, 9, 2, 3, 1, 9])
[3, 6, 9, 2, 1]
>>> remove_duplicates(["ab", "cd", "efg", "hijk", "", "lmn"], key=len)
['ab', 'efg', 'hijk', '']
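The order-preserving deduplication above boils down to remembering the keys already seen. The following sketch shows one way to do it; iterate_unique_sketch and remove_duplicates_sketch are hypothetical names, not the rxn-utils functions, and the key callable is assumed to return hashable values.

```python
from typing import Callable, Hashable, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def iterate_unique_sketch(
    values: Iterable[T], key: Optional[Callable[[T], Hashable]] = None
) -> Iterator[T]:
    """Lazily yield the first item for each distinct key, preserving order.

    Hypothetical helper for illustration; not the rxn-utils implementation.
    """
    seen = set()
    for value in values:
        # Without a key, the value itself decides uniqueness.
        k = value if key is None else key(value)
        if k not in seen:
            seen.add(k)
            yield value


def remove_duplicates_sketch(
    values: Iterable[T], key: Optional[Callable[[T], Hashable]] = None
) -> List[T]:
    """Eager variant: collect the unique items into a list."""
    return list(iterate_unique_sketch(values, key=key))


print(remove_duplicates_sketch([3, 6, 9, 2, 3, 1, 9]))  # [3, 6, 9, 2, 1]
```

The generator variant only stores the set of keys seen so far and never builds the full output list.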

Regex utilities

regex.py provides a few functions that simplify building regex strings, for instance by taking care of whether segments should be optional, capturing, etc.
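To illustrate the kind of composition regex.py is meant to simplify, here is a sketch with three small helpers (optional, capturing, alternation). Their names and signatures are assumptions for illustration and are not taken from regex.py.

```python
import re


# Hypothetical helpers for illustration; not the API of rxn.utilities.regex.
def optional(pattern: str) -> str:
    """Make a pattern optional, using a non-capturing group."""
    return f"(?:{pattern})?"


def capturing(pattern: str) -> str:
    """Wrap a pattern in a capturing group."""
    return f"({pattern})"


def alternation(*patterns: str) -> str:
    """Match any of the given patterns, inside a non-capturing group."""
    return "(?:" + "|".join(patterns) + ")"


# A number with an optional decimal part, followed by an optional unit.
number = capturing(r"\d+" + optional(r"\.\d+"))
unit = optional(r"\s*" + capturing(alternation("mg", "g", "kg")))
pattern = re.compile(number + unit)

print(pattern.fullmatch("12.5 mg").groups())  # ('12.5', 'mg')
print(pattern.fullmatch("3").groups())  # ('3', None)
```

Wrapping optional segments in non-capturing groups keeps the group numbering limited to the parts that were explicitly marked as capturing.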

Others

  • A custom, more general enum class, RxnEnum.

  • remove_prefix and remove_postfix, for stripping a prefix or postfix from a string.

  • Logger initialization compatible with the standard logging module: logging.py.

  • sandboxed_random_context and temporary_random_seed, which create a context with a specific random state that has no side effects on the surrounding code. They are especially useful in unit tests.

  • … and others.
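A context with a temporary random seed can be sketched with contextlib.contextmanager, random.getstate and random.setstate. This is a hypothetical re-implementation (temporary_seed_sketch) that only covers Python's built-in random module; the rxn-utils functions may behave differently, for example with respect to other random number generators.

```python
import random
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_seed_sketch(seed: int) -> Iterator[None]:
    """Seed the global `random` module inside the block, then restore the previous state.

    Hypothetical re-implementation for illustration; not the rxn-utils code.
    """
    saved_state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        # Restore the outer state even if the block raised, so no side effects leak out.
        random.setstate(saved_state)


random.seed(123)
with temporary_seed_sketch(42):
    inside = random.random()  # same value on every run, given seed 42
outside = random.random()  # continues the seed-123 sequence as if the block never ran

random.seed(123)
print(outside == random.random())  # True
```

Restoring the state in a finally clause is what makes the context side-effect free, which matters in unit tests where the order of tests should not influence their random draws.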