rxn.reaction_preprocessing.molecule_standardizer.MoleculeStandardizer

class rxn.reaction_preprocessing.molecule_standardizer.MoleculeStandardizer(annotations=None, discard_missing_annotations=False, canonicalize=True)

Bases: object

Class to standardize standalone molecules (reactions are standardized with the Standardizer class).

Note that the standardization of one molecule may lead to a combination of molecules, hence the functions return lists of strings.


__init__(annotations=None, discard_missing_annotations=False, canonicalize=True)
Parameters
  • annotations (Optional[List[MoleculeAnnotation]], default: None) – a list of MoleculeAnnotation objects used to perform the substitutions and rejections. If None, an empty list is used.

  • discard_missing_annotations (bool, default: False) – whether to reject reactions containing molecules that should be annotated but are not.

  • canonicalize (bool, default: True) – whether to canonicalize the compounds.

Methods

__init__([annotations, ...])

standardize(smiles)

Standardize a molecule.

standardize_in_equation(reaction)

Do the molecule-wise standardization for a reaction equation.

standardize_in_equation_with_errors(reaction)

Do the molecule-wise standardization for a reaction equation, and get the reasons for potential failures.

standardize(smiles)

Standardize a molecule.

The returned value is a list, because in some cases standardization returns two independent molecules.

Parameters

smiles (str) – SMILES string to standardize. Fragment bonds must be written with dots ("."), not with "~".

Raises
  • SanitizationError or one of its subclasses – error in sanitization.

  • InvalidSmiles – Invalid SMILES.

  • ValueError – “~” being used for fragment bonds.

Return type

List[str]

Returns

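List of standardized SMILES strings. It usually contains one entry, and several when standardization splits the input into independent molecules.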
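
The list-valued return and the "~" check can be illustrated with a small, library-independent sketch. Everything in it is hypothetical: toy_standardize and TOY_SUBSTITUTIONS are stand-ins invented for this example, and the real class drives its substitutions from MoleculeAnnotation objects and also sanitizes and canonicalizes the input, which this sketch does not attempt.

```python
from typing import Dict, List

# Hypothetical substitution table (NOT the library's data model): it maps one
# input SMILES to the list of molecules it should be replaced with.
TOY_SUBSTITUTIONS: Dict[str, List[str]] = {
    # A "hydrate" written as one multi-fragment molecule, split into two
    # independent molecules.
    "O.CC(=O)[O-].[Na+]": ["O", "CC(=O)[O-].[Na+]"],
}


def toy_standardize(smiles: str, substitutions: Dict[str, List[str]]) -> List[str]:
    """Toy stand-in for a standardize() call: always returns a list."""
    # Mirror the documented contract: "~" is not accepted for fragment bonds.
    if "~" in smiles:
        raise ValueError(f'"~" must not be used for fragment bonds: {smiles}')
    # Without a matching substitution, the molecule is returned unchanged,
    # but still wrapped in a list so callers handle both cases uniformly.
    return list(substitutions.get(smiles, [smiles]))


single = toy_standardize("CCO", TOY_SUBSTITUTIONS)
split = toy_standardize("O.CC(=O)[O-].[Na+]", TOY_SUBSTITUTIONS)
```

Because the result is always a list, callers can extend a running list of molecules with it and never need to special-case the split.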

standardize_in_equation(reaction)

Do the molecule-wise standardization for a reaction equation.

Delegates to standardize_in_equation_with_errors() to avoid duplicating logic, and propagates any exception raised there.

Parameters

reaction (ReactionEquation) – reaction to standardize.

Return type

ReactionEquation

standardize_in_equation_with_errors(reaction, propagate_exceptions=False)

Do the molecule-wise standardization for a reaction equation, and get the reasons for potential failures.

This function was originally implemented in Standardizer, and then moved here for more modularity.

Parameters
  • reaction (ReactionEquation) – reaction to standardize.

  • propagate_exceptions (bool, default: False) – if True, stop execution and raise the exception directly instead of collecting the SMILES that caused the failure. The flag exists so that standardize_in_equation() can reuse this function without duplicating code.

Returns

A tuple with four elements, in this order:

  • the standardized reaction equation (or an empty one if there was a failure).

  • list of invalid SMILES in the reaction.

  • list of rejected SMILES in the reaction.

  • list of missing annotations in the reaction.

Return type

Tuple
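
The collect-or-propagate behaviour described above can be sketched without the library. This is a minimal illustration under stated assumptions, not the actual implementation: the exception classes, the toy_* functions, the two lookup sets, and the use of a flat list of SMILES in place of a ReactionEquation are all hypothetical.

```python
from typing import List, Tuple


class ToyInvalidSmiles(ValueError):
    """Hypothetical: the SMILES string could not be parsed."""


class ToyRejected(ValueError):
    """Hypothetical: an annotation says the molecule must be rejected."""


class ToyMissingAnnotation(ValueError):
    """Hypothetical: the molecule should be annotated but is not."""


# Hypothetical lookup sets standing in for annotation data.
TOY_REJECTED = {"[Pd]"}
TOY_UNANNOTATED = {"[Cu]"}


def toy_standardize_molecule(smiles: str) -> List[str]:
    """Toy per-molecule step: raises on failure, returns a list otherwise."""
    if not smiles or " " in smiles:
        raise ToyInvalidSmiles(smiles)
    if smiles in TOY_REJECTED:
        raise ToyRejected(smiles)
    if smiles in TOY_UNANNOTATED:
        raise ToyMissingAnnotation(smiles)
    return [smiles]


def toy_standardize_with_errors(
    molecules: List[str], propagate_exceptions: bool = False
) -> Tuple[List[str], List[str], List[str], List[str]]:
    """Return (standardized, invalid, rejected, missing annotations)."""
    standardized: List[str] = []
    invalid: List[str] = []
    rejected: List[str] = []
    missing: List[str] = []
    for smiles in molecules:
        try:
            standardized.extend(toy_standardize_molecule(smiles))
        except (ToyInvalidSmiles, ToyRejected, ToyMissingAnnotation) as error:
            if propagate_exceptions:
                raise  # strict mode: stop at the first failure
            if isinstance(error, ToyInvalidSmiles):
                invalid.append(smiles)
            elif isinstance(error, ToyRejected):
                rejected.append(smiles)
            else:
                missing.append(smiles)
    if invalid or rejected or missing:
        standardized = []  # any failure yields an empty result
    return standardized, invalid, rejected, missing


def toy_standardize_strict(molecules: List[str]) -> List[str]:
    """Strict variant: reuses the collecting function and lets errors raise."""
    result, _, _, _ = toy_standardize_with_errors(molecules, propagate_exceptions=True)
    return result


ok = toy_standardize_with_errors(["CCO", "O"])
failed = toy_standardize_with_errors(["CCO", "C C", "[Pd]", "[Cu]"])
```

The strict variant shows why the flag is convenient: one loop serves both the reporting use case and the fail-fast use case.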