rxn.reaction_preprocessing.augmenter.Augmenter

class rxn.reaction_preprocessing.augmenter.Augmenter(df, reaction_column_name, fragment_bond='.')[source]

Bases: object

Augmenter.

Note: Unlike the other classes, which are memory-efficient, this one loads the entire dataset into a pandas DataFrame for processing.

Parameters
  • df (DataFrame) –

  • reaction_column_name (str) –

  • fragment_bond (str, default: '.') –

__init__(df, reaction_column_name, fragment_bond='.')[source]

Creates a new instance of the Augmenter class.

Parameters
  • df (pd.DataFrame) – A pandas DataFrame containing the reaction SMILES.

  • reaction_column_name (str) – The name of the DataFrame column containing the reaction SMILES.

  • fragment_bond (str) – The fragment bond token contained in the SMILES.
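To make the input format concrete, the following is a minimal sketch of how a reaction SMILES string breaks down into precursors, products, and fragments. It is not part of the library: `split_reaction` is a hypothetical helper, and it assumes the `precursors>>products` form with `.` separating molecules and a distinct fragment bond token (here `~`) joining fragments that belong to one compound.

```python
def split_reaction(rxn_smiles, fragment_bond="."):
    """Split 'precursors>>products' into two lists of molecules.

    Each molecule is returned as a list of its fragments. Hypothetical
    helper for illustration only; not the library's implementation.
    """
    precursors, products = rxn_smiles.split(">>")

    def parse(section):
        # '.' separates molecules within a section.
        molecules = [m for m in section.split(".") if m]
        if fragment_bond == ".":
            # No distinct fragment token: every molecule is one fragment.
            return [[m] for m in molecules]
        # A distinct token (e.g. '~') groups fragments of one compound.
        return [m.split(fragment_bond) for m in molecules]

    return parse(precursors), parse(products)


example = "CC(=O)O.[Na+]~[OH-]>>CC(=O)[O-]~[Na+].O"
precursors, products = split_reaction(example, fragment_bond="~")
plain_precursors, plain_products = split_reaction("CCO.CC(=O)O>>CCOC(C)=O")
```

With `fragment_bond="~"`, sodium hydroxide stays a single precursor made of two fragments, whereas with the default `.` every dot-separated piece counts as its own molecule.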

Methods

__init__(df, reaction_column_name[, ...])

Creates a new instance of the Augmenter class.

augment([random_type, ...])

Creates samples for the augmentation.

read_csv(filepath, reaction_column_name[, ...])

A helper function to read a list or csv of SMILES.

augment(random_type=RandomType.unrestricted, rxn_section_to_augment=ReactionSection.precursors, permutations=1)[source]

Creates samples for the augmentation. Returns a pandas Series containing the augmented samples.

Parameters
  • random_type (RandomType) – The type of randomization to apply:
      “molecules” for randomization of the molecules (canonical SMILES kept);
      “unrestricted” for unrestricted randomization;
      “restricted” for restricted randomization;
      “rotated” for rotated randomization.
    For details on the differences, see https://github.com/undeadpixel/reinvent-randomized and https://github.com/GLambard/SMILES-X

  • rxn_section_to_augment (ReactionSection) – The section of the reaction SMILES to augment:
      “precursors” for augmenting only the precursors;
      “products” for augmenting only the products.

  • permutations (int) – The number of permutations to generate for each SMILES.

Returns

A pandas Series containing the augmented samples.

Return type

pd.DataFrame
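As an illustration of the simplest randomization type, the sketch below shuffles the order of the molecules in one section while leaving each molecule's SMILES string untouched, which is one reading of the “molecules” option. It is a standalone approximation using only the standard library, not the library's implementation: `shuffle_molecules`, its `section` strings, and the `seed` argument are hypothetical names introduced here.

```python
import random


def shuffle_molecules(rxn_smiles, section="precursors", permutations=1, seed=0):
    """Return `permutations` copies of a reaction SMILES with the molecule
    order of one section shuffled. Illustration only (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so that results are reproducible
    precursors, products = rxn_smiles.split(">>")
    samples = []
    for _ in range(permutations):
        parts = {
            "precursors": precursors.split("."),
            "products": products.split("."),
        }
        # Only the requested section is reordered; the other is kept as-is.
        rng.shuffle(parts[section])
        samples.append(
            ".".join(parts["precursors"]) + ">>" + ".".join(parts["products"])
        )
    return samples


original = "CCO.CC(=O)O.O>>CCOC(C)=O"
samples = shuffle_molecules(original, section="precursors", permutations=3)
```

Each sample contains the same molecules as the original reaction; only their order within the chosen section may differ. The other randomization types rewrite the SMILES strings themselves and would need a cheminformatics toolkit, so they are not sketched here.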

static read_csv(filepath, reaction_column_name, fragment_bond='.')[source]

A helper function to read a list or csv of SMILES.

Parameters
  • filepath (str) – The path to the text file containing the molecules SMILES.

  • reaction_column_name (str) – The name of the reaction column (or the name that will be given to the reaction column if the input file has no headers).

  • fragment_bond (str) – The fragment bond token in the reaction SMILES.

Returns

A new augmenter instance.

Return type

Augmenter
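The two input shapes mentioned above (a CSV with a header row, or a plain list of reaction SMILES with no header) can be handled along the following lines. This is a sketch under assumptions, not the library's code: `load_reaction_column` is a hypothetical helper, and the header check it uses (looking for the column name in the first row) is only one plausible heuristic; the detection logic actually used by `read_csv` is not described here and may differ.

```python
import csv
import io


def load_reaction_column(text, reaction_column_name):
    """Extract reaction SMILES from CSV text with or without a header row.

    Hypothetical helper for illustration; the header detection is an
    assumption, not the documented behavior of Augmenter.read_csv.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if rows and reaction_column_name in rows[0]:
        # Header present: pick the named column from the remaining rows.
        index = rows[0].index(reaction_column_name)
        return [row[index] for row in rows[1:]]
    # No header: treat the first field of every line as a reaction SMILES.
    return [row[0] for row in rows]


with_header = "id,rxn\n1,CCO.CC(=O)O>>CCOC(C)=O\n"
without_header = "CCO.CC(=O)O>>CCOC(C)=O\nCC.O>>CCO\n"

from_csv = load_reaction_column(with_header, "rxn")
from_list = load_reaction_column(without_header, "rxn")
```

In both cases the result is a flat list of reaction SMILES that could then be placed in a DataFrame column named `reaction_column_name`.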