rxn.reaction_preprocessing.augmenter.Augmenter
- class rxn.reaction_preprocessing.augmenter.Augmenter(df, reaction_column_name, fragment_bond='.')[source]
Bases: object
Augmenter.
Note: Unlike the other classes, which are memory-efficient, this one loads the entire dataset into a pandas DataFrame for processing.
- Parameters
df (DataFrame) –
reaction_column_name (str) –
fragment_bond (str, default: '.') –
- __init__(df, reaction_column_name, fragment_bond='.')[source]
Creates a new instance of the Augmenter class.
- Parameters
df (pd.DataFrame) – A pandas DataFrame containing the molecules SMILES.
reaction_column_name (str) – The name of the DataFrame column containing the reaction SMILES.
fragment_bond (str) – The fragment bond token contained in the SMILES.
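To illustrate what the fragment_bond token stands for, here is a minimal sketch in plain Python. It does not use or reproduce the library's code: split_reaction is a hypothetical helper, and it assumes that molecules are separated by '.', that the two sides of the reaction are separated by '>', and that the fragments of one compound (for example the ions of a salt) are joined by the fragment bond token. The '~' token in the example is likewise an assumption chosen for illustration.

```python
def split_reaction(rxn_smiles, fragment_bond="."):
    """Split a reaction SMILES into (precursors, products).

    Hypothetical helper for illustration only. Each side is returned as a
    list of compounds, and each compound as a list of fragment SMILES.
    """
    # "A.B>C>D" and "A.B>>D" both yield exactly three parts.
    reactants, agents, products = rxn_smiles.split(">")
    # Treat reactants and agents together as the precursors.
    precursor_side = ".".join(part for part in (reactants, agents) if part)

    def compounds(side):
        if not side:
            return []
        # '.' separates compounds; the fragment bond joins fragments
        # that belong to the same compound.
        return [mol.split(fragment_bond) for mol in side.split(".")]

    return compounds(precursor_side), compounds(products)


precursors, products = split_reaction(
    "CC(=O)O.[Na+]~[OH-]>>CC(=O)[O-]~[Na+]", fragment_bond="~"
)
print(precursors)
print(products)
```

With the default fragment_bond='.', the sketch cannot tell fragments from separate molecules, so every fragment is returned as its own compound.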
Methods
__init__(df, reaction_column_name[, ...]) – Creates a new instance of the Augmenter class.
augment([random_type, ...]) – Creates samples for the augmentation.
read_csv(filepath, reaction_column_name[, ...]) – A helper function to read a list or csv of SMILES.
- augment(random_type=RandomType.unrestricted, rxn_section_to_augment=ReactionSection.precursors, permutations=1)[source]
Creates samples for the augmentation. Returns a pandas Series containing the augmented samples.
- Parameters
random_type (RandomType) – The string identifying the type of randomization to apply:
“molecules” for randomization of the molecules (canonical SMILES kept),
“unrestricted” for unrestricted randomization,
“restricted” for restricted randomization,
“rotated” for rotated randomization.
For details on the differences, see https://github.com/undeadpixel/reinvent-randomized and https://github.com/GLambard/SMILES-X
rxn_section_to_augment (ReactionSection) – The section of the rxn SMILES to augment:
“precursors” for augmenting only the precursors,
“products” for augmenting only the products.
permutations (int) – The number of permutations to generate for each SMILES
- Returns
A pandas Series containing the augmented samples.
- Return type
pd.DataFrame
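As a rough illustration of the “molecules” option combined with the permutations parameter, the sketch below shuffles the order of the precursor molecules while leaving each molecule's SMILES string untouched. This is one reading of the description above, not the library's implementation. shuffle_precursors and its seed argument are hypothetical names introduced here. The other randomization types rewrite the SMILES strings themselves and would need a cheminformatics toolkit, so they are not covered.

```python
import random


def shuffle_precursors(rxn_smiles, permutations=1, seed=None):
    """Return `permutations` copies of a reaction SMILES with shuffled precursors.

    Hypothetical helper for illustration only. Assumes the "A.B>>C" form,
    with '.' separating molecules.
    """
    rng = random.Random(seed)  # seeded for reproducible shuffles
    precursors, products = rxn_smiles.split(">>")
    molecules = precursors.split(".")

    samples = []
    for _ in range(permutations):
        shuffled = molecules[:]  # copy so the original order is preserved
        rng.shuffle(shuffled)
        samples.append(".".join(shuffled) + ">>" + products)
    return samples


for sample in shuffle_precursors("CCO.CC(=O)O>>CC(=O)OCC", permutations=3, seed=0):
    print(sample)
```

Each returned sample contains the same molecules as the input. Only their order on the precursor side may differ, and the product side is unchanged.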
- static read_csv(filepath, reaction_column_name, fragment_bond='.')[source]
A helper function to read a list or csv of SMILES.
- Parameters
filepath (str) – The path to the text file containing the molecules SMILES.
reaction_column_name (str) – The name of the reaction column (or the name that will be given to the reaction column if the input file has no headers).
fragment_bond (str) – The fragment token in the reaction SMILES.
- Returns
A new augmenter instance.
- Return type