rxn.reaction_preprocessing.config.AugmentConfig

class rxn.reaction_preprocessing.config.AugmentConfig(input_file_path='${preprocess.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.augmented.csv', tokenize=True, random_type=RandomType.unrestricted, permutations=1, reaction_column_name='${common.reaction_column_name}', rxn_section_to_augment=ReactionSection.precursors, fragment_bond='${common.fragment_bond}', keep_intermediate_columns='${common.keep_intermediate_columns}')

Bases: object

Configuration for the augmentation transformation step.

Fields:

input_file_path: The input file path (one SMILES per line).
output_file_path: The output file path.
tokenize: Whether tokenization is to be performed.
random_type: The randomization type to apply.
permutations: Number of random permutations to generate for each input SMILES.
reaction_column_name: Name of the reaction column in the data file.
rxn_section_to_augment: The section of the reaction SMILES to augment: “precursors” to augment only the precursors, “products” to augment only the products.
fragment_bond: Token used to denote a fragment bond in the reaction SMILES.
keep_intermediate_columns: Whether the columns generated during preprocessing should be kept.
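
To make the roles of rxn_section_to_augment and permutations concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the function augment_section is hypothetical, and reordering the molecules within a section is only a toy stand-in for the SMILES randomization selected by random_type. It assumes a reaction SMILES of the form precursors>>products with molecules separated by ".", and it ignores fragment bonds and tokenization.

```python
import random


def augment_section(rxn_smiles, section="precursors", permutations=1, seed=0):
    """Return `permutations` variants in which only the chosen section changes.

    Hypothetical helper for illustration; not part of rxn.reaction_preprocessing.
    """
    if section not in ("precursors", "products"):
        raise ValueError(f"unknown section: {section!r}")

    # Assumes exactly one '>>' separating precursors from products.
    precursors, products = rxn_smiles.split(">>")
    rng = random.Random(seed)

    variants = []
    for _ in range(permutations):
        target = precursors if section == "precursors" else products
        molecules = target.split(".")
        rng.shuffle(molecules)  # toy stand-in for SMILES randomization
        shuffled = ".".join(molecules)
        if section == "precursors":
            variants.append(f"{shuffled}>>{products}")
        else:
            variants.append(f"{precursors}>>{shuffled}")
    return variants


variants = augment_section(
    "CCO.CC(=O)O>>CC(=O)OCC", section="precursors", permutations=3
)
for variant in variants:
    print(variant)
```

In this sketch each input reaction yields permutations output reactions, and the section that was not selected is copied through unchanged.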

Parameters

input_file_path (str, default: '${preprocess.output_file_path}')
output_file_path (str, default: '${data.proc_dir}/${data.name}.augmented.csv')
tokenize (bool, default: True)
random_type (RandomType, default: <RandomType.unrestricted: 2>)
permutations (int, default: 1)
reaction_column_name (str, default: '${common.reaction_column_name}')
rxn_section_to_augment (ReactionSection, default: <ReactionSection.precursors: 1>)
fragment_bond (FragmentBond, default: '${common.fragment_bond}')
keep_intermediate_columns (bool, default: '${common.keep_intermediate_columns}')
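
Several defaults above are written as '${...}' placeholders, which look like configuration interpolations that are filled in from other configuration sections. That reading is an assumption based on their form; this page does not say which mechanism resolves them. The sketch below uses a small hand-written resolver (the resolve function and all values in settings are hypothetical) only to show what a resolved default could look like.

```python
import re

# Hypothetical stand-in values; in practice they would come from the
# other configuration sections referenced by the placeholders.
settings = {
    "data": {"proc_dir": "/tmp/proc", "name": "example"},
    "preprocess": {"output_file_path": "/tmp/proc/example.processed.csv"},
}

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(template, values):
    """Replace each ${a.b} placeholder with values['a']['b']."""

    def lookup(match):
        node = values
        for key in match.group(1).split("."):
            node = node[key]  # KeyError if the referenced setting is missing
        return str(node)

    return _PLACEHOLDER.sub(lookup, template)


input_path = resolve("${preprocess.output_file_path}", settings)
output_path = resolve("${data.proc_dir}/${data.name}.augmented.csv", settings)
print(input_path)
print(output_path)
```

Under these assumed values, the augmentation step would read the file written by the preprocessing step and write its result next to it, with the .augmented.csv suffix.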

__init__(input_file_path='${preprocess.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.augmented.csv', tokenize=True, random_type=RandomType.unrestricted, permutations=1, reaction_column_name='${common.reaction_column_name}', rxn_section_to_augment=ReactionSection.precursors, fragment_bond='${common.fragment_bond}', keep_intermediate_columns='${common.keep_intermediate_columns}')

Return type

None

Methods

__init__([input_file_path, ...])

Attributes

`fragment_bond`
`input_file_path`
`keep_intermediate_columns`
`output_file_path`
`permutations`
`random_type`
`reaction_column_name`
`rxn_section_to_augment`
`tokenize`