rxn.reaction_preprocessing.config.AugmentConfig

class rxn.reaction_preprocessing.config.AugmentConfig(input_file_path='${preprocess.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.augmented.csv', tokenize=True, random_type=RandomType.unrestricted, permutations=1, reaction_column_name='${common.reaction_column_name}', rxn_section_to_augment=ReactionSection.precursors, fragment_bond='${common.fragment_bond}', keep_intermediate_columns='${common.keep_intermediate_columns}')

Bases: object

Configuration for the augmentation transformation step.

Fields:

  • input_file_path: The input file path (one SMILES per line).
  • output_file_path: The output file path.
  • tokenize: Whether tokenization is to be performed.
  • random_type: The randomization type to be applied.
  • permutations: Number of random permutations for the input SMILES.
  • reaction_column_name: Name of the reaction column for the data file.
  • rxn_section_to_augment: The section of the reaction SMILES to augment: "precursors" for augmenting only the precursors, "products" for augmenting only the products.
  • fragment_bond: Token used to denote a fragment bond in the reaction SMILES.
  • keep_intermediate_columns: Whether the columns generated during preprocessing should be kept.
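To make the roles of rxn_section_to_augment and permutations concrete, here is a minimal sketch that does not use the library. It assumes a reaction SMILES of the form "precursors>>products" and uses a placeholder augmentation (reordering the dot-separated molecules) in place of the SMILES randomization selected by random_type, which is not reproduced here. The helper names shuffle_molecules and augment_reaction are hypothetical, and the local ReactionSection enum is a stand-in whose value for products is an assumption.

```python
import random
from enum import Enum


class ReactionSection(Enum):
    # Stand-in for the library enum. precursors=1 matches the documented
    # default repr; the value of products is an assumption.
    precursors = 1
    products = 2


def shuffle_molecules(smiles: str, rng: random.Random) -> str:
    """Placeholder augmentation: reorder the dot-separated molecules."""
    molecules = smiles.split(".")
    rng.shuffle(molecules)
    return ".".join(molecules)


def augment_reaction(rxn_smiles, section, permutations, seed=0):
    """Return `permutations` variants in which only `section` is altered."""
    # Assumes exactly one ">>" separating precursors from products.
    precursors, products = rxn_smiles.split(">>")
    rng = random.Random(seed)
    variants = []
    for _ in range(permutations):
        if section is ReactionSection.precursors:
            variants.append(f"{shuffle_molecules(precursors, rng)}>>{products}")
        else:
            variants.append(f"{precursors}>>{shuffle_molecules(products, rng)}")
    return variants


variants = augment_reaction(
    "CCO.CC(=O)O>>CC(=O)OCC.O", ReactionSection.precursors, permutations=3
)
for variant in variants:
    print(variant)
```

Each variant keeps the product side untouched, which mirrors the documented meaning of the "precursors" setting. Passing ReactionSection.products would leave the precursor side untouched instead.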

Parameters
  • input_file_path (str, default: '${preprocess.output_file_path}')
  • output_file_path (str, default: '${data.proc_dir}/${data.name}.augmented.csv')
  • tokenize (bool, default: True)
  • random_type (RandomType, default: <RandomType.unrestricted: 2>)
  • permutations (int, default: 1)
  • reaction_column_name (str, default: '${common.reaction_column_name}')
  • rxn_section_to_augment (ReactionSection, default: <ReactionSection.precursors: 1>)
  • fragment_bond (FragmentBond, default: '${common.fragment_bond}')
  • keep_intermediate_columns (bool, default: '${common.keep_intermediate_columns}')
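Several defaults above are '${...}' placeholders rather than literal values. They resemble OmegaConf-style interpolations that point at other sections of the overall configuration (preprocess, data, common); that reading is an assumption, not something this page states. The sketch below only illustrates how such a placeholder could be resolved with the standard library. The functions lookup and resolve, and the example values, are hypothetical and are not the library's mechanism.

```python
import re

# Matches "${some.dotted.key}" and captures the dotted key.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def lookup(config: dict, dotted_key: str):
    """Walk a nested dict along a dotted key such as 'data.proc_dir'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(value: str, config: dict) -> str:
    """Replace every ${a.b} placeholder in `value` with its config value."""
    return _PLACEHOLDER.sub(lambda m: str(lookup(config, m.group(1))), value)


# Hypothetical values, chosen only for illustration.
config = {"data": {"proc_dir": "/tmp/processed", "name": "demo"}}

resolved = resolve("${data.proc_dir}/${data.name}.augmented.csv", config)
print(resolved)  # /tmp/processed/demo.augmented.csv
```

Under this reading, the output path of the augmentation step follows the data section automatically unless output_file_path is overridden explicitly.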

__init__(input_file_path='${preprocess.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.augmented.csv', tokenize=True, random_type=RandomType.unrestricted, permutations=1, reaction_column_name='${common.reaction_column_name}', rxn_section_to_augment=ReactionSection.precursors, fragment_bond='${common.fragment_bond}', keep_intermediate_columns='${common.keep_intermediate_columns}')
Return type

None
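As a rough illustration of how the keyword arguments combine, the sketch below mirrors the documented signature with a stand-in dataclass so that it runs without the package installed. AugmentConfigSketch and the two local enums are hypothetical stand-ins, not the library classes. Only the members and values visible on this page are reproduced (unrestricted = 2, precursors = 1); the value given to products is an assumption, and the fragment_bond and keep_intermediate_columns fields are typed as str here because their documented defaults are placeholder strings.

```python
from dataclasses import dataclass
from enum import Enum


class RandomType(Enum):
    # Only the member shown in the documented default; others are omitted.
    unrestricted = 2


class ReactionSection(Enum):
    precursors = 1
    products = 2  # assumed value


@dataclass
class AugmentConfigSketch:
    """Stand-in mirroring the documented fields and defaults."""

    input_file_path: str = "${preprocess.output_file_path}"
    output_file_path: str = "${data.proc_dir}/${data.name}.augmented.csv"
    tokenize: bool = True
    random_type: RandomType = RandomType.unrestricted
    permutations: int = 1
    reaction_column_name: str = "${common.reaction_column_name}"
    rxn_section_to_augment: ReactionSection = ReactionSection.precursors
    fragment_bond: str = "${common.fragment_bond}"
    keep_intermediate_columns: str = "${common.keep_intermediate_columns}"


# All defaults, as in the documented signature.
default_cfg = AugmentConfigSketch()

# Override a few fields by keyword; the rest keep their defaults.
custom_cfg = AugmentConfigSketch(
    permutations=5,
    rxn_section_to_augment=ReactionSection.products,
    tokenize=False,
)
print(custom_cfg)
```

Every parameter has a default, so only the fields that differ from the defaults need to be passed.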

Methods

__init__([input_file_path, ...])

Attributes

fragment_bond

input_file_path

keep_intermediate_columns

output_file_path

permutations

random_type

reaction_column_name

rxn_section_to_augment

tokenize