rxn.reaction_preprocessing.config.AugmentConfig
- class rxn.reaction_preprocessing.config.AugmentConfig(input_file_path='${preprocess.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.augmented.csv', tokenize=True, random_type=RandomType.unrestricted, permutations=1, reaction_column_name='${common.reaction_column_name}', rxn_section_to_augment=ReactionSection.precursors, fragment_bond='${common.fragment_bond}', keep_intermediate_columns='${common.keep_intermediate_columns}')
Bases: object
Configuration for the augmentation transformation step.
- Fields:
input_file_path: The input file path (one SMILES per line).
output_file_path: The output file path.
tokenize: Whether tokenization is to be performed.
random_type: The randomization type to be applied.
permutations: Number of random permutations to generate for each input SMILES.
reaction_column_name: Name of the reaction column in the data file.
rxn_section_to_augment: The section of the reaction SMILES to augment: "precursors" to augment only the precursors, "products" to augment only the products.
fragment_bond: Token used to denote a fragment bond in the reaction SMILES.
keep_intermediate_columns: Whether the columns generated during preprocessing should be kept.
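To make the meaning of rxn_section_to_augment concrete, the sketch below selects one side of a reaction SMILES. It is an illustration only, not the library's implementation: `Section` is a local stand-in for `ReactionSection`, `pick_section` is a hypothetical helper, and the code assumes the simple `precursors>>products` form with no separate agents field.

```python
from enum import Enum, auto


class Section(Enum):
    """Local stand-in for the library's ReactionSection (assumption)."""
    precursors = auto()
    products = auto()


def pick_section(rxn_smiles: str, section: Section) -> str:
    """Return the side of a 'precursors>>products' reaction SMILES named by `section`."""
    precursors, products = rxn_smiles.split(">>")
    return precursors if section is Section.precursors else products


# Esterification of acetic acid with ethanol, written as a reaction SMILES.
rxn = "CC(=O)O.OCC>>CC(=O)OCC"
selected = pick_section(rxn, Section.precursors)
print(selected)
```

With the default `precursors` setting, only the left-hand side would be passed on to randomization; the product side would be left as it is.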
- Parameters
input_file_path (str, default: '${preprocess.output_file_path}')
output_file_path (str, default: '${data.proc_dir}/${data.name}.augmented.csv')
tokenize (bool, default: True)
random_type (RandomType, default: <RandomType.unrestricted: 2>)
permutations (int, default: 1)
reaction_column_name (str, default: '${common.reaction_column_name}')
rxn_section_to_augment (ReactionSection, default: <ReactionSection.precursors: 1>)
fragment_bond (FragmentBond, default: '${common.fragment_bond}')
keep_intermediate_columns (bool, default: '${common.keep_intermediate_columns}')
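Several defaults are `${...}` placeholders that refer to other configuration keys. Assuming they behave like OmegaConf-style interpolations (an assumption; this page does not say which mechanism resolves them), the following standard-library sketch shows how such a template could expand. The `settings` values and the `lookup` and `resolve` helpers are hypothetical and are not part of rxn.reaction_preprocessing.

```python
import re

# Hypothetical values; in practice these come from the surrounding configuration.
settings = {
    "data": {"proc_dir": "/tmp/processed", "name": "demo"},
    "common": {"reaction_column_name": "rxn"},
}


def lookup(tree: dict, dotted_key: str):
    """Walk a nested dict following a dotted key such as 'data.name'."""
    node = tree
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(template: str, tree: dict) -> str:
    """Replace every ${dotted.key} placeholder with its value from `tree`."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(lookup(tree, m.group(1))), template)


output_file_path = resolve("${data.proc_dir}/${data.name}.augmented.csv", settings)
print(output_file_path)  # /tmp/processed/demo.augmented.csv
```

Under this reading, the default output_file_path depends only on the `data` section, so changing `data.name` would rename the augmented file without touching this step's own settings.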
- __init__(input_file_path='${preprocess.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.augmented.csv', tokenize=True, random_type=RandomType.unrestricted, permutations=1, reaction_column_name='${common.reaction_column_name}', rxn_section_to_augment=ReactionSection.precursors, fragment_bond='${common.fragment_bond}', keep_intermediate_columns='${common.keep_intermediate_columns}')
- Return type: None
Methods
__init__([input_file_path, ...])
Attributes
fragment_bond
input_file_path
keep_intermediate_columns
output_file_path
permutations
random_type
reaction_column_name
rxn_section_to_augment
tokenize