rxn.reaction_preprocessing.config.PreprocessConfig

class rxn.reaction_preprocessing.config.PreprocessConfig(input_file_path='${standardize.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.processed.csv', min_reactants=2, max_reactants=10, max_reactants_tokens=300, min_agents=0, max_agents=0, max_agents_tokens=0, min_products=1, max_products=1, max_products_tokens=200, max_absolute_formal_charge=2, fragment_bond='${common.fragment_bond}', reaction_column_name='${common.reaction_column_name}', keep_intermediate_columns='${common.keep_intermediate_columns}')

Bases: object

Configuration for the preprocess transformation step.

Fields:

  • input_file_path: The input file path (one reaction SMARTS per line).

  • output_file_path: The output file path for the preprocessed result.

  • min_reactants: The minimum number of reactants.

  • max_reactants: The maximum number of reactants.

  • max_reactants_tokens: The maximum number of reactant tokens.

  • min_agents: The minimum number of agents.

  • max_agents: The maximum number of agents.

  • max_agents_tokens: The maximum number of agent tokens.

  • min_products: The minimum number of products.

  • max_products: The maximum number of products.

  • max_products_tokens: The maximum number of product tokens.

  • max_absolute_formal_charge: The maximum absolute formal charge.

  • fragment_bond: Token used to denote a fragment bond in the reaction SMILES.

  • reaction_column_name: Name of the reaction column in the data file.

  • keep_intermediate_columns: Whether to keep the columns generated during preprocessing.
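Assuming the count and token fields act as inclusive bounds on each role of a reaction SMILES (`reactants>agents>products`), the filtering they describe can be sketched as follows. This is a minimal illustration, not the package's implementation: `Bounds`, `split_molecules`, `count_tokens`, and `passes_filters` are hypothetical names, the regex is a commonly used SMILES tokenization pattern that may differ from the package's tokenizer, `~` is assumed as the fragment bond, and each token limit is assumed to apply to all molecules of a role combined.

```python
import re
from dataclasses import dataclass
from typing import List

# A commonly used SMILES tokenization regex (assumption: the package's own
# tokenizer may split some tokens differently).
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


@dataclass
class Bounds:
    """Hypothetical subset of the PreprocessConfig fields, same defaults."""

    min_reactants: int = 2
    max_reactants: int = 10
    max_reactants_tokens: int = 300
    min_agents: int = 0
    max_agents: int = 0
    max_agents_tokens: int = 0
    min_products: int = 1
    max_products: int = 1
    max_products_tokens: int = 200


def split_molecules(role: str) -> List[str]:
    # Molecules are separated by "."; fragments joined by the fragment bond
    # (assumed to be "~") stay inside a single molecule.
    return [m for m in role.split(".") if m]


def count_tokens(role: str) -> int:
    # Total number of tokens for all molecules of one role combined.
    return len(SMILES_TOKEN_RE.findall(role))


def passes_filters(rxn_smiles: str, bounds: Bounds) -> bool:
    roles = rxn_smiles.split(">")
    if len(roles) != 3:
        raise ValueError(f"not a reaction SMILES: {rxn_smiles!r}")
    reactants, agents, products = roles
    checks = [
        (reactants, bounds.min_reactants, bounds.max_reactants, bounds.max_reactants_tokens),
        (agents, bounds.min_agents, bounds.max_agents, bounds.max_agents_tokens),
        (products, bounds.min_products, bounds.max_products, bounds.max_products_tokens),
    ]
    for role, min_count, max_count, max_tokens in checks:
        # Inclusive bounds on the number of molecules, then on the token count.
        if not min_count <= len(split_molecules(role)) <= max_count:
            return False
        if count_tokens(role) > max_tokens:
            return False
    return True


print(passes_filters("CC(=O)O.OCC>>CC(=O)OCC", Bounds()))  # True
print(passes_filters("CCO>>CC=O", Bounds()))  # False: only one reactant
print(passes_filters("CC(=O)O.OCC>O>CC(=O)OCC", Bounds()))  # False: agents not allowed
```

Under these assumptions, the default bounds only accept reactions with at least two reactants, exactly one product, and no agents.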

Parameters
  • input_file_path (str, default: '${standardize.output_file_path}')

  • output_file_path (str, default: '${data.proc_dir}/${data.name}.processed.csv')

  • min_reactants (int, default: 2)

  • max_reactants (int, default: 10)

  • max_reactants_tokens (int, default: 300)

  • min_agents (int, default: 0)

  • max_agents (int, default: 0)

  • max_agents_tokens (int, default: 0)

  • min_products (int, default: 1)

  • max_products (int, default: 1)

  • max_products_tokens (int, default: 200)

  • max_absolute_formal_charge (int, default: 2)

  • fragment_bond (FragmentBond, default: '${common.fragment_bond}')

  • reaction_column_name (str, default: '${common.reaction_column_name}')

  • keep_intermediate_columns (bool, default: '${common.keep_intermediate_columns}')
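Several defaults above are strings such as '${data.proc_dir}/${data.name}.processed.csv', which look like OmegaConf-style interpolations (the ${...} syntax used by Hydra configs) that are resolved against other config sections. Assuming that is the case, the stdlib-only sketch below illustrates how such references resolve. `resolve`, `lookup`, and every value in `cfg` are hypothetical; real OmegaConf supports much more (custom resolvers, relative keys, escaping), none of which is handled here.

```python
import re
from typing import Any, Dict

# Matches simple "${dotted.key}" references only.
INTERP_RE = re.compile(r"\$\{([^}]+)\}")


def lookup(cfg: Dict[str, Any], dotted_key: str) -> Any:
    # Walk nested dicts following the dotted key, e.g. "data.proc_dir".
    node: Any = cfg
    for key in dotted_key.split("."):
        node = node[key]
    return node


def resolve(value: Any, cfg: Dict[str, Any]) -> Any:
    if not isinstance(value, str):
        return value
    whole = INTERP_RE.fullmatch(value)
    if whole:
        # The string is a single reference: return the target value with its
        # own type (e.g. a bool), resolving it recursively in this sketch.
        return resolve(lookup(cfg, whole.group(1)), cfg)
    # References embedded in a longer string are substituted as text.
    return INTERP_RE.sub(lambda m: str(resolve(lookup(cfg, m.group(1)), cfg)), value)


# Hypothetical values for the referenced config sections.
cfg = {
    "data": {"proc_dir": "/tmp/proc", "name": "example"},
    "common": {"reaction_column_name": "rxn", "keep_intermediate_columns": False},
    "standardize": {"output_file_path": "${data.proc_dir}/${data.name}.standardized.csv"},
}

print(resolve("${data.proc_dir}/${data.name}.processed.csv", cfg))  # /tmp/proc/example.processed.csv
print(resolve("${standardize.output_file_path}", cfg))  # /tmp/proc/example.standardized.csv
print(resolve("${common.keep_intermediate_columns}", cfg))  # False
```

Keeping the referenced value's type for whole-string references is what lets a default such as '${common.keep_intermediate_columns}' stand in for a bool field in this sketch.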

__init__(input_file_path='${standardize.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.processed.csv', min_reactants=2, max_reactants=10, max_reactants_tokens=300, min_agents=0, max_agents=0, max_agents_tokens=0, min_products=1, max_products=1, max_products_tokens=200, max_absolute_formal_charge=2, fragment_bond='${common.fragment_bond}', reaction_column_name='${common.reaction_column_name}', keep_intermediate_columns='${common.keep_intermediate_columns}')

Return type

None

Methods

__init__([input_file_path, ...])

Attributes

fragment_bond

input_file_path

keep_intermediate_columns

max_absolute_formal_charge

max_agents

max_agents_tokens

max_products

max_products_tokens

max_reactants

max_reactants_tokens

min_agents

min_products

min_reactants

output_file_path

reaction_column_name