rxn.reaction_preprocessing.config.PreprocessConfig
- class rxn.reaction_preprocessing.config.PreprocessConfig(input_file_path='${standardize.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.processed.csv', min_reactants=2, max_reactants=10, max_reactants_tokens=300, min_agents=0, max_agents=0, max_agents_tokens=0, min_products=1, max_products=1, max_products_tokens=200, max_absolute_formal_charge=2, fragment_bond='${common.fragment_bond}', reaction_column_name='${common.reaction_column_name}', keep_intermediate_columns='${common.keep_intermediate_columns}')[source]
Bases: object
Configuration for the preprocess transformation step.
- Fields:
input_file_path: The input file path (one reaction SMARTS per line).
output_file_path: The output file path containing the result after preprocessing.
min_reactants: The minimum number of reactants.
max_reactants: The maximum number of reactants.
max_reactants_tokens: The maximum number of tokens for the reactants.
min_agents: The minimum number of agents.
max_agents: The maximum number of agents.
max_agents_tokens: The maximum number of tokens for the agents.
min_products: The minimum number of products.
max_products: The maximum number of products.
max_products_tokens: The maximum number of tokens for the products.
max_absolute_formal_charge: The maximum absolute formal charge.
fragment_bond: Token used to denote a fragment bond in the reaction SMILES.
reaction_column_name: Name of the reaction column in the data file.
keep_intermediate_columns: Whether the columns generated during preprocessing should be kept.
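To make the count limits concrete, here is a minimal sketch of how the `min_*`/`max_*` fields could be applied to a reaction SMILES of the form `reactants>agents>products`. It is not the library's implementation: `CountLimits`, `split_reaction` and `within_count_limits` are hypothetical names, the token and formal-charge limits are left out, and it assumes molecules are separated by `.` with fragments of one compound joined by a different fragment-bond token.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CountLimits:
    """Hypothetical stand-in holding only the count-related fields and their defaults."""
    min_reactants: int = 2
    max_reactants: int = 10
    min_agents: int = 0
    max_agents: int = 0
    min_products: int = 1
    max_products: int = 1


def split_reaction(reaction_smiles: str) -> Tuple[List[str], List[str], List[str]]:
    """Split 'reactants>agents>products' into three lists of molecule SMILES."""
    reactants, agents, products = reaction_smiles.split(">")
    # Molecules are separated by '.'; an empty group yields an empty list.
    groups = [[m for m in group.split(".") if m] for group in (reactants, agents, products)]
    return groups[0], groups[1], groups[2]


def within_count_limits(reaction_smiles: str, limits: CountLimits) -> bool:
    """Return True if every group size lies inside its [min, max] range."""
    reactants, agents, products = split_reaction(reaction_smiles)
    return (
        limits.min_reactants <= len(reactants) <= limits.max_reactants
        and limits.min_agents <= len(agents) <= limits.max_agents
        and limits.min_products <= len(products) <= limits.max_products
    )


esterification = "CC(=O)O.OCC>>CC(=O)OCC"
with_agent = "CC(=O)O.OCC>[H+]>CC(=O)OCC"
print(within_count_limits(esterification, CountLimits()))
print(within_count_limits(with_agent, CountLimits()))
```

Under this reading, the default `max_agents=0` means a reaction that lists any agent falls outside the limits unless `max_agents` is raised.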
- Parameters
input_file_path (str, default: '${standardize.output_file_path}')
output_file_path (str, default: '${data.proc_dir}/${data.name}.processed.csv')
min_reactants (int, default: 2)
max_reactants (int, default: 10)
max_reactants_tokens (int, default: 300)
min_agents (int, default: 0)
max_agents (int, default: 0)
max_agents_tokens (int, default: 0)
min_products (int, default: 1)
max_products (int, default: 1)
max_products_tokens (int, default: 200)
max_absolute_formal_charge (int, default: 2)
fragment_bond (FragmentBond, default: '${common.fragment_bond}')
reaction_column_name (str, default: '${common.reaction_column_name}')
keep_intermediate_columns (bool, default: '${common.keep_intermediate_columns}')
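Several defaults above are `${...}` strings rather than concrete values. They appear to be OmegaConf/Hydra-style interpolations that are resolved against the surrounding configuration when it is loaded. The following standard-library sketch only illustrates that idea; `lookup`, `resolve` and the values in `config` are hypothetical and are not part of the package.

```python
import re

_INTERPOLATION = re.compile(r"\$\{([^}]+)\}")


def lookup(config: dict, dotted_key: str):
    """Follow a dotted key such as 'data.proc_dir' through nested dicts."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(value, config: dict):
    """Resolve '${a.b}' references in a string against `config`."""
    if not isinstance(value, str):
        return value
    whole = _INTERPOLATION.fullmatch(value)
    if whole:
        # A value that is exactly one reference keeps the referenced type (e.g. bool).
        return resolve(lookup(config, whole.group(1)), config)
    # Otherwise substitute each reference into the surrounding string.
    return _INTERPOLATION.sub(
        lambda m: str(resolve(lookup(config, m.group(1)), config)), value
    )


# Hypothetical configuration values, for illustration only.
config = {
    "data": {"name": "demo", "proc_dir": "/tmp/proc"},
    "common": {"reaction_column_name": "rxn", "keep_intermediate_columns": False},
}

output_path = resolve("${data.proc_dir}/${data.name}.processed.csv", config)
keep_columns = resolve("${common.keep_intermediate_columns}", config)
print(output_path, keep_columns)
```

This also shows why a `bool` parameter can carry a string default: the string is a placeholder, and the resolved value has the type of whatever it points to.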
- __init__(input_file_path='${standardize.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.processed.csv', min_reactants=2, max_reactants=10, max_reactants_tokens=300, min_agents=0, max_agents=0, max_agents_tokens=0, min_products=1, max_products=1, max_products_tokens=200, max_absolute_formal_charge=2, fragment_bond='${common.fragment_bond}', reaction_column_name='${common.reaction_column_name}', keep_intermediate_columns='${common.keep_intermediate_columns}')
- Parameters
Same as the class-level parameters listed above.
- Return type: None
Methods
__init__([input_file_path, ...])
Attributes
fragment_bond
input_file_path
keep_intermediate_columns
max_absolute_formal_charge
max_agents
max_agents_tokens
max_products
max_products_tokens
max_reactants
max_reactants_tokens
min_agents
min_products
min_reactants
output_file_path
reaction_column_name