rxn.reaction_preprocessing.config.PreprocessConfig
- class rxn.reaction_preprocessing.config.PreprocessConfig(input_file_path='${standardize.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.processed.csv', min_reactants=2, max_reactants=10, max_reactants_tokens=300, min_agents=0, max_agents=0, max_agents_tokens=0, min_products=1, max_products=1, max_products_tokens=200, max_absolute_formal_charge=2, fragment_bond='${common.fragment_bond}', reaction_column_name='${common.reaction_column_name}', keep_intermediate_columns='${common.keep_intermediate_columns}')
Bases: object
Configuration for the preprocess transformation step.
- Fields:
input_file_path: The input file path (one reaction SMARTS per line).
output_file_path: The path of the output file containing the preprocessed reactions.
min_reactants: The minimum number of reactants.
max_reactants: The maximum number of reactants.
max_reactants_tokens: The maximum number of tokens in the reactants.
min_agents: The minimum number of agents.
max_agents: The maximum number of agents.
max_agents_tokens: The maximum number of tokens in the agents.
min_products: The minimum number of products.
max_products: The maximum number of products.
max_products_tokens: The maximum number of tokens in the products.
max_absolute_formal_charge: The maximum absolute formal charge.
fragment_bond: Token used to denote a fragment bond in the reaction SMILES.
reaction_column_name: Name of the column holding the reactions in the data file.
keep_intermediate_columns: Whether to keep the columns generated during preprocessing.
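The min/max count fields can be read as ranges on the number of molecules in each part of a reaction. The sketch below illustrates that reading under stated assumptions: a reaction SMILES of the form `reactants>agents>products`, molecules separated by `.`, multi-fragment species joined with the fragment bond `~`, and inclusive bounds. `CountLimits`, `count_molecules` and `passes_count_limits` are hypothetical names, not part of `rxn.reaction_preprocessing`, and the library's actual filtering logic may differ.

```python
from dataclasses import dataclass


@dataclass
class CountLimits:
    """Hypothetical mirror of the count-related PreprocessConfig defaults."""

    min_reactants: int = 2
    max_reactants: int = 10
    min_agents: int = 0
    max_agents: int = 0
    min_products: int = 1
    max_products: int = 1


def count_molecules(part: str) -> int:
    """Count the '.'-separated molecules in one part of a reaction SMILES.

    Species written with the fragment bond (e.g. '[Na+]~[Cl-]') contain no '.',
    so they are counted once.
    """
    return len(part.split(".")) if part else 0


def passes_count_limits(rxn_smiles: str, limits: CountLimits) -> bool:
    """Return True if every part lies within its [min, max] range (assumed inclusive)."""
    # Reaction SMILES layout: reactants>agents>products
    reactants, agents, products = rxn_smiles.split(">")
    checks = [
        (count_molecules(reactants), limits.min_reactants, limits.max_reactants),
        (count_molecules(agents), limits.min_agents, limits.max_agents),
        (count_molecules(products), limits.min_products, limits.max_products),
    ]
    return all(low <= n <= high for n, low, high in checks)


limits = CountLimits()
print(passes_count_limits("CC(=O)O.OCC>>CC(=O)OCC", limits))  # True
print(passes_count_limits("CCO>>CC=O", limits))  # False: only one reactant
```

With the defaults shown in the signature (`max_agents=0`), any reaction that still carries agents would be rejected under this reading.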
- Parameters
input_file_path (str, default: '${standardize.output_file_path}')
output_file_path (str, default: '${data.proc_dir}/${data.name}.processed.csv')
min_reactants (int, default: 2)
max_reactants (int, default: 10)
max_reactants_tokens (int, default: 300)
min_agents (int, default: 0)
max_agents (int, default: 0)
max_agents_tokens (int, default: 0)
min_products (int, default: 1)
max_products (int, default: 1)
max_products_tokens (int, default: 200)
max_absolute_formal_charge (int, default: 2)
fragment_bond (FragmentBond, default: '${common.fragment_bond}')
reaction_column_name (str, default: '${common.reaction_column_name}')
keep_intermediate_columns (bool, default: '${common.keep_intermediate_columns}')
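The `max_*_tokens` parameters bound the tokenized length of each reaction part. As a minimal sketch, assuming tokens are produced by the SMILES regex widely used since the Molecular Transformer work and that each limit is applied literally (so a limit of 0 allows no tokens), the check could look like this. `SMILES_TOKEN_RE`, `tokenize` and `within_token_limits` are hypothetical names; the source does not state which tokenizer the library uses or whether 0 is treated as "no limit".

```python
import re
from typing import List

# SMILES tokenization regex widely used since the Molecular Transformer work.
# Assumption: the library's token limits count tokens in a comparable way.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; fail if any character is unmatched."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize {smiles!r}")
    return tokens


def within_token_limits(
    rxn_smiles: str,
    max_reactants_tokens: int = 300,
    max_agents_tokens: int = 0,
    max_products_tokens: int = 200,
) -> bool:
    """Check each part's token count against its limit (a limit of 0 allows no tokens)."""
    # Reaction SMILES layout: reactants>agents>products
    reactants, agents, products = rxn_smiles.split(">")
    return (
        len(tokenize(reactants)) <= max_reactants_tokens
        and len(tokenize(agents)) <= max_agents_tokens
        and len(tokenize(products)) <= max_products_tokens
    )


print(tokenize("[Na+]~[Cl-]"))  # ['[Na+]', '~', '[Cl-]']
print(within_token_limits("CC(=O)O.OCC>>CC(=O)OCC"))  # True
```

Bracket atoms such as `[Na+]` count as a single token and the fragment bond `~` counts as its own token, so fragment-bonded species use slightly more of the budget than their atom count suggests.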
- __init__(input_file_path='${standardize.output_file_path}', output_file_path='${data.proc_dir}/${data.name}.processed.csv', min_reactants=2, max_reactants=10, max_reactants_tokens=300, min_agents=0, max_agents=0, max_agents_tokens=0, min_products=1, max_products=1, max_products_tokens=200, max_absolute_formal_charge=2, fragment_bond='${common.fragment_bond}', reaction_column_name='${common.reaction_column_name}', keep_intermediate_columns='${common.keep_intermediate_columns}')
- Return type
None
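Several defaults in the `__init__` signature are not literal values but references such as `'${common.fragment_bond}'`, which look like OmegaConf-style interpolations (the syntax used by Hydra configs) pointing at other config sections. The stdlib-only sketch below shows how such a reference resolves against a nested config. It is a hypothetical helper, not the library's or OmegaConf's implementation; the section contents (`/tmp/proc`, `uspto`, `rxn`, the standardized file name) are made-up example values, and there is no cycle detection.

```python
import re
from typing import Any, Dict

# Matches a reference such as "${common.fragment_bond}".
_REF = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def lookup(config: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path like 'data.proc_dir' through nested dicts."""
    node: Any = config
    for key in dotted.split("."):
        node = node[key]
    return node


def resolve(value: Any, config: Dict[str, Any]) -> Any:
    """Resolve ${a.b} references in a value against a nested dict (hypothetical)."""
    if not isinstance(value, str):
        return value
    full = _REF.fullmatch(value)
    if full:
        # A value that is exactly one reference keeps the target's type.
        return resolve(lookup(config, full.group(1)), config)
    # Otherwise substitute each embedded reference as text.
    return _REF.sub(lambda m: str(resolve(lookup(config, m.group(1)), config)), value)


# Hypothetical config sections; values are illustrative only.
config = {
    "common": {
        "fragment_bond": "~",
        "reaction_column_name": "rxn",
        "keep_intermediate_columns": False,
    },
    "data": {"proc_dir": "/tmp/proc", "name": "uspto"},
    "standardize": {"output_file_path": "${data.proc_dir}/${data.name}.standardized.csv"},
}

print(resolve("${data.proc_dir}/${data.name}.processed.csv", config))
# /tmp/proc/uspto.processed.csv
print(resolve("${standardize.output_file_path}", config))
# /tmp/proc/uspto.standardized.csv
```

Resolving whole-string references to the target's own value (rather than its string form) lets a field typed `bool`, such as `keep_intermediate_columns`, receive `False` instead of the text `'False'`.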
Methods
__init__([input_file_path, ...])
Attributes
fragment_bond
input_file_path
keep_intermediate_columns
max_absolute_formal_charge
max_agents
max_agents_tokens
max_products
max_products_tokens
max_reactants
max_reactants_tokens
min_agents
min_products
min_reactants
output_file_path
reaction_column_name