rxn.reaction_preprocessing.config.SplitConfig
- class rxn.reaction_preprocessing.config.SplitConfig(input_file_path='${preprocess.output_file_path}', output_directory='${data.proc_dir}', split_ratio=0.05, reaction_column_name='${common.reaction_column_name}', index_column='products', hash_seed=42, shuffle_seed=42)
Bases: object
Configuration for the split transformation step.
- Fields:
input_file_path: The input file path.
output_directory: The directory containing the files after splitting.
split_ratio: The split ratio between the training set and the test and validation sets.
reaction_column_name: Name of the reaction column in the data file.
index_column: The name of the column used to generate the hash that ensures stable splitting. “products” and “precursors” are also allowed even if they do not exist as columns.
hash_seed: Seed for the hashing function used for splitting.
shuffle_seed: Seed for shuffling the train split.
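The role of index_column, hash_seed, and shuffle_seed can be illustrated with a minimal sketch of hash-based stable splitting. This is not the library's implementation: the function names `assign_split` and `split_records`, the choice of SHA-256, and the assumption that the validation and test sets each receive roughly `split_ratio` of the keys are all illustrative.

```python
import hashlib
import random
from typing import Dict, List


def assign_split(key: str, split_ratio: float = 0.05, hash_seed: int = 42) -> str:
    """Deterministically map a key to 'train', 'validation', or 'test'."""
    # Hash the seed together with the key; the same key always gives the same digest.
    digest = hashlib.sha256(f"{hash_seed}:{key}".encode("utf-8")).digest()
    # Turn the first 8 bytes into a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    # Assumption: validation and test each get about `split_ratio` of the keys.
    if value < split_ratio:
        return "validation"
    if value < 2 * split_ratio:
        return "test"
    return "train"


def split_records(
    records: List[Dict[str, str]],
    index_column: str = "products",
    split_ratio: float = 0.05,
    hash_seed: int = 42,
    shuffle_seed: int = 42,
) -> Dict[str, List[Dict[str, str]]]:
    """Group records by the split assigned to their index_column value."""
    splits: Dict[str, List[Dict[str, str]]] = {"train": [], "validation": [], "test": []}
    for record in records:
        name = assign_split(record[index_column], split_ratio, hash_seed)
        splits[name].append(record)
    # Shuffle only the training split, reproducibly.
    random.Random(shuffle_seed).shuffle(splits["train"])
    return splits
```

Because the assignment depends only on the hashed value and the seed, records sharing the same index_column value always land in the same split, and adding or removing other records does not move them.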
- Parameters
input_file_path (str, default: '${preprocess.output_file_path}')
output_directory (str, default: '${data.proc_dir}')
split_ratio (float, default: 0.05)
reaction_column_name (str, default: '${common.reaction_column_name}')
index_column (str, default: 'products')
hash_seed (int, default: 42)
shuffle_seed (int, default: 42)
- __init__(input_file_path='${preprocess.output_file_path}', output_directory='${data.proc_dir}', split_ratio=0.05, reaction_column_name='${common.reaction_column_name}', index_column='products', hash_seed=42, shuffle_seed=42)
- Parameters
input_file_path (str, default: '${preprocess.output_file_path}')
output_directory (str, default: '${data.proc_dir}')
split_ratio (float, default: 0.05)
reaction_column_name (str, default: '${common.reaction_column_name}')
index_column (str, default: 'products')
hash_seed (int, default: 42)
shuffle_seed (int, default: 42)
- Return type
None
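As a rough illustration of the constructor, the sketch below overrides a few of the documented defaults. To keep it runnable without the package installed, it defines a local stand-in that mirrors the documented signature; writing it as a dataclass is an assumption about the real class. The paths and the column name "rxn" are hypothetical. The `${...}` defaults appear to be interpolation placeholders meant to be resolved by the surrounding configuration system, so explicit values are passed here instead.

```python
from dataclasses import dataclass


@dataclass
class SplitConfig:
    """Local stand-in mirroring the documented signature (not the library class)."""

    input_file_path: str = "${preprocess.output_file_path}"
    output_directory: str = "${data.proc_dir}"
    split_ratio: float = 0.05
    reaction_column_name: str = "${common.reaction_column_name}"
    index_column: str = "products"
    hash_seed: int = 42
    shuffle_seed: int = 42


config = SplitConfig(
    input_file_path="data/processed.csv",  # hypothetical path
    output_directory="data/splits",        # hypothetical directory
    reaction_column_name="rxn",            # hypothetical column name
    split_ratio=0.1,                       # override the 0.05 default
)

# Fields that were not overridden keep their documented defaults.
print(config.index_column, config.hash_seed, config.shuffle_seed)
```

With the package installed, the stand-in would presumably be replaced by importing SplitConfig from rxn.reaction_preprocessing.config and passing the same keyword arguments.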
Methods
__init__([input_file_path, ...])
Attributes
hash_seed
index_column
input_file_path
output_directory
reaction_column_name
shuffle_seed
split_ratio