rxn.reaction_preprocessing.config.SplitConfig

class rxn.reaction_preprocessing.config.SplitConfig(input_file_path='${preprocess.output_file_path}', output_directory='${data.proc_dir}', split_ratio=0.05, reaction_column_name='${common.reaction_column_name}', index_column='products', hash_seed=42, shuffle_seed=42)

Bases: object

Configuration for the split transformation step.

Fields:

  • input_file_path: The input file path.

  • output_directory: The directory containing the files after splitting.

  • split_ratio: The split ratio between the training set and the test and validation sets.

  • reaction_column_name: Name of the reaction column for the data file.

  • index_column: The name of the column used to generate the hash which ensures stable splitting. “products” and “precursors” are also allowed even if they do not exist as columns.

  • hash_seed: Seed for the hashing function used for splitting.

  • shuffle_seed: Seed for shuffling the train split.
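The idea behind a hash-based stable split can be sketched in a few lines of standard-library Python. This is an illustration only, not the library's implementation: the function name `assign_split`, the use of SHA-256, the way the seed is mixed in, and the assumption that the test and validation sets each receive a `split_ratio` share are all hypothetical choices made for the example.

```python
import hashlib


def assign_split(key: str, split_ratio: float = 0.05, hash_seed: int = 42) -> str:
    """Deterministically map a key to 'train', 'validation', or 'test'."""
    # Mixing the seed into the hashed text lets a different seed
    # produce a different partition of the same keys.
    digest = hashlib.sha256(f"{hash_seed}:{key}".encode("utf-8")).digest()
    # Scale the first 8 bytes of the digest to a value between 0 and 1.
    position = int.from_bytes(digest[:8], "big") / 2**64
    # Assumption (not confirmed by the docs): test and validation
    # each take a `split_ratio` share, the rest goes to train.
    if position < split_ratio:
        return "test"
    if position < 2 * split_ratio:
        return "validation"
    return "train"


# Hypothetical product SMILES used as index values; "CCO" appears twice.
products = ["CCO", "CC(=O)O", "c1ccccc1", "CCO"]
splits = [assign_split(p) for p in products]
```

In this sketch the assignment depends only on the key and the seed, so reactions that share the same index value (for example, the same product) always land in the same set, regardless of row order or of rows added later.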

Parameters
  • input_file_path (str, default: '${preprocess.output_file_path}')

  • output_directory (str, default: '${data.proc_dir}')

  • split_ratio (float, default: 0.05)

  • reaction_column_name (str, default: '${common.reaction_column_name}')

  • index_column (str, default: 'products')

  • hash_seed (int, default: 42)

  • shuffle_seed (int, default: 42)

__init__(input_file_path='${preprocess.output_file_path}', output_directory='${data.proc_dir}', split_ratio=0.05, reaction_column_name='${common.reaction_column_name}', index_column='products', hash_seed=42, shuffle_seed=42)
Parameters
  • input_file_path (str, default: '${preprocess.output_file_path}')

  • output_directory (str, default: '${data.proc_dir}')

  • split_ratio (float, default: 0.05)

  • reaction_column_name (str, default: '${common.reaction_column_name}')

  • index_column (str, default: 'products')

  • hash_seed (int, default: 42)

  • shuffle_seed (int, default: 42)

Return type

None
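As a rough usage sketch, the constructor can be called with keyword arguments to override any of the defaults. To keep the example runnable without the package installed, it defines a local stand-in dataclass that mirrors the documented signature; with the package available, the stand-in would be replaced by `from rxn.reaction_preprocessing.config import SplitConfig`. The file paths and the column name `rxn` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SplitConfig:
    """Local stand-in mirroring the documented signature (illustration only)."""

    input_file_path: str = "${preprocess.output_file_path}"
    output_directory: str = "${data.proc_dir}"
    split_ratio: float = 0.05
    reaction_column_name: str = "${common.reaction_column_name}"
    index_column: str = "products"
    hash_seed: int = 42
    shuffle_seed: int = 42


# Override only what differs from the defaults; paths and column name are made up.
cfg = SplitConfig(
    input_file_path="data/reactions.processed.csv",
    output_directory="data/split",
    split_ratio=0.1,
    reaction_column_name="rxn",
    index_column="precursors",
)
```

Fields that are not passed, such as `hash_seed` and `shuffle_seed` here, keep their documented defaults of 42.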

Methods

__init__([input_file_path, ...])


Attributes

hash_seed

index_column

input_file_path

output_directory

reaction_column_name

shuffle_seed

split_ratio