rxn.reaction_preprocessing.stable_data_splitter.StableDataSplitter

class rxn.reaction_preprocessing.stable_data_splitter.StableDataSplitter(reaction_column_name, index_column, split_ratio=0.05, hash_seed=0, shuffle_seed=42)

Bases: object

__init__(reaction_column_name, index_column, split_ratio=0.05, hash_seed=0, shuffle_seed=42)
Parameters
  • reaction_column_name (str) – Name of the reaction column in the data file.

  • index_column (str) – Name of the column used to generate the hash that ensures stable splitting. “products” and “precursors” are also allowed, even if they do not exist as columns.

  • split_ratio (float, default: 0.05) – The split ratio. Defaults to 0.05.

  • hash_seed (int, default: 0) – Seed to use for hashing. The default of 0 corresponds to the default value in the xxhash implementation.

  • shuffle_seed (int, default: 42) – Seed for shuffling the train split.
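
The parameters above describe a hash-based split: each row's split is derived from a hash of its index column, so the same row lands in the same split across runs. The following is a minimal sketch of that idea, not the library's implementation. It assumes that the validation and test sets each receive a fraction split_ratio of the rows, and it uses hashlib from the standard library in place of xxhash. The names assign_split and stable_split are hypothetical.

```python
import hashlib
import random


def assign_split(key: str, split_ratio: float = 0.05, hash_seed: int = 0) -> str:
    """Deterministically map a key to 'train', 'valid', or 'test'."""
    # Hash the key together with the seed (hashlib stands in for xxhash here).
    digest = hashlib.sha256(f"{hash_seed}:{key}".encode("utf-8")).digest()
    # Scale the first 8 bytes of the digest to a float between 0 and 1.
    value = int.from_bytes(digest[:8], "big") / 2**64
    # Assumption: test and validation each take a `split_ratio` share.
    if value < split_ratio:
        return "test"
    if value < 2 * split_ratio:
        return "valid"
    return "train"


def stable_split(keys, split_ratio=0.05, hash_seed=0, shuffle_seed=42):
    """Group keys by split, then shuffle only the training keys."""
    splits = {"train": [], "valid": [], "test": []}
    for key in keys:
        splits[assign_split(key, split_ratio, hash_seed)].append(key)
    # A seeded shuffle keeps the training order reproducible.
    random.Random(shuffle_seed).shuffle(splits["train"])
    return splits
```

Because the assignment depends only on the key and the seed, adding rows to the input does not move existing rows between splits.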

Methods

__init__(reaction_column_name, index_column, ...)

split_file(input_csv, train_csv, valid_csv, ...)

Split an input file into train, validation, and test CSVs.

split_file(input_csv, train_csv, valid_csv, test_csv)

Split an input file into train, validation, and test CSVs.

Parameters
  • input_csv (Union[str, PathLike]) –

  • train_csv (Union[str, PathLike]) –

  • valid_csv (Union[str, PathLike]) –

  • test_csv (Union[str, PathLike]) –

Return type

None
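
A possible way to call split_file, using only the constructor and method signatures documented on this page. The column name "rxn", the output file names, and the helpers split_paths and split_reactions are hypothetical. The import is placed inside the function so that the path helper can be used without the package installed.

```python
from pathlib import Path


def split_paths(output_dir):
    """Build hypothetical output paths for the three splits."""
    output_dir = Path(output_dir)
    return {name: output_dir / f"{name}.csv" for name in ("train", "valid", "test")}


def split_reactions(input_csv, output_dir, reaction_column_name="rxn"):
    """Split input_csv into train, validation, and test CSVs under output_dir."""
    # Imported here so that split_paths works without the package installed.
    from rxn.reaction_preprocessing.stable_data_splitter import StableDataSplitter

    paths = split_paths(output_dir)
    splitter = StableDataSplitter(
        reaction_column_name=reaction_column_name,  # assumed column name
        index_column="products",  # documented as allowed even if not a column
        split_ratio=0.05,
    )
    # split_file returns None; the results are written to the given CSV paths.
    splitter.split_file(
        input_csv=input_csv,
        train_csv=paths["train"],
        valid_csv=paths["valid"],
        test_csv=paths["test"],
    )
    return paths
```

For example, split_reactions("reactions.csv", "splits") would write splits/train.csv, splits/valid.csv, and splits/test.csv, assuming reactions.csv contains the named reaction column and the splits directory exists.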