rxn.reaction_preprocessing.stable_data_splitter.StableDataSplitter

class rxn.reaction_preprocessing.stable_data_splitter.StableDataSplitter(reaction_column_name, index_column, split_ratio=0.05, hash_seed=0, shuffle_seed=42)

Bases: object

Parameters

reaction_column_name (str)
index_column (str)
split_ratio (float, default: 0.05)
hash_seed (int, default: 0)
shuffle_seed (int, default: 42)

__init__(reaction_column_name, index_column, split_ratio=0.05, hash_seed=0, shuffle_seed=42)

Parameters

reaction_column_name (str) – Name of the column containing the reactions in the data file.
index_column (str) – Name of the column whose values are hashed to assign rows to splits, which keeps the split stable. “products” and “precursors” are also accepted even if no columns with these names exist.
split_ratio (float, default: 0.05) – The split ratio.
hash_seed (int, default: 0) – Seed to use for hashing. The default of 0 corresponds to the default seed of the xxhash implementation.
shuffle_seed (int, default: 42) – Seed for shuffling the train split.
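The index_column and hash_seed parameters describe hash-based splitting: each row is routed by a hash of its index value, so identical values always land in the same split regardless of row order or file size. The following is a minimal sketch of that idea, not the library's implementation. It assumes that split_ratio is the fraction of rows sent to each of the test and validation sets (this page does not say), and it uses hashlib.blake2b from the standard library as a stand-in for xxhash, mixing in the seed by prefixing it to the value. The names bucket, assign_split, and split_rows are hypothetical.

```python
import hashlib
import random
from typing import Dict, Iterable, List


def bucket(value: str, hash_seed: int = 0) -> float:
    """Deterministically map a value to a float in [0, 1).

    Stand-in for xxhash: blake2b over "<seed>:<value>".
    """
    data = f"{hash_seed}:{value}".encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    # Keep the top 53 bits so the division is exact and the result stays below 1.0.
    return (int.from_bytes(digest, "big") >> 11) / 2**53


def assign_split(value: str, split_ratio: float = 0.05, hash_seed: int = 0) -> str:
    """Route one index value to "test", "valid", or "train".

    Assumption: split_ratio is the fraction for test and, separately, for valid.
    """
    h = bucket(value, hash_seed)
    if h < split_ratio:
        return "test"
    if h < 2 * split_ratio:
        return "valid"
    return "train"


def split_rows(
    values: Iterable[str],
    split_ratio: float = 0.05,
    hash_seed: int = 0,
    shuffle_seed: int = 42,
) -> Dict[str, List[str]]:
    """Split index values into three lists; only the train list is shuffled."""
    splits: Dict[str, List[str]] = {"train": [], "valid": [], "test": []}
    for value in values:
        splits[assign_split(value, split_ratio, hash_seed)].append(value)
    # The shuffle uses its own seed, independent of the hash seed.
    random.Random(shuffle_seed).shuffle(splits["train"])
    return splits


if __name__ == "__main__":
    values = [f"CC>>C{i}" for i in range(1000)]
    splits = split_rows(values)
    print({name: len(rows) for name, rows in splits.items()})
```

Because the assignment in this sketch depends only on the hashed value, rows that share an index value (for example, identical products) always end up together, which avoids leaking them across splits; changing hash_seed changes which values fall into each split.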

Methods

__init__(reaction_column_name, index_column, ...)

Initialize the splitter.

split_file(input_csv, train_csv, valid_csv, ...)

Split an input file into train, validation, and test CSVs.

split_file(input_csv, train_csv, valid_csv, test_csv)

Split an input file into train, validation, and test CSVs.

Parameters

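input_csv (Union[str, PathLike]) – Path to the CSV file to split.
train_csv (Union[str, PathLike]) – Path where the train split is written.
valid_csv (Union[str, PathLike]) – Path where the validation split is written.
test_csv (Union[str, PathLike]) – Path where the test split is written.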

Return type

None
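A minimal file-level sketch of what a method like split_file might do, under stated assumptions: it reads the input CSV, routes each row by hashing its index_column value, shuffles only the train rows, and writes three CSVs that keep the original header. The function name split_csv_file is hypothetical, hashlib.blake2b stands in for xxhash, and split_ratio is assumed to be the fraction for each of test and validation.

```python
import csv
import hashlib
import random
from os import PathLike
from typing import Dict, List, Union

PathType = Union[str, PathLike]


def _bucket(value: str, hash_seed: int) -> float:
    """Map a value to [0, 1); blake2b stands in for xxhash (seed prefixed to the value)."""
    data = f"{hash_seed}:{value}".encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    # Keep the top 53 bits so the division is exact and the result stays below 1.0.
    return (int.from_bytes(digest, "big") >> 11) / 2**53


def split_csv_file(
    input_csv: PathType,
    train_csv: PathType,
    valid_csv: PathType,
    test_csv: PathType,
    index_column: str,
    split_ratio: float = 0.05,
    hash_seed: int = 0,
    shuffle_seed: int = 42,
) -> None:
    """Hypothetical analogue of split_file for a CSV file with a header row."""
    rows: Dict[str, List[dict]] = {"train": [], "valid": [], "test": []}
    with open(input_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        if fieldnames is None:
            raise ValueError(f"{input_csv} has no header row")
        for row in reader:
            h = _bucket(row[index_column], hash_seed)
            # Assumption: split_ratio is the fraction for test and, separately, for valid.
            if h < split_ratio:
                rows["test"].append(row)
            elif h < 2 * split_ratio:
                rows["valid"].append(row)
            else:
                rows["train"].append(row)

    # Only the train split is shuffled, using its own seed.
    random.Random(shuffle_seed).shuffle(rows["train"])

    for name, path in (("train", train_csv), ("valid", valid_csv), ("test", test_csv)):
        with open(path, "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows[name])
```

Unlike the documented method, this sketch requires index_column to be an actual header in the file; it does not handle the “products” and “precursors” pseudo-columns.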