rxn.reaction_preprocessing.stable_data_splitter.StableDataSplitter
- class rxn.reaction_preprocessing.stable_data_splitter.StableDataSplitter(reaction_column_name, index_column, split_ratio=0.05, hash_seed=0, shuffle_seed=42)
Bases: object
- Parameters
  reaction_column_name (str)
  index_column (str)
  split_ratio (float, default: 0.05)
  hash_seed (int, default: 0)
  shuffle_seed (int, default: 42)
- __init__(reaction_column_name, index_column, split_ratio=0.05, hash_seed=0, shuffle_seed=42)
- Parameters
  reaction_column_name (str) – Name of the reaction column in the data file.
  index_column (str) – Name of the column used to generate the hash that ensures stable splitting. “products” and “precursors” are also allowed even if they do not exist as columns.
  split_ratio (float, default: 0.05) – The split ratio.
  hash_seed (int, default: 0) – Seed to use for hashing. The default of 0 corresponds to the default value in the xxhash implementation.
  shuffle_seed (int, default: 42) – Seed for shuffling the train split.
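The idea of hash-based stable splitting can be illustrated with a minimal sketch. This is not the library's implementation: the function name assign_split is hypothetical, hashlib.blake2b stands in for xxhash so that the example runs with the standard library only, and the way split_ratio is applied (one slice of that size for validation and one for test) is an assumption made for illustration.

```python
import hashlib


def assign_split(key: str, split_ratio: float = 0.05, hash_seed: int = 0) -> str:
    """Deterministically map a key to 'train', 'valid' or 'test' (illustrative only)."""
    # blake2b is a stand-in for xxhash; the seed is passed as the salt.
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=hash_seed.to_bytes(8, "little"),
    ).digest()
    # Scale the 64-bit hash value to a fraction of the hash range.
    fraction = int.from_bytes(digest, "little") / 2**64
    # Assumption: validation and test each receive a split_ratio-sized slice.
    if fraction < split_ratio:
        return "valid"
    if fraction < 2 * split_ratio:
        return "test"
    return "train"


# The same key always lands in the same split, whatever else is in the file.
for products in ["CCO", "c1ccccc1", "CC(=O)O"]:
    print(products, assign_split(products))
```

Because the assignment depends only on the hashed value and the seed, adding or removing rows does not move existing rows between splits, which is what makes the split stable.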
Methods
  __init__(reaction_column_name, index_column)
  split_file(input_csv, train_csv, valid_csv, ...) – Split an input file into train, validation, and test CSVs.