rxn.onmt_models.training_files.RxnPreprocessingFiles

class rxn.onmt_models.training_files.RxnPreprocessingFiles(processed_data_dir)[source]

Bases: object

Class to make it easy to get the names/paths of the files generated during data preprocessing.

This assumes that the default paths were used when calling rxn-data-pipeline.

Parameters

processed_data_dir (Union[str, PathLike]) – directory containing the files generated during data preprocessing.

__init__(processed_data_dir)[source]
Parameters

processed_data_dir (Union[str, PathLike]) – directory containing the files generated during data preprocessing.

Methods

__init__(processed_data_dir)

augmented(data_path)

Get the path for the augmented version of a data file.

get_context_src_for_split(split)

get_context_tags_for_split(split)

get_context_tgt_for_split(split)

get_precursors_for_split(split)

get_processed_csv_for_split(split)

get_products_for_split(split)

get_src_file(split, model_task)

Get the source file for the given task.

get_tgt_file(split, model_task)

Get the target file for the given task.

Attributes

FILENAME_ROOT

processed_csv

Return type: Path

processed_test_csv

Return type: Path

processed_train_csv

Return type: Path

processed_validation_csv

Return type: Path

standardized_csv

Return type: Path

test_precursors

Return type: Path

test_products

Return type: Path

train_precursors

Return type: Path

train_products

Return type: Path

validation_precursors

Return type: Path

validation_products

Return type: Path
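
The per-split attributes (for example train_precursors) have method counterparts that take the split as an argument (for example get_precursors_for_split). One plausible way to relate the two is sketched below. The class name PreprocessingFilesSketch, the FILENAME_ROOT value, and the file naming scheme are assumptions made for illustration only; they are not taken from rxn-onmt-models.

```python
from pathlib import Path


class PreprocessingFilesSketch:
    """Illustrative stand-in; NOT the rxn-onmt-models implementation."""

    # Hypothetical value; the real FILENAME_ROOT is not documented on this page.
    FILENAME_ROOT = "data"

    def __init__(self, processed_data_dir):
        # Accept str or os.PathLike, mirroring the documented parameter type.
        self.processed_data_dir = Path(processed_data_dir)

    def get_precursors_for_split(self, split: str) -> Path:
        # Hypothetical naming scheme, chosen only for this example.
        return self.processed_data_dir / f"{self.FILENAME_ROOT}.{split}.precursors"

    @property
    def train_precursors(self) -> Path:
        # Per-split attribute expressed as a thin wrapper over the getter.
        return self.get_precursors_for_split("train")


files = PreprocessingFilesSketch("processed")
print(files.train_precursors)
```

With a layout like this, code that loops over splits can call the getter with the split name, while code that needs a single known file can read the attribute.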

static augmented(data_path)[source]

Get the path for the augmented version of a data file.

Parameters

data_path (Path) – path of the data file for which to get the augmented version.

Return type

Path
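
Because augmented is a static method that maps one Path to another, it can be pictured as a pure path transformation. The following is a minimal sketch of that idea, not the library's code: the function name augmented_sketch and the ".augmented" suffix are hypothetical, and the real naming rule may differ.

```python
from pathlib import Path


def augmented_sketch(data_path: Path, suffix: str = ".augmented") -> Path:
    """Return a sibling path whose file name has `suffix` appended.

    The suffix is an assumption for illustration; the naming used by
    rxn-onmt-models is not documented on this page.
    """
    # with_name keeps the parent directory and swaps only the file name.
    return data_path.with_name(data_path.name + suffix)


original = Path("processed") / "data.train.txt"
print(augmented_sketch(original))
```

No file is created or read here; the function only computes where the augmented file would be located.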

get_src_file(split, model_task)[source]

Get the source file for the given task.

Note: the file is tokenized for the forward and retro tasks, but not for the context task.

Parameters
  • split (str) – data split for which to get the file.

  • model_task (str) – task for which to get the file (forward, retro, or context).

Return type

Path

get_tgt_file(split, model_task)[source]

Get the target file for the given task.

Note: the file is tokenized for the forward and retro tasks, but not for the context task.

Parameters
  • split (str) – data split for which to get the file.

  • model_task (str) – task for which to get the file (forward, retro, or context).

Return type

Path
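
The source and target files depend on the task, which suggests a lookup from task name to file kind. The sketch below shows one way such a dispatch could be written. It rests on several assumptions: that the task strings are "forward", "retro", and "context"; that the forward task reads precursors and predicts products while the retro task does the reverse; and that the file name templates look as shown. None of these details are confirmed by this page, and src_and_tgt_sketch is a hypothetical helper rather than part of rxn-onmt-models.

```python
from pathlib import Path
from typing import Tuple

# Hypothetical file name templates, keyed by file kind.
_FILE_TEMPLATES = {
    "precursors": "data.{split}.precursors",
    "products": "data.{split}.products",
    "context_src": "data.{split}.context.src",
    "context_tgt": "data.{split}.context.tgt",
}

# Assumed mapping: task name -> (source kind, target kind).
_TASK_TO_KINDS = {
    "forward": ("precursors", "products"),
    "retro": ("products", "precursors"),
    "context": ("context_src", "context_tgt"),
}


def src_and_tgt_sketch(root: Path, split: str, model_task: str) -> Tuple[Path, Path]:
    """Return (source path, target path) for a split and task."""
    try:
        src_kind, tgt_kind = _TASK_TO_KINDS[model_task]
    except KeyError:
        # Reject unknown tasks explicitly instead of returning a wrong path.
        raise ValueError(f"Unsupported task: {model_task!r}") from None
    src = root / _FILE_TEMPLATES[src_kind].format(split=split)
    tgt = root / _FILE_TEMPLATES[tgt_kind].format(split=split)
    return src, tgt


src, tgt = src_and_tgt_sketch(Path("processed"), "train", "retro")
print(src, tgt)
```

Keeping the task mapping in a single table means the source and target lookups cannot drift apart, and an unsupported task fails immediately with a clear error.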