rxn.chemutils.tokenization.ensure_tokenized_file

rxn.chemutils.tokenization.ensure_tokenized_file(file, postfix='.tokenized', fallback_value='')

Ensure that a file is tokenized: do nothing if the file is already tokenized, create a tokenized copy otherwise.

Parameters
  • file (Union[str, PathLike]) – path to the file that must be tokenized.

  • postfix (str, default: '.tokenized') – postfix to add to the tokenized copy (if applicable).

  • fallback_value (str, default: '') – placeholder for strings that cannot be tokenized (if applicable).

Return type

str

Returns

The path to the tokenized file: the original path if the file was already tokenized, otherwise the path to the newly created copy.
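
Example

The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation. The helper names (tokenize_line, looks_tokenized, ensure_tokenized_file_sketch), the SMILES token regex, and the rule of inspecting only the first non-empty line are all assumptions made for illustration. The sketch also assumes that the postfix is appended to the full original path.

```python
import os
import re
from typing import Union

# Widely used regex for splitting SMILES into tokens. Assumption: used here
# as a stand-in for whatever tokenizer the library applies internally.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@"
    r"|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_line(line: str, fallback_value: str = "") -> str:
    """Space-separate the tokens of one line, or return the fallback."""
    tokens = SMILES_TOKEN.findall(line)
    # If the tokens do not rebuild the input exactly, treat it as untokenizable.
    if "".join(tokens) != line:
        return fallback_value
    return " ".join(tokens)


def looks_tokenized(file: Union[str, os.PathLike]) -> bool:
    """Hypothetical check: is the first non-empty line already tokenized?"""
    with open(file) as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                # Re-tokenizing the space-free line must reproduce it exactly.
                return tokenize_line(line.replace(" ", "")) == line
    return True  # an empty file needs no tokenization


def ensure_tokenized_file_sketch(
    file: Union[str, os.PathLike],
    postfix: str = ".tokenized",
    fallback_value: str = "",
) -> str:
    """Return `file` if it looks tokenized, else write and return a copy."""
    if looks_tokenized(file):
        return str(file)
    tokenized_copy = str(file) + postfix  # assumption: postfix is appended
    with open(file) as src, open(tokenized_copy, "w") as dst:
        for line in src:
            dst.write(tokenize_line(line.rstrip("\n"), fallback_value) + "\n")
    return tokenized_copy
```

Calling the sketch on a file of raw SMILES writes a sibling file whose name ends with the postfix and returns that path. Calling it again on the returned path leaves the file untouched and returns the same path. Lines that cannot be tokenized are replaced by fallback_value, so the copy keeps the same number of lines as the original.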