rxn.utilities.csv.streaming_csv_editor.StreamingCsvEditor

class rxn.utilities.csv.streaming_csv_editor.StreamingCsvEditor(columns_in, columns_out, transformation, line_terminator='\n')[source]

Bases: object

Edit the content of a CSV with a specified transformation, line-by-line.

This class avoids loading the whole file into memory as would be done with a pandas DataFrame.

__init__(columns_in, columns_out, transformation, line_terminator='\n')[source]
Parameters
  • columns_in (List[str]) – names for the columns acting as input for the transformation.

  • columns_out (List[str]) – names for the columns where to write the result of the transformation.

  • transformation (Callable[..., Any]) – function called on the values from the input columns; its results are written to the output columns. The function should be type-annotated, and the following annotations are admissible:

    • For the parameters:
      • one or several strings

      • a list of strings (with one or more elements)

      • a tuple of strings (with one or more elements)

    • For the return type:
      • one string

      • a list of strings (with one or more elements)

      • a tuple of strings (with one or more elements)

  • line_terminator (str, default: '\n') – line terminator to use for writing the CSV.
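The admissible annotation shapes listed above might look as follows. This is a sketch only: the function names are hypothetical, and it assumes that a fixed-length annotation such as `Tuple[str, str]` counts as "a tuple of strings".

```python
from typing import List, Tuple


def canonical(name: str) -> str:
    """One string in, one string out."""
    return name.strip().lower()


def full_name(first: str, last: str) -> str:
    """Several strings in, one string out."""
    return f"{first} {last}"


def split_name(values: List[str]) -> List[str]:
    """A list of strings in, a list of strings out."""
    first, _, last = values[0].partition(" ")
    return [first, last]


def swap(values: Tuple[str, str]) -> Tuple[str, str]:
    """A tuple of strings in, a tuple of strings out."""
    return (values[1], values[0])
```

Presumably the number of input values matches the length of columns_in and the number of returned values matches the length of columns_out; for example, `split_name` would pair with one input column and two output columns.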

Methods

__init__(columns_in, columns_out, transformation)

process(csv_iterator)

Process and edit a CSV file.

process_paths(path_in, path_out[, verbose])

Process and edit a CSV file.

process(csv_iterator)[source]

Process and edit CSV content provided as a CsvIterator.

Parameters

csv_iterator (CsvIterator) – Input CSV iterator.

Return type

CsvIterator

Returns

a CsvIterator over the edited content.

process_paths(path_in, path_out, verbose=False)[source]

Process and edit a CSV file, reading from path_in and writing the result to path_out.

Parameters
  • path_in (Union[str, PathLike]) – path to the existing CSV.

  • path_out (Union[str, PathLike]) – path to the edited CSV (to be saved).

  • verbose (bool, default: False) – whether to write the progress with tqdm.

Return type

None
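A possible way to use process_paths, based only on the signatures documented above. The helper `uppercase_column`, the transformation `to_upper`, and the column names are hypothetical; whether an output column that is missing from the input is appended or must already exist is not stated here, so treat that part as an assumption.

```python
from pathlib import Path


def to_upper(value: str) -> str:
    """Annotated transformation: one string in, one string out."""
    return value.upper()


def uppercase_column(path_in: Path, path_out: Path) -> None:
    """Write to path_out a copy of path_in with an upper-cased version of "name"."""
    # Imported inside the function so the sketch loads without rxn-utilities installed.
    from rxn.utilities.csv.streaming_csv_editor import StreamingCsvEditor

    editor = StreamingCsvEditor(
        columns_in=["name"],  # hypothetical input column
        columns_out=["name_upper"],  # hypothetical output column
        transformation=to_upper,
    )
    # verbose=True would report progress with tqdm, per the parameter description.
    editor.process_paths(path_in, path_out, verbose=False)
```

Calling `uppercase_column(Path("in.csv"), Path("out.csv"))` would then process the file row by row instead of loading it whole.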