RXN package for OpenNMT-related utilities

This repository contains OpenNMT-related utilities used in the RXN universe. For general utilities not related to OpenNMT, see our other repository rxn-utilities.

Links:

  • GitHub repository

  • Documentation

  • PyPI package

System Requirements

This package is supported on all operating systems. It has been tested on the following systems:

  • macOS: Big Sur (11.1)

  • Linux: Ubuntu 18.04.4

Python 3.6, 3.7, or 3.8 is recommended. Python 3.9 and above are not expected to work because of compatibility constraints with the selected version of OpenNMT.
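
If you want to catch an unsuitable interpreter early, a small guard such as the following can help. This is only a sketch: the helper name is_recommended_python is hypothetical and not part of the package, and the version bounds are taken from the recommendation above.

```python
import sys
from typing import Sequence


def is_recommended_python(version: Sequence[int]) -> bool:
    """Return True for Python 3.6, 3.7 and 3.8 (hypothetical helper)."""
    major_minor = tuple(version[:2])
    return (3, 6) <= major_minor <= (3, 8)


# Warn, rather than fail, when the running interpreter is outside the range.
if not is_recommended_python(sys.version_info):
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the recommended 3.6 to 3.8 range."
    )
```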

Installation guide

The package can be installed from PyPI:

pip install rxn-onmt-utils

For local development, the package can be installed with:

pip install -e ".[dev]"

Package highlights

Translate with OpenNMT from Python code

By importing the following,

from rxn.onmt_utils import Translator, translate

you can run OpenNMT translations directly from Python. The translate function operates on input and output files, while the Translator class works on strings directly, which gives you more flexibility in how you obtain translation results.
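
To illustrate how the file-based and string-based styles relate, here is a minimal sketch that builds a file-to-file helper on top of any string-to-string translation callable. It does not call rxn.onmt_utils, since the exact signatures of translate and Translator are not shown here. The names translate_file_with and dummy_translate are hypothetical, and the dummy function merely stands in for a real model.

```python
from pathlib import Path
from typing import Callable, List


def translate_file_with(
    translate_strings: Callable[[List[str]], List[str]],
    src_path: Path,
    out_path: Path,
) -> int:
    """Translate a file line by line with a string-based callable.

    Hypothetical helper: reads one input per line, writes one prediction
    per line, and returns the number of lines written.
    """
    sources = src_path.read_text(encoding="utf-8").splitlines()
    predictions = translate_strings(sources)
    if len(predictions) != len(sources):
        raise ValueError("Expected exactly one prediction per input line.")
    out_path.write_text(
        "".join(f"{prediction}\n" for prediction in predictions),
        encoding="utf-8",
    )
    return len(predictions)


def dummy_translate(sentences: List[str]) -> List[str]:
    """Stand-in for a real model: reverses the token order of each line."""
    return [" ".join(reversed(sentence.split())) for sentence in sentences]
```

In real use, the callable passed as translate_strings would wrap a loaded translation model instead of dummy_translate.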

Stripping models

The script rxn-strip-opennmt-model, installed with the package, lets you reduce the size of model checkpoints by about two thirds by removing the optimizer state. This is safe as long as you do not need to resume training from these checkpoints.
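
A rough illustration of where the two thirds come from: an Adam-style optimizer keeps two running averages per parameter, so its state is about twice the size of the weights themselves. The sketch below is not the script's actual implementation. It assumes that the checkpoint is a dictionary holding the optimizer state under an "optim" key, and it uses plain Python lists in place of tensors.

```python
from typing import Any, Dict


def strip_optimizer_state(
    checkpoint: Dict[str, Any], optimizer_key: str = "optim"
) -> Dict[str, Any]:
    """Return a shallow copy of the checkpoint without the optimizer state.

    The key name "optim" is an assumption about the checkpoint layout.
    """
    return {k: v for k, v in checkpoint.items() if k != optimizer_key}


def count_values(obj: Any) -> int:
    """Count the numbers stored in nested dicts of lists (toy size measure)."""
    if isinstance(obj, dict):
        return sum(count_values(value) for value in obj.values())
    if isinstance(obj, list):
        return len(obj)
    return 0


# Toy checkpoint: weights plus two Adam-style moment estimates per weight.
n_params = 1000
toy_checkpoint = {
    "model": [0.0] * n_params,
    "optim": {"exp_avg": [0.0] * n_params, "exp_avg_sq": [0.0] * n_params},
}

stripped = strip_optimizer_state(toy_checkpoint)
saved_fraction = 1 - count_values(stripped) / count_values(toy_checkpoint)

# With a real PyTorch checkpoint, one would load it with
# torch.load(path, map_location="cpu") and write the result with torch.save.
```

In this toy setting, saved_fraction comes out at two thirds. Real checkpoints also contain the vocabulary and training options, so the actual saving is approximate.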

Extending the model vocabulary

If you fine-tune a model on a dataset that contains tokens absent from the base model's vocabulary, you will need to extend the model weights accordingly. The script rxn-extend-model-with-vocab does that for you.
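
Conceptually, extending the weights means adding one embedding row per new token while leaving the existing rows untouched. The sketch below shows this idea on plain Python lists. The function extend_embeddings is hypothetical, and the small random initialization is an assumption rather than a description of what the script does.

```python
import random
from typing import List, Sequence, Tuple


def extend_embeddings(
    vocab: Sequence[str],
    weights: Sequence[Sequence[float]],
    new_tokens: Sequence[str],
    scale: float = 0.01,
    seed: int = 0,
) -> Tuple[List[str], List[List[float]]]:
    """Append unseen tokens to the vocabulary and add one row per new token.

    Existing rows are copied unchanged; new rows receive small random values
    (an assumed initialization scheme).
    """
    rng = random.Random(seed)
    known = set(vocab)
    extended_vocab = list(vocab)
    extended_weights = [list(row) for row in weights]
    dim = len(weights[0]) if weights else 0

    for token in new_tokens:
        if token in known:
            continue  # Already covered by the base model.
        known.add(token)
        extended_vocab.append(token)
        extended_weights.append([rng.gauss(0.0, scale) for _ in range(dim)])

    return extended_vocab, extended_weights


base_vocab = ["C", "O", "N"]
base_weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
new_vocab, new_weights = extend_embeddings(
    base_vocab, base_weights, ["O", "Br", "Cl"]
)
```

In an actual model, the output layer that predicts tokens would need the same kind of extension so that the new tokens can also be generated.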

© Copyright IBM RXN team 2023.
