rxnmapper package

Submodules

rxnmapper.attention module

Attention handling.

class rxnmapper.attention.AttentionScorer(rxn_smiles: str, tokens: List[str], attentions: numpy.ndarray, special_tokens: List[str] = ['[CLS]', '[SEP]'], attention_multiplier: float = 90.0, mask_mapped_product_atoms: bool = True, mask_mapped_reactant_atoms: bool = True)

Bases: object

property adjacency_matrix_products
property adjacency_matrix_reactants
property adjacent_atom_attentions
property atom_attentions
atom_num(token_ind) → int

Get the atom number corresponding to a token

property atom_type_masked_attentions
property combined_attentions

Combine the product-to-reactant (pxr) and reactant-to-product (rxp) attentions.
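As a hedged illustration only, one simple way to combine the two attention directions is to average pxr with the transpose of rxp. The sketch assumes pxr has shape (n_product_tokens, n_reactant_tokens) and rxp has the transposed shape. The averaging rule and the helper names `transpose` and `combine_attentions` are assumptions, not taken from this reference. Plain lists stand in for numpy arrays so the sketch has no dependencies.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a rectangular list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def combine_attentions(pxr: Matrix, rxp: Matrix) -> Matrix:
    """Average product->reactant attention with the transposed reactant->product attention.

    Assumption (not from the reference): pxr is (n_products x n_reactants) and
    rxp is (n_reactants x n_products), and the combination is a plain mean.
    """
    rxp_t = transpose(rxp)
    if len(pxr) != len(rxp_t) or any(len(a) != len(b) for a, b in zip(pxr, rxp_t)):
        raise ValueError("pxr and the transpose of rxp must have the same shape")
    return [[(a + b) / 2 for a, b in zip(row_p, row_r)] for row_p, row_r in zip(pxr, rxp_t)]


print(combine_attentions([[0.8, 0.2], [0.1, 0.9]], [[0.6, 0.3], [0.4, 0.7]]))
```

Transposing rxp first lets the two directions be compared entry by entry, so that entry (i, j) always refers to product token i and reactant token j.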

property combined_attentions_filt

Combine pxr_filt and rxp_filt

property combined_attentions_filt_atoms

Combine pxr_filt_atoms and rxp_filt_atoms

property combined_attentions_filt_atoms_same_type

Combine pxr_filt_atoms and rxp_filt_atoms, keeping only attentions between atoms of the same type

generate_attention_guided_pxr_atom_mapping()

Generate an attention-guided product-to-reactant atom mapping.

Parameters
  • zero_set_p – If True, the attention from product atoms that are already assigned to an atom in the reactants is set to 0.

  • zero_set_r – If True, the attention to reactant atoms that are already assigned to an atom in the products is set to 0.
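The loop below is a simplified, hypothetical sketch of a greedy attention-guided mapping, included only to make zero_set_p and zero_set_r concrete. It assumes a plain product-by-reactant attention matrix and repeatedly takes the global maximum. It does not model attention_multiplier, atom-type masking, or any other detail of the actual implementation, and the name `greedy_pxr_mapping` is made up.

```python
from typing import Dict, List


def greedy_pxr_mapping(
    pxr: List[List[float]], zero_set_p: bool = True, zero_set_r: bool = True
) -> Dict[int, int]:
    """Hypothetical greedy mapping; rows are product atoms, columns are reactant atoms."""
    att = [list(row) for row in pxr]  # copy so the caller's matrix is not modified
    mapping: Dict[int, int] = {}
    for _ in range(len(att)):
        # Global argmax over all (product, reactant) pairs.
        best = max(
            ((p, r) for p, row in enumerate(att) for r in range(len(row))),
            key=lambda pr: att[pr[0]][pr[1]],
            default=None,
        )
        if best is None or att[best[0]][best[1]] <= 0:
            break  # no positive attention left to assign
        p, r = best
        mapping[p] = r
        if zero_set_p:
            att[p] = [0.0] * len(att[p])  # this product atom cannot be picked again
        if zero_set_r:
            for row in att:
                row[r] = 0.0  # this reactant atom cannot be picked again
    return mapping


pxr = [[0.9, 0.1, 0.0], [0.8, 0.7, 0.0], [0.1, 0.2, 0.6]]
print(greedy_pxr_mapping(pxr))                    # {0: 0, 1: 1, 2: 2}
print(greedy_pxr_mapping(pxr, zero_set_r=False))  # {0: 0, 1: 0, 2: 2}
```

With both flags set, each product atom and each reactant atom is used at most once. With zero_set_r=False, several product atoms can end up mapped to the same reactant atom, as the second call shows.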

get_atom_type_mask()
get_neighboring_atoms(atom_num)

Get the atom numbers of the atoms neighboring the given atom

get_neighboring_attentions(atom_num) → numpy.ndarray

Get a vector of shape (n_atoms,) representing the neighboring attentions to an atom number.

Non-zero attentions are the attentions of neighboring atoms
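The following is a hypothetical sketch of both neighbour helpers. It assumes the molecule's adjacency matrix is given as a 0/1 list of lists and that there is one attention value per atom. The function names and inputs are illustrative and do not reflect the class's actual internals.

```python
from typing import List


def neighboring_atoms(adjacency: List[List[int]], atom_num: int) -> List[int]:
    """Atom numbers bonded to atom_num (non-zero entries in its adjacency row)."""
    return [j for j, bonded in enumerate(adjacency[atom_num]) if bonded and j != atom_num]


def neighboring_attentions(
    adjacency: List[List[int]], attentions: List[float], atom_num: int
) -> List[float]:
    """Vector of shape (n_atoms,): attentions of neighbours of atom_num, 0.0 elsewhere."""
    neighbors = set(neighboring_atoms(adjacency, atom_num))
    return [a if j in neighbors else 0.0 for j, a in enumerate(attentions)]


adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # a three-atom chain: 0-1-2
print(neighboring_atoms(adjacency, 1))                        # [0, 2]
print(neighboring_attentions(adjacency, [0.2, 0.5, 0.3], 1))  # [0.2, 0.0, 0.3]
```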

get_precursors_atom_types()
get_product_atom_types()
is_atom(token_ind) → int

Check whether token is an atom

property pnums

Get atom numbers for just the product tokens

property pnums_atoms

Get atom numbers for just the product tokens, keeping only atom tokens

property pnums_filt

Get atom numbers for just the product tokens, without the SEP

property ptokens
property ptokens_filt

Product tokens without special tokens

property pxr
property pxr_filt

PXR without the special tokens

property pxr_filt_atoms

PXR restricted to atom tokens

property rnums

Get atom numbers for just the reactant tokens

property rnums_atoms

Get atom numbers for just the reactant tokens, keeping only atom tokens

property rnums_filt

Get atom numbers for just the reactant tokens, without the CLS

property rtokens
property rtokens_filt

Reactant tokens without special tokens

property rxp
property rxp_filt

RXP without the special tokens

property rxp_filt_atoms

RXP restricted to atom tokens

token_ind(atom_num) → int

Get token index from an atom number

Note that this is not a lossless mapping: -1 is mapped to (length of the original tokens) - 1.
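The following small sketch shows how an atom-number lookup and its inverse can behave as described. It assumes the per-token atom numbers are already known, with -1 for non-atom tokens. Building the inverse as a dictionary in which later tokens overwrite earlier ones reproduces the lossy -1 behaviour. Whether the library does it exactly this way is an assumption, and `token_atom_nums` is a hypothetical input.

```python
from typing import Dict, List

# Hypothetical per-token atom numbers for ['[CLS]', 'C', '.', 'C', 'O', '[SEP]'];
# -1 marks tokens that are not atoms.
token_atom_nums: List[int] = [-1, 0, -1, 1, 2, -1]


def atom_num(token_ind: int) -> int:
    """Atom number of the token at token_ind (-1 if it is not an atom)."""
    return token_atom_nums[token_ind]


# Later indices overwrite earlier ones, so all non-atom tokens collapse onto
# the key -1, which ends up pointing at the last token: len(tokens) - 1.
_token_of_atom: Dict[int, int] = {num: ind for ind, num in enumerate(token_atom_nums)}


def token_ind(num: int) -> int:
    """Token index of atom number num; lossy for -1."""
    return _token_of_atom[num]


print(token_ind(1), atom_num(token_ind(1)))  # 3 1
print(token_ind(-1))                         # 5
```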

rxnmapper.core module

Core RXN Attention Mapper module.

class rxnmapper.core.RXNMapper(config: Dict = {}, logger: Optional[logging.Logger] = None)

Bases: object

Maps product atoms to reactant atoms using attention weights.

convert_batch_to_attns(rxn_smiles_list: List[str])
get_attention_guided_atom_maps(rxns: List[str], zero_set_p: bool = True, zero_set_r: bool = True, detailed_output: bool = False)

rxnmapper.smiles_utils module

SMILES utils.

exception rxnmapper.smiles_utils.NotCanonicalizableSmilesException

Bases: ValueError

rxnmapper.smiles_utils.canonicalize_and_atom_map(smi, with_replacements=False, return_equivalent_atoms=False)

Remove atom mapping, canonicalize and return mapping numbers in order of canonicalization.

rxnmapper.smiles_utils.canonicalize_smi(smi, remove_atom_mapping=False)
rxnmapper.smiles_utils.generate_atom_mapped_reaction_atoms(rxn, product_atom_maps, expected_atom_maps=None)
rxnmapper.smiles_utils.get_adjacency_matrix(smiles)

Compute the adjacency matrix between atoms. Only works for single molecules at the moment, not for reactions.

Parameters

smiles – SMILES string of a single molecule

rxnmapper.smiles_utils.get_atom_tokens_mask(smiles, special_tokens: List[str] = [])

Return a mask for a SMILES string, where atom tokens are converted to 1 and other tokens to 0.

e.g. c1ccncc1 would give [1, 0, 1, 1, 1, 1, 1, 0]

Parameters
  • smiles – SMILES string of the reaction

  • special_tokens – Any special tokens to explicitly not call an atom. E.g. “[CLS]” or “[SEP]”
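Here is a dependency-free sketch of such a mask. It assumes tokenization with the default BasicSmilesTokenizer pattern listed in this reference, plus an assumed atom rule: a token is an atom if it starts with a letter or '[' and is not a special token. The helper names are hypothetical. The sketch reproduces the c1ccncc1 example above.

```python
import re
from typing import List, Sequence

# Default pattern copied from the BasicSmilesTokenizer signature in this reference.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
)


def is_atom_token(token: str, special_tokens: Sequence[str] = ()) -> bool:
    """Assumed rule: atoms start with a letter or '[' and are not special tokens."""
    return token not in special_tokens and (token[0].isalpha() or token[0] == "[")


def atom_tokens_mask(smiles: str, special_tokens: Sequence[str] = ()) -> List[int]:
    """1 for atom tokens, 0 for everything else (ring digits, bonds, branches, ...)."""
    tokens = SMILES_REGEX.findall(smiles)
    return [1 if is_atom_token(t, special_tokens) else 0 for t in tokens]


print(atom_tokens_mask("c1ccncc1"))  # [1, 0, 1, 1, 1, 1, 1, 0]
```

Passing special tokens such as "[CLS]" matters because the bracket alternative of the regex would otherwise treat them like bracket atoms.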

rxnmapper.smiles_utils.get_atom_types(smiles)

Generate a list of the atom types in the given SMILES string.

rxnmapper.smiles_utils.get_atom_types_smiles(smiles)
rxnmapper.smiles_utils.get_graph_distance_matrix(smiles)

Compute the graph distance matrix between atoms. Only works for single molecules at the moment, not for reactions.

Parameters

smiles – SMILES string of a single molecule
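Assuming an adjacency matrix is already available (for instance from get_adjacency_matrix), all pairwise graph distances can be obtained with a breadth-first search from every atom. This is a swapped-in, dependency-free technique for illustration only, and the library function may compute the matrix differently. Atoms in disconnected fragments get a distance of infinity in this sketch.

```python
from collections import deque
from typing import List


def graph_distance_matrix(adjacency: List[List[int]]) -> List[List[float]]:
    """All-pairs shortest path lengths (in bonds) via BFS; unreachable pairs are inf."""
    n = len(adjacency)
    dist = [[float("inf")] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0.0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                # Visit each bonded, not-yet-reached atom exactly once per source.
                if adjacency[u][v] and dist[source][v] == float("inf"):
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


print(graph_distance_matrix([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
# [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
```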

rxnmapper.smiles_utils.get_mask_for_tokens(tokens, special_tokens: List[str] = [])

Return a mask for a tokenized SMILES string, where atom tokens are converted to 1 and other tokens to 0.

e.g. the tokens of c1ccncc1 would give [1, 0, 1, 1, 1, 1, 1, 0]

Parameters
  • tokens – List of tokens of a tokenized SMILES string

  • special_tokens – Any special tokens to explicitly not call an atom. E.g. “[CLS]” or “[SEP]”

rxnmapper.smiles_utils.is_atom(token: str, special_tokens: List[str] = [])

Determine whether a token is an atom.

Parameters
  • token – Token fed into the transformer model

  • special_tokens – List of tokens to consider as non-atoms (often introduced by tokenizer)

Returns

True if atom, False if not

Return type

bool

rxnmapper.smiles_utils.is_mol_end(a: str, b: str)
rxnmapper.smiles_utils.number_tokens(tokens: List[str], special_tokens=['[CLS]', '[SEP]'])

Map list of tokens to a list of numbered atoms

E.g., ['[CLS]', 'C', '.', 'C', 'C', 'C', 'C', 'C', 'C'] => [-1, 0, -1, 1, 2, 3, 4, 5, 6]
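The following minimal sketch reproduces the example above. It assumes a simple atom rule: a token counts as an atom if it starts with a letter or '[' and is not a special token. Both the rule and the implementation are assumptions, not the library's code.

```python
from typing import List, Sequence


def number_tokens(
    tokens: Sequence[str], special_tokens: Sequence[str] = ("[CLS]", "[SEP]")
) -> List[int]:
    """Number atom tokens 0, 1, 2, ... in order; every other token gets -1."""
    numbers: List[int] = []
    next_atom = 0
    for token in tokens:
        is_atom = token not in special_tokens and (token[0].isalpha() or token[0] == "[")
        if is_atom:
            numbers.append(next_atom)
            next_atom += 1
        else:
            numbers.append(-1)
    return numbers


print(number_tokens(["[CLS]", "C", ".", "C", "C", "C", "C", "C", "C"]))
# [-1, 0, -1, 1, 2, 3, 4, 5, 6]
```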

rxnmapper.smiles_utils.process_reaction(rxn, fragments='', fragment_bond='~')
rxnmapper.smiles_utils.process_reaction_with_product_maps_atoms(rxn, with_replacements=False, skip_if_not_in_precursors=False)
rxnmapper.smiles_utils.split_into_mols(tokens: List[str])

Split a tokenized reaction SMILES into molecules
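The following is a hypothetical sketch of this kind of split. It assumes the molecules in a tokenized reaction SMILES are delimited by the '.', '>' and '>>' tokens, which the tokenizer pattern in this reference produces as separate tokens. The delimiter set, and the choice to drop the delimiter tokens, are assumptions.

```python
from typing import List

# Assumed molecule delimiters among reaction SMILES tokens.
MOL_DELIMITERS = {".", ">", ">>"}


def split_into_mols(tokens: List[str]) -> List[List[str]]:
    """Group tokens into molecules, dropping the delimiter tokens themselves."""
    mols: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        if token in MOL_DELIMITERS:
            if current:
                mols.append(current)
            current = []
        else:
            current.append(token)
    if current:
        mols.append(current)
    return mols


print(split_into_mols(["C", "C", ".", "O", ">>", "C", "C", "O"]))
# [['C', 'C'], ['O'], ['C', 'C', 'O']]
```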

rxnmapper.smiles_utils.tok_mask(tokens)
rxnmapper.smiles_utils.tokenize(smiles)

Tokenize a SMILES molecule or reaction

rxnmapper.smiles_utils.tokens_to_adjacency(tokens: List[str])
rxnmapper.smiles_utils.tokens_to_smiles(tokens: List[str], special_tokens: List[str])

Combine tokens into valid SMILES string, filtering out special tokens
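Assuming special tokens are simply dropped and the remaining tokens concatenated, the operation reduces to a one-line join. The sketch below is hypothetical, not the library source.

```python
from typing import List, Sequence


def tokens_to_smiles(tokens: List[str], special_tokens: Sequence[str]) -> str:
    """Concatenate tokens, skipping special tokens such as [CLS] and [SEP]."""
    return "".join(token for token in tokens if token not in special_tokens)


print(tokens_to_smiles(["[CLS]", "C", "C", "O", "[SEP]"], ["[CLS]", "[SEP]"]))  # CCO
```

Because a regex tokenizer of this kind keeps every matched character inside some token, joining is enough to invert tokenization for SMILES strings that the pattern fully covers.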

rxnmapper.tokenization_smiles module

class rxnmapper.tokenization_smiles.BasicSmilesTokenizer(regex_pattern='(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])')

Bases: object

Run basic SMILES tokenization

tokenize(text)

Basic tokenization of a SMILES string.
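Here is a minimal sketch of regex tokenization with the default pattern shown in the class signature above. `BasicSmilesTokenizerSketch` is a hypothetical stand-in, not the rxnmapper class.

```python
import re
from typing import List

DEFAULT_PATTERN = r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"


class BasicSmilesTokenizerSketch:
    """Hypothetical stand-in: split a SMILES string with a single capturing regex."""

    def __init__(self, regex_pattern: str = DEFAULT_PATTERN) -> None:
        self.regex = re.compile(regex_pattern)

    def tokenize(self, text: str) -> List[str]:
        # With one capturing group, findall returns the matched substrings in order.
        return self.regex.findall(text)


tokenizer = BasicSmilesTokenizerSketch()
tokens = tokenizer.tokenize("CC(=O)Br>>[Na+].Cl")
print(tokens)  # ['C', 'C', '(', '=', 'O', ')', 'Br', '>>', '[Na+]', '.', 'Cl']
print("".join(tokens) == "CC(=O)Br>>[Na+].Cl")  # True
```

re.findall silently skips characters that match no alternative. For this reason, comparing "".join(tokens) with the input is a cheap check that nothing was dropped.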

class rxnmapper.tokenization_smiles.SmilesTokenizer(vocab_file, **kwargs)

Bases: transformers.tokenization_bert.BertTokenizer

Constructs a SmilesTokenizer. Mostly copied from https://github.com/huggingface/transformers

Parameters

vocab_file – Path to a vocabulary file with one SMILES character per line

add_padding_tokens(token_ids, length, right=True)

Adds padding tokens to return a sequence of the given length. By default, padding tokens are added to the right of the sequence.
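The following is a hypothetical sketch of this padding behaviour. It rests on two assumptions that do not come from this reference: that the [PAD] id is 0 (in practice it depends on the vocabulary file), and that sequences already longer than `length` are returned unchanged.

```python
from typing import List

PAD_ID = 0  # assumption: id of [PAD] in the vocabulary; depends on the vocab file


def add_padding_tokens(token_ids: List[int], length: int, right: bool = True) -> List[int]:
    """Pad token_ids with PAD_ID up to `length` (no truncation in this sketch)."""
    padding = [PAD_ID] * max(0, length - len(token_ids))
    return token_ids + padding if right else padding + token_ids


print(add_padding_tokens([12, 5, 7], 5))               # [12, 5, 7, 0, 0]
print(add_padding_tokens([12, 5, 7], 5, right=False))  # [0, 0, 12, 5, 7]
```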

add_special_tokens_ids_sequence_pair(token_ids_0, token_ids_1)

Adds special tokens to a sequence pair for sequence classification tasks. A BERT sequence pair has the following format: [CLS] A [SEP] B [SEP]

add_special_tokens_ids_single_sequence(token_ids)

Adds special tokens to a sequence for sequence classification tasks. A BERT sequence has the following format: [CLS] X [SEP]

add_special_tokens_sequence_pair(token_0, token_1)

Adds special tokens to a sequence pair for sequence classification tasks. A BERT sequence pair has the following format: [CLS] A [SEP] B [SEP]

add_special_tokens_single_sequence(tokens)

Adds special tokens to a sequence for sequence classification tasks. A BERT sequence has the following format: [CLS] X [SEP]
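The two BERT layouts described above can be sketched with plain token lists. These standalone functions (`with_special_tokens`, `with_special_tokens_pair`) are hypothetical illustrations of the documented formats, not the tokenizer's methods.

```python
from typing import List

CLS, SEP = "[CLS]", "[SEP]"


def with_special_tokens(tokens: List[str]) -> List[str]:
    """Single sequence: [CLS] X [SEP]"""
    return [CLS, *tokens, SEP]


def with_special_tokens_pair(tokens_0: List[str], tokens_1: List[str]) -> List[str]:
    """Sequence pair: [CLS] A [SEP] B [SEP]"""
    return [CLS, *tokens_0, SEP, *tokens_1, SEP]


print(with_special_tokens(["C", "C", "O"]))         # ['[CLS]', 'C', 'C', 'O', '[SEP]']
print(with_special_tokens_pair(["C", "C"], ["O"]))  # ['[CLS]', 'C', 'C', '[SEP]', 'O', '[SEP]']
```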

convert_tokens_to_string(tokens)

Converts a sequence of tokens (strings) into a single string.

save_vocabulary(vocab_path)

Save the tokenizer vocabulary to a file.

property vocab_list
property vocab_size

Size of the base vocabulary (without the added tokens)

rxnmapper.tokenization_smiles.load_vocab(vocab_file)

Loads a vocabulary file into a dictionary.
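The sketch below is written in the style of the Hugging Face BERT tokenizer's vocabulary loader. It assumes one token per line, with each token's line index used as its id. Treat it as an illustration rather than this module's exact code.

```python
import collections
from typing import Dict


def load_vocab(vocab_file: str) -> Dict[str, int]:
    """Map each token (one per line) to its zero-based line index, preserving order."""
    vocab: Dict[str, int] = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            vocab[line.rstrip("\n")] = index
    return vocab
```

Because ids come from line positions, reordering the vocabulary file changes every id. This is why a model's vocabulary file must be shipped together with the model.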

Module contents

rxnmapper initialization.