rxnmapper package

Submodules

rxnmapper.attention module

Attention handling.

class rxnmapper.attention.AttentionScorer(rxn_smiles: str, tokens: List[str], attentions: numpy.ndarray, special_tokens: List[str] = ['[CLS]', '[SEP]'], attention_multiplier: float = 90.0, mask_mapped_product_atoms: bool = True, mask_mapped_reactant_atoms: bool = True)

Bases: object

property adjacency_matrix_products

property adjacency_matrix_reactants

property adjacent_atom_attentions

property atom_attentions

atom_num(token_ind) → int

Get the atom number corresponding to a token index.

property atom_type_masked_attentions

property combined_attentions

Combine pxr and rxp.

property combined_attentions_filt

Combine pxr_filt and rxp_filt.

property combined_attentions_filt_atoms

Combine pxr_filt_atoms and rxp_filt_atoms.

property combined_attentions_filt_atoms_same_type

Combine pxr_filt_atoms and rxp_filt_atoms, restricted to atoms of the same type.

generate_attention_guided_pxr_atom_mapping()

Generate an attention-guided product-to-reactant atom mapping.

Parameters

zero_set_p – If True, the attention from product atoms that are already assigned to an atom in the reactants is set to 0.
zero_set_r – If True, the attention to reactant atoms that are already assigned to an atom in the products is set to 0.
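To make the two flags concrete, here is a deliberately simplified greedy sketch in NumPy. It assumes `pxr` is a (product atoms × reactant atoms) attention matrix and repeatedly picks the highest remaining entry. `greedy_pxr_mapping` is a hypothetical helper: it leaves out the neighbour boosting (`attention_multiplier`) and atom-type masking that `AttentionScorer` also exposes, so it should not be read as the library's exact algorithm.

```python
import numpy as np


def greedy_pxr_mapping(pxr: np.ndarray, zero_set_p: bool = True, zero_set_r: bool = True) -> list:
    """Hypothetical greedy product-to-reactant mapping (simplified sketch).

    pxr[i, j] is the attention from product atom i to reactant atom j.
    Returns a list whose entry i is the reactant atom mapped to product atom i (-1 if unmapped).
    """
    attn = np.array(pxr, dtype=float)  # work on a copy
    mapping = [-1] * attn.shape[0]
    for _ in range(attn.shape[0]):
        # Pick the (product, reactant) pair with the highest remaining attention.
        p, r = np.unravel_index(np.argmax(attn), attn.shape)
        if attn[p, r] <= 0.0:
            break  # nothing left to assign
        mapping[p] = int(r)
        if zero_set_p:
            attn[p, :] = 0.0  # product atom p is assigned: ignore its attention from now on
        if zero_set_r:
            attn[:, r] = 0.0  # reactant atom r is taken: no other product atom can pick it
    return mapping


pxr = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.7, 0.1],
    [0.2, 0.3, 0.6],
])
print(greedy_pxr_mapping(pxr))                    # [0, 1, 2]
print(greedy_pxr_mapping(pxr, zero_set_r=False))  # [0, 0, 2]
```

Without `zero_set_r`, product atoms 0 and 1 both end up mapped to reactant atom 0, which is the kind of duplicate assignment the flag is meant to prevent.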

get_atom_type_mask()

get_neighboring_atoms(atom_num)

Get the atom_nums neighboring the desired atom.

get_neighboring_attentions(atom_num) → numpy.ndarray

Get a vector of shape (n_atoms,) representing the neighboring attentions to an atom number.

Non-zero attentions are the attentions of neighboring atoms.
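A minimal NumPy sketch of what the two neighbour helpers describe, assuming an (n_atoms × n_atoms) adjacency matrix such as `adjacency_matrix_products` and a per-atom attention vector of shape (n_atoms,). Both functions below are hypothetical stand-ins, not the library's code.

```python
import numpy as np


def neighboring_atoms(adjacency: np.ndarray, atom_num: int) -> list:
    """Atom numbers bonded to atom_num (non-zero entries of its adjacency row)."""
    return np.nonzero(adjacency[atom_num])[0].tolist()


def neighboring_attentions(adjacency: np.ndarray, attentions: np.ndarray, atom_num: int) -> np.ndarray:
    """Vector of shape (n_atoms,): attention kept at neighbours of atom_num, zero elsewhere."""
    is_neighbor = adjacency[atom_num] != 0
    return np.where(is_neighbor, attentions, 0.0)


# Heavy atoms of ethanol, CCO: bonds 0-1 and 1-2.
adjacency = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])
attentions = np.array([0.2, 0.5, 0.3])
print(neighboring_atoms(adjacency, 1))  # [0, 2]
print(neighboring_attentions(adjacency, attentions, 1))
```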

get_precursors_atom_types()

get_product_atom_types()

is_atom(token_ind) → int

Check whether a token is an atom.

property pnums

Get atom numbers for just the product tokens.

property pnums_atoms

Get atom numbers for just the product tokens, without the SEP.

property pnums_filt

Get atom numbers for just the product tokens, without the SEP.

property ptokens

property ptokens_filt

Product tokens without special tokens.

property pxr

property pxr_filt

PXR without the special tokens.

property pxr_filt_atoms

PXR restricted to atom tokens.

property rnums

Get atom numbers for just the reactant tokens.

property rnums_atoms

Get atom numbers for just the reactant tokens, restricted to atom tokens.

property rnums_filt

Get atom numbers for just the reactant tokens, without the CLS.

property rtokens

property rtokens_filt

Reactant tokens without special tokens.

property rxp

property rxp_filt

RXP without the special tokens.

property rxp_filt_atoms

RXP restricted to atom tokens.

token_ind(atom_num) → int

Get the token index from an atom number.

Note that this is not a lossless mapping: the atom number -1, which marks every non-atom token, is mapped to the last token index, (length of the original tokens) - 1.
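The lossy direction is easy to reproduce. This sketch assumes the token-to-atom-number format of `number_tokens` (atoms numbered from 0, every other token -1) and builds the reverse lookup as a dict by enumerating the tokens, so the last token numbered -1 wins. That is one plausible way to get the behaviour described above, not necessarily how `AttentionScorer` stores it.

```python
from typing import Dict, List

# Atom numbers for the tokens ["[CLS]", "C", "C", "O", "[SEP]"]: -1 marks non-atom tokens.
token_atom_nums: List[int] = [-1, 0, 1, 2, -1]

# Reverse lookup: when several tokens share -1, later entries overwrite earlier ones.
atom_num_to_token_ind: Dict[int, int] = {num: ind for ind, num in enumerate(token_atom_nums)}


def atom_num(token_ind: int) -> int:
    """Token index -> atom number (always well defined)."""
    return token_atom_nums[token_ind]


def token_ind(num: int) -> int:
    """Atom number -> token index (lossy for -1)."""
    return atom_num_to_token_ind[num]


print(token_ind(2))            # 3
print(token_ind(-1))           # 4, i.e. len(token_atom_nums) - 1
print(token_ind(atom_num(0)))  # 4, not 0: the [CLS] position is lost
```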

rxnmapper.core module

Core RXN Attention Mapper module.

class rxnmapper.core.RXNMapper(config: Dict = {}, logger: Optional[logging.Logger] = None)

Bases: object

Maps product and reactant atoms using attention weights.

convert_batch_to_attns(rxn_smiles_list: List[str])

get_attention_guided_atom_maps(rxns: List[str], zero_set_p: bool = True, zero_set_r: bool = True, detailed_output: bool = False)

rxnmapper.smiles_utils module

SMILES utils.

exception rxnmapper.smiles_utils.NotCanonicalizableSmilesException

Bases: ValueError

rxnmapper.smiles_utils.canonicalize_and_atom_map(smi, with_replacements=False, return_equivalent_atoms=False)

Remove atom mapping, canonicalize, and return the mapping numbers in order of canonicalization.

rxnmapper.smiles_utils.canonicalize_smi(smi, remove_atom_mapping=False)

rxnmapper.smiles_utils.generate_atom_mapped_reaction_atoms(rxn, product_atom_maps, expected_atom_maps=None)

rxnmapper.smiles_utils.get_adjacency_matrix(smiles)

Compute the adjacency matrix between atoms. Currently only works for single molecules, not for reactions.

Parameters

smiles – SMILES string of a single molecule

rxnmapper.smiles_utils.get_atom_tokens_mask(smiles, special_tokens: List[str] = [])

Return a mask for a SMILES string, where atom tokens are converted to 1 and other tokens to 0.

For example, c1ccncc1 gives [1, 0, 1, 1, 1, 1, 1, 0].

Parameters

smiles – SMILES string of a reaction
special_tokens – Special tokens that should never be treated as atoms, e.g. "[CLS]" or "[SEP]"

rxnmapper.smiles_utils.get_atom_types(smiles)

Generate a list of the atom types in a SMILES string.

rxnmapper.smiles_utils.get_atom_types_smiles(smiles)

rxnmapper.smiles_utils.get_graph_distance_matrix(smiles)

Compute the graph distance matrix between atoms. Currently only works for single molecules, not for reactions.

Parameters

smiles – SMILES string of a single molecule

rxnmapper.smiles_utils.get_mask_for_tokens(tokens, special_tokens: List[str] = [])

Return a mask for a tokenized SMILES, where atom tokens are converted to 1 and other tokens to 0.

For example, the tokens of c1ccncc1 give [1, 0, 1, 1, 1, 1, 1, 0].

Parameters

tokens – List of tokens of a SMILES string (e.g. of a reaction)
special_tokens – Special tokens that should never be treated as atoms, e.g. "[CLS]" or "[SEP]"

rxnmapper.smiles_utils.is_atom(token: str, special_tokens: List[str] = [])

Determine whether a token is an atom.

Parameters

token – Token fed into the transformer model
special_tokens – List of tokens to consider as non-atoms (often introduced by tokenizer)

Returns

True if atom, False if not

Return type

bool
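A small sketch of the atom test and the token mask built on it. The rule used here (a token is an atom if it starts with a letter, as in `C`, `c` or `Br`, or with `[`, as in `[NH4+]`, and is not a special token) is an assumption chosen to reproduce the `c1ccncc1` example above. Note that `[CLS]` also starts with `[`, which is why `special_tokens` matters.

```python
from typing import List, Sequence


def is_atom(token: str, special_tokens: Sequence[str] = ()) -> bool:
    """Assumed rule: letters or bracket expressions are atoms, unless listed as special."""
    if token in special_tokens:
        return False
    return token[0].isalpha() or token[0] == "["


def get_mask_for_tokens(tokens: List[str], special_tokens: Sequence[str] = ()) -> List[int]:
    """1 for atom tokens, 0 for everything else (ring digits, bonds, parentheses, '.')."""
    return [1 if is_atom(tok, special_tokens) else 0 for tok in tokens]


print(get_mask_for_tokens(["c", "1", "c", "c", "n", "c", "c", "1"]))
# [1, 0, 1, 1, 1, 1, 1, 0]
print(get_mask_for_tokens(["[CLS]", "C", "[SEP]"], special_tokens=["[CLS]", "[SEP]"]))
# [0, 1, 0]
```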

rxnmapper.smiles_utils.is_mol_end(a: str, b: str)

rxnmapper.smiles_utils.number_tokens(tokens: List[str], special_tokens=['[CLS]', '[SEP]'])

Map a list of tokens to a list of atom numbers; tokens that are not atoms get -1.

E.g., ['[CLS]', 'C', '.', 'C', 'C', 'C', 'C', 'C', 'C'] => [-1, 0, -1, 1, 2, 3, 4, 5, 6]
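A sketch that reproduces the example above. It assumes a token counts as an atom when it starts with a letter or `[` and is not one of the special tokens; the library's own atom test may differ in edge cases.

```python
from typing import List, Sequence


def number_tokens(tokens: List[str], special_tokens: Sequence[str] = ("[CLS]", "[SEP]")) -> List[int]:
    """Number atom tokens 0, 1, 2, ... in order; every other token gets -1."""
    numbers = []
    next_atom = 0
    for tok in tokens:
        # Assumed atom rule: starts with a letter or "[", and is not a special token.
        if tok not in special_tokens and (tok[0].isalpha() or tok[0] == "["):
            numbers.append(next_atom)
            next_atom += 1
        else:
            numbers.append(-1)
    return numbers


print(number_tokens(["[CLS]", "C", ".", "C", "C", "C", "C", "C", "C"]))
# [-1, 0, -1, 1, 2, 3, 4, 5, 6]
```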

rxnmapper.smiles_utils.process_reaction(rxn, fragments='', fragment_bond='~')

rxnmapper.smiles_utils.process_reaction_with_product_maps_atoms(rxn, with_replacements=False, skip_if_not_in_precursors=False)

rxnmapper.smiles_utils.split_into_mols(tokens: List[str])

Split a reaction SMILES into molecules.

rxnmapper.smiles_utils.tok_mask(tokens)

rxnmapper.smiles_utils.tokenize(smiles)

Tokenize a SMILES molecule or reaction.

rxnmapper.smiles_utils.tokens_to_adjacency(tokens: List[str])

rxnmapper.smiles_utils.tokens_to_smiles(tokens: List[str], special_tokens: List[str])

Combine tokens into a valid SMILES string, filtering out special tokens.
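Because regex-based SMILES tokens are consecutive substrings of the original string, concatenating them is enough to rebuild it. A minimal sketch, assuming the library does nothing beyond dropping the special tokens:

```python
from typing import List, Sequence


def tokens_to_smiles(tokens: List[str], special_tokens: Sequence[str]) -> str:
    """Concatenate tokens back into a SMILES string, dropping special tokens."""
    return "".join(tok for tok in tokens if tok not in special_tokens)


tokens = ["[CLS]", "C", "C", "O", ">>", "C", "C", "=", "O", "[SEP]"]
print(tokens_to_smiles(tokens, ["[CLS]", "[SEP]"]))
# CCO>>CC=O
```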

rxnmapper.tokenization_smiles module

class rxnmapper.tokenization_smiles.BasicSmilesTokenizer(regex_pattern='(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])')

Bases: object

Run basic SMILES tokenization.

tokenize(text)

Basic tokenization of a SMILES string.
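A standalone sketch of regex-based tokenization using the tokenization regex from the signature above, with the two parenthesis alternatives written as `\(|\)`. `re.findall` returns every non-overlapping match in order; since the whole pattern is a single capturing group, each match comes back as one token string. This is an illustration, not the class itself.

```python
import re
from typing import List

# SMILES tokenization regex from the BasicSmilesTokenizer signature.
SMI_REGEX_PATTERN = (
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
)
SMI_REGEX = re.compile(SMI_REGEX_PATTERN)


def tokenize(text: str) -> List[str]:
    """Split a SMILES molecule or reaction into tokens."""
    return SMI_REGEX.findall(text)


print(tokenize("c1ccncc1"))
# ['c', '1', 'c', 'c', 'n', 'c', 'c', '1']
print(tokenize("CC(=O)O.OCC>>CC(=O)OCC"))
# ['C', 'C', '(', '=', 'O', ')', 'O', '.', 'O', 'C', 'C', '>>', 'C', 'C', '(', '=', 'O', ')', 'O', 'C', 'C']
```

Alternatives are tried left to right, so `Br?` and `Cl?` keep two-letter halogens together, and `>>?` keeps the reaction arrow as a single token.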

class rxnmapper.tokenization_smiles.SmilesTokenizer(vocab_file, **kwargs)

Bases: transformers.tokenization_bert.BertTokenizer

Constructs a SmilesTokenizer. Mostly copied from https://github.com/huggingface/transformers

Parameters: vocab_file – Path to a vocabulary file with one SMILES token per line

add_padding_tokens(token_ids, length, right=True)

Adds padding tokens to return a sequence of the given length. By default, padding tokens are added to the right of the sequence.
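A minimal sketch of the padding behaviour in plain Python. The `pad_token_id` argument is a hypothetical addition for the example (the documented signature takes only `token_ids`, `length` and `right`), and 0 is an arbitrary placeholder id.

```python
from typing import List


def add_padding_tokens(token_ids: List[int], length: int, right: bool = True, pad_token_id: int = 0) -> List[int]:
    """Pad token_ids with pad_token_id up to `length` (right by default, otherwise left)."""
    padding = [pad_token_id] * max(0, length - len(token_ids))
    return token_ids + padding if right else padding + token_ids


print(add_padding_tokens([12, 7, 7], 5))               # [12, 7, 7, 0, 0]
print(add_padding_tokens([12, 7, 7], 5, right=False))  # [0, 0, 12, 7, 7]
```

In this sketch a sequence already longer than `length` is returned unchanged rather than truncated.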

add_special_tokens_ids_sequence_pair(token_ids_0, token_ids_1)

Adds special tokens to a sequence pair for sequence classification tasks. A BERT sequence pair has the following format: [CLS] A [SEP] B [SEP]

add_special_tokens_ids_single_sequence(token_ids)

Adds special tokens to a sequence for sequence classification tasks. A BERT sequence has the following format: [CLS] X [SEP]

add_special_tokens_sequence_pair(token_0, token_1)

Adds special tokens to a sequence pair for sequence classification tasks. A BERT sequence pair has the following format: [CLS] A [SEP] B [SEP]

add_special_tokens_single_sequence(tokens)

Adds special tokens to a sequence for sequence classification tasks. A BERT sequence has the following format: [CLS] X [SEP]
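The two BERT layouts above, sketched on token strings; the `_ids` variants follow the same layout with vocabulary ids in place of strings. The `[CLS]` and `[SEP]` strings are hard-coded here for illustration, and these standalone functions only mirror the documented format rather than the tokenizer's methods.

```python
from typing import List

CLS, SEP = "[CLS]", "[SEP]"


def add_special_tokens_single_sequence(tokens: List[str]) -> List[str]:
    """[CLS] X [SEP]"""
    return [CLS] + tokens + [SEP]


def add_special_tokens_sequence_pair(tokens_0: List[str], tokens_1: List[str]) -> List[str]:
    """[CLS] A [SEP] B [SEP]"""
    return [CLS] + tokens_0 + [SEP] + tokens_1 + [SEP]


print(add_special_tokens_single_sequence(["C", "C", "O"]))
# ['[CLS]', 'C', 'C', 'O', '[SEP]']
print(add_special_tokens_sequence_pair(["C", "C", "O"], ["C", "C", "=", "O"]))
# ['[CLS]', 'C', 'C', 'O', '[SEP]', 'C', 'C', '=', 'O', '[SEP]']
```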

convert_tokens_to_string(tokens)

Converts a sequence of tokens (strings) into a single string.

save_vocabulary(vocab_path)

Save the tokenizer vocabulary to a file.

property vocab_list

property vocab_size

Size of the base vocabulary (without the added tokens).

rxnmapper.tokenization_smiles.load_vocab(vocab_file)

Loads a vocabulary file into a dictionary.
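A sketch of `load_vocab`, assuming the BERT-style vocabulary format the tokenizer inherits: one token per line, with the zero-based line number used as the token id. It is written from scratch here, so treat it as an approximation of the library function rather than its source.

```python
import collections
import os
import tempfile
from typing import Dict


def load_vocab(vocab_file: str) -> Dict[str, int]:
    """Map each token (one per line) to its zero-based line number."""
    vocab = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            vocab[line.rstrip("\n")] = index
    return vocab


# Usage: write a tiny three-token vocabulary to a temporary file and load it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("[PAD]\n[CLS]\nC\n")
demo_vocab = load_vocab(tmp.name)
os.remove(tmp.name)
print(dict(demo_vocab))  # {'[PAD]': 0, '[CLS]': 1, 'C': 2}
```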

Module contents

rxnmapper initialization.