rxnmapper package¶
Submodules¶
rxnmapper.attention module¶
Attention handling.
-
class
rxnmapper.attention.AttentionScorer(rxn_smiles: str, tokens: List[str], attentions: numpy.ndarray, special_tokens: List[str] = ['[CLS]', '[SEP]'], attention_multiplier: float = 90.0, mask_mapped_product_atoms: bool = True, mask_mapped_reactant_atoms: bool = True)¶ Bases:
object
-
property
adjacency_matrix_products¶
-
property
adjacency_matrix_reactants¶
-
property
adjacent_atom_attentions¶
-
property
atom_attentions¶
-
atom_num(token_ind) → int¶ Get the atom number corresponding to a token
-
property
atom_type_masked_attentions¶
-
property
combined_attentions¶ Combine pxr and rxp
-
property
combined_attentions_filt¶ Combine pxr_filt and rxp_filt
-
property
combined_attentions_filt_atoms¶ Combine pxr_filt and rxp_filt
-
property
combined_attentions_filt_atoms_same_type¶ Combine pxr_filt and rxp_filt
-
generate_attention_guided_pxr_atom_mapping()¶ Generate attention guided product to reactant atom mapping.
- Parameters
zero_set_p – If True, the attention from product atoms that are already assigned to an atom in the reactants is set to 0.
zero_set_r – If True, the attention to reactant atoms that are already assigned to an atom in the products is set to 0.
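The effect of zero_set_p and zero_set_r can be illustrated with a simplified greedy assignment over a product-by-reactant attention matrix. This is a minimal sketch, not the library's implementation: the function name greedy_pxr_mapping is hypothetical, the input is a plain nested list, and the neighbor boosting controlled by attention_multiplier is left out.

```python
from typing import Dict, List


def greedy_pxr_mapping(
    pxr: List[List[float]],
    zero_set_p: bool = True,
    zero_set_r: bool = True,
) -> Dict[int, int]:
    """Greedily map product atoms (rows) to reactant atoms (columns).

    Hypothetical helper for illustration only.
    """
    attn = [list(row) for row in pxr]  # work on a copy of the input
    mapping: Dict[int, int] = {}
    for _ in range(len(attn)):
        # Find the strongest remaining product -> reactant attention.
        best, best_p, best_r = 0.0, -1, -1
        for p, row in enumerate(attn):
            for r, value in enumerate(row):
                if value > best:
                    best, best_p, best_r = value, p, r
        if best_p < 0:  # only zeros are left, nothing more to assign
            break
        mapping[best_p] = best_r
        if zero_set_p:
            # Attention from the mapped product atom is set to 0.
            attn[best_p] = [0.0] * len(attn[best_p])
        if zero_set_r:
            # Attention to the mapped reactant atom is set to 0.
            for row in attn:
                row[best_r] = 0.0
    return mapping


attention = [[0.9, 0.1], [0.8, 0.2]]
print(greedy_pxr_mapping(attention))                     # {0: 0, 1: 1}
print(greedy_pxr_mapping(attention, zero_set_r=False))   # {0: 0, 1: 0}
```

With zero_set_r disabled, both product atoms end up on the same reactant atom, which is why masking already assigned reactant atoms is usually desirable for a one-to-one mapping.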
-
get_atom_type_mask()¶
-
get_neighboring_atoms(atom_num)¶ Get the atom_nums neighboring the desired atom
-
get_neighboring_attentions(atom_num) → numpy.ndarray¶ Get a vector of shape (n_atoms,) representing the neighboring attentions to an atom number.
Non-zero attentions are the attentions of neighboring atoms
-
get_precursors_atom_types()¶
-
get_product_atom_types()¶
-
is_atom(token_ind) → int¶ Check whether token is an atom
-
property
pnums¶ Get atom numbers for just the product tokens
-
property
pnums_atoms¶ Get atom numbers for just the product tokens, without the SEP
-
property
pnums_filt¶ Get atom numbers for just the product tokens, without the SEP
-
property
ptokens¶
-
property
ptokens_filt¶ Product tokens without special tokens
-
property
pxr¶
-
property
pxr_filt¶ PXR without the special tokens
-
property
pxr_filt_atoms¶ PXR only the atoms
-
property
rnums¶ Get atom numbers for just the reactant tokens
-
property
rnums_atoms¶ Get atom numbers for just the reactant tokens, without the CLS
-
property
rnums_filt¶ Get atom numbers for just the reactant tokens, without the CLS
-
property
rtokens¶
-
property
rtokens_filt¶ Reactant tokens without special tokens
-
property
rxp¶
-
property
rxp_filt¶ RXP without the special tokens
-
property
rxp_filt_atoms¶ RXP only the atoms
-
token_ind(atom_num) → int¶ Get token index from an atom number
Note that this is not a lossless mapping. -1 is mapped to the (length of the original tokens) - 1
rxnmapper.core module¶
Core RXN Attention Mapper module.
-
class
rxnmapper.core.RXNMapper(config: Dict = {}, logger: Optional[logging.Logger] = None)¶ Bases:
object
Mapping product and reactant atoms using attention weights
-
convert_batch_to_attns(rxn_smiles_list: List[str])¶
-
get_attention_guided_atom_maps(rxns: List[str], zero_set_p: bool = True, zero_set_r: bool = True, detailed_output: bool = False)¶
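A typical call sequence, based only on the signatures listed above, might look like the following sketch. The wrapper name map_reactions is hypothetical, and the structure of the returned results is not documented on this page, so the sketch returns them unchanged. The import is deferred because constructing RXNMapper is assumed to load a model, which requires the package and its dependencies to be installed.

```python
from typing import List


def map_reactions(rxns: List[str]):
    """Run attention-guided atom mapping on a list of reaction SMILES.

    Hypothetical convenience wrapper; requires the rxnmapper package.
    """
    # Deferred import so that merely defining this helper has no side effects.
    from rxnmapper import RXNMapper

    mapper = RXNMapper()  # default config, as in the class signature
    return mapper.get_attention_guided_atom_maps(rxns)


# Example call (commented out because it needs rxnmapper installed):
# results = map_reactions(["CC(=O)O.OCC>>CC(=O)OCC"])
# print(results)
```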
-
rxnmapper.smiles_utils module¶
SMILES utils.
-
exception
rxnmapper.smiles_utils.NotCanonicalizableSmilesException¶ Bases:
ValueError
-
rxnmapper.smiles_utils.canonicalize_and_atom_map(smi, with_replacements=False, return_equivalent_atoms=False)¶ Remove atom mapping, canonicalize and return mapping numbers in order of canonicalization.
-
rxnmapper.smiles_utils.canonicalize_smi(smi, remove_atom_mapping=False)¶
-
rxnmapper.smiles_utils.generate_atom_mapped_reaction_atoms(rxn, product_atom_maps, expected_atom_maps=None)¶
-
rxnmapper.smiles_utils.get_adjacency_matrix(smiles)¶ Compute the adjacency matrix between atoms. Currently works only for single molecules, not for reactions.
- Parameters
smiles – SMILES string of a single molecule
-
rxnmapper.smiles_utils.get_atom_tokens_mask(smiles, special_tokens: List[str] = [])¶ Return a mask for a smiles, where atom tokens are converted to 1 and other tokens to 0.
e.g. c1ccncc1 would give [1, 0, 1, 1, 1, 1, 1, 0]
- Parameters
smiles – Smiles string of reaction
special_tokens – Any special tokens to explicitly not call an atom. E.g. “[CLS]” or “[SEP]”
-
rxnmapper.smiles_utils.get_atom_types(smiles)¶ Generate a list of the atom types in the given SMILES
-
rxnmapper.smiles_utils.get_atom_types_smiles(smiles)¶
-
rxnmapper.smiles_utils.get_graph_distance_matrix(smiles)¶ Compute the graph distance matrix between atoms. Currently works only for single molecules, not for reactions.
- Parameters
smiles – SMILES string of a single molecule
-
rxnmapper.smiles_utils.get_mask_for_tokens(tokens, special_tokens: List[str] = [])¶ Return a mask for a tokenized smiles, where atom tokens are converted to 1 and other tokens to 0.
e.g. c1ccncc1 would give [1, 0, 1, 1, 1, 1, 1, 0]
- Parameters
tokens – Tokens of the tokenized SMILES string of the reaction
special_tokens – Any special tokens to explicitly not call an atom. E.g. “[CLS]” or “[SEP]”
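The masking rule can be sketched in a few lines. This is an illustration under an assumption, not the library's code: a token is treated as an atom when it starts with a letter or an opening bracket and is not listed as a special token. The names is_atom_token and mask_for_tokens are hypothetical.

```python
from typing import List, Sequence


def is_atom_token(token: str, special_tokens: Sequence[str] = ()) -> bool:
    """Assumed heuristic: atoms start with a letter or '[' and are not special."""
    if not token or token in special_tokens:
        return False
    return token[0].isalpha() or token[0] == "["


def mask_for_tokens(tokens: Sequence[str], special_tokens: Sequence[str] = ()) -> List[int]:
    """Return 1 for atom tokens and 0 for everything else."""
    return [int(is_atom_token(token, special_tokens)) for token in tokens]


# The documented example: c1ccncc1, already split into tokens.
print(mask_for_tokens(["c", "1", "c", "c", "n", "c", "c", "1"]))
# [1, 0, 1, 1, 1, 1, 1, 0]
```

Passing special tokens explicitly matters because "[CLS]" and "[SEP]" start with a bracket and would otherwise be counted as bracket atoms.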
-
rxnmapper.smiles_utils.is_atom(token: str, special_tokens: List[str] = [])¶ Determine whether a token is an atom.
- Parameters
token – Token fed into the transformer model
special_tokens – List of tokens to consider as non-atoms (often introduced by tokenizer)
- Returns
True if atom, False if not
- Return type
bool
-
rxnmapper.smiles_utils.is_mol_end(a: str, b: str)¶
-
rxnmapper.smiles_utils.number_tokens(tokens: List[str], special_tokens=['[CLS]', '[SEP]'])¶ Map list of tokens to a list of numbered atoms
E.g., ['[CLS]', 'C', '.', 'C', 'C', 'C', 'C', 'C', 'C'] => [-1, 0, -1, 1, 2, 3, 4, 5, 6]
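The numbering scheme in this example can be reproduced with a short sketch. It assumes the same atom heuristic as above (a token counts as an atom if it starts with a letter or an opening bracket and is not a special token); the function name number_atom_tokens is hypothetical and this is not the library's implementation.

```python
from typing import List, Sequence


def number_atom_tokens(
    tokens: Sequence[str],
    special_tokens: Sequence[str] = ("[CLS]", "[SEP]"),
) -> List[int]:
    """Give each atom token a running index and every other token -1."""
    numbers: List[int] = []
    next_atom = 0
    for token in tokens:
        looks_like_atom = (
            bool(token)
            and token not in special_tokens
            and (token[0].isalpha() or token[0] == "[")
        )
        if looks_like_atom:
            numbers.append(next_atom)
            next_atom += 1
        else:
            numbers.append(-1)
    return numbers


print(number_atom_tokens(["[CLS]", "C", ".", "C", "C", "C", "C", "C", "C"]))
# [-1, 0, -1, 1, 2, 3, 4, 5, 6]
```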
-
rxnmapper.smiles_utils.process_reaction(rxn, fragments='', fragment_bond='~')¶
-
rxnmapper.smiles_utils.process_reaction_with_product_maps_atoms(rxn, with_replacements=False, skip_if_not_in_precursors=False)¶
-
rxnmapper.smiles_utils.split_into_mols(tokens: List[str])¶ Split a reaction smiles into molecules
-
rxnmapper.smiles_utils.tok_mask(tokens)¶
-
rxnmapper.smiles_utils.tokenize(smiles)¶ Tokenize a SMILES molecule or reaction
-
rxnmapper.smiles_utils.tokens_to_adjacency(tokens: List[str])¶
-
rxnmapper.smiles_utils.tokens_to_smiles(tokens: List[str], special_tokens: List[str])¶ Combine tokens into valid SMILES string, filtering out special tokens
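Filtering and joining can be sketched as follows. This is a minimal illustration with a hypothetical function name (join_tokens); it only drops the special tokens and concatenates the rest, and does not check that the result is chemically valid.

```python
from typing import List, Sequence


def join_tokens(tokens: Sequence[str], special_tokens: Sequence[str]) -> str:
    """Concatenate tokens into one string, skipping special tokens."""
    special = set(special_tokens)
    return "".join(token for token in tokens if token not in special)


print(join_tokens(["[CLS]", "C", "C", "O", "[SEP]"], ["[CLS]", "[SEP]"]))
# CCO
```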
rxnmapper.tokenization_smiles module¶
-
class
rxnmapper.tokenization_smiles.BasicSmilesTokenizer(regex_pattern='(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])')¶ Bases:
object
Run basic SMILES tokenization
-
tokenize(text)¶ Basic Tokenization of a SMILES.
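The default regex_pattern shown in the class signature can be tried directly with the standard re module. The sketch below assumes that tokenization amounts to collecting all non-overlapping matches of that pattern; the function name regex_tokenize is hypothetical.

```python
import re
from typing import List

# Default pattern copied from the BasicSmilesTokenizer signature.
SMILES_PATTERN = (
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
)


def regex_tokenize(smiles: str, pattern: str = SMILES_PATTERN) -> List[str]:
    """Split a SMILES or reaction SMILES into tokens using the regex."""
    return re.findall(pattern, smiles)


print(regex_tokenize("CC(=O)O>>CCO"))
# ['C', 'C', '(', '=', 'O', ')', 'O', '>>', 'C', 'C', 'O']
```

Bracket atoms such as [NH4+] and two-letter elements such as Cl and Br come out as single tokens because those alternatives appear first in the pattern.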
-
-
class
rxnmapper.tokenization_smiles.SmilesTokenizer(vocab_file, **kwargs)¶ Bases:
transformers.tokenization_bert.BertTokenizer
Constructs a SmilesTokenizer. Mostly copied from https://github.com/huggingface/transformers
- Parameters
vocab_file – Path to a vocabulary file containing one SMILES character per line
-
add_padding_tokens(token_ids, length, right=True)¶ Adds padding tokens to return a sequence of the given length. By default padding tokens are added to the right of the sequence.
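Padding to a fixed length can be sketched as below. The function name pad_token_ids and the padding id 0 are assumptions for illustration; the real tokenizer takes its padding id from the vocabulary, and how it treats sequences longer than the target length is not documented here (this sketch returns them unchanged).

```python
from typing import List


def pad_token_ids(
    token_ids: List[int],
    length: int,
    right: bool = True,
    pad_id: int = 0,  # hypothetical padding id
) -> List[int]:
    """Pad token_ids with pad_id up to the requested length."""
    padding = [pad_id] * max(0, length - len(token_ids))
    return token_ids + padding if right else padding + token_ids


print(pad_token_ids([12, 16, 17], 5))               # [12, 16, 17, 0, 0]
print(pad_token_ids([12, 16, 17], 5, right=False))  # [0, 0, 12, 16, 17]
```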
-
add_special_tokens_ids_sequence_pair(token_ids_0, token_ids_1)¶ Adds special tokens to a sequence pair for sequence classification tasks. A BERT sequence pair has the following format: [CLS] A [SEP] B [SEP]
-
add_special_tokens_ids_single_sequence(token_ids)¶ Adds special tokens to a sequence for sequence classification tasks. A BERT sequence has the following format: [CLS] X [SEP]
-
add_special_tokens_sequence_pair(token_0, token_1)¶ Adds special tokens to a sequence pair for sequence classification tasks. A BERT sequence pair has the following format: [CLS] A [SEP] B [SEP]
-
add_special_tokens_single_sequence(tokens)¶ Adds special tokens to a sequence for sequence classification tasks. A BERT sequence has the following format: [CLS] X [SEP]
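The two BERT layouts described in these methods, [CLS] X [SEP] and [CLS] A [SEP] B [SEP], can be illustrated on plain token lists. The function name with_special_tokens is hypothetical and the sketch works on strings rather than on vocabulary ids.

```python
from typing import List, Optional


def with_special_tokens(
    tokens_a: List[str],
    tokens_b: Optional[List[str]] = None,
    cls_token: str = "[CLS]",
    sep_token: str = "[SEP]",
) -> List[str]:
    """Wrap one sequence, or a pair of sequences, in BERT special tokens."""
    wrapped = [cls_token] + tokens_a + [sep_token]
    if tokens_b is not None:
        wrapped += tokens_b + [sep_token]
    return wrapped


print(with_special_tokens(["C", "C", "O"]))
# ['[CLS]', 'C', 'C', 'O', '[SEP]']
print(with_special_tokens(["C", "C"], ["O"]))
# ['[CLS]', 'C', 'C', '[SEP]', 'O', '[SEP]']
```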
-
convert_tokens_to_string(tokens)¶ Converts a sequence of tokens (strings) into a single string.
-
save_vocabulary(vocab_path)¶ Save the tokenizer vocabulary to a file.
-
property
vocab_list¶
-
property
vocab_size¶ Size of the base vocabulary (without the added tokens)
-
rxnmapper.tokenization_smiles.load_vocab(vocab_file)¶ Loads a vocabulary file into a dictionary.
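Reading a one-token-per-line vocabulary file into a dictionary can be sketched as follows. This assumes that each token is mapped to its zero-based line index, which is the usual BERT vocabulary convention; the function name read_vocab is hypothetical and the file used in the demonstration is a temporary one created on the spot.

```python
import os
import tempfile
from typing import Dict


def read_vocab(vocab_file: str) -> Dict[str, int]:
    """Map each token (one per line) to its zero-based line index."""
    vocab: Dict[str, int] = {}
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            vocab[line.rstrip("\n")] = index
    return vocab


# Demonstration with a small temporary vocabulary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as handle:
    handle.write("[PAD]\n[CLS]\n[SEP]\nC\nO\n")
    demo_path = handle.name

demo_vocab = read_vocab(demo_path)
os.remove(demo_path)
print(demo_vocab)
# {'[PAD]': 0, '[CLS]': 1, '[SEP]': 2, 'C': 3, 'O': 4}
```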
Module contents¶
rxnmapper initialization.