Use fingerprints to classify the 50k reaction data set by Schneider et al. This notebook is based on the notebooks and code found in the Supporting Information (SI) of "Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity".
train_df = schneider_df[schneider_df.split=='train']
test_df = schneider_df[schneider_df.split=='test']
print(len(train_df), len(test_df))
10000 40000

RXNFP ft 10k

This reaction fingerprint was fine-tuned for reaction classification on the 10k training reactions of the Schneider 50k data set.

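Y-scrambling - sanity check

The training labels are shuffled before fitting, so the classifier cannot learn a real mapping from fingerprint to class. Performance on the true test labels should then drop to roughly chance level; a high score would point to data leakage or a bug in the evaluation.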

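import random

from sklearn.linear_model import LogisticRegression

lr_cls = LogisticRegression(max_iter=1000)

# Map class labels to integer ids; the training ids are shuffled below.
scrambled_train_rxn_ids = [rxnclass2id[c] for c in train_df.rxn_class]
test_rxn_class_ids = [rxnclass2id[c] for c in test_df.rxn_class]

# Shuffle the training labels in place (fixed seed for reproducibility),
# breaking the link between fingerprints and reaction classes.
random.seed(42)
random.shuffle(scrambled_train_rxn_ids)

# Fit on the scrambled labels, then evaluate against the true test labels.
lr_classifier_scrambled = lr_cls.fit(train_df.ft_10k.values.tolist(), scrambled_train_rxn_ids)
confusion_matrix_scrambled = evaluate_model(lr_classifier_scrambled, test_df.ft_10k.values.tolist(), test_rxn_class_ids, all_classes, rxnclass2name)
fig = labelled_cmat(confusion_matrix_scrambled,
                    all_classes,
                    figsize=(16,12), label_extras=rxnclass2name)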
 & recall & prec & F-score  &   reaction class &  \\ 
 0 & 0.0075 & 0.0065 &0.0070 & Aldehyde reductive amination &1.2.1 \\
 1 & 0.0013 & 0.0015 &0.0014 & Eschweiler-Clarke methylation &1.2.4 \\
 2 & 0.0150 & 0.0188 &0.0167 & Ketone reductive amination &1.2.5 \\
 3 & 0.0450 & 0.0516 &0.0481 & Bromo N-arylation &1.3.6 \\
 4 & 0.0000 & 0.0000 &0.0000 & Chloro N-arylation &1.3.7 \\
 5 & 0.0300 & 0.0305 &0.0303 & Fluoro N-arylation &1.3.8 \\
 6 & 0.0013 & 0.0016 &0.0014 & Bromo N-alkylation &1.6.2 \\
 7 & 0.0187 & 0.0208 &0.0197 & Chloro N-alkylation &1.6.4 \\
 8 & 0.0013 & 0.0012 &0.0012 & Iodo N-alkylation &1.6.8 \\
 9 & 0.0000 & 0.0000 &0.0000 & Hydroxy to methoxy &1.7.4 \\
10 & 0.0000 & 0.0000 &0.0000 & Methyl esterification &1.7.6 \\
11 & 0.0075 & 0.0074 &0.0074 & Mitsunobu aryl ether synthesis &1.7.7 \\
12 & 0.0187 & 0.0169 &0.0178 & Williamson ether synthesis &1.7.9 \\
13 & 0.0037 & 0.0035 &0.0036 & Thioether synthesis &1.8.5 \\
14 & 0.0088 & 0.0074 &0.0080 & Bromination &10.1.1 \\
15 & 0.0075 & 0.0085 &0.0080 & Chlorination &10.1.2 \\
16 & 0.0663 & 0.1216 &0.0858 & Wohl-Ziegler bromination &10.1.5 \\
17 & 0.0238 & 0.0194 &0.0214 & Nitration &10.2.1 \\
18 & 0.0138 & 0.0160 &0.0148 & Methylation &10.4.2 \\
19 & 0.0088 & 0.0097 &0.0092 & Amide Schotten-Baumann &2.1.1 \\
20 & 0.0000 & 0.0000 &0.0000 & Carboxylic acid + amine reaction &2.1.2 \\
21 & 0.0025 & 0.0024 &0.0025 & N-acetylation &2.1.7 \\
22 & 0.0037 & 0.0040 &0.0039 & Sulfonamide Schotten-Baumann &2.2.3 \\
23 & 0.0013 & 0.0015 &0.0014 & Isocyanate + amine reaction &2.3.1 \\
24 & 0.0000 & 0.0000 &0.0000 & Ester Schotten-Baumann &2.6.1 \\
25 & 0.0488 & 0.0500 &0.0494 & Fischer-Speier esterification &2.6.3 \\
26 & 0.0500 & 0.0619 &0.0553 & Sulfonic ester Schotten-Baumann &2.7.2 \\
27 & 0.0000 & 0.0000 &0.0000 & Bromo Suzuki coupling &3.1.1 \\
28 & 0.0262 & 0.0333 &0.0294 & Bromo Suzuki-type coupling &3.1.5 \\
29 & 0.0112 & 0.0124 &0.0118 & Chloro Suzuki-type coupling &3.1.6 \\
30 & 0.0163 & 0.0174 &0.0168 & Sonogashira coupling &3.3.1 \\
31 & 0.0050 & 0.0040 &0.0045 & Stille reaction &3.4.1 \\
32 & 0.0063 & 0.0064 &0.0063 & N-Boc protection &5.1.1 \\
33 & 0.0000 & 0.0000 &0.0000 & N-Boc deprotection &6.1.1 \\
34 & 0.0400 & 0.0376 &0.0388 & N-Cbz deprotection &6.1.3 \\
35 & 0.0862 & 0.0870 &0.0866 & N-Bn deprotection &6.1.5 \\
36 & 0.0063 & 0.0078 &0.0069 & CO2H-Et deprotection &6.2.1 \\
37 & 0.0475 & 0.0532 &0.0502 & CO2H-Me deprotection &6.2.2 \\
38 & 0.0037 & 0.0046 &0.0041 & CO2H-tBu deprotection &6.2.3 \\
39 & 0.0000 & 0.0000 &0.0000 & O-Bn deprotection &6.3.1 \\
40 & 0.0063 & 0.0072 &0.0067 & Methoxy to hydroxy &6.3.7 \\
41 & 0.0112 & 0.0100 &0.0106 & Nitro to amino &7.1.1 \\
42 & 0.0013 & 0.0010 &0.0011 & Amide to amine reduction &7.2.1 \\
43 & 0.0000 & 0.0000 &0.0000 & Nitrile reduction &7.3.1 \\
44 & 0.0000 & 0.0000 &0.0000 & Carboxylic acid to alcohol reduction &7.9.2 \\
45 & 0.0000 & 0.0000 &0.0000 & Alcohol to aldehyde oxidation &8.1.4 \\
46 & 0.0063 & 0.0066 &0.0064 & Alcohol to ketone oxidation &8.1.5 \\
47 & 0.0462 & 0.0313 &0.0374 & Sulfanyl to sulfinyl &8.2.1 \\
48 & 0.0050 & 0.0042 &0.0046 & Hydroxy to chloro &9.1.6 \\
49 & 0.0000 & 0.0000 &0.0000 & Carboxylic acid to acid chloride &9.3.1 \\
 &  0.01 & 0.02 & 0.01 & Average & \\
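
With 50 classes listed in the table, uniform random guessing would give a recall of about 1/50 = 0.02 per class, so the averages above are at or below chance. As a point of comparison, the following is a minimal sketch of the same check on data where the answer is known. It does not use the reaction fingerprints: the synthetic clusters from `make_blobs`, the choice of 10 classes, and the helper name `fit_and_score` are all assumptions made for illustration. The same logistic regression is trained once on the true labels and once on shuffled labels, and both models are scored against the true test labels.

```python
import random

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

N_CLASSES = 10  # toy stand-in for the 50 reaction classes (assumption)

# Well-separated synthetic clusters; hypothetical data, not reaction fingerprints.
X, y = make_blobs(n_samples=2000, centers=N_CLASSES, n_features=16,
                  cluster_std=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)


def fit_and_score(train_labels):
    """Fit on the given training labels, return accuracy on the true test labels."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, train_labels)
    return clf.score(X_test, y_test)


# Baseline: training labels left intact.
true_acc = fit_and_score(y_train)

# Y-scrambling: shuffle a copy of the training labels with a fixed seed.
scrambled = list(y_train)
random.seed(42)
random.shuffle(scrambled)
scrambled_acc = fit_and_score(scrambled)

chance = 1.0 / N_CLASSES
print(f"true labels:      accuracy = {true_acc:.3f}")
print(f"scrambled labels: accuracy = {scrambled_acc:.3f}")
print(f"chance level:     {chance:.3f}")
```

In this toy setting the model trained on true labels should score close to 1, while the scrambled model should fall far below it. The scrambled score is noisy rather than exactly at chance, because each cluster ends up assigned to whichever shuffled label happens to dominate it, which is the same effect behind the mix of zero and small nonzero recalls in the table above.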