Deriving general structure–activity/selectivity relationship patterns for different subfamilies of cyclin-dependent kinase inhibitors using machine learning methods
doi.org/10.1038/s41598-024-66173-z
The importance of explainable AI in guiding medicinal chemistry design is becoming increasingly evident. This month, we highlight a study by Kaveh et al. that used machine learning to extract structure–activity relationship (SAR) knowledge from cyclin-dependent kinase (CDK) inhibitors. The CDKs play a range of essential regulatory roles in the cell cycle, and the different subfamilies are associated with different disease mechanisms. When designing inhibitors for a given CDK, it is therefore essential to consider selectivity in order to reduce off-target interactions with the other subfamilies. Kaveh et al. curated 8592 small molecules with measured binding affinities against CDK1, CDK2, CDK4, CDK5 and CDK9 and applied two types of machine learning classification models to descriptors selected using the variable importance in projection (VIP) approach. Models were built for each subfamily to predict active versus inactive compounds, and on the combined set of active CDK compounds to classify the inhibitors by subfamily. Usefully, the article reports the most important differentiating descriptors for both the active/inactive models and the subfamily classification model, providing a valuable resource for the community to aid the design of selective CDK inhibitors. For example, the hydrophilic factor is identified as an important descriptor for activity against CDK1, CDK2 and CDK5, with larger values associated with active compounds. Furthermore, an abstract principal component space built on the VIP-selected descriptors provides a selectivity map onto which researchers can project and prioritise compound designs.
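For readers unfamiliar with VIP-based descriptor selection, the idea can be illustrated with a minimal sketch. This is not the authors' code: it assumes the commonly used partial least squares (PLS) formulation of VIP, uses a small hand-written single-response PLS routine, and runs on synthetic data with made-up descriptors. The function names `pls1_nipals` and `vip_scores` are hypothetical, and the "VIP > 1" cutoff is a widely used rule of thumb, not a threshold taken from the paper.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Single-response PLS via NIPALS (illustrative, not the authors' implementation).

    Returns unit-norm weight vectors W (p x A), scores T (n x A) and y-loadings q (A,).
    """
    X = X - X.mean(axis=0)  # centre descriptors (returns a copy)
    y = y - y.mean()        # centre response (returns a copy)
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)        # unit-norm weight vector
        t = X @ w                     # component scores
        tt = t @ t
        p_load = X.T @ t / tt         # X-loadings, used for deflation
        q[a] = y @ t / tt             # y-loading for this component
        X = X - np.outer(t, p_load)   # deflate X
        y = y - q[a] * t              # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SSY_a * (w_ja / ||w_a||)^2 / sum_a SSY_a)."""
    p = W.shape[0]
    ssy = (T ** 2).sum(axis=0) * q ** 2               # y-variance captured per component
    w_norm_sq = (W / np.linalg.norm(W, axis=0)) ** 2  # squared normalised weights
    return np.sqrt(p * (w_norm_sq @ ssy) / ssy.sum())


# Synthetic stand-in for a descriptor table: 400 "compounds", 6 "descriptors".
# Only descriptors 0 and 1 drive the (made-up) active/inactive label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
labels = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=400) > 0).astype(float)

W, T, q = pls1_nipals(X, labels, n_components=2)
vip = vip_scores(W, T, q)
selected = np.flatnonzero(vip > 1.0)  # rule-of-thumb cutoff (assumption)
print("VIP scores:", np.round(vip, 2))
print("Selected descriptor indices:", selected)
```

Because the squared VIP scores average to one by construction, a descriptor with VIP above 1 contributes more than the average descriptor to explaining the response. In a workflow like the one described, the retained descriptors would then be passed to the classifiers and to the principal component projection.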