Research

A few highlighted projects

Data augmentation in deep learning

In this project, we investigated whether, and how best, augmentation techniques can be used to improve predictions in the context of drug design. We tested several neural network architectures with our newly developed augmentation techniques on well-known benchmark data sets and outperformed state-of-the-art results. We also formalized the idea of confidence prediction on unlabeled data. More details can be found in the publication:

🔗: https://doi.org/10.1016/j.ailsci.2021.100014
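As a rough illustration of how augmentation and confidence can fit together, the sketch below predicts a property for several SMILES strings of the same molecule and reports the mean as the prediction and the standard deviation as a measure of (un)certainty. This is a minimal sketch under stated assumptions, not necessarily the formalization used in the publication: `predict_with_confidence` and `toy_model` are hypothetical names, and `toy_model` is only a stand-in for a trained neural network.

```python
from statistics import mean, stdev
from typing import Callable, Sequence, Tuple


def predict_with_confidence(
    smiles_variants: Sequence[str],
    predict: Callable[[str], float],
) -> Tuple[float, float]:
    """Aggregate per-SMILES predictions for one molecule.

    Returns the mean prediction and the sample standard deviation.
    A small standard deviation means the model agrees with itself
    across different SMILES representations of the same molecule.
    """
    predictions = [predict(smiles) for smiles in smiles_variants]
    # The sample standard deviation needs at least two values.
    spread = stdev(predictions) if len(predictions) > 1 else 0.0
    return mean(predictions), spread


def toy_model(smiles: str) -> float:
    """Hypothetical stand-in for a trained model (assumption)."""
    return 0.1 * len(smiles)


# Three valid SMILES strings that all describe ethanol.
ethanol_variants = ["CCO", "OCC", "C(O)C"]
prediction, spread = predict_with_confidence(ethanol_variants, toy_model)
print(f"prediction={prediction:.3f}, spread={spread:.3f}")
```

In practice, the SMILES variants would come from a cheminformatics toolkit rather than being written by hand, and the spread could be used to flag unlabeled compounds whose predictions are unstable.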

deep learning

molecular property prediction

data augmentation

open-source

confidence assessment

SMILES

Explainable AI

One goal of this research was to identify potentially toxic substructures in a compound. We used Deep Taylor Decomposition to assign scores to the atomic environments that were important for labeling a molecule as cytotoxic. The data set came from the Leibniz Association's Research Institute for Molecular Pharmacology (FMP: Leibniz-Forschungsinstitut für Molekulare Pharmakologie), which made it highly consistent. Moreover, the percentage of compounds that tested cytotoxic experimentally was low (approx. 4.5%), making the data set imbalanced.

deep learning

cytotoxicity maps

model interpretability

toxicity prediction

Kinase similarity assessment for off-target prediction

  • Dominique Sydow

This project provides a modular pipeline for comparing kinases from four different perspectives: pocket sequence, structural information, protein-ligand interactions, and ligand-profiling data. The pipeline consists of six Jupyter notebooks. Given a set of kinases in CSV format, it computes the four similarity measures and compares the kinases using heatmaps and dendrograms. The project is part of TeachOpenCADD and uses open-source tools and databases such as KLIFS and ChEMBL.
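To make the pocket-sequence perspective concrete, the sketch below builds a pairwise similarity matrix from aligned pocket sequences, using the fraction of identical residues as the score. It is only an illustration under stated assumptions: the function names are hypothetical, the toy sequences are invented rather than taken from KLIFS, and sequence identity stands in for whichever measure the notebooks actually implement.

```python
from itertools import combinations
from typing import Dict


def sequence_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical residues between two aligned sequences.

    Positions where either sequence has a gap ("-") are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Aligned sequences must have the same length.")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    return sum(a == b for a, b in compared) / len(compared)


def similarity_matrix(pockets: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Symmetric kinase-by-kinase similarity matrix as nested dicts."""
    names = list(pockets)
    matrix = {name: {name: 1.0} for name in names}
    for first, second in combinations(names, 2):
        score = sequence_identity(pockets[first], pockets[second])
        matrix[first][second] = score
        matrix[second][first] = score
    return matrix


# Invented toy sequences (assumption); real pocket sequences are longer.
toy_pockets = {
    "KinaseA": "KVLGAGFGEV",
    "KinaseB": "KVLGSGFGKV",
    "KinaseC": "RELG-GYGTV",
}
matrix = similarity_matrix(toy_pockets)
for name, row in matrix.items():
    print(name, {other: round(score, 2) for other, score in row.items()})
```

A matrix of this shape is what a heatmap would display directly, and converting each score to a distance (for example, one minus the similarity) gives the input a hierarchical clustering step needs to draw a dendrogram.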

kinase similarity

automated pipeline

open source

off-target prediction

Kinodata - Kinase data sets for machine learning

  • WIP

kinase dataset

open source

KinoML - machine learning for kinase drug discovery

  • WIP

kinase drug discovery

machine learning

open source