GHDDI's Free AI Virtual Screening Service for COVID-19

To systematically assess the ligand-based model properties:

1. Paste your SMILES list or drug name list here, or upload a file. Each query can handle at most 100 compounds.

Accepted input file types: *.csv, *.xls, *.xlsx
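
Because each query is limited to 100 compounds, a longer list has to be split before submission. The following is a minimal sketch of one way to do that locally. The function names, the decision to drop blank and duplicate entries, and the "SMILES" column header are assumptions for illustration; the column layout the service expects is not specified here.

```python
import csv
import io

MAX_PER_QUERY = 100  # per-query limit stated by the service


def split_into_queries(compounds, max_per_query=MAX_PER_QUERY):
    """Strip entries, drop blanks and exact duplicates (order preserved),
    then chunk the list into batches of at most max_per_query."""
    seen = set()
    cleaned = []
    for entry in compounds:
        entry = entry.strip()
        if entry and entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    return [
        cleaned[i:i + max_per_query]
        for i in range(0, len(cleaned), max_per_query)
    ]


def to_csv_text(batch, header="SMILES"):
    """Render one batch as single-column CSV text (header name is assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([header])
    for entry in batch:
        writer.writerow([entry])
    return buf.getvalue()
```

Each batch returned by `split_into_queries` can then be pasted or saved as its own file and submitted as a separate query.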

2. Select one or more of the model types listed below.

3. Select the output report type.

Model Documentation

A. Ligand-based AI models

We tried training sets covering different virus species and their targets to build target-specific or phenotype-based classification AI models using HAG-Net, a deep learning system developed in-house at GHDDI. HAG-Net, short for Heterogeneous Aggregation Graph Net, constructs multi-channel convolutions with hybrid aggregation to enhance feature extraction from graph-based molecular data. Only models with a 5-fold cross-validation AUC > 0.9 qualified for predictive use, and the reported results are ensemble predictions. SARS-CoV-2 targets that show relatively high cross-species conservation, including RNA-dependent RNA polymerase (RdRp), helicase, and 3C-like protease, are prioritized in this effort. We use these models to predict different bioactivities of the approved or investigational-stage drug molecules (~12K) in GHDDI stock as part of the drug repurposing effort. As we are constantly improving our algorithm and expanding our training data, the results will be updated periodically. This work is published on arXiv; more details can be found at: https://arxiv.org/abs/2102.04064
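
The qualification rule and the ensemble step described above can be sketched as follows. This is an illustration under stated assumptions, not the HAG-Net implementation: it assumes the AUC > 0.9 criterion applies to the mean over the five folds, and that the ensemble is a plain average of per-model scores. Neither detail is stated in the text, and all function names are hypothetical.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen active (label 1)
    scores higher than a randomly chosen inactive (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def qualifies(fold_aucs, threshold=0.9):
    """Assumed rule: keep a model if its mean cross-validation AUC
    exceeds the threshold."""
    return mean(fold_aucs) > threshold


def ensemble_predict(per_model_scores):
    """Assumed ensemble: average the scores of all qualified models,
    compound by compound. per_model_scores is a list of equal-length lists."""
    return [mean(column) for column in zip(*per_model_scores)]
```

The pairwise AUC above is quadratic in the number of compounds, which is fine for a sketch but would normally be replaced by a rank-based routine from an established library.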

A.1 Heterogeneous antiviral AI model

Training Data: Heterogeneous antiviral bioactivity records, both target-based and phenotype-based, from various species and in vitro assays. The set contains 76247 compounds in total, 37332 active and 38915 inactive (active: EC50 <= 100 nM for at least one viral species).

Performance (5-fold cross-validation): AUC avg. = 0.94
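
The activity definition used here (active if EC50 <= 100 nM for at least one viral species) can be sketched as a labeling step. The record layout `(compound_id, species, ec50_nm)` and the function name are hypothetical; how the original pipeline handled missing or censored measurements is not described, so this sketch assumes every record carries a numeric EC50 in nM.

```python
ACTIVE_EC50_NM = 100.0  # threshold from the training-data description


def label_compounds(records, threshold_nm=ACTIVE_EC50_NM):
    """Return {compound_id: 1 or 0}.

    A compound is labeled active (1) if any of its records has
    EC50 <= threshold_nm, regardless of species; otherwise inactive (0).
    """
    labels = {}
    for compound_id, _species, ec50_nm in records:
        is_active = int(ec50_nm <= threshold_nm)
        labels[compound_id] = max(labels.get(compound_id, 0), is_active)
    return labels
```

A compound measured against several viruses therefore needs only one measurement at or below the threshold to count as active.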

A.2 Phenotypic antiviral AI model

Training Data: Heterogeneous phenotype-based antiviral bioactivity records from various species and in vitro assays. The set contains 7305 compounds in total, 3751 active and 3554 inactive (active: EC50 <= 100 nM for at least one viral species).

Performance (5-fold cross-validation): AUC avg. = 0.908

A.3 RNA-dependent RNA polymerase AI model

Training Data: Heterogeneous RNA-dependent RNA polymerase bioactivity records from various species and in vitro assays. The set contains 583 compounds in total, 306 active and 277 inactive (active: IC50 <= 1 μM).

Performance (5-fold cross-validation): AUC avg. = 0.952

A.4 Helicase AI model

Training Data: Heterogeneous helicase bioactivity records from various pathogen species and in vitro assays. The set contains 878 compounds in total, 127 active and 751 inactive (active: IC50 <= 1 μM).

Performance (5-fold cross-validation): AUC avg. = 0.926

A.5 3C-like protease AI model

Training Data: Heterogeneous 3C-like protease bioactivity records from various species and in vitro assays. The set contains 457 compounds in total, 132 active and 325 inactive (active: IC50 <= 1 μM).

Performance (5-fold cross-validation): AUC avg. = 0.97
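
The five training sets differ considerably in size and class balance; for example, the helicase set is only about 14% active. The short sketch below tabulates the totals and active fractions from the counts reported in sections A.1 to A.5. It only restates those figures; the dictionary and function names are illustrative.

```python
# (active, inactive, mean 5-fold CV AUC) as reported in sections A.1 to A.5.
TRAINING_SETS = {
    "A.1 Heterogeneous antiviral": (37332, 38915, 0.94),
    "A.2 Phenotypic antiviral": (3751, 3554, 0.908),
    "A.3 RNA-dependent RNA polymerase": (306, 277, 0.952),
    "A.4 Helicase": (127, 751, 0.926),
    "A.5 3C-like protease": (132, 325, 0.97),
}


def summarize(training_sets):
    """Return (name, total compounds, active fraction, AUC) per model."""
    rows = []
    for name, (active, inactive, auc) in training_sets.items():
        total = active + inactive
        rows.append((name, total, active / total, auc))
    return rows


if __name__ == "__main__":
    for name, total, frac, auc in summarize(TRAINING_SETS):
        print(f"{name:34s} n={total:6d} active={frac:6.1%} AUC={auc:.3f}")
```

Class balance is useful context when comparing the AUC values, since the same AUC can correspond to very different precision on balanced and imbalanced sets.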