Virtual Screening

This notebook aims to demonstrate how to use AutoDock Vina (via Smina) and Ledock to dock multiple molecules in the same protein target and binding site.

Content of this notebook

  1. Feching system and cleanup
  2. System Visualization
  3. Docking with Smina
    • Receptor preparation
    • Ligand preparation
    • Docking box definition
    • Docking
    • 3D visualization of docking results
  4. Docking with LeDock
    • Receptor preparation
    • Ligand preparation
    • Docking box definition
    • Docking
    • DOK results file conversion to SDF
    • 3D visualization of docking results
In [1]:
from pymol import cmd
import py3Dmol

from openbabel import pybel

from rdkit import Chem
from rdkit.Chem import AllChem

import sys, os, random
sys.path.insert(1, 'utilities/')

from utils import getbox, generate_ledock_file, dok_to_sdf

import warnings
warnings.filterwarnings("ignore")

%config Completer.use_jedi = False
In [2]:
os.chdir('test/Virtual_Screening/')

1. Feching system and cleanup

Implementing Pymol is a simple way to download PDB structures. The user can launch this or any other Jupyter Dock's protocol by providing his or her own files.

In [3]:
cmd.fetch(code='1X1R',type='pdb1')
cmd.select(name='Prot',selection='polymer.protein')
cmd.select(name='GDP',selection='organic')
cmd.save(filename='1X1R_clean.pdb',format='pdb',selection='Prot')
cmd.save(filename='1X1R_GDP.mol2',format='mol2',selection='GDP')
cmd.delete('all')

2. System Visualization

A cool feature of Jupyter Dock is the posibility of visualize ligand-protein complexes and docking results into the notebbok. All this thanks to the powerful py3Dmol.

Now the protein and ligand have been sanitized it would be recomended to vissualize the ligand-protein reference system.

In [4]:
view = py3Dmol.view()
view.removeAllModels()
view.setViewStyle({'style':'outline','color':'black','width':0.1})

view.addModel(open('1X1R_clean.pdb','r').read(),format='pdb')
Prot=view.getModel()
Prot.setStyle({'cartoon':{'arrows':True, 'tubes':True, 'style':'oval', 'color':'white'}})
view.addSurface(py3Dmol.VDW,{'opacity':0.6,'color':'white'})

view.addModel(open('1X1R_GDP.mol2','r').read(),format='mol2')
ref_m = view.getModel()
ref_m.setStyle({},{'stick':{'colorscheme':'greenCarbon','radius':0.2}})

view.zoomTo()
view.show()

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

3. Docking with Smina

Despite the presence of Python bindings in AutoDock Vina 1.2.0, other tools that incorporate AutoDock Vina allow for cool features such as custom score functions (smina), fast execution (qvina), and the use of wider boxes (qvina-w). Jupyter Dock can run such binaries in a notebook, giving users more options.

Smina is a fork of AutoDock Vina that is customized to better support scoring function development and high-performance energy minimization. Smina is maintained by David Koes at the University of Pittsburgh and is not directly affiliated with the AutoDock project.

Info: The following cell contains an example of using Smina to run the current docking example. However, the executable files for qvina and qvina-w are available in the Jupyter Dock repo's bin directory. As a result, the user can use such a tool by adding the necessary cells or replacing the current docking engine.

3.1. Receptor preparation

Despite the fact that Smina is a modified version of AutoDock Vina, the input file for a receptor in Smina can be either a PDBQT file or a PDB file with explicit hydrogens in all residues. At this point, we can make use ofca protein structure from Jupyter Dock's _fix_protein()_ function or implementing LePro (for more information, see the 2.1 section of the Molecular Docking notebook).

In [5]:
!../../bin/lepro_linux_x86 {'1X1R_clean.pdb'}

os.rename('pro.pdb','1X1R_clean_H.pdb') # Output from lepro is pro.pdb, this line will change the name to '1X1R_clean_H.pdb'

3.2. Ligand preparation

The ligand molecules in Virtual Screening protocols could come from a variety of sources (i.e. ZINC15, PubChem, DrugBank, etc) and diverse formats (i.e. SDF, PDB, MOL, MOL2, SMILES, etc). The cell below depicts one of the simplest approaches to using molecules from SMILES codes. Users, however, can use any known chemical format in their molecules thanks to the use of PyBel and RDKit.

Regardless of the format of the molecules or the differences between docking algorithms, ligand preparation must achieve at least the following objectives:

  • Set the proper protonation and tautomeric state to the molecules.
  • Provide a valid 3D structure to initialize the conformational search.

Jupyter Dock generates protonated neutral molecules and performs energy minimization under the MMFF94s force field using the pybel functions make3D and localopt, respectively.

It is recommended that users tune the pybel settings or use other preparation methods before running the molecular docking for special ligand needs.

In [6]:
smiles=['C1=NC(=C2C(=N1)N(C=N2)CCOCP(=O)(O)O)N',
        'C1=NC(=C2C(=N1)N(C=N2)[C@H]3[C@@H]([C@@H]([C@H](O3)COP(=O)(O)O)O)O)N',
        'C[C@@H](C1=CC2=C(C(=CC=C2)Cl)C(=O)N1C3=CC=CC=C3)NC4=NC=NC5=C4NC=N5',
        'C1=NC(=C2C(=N1)N(C=N2)C3C(C(C(O3)CO)O)O)N',
        'C[C@H](CN1C=NC2=C(N=CN=C21)N)OC[P@@](=O)(N[C@@H]',
        'C1=NC(=C2C(=N1)N(C=N2)[C@H]3[C@@H]([C@@H]([C@H](O3)C[C@H](CC[C@@H](C(=O)O)N)N)O)O)N',
        'C[S+](CC[C@@H](C(=O)O)N)C[C@@H]1[C@H]([C@H]([C@@H](O1)N2C=NC3=C(N=CN=C32)N)O)O',
        'CN1C=C(C(=N1)OC)NC2=C3C(=NC(=N2)N4C[C@H]([C@@H](C4)F)NC(=O)C=C)N(C=N3)C',
        'C1COC[C@@H]1NC2=C3C(=NC=N2)N(C=N3)[C@H]4[C@@H]([C@@H]([C@H](O4)CO)O)O']
In [7]:
out=pybel.Outputfile(filename='InputMols.mol2',format='mol2',overwrite=True)
for index,smi in enumerate(smiles):
    mol=pybel.readstring(string=smi,format='smiles')
    mol.title='mol_'+str(index)
    mol.make3D('mmff94s')
    mol.localopt(forcefield='mmff94s', steps=500)
    out.write(mol)
out.close()

3.3. Docking box definition

In this example, the natural ligand GDP of 1X1R will be used as a reference for box definition (Section 4.3 of the Molecular Docking Notebook contains more information about the getbox() function).

In [8]:
cmd.load(filename='1X1R_clean_H.pdb',format='pdb',object='prot') #Not needed but as reference of the system
cmd.load(filename='1X1R_GDP.mol2',format='mol2',object='lig')

center,size=getbox(selection='lig',extending=6.0,software='vina')
cmd.delete('all')
print(center)
print(size)
{'center_x': 3.4204999953508377, 'center_y': 9.91599988937378, 'center_z': 11.27299976348877}
{'size_x': 19.56700000166893, 'size_y': 18.30399990081787, 'size_z': 23.20599937438965}

3.4. Docking

Jupyter Dock comes with the smina executable for Linux and Mac OS. By running the binary file, the parameters can be accessed.

Hint: Smina also has the ability to use a multimodel file as input and then save the results to a single multimodel output file. Furthermore, the input and output molecule files can be tar.gz. sdf files, which is very useful when working with large databases.

In [9]:
!../../bin/smina -r {'1X1R_clean_H.pdb'} -l {'InputMols.mol2'} -o {'1X1R_lig_smina_out.sdf'} --center_x {center['center_x']} --center_y {center['center_y']} --center_z {center['center_z']} --size_x {size['size_x']} --size_y {size['size_y']} --size_z {size['size_z']} --exhaustiveness 8 --num_modes 5
   _______  _______ _________ _        _______ 
  (  ____ \(       )\__   __/( (    /|(  ___  )
  | (    \/| () () |   ) (   |  \  ( || (   ) |
  | (_____ | || || |   | |   |   \ | || (___) |
  (_____  )| |(_)| |   | |   | (\ \) ||  ___  |
        ) || |   | |   | |   | | \   || (   ) |
  /\____) || )   ( |___) (___| )  \  || )   ( |
  \_______)|/     \|\_______/|/    )_)|/     \|


smina is based off AutoDock Vina. Please cite appropriately.

Weights      Terms
-0.035579    gauss(o=0,_w=0.5,_c=8)
-0.005156    gauss(o=3,_w=2,_c=8)
0.840245     repulsion(o=0,_c=8)
-0.035069    hydrophobic(g=0.5,_b=1.5,_c=8)
-0.587439    non_dir_h_bond(g=-0.7,_b=0,_c=8)
1.923        num_tors_div

Using random seed: 1174223054

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -7.1       0.000      0.000    
2       -7.0       4.222      6.100    
3       -7.0       3.803      5.854    
4       -7.0       2.995      4.131    
5       -6.8       3.922      5.301    
Refine time 8.195
Using random seed: 1174223054

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -9.8       0.000      0.000    
2       -9.6       1.308      1.743    
3       -9.0       4.473      7.254    
4       -8.9       1.608      2.661    
5       -8.5       1.643      2.583    
Refine time 12.469
Using random seed: 1174223054

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -9.4       0.000      0.000    
2       -9.1       2.616      5.000    
3       -8.6       3.401      5.918    
4       -8.0       4.080      6.302    
5       -7.9       3.441      6.072    
Refine time 19.370
Using random seed: 1174223054

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -8.1       0.000      0.000    
2       -7.7       2.135      3.224    
3       -7.7       3.180      6.236    
4       -7.7       2.006      2.895    
5       -7.5       2.097      2.856    
Refine time 6.438
Using random seed: 1174223054

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -7.1       0.000      0.000    
2       -7.0       3.900      8.314    
3       -6.9       3.468      4.357    
4       -6.8       3.781      8.216    
5       -6.6       3.234      7.536    
Refine time 10.387
Using random seed: 1174223054

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -10.1      0.000      0.000    
2       -9.2       1.325      2.123    
3       -8.9       2.481      8.154    
4       -8.5       2.389      8.169    
5       -8.1       3.390      8.328    
Refine time 30.488
Using random seed: 1174223054

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -10.0      0.000      0.000    
2       -8.9       1.616      2.461    
3       -8.8       1.062      1.897    
4       -8.8       2.865      8.326    
5       -8.1       2.755      3.488    
Refine time 29.620
Using random seed: 1174223054

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -8.0       0.000      0.000    
2       -7.9       2.517      7.369    
3       -7.9       2.125      7.234    
4       -7.4       2.469      4.642    
5       -7.2       3.771      6.174    
Refine time 29.695
Using random seed: 1174223054

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1       -8.9       0.000      0.000    
2       -8.0       2.301      7.375    
3       -8.0       2.840      7.385    
4       -7.8       2.884      7.425    
5       -7.7       2.384      3.749    
Refine time 13.313
Loop time 172.436

3.5. 3D visualization of docking results

As with the system visualization (section 2), the docking results can be inspected and compared to the reference structure (if one exists). Smina saves the "minimizedAffinity" information corresponding to the docking score as the molecule's attribute.

In [10]:
view = py3Dmol.view()
view.removeAllModels()
view.setViewStyle({'style':'outline','color':'black','width':0.1})

view.addModel(open('1X1R_clean_H.pdb','r').read(),'pdb')
Prot=view.getModel()
Prot.setStyle({'cartoon':{'arrows':True, 'tubes':True, 'style':'oval', 'color':'white'}})
view.addSurface(py3Dmol.VDW,{'opacity':0.8,'color':'white'})


view.addModel(open('1X1R_GDP.mol2','r').read(),'mol2')
ref_m = view.getModel()
ref_m.setStyle({},{'stick':{'colorscheme':'greenCarbon','radius':0.2}})


poses=Chem.SDMolSupplier('1X1R_lig_smina_out.sdf',True)
for p in list(poses)[::5]:
    pose_1=Chem.MolToMolBlock(p)
    print(p.GetProp('_Name'),'Score: {}'.format(p.GetProp('minimizedAffinity')))
    color = ["#"+''.join([random.choice('0123456789ABCDEF') for j in range(6)])]
    view.addModel(pose_1,'mol')
    z= view.getModel()
    z.setStyle({},{'stick':{'color':color[0],'radius':0.05,'opacity':0.6}})

view.zoomTo()
view.show()
mol_0 Score: -7.07914
mol_1 Score: -9.79016
mol_2 Score: -9.37383
mol_3 Score: -8.05686
mol_4 Score: -7.05481
mol_5 Score: -10.09781
mol_6 Score: -9.98733
mol_7 Score: -8.02271
mol_8 Score: -8.91059

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

4. Docking with LeDock

LeDock is designed for fast and accurate flexible docking of small molecules into a protein. It achieves a pose-prediction accuracy of greater than 90% on the Astex diversity set and takes about 3 seconds per run for a drug-like molecule. It has led to the discovery of novel kinase inhibitors and bromodomain antagonists from high-throughput virtual screening campaigns. It directly uses SYBYL Mol2 format as input for small molecules.

4.1. Receptor preparation

In LeDock, the input file for a receptor is a PDB file with explicit hydrogens in all residues. LePro was created as a tool for preparing protein structures for docking with LeDock. Thus, at this stage, we can use the file ater sanitization steps as well as a protein structure from Jupyter Dock's _fix_protein()_ function or implementing LePro (for more information, see the 2.1 section of the Molecular Docking notebook).

4.2. Ligand preparation

MOL2 is the ligand input format in LeDock. LeDock, on the other hand, cannot accept a multimodel MOL2 file as an input. To use LeDock, we can run the preparation in the same way as Smina (section 3.2), but generate one MOL2 file per ligand.

In [11]:
smiles=['C1=NC(=C2C(=N1)N(C=N2)CCOCP(=O)(O)O)N',
        'C1=NC(=C2C(=N1)N(C=N2)[C@H]3[C@@H]([C@@H]([C@H](O3)COP(=O)(O)O)O)O)N',
        'C[C@@H](C1=CC2=C(C(=CC=C2)Cl)C(=O)N1C3=CC=CC=C3)NC4=NC=NC5=C4NC=N5',
        'C1=NC(=C2C(=N1)N(C=N2)C3C(C(C(O3)CO)O)O)N',
        'C[C@H](CN1C=NC2=C(N=CN=C21)N)OC[P@@](=O)(N[C@@H]',
        'C1=NC(=C2C(=N1)N(C=N2)[C@H]3[C@@H]([C@@H]([C@H](O3)C[C@H](CC[C@@H](C(=O)O)N)N)O)O)N',
        'C[S+](CC[C@@H](C(=O)O)N)C[C@@H]1[C@H]([C@H]([C@@H](O1)N2C=NC3=C(N=CN=C32)N)O)O',
        'CN1C=C(C(=N1)OC)NC2=C3C(=NC(=N2)N4C[C@H]([C@@H](C4)F)NC(=O)C=C)N(C=N3)C',
        'C1COC[C@@H]1NC2=C3C(=NC=N2)N(C=N3)[C@H]4[C@@H]([C@@H]([C@H](O4)CO)O)O']
In [12]:
for index,smi in enumerate(smiles):
    mol=pybel.readstring(string=smi,format='smiles')
    mol.title='mol_'+str(index)
    mol.make3D('mmff94s')
    mol.localopt(forcefield='mmff94s', steps=500)
    out=pybel.Outputfile(filename='ledock_inputfiles/'+'mol_'+str(index)+'.mol2',format='mol2',overwrite=True)
    out.write(mol)
    out.close()

4.3. Docking box definition

This step can be completed in the same manner as the Smina box definition (section 3.3). To obtain the identical box from Smina docking but in LeDock format, the user only needs to change the parameter "software" from "vina" to "ledock."

Info: The implementation of the getbox() function allows for the easy replication of binding sites between AutoDock Vina and LeDock, with the goal of replicating and comparing results between both programs.

In [13]:
cmd.load(filename='1X1R_clean_H.pdb',format='pdb',object='prot')
cmd.load(filename='1X1R_GDP.mol2',format='mol2',object='lig')

X,Y,Z=getbox(selection='lig',extending=6.0,software='ledock')
cmd.delete('all')
print(X)
print(Y)
print(Z)
{'minX': -6.363000005483627, 'maxX': 13.203999996185303}
{'minY': 0.7639999389648438, 'maxY': 19.067999839782715}
{'minZ': -0.3299999237060547, 'maxZ': 22.875999450683594}

4.4 Docking

Aside from one file per ligand, Ledock requires the user to create a file containing the docking paths for the ligands (ligand.list in Jupyter Dock). This file can be easily created by using the following cell and the _generate_ledock_file()_ function.

In [14]:
l_list=[]
for file in os.listdir('ledock_inputfiles/'):
    if 'mol2' in file:
        l_list.append('ledock_inputfiles/'+file+'\n')
In [15]:
l_list
Out[15]:
['ledock_inputfiles/mol_0.mol2\n',
 'ledock_inputfiles/mol_1.mol2\n',
 'ledock_inputfiles/mol_2.mol2\n',
 'ledock_inputfiles/mol_3.mol2\n',
 'ledock_inputfiles/mol_4.mol2\n',
 'ledock_inputfiles/mol_5.mol2\n',
 'ledock_inputfiles/mol_6.mol2\n',
 'ledock_inputfiles/mol_7.mol2\n',
 'ledock_inputfiles/mol_8.mol2\n']
In [16]:
generate_ledock_file(receptor='1X1R_clean_H.pdb',l_list=l_list,
                     l_list_outfile='ligand.list',
                     x=[X['minX'],X['maxX']],
                     y=[Y['minY'],Y['maxY']],
                     z=[Z['minZ'],Z['maxZ']],
                     n_poses=10,
                     rmsd=1.0,
                     out='dock.in')

Once all of the docking parameters have been entered into a configuration file (dock.in), everything is ready to go.

In [17]:
!../../bin/ledock_linux_x86 {'dock.in'}

4.5. DOK results file conversion to SDF

LeDock produces a file with the dok extension that contains docking properties in the same way that a pdb file does. Despite this, the dok file is not widely used for representing chemical structures. As a result, Jupyter Dock is capable of converting dok files to the widely used sdf format. Jupyter Dock will save the "Pose" and "Score" results as molecule attributes while preserving the chemical features (more information in section 6.5 of the Molecular Docking notebook).

In [18]:
for file in os.listdir('ledock_inputfiles/'):
    if '.dok' in file:
        os.rename('ledock_inputfiles/'+file,'ledock_outfiles/'+file)
        dok_to_sdf(dok_file='ledock_outfiles/'+file,output='ledock_outfiles/'+file.replace('dok','sdf'))

4.5. 3D visualization of docking results

As with the system visualization (section 2), the docking results can be inspected and compared to the reference structure (if one exists).

In [19]:
view = py3Dmol.view()
view.removeAllModels()
view.setViewStyle({'style':'outline','color':'black','width':0.1})

view.addModel(open('1X1R_clean_H.pdb','r').read(),'pdb')
Prot=view.getModel()
Prot.setStyle({'cartoon':{'arrows':True, 'tubes':True, 'style':'oval', 'color':'white'}})
view.addSurface(py3Dmol.VDW,{'opacity':0.8,'color':'white'})


view.addModel(open('1X1R_GDP.mol2','r').read(),'mol2')
ref_m = view.getModel()
ref_m.setStyle({},{'stick':{'colorscheme':'greenCarbon','radius':0.2}})

for file in os.listdir('ledock_outfiles/'):
    if 'sdf' in file:
        pose_1=Chem.SDMolSupplier('ledock_outfiles/'+file,False)[0]
        p=Chem.MolToMolBlock(pose_1)
        print('Name: {} | Pose: {} | Score: {}'.format(file.split('.')[0],pose_1.GetProp('Pose'),pose_1.GetProp('Score')))
        color = ["#"+''.join([random.choice('0123456789ABCDEF') for j in range(6)])]
        view.addModel(p,'mol')
        z= view.getModel()
        z.setStyle({},{'stick':{'color':color[0],'radius':0.05,'opacity':0.6}})

view.zoomTo()
view.show()
Name: mol_0 | Pose: 2 | Score: -7.91
Name: mol_1 | Pose: 5 | Score: -9.55
Name: mol_2 | Pose: 3 | Score: -7.84
Name: mol_3 | Pose: 1 | Score: -7.08
Name: mol_4 | Pose: 1 | Score: -7.51
Name: mol_5 | Pose: 2 | Score: -9.46
Name: mol_6 | Pose: 4 | Score: -9.99
Name: mol_7 | Pose: 1 | Score: -8.42
Name: mol_8 | Pose: 2 | Score: -7.16

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol


Comments

comments powered by Disqus