Building a Multi-Molecule Mol2 reader for RDKit V2

In this mini-tools entry, I want to introduce a simple but powerful modification to my previous version of Mol2MolSupplier for RDKit:

https://chem-workflows.com/articles/2019/07/18/building-a-multi-molecule-mol2-reader-for-rdkit/

Perhaps some of you (like me) encountered errors when using the previous version of Mol2MolSupplier.

This modification fixes many of those errors, which were caused by differences in the headers and in the order of the molecule blocks between Mol2 files.

The new version uses the same approach as before, but in a simpler and faster way.

Let’s go directly to the new function:

Importing the libraries

[1]:
from rdkit import Chem
from rdkit.Chem import Draw,AllChem
from rdkit.Chem.Draw import IPythonConsole

import pandas as pd
[2]:
def Mol2MolSupplier(file=None, sanitize=True):
    """Read a multi-molecule Mol2 file into a list of RDKit molecules (None for failures)."""
    mols = []
    with open(file, 'r') as f:
        doc = f.readlines()

    # Line indices where each molecule record starts
    start = [index for (index, p) in enumerate(doc) if '@<TRIPOS>MOLECULE' in p]
    # Each block ends where the next one starts; the last block ends at the end of the file
    finish = start[1:] + [len(doc)]

    for begin, end in zip(start, finish):
        # Join the lines unchanged so the block text matches the file exactly
        block = "".join(doc[begin:end])
        m = Chem.MolFromMol2Block(block, sanitize=sanitize)
        mols.append(m)
    return mols
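
The block-splitting step can be tried on its own, without RDKit or a real file. The following is only a minimal sketch of that idea: the helper name split_mol2_blocks and the two-record example text are hypothetical and are not part of the function above. Each block is assumed to run from one '@<TRIPOS>MOLECULE' line up to the line before the next one.

```python
def split_mol2_blocks(text, marker="@<TRIPOS>MOLECULE"):
    """Split the text of a multi-molecule Mol2 file into one string per molecule."""
    lines = text.splitlines(keepends=True)
    # Line indices where a new molecule record begins.
    starts = [i for i, line in enumerate(lines) if marker in line]
    # Each block ends where the next one starts; the last one ends at end of text.
    ends = starts[1:] + [len(lines)]
    return ["".join(lines[s:e]) for s, e in zip(starts, ends)]


# Tiny hand-written example with two records (hypothetical, not a real ZINC file).
example = (
    "# comment before the first record\n"
    "@<TRIPOS>MOLECULE\n"
    "mol_A\n"
    "@<TRIPOS>ATOM\n"
    "@<TRIPOS>MOLECULE\n"
    "mol_B\n"
    "@<TRIPOS>ATOM\n"
)
blocks = split_mol2_blocks(example)
print(len(blocks))  # 2
```

Anything before the first header, such as the comment line in the example, is left out of every block. Each resulting string could then be passed to Chem.MolFromMol2Block.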

I will use the following multi-molecule mol2 file to show how the function works. The file contains 169 different molecules from ZINC.

[3]:
filePath ='for-sale+in-man+fda+named+endogenous.mol2'
[4]:
database=Mol2MolSupplier(filePath,sanitize=True)
RDKit WARNING: [12:48:54] ZINC000003801919: warning - O.co2 with non C.2 or S.o2 neighbor.

Because RDKit converts each Mol2 block into a molecule, we can choose whether or not to sanitize through the sanitize argument. RDKit also prints a warning when it finds a problem in a record, as in the output above. If a block cannot be converted into a valid molecule, the list contains None at that position.
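
A None entry no longer carries the molecule name, so it can help to recover the name from the raw block text. The sketch below assumes you keep the list of block strings alongside the parsed molecules, which the function in this post does not do by itself. The helper names mol2_block_name and failed_names are hypothetical. It relies on the Mol2 convention that the molecule name is the first line after the '@<TRIPOS>MOLECULE' header.

```python
def mol2_block_name(block):
    """Return the molecule name: the first line after the MOLECULE header."""
    lines = block.splitlines()
    for i, line in enumerate(lines):
        if "@<TRIPOS>MOLECULE" in line and i + 1 < len(lines):
            return lines[i + 1].strip()
    return None  # no header (or no name line) found


def failed_names(blocks, mols):
    """Names of the records whose parsed molecule is None."""
    return [mol2_block_name(b) for b, m in zip(blocks, mols) if m is None]


# Stand-in data: two hand-written blocks and a fake parse result.
blocks = [
    "@<TRIPOS>MOLECULE\nrecord_1\n",
    "@<TRIPOS>MOLECULE\nrecord_2\n",
]
mols = [None, object()]  # object() stands in for an RDKit Mol
print(failed_names(blocks, mols))  # ['record_1']
```

With real data, mols would be the list returned by the supplier and blocks the matching Mol2 strings in the same order.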

[5]:
database[:10] #The first 10 elements in the list
[5]:
[None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 <rdkit.Chem.rdchem.Mol at 0x7f4d27fc4c10>,
 <rdkit.Chem.rdchem.Mol at 0x7f4d27e57df0>]

Once the molecules are loaded, we can perform any calculation available in RDKit or convert the molecules to other formats (e.g. SDF). For instance, we can create a pandas table with some useful molecular information.

[6]:
table=pd.DataFrame()
index=0
for mol in database:
    if mol:
        table.loc[index,'Name']=mol.GetProp('_Name')
        table.loc[index,'NumAtoms']=mol.GetNumAtoms()
        table.loc[index,'SMILES']=Chem.MolToSmiles(mol)
        index=index+1
[7]:
table.head(10) #The first 10 non None elements in the list
[7]:
   Name              NumAtoms  SMILES
0  ZINC000003830891      20.0  [NH3+][C@@H](CCC(=O)N[C@@H](CS)C(=O)NCC(=O)[O-...
1  ZINC000004474414      29.0  C=C1CC[C@H](O)C/C1=C/C=C1\CCC[C@@]2(C)[C@H]1CC...
2  ZINC000033943508      22.0  CC1=C(/C=C/C(C)=C\C=C\C(C)=C/C(=O)[O-])C(C)(C)...
3  ZINC000000001011       9.0  O=C([O-])c1ccccc1
4  ZINC000001530575      22.0  COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O
5  ZINC000003780893      30.0  CCC(C)(C)C(=O)O[C@H]1C[C@@H](C)C=C2C=C[C@H](C)...
6  ZINC000003875332      28.0  C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C...
7  ZINC000004095858      31.0  Cc1c(C)c2c(c(C)c1O)CC[C@@](C)(CCC[C@H](C)CCC[C...
8  ZINC000008577218      32.0  Nc1nc2ncc(CNc3ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=...
9  ZINC000100015048      30.0  C=C1/C(=C\C=C2/CCC[C@@]3(C)[C@H]2CC[C@@H]3[C@H...

Next, we draw some of the valid (non-None) molecules, keeping the 3D coordinates from the Mol2 file.

[8]:
no_none=[mol for mol in database if mol] # None elements can't be drawn; keep only the valid entries
for mol in no_none:
    Chem.SanitizeMol(mol)
Draw.MolsToGridImage(no_none[:14],molsPerRow=7,subImgSize=(150,150),legends=[mol.GetProp('_Name') for mol in no_none[:14]],maxMols=100)
[8]:
[Image: grid of the first 14 valid molecules, labeled with their names]
[10]:
# Drawing 3 molecules from the non-None list in 3D
Draw.IPythonConsole.drawMol3D(no_none[2])
Draw.IPythonConsole.drawMol3D(no_none[6])
Draw.IPythonConsole.drawMol3D(no_none[9])

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

As you can see, this function works much like the previous one. I strongly recommend using this version instead, since it avoids several of the earlier errors.

If you find this Mini-Tool useful or encounter any errors, please leave me a comment.

Cheers!!!