Building a Multi-Molecule Mol2 reader for RDKit V2
In this mini-tools entry, I want to introduce a simple but powerful modification to my previous version of Mol2MolSupplier for RDKit:
https://chem-workflows.com/articles/2019/07/18/building-a-multi-molecule-mol2-reader-for-rdkit/
Perhaps some of you (like me) ran into errors when using the previous version of Mol2MolSupplier.
This modification overcomes many of those errors, which were caused by differences in the headers and in the order of the molecule blocks between Mol2 files.
The new version uses the same approach as before, but in a simpler and faster way.
Let's go directly to the new function:
Importing the libraries
from rdkit import Chem
from rdkit.Chem import Draw,AllChem
from rdkit.Chem.Draw import IPythonConsole
import pandas as pd
def Mol2MolSupplier(file=None, sanitize=True):
    mols = []
    with open(file, 'r') as f:
        doc = f.readlines()
    # Every molecule block starts at a '@<TRIPOS>MOLECULE' record
    start = [index for (index, p) in enumerate(doc) if '@<TRIPOS>MOLECULE' in p]
    # A block ends where the next one starts; the last block ends with the file
    finish = start[1:] + [len(doc)]
    for begin, end in zip(start, finish):
        # Join the lines unchanged so that commas and the final line of each block are kept
        block = "".join(doc[begin:end])
        m = Chem.MolFromMol2Block(block, sanitize=sanitize)
        mols.append(m)  # None is appended when RDKit cannot parse the block
    return mols
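To see how the block boundaries are located without needing RDKit or a real file, the splitting idea can be tried on an in-memory string. This is only a minimal sketch: `split_mol2_blocks` and the toy records are hypothetical names and data made up for illustration, and the records are deliberately incomplete, so they are not valid input for RDKit.

```python
MARKER = '@<TRIPOS>MOLECULE'

def split_mol2_blocks(text):
    """Split Mol2 text into one string per molecule (hypothetical helper)."""
    doc = text.splitlines(keepends=True)
    # Line indices where a molecule record begins
    start = [i for i, line in enumerate(doc) if MARKER in line]
    # Each block runs up to the next marker; the last one runs to the end
    finish = start[1:] + [len(doc)]
    return ["".join(doc[b:e]) for b, e in zip(start, finish)]

# Toy input: a header comment followed by two incomplete molecule records
toy = (
    "# header written by some program\n"
    "@<TRIPOS>MOLECULE\nmol_A\n"
    "@<TRIPOS>ATOM\n"
    "@<TRIPOS>MOLECULE\nmol_B\n"
    "@<TRIPOS>ATOM\n"
)
blocks = split_mol2_blocks(toy)
names = [b.splitlines()[1] for b in blocks]  # the name is the line after the marker
print(len(blocks), names)
```

Anything that comes before the first marker, such as the header comment in the toy input, is not part of any block, which is why files with different headers can be handled the same way.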
I will use the following multi-molecule mol2 file to show how the function works. The file contains 169 different molecules from ZINC.
filePath ='for-sale+in-man+fda+named+endogenous.mol2'
database=Mol2MolSupplier(filePath,sanitize=True)
Because RDKit converts each Mol2 block into a molecule, we can choose whether or not to sanitize. RDKit also prints a warning whenever it finds a sanitization problem. If a molecule is not valid, the list contains a 'None' element at that position.
database[:10] #The first 10 elements in the list
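Since failed entries show up as None, it can be handy to list their positions before going further. The following is a minimal sketch: `report_failures` is a hypothetical helper, and the stand-in list uses plain strings in place of RDKit molecules so that it runs on its own.

```python
def report_failures(mols):
    """Return (indices of None entries, list of valid entries)."""
    failed = [i for i, m in enumerate(mols) if m is None]
    valid = [m for m in mols if m is not None]
    return failed, valid

# Stand-in for the list returned by Mol2MolSupplier (assumed data)
example = ["mol0", None, "mol2", None]
failed, valid = report_failures(example)
print(f"{len(failed)} of {len(example)} entries failed: {failed}")
```

The indices refer to positions in the returned list, so they can be matched against the order of the blocks in the Mol2 file.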
Once the molecules are loaded, we can perform any calculation available in RDKit or convert the molecules to other formats (e.g. SDF). For instance, we can create a pandas table with some useful molecular information.
table=pd.DataFrame()
index=0
for mol in database:
    if mol:
        table.loc[index,'Name']=mol.GetProp('_Name')
        table.loc[index,'NumAtoms']=mol.GetNumAtoms()
        table.loc[index,'SMILES']=Chem.MolToSmiles(mol)
        index=index+1
table.head(10) #The first 10 non None elements in the list
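The conversion to SDF mentioned above could look roughly like the sketch below, which uses RDKit's `Chem.SDWriter`. The helper name `write_sdf`, the output file name, and the two molecules built from SMILES are assumptions made for illustration; they stand in for the list returned by Mol2MolSupplier.

```python
import os
import tempfile

from rdkit import Chem

def write_sdf(mols, path):
    """Write the valid molecules to an SDF file and return how many were written."""
    writer = Chem.SDWriter(path)
    written = 0
    for mol in mols:
        if mol is None:  # skip entries that failed to parse
            continue
        writer.write(mol)
        written += 1
    writer.close()
    return written

# Stand-in molecules built from SMILES (assumed data, not taken from the ZINC file)
ethanol = Chem.MolFromSmiles('CCO')
ethanol.SetProp('_Name', 'ethanol')
benzene = Chem.MolFromSmiles('c1ccccc1')
benzene.SetProp('_Name', 'benzene')

out_path = os.path.join(tempfile.gettempdir(), 'example_output.sdf')
count = write_sdf([ethanol, None, benzene], out_path)
print(count, 'molecules written to', out_path)
```

With the list from this post, the call would be something like `write_sdf(database, 'output.sdf')`, where the file name is again just a placeholder.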
Drawing some of the non-None molecules, keeping the 3D coordinates from the mol2 file.
no_none=[mol for mol in database if mol] # None elements can't be drawn; this list comprehension keeps only the valid entries
for mol in no_none:
    Chem.SanitizeMol(mol)
Draw.MolsToGridImage(no_none[:14],molsPerRow=7,subImgSize=(150,150),legends=[mol.GetProp('_Name') for mol in no_none[:14]],maxMols=100)
# Drawing 3 arbitrarily chosen molecules from the non-None list
Draw.IPythonConsole.drawMol3D(no_none[2])
Draw.IPythonConsole.drawMol3D(no_none[6])
Draw.IPythonConsole.drawMol3D(no_none[9])