{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Building a Multi-Molecule Mol2 reader for RDKit V2\n", "\n", "In this mini-tools entry, I want to introduce a simple but powerful modification to my previous version of Mol2MolSupplier for RDKit:\n", "\n", "https://chem-workflows.com/articles/2019/07/18/building-a-multi-molecule-mol2-reader-for-rdkit/\n", "\n", "Perhaps some of you (like me) encountered errors when using the previous version of Mol2MolSupplier.\n", "\n", "This modification overcomes many of those errors, which were caused by differences in the headers and in the order of the Mol2 molecule blocks across different Mol2 files.\n", "\n", "This new version uses the same approach as before, but in a simpler and faster way: it splits the file into one text block per molecule at every `@<TRIPOS>MOLECULE` record and lets RDKit parse each block.\n", "\n", "Let's go directly to the new function:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing the libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from rdkit import Chem\n", "from rdkit.Chem import Draw,AllChem\n", "from rdkit.Chem.Draw import IPythonConsole\n", "\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def Mol2MolSupplier(file=None, sanitize=True):\n", "    mols = []\n", "    with open(file, 'r') as f:\n", "        doc = f.readlines()\n", "\n", "    # Every molecule block starts with a '@<TRIPOS>MOLECULE' record\n", "    start = [index for (index, p) in enumerate(doc) if '@<TRIPOS>MOLECULE' in p]\n", "    # Each block ends where the next one starts; the last block runs to the end of the file\n", "    finish = start[1:] + [len(doc)]\n", "\n", "    for i, j in zip(start, finish):\n", "        block = ''.join(doc[i:j])\n", "        # MolFromMol2Block returns None when RDKit cannot build a valid molecule from the block\n", "        m = Chem.MolFromMol2Block(block, sanitize=sanitize)\n", "        mols.append(m)\n", "    return mols" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will use the following multi-molecule mol2 file to show how the function works. The file contains 169 different molecules from [ZINC](https://zinc.docking.org/)." 
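] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a mol2 file is too large to load into memory at once, the same splitting idea can also be applied lazily. The cell below is a minimal sketch under two assumptions: the helper names `iter_mol2_blocks` and `iter_mol2_mols` are hypothetical (they are not part of RDKit), and every molecule block begins with a line that starts with `@<TRIPOS>MOLECULE`. It reads the file line by line and yields one RDKit molecule (or `None`) per block." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from rdkit import Chem\n", "\n", "def iter_mol2_blocks(path):\n", "    # Hypothetical helper: yield the text of each molecule block, one block at a time\n", "    block = []\n", "    with open(path) as f:\n", "        for line in f:\n", "            is_header = line.startswith('@<TRIPOS>MOLECULE')\n", "            if is_header and block:\n", "                # A new header closes the previous block\n", "                yield ''.join(block)\n", "                block = []\n", "            if is_header or block:\n", "                # Collect header lines and lines inside a block; lines before the first header are skipped\n", "                block.append(line)\n", "    if block:\n", "        # The last block runs to the end of the file\n", "        yield ''.join(block)\n", "\n", "def iter_mol2_mols(path, sanitize=True):\n", "    # Hypothetical helper: parse each block with RDKit; unparsable blocks give None\n", "    for block in iter_mol2_blocks(path):\n", "        yield Chem.MolFromMol2Block(block, sanitize=sanitize)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because these are generators, they can be consumed in a loop (e.g. `for mol in iter_mol2_mols(path): ...`) without holding the whole file or every molecule in memory at once. Under the assumption above, `list(iter_mol2_mols(path))` should contain the same molecules, in the same order, as `Mol2MolSupplier(path)`."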
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "filePath ='for-sale+in-man+fda+named+endogenous.mol2'" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "RDKit WARNING: [12:48:54] ZINC000003801919: warning - O.co2 with non C.2 or S.o2 neighbor.\n" ] } ], "source": [ "database=Mol2MolSupplier(filePath,sanitize=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because RDKit performs the conversion of each Mol2 text block into a molecule, we can choose whether or not to sanitize the molecules with the `sanitize` argument, and we also get RDKit's warnings about problematic molecules, like the one above. If a block cannot be converted into a valid molecule, the corresponding element of the list is `None`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[None,\n", " None,\n", " None,\n", " None,\n", " None,\n", " None,\n", " None,\n", " None,\n", " ,\n", " ]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "database[:10]  # the first 10 elements of the list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the molecules are loaded, we can perform any calculation available in RDKit or convert the molecules to other formats (e.g. SDF).\n", "For instance, we can build a pandas table with some useful molecular information." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "table=pd.DataFrame()\n", "index=0\n", "for mol in database:\n", "    if mol:  # skip the None entries\n", "        table.loc[index,'Name']=mol.GetProp('_Name')\n", "        table.loc[index,'NumAtoms']=mol.GetNumAtoms()\n", "        table.loc[index,'SMILES']=Chem.MolToSmiles(mol)\n", "        index=index+1" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Name</th>\n",
"      <th>NumAtoms</th>\n",
"      <th>SMILES</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>ZINC000003830891</td><td>20.0</td><td>[NH3+][C@@H](CCC(=O)N[C@@H](CS)C(=O)NCC(=O)[O-...</td></tr>\n",
"    <tr><th>1</th><td>ZINC000004474414</td><td>29.0</td><td>C=C1CC[C@H](O)C/C1=C/C=C1\\CCC[C@@]2(C)[C@H]1CC...</td></tr>\n",
"    <tr><th>2</th><td>ZINC000033943508</td><td>22.0</td><td>CC1=C(/C=C/C(C)=C\\C=C\\C(C)=C/C(=O)[O-])C(C)(C)...</td></tr>\n",
"    <tr><th>3</th><td>ZINC000000001011</td><td>9.0</td><td>O=C([O-])c1ccccc1</td></tr>\n",
"    <tr><th>4</th><td>ZINC000001530575</td><td>22.0</td><td>COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O</td></tr>\n",
"    <tr><th>5</th><td>ZINC000003780893</td><td>30.0</td><td>CCC(C)(C)C(=O)O[C@H]1C[C@@H](C)C=C2C=C[C@H](C)...</td></tr>\n",
"    <tr><th>6</th><td>ZINC000003875332</td><td>28.0</td><td>C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C...</td></tr>\n",
"    <tr><th>7</th><td>ZINC000004095858</td><td>31.0</td><td>Cc1c(C)c2c(c(C)c1O)CC[C@@](C)(CCC[C@H](C)CCC[C...</td></tr>\n",
"    <tr><th>8</th><td>ZINC000008577218</td><td>32.0</td><td>Nc1nc2ncc(CNc3ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=...</td></tr>\n",
"    <tr><th>9</th><td>ZINC000100015048</td><td>30.0</td><td>C=C1/C(=C\\C=C2/CCC[C@@]3(C)[C@H]2CC[C@@H]3[C@H...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " Name NumAtoms \\\n", "0 ZINC000003830891 20.0 \n", "1 ZINC000004474414 29.0 \n", "2 ZINC000033943508 22.0 \n", "3 ZINC000000001011 9.0 \n", "4 ZINC000001530575 22.0 \n", "5 ZINC000003780893 30.0 \n", "6 ZINC000003875332 28.0 \n", "7 ZINC000004095858 31.0 \n", "8 ZINC000008577218 32.0 \n", "9 ZINC000100015048 30.0 \n", "\n", " SMILES \n", "0 [NH3+][C@@H](CCC(=O)N[C@@H](CS)C(=O)NCC(=O)[O-... \n", "1 C=C1CC[C@H](O)C/C1=C/C=C1\\CCC[C@@]2(C)[C@H]1CC... \n", "2 CC1=C(/C=C/C(C)=C\\C=C\\C(C)=C/C(=O)[O-])C(C)(C)... \n", "3 O=C([O-])c1ccccc1 \n", "4 COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O \n", "5 CCC(C)(C)C(=O)O[C@H]1C[C@@H](C)C=C2C=C[C@H](C)... \n", "6 C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C... \n", "7 Cc1c(C)c2c(c(C)c1O)CC[C@@](C)(CCC[C@H](C)CCC[C... \n", "8 Nc1nc2ncc(CNc3ccc(C(=O)N[C@@H](CCC(=O)[O-])C(=... \n", "9 C=C1/C(=C\\C=C2/CCC[C@@]3(C)[C@H]2CC[C@@H]3[C@H... " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table.head(10)  # the first 10 valid (non-None) molecules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Drawing some of the valid (non-None) molecules, keeping the 3D coordinates from the mol2 file." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "no_none=[mol for mol in database if mol]  # None elements can't be drawn, so keep only the valid entries\n", "for mol in no_none:\n", "    Chem.SanitizeMol(mol)\n", "Draw.MolsToGridImage(no_none[:14],molsPerRow=7,subImgSize=(150,150),legends=[mol.GetProp('_Name') for mol in no_none[:14]],maxMols=100)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
\n", "text/html": [ "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Drawing three of the valid (non-None) molecules in 3D\n", "Draw.IPythonConsole.drawMol3D(no_none[2])\n", "Draw.IPythonConsole.drawMol3D(no_none[6])\n", "Draw.IPythonConsole.drawMol3D(no_none[9])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### As you can see, this version is used in the same way as the previous one. I strongly recommend using it instead of the previous version to avoid several of its errors.\n", "\n", "### If you find this Mini-Tool useful or encounter any errors, please leave me a comment.\n", "\n", "#### Cheers!!!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ".. disqus::" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }