Retrieve UniProt data using Python
In this mini-tool I will show you how to retrieve data from UniProt in Python, using either a PDB ID or a UniProt ID. I based this function on UniProt's programmatic-access examples for Python. The idea is essentially the same, but I made minor modifications so that the data comes back as plain text, which makes it easier to extract useful information.
Let's get hands-on.
The Python function
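```python
import urllib.parse
import urllib.request


def get_uniprot(query='', query_type='PDB_ID'):
    """Return the UniProt flat-file data for `query` as a list of lines.

    query_type must be "PDB_ID" (query is a PDB ID) or "ACC" (query is a UniProt accession).
    """
    # Legacy UniProt ID-mapping web service (since retired by UniProt)
    url = 'https://www.uniprot.org/uploadlists/'
    params = {
        'from': query_type,
        'to': 'ACC',
        'format': 'txt',  # UniProt flat-file (plain text) format
        'query': query,
    }
    data = urllib.parse.urlencode(params).encode('ascii')
    request = urllib.request.Request(url, data)  # passing data sends a POST request
    with urllib.request.urlopen(request) as response:
        res = response.read()
    # The response is already plain text, so decoding it is enough (no HTML parsing needed)
    return res.decode('utf-8').splitlines()
```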
This simple function returns the UniProt data for a PDB entry or a UniProt entry as a list of lines.
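UniProt has since replaced the legacy `uploadlists` service used above with a REST API at rest.uniprot.org. The sketch below is a hedged adaptation to that API, not the original code. It assumes that `https://rest.uniprot.org/uniprotkb/<accession>.txt` returns the same flat-file text, and that a UniProtKB search with the query `xref:pdb-<PDB ID>` and `format=list` returns the matching accessions one per line. The names `uniprot_entry_url`, `pdb_search_url` and `get_uniprot_rest` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the UniProtKB REST API
REST_BASE = "https://rest.uniprot.org/uniprotkb"


def uniprot_entry_url(accession):
    """URL of the flat-file (plain text) entry for one UniProt accession."""
    return f"{REST_BASE}/{accession}.txt"


def pdb_search_url(pdb_id):
    """Search URL listing the accessions cross-referenced to a PDB ID.

    Assumes the UniProt query syntax xref:pdb-<id> and the 'list' output format.
    """
    params = urlencode({"query": f"xref:pdb-{pdb_id}", "format": "list"})
    return f"{REST_BASE}/search?{params}"


def _fetch_lines(url):
    """Download a URL and return its text as a list of lines."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8").splitlines()


def get_uniprot_rest(query, query_type="PDB_ID"):
    """Hypothetical REST counterpart of get_uniprot (needs network access)."""
    if query_type == "ACC":
        return _fetch_lines(uniprot_entry_url(query))
    if query_type == "PDB_ID":
        lines = []
        for accession in _fetch_lines(pdb_search_url(query)):
            if accession.strip():
                lines.extend(_fetch_lines(uniprot_entry_url(accession.strip())))
        return lines
    raise ValueError('query_type must be "PDB_ID" or "ACC"')
```

Like the legacy service, a PDB ID that maps to several accessions yields their entries concatenated into one list.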
PDB entry

```python
x = get_uniprot(query='1eve', query_type='PDB_ID')
x[:10]  # show only the first 10 lines of data
```
UniProt entry

```python
y = get_uniprot(query='P04058', query_type='ACC')
y[:10]  # show only the first 10 lines of data
```
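Each line of a UniProt flat-file entry starts with a two-letter line code (`ID`, `AC`, `OS`, `DR`, ...), columns 3 to 5 are blank, and the data starts in column 6; an entry ends with a `//` line. A small helper (hypothetical, not part of the original post) can group the list returned by `get_uniprot` by line code. The sample lines below are illustrative placeholders in the flat-file layout, not a real entry.

```python
from collections import defaultdict


def group_by_line_code(lines):
    """Map each two-letter line code to the data part (column 6 onward) of its lines."""
    groups = defaultdict(list)
    for line in lines:
        if len(line) < 2 or line.startswith("//"):
            continue  # skip blank lines and the end-of-entry marker
        groups[line[:2]].append(line[5:].rstrip())
    return dict(groups)


# Illustrative placeholder lines (not real UniProt data)
sample = [
    "ID   EXAMPLE_HUMAN           Reviewed;         100 AA.",
    "AC   P00000;",
    "OS   Homo sapiens (Human).",
    "DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.",
    "//",
]
groups = group_by_line_code(sample)
print(groups["OS"])  # ['Homo sapiens (Human).']
```

Sequence lines start with blanks instead of a code, so on a real entry they end up under the key `"  "`.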
More examples
Because all the data is available as a list of lines, it is easy to find the information we are interested in, for instance the annotated Gene Ontology (GO) terms.
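```python
# Flat-file lines have a 2-letter code followed by three spaces, e.g. "DR   GO; ..."
for line in y:
    if line.startswith('DR   GO;'):
        print(line)
```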
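A `DR   GO;` line packs four fields separated by semicolons: the GO ID, the aspect letter and term name (`F` molecular function, `P` biological process, `C` cellular component), and the evidence code with its source. A minimal sketch of splitting one line into named fields follows; `parse_go_line` and `GOAnnotation` are hypothetical names, and the sample line is illustrative.

```python
from collections import namedtuple

GOAnnotation = namedtuple("GOAnnotation", "go_id aspect term evidence")
ASPECTS = {"F": "molecular function", "P": "biological process", "C": "cellular component"}


def parse_go_line(line):
    """Split a 'DR   GO;' line into its GO ID, aspect, term and evidence fields."""
    # Layout: DR   GO; GO:<id>; <aspect letter>:<term>; <evidence code>:<source>.
    fields = [field.strip() for field in line[5:].rstrip(".").split(";")]
    _, go_id, aspect_term, evidence = fields[:4]
    aspect, term = aspect_term.split(":", 1)
    return GOAnnotation(go_id, ASPECTS.get(aspect, aspect), term, evidence)


annotation = parse_go_line("DR   GO; GO:0005515; F:protein binding; IPI:IntAct.")
print(annotation.aspect, "|", annotation.term)  # molecular function | protein binding
```

Applied to every GO line of `y`, this gives records that can be filtered by aspect instead of matching raw text.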
Or all the PDB structures reported for the protein, including the experimental method, resolution, and chains with their residue ranges.
```python
for line in y:
    if line.startswith('DR   PDB;'):
        print(line)
```
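The `DR   PDB;` lines follow the same semicolon-separated layout: PDB ID, method, resolution (or `-` when there is none, as for NMR), and chains with residue ranges. As a sketch under that assumption, the hypothetical `parse_pdb_line` below turns each line into a tuple, which makes it easy to pick, say, the best-resolved structure. The sample lines are placeholders, not real UniProt data.

```python
def parse_pdb_line(line):
    """Split a 'DR   PDB;' line into (PDB ID, method, resolution in angstroms or None, chains)."""
    # Layout: DR   PDB; <PDB ID>; <method>; <resolution>; <chains>=<residue range>.
    fields = [field.strip() for field in line[5:].rstrip(".").split(";")]
    _, pdb_id, method, resolution, chains = fields[:5]
    # Entries without a resolution (e.g. NMR) carry "-" in that field
    res = None if resolution == "-" else float(resolution.split()[0])
    return pdb_id, method, res, chains


# Placeholder lines in the flat-file layout (not real UniProt data)
lines = [
    "DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.",
    "DR   PDB; 2XYZ; NMR; -; A=1-50.",
]
structures = [parse_pdb_line(line) for line in lines]
# Best-resolved structure, ignoring entries without a resolution
best = min((s for s in structures if s[2] is not None), key=lambda s: s[2])
print(best)  # ('1ABC', 'X-ray', 2.0, 'A=1-100')
```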
I mainly created this function to store data for a list of proteins (PDB or UniProt entries) in a single table. For example:
```python
import pandas as pd  # to build the table

prots = ['P40926', 'O43175', 'Q9UM73']
rows = []
for entry in prots:
    organism, pdbs, functions, processes, components = set(), set(), set(), set(), set()
    for line in get_uniprot(query=entry, query_type='ACC'):
        if line.startswith('OS   '):
            # Layout: OS   <species name>.
            organism.add(line[5:].strip().rstrip('.'))
        elif line.startswith('DR   PDB;'):
            # Layout: DR   PDB; <PDB ID>; <method>; <resolution>; <chains>.
            fields = [f.strip() for f in line[5:].split(';')]
            pdbs.add(fields[1] + ':' + fields[3].replace(' A', ''))
        elif line.startswith('DR   GO;'):
            # Layout: DR   GO; GO:<id>; <aspect>:<term>; <evidence>.
            fields = [f.strip() for f in line[5:].split(';')]
            aspect, term = fields[2].split(':', 1)
            if aspect == 'F':
                functions.add(term)
            elif aspect == 'P':
                processes.add(term)
            else:  # 'C': cellular component
                components.add(term)
    rows.append({
        'Uniprot_entry': entry,
        'Organism': ', '.join(sorted(organism)),
        'PDB:Resol': ', '.join(sorted(pdbs)),
        'GO_function': ', '.join(sorted(functions)),
        'GO_process': ', '.join(sorted(processes)),
        'GO_component': ', '.join(sorted(components)),
    })

table = pd.DataFrame(rows)
table
```
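The `PDB:Resol` column stores several `ID:resolution` pairs in one comma-joined string. As a sketch (the function name `structures_below` and the stand-in data are hypothetical), pandas can split that column into one row per structure and keep only those at or below a resolution cutoff; entries with `-` or no structures drop out.

```python
import pandas as pd

# Illustrative stand-in for `table` (placeholder values, not real UniProt data)
example_table = pd.DataFrame({
    "Uniprot_entry": ["P00001", "P00002", "P00003"],
    "PDB:Resol": ["1ABC:2.00, 1ABD:3.10", "2XYZ:-", ""],
})


def structures_below(table, max_resolution):
    """One row per PDB structure whose resolution (angstroms) is <= max_resolution."""
    pdb = (table[["Uniprot_entry", "PDB:Resol"]]
           .assign(item=lambda df: df["PDB:Resol"].str.split(", "))
           .explode("item", ignore_index=True))
    # Split "ID:resolution"; empty strings do not match and become NaN
    parts = pdb["item"].str.extract(r"^(?P<PDB>[^:]+):(?P<Resolution>.+)$")
    parts["Resolution"] = pd.to_numeric(parts["Resolution"], errors="coerce")  # "-" -> NaN
    out = pd.concat([pdb[["Uniprot_entry"]], parts], axis=1)
    return out[out["Resolution"] <= max_resolution].reset_index(drop=True)


print(structures_below(example_table, 2.5))
```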
Saving the table
```python
table.to_csv('Uniprot_search.csv')
```
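By default `to_csv` also writes the row index as an unnamed first column. If you plan to load the file back with pandas, a round trip like the sketch below avoids that extra column (`index=False`) and reads empty cells back as empty strings instead of `NaN` (`keep_default_na=False`). The table here is a placeholder with the same kind of comma-joined text columns.

```python
import os
import tempfile

import pandas as pd

# Placeholder table (not real UniProt data)
example_table = pd.DataFrame({
    "Uniprot_entry": ["P00001", "P00002"],
    "PDB:Resol": ["1ABC:2.00, 1ABD:3.10", ""],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Uniprot_search.csv")
    # index=False: do not write the 0, 1, 2... row labels as an extra column
    example_table.to_csv(path, index=False)
    # keep_default_na=False: empty cells come back as "" rather than NaN
    restored = pd.read_csv(path, keep_default_na=False)

print(restored)
```

Values containing commas are quoted automatically by `to_csv`, so the comma-joined lists survive the round trip intact.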