Batch-downloading sequence data from NCBI with EUtils
Based on http://www.ncbi.nlm.nih.gov/books/NBK25498/#chapter3.Application_3_Retrieving_large
#!/usr/bin/python
# Author: Dr. Kumaran Kandasamy
# E-Mail: [email protected]
# Note: this script targets Python 2 (urllib2, print statement).

import re
import urllib
import urllib2


def main(giList, database, rettype):
    output = "NO_DATA"
    # NCBI serves the E-utilities over HTTPS.
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

    # Step 1: upload the ID list to the history server with EPost.
    url = base + "epost.fcgi"
    values = {'db': database, 'id': giList}
    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)

    # Extract the QueryKey and WebEnv that identify the uploaded set.
    queryKey = ""
    webEnv = ""
    for line in response.readlines():
        line = line.strip()
        match = re.search("<WebEnv>(.*)</WebEnv>", line)
        if match:
            webEnv = match.group(1)
        match = re.search("<QueryKey>(.*)</QueryKey>", line)
        if match:
            queryKey = match.group(1)

    # Step 2: fetch the records for the uploaded set with EFetch.
    if queryKey != "" and webEnv != "":
        print queryKey, webEnv
        url = base + "efetch.fcgi"
        values = {
            'db': database,
            'query_key': queryKey,
            'WebEnv': webEnv,
            'rettype': rettype,
            'retmode': 'text',
        }
        # POST the EFetch request.
        data = urllib.urlencode(values)
        req = urllib2.Request(url, data)
        response = urllib2.urlopen(req)
        output = response.readlines()
    return output


if __name__ == "__main__":
    gi = "24475906,224465210,50978625,9507198"
    # Print the returned FASTA records.
    print "".join(main(gi, 'nucleotide', 'fasta'))
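The script above is written for Python 2. As a rough sketch of the same EPost-then-EFetch flow in Python 3, the version below uses only the standard library. The function names (`parse_epost`, `batch_windows`, `post`, `fetch_sequences`) and the default batch size of 500 are assumptions made for illustration and are not part of the original script. It also pages through the uploaded set with `retstart` and `retmax`, which may help when the ID list is large.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def parse_epost(xml_text):
    """Return (query_key, web_env) from an EPost XML reply, or (None, None)."""
    root = ET.fromstring(xml_text)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


def batch_windows(total, batch_size=500):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def post(endpoint, params):
    """POST form-encoded params to an E-utility and return the decoded body."""
    data = urllib.parse.urlencode(params).encode("ascii")
    with urllib.request.urlopen(BASE + endpoint, data=data) as response:
        return response.read().decode("utf-8")


def fetch_sequences(id_list, database="nucleotide", rettype="fasta",
                    batch_size=500):
    """Upload a comma-separated ID string with EPost, then EFetch in batches."""
    ids = [i.strip() for i in id_list.split(",") if i.strip()]
    reply = post("epost.fcgi", {"db": database, "id": ",".join(ids)})
    query_key, web_env = parse_epost(reply)
    if not query_key or not web_env:
        return ""  # EPost did not return a usable history reference
    chunks = []
    for retstart, retmax in batch_windows(len(ids), batch_size):
        chunks.append(post("efetch.fcgi", {
            "db": database,
            "query_key": query_key,
            "WebEnv": web_env,
            "rettype": rettype,
            "retmode": "text",
            "retstart": retstart,
            "retmax": retmax,
        }))
    return "".join(chunks)
```

Under these assumptions, `fetch_sequences("24475906,224465210,50978625,9507198")` would return the FASTA text for the same four IDs used in the original script. Parsing the EPost reply with `xml.etree.ElementTree` instead of line-by-line regular expressions is a design choice of this sketch: it does not depend on each tag sitting on its own line.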