python: Quickly find a sequencing ID (seqID) or accession name from a Python dictionary. Originally made to deal with accessions from the 100
#!/usr/bin/python
"""
Author: Johan Zicola
"""
import sys
import ast
import os


def make_dict(tsv_file):
    """Print a dictionary built from the first two columns of tsv_file."""
    dictionary = {}

    # Comment lines (starting with '#') are rejected rather than skipped:
    # skipping them caused a bug when users swapped columns with awk (the
    # '#' moves to another column and the line is then treated as valid).
    # Blank lines (for example at the end of the file) are ignored.
    # split() without an argument splits on spaces as well as on tabs.
    with open(tsv_file, "r") as handle:
        for line in handle:
            if line.startswith("#"):
                sys.exit("'#' sign in the file, please remove comment lines and rerun")
            elif line.strip():
                fields = line.split()
                if len(fields) < 2:
                    sys.exit("'" + str(tsv_file) + "' is missing a second column")
                # A key that appears again overwrites the earlier value
                dictionary[fields[0]] = fields[1]

    print(dictionary)


def main():
    """
    Provide as first argument a Python dictionary (as generated by make_dict).
    Provide as second argument a file of accession IDs (one ID per line) or a
    list of IDs as strings separated by spaces (24555 54545 66556).
    If the dictionary was generated using the accession name as key and the ID
    as value, the script returns the ID for a file of accessions or a list of
    strings (Col-0 Ler-0 Cvi-0).
    """
    if len(sys.argv) == 1 or sys.argv[1] == '-h':
        print("""
find_accession
Works with Python 2.x and Python 3.x
Author: Johan Zicola (johan.zicola@gmail.com)
Date: 2017-09-09

usage: find_accession.py [-h] dictionary accession

Argument 'dictionary' should be a file containing a dictionary
with either seqID:name_accession or name_accession:seqID as
key:value pairs. The dictionary can be generated with this script
by using the function make_dict

example:
    python find_accession.py make_dict <accession_seqID.txt>

    For instance, we want the accession name to be our key
    and the seqID to be our value
    $ cat accession_seqID.txt
    Col-0 6909
    Cvi-0 6911
    $ python find_accession.py make_dict accession_seqID.txt
    {'Col-0': '6909', 'Cvi-0': '6911'}

    Just redirect the standard output to a file
    $ python find_accession.py make_dict accession_seqID.txt > accession_seqID.dict

    To generate the opposite dictionary, reverse the columns of
    accession_seqID.txt using awk
    $ awk '{ print $2,$1 }' accession_seqID.txt > seqID_accession.txt
    And relaunch the python script
    $ python find_accession.py make_dict seqID_accession.txt > seqID_accession.dict

Argument 'accession' can be one seqID/name, a list of
seqIDs/names separated by spaces, or a file with one seqID/name per row

example:
    Get the seqID of one accession
    $ python find_accession.py accession_seqID.dict Col-0
    Col-0 6909

    Find seqIDs using the names of 2 accessions
    $ python find_accession.py accession_seqID.dict Col-0 Cvi-0
    Col-0 6909
    Cvi-0 6911

    Find the accession name using the seqID
    $ python find_accession.py seqID_accession.dict 6911
    6911 Cvi-0
""")
        sys.exit()
    else:
        if sys.argv[1] == 'make_dict' and len(sys.argv) == 3:
            make_dict(sys.argv[2])
        elif sys.argv[1] == 'make_dict' and len(sys.argv) == 2:
            sys.exit("Provide file to generate the dictionary")
        else:
            # Check that the dictionary file exists
            if not os.path.exists(sys.argv[1]):
                sys.exit("File '" + str(sys.argv[1]) + "' does not exist")

            # Read the dictionary file
            with open(sys.argv[1], 'r') as handle:
                content = handle.read().strip()

            # Parse the string as a Python literal and exit if the result is
            # not a dictionary. Malformed input makes literal_eval raise, for
            # example, SyntaxError or ValueError
            try:
                dictionary = ast.literal_eval(content)
            except (SyntaxError, ValueError):
                sys.exit("'" + str(sys.argv[1]) + "' is not a dictionary")
            if not isinstance(dictionary, dict):
                sys.exit("'" + str(sys.argv[1]) + "' is not a dictionary")

            # Arguments given after the dictionary: either a file with one ID
            # per line or a series of strings
            accession = sys.argv[2:]
            if not accession:
                sys.exit("No argument provided after the dictionary")

            # If the first argument is an existing file, read the IDs from it
            if os.path.exists(accession[0]):
                with open(accession[0], 'r') as handle:
                    accession = handle.readlines()

            # Look up each ID in the dictionary. Nothing is printed for a key
            # that does not exist
            for i in accession:
                i = i.strip()
                if i in dictionary:
                    print(i + "\t" + dictionary[i])


if __name__ == "__main__":
    sys.exit(main())
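
The script stores the dictionary as printed text and reads it back with `ast.literal_eval`. The following is a minimal sketch of that round trip, not the author's code: `build_dict` and `lookup` are hypothetical helper names, the table is held in memory instead of a file, and every non-blank row is assumed to have two columns.

```python
import ast
import io


def build_dict(lines):
    """Map the first column of each non-blank line to the second column.

    Assumption: every non-blank line has at least two columns.
    """
    mapping = {}
    for line in lines:
        fields = line.split()  # splits on tabs as well as on spaces
        if fields:
            mapping[fields[0]] = fields[1]
    return mapping


def lookup(mapping, queries):
    """Return 'key<TAB>value' strings for the queries found in mapping."""
    return [q + "\t" + mapping[q] for q in queries if q in mapping]


# Sample rows taken from the usage text of the script; the blank line
# at the end is skipped
table = io.StringIO("Col-0\t6909\nCvi-0 6911\n\n")
name_to_id = build_dict(table)

# The script prints the dict to a file; literal_eval parses that text
# back into a dict without executing arbitrary code
saved_text = repr(name_to_id)
restored = ast.literal_eval(saved_text)

# 'Ler-0' is not in the dictionary and is silently skipped
result = lookup(restored, ["Col-0", "Ler-0", "Cvi-0"])
print("\n".join(result))
```

Keys missing from the dictionary produce no output, which matches the behavior of the lookup loop in the script.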
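
The usage text builds the opposite dictionary by swapping the columns with awk and rerunning `make_dict`. As an alternative sketch that is not part of the original script, an already loaded dictionary could be inverted directly in Python. The helper name `invert` is hypothetical, and raising an error on a duplicated value is an assumption made here so that no entry is lost silently.

```python
def invert(mapping):
    """Return a value -> key dictionary, refusing duplicated values."""
    inverted = {}
    for key, value in mapping.items():
        if value in inverted:
            raise ValueError(
                "value %r is shared by %r and %r" % (value, inverted[value], key)
            )
        inverted[value] = key
    return inverted


# Sample data taken from the usage text of the script
name_to_id = {"Col-0": "6909", "Cvi-0": "6911"}
id_to_name = invert(name_to_id)
print(id_to_name["6911"])
```

The awk route keeps the whole workflow in text files, while this variant avoids creating a second input table.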