python: Quickly find the sequencing ID (seqID) or accession name from a Python dictionary. Originally made to handle accessions from the 100...


#!/usr/bin/python

"""
 Author: Johan Zicola
"""
import sys
import ast
import os


def make_dict(tsv_file):
    """Print a dictionary built from the first two columns of tsv_file."""
    dictionary = {}

    # Comment lines (starting with '#') are rejected with an error.
    # They used to be skipped, but that caused a bug when the user swaps
    # the columns with awk: the '#' moves away from the start of the line
    # and the line is then treated as valid.
    # Empty lines (for example a blank line at the end of the file) are
    # ignored: line.strip() is only true if there is something on the line.
    # split() is used instead of split('\t') because, without an argument,
    # it splits on spaces as well as on tabs.
    with open(tsv_file, "r") as handle:
        for line in handle:
            if line[0] == "#":
                sys.exit("'#' sign in the file, please remove comment lines and rerun")
            elif line.strip():
                fields = line.strip().split()
                key = fields[0]
                try:
                    value = fields[1]
                except IndexError:
                    sys.exit("'"+str(tsv_file)+"' is missing a second column")
                # A key that appears twice keeps the value of its last line
                dictionary[key] = value
    print(dictionary)



def main():
    """
     Provide as first argument a Python dictionary file (as generated by the make_dict function)
     Provide as second argument a file of accession IDs (one ID per row) or a list of
     IDs given as strings separated by spaces (24555 54545 66556)
     If the dictionary was generated using the accession name as key and the ID as value,
     the script returns the ID for a file of accession names or a list of names (Col-0 Ler-0 Cvi-0)

    """
    if len(sys.argv) == 1 or  sys.argv[1] == '-h':
        print("""
    find_accession
    
    Works with Python 2.x and Python 3.x
    
    Author: Johan Zicola (johan.zicola@gmail.com)
    
    Date: 2017-09-09
        
    usage: find_accession.py [-h] dictionary accession
    
    argument 'dictionary' should be a file containing a dictionary
    with either seqID:name_accession or name_accession:seqID as
    key:value pairs. The dictionary can be generated with this script
    by using the function make_dict
    
    example:
    python find_accession.py make_dict <accession_seqID.txt>

    For instance, we want the accession name to be our key
    and the seqID to be our value
    $ cat accession_seqID.txt
    Col-0   6909
    Cvi-0   6911

    $ python find_accession.py make_dict accession_seqID.txt
    {'Col-0': '6909', 'Cvi-0': '6911'}

    Just redirect the standard output to a file
    $ python find_accession.py make_dict accession_seqID.txt > accession_seqID.dict

    To generate the opposite dictionary, reverse the columns of accession_seqID.txt
    using awk
    $ awk '{ print $2,$1 }' accession_seqID.txt > seqID_accession.txt

    And relaunch the python script
    $ python find_accession.py make_dict seqID_accession.txt > seqID_accession.dict

    Argument 'accession' can be one seqID/name, a list of 
    seqIDs/names separated by spaces, or a file with a seqID/name by row

    example:
    Get the seqID of one accession
    $ python find_accession.py accession_seqID.dict Col-0
    Col-0   6909

    Find seqID using names of 2 accessions
    $ python find_accession.py accession_seqID.dict Col-0 Cvi-0
    Col-0   6909
    Cvi-0   6911

    Find the accession name using the seqID
    $ python find_accession.py seqID_accession.dict 6911
    6911    Cvi-0

    """)
        sys.exit() 

    else:
        
        if  sys.argv[1] == 'make_dict' and len(sys.argv) == 3:
            make_dict(sys.argv[2])

        elif sys.argv[1] == 'make_dict' and len(sys.argv) == 2:
            sys.exit("Provide file to generate the dictionary")
        else: 
           
            # Check if dictionary exists
            if not os.path.exists(sys.argv[1]):
                sys.exit("File '"+str(sys.argv[1])+"' does not exist")
             
            # Read the dictionary file
            with open(sys.argv[1],'r') as handle:
                dictionary = handle.read().strip()

            # Evaluate the content of the string as a Python literal
            # Exit if the object is not a dictionary
            try:
                dictionary = ast.literal_eval(dictionary)
            except (SyntaxError, ValueError):
                sys.exit("'"+str(sys.argv[1])+"' is not a dictionary")
            if not isinstance(dictionary, dict):
                sys.exit("'"+str(sys.argv[1])+"' is not a dictionary")

            # Get either a file with one ID per line or a series of strings
            accession = sys.argv[2:]

            # Exit if no argument is given after the dictionary
            if not accession:
                sys.exit("No argument provided after the dictionary")

            # If the first argument is an existing file, read the IDs from it
            if os.path.isfile(accession[0]):
                with open(accession[0],'r') as handle:
                    accession = handle.readlines()
            
            # Search for ID in the dictionary. Do not return anything if key does not exist
            for i in accession:
                i = i.strip()
                if i in dictionary:
                    acc_name = dictionary[i]
                    print(i+"\t"+acc_name)


if __name__ == "__main__":
    sys.exit(main())
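
The comments in make_dict explain why the script calls split() without an argument. The short sketch below illustrates that choice; parse_line is a hypothetical helper written for this example and is not part of the original script.

```python
def parse_line(line):
    """Return (key, value) from one line, or None for a blank line.

    Hypothetical helper: it mirrors the parsing done inside make_dict
    but raises ValueError instead of calling sys.exit.
    """
    fields = line.strip().split()
    if not fields:
        return None
    if len(fields) < 2:
        raise ValueError("line is missing a second column: %r" % line)
    return fields[0], fields[1]


tab_line = "Col-0\t6909\n"
space_line = "Cvi-0   6911\n"

# split('\t') only works when the columns are really separated by tabs
print(tab_line.strip().split("\t"))    # ['Col-0', '6909']
print(space_line.strip().split("\t"))  # ['Cvi-0   6911']

# split() with no argument handles both layouts
print(parse_line(tab_line))    # ('Col-0', '6909')
print(parse_line(space_line))  # ('Cvi-0', '6911')
```

One consequence of this choice is that an accession name containing a space would be cut at the space, so the input is assumed to use names without whitespace.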

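The lookup step relies on the fact that make_dict prints the dictionary, so a .dict file holds the dictionary's text representation and ast.literal_eval can turn it back into an object. Below is a minimal sketch of that round trip, assuming a hypothetical helper named load_dict that is not in the original script. It also checks that the parsed object is really a dictionary, since ast.literal_eval accepts other literals such as lists.

```python
import ast


def load_dict(text):
    """Parse the text of a .dict file and return it as a dict.

    Hypothetical helper: raises ValueError if the text is not a
    Python dictionary literal.
    """
    try:
        obj = ast.literal_eval(text.strip())
    except (SyntaxError, ValueError):
        raise ValueError("not a Python literal")
    if not isinstance(obj, dict):
        raise ValueError("literal is not a dictionary")
    return obj


# make_dict prints the dictionary, so the file content is its repr()
saved = repr({"Col-0": "6909", "Cvi-0": "6911"})
lookup = load_dict(saved)

for name in ["Col-0", "Ler-0", "Cvi-0"]:
    # Unknown keys (here Ler-0) are skipped silently, as in the script
    if name in lookup:
        print(name + "\t" + lookup[name])
```

Because ast.literal_eval only evaluates literals, reading the file this way does not execute arbitrary code, unlike eval.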