Searching + reverse seeking a pickled file, values getting skipped

【Posted】: 2021-10-03 20:59:35 【Question】:

A minimal reproducible example; my actual code uses only goto_index(). The rest should be self-explanatory:

import pickle,os

def goto_index(idx_str,src,dest=False) :
    '''Go to index :
       1. Convert 1-based comma separated digits in idx_str into 0-based list containing each index digit as int.
       2. Starting from current position of src, iterate until index[0] matches current object's position.
          If matched, try to index the object as given. If not matched, function raises EOFError. If index illegal
          in object, function raises IndexError. If object found and index found in object, return value found and
          seek src to beginning of object @ index.
       3. If dest is specified, all values until index will be copied to it from its current position.
          If element is not found in src, the result will be that all elements from src's current position
          to EOF are copied to dest.
    '''

    index = [int(subidx)-1 for subidx in idx_str.split(',')]
    val = None
    obj_cnt = -1                              # 0-based count

    try :
        while True :                          # EOFError if index[0] >= EOF point
            obj = pickle.load(src)
            obj_cnt += 1
            if obj_cnt == index[0] :
                val = obj
                for subidx in index[1::] :
                    val = val[subidx]         # IndexError if index illegal
                src.seek(-len(pickle.dumps(obj)),os.SEEK_CUR) # Seek to start of object at index
                return val
            elif dest : pickle.dump(obj,dest)
    except (EOFError,IndexError) : raise      # Caller will handle exceptions

def add_elements(f) :
    pickle.dump('hello world',f)
    pickle.dump('good morning',f)
    pickle.dump('69 420',f)
    pickle.dump('ending !',f)


def get_elements(f) :
    elements = []
    # Actual code similarly calls goto_index() in ascending order of indices, avoiding repeated seeks.
    for idx_str in ('1','2','3') : 
        elements.append(goto_index(idx_str,f))
    return elements

with open("tmp","wb+") as tmp :
    add_elements(tmp)
    tmp.seek(0)                               # Rewind: the writes left the position at EOF
    print(', '.join(get_elements(tmp)))

    '''Expected output : hello world, good morning, 69 420
       Actual output   : hello world, good morning, ending !
       Issue : When asking for 3rd element, 3rd element skipped, 4th returned, why ?
    '''

Edit: the problem is that goto_index() sets obj_cnt to -1 on every call. How can I mitigate this?

【Answer 1】:

The problems were:

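1. obj_cnt does not persist across function calls, so it always starts from scratch even though each call moves the file position. goto_index() therefore behaves as if it were at BOF when it is actually much further along.
2. Seeking to the start of the object at the index (src.seek(-len(pickle.dumps(obj)),os.SEEK_CUR)) makes the next read return the same object as before. Once the first bug is fixed, this causes goto_index() to always go to, and return, the object at the index from the first call.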
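The seek-back effect described above can be checked in isolation. This is only a small sketch on an in-memory buffer; `buf`, `start` and `rewound_to` are illustrative names that do not appear in the original code.

```python
import io
import os
import pickle

# Write two objects into an in-memory "file" and rewind to the start.
buf = io.BytesIO()
for item in ('hello world', 'good morning'):
    pickle.dump(item, buf)
buf.seek(0)

start = buf.tell()                  # offset where the first object begins
first = pickle.load(buf)

# Rewind by the object's serialized length, as goto_index() does.
buf.seek(-len(pickle.dumps(first)), os.SEEK_CUR)
rewound_to = buf.tell()

again = pickle.load(buf)            # reads the same object a second time
print(first, '|', again, '|', rewound_to == start)
# hello world | hello world | True
```

Saving `src.tell()` before `pickle.load()` and seeking back to that offset would give the same result without depending on the re-serialized length matching what was written, which I would not assume holds for every object or protocol.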

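I fixed it by: a) putting the function in a class that has access to a count variable, b) adding an extra flag, fp_set, and seeking back only when it is set to a true value, c) providing a reset() method in the class, so that obj_cnt can be reset to -1 once a series of ordered queries is finished.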

Bear in mind that I am very new to OOP in Python, so some things in the code below may look odd:

class goto_index:
    obj_cnt = -1 # 0-based count
    
    @staticmethod
    def sorted(idx_str,src,dest=None,fp_set=False) :
        # Use when going to indexes in ascending order in a loop
        # idx_str = comma-separated 1-based index, e.g. "7,8" is the 8th item of the 7th object
        # src     = file object to search in, from its current position
        # dest    = if given, all objects read before obj @ idx_str is found (or EOF) are copied to it
        # fp_set  = if True, will seek such that next read will return obj @ idx_str
    
        index = [int(subidx)-1 for subidx in idx_str.split(',')]
        # Make 0-based int list from 1-based csv string 
        val = None
        try :
            while True :                            # EOFError if not found
                obj = pickle.load(src)
                goto_index.obj_cnt += 1             # increment counter
                if goto_index.obj_cnt == index[0] : # 1st element of index is object number
                    val = obj
                    for subidx in index[1::] :      # Index the object itself
                        val = val[subidx]           # IndexError if illegal index
                    if fp_set : src.seek(-len(pickle.dumps(obj)),os.SEEK_CUR)
                    # Seek back to beginning of object in src
                    return val                      # Return value @ index
                elif dest : pickle.dump(obj,dest)   # Copy object to dest
        except (EOFError, IndexError) : raise       # Caller handles these 

    @staticmethod
    def reset():
        goto_index.obj_cnt = -1

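    @staticmethod
    def random(idx_str,src,dest=None,fp_set=False) :
        goto_index.reset() # Just in case
        src.seek(0)        # Just in case
        try :
            # Forward dest and fp_set, and hand the value back to the caller
            return goto_index.sorted(idx_str,src,dest,fp_set)
        finally :
            goto_index.reset() # Clear count, even if an exception propagates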

Apart from fetch_elements(), the other functions from the question stay essentially the same:

def fetch_elements(f) :
    elements = []
    for idx_str in ('1','2','3') : # Indexes are passed sorted
        elements.append(goto_index.sorted(idx_str,f))
    goto_index.reset()            # Required if using the methods later
    return elements
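As an alternative to keeping a running count, the start offset of every object could be recorded once and then used to seek directly, in any order. The following is only a sketch under the assumption that src is seekable; `build_offsets()` and `load_at()` are hypothetical helpers, not part of the code above, and the dest copying is left out.

```python
import io
import pickle

def build_offsets(src):
    """Scan src from BOF and return the start offset of each pickled object."""
    offsets = []
    src.seek(0)
    while True:
        pos = src.tell()            # where the next object would begin
        try:
            pickle.load(src)
        except EOFError:            # no more objects
            return offsets
        offsets.append(pos)

def load_at(src, offsets, idx_str, fp_set=False):
    """Return the value at a 1-based comma-separated index, e.g. '3' or '2,1'."""
    index = [int(subidx) - 1 for subidx in idx_str.split(',')]
    if not 0 <= index[0] < len(offsets):
        raise IndexError('no object number ' + idx_str.split(',')[0])
    src.seek(offsets[index[0]])     # jump straight to the object
    val = pickle.load(src)
    for subidx in index[1:]:
        val = val[subidx]           # IndexError if illegal index
    if fp_set:
        src.seek(offsets[index[0]]) # next read returns this object again
    return val

# Same four strings as in the question, written to an in-memory buffer.
buf = io.BytesIO()
for item in ('hello world', 'good morning', '69 420', 'ending !'):
    pickle.dump(item, buf)

offsets = build_offsets(buf)
elements = [load_at(buf, offsets, idx_str) for idx_str in ('3', '1', '2')]
print(', '.join(elements))
# 69 420, hello world, good morning
```

The trade-off is one full pass over the file up front, after which lookups no longer depend on call order or on any shared counter.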
