What is the best way to do AppEngine Model Memcaching?

Posted 2023-02-24


Question (posted 2011-01-18 23:19:36):

Currently my application caches models in memcache like this:

memcache.set("somekey", aModel)

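But Nick's post at http://blog.notdot.net/2009/9/Efficient-model-memcaching suggests that converting models to protobuffers first is more efficient. After running some tests, though, I found that the protobuf version is indeed smaller, but actually slower (by about 10%).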

Has anyone else had the same experience, or am I doing something wrong?

Test results: http://1.latest.sofatest.appspot.com/?times=1000

import time
import uuid

from google.appengine.ext import webapp
from google.appengine.ext import db
from google.appengine.ext.webapp import util
from google.appengine.datastore import entity_pb
from google.appengine.api import memcache


class Person(db.Model):
    name = db.StringProperty()


class MainHandler(webapp.RequestHandler):

    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'

        times = int(self.request.get('times', 1))
        m = Person(name='Koen Bok')

        # Variant 1: store the model instance directly; memcache pickles it.
        t1 = time.time()
        for i in xrange(times):
            key = uuid.uuid4().hex
            memcache.set(key, m)
            r = memcache.get(key)
        self.response.out.write('Pickle took: %.2f\n' % (time.time() - t1))

        # Variant 2: encode the model as a protobuf string first;
        # memcache stores string values without pickling them.
        t1 = time.time()
        for i in xrange(times):
            key = uuid.uuid4().hex
            memcache.set(key, db.model_to_protobuf(m).Encode())
            r = db.model_from_protobuf(entity_pb.EntityProto(memcache.get(key)))
        self.response.out.write('Proto took: %.2f\n' % (time.time() - t1))


def main():
    application = webapp.WSGIApplication([('/', MainHandler)], debug=True)
    util.run_wsgi_app(application)


if __name__ == '__main__':
    main()
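Timing a loop with time.time(), as above, mixes the serialization cost with memcache round-trip noise. The sketch below isolates the local serialization step with the standard timeit module, as a commenter suggests. It is plain Python 3 and does not use the App Engine SDK: a dict is an assumed stand-in for a model instance, and json is used only as a second stdlib strategy for comparison (it is not protobuf).

```python
import json
import pickle
import timeit

# Assumption: a plain dict stands in for a db.Model instance; no App Engine SDK here.
record = {"name": "Koen Bok", "email": "koen@example.com", "age": 30}


def pickle_round_trip():
    # Serialize and deserialize with pickle, as memcache does for non-string values.
    return pickle.loads(pickle.dumps(record, pickle.HIGHEST_PROTOCOL))


def json_round_trip():
    # A second stdlib strategy, only for comparison.
    return json.loads(json.dumps(record))


def best_of(func, number=10000, repeat=5):
    # timeit.repeat returns one total per run; the minimum is the least noisy estimate.
    return min(timeit.repeat(func, number=number, repeat=repeat))


if __name__ == "__main__":
    for label, func in [("pickle", pickle_round_trip), ("json", json_round_trip)]:
        print("%s: %.4f s per 10000 round trips" % (label, best_of(func)))
```

Taking the minimum of several repeats, rather than a single run, filters out interference from other work on the machine.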

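Comments:

I also just tried very large and complex models, but the results were roughly the same.

Is docs.python.org/library/timeit.html available on GAE? It should give more accurate results. Still, after reading the blog post you linked to, I expected an order-of-magnitude difference between protobuffers and pickle, which time.time() should capture anyway.

I'm using Java App Engine, so I'm too lazy to test this theory: does pickle() cache the result behind the scenes somewhere, while to_protobuf doesn't?

Based on that post, I'm not sure I would expect an order-of-magnitude speedup, because pickle is still called even with the protobuf version. The space used will certainly be much smaller, though.

I did more testing: memcache only pickles non-strings, so storing a single protobuf-encoded model involves no pickling at all, while a list of models is pickled as a list of strings.

This is certainly a surprising result. I'd call it a dev_appserver phenomenon, but you're seeing the same results on appspot. It puzzles me; this certainly wasn't the case in the past.

Answer 1: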

Memcache calls still pickle the object, with or without protobuf. Pickling is faster for a protobuf object because it has a very simple model.

Plain pickled objects are larger than protobuf+pickle objects, so the protobuf+pickle objects save time in Memcache, but the protobuf conversion costs more processor time.
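Whether pickling happens at all depends on the value type; the asker's comment above reports that the memcache API pickles only non-string values. The toy class below is a hypothetical in-process stand-in that mirrors that described behavior so you can see where pickling does and does not occur. It is not the App Engine memcache client, and its names are illustrative.

```python
import pickle


class ToyCache:
    """Hypothetical stand-in for memcache (not the App Engine API).

    Mirrors the behavior described in the comments: str/bytes values are
    stored as-is, anything else is pickled first.
    """

    def __init__(self):
        self._data = {}
        self.pickled_keys = set()  # records which keys went through pickle

    def set(self, key, value):
        if isinstance(value, (str, bytes)):
            self._data[key] = value
            self.pickled_keys.discard(key)
        else:
            self._data[key] = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
            self.pickled_keys.add(key)

    def get(self, key):
        raw = self._data.get(key)
        if raw is None or key not in self.pickled_keys:
            return raw  # missing key, or a string stored verbatim
        return pickle.loads(raw)
```

Under this model, a single encoded entity (a byte string) skips pickle entirely, while a list of encoded entities is pickled as a list of strings.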

So, in general, both approaches come out about even... but

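The reason you should use protobuf is that it can handle changes between model versions, whereas pickle will raise errors. That problem will bite you one day, so it's better to deal with it early.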
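The version-tolerance argument comes down to storing field values rather than a reference to the class that produced them. The sketch below illustrates that property with json and a dataclass instead of protobuf: unknown fields from an older cache entry are dropped and missing fields fall back to defaults. `PersonV2`, `encode` and `decode` are hypothetical names for illustration only, and this is an analogy, not how db.model_from_protobuf is implemented.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class PersonV2:
    # Hypothetical "new" model version: `email` was added and an old
    # `nickname` field was removed after some entries were already cached.
    name: str = ""
    email: str = ""


def encode(field_values):
    # Store plain field values, not the class itself.
    return json.dumps(field_values)


def decode(data, cls):
    raw = json.loads(data)
    known = {f.name for f in fields(cls)}
    # Ignore fields the current class no longer has; missing ones use defaults.
    return cls(**{k: v for k, v in raw.items() if k in known})
```

An entry written by the old version, such as `encode({"name": "Koen Bok", "nickname": "koen"})`, still decodes into `PersonV2(name="Koen Bok", email="")`.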

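Comments:

Some good points are made, but not all of the statements are correct. If you look at the code, the memcache API only pickles non-strings, so a list of protobuffed models will be pickled, but a single model will not. It is true that the protobuf output is simpler and smaller, but my tests show it is not less CPU-intensive, which is the point of the original question. The model-version point is valid, but not very important to me: you should have a way of handling invalid cached results anyway, and I don't think it will happen very often.

Answer 2: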

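Both pickle and protobuf are slow on App Engine because they are implemented in pure Python. I've found that writing my own simple serialization code, using methods like str.join, tends to be faster because most of the work is done in C. This only works for simple data types, though.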
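A minimal sketch of such a hand-written serializer for simple records, assuming every field is a string and that no value contains the separator characters. The field list, the separators and the function names are hypothetical choices for illustration, not the answerer's code.

```python
# Hypothetical separators (ASCII unit/record separators); assumes values never contain them.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"
FIELDS = ("name", "email", "city")  # hypothetical fixed field order


def encode_records(records):
    # str.join does the concatenation in C; all values must already be strings.
    return RECORD_SEP.join(FIELD_SEP.join(r[f] for f in FIELDS) for r in records)


def decode_records(data):
    if not data:
        return []
    # str.split is also implemented in C; zip pairs values back with field names.
    return [dict(zip(FIELDS, chunk.split(FIELD_SEP))) for chunk in data.split(RECORD_SEP)]
```

The trade-off is the one the answer names: anything beyond flat string fields (numbers, dates, nested lists) needs its own conversion code.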

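Comments:

Have you done this for model objects too? I'd love to see your implementation.

I used to do this, but the Python 2.7 runtime gives us cPickle, which is faster now.

Answer 3: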

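A faster approach is to convert your model to a dictionary and use the native eval / repr functions as your (de)serializers. Be careful, of course, as always with evil eval, but it should be safe here since no external step is involved.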
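If the dictionary holds only literals (strings, numbers, lists, dicts, None), the standard library's ast.literal_eval can replace eval on the way back. This is a swapped-in technique, not the answerer's: it refuses anything but literals, so it cannot run arbitrary code. A minimal sketch:

```python
import ast


def serialize(d):
    # repr() of a dict of literals is valid Python source for that dict.
    return repr(d)


def deserialize(s):
    # literal_eval accepts only literals (str, bytes, numbers, tuples, lists,
    # dicts, sets, booleans, None) and raises ValueError on anything else.
    return ast.literal_eval(s)
```

The same restriction means literal_eval rejects `datetime.datetime(...)` or `Key.from_path(...)` calls, so such properties would need converting to a literal (for example an ISO date string) before serialization.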

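Below is an example Fake_entity class. You first create your dictionary with fake = Fake_entity(entity), and then you can simply store your data with memcache.set(key, fake.serialize()). serialize() is a simple call to the dictionary's native repr, with some additions if needed (for example, an identifier at the beginning of the string).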

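To get it back, simply use fake = Fake_entity(memcache.get(key)). The Fake_entity object is a plain dictionary whose keys are also accessible as attributes. You can access the entity's properties normally, except that ReferenceProperties give you keys instead of fetching the referenced objects (which is actually quite useful). You can also get() the actual entity with fake.get(), or, more interestingly, modify it and then save it with fake.put().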

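It does not work with lists (if you fetch several entities from a query), but it can easily be adapted with join/split functions, using an identifier such as '### FAKE MODEL ENTITY ###' as the separator. It only works with db.Model; Expando needs small adjustments.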

class Fake_entity(dict):
    def __init__(self, record):
        # simple case: a string, we eval it to rebuild our fake entity
        if isinstance(record, basestring):
            import datetime # <----- put all relevant eval imports here
            from google.appengine.api import datastore_types
            self.update( eval(record) ) # careful with external sources, eval is evil
            return None

        # serious case: we build the instance from the actual entity
        for prop_name, prop_ref in record.__class__.properties().items():
            self[prop_name] = prop_ref.get_value_for_datastore(record) # to avoid fetching entities
        self['_cls'] = record.__class__.__module__ + '.' + record.__class__.__name__
        try:
            self['key'] = str(record.key())
        except Exception: # the key may not exist if the entity has not been stored
            pass

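    def __getattr__(self, k):
        # raise AttributeError (not KeyError) so hasattr(), copy and pickle behave
        try:
            return self[k]
        except KeyError:
            raise AttributeError(k)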

    def __setattr__(self, k, v):
        self[k] = v

    def key(self):
        from google.appengine.ext import db
        return db.Key(self['key'])

    def get(self):
        from google.appengine.ext import db
        return db.get(self['key'])

    def put(self):
        _cls = self.pop('_cls') # gets and removes the class name from the passed arguments
        try:
            # import xxxxxxx ---> put your model imports here if necessary
            Cls = eval(_cls) # make sure that your models declarations are in the scope here
            real_entity = Cls(**self) # creates the entity
            real_entity.put() # self explanatory
        finally:
            self['_cls'] = _cls # puts back the class name afterwards, even if put() failed
        return real_entity

    def serialize(self):
        return '### FAKE MODEL ENTITY ###\n' + repr(self)
        # or simply repr, but I use the initial identifier to test and eval directly when getting from memcache
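The list adaptation mentioned above (joining several serialized entities and splitting on the '### FAKE MODEL ENTITY ###' identifier) could look roughly like this. It is a sketch on plain dicts, and it uses ast.literal_eval instead of eval, which is an assumption that the dicts contain only literals. The function names are hypothetical.

```python
import ast

MARKER = "### FAKE MODEL ENTITY ###\n"  # same prefix that serialize() writes


def serialize_many(dicts):
    # Each entity becomes MARKER + repr(dict), as in serialize() above.
    return "".join(MARKER + repr(d) for d in dicts)


def deserialize_many(data):
    # split() leaves an empty chunk before the first marker; skip it.
    # Assumes no string value contains MARKER itself.
    return [ast.literal_eval(chunk) for chunk in data.split(MARKER) if chunk]
```

Because the whole list is one string, memcache can store it without pickling, in line with the comments about string values.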

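I'd welcome speed tests on this; I think it is considerably faster than the other approaches. Also, you run no risk if your models have changed in the meantime.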

Here is an example of a serialized fake entity. Note in particular the datetime (created) and the reference property (subdomain):

### FAKE MODEL ENTITY ###
{'status': u'admin', 'session_expiry': None, 'first_name': u'Louis', 'last_name': u'Le Sieur', 'modified_by': None, 'password_hash': u'a9993e364706816aba3e25717000000000000000', 'language': u'fr', 'created': datetime.datetime(2010, 7, 18, 21, 50, 11, 750000), 'modified': None, 'created_by': None, 'email': u'chou@glou.bou', 'key': 'agdqZXJlZ2xlcgwLEgVMb2dpbhjmAQw', 'session_ref': None, '_cls': 'models.Login', 'groups': [], 'email___password_hash': u'chou@glou.bou+a9993e364706816aba3e25717000000000000000', 'subdomain': datastore_types.Key.from_path(u'Subdomain', 229L, _app=u'jeregle'), 'permitted': [], 'permissions': []}
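Values such as datetime.datetime(...) only evaluate if the names they mention are in scope, which is why the class imports them before calling eval. A sketch of making that scope explicit: eval gets a fixed namespace with empty builtins plus the allowed modules. `EVAL_NAMESPACE` is a hypothetical name, datastore_types is deliberately left out because this sketch does not use the App Engine SDK, and this is still eval, so it is only appropriate for text your own code wrote.

```python
import datetime

# Names the serialized text may reference; builtins are emptied so calls such
# as open() or __import__() fail with NameError. (The answer's code would also
# need datastore_types here for Key.from_path; omitted in this SDK-free sketch.)
EVAL_NAMESPACE = {"__builtins__": {}, "datetime": datetime}


def serialize(d):
    return "### FAKE MODEL ENTITY ###\n" + repr(d)


def deserialize(text):
    # The leading "###" line is a Python comment, so eval() ignores it.
    return eval(text, EVAL_NAMESPACE)
```

Restricting builtins narrows what the text can call, but it is not a sandbox; the trust assumption from the answer still applies.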

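Personally, I also use static variables (faster than memcache) to cache my entities for the short term, and fetch from the datastore when the server has changed or its memory has been flushed for some reason (which actually happens quite often).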
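A minimal sketch of that kind of module-level ("static") cache, with an expiry so entries do not live forever on a long-running instance. The TTL value, `get_cached` and the `loader` callable (for example, a function that tries memcache and then the datastore) are hypothetical, not part of the answer.

```python
import time

# Module-level cache: survives across requests on the same instance, and is
# lost whenever the instance restarts or its memory is flushed.
_local_cache = {}
LOCAL_TTL_SECONDS = 60  # hypothetical; choose based on how stale data may be


def get_cached(key, loader, ttl=LOCAL_TTL_SECONDS, now=time.time):
    """Return the cached value for key, calling loader(key) on a miss or expiry."""
    entry = _local_cache.get(key)
    current = now()
    if entry is not None and current - entry[0] < ttl:
        return entry[1]
    value = loader(key)
    _local_cache[key] = (current, value)
    return value
```

Each instance keeps its own copy, so a write on one instance is not seen by the others until their entries expire; a short TTL keeps that window small.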

