Node.js: How to serialize a large object with circular references



Posted 2015-12-24 04:24:21

Question:

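I'm using Node.js and want to serialize a large JavaScript object to the HDD. The object is basically a "hashmap" that contains only data, no functions. Some of its elements contain circular references.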

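This is an online application, so the process must not block the main loop. In my use case, being non-blocking matters much more than speed. The data is live, in-memory data that is only loaded at startup. Saving is only used for timed backups every X minutes and on shutdown/failure.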

What is the best way to do this? Pointers to libraries that do what I want are very welcome.

Comments:

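The Node.js filesystem API is asynchronous by default, so the question you really need to ask is "how do I serialize an object with circular references?"

s/filesystem/IO. Every kind of I/O in Node.js is asynchronous by default; in the standard library you have to explicitly ask for synchronous execution in each case.

Everything you say is correct. The question was about the whole process (JavaScript object -> HDD -> object), but perhaps the right question really is the one joews suggested.

Title updated.

How large, exactly? If you can't serialize it in a single pass, you will run into huge problems with mutability.

Answer 1: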

I have a good solution that I've been using for a while. Its downside is that it runs in O(n^2) time, which makes me sad.

Here's the code:

// I defined these functions as part of a utility library called "U".
var U = {
  isObj: function(obj, cls) {
    try { return obj.constructor === cls; } catch(e) { return false; }
  },
  straighten: function(item) {
    /*
    Un-circularizes data. Works if `item` is a simple Object, an Array, or any inline value (string, int, null, etc).
    */
    var arr = [];
    U.straighten0(item, arr);
    return arr.map(function(item) { return item.calc; });
  },
  straighten0: function(item, items) {
    /*
    The "meat" of the un-circularization process. Returns the index of `item`
    within the array `items`. If `item` didn't initially exist within
    `items`, it will by the end of this function, therefore this function
    always produces a usable index.
    
    Also, `item` is guaranteed to have no more circular references (it will
    be in a new format) once its index is obtained.
    */
    
    /*
    STEP 1) If `item` is already in `items`, simply return its index.
    
    Note that an object's existence can only be confirmed by comparison to
    itself, not an un-circularized version of itself. For this reason an
    `orig` value is kept hold of to make such comparisons possible. This
    entails that every entry in `items` has both an `orig` value (the
    original object, for comparison) and a `calc` value (the calculated,
    un-circularized value).
    */
    // Linear scan; repeated for every item, this makes the whole pass O(n^2) :(
    for (var i = 0, len = items.length; i < len; i++)
      if (items[i].orig === item) return i;
    
    var ind = items.length;
    
    // STEP 2) Depending on the type of `item`, un-circularize it differently
    if (U.isObj(item, Object)) {
      
      /*
      STEP 2.1) `item` is an `Object`. Create an un-circularized version of
      that `Object` - keep all its keys, but replace each value with an index
      that points to that value.
      */
      var obj = {};
      items.push({ orig: item, calc: obj }); // Note both `orig` AND `calc`.
      for (var k in item)
        obj[k] = U.straighten0(item[k], items);
        
    } else if (U.isObj(item, Array)) {
      
      /*
      STEP 2.2) `item` is an `Array`. Create an un-circularized version of
      that `Array` - replace each of its values with an index that indexes
      the original value.
      */
      var arr = [];
      items.push({ orig: item, calc: arr }); // Note both `orig` AND `calc`.
      for (var i = 0; i < item.length; i++)
        arr.push(U.straighten0(item[i], items));
      
    } else {
      
      /*
      STEP 2.3) `item` is a simple inline value. We don't need to make any
      modifications to it, as inline values have no references (let alone
      circular references).
      */
      items.push({ orig: item, calc: item });
      
    }
        
    return ind;
  },
  unstraighten: function(items) {
    /*
    Re-circularizes un-circularized data! Used for undoing the effects of
    `U.straighten`. This process will use a particular marker (`unbuilt`) to
    show values that haven't yet been calculated. This is better than using
    `null`, because that would break in the case that the literal value is
    `null`.
    */
    var unbuilt = { UNBUILT: true };
    var initialArr = [];
    // Fill `initialArr` with `unbuilt` references
    for (var i = 0; i < items.length; i++) initialArr.push(unbuilt);
    return U.unstraighten0(items, 0, initialArr, unbuilt);
  },
  unstraighten0: function(items, ind, built, unbuilt) {
    /*
    The "meat" of the re-circularization process. Returns an Object, Array,
    or inline value. The return value may contain circular references.
    */
    if (built[ind] !== unbuilt) return built[ind];
    
    var item = items[ind];
    
    /*
    Similar to `straighten`, check the type. Handle Object, Array, and inline
    values separately.
    */
    
    if (U.isObj(item, Object)) {
      
      // value is an ordinary object; register it in `built` before recursing
      var obj = built[ind] = {};
      for (var k in item)
        obj[k] = U.unstraighten0(items, item[k], built, unbuilt);
      return obj;
      
    } else if (U.isObj(item, Array)) {
      
      // value is an array; register it in `built` before recursing
      var arr = built[ind] = [];
      for (var i = 0; i < item.length; i++)
        arr.push(U.unstraighten0(items, item[i], built, unbuilt));
      return arr;
      
    }
    
    // value is an inline value
    built[ind] = item;
    return item;
  },
  thingToString: function(thing) {
    /*
    Elegant convenience function to convert any structure (circular or not)
    to a string! Now that this function is available, you can ignore
    `straighten` and `unstraighten`, and the headaches they may cause.
    */
    var st = U.straighten(thing);
    return JSON.stringify(st);
  },
  stringToThing: function(string) {
    /*
    Elegant convenience function to reverse the effect of `U.thingToString`. 
    */
    return U.unstraighten(JSON.parse(string));
  }
};
 
var circular = {
  val: 'haha',
  val2: [ 'hey', 'ho', 'hee' ],
  doesNullWork: null
};
circular.circle1 = circular;
circular.confusing = {
  circular: circular,
  value: circular.val2
};

console.log('Does JSON.stringify work??');
try {
  var str = JSON.stringify(circular);
  console.log('JSON.stringify works!!');
} catch(err) {
  console.log('JSON.stringify doesn\'t work!');
}

console.log('');
console.log('Does U.thingToString work??');
try {
  var str = U.thingToString(circular);
  console.log('U.thingToString works!!');
  console.log('Its result looks like this:');
  console.log(str);
  console.log('And here it is converted back into an object:');
  var obj = U.stringToThing(str);
  for (var k in obj) {
    console.log('`obj` has key "' + k + '"');
  }
  console.log('Did `null` work?');
  if (obj.doesNullWork === null)
    console.log('yes!');
  else
    console.log('nope :(');
} catch(err) {
  console.error(err);
  console.log('U.thingToString doesn\'t work!');
}
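The whole idea is to serialize a circular structure by putting every object and every inline value directly into one flat array, and replacing each reference with the referenced value's index in that array.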

For example, if you have an object like this:


{
    val: 'hello',
    anotherVal: 'hi',
    circular: << a reference to itself >>
}

then U.straighten produces this structure:

[
    0: {
        val: 1,
        anotherVal: 2,
        circular: 0 // Note that it's become possible to refer to "self" by index! :D
    },
    1: 'hello',
    2: 'hi'
]
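To make the index scheme concrete, here is a minimal sketch of the reverse direction for the structure above. It follows the same idea as the answer's U.unstraighten but uses a Map as the cache of already-built values; rebuild is a hypothetical name, not part of the answer. It assumes the flat array looks like what JSON.parse returns: plain objects and arrays whose values are indices, with inline values stored as-is.

```javascript
// Flat form of the example above: index 0 is the root object, and every
// value inside an object or array is an index into `flat`.
const flat = [{ val: 1, anotherVal: 2, circular: 0 }, 'hello', 'hi'];

// Hypothetical helper: rebuild the (possibly circular) value stored at `ind`.
// `built` caches each index's result, so shared and circular references
// resolve to the same object instead of recursing forever.
function rebuild(flat, ind = 0, built = new Map()) {
  if (built.has(ind)) return built.get(ind);
  const item = flat[ind];
  if (Array.isArray(item)) {
    const arr = [];
    built.set(ind, arr); // register before recursing so cycles terminate
    for (const child of item) arr.push(rebuild(flat, child, built));
    return arr;
  }
  if (item !== null && typeof item === 'object') {
    const obj = {};
    built.set(ind, obj); // register before recursing so cycles terminate
    for (const key of Object.keys(item)) obj[key] = rebuild(flat, item[key], built);
    return obj;
  }
  built.set(ind, item); // inline value (string, number, boolean, null)
  return item;
}

const root = rebuild(flat);
console.log(root.val, root.anotherVal); // hello hi
console.log(root.circular === root);    // true
```

The key step is registering each new object in the cache before visiting its children: when the walk reaches circular: 0, index 0 is already cached, so it resolves to the object under construction.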

Just a few additional notes:

I've been using these functions in all kinds of situations for a long time! Hidden bugs are very unlikely.

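The O(n^2) runtime could be fixed if each object could be mapped to a unique hash value (which is doable). The O(n^2) comes from having to use a linear search to find items that have already been un-circularized. Because this linear search happens inside a process that is already linear, the runtime becomes O(n^2).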
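One way to get that identity lookup, assuming an ES2015+ runtime (any Node.js version with Map), is to key a Map by the original value. Map keys compare objects by reference and primitives by value, with average constant-time lookup, so the whole pass becomes roughly O(n). This is a sketch of that substitution, not the answer's code; straightenFast is a hypothetical name. It keeps U.straighten's output format: index 0 is the root, and objects and arrays hold indices.

```javascript
// Hypothetical O(n) variant of U.straighten: same output format, but the
// "seen before?" check is a Map lookup instead of a linear scan.
function straightenFast(root) {
  const calcs = [];          // calcs[i] is the un-circularized value at index i
  const indexOf = new Map(); // original value -> its index in `calcs`

  function visit(item) {
    // Essentially the answer's `items[i].orig === item` test (a Map also
    // treats NaN as equal to itself), but without scanning every entry.
    if (indexOf.has(item)) return indexOf.get(item);
    const ind = calcs.length;
    indexOf.set(item, ind); // register before recursing so cycles terminate
    if (Array.isArray(item)) {
      const arr = [];
      calcs.push(arr);
      for (const child of item) arr.push(visit(child));
    } else if (item != null && item.constructor === Object) {
      const obj = {};
      calcs.push(obj);
      for (const key of Object.keys(item)) obj[key] = visit(item[key]);
    } else {
      calcs.push(item); // inline value (string, number, boolean, null): stored as-is
    }
    return ind;
  }

  visit(root);
  return calcs;
}

const circular = { val: 'hello', anotherVal: 'hi' };
circular.circular = circular;
console.log(JSON.stringify(straightenFast(circular)));
// [{"val":1,"anotherVal":2,"circular":0},"hello","hi"]
```

Because the output format is unchanged, JSON.stringify(straightenFast(x)) should still be readable by the answer's U.stringToThing. Like the original, it recurses once per nesting level, so extremely deep structures could still exceed the call stack.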

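These methods actually provide a small amount of compression! Identical inline values never appear twice at different indices; every identical instance of an inline value maps to the same index. For example: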


{
    hi: 'hihihihihihihihihihihi-very-long-pls-compress',
    ha: 'hihihihihihihihihihihi-very-long-pls-compress'
}

becomes (after U.straighten):

[
    0: {
        hi: 1,
        ha: 1
    },
    1: 'hihihihihihihihihihihi-very-long-pls-compress'
]

Finally, in case it isn't clear, this code is very easy to use! You only need to look at U.thingToString and U.stringToThing. They are used exactly like JSON.stringify and JSON.parse.

var circularObj = { /* some big circular object you have */ };
var serialized = U.thingToString(circularObj);
var unserialized = U.stringToThing(serialized);
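The question also asks for timed backups that don't block the main loop. Below is a minimal sketch of that part, assuming Node.js 12 or later (for fs.promises and Promise.prototype.finally). saveBackup and startBackups are hypothetical helpers, and the write-to-temp-then-rename step is a design choice of this sketch, not something from the answer. Only the disk I/O is asynchronous here. The serialize step (for example U.thingToString) still runs synchronously, so a very large object will pause the event loop while its string is being built.

```javascript
const fs = require('fs');

// Hypothetical helper: serialize `data` and write it to `file` using only the
// asynchronous fs API. Writing to a temporary file first and then renaming it
// means a crash during the write should leave the previous backup untouched
// (rename replaces the target in a single step on POSIX filesystems).
async function saveBackup(data, file, serialize) {
  // serialize() itself is synchronous: the event loop is blocked while the
  // string is built; only the file write and rename are asynchronous.
  const text = serialize(data);
  const tmp = file + '.tmp';
  await fs.promises.writeFile(tmp, text, 'utf8');
  await fs.promises.rename(tmp, file);
}

// Hypothetical scheduler: start a backup every `intervalMs` milliseconds,
// skipping a tick if the previous backup is still being written.
function startBackups(getData, file, serialize, intervalMs) {
  let running = false;
  const timer = setInterval(() => {
    if (running) return;
    running = true;
    saveBackup(getData(), file, serialize)
      .catch((err) => console.error('backup failed:', err))
      .finally(() => { running = false; });
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to stop
}
```

With the answer's U in scope, something like startBackups(() => circularObj, 'backup.json', U.thingToString, 5 * 60 * 1000) would write a backup every five minutes. On shutdown you could await one final saveBackup call before exiting. The file name and interval are placeholders.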
