Python PySpark: dual explode generator


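from pyspark.sql import Row


def dualExplode(row):
    """Explode two parallel list columns into separate rows.

    Args:
        row: Row with list fields 'x' (category ids) and 'y' (weights).
    Yields:
        Row: one Row per (x, y) pair, with every other field copied unchanged
        and the pair stored under 'category_ids' and 'weights'.
    """
    rowDict = row.asDict()
    # Remove the two list columns; the remaining fields are repeated on every output row.
    xList = rowDict.pop('x')
    yList = rowDict.pop('y')
    # zip pairs elements by position and stops at the shorter list.
    for x, y in zip(xList, yList):
        newDict = dict(rowDict)
        newDict['category_ids'] = x
        newDict['weights'] = y
        yield Row(**newDict)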
 
# Example usage (df is an existing DataFrame whose list columns are named 'x' and 'y')
exploded_df = sqlContext.createDataFrame(df.rdd.flatMap(dualExplode))
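
To see what the generator does without a Spark session, here is a minimal sketch of the same pairing logic on a plain dict. The helper name dual_explode_dict, its x_key and y_key parameters, and the sample record are hypothetical and only for illustration; they are not part of PySpark or of the snippet above.

```python
def dual_explode_dict(record, x_key="x", y_key="y"):
    """Plain-dict analogue of dualExplode (hypothetical helper, illustration only)."""
    rest = dict(record)  # copy so the caller's dict is not mutated
    x_list = rest.pop(x_key)
    y_list = rest.pop(y_key)
    # Pair elements by position; zip stops at the shorter list.
    for x, y in zip(x_list, y_list):
        out = dict(rest)
        out["category_ids"] = x
        out["weights"] = y
        yield out


# Made-up sample record: one user with three category ids and three weights.
sample = {"user": "u1", "x": [10, 20, 30], "y": [0.5, 0.3, 0.2]}
exploded = list(dual_explode_dict(sample))
for r in exploded:
    print(r)
```

Each input record becomes one output record per list position, and fields other than the two lists are copied onto every output record. Because zip truncates, lists of unequal length silently lose their trailing elements, so it may be worth validating lengths before exploding.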
