Weighted slope one algorithm? (porting from Python to R)

[Title]: Weighted slope one algorithm? (porting from Python to R) [Posted]: 2010-11-04 14:11:58 [Question]:

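I was reading about the Weighted slope one algorithm (described more formally here (PDF)). It is supposed to take item ratings from different users and, given a user vector containing at least 1 rating and 1 missing value, predict the missing ratings.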
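For reference, the weighted scheme is usually written as below. The notation is an assumption introduced here and does not appear in the question: $r_{u,i}$ is user $u$'s rating of item $i$, $S(u)$ is the set of items rated by $u$, $S_{j,i}$ is the set of users who rated both $j$ and $i$, and $c_{j,i} = |S_{j,i}|$.

```latex
\mathrm{dev}_{j,i} = \frac{1}{c_{j,i}} \sum_{u \in S_{j,i}} \left( r_{u,j} - r_{u,i} \right),
\qquad
\hat{r}_{u,j} =
  \frac{\sum_{i \in S(u),\, i \neq j} \left( \mathrm{dev}_{j,i} + r_{u,i} \right) c_{j,i}}
       {\sum_{i \in S(u),\, i \neq j} c_{j,i}}
```

Note that both sums in the prediction run only over the items the user has actually rated.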

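I found a Python implementation of the algorithm, but I'm having a hard time porting it to R (which I'm more comfortable with). Below is my attempt. Any suggestions on how to get it to work?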

Thanks in advance, everyone.

# take a 'training' set, tr.set and a vector with some missing ratings, d
pred=function(tr.set,d) {
    tr.set=rbind(tr.set,d)
    n.items=ncol(tr.set)

    # tally frequencies to use as weights
    freqs=sapply(1:n.items, function(i) {
        unlist(lapply(1:n.items, function(j) {
            sum(!(i==j)&!is.na(tr.set[,i])&!is.na(tr.set[,j])) })) })

    # estimate product-by-product mean differences in ratings
    diffs=array(NA, dim=c(n.items,n.items))
    diffs=sapply(1:n.items, function(i) {
        unlist(lapply(1:n.items, function(j) {
            diffs[j,i]=mean(tr.set[,i]-tr.set[,j],na.rm=T) })) })

    # create an output vector with NAs for all the items the user has already rated
    pred.out=as.numeric(is.na(d))
    pred.out[!is.na(d)]=NA

    a=which(!is.na(pred.out))
    b=which(is.na(pred.out))

    # calculate the weighted slope one estimate
    pred.out[a]=sapply(a, function(i) {
        sum(unlist(lapply(b,function (j) {
            sum((d[j]+diffs[j,i])*freqs[j,i])/rowSums(freqs)[i] }))) })

    names(pred.out)=colnames(tr.set)
    return(pred.out) }
# end function

# test, using example from [3]
alice=c(squid=1.0, octopus=0.2, cuttlefish=0.5, nautilus=NA)
bob=c(squid=1.0, octopus=0.5, cuttlefish=NA, nautilus=0.2)
carole=c(squid=0.2, octopus=1.0, cuttlefish=0.4, nautilus=0.4)
dave=c(squid=NA, octopus=0.4, cuttlefish=0.9, nautilus=0.5)
tr.set2=rbind(alice,bob,carole,dave)
lucy2=c(squid=0.4, octopus=NA, cuttlefish=NA, nautilus=NA)
pred(tr.set2,lucy2)
# not correct
# correct(?): 'nautilus': 0.10, 'octopus': 0.23, 'cuttlefish': 0.25

[Question comments]:

I tried to format the code to be more readable, but I'm not familiar with R. Apologies if the style is off.

[Answer 1]:

A while ago I wrote an R version of Slope One using the same reference (Bryan O'Sullivan's Python code). I'm pasting the code below in case it helps.

# predict ratings for the items a user has not rated yet
predict <- function(userprefs, data.freqs, data.diffs) {
    seen <- names(userprefs)

    # add each known rating to the mean differences, then weight by frequency
    preds <- sweep(data.diffs[ , seen, drop=FALSE], 2, userprefs, '+')
    preds <- preds * data.freqs[ , seen, drop=FALSE]
    preds <- apply(preds, 1, sum)

    freqs <- apply(data.freqs[ , seen, drop=FALSE], 1, sum)

    unseen <- setdiff(names(preds), seen)
    result <- preds[unseen] / freqs[unseen]
    return(result[is.finite(result)])
}

# accumulate pairwise rating differences and co-rating counts
update <- function(userdata, freqs, diffs) {
    for (ratings in userdata) {
        items <- names(ratings)
        n <- length(ratings)

        # n x n differences in column-major order: entry [i, j] is ratings[i] - ratings[j]
        ratdiff <- rep(ratings, n) - rep(ratings, rep(n, n))
        diffs[items, items] <- diffs[items, items] + ratdiff

        freqs[items, items] <- freqs[items, items] + 1
    }
    diffs <- diffs / freqs
    return(list(freqs=freqs, diffs=diffs))
}

userdata <- list(alice=c(squid=1.0, cuttlefish=0.5, octopus=0.2),
                 bob=c(squid=1.0, octopus=0.5, nautilus=0.2),
                 carole=c(squid=0.2, octopus=1.0, cuttlefish=0.4, nautilus=0.4),
                 dave=c(cuttlefish=0.9, octopus=0.4, nautilus=0.5))

items <- c('squid', 'cuttlefish', 'nautilus', 'octopus')
n.items <- length(items)
freqs <- diffs <- matrix(0, nrow=n.items, ncol=n.items, dimnames=list(items, items))

result <- update(userdata, freqs, diffs)
print(result$freqs)
print(result$diffs)

userprefs <- c(squid=.4)
predresult <- predict(userprefs, result$freqs, result$diffs)
print(predresult)
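For readers who prefer the users-by-items matrix with NA values used in the question, here is a minimal sketch of the same computation in that layout. The helper name `slope_one_matrix` is hypothetical and not part of either post. The main difference from the question's `pred()` appears to be the denominator: the question divides by `rowSums(freqs)[i]`, which sums weights over all items, whereas this sketch divides only by the weights of the items the user has actually rated.

```r
# Hypothetical helper (not from the original posts): weighted Slope One on a
# users-by-items matrix where NA marks a missing rating.
slope_one_matrix <- function(tr.set, d) {
    items <- colnames(tr.set)
    n.items <- length(items)
    freqs <- diffs <- matrix(0, n.items, n.items, dimnames = list(items, items))

    for (i in seq_len(n.items)) {
        for (j in seq_len(n.items)) {
            both <- !is.na(tr.set[, i]) & !is.na(tr.set[, j])
            freqs[i, j] <- sum(both)  # users who rated both i and j
            if (i != j && any(both)) {
                # mean of (rating of i - rating of j) over those users
                diffs[i, j] <- mean(tr.set[both, i] - tr.set[both, j])
            }
        }
    }

    rated <- which(!is.na(d))
    unrated <- which(is.na(d))
    out <- sapply(unrated, function(i) {
        w <- freqs[i, rated]
        # normalise by the weights of the rated items only
        sum((d[rated] + diffs[i, rated]) * w) / sum(w)
    })
    names(out) <- items[unrated]
    out
}

# same data as the question's test case
alice  <- c(squid = 1.0, octopus = 0.2, cuttlefish = 0.5, nautilus = NA)
bob    <- c(squid = 1.0, octopus = 0.5, cuttlefish = NA,  nautilus = 0.2)
carole <- c(squid = 0.2, octopus = 1.0, cuttlefish = 0.4, nautilus = 0.4)
dave   <- c(squid = NA,  octopus = 0.4, cuttlefish = 0.9, nautilus = 0.5)
tr.set2 <- rbind(alice, bob, carole, dave)
lucy2 <- c(squid = 0.4, octopus = NA, cuttlefish = NA, nautilus = NA)

res <- slope_one_matrix(tr.set2, lucy2)
print(round(res, 2))
```

On this data the rounded result should agree with the values the question lists as expected (octopus 0.23, cuttlefish 0.25, nautilus 0.10). Unlike `pred()`, the sketch does not append the query vector `d` to the training set, and it uses explicit loops rather than nested `sapply` calls purely for readability.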

[Discussion]:
