Cumulative sum over index in MATLAB
Posted: 2014-11-20 14:25:25
Question: Consider the following matrix, where the first column is an index, the second column holds the values, and the third column is the cumulative sum, which resets whenever the index changes:
1 1 1 % 1
1 2 3 % 1+2
1 3 6 % 3+3
2 4 4 % 4
2 5 9 % 4+5
3 6 6 % 6
3 7 13 % 6+7
3 8 21 % 13+8
3 9 30 % 21+9
4 10 10 % 10
4 11 21 % 10+11
How can I obtain the third column without using a loop?
I tried the following:
A = [1 1;... % Input
1 2;...
1 3;...
2 4;...
2 5;...
3 6;...
3 7;...
3 8;...
3 9;...
4 10;...
4 11];
CS = cumsum(A(:,2));      % cumulative sum over the second column
I = [diff(A(:,1)); 0];    % nonzero on the last row before the index (first column) changes
offset = CS.*I;           % keep the cumulative sum reached at the end of each index group
offset(end) = []; offset = [0; offset];   % shift offset one row forward
[A, CS, offset]
The result is:
ans =
1 1 1 0
1 2 3 0
1 3 6 0
2 4 10 6
2 5 15 0
3 6 21 15
3 7 28 0
3 8 36 0
3 9 45 0
4 10 55 45
4 11 66 0
It would help if there were a simple way to convert the fourth column of the matrix above into
O =
0
0
0
6
6
15
15
15
15
45
45
because CS-O gives the desired output.
I would appreciate any suggestions.
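Comments:
Interesting question, and it shows effort: +1
Answer 1:
Your strategy is actually what I might have done. Keep in mind, though, that your approach assumes consecutive indices. You can of course change that with
offset=[0; CS(1:end-1).*(diff(A(:,1))~=0)];
but it still requires sorted indices. Your last step can be implemented this way: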
I = find(offset);
idxLastI = cumsum(offset~=0);
hasLastI = idxLastI~=0; %// For the zeros at the beginning
%// Combine the above to the output
O = zeros(size(offset));
O(hasLastI) = offset(I(idxLastI(hasLastI)));
out = CS-O;
This should be comparable to Divakar's cumsum-diff approach.
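To make the forward-fill step concrete, here is a minimal end-to-end sketch that combines the question's data with the offset formula and the indexing steps from this answer. It assumes sorted indices and, as an extra assumption not stated in the answer, that the running total is never exactly zero at a group boundary (otherwise `offset ~= 0` would miss that boundary).

```matlab
% Question data: column 1 = index, column 2 = values
A = [1 1; 1 2; 1 3; 2 4; 2 5; 3 6; 3 7; 3 8; 3 9; 4 10; 4 11];

CS = cumsum(A(:,2));                               % running total over all rows
offset = [0; CS(1:end-1) .* (diff(A(:,1)) ~= 0)];  % total so far, placed on the first row of each new group

I = find(offset);                 % rows that hold a nonzero offset
idxLastI = cumsum(offset ~= 0);   % per row: how many offsets have been seen so far
hasLastI = idxLastI ~= 0;         % false for the rows of the first group

O = zeros(size(offset));
O(hasLastI) = offset(I(idxLastI(hasLastI)));       % forward-fill the most recent offset
out = CS - O;                                      % cumulative sum that resets per group

disp([A, out])
```

The third displayed column should match the desired column from the question.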
Answer 2:
Use accumarray with a custom function:
result = accumarray(A(:,1), A(:,2), [], @(x) {cumsum(x)}); %// one cell per index value
result = vertcat(result{:}); %// concatenate the cells into one column
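This works regardless of whether the index changes in steps of 1 (as in your example).
The following approach is faster because it avoids cells. See @Divakar's excellent benchmarking in his answer (and see his solution, which is the fastest):
If index changes always correspond to an increase of 1 (as in your example):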
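As a small illustration of that claim, the sketch below runs the cell-based accumarray call on hypothetical data (`A2` is made up for this example) whose index column skips values. It assumes the index column is sorted, so that each group's values reach cumsum in row order.

```matlab
% Hypothetical data with gaps in the index column (2, 5, 9)
A2 = [2 1; 2 2; 5 3; 5 4; 5 5; 9 6];

% One cell per index value; index values that never occur yield empty cells
parts = accumarray(A2(:,1), A2(:,2), [], @(x) {cumsum(x)});

% Empty cells contribute nothing to the concatenation
result = vertcat(parts{:});

disp([A2, result])
```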
last = find(diff(A(:,1)))+1; %// first row of each new index value (the row right after each change)
result = A(:,2); %// this will be cumsum'd, after correcting for partial sums
correction = accumarray(A(:,1), A(:,2)); %// correction to be applied for cumsum
result(last) = result(last)-correction(1:end-1); %// apply correction
result = cumsum(result); %// compute result
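If the index value may change by more than 1 (i.e. there may be "skipped" values): this requires a small modification that slows things down slightly.
last = find(diff(A(:,1)))+1; %// first row of each new index value (the row right after each change)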
result = A(:,2); %// this will be cumsum'd, after correcting for partial sums
correction = accumarray(A(:,1), A(:,2), [], @sum, NaN); %// correction
correction = correction(~isnan(correction)); %// remove unused values
result(last) = result(last)-correction(1:end-1); %// apply correction
result = cumsum(result);
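A possible variation on the general case, shown only as a sketch and not part of the answer above: remap the index column to consecutive group numbers with the third output of unique, then reuse the consecutive-index code unchanged. The data `A2` and the variable names `g` and `first` are hypothetical.

```matlab
% Hypothetical data with gaps in the index column (2, 5, 9)
A2 = [2 1; 2 2; 5 3; 5 4; 5 5; 9 6];

[~, ~, g] = unique(A2(:,1));           % g maps each row to a consecutive group number 1, 2, 3
first = find(diff(g)) + 1;             % first row of each new group
result = A2(:,2);                      % will be cumsum'd after the correction below
correction = accumarray(g, A2(:,2));   % total of each group
result(first) = result(first) - correction(1:end-1);  % cancel the previous group's total
result = cumsum(result);               % cumulative sum that resets per group
```

Whether this is faster or slower than the NaN-fill version would need to be measured; unique adds a sort of its own.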
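Comments:
I tried accumarray, but didn't know I should use curly braces around it. +1
Ah, I see, using cell arrays with the anonymous function? Pretty neat!
@Divakar Yes, this notation used by accumarray is a little odd. One would expect something like (..., 'uniformoutput', 0)
Let me ask you one thing, since I haven't used anonymous functions that much and I guess you would know more. These functions can be used in many places, such as accumarray and bsxfun, but my guess/gut feeling is that MATLAB isn't performance-oriented when anonymous functions are used. What do you think? Or do you think it depends on the particular anonymous function, so that no such general statement can be made?
@LuisMendo Everything you said makes sense! Thanks for sharing your thoughts! And yes, I remember also seeing somewhere that isempty is faster than @isempty. In any case, MATLAB function calls are expensive.
Answer 3: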
An approach based on cumsum and diff might be good for performance -
%// cumsum values for the entire column-2
cumsum_vals = cumsum(A(:,2));
%// diff for column-1
diffA1 = diff(A(:,1));
%// Cumsum after each index
cumsum_after_each_idx = cumsum_vals([diffA1 ;0]~=0);
%// Get cumsum for each "group" and place each of its elements at the right place
%// to be subtracted from cumsum_vals for getting the final output
diffA1(diffA1~=0) = [cumsum_after_each_idx(1) ; diff(cumsum_after_each_idx)];
out = cumsum_vals-[0;cumsum(diffA1)];
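To follow what each step produces, here is the same sequence run on the question's data with the intermediate values written out in the comments. The variable name `group_end_totals` is chosen here for readability and does not appear in the answer.

```matlab
A = [1 1; 1 2; 1 3; 2 4; 2 5; 3 6; 3 7; 3 8; 3 9; 4 10; 4 11];

cumsum_vals = cumsum(A(:,2));   % [1 3 6 10 15 21 28 36 45 55 66]'
diffA1 = diff(A(:,1));          % [0 0 1 0 1 0 0 0 1 0]'

% Running total at the last row of every group except the final one
group_end_totals = cumsum_vals([diffA1; 0] ~= 0);   % [6 15 45]'

% Turn running totals into per-group totals and store them at the change rows
diffA1(diffA1 ~= 0) = [group_end_totals(1); diff(group_end_totals)];   % [6 9 30]'

% cumsum spreads each total over the following group: [0 0 0 6 6 15 15 15 15 45 45]'
out = cumsum_vals - [0; cumsum(diffA1)];
```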
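Benchmarking
If you care about performance, here are some benchmarks against the other accumarray-based solutions.
Benchmarking code (comments removed for compactness) -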
A = [1 1; 1 2; 1 3; 2 4; 2 5; 3 6; 3 7; 3 8; 3 9; 4 10; 4 11]; %// same as in the question
num_runs = 100000; %// number of runs
disp('---------------------- With cumsum and diff')
tic
for k1=1:num_runs
cumsum_vals = cumsum(A(:,2));
diffA1 = diff(A(:,1));
cumsum_after_each_idx = cumsum_vals([diffA1 ;0]~=0);
diffA1(diffA1~=0) = [cumsum_after_each_idx(1) ; diff(cumsum_after_each_idx)];
out = cumsum_vals-[0;cumsum(diffA1)];
end
toc,clear cumsum_vals diffA1 cumsum_after_each_idx out
disp('---------------------- With accumarray - version 1')
tic
for k1=1:num_runs
result = accumarray(A(:,1), A(:,2), [], @(x) {cumsum(x)});
result = vertcat(result{:});
end
toc, clear result
disp('--- With accumarray - version 2 (assuming consecutive indices only)')
tic
for k1=1:num_runs
last = find(diff(A(:,1)))+1; %// first row of each new index value (the row right after each change)
result = A(:,2); %// this will be cumsum'd, after correcting for partial sums
correction = accumarray(A(:,1), A(:,2)); %// correction to be applied for cumsum
result(last) = result(last)-correction(1:end-1); %// apply correction
result = cumsum(result); %// compute result
end
toc, clear last result correction
disp('--- With accumarray - version 2 ( general case)')
tic
for k1=1:num_runs
last = find(diff(A(:,1)))+1; %// first row of each new index value (the row right after each change)
result = A(:,2); %// this will be cumsum'd, after correcting for partial sums
correction = accumarray(A(:,1), A(:,2), [], @sum, NaN); %// correction
correction = correction(~isnan(correction)); %// remove unused values
result(last) = result(last)-correction(1:end-1); %// apply correction
result = cumsum(result);
end
toc
Results -
---------------------- With cumsum and diff
Elapsed time is 1.688460 seconds.
---------------------- With accumarray - version 1
Elapsed time is 28.630823 seconds.
--- With accumarray - version 2 (assuming consecutive indices only)
Elapsed time is 2.416905 seconds.
--- With accumarray - version 2 ( general case)
Elapsed time is 4.839310 seconds.
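As an alternative to hand-written tic/toc loops, MATLAB's built-in timeit function can time a zero-argument function handle and chooses the number of repetitions itself. The sketch below times only the cell-based accumarray variant, wrapped in cell2mat so it fits in one anonymous function; the names `f_cell` and `t_cell` are made up for this example, and the measured time will differ from the figures above.

```matlab
A = [1 1; 1 2; 1 3; 2 4; 2 5; 3 6; 3 7; 3 8; 3 9; 4 10; 4 11];

% Zero-argument handle wrapping the cell-based accumarray solution
f_cell = @() cell2mat(accumarray(A(:,1), A(:,2), [], @(x) {cumsum(x)}));

t_cell = timeit(f_cell);   % typical time for one call, in seconds
fprintf('accumarray with cells: %.3g s per call\n', t_cell);
```

With an input this small, the measurement is likely dominated by function-call overhead, so a larger matrix may give a more representative comparison.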
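Comments:
I added a different accumarray-based approach, without cells. Could you include it in your tests?
@LuisMendo That's a big improvement over the previous one! It proves once again that cells are not good for performance!
Following your good observation, I have split approach 2 into two. Sorry for the mess!
@LuisMendo It's okay. As you can see, it is still a pretty good improvement!
Thanks again, and sorry for giving you more work with the benchmarking!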