Report all possible combinations of columns
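【Title】: Report all possible combinations of columns
【Posted】: 2018-02-14 07:13:55
【Question】: I have a question about combinations in general, but in a fairly complicated situation for which I haven't found any help. I'm trying to find a way to report all the possible combinations of columns in a dataset.
The data come from a survey of the land change literature and indicate which proximate and underlying drivers were reported in each article. Each row is an individual article, and each column is a proximate or underlying driver. There are six types of proximate drivers and five types of underlying drivers. For each article, a 1 is placed in the column of every driver identified in that article and a 0 in the column of every driver that was not. The table looks roughly like this: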
key | d1 | d2 |...| d6 | i1 |...| i5 |
--------------------------------------
A1 | 1 | 0 |...| 1 | 1 |...| 0 |
A2 | 0 | 1 |...| 0 | 0 |...| 1 |
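where article A1 identifies d1 and d6 as direct drivers, i1 as an indirect driver, and so on.
What I would like to do is find the number of articles that report each possible combination of direct drivers, of indirect drivers, and of direct and indirect drivers together. For example, how many articles identify d1, d2 and i1; how many identify d1, d2 and i2; and so on? My student has the table in an Excel file, and I was thinking that Calc or Base might have a function that would automate the process. Does anyone know how I could do this?
Thanks!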
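【Comments】:
So you want to identify all 2^11 combinations and count how many articles fall into each? That is 2048 different combinations.
That is why I'm hoping to streamline the process. The idea is to determine which combinations of drivers appear most often in the literature.
Use a small UDF (or concatenate the binary digits) to combine the drivers into a single pattern string. Then use a pivot table to count how many times each pattern string occurs.
【Answer 1】: I finally gave up and took a brute-force approach. I exported the table as text, pulled it into mysql, and then used a bash script to loop through the options. In case anyone else has a similar problem, here is the bash script: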
# Generate array containing factors
faclis1=( d_inf d_com d_inm d_ind d_agr d_bos i_dem i_eco i_tec i_pol i_cul );
#faclis=( "d_inf" "d_com" "d_inm" );
a=0
#echo ${faclis[@]};
# Remove output file if exists
if [ -e permcounts.txt ];
then
rm permcounts.txt;
fi;
# Cycle through list of factors
for f1 in ${faclis1[@]};
do
# only proceed if factor not null
if [ $f1 ];
then
# print remaining array just to be sure
echo "factor list is ${faclis1[@]}";
#echo ${faclis[@]};
echo "Now on factor $f1";
echo "FACTOR $f1" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1;" metamorelia >> permcounts.txt;
# create sub array without current factor, 2 factors
faclis2=( ${faclis1[@]/$f1/} );
#set sub-counter
b=0
#echo "${faclis2[@]}";
# loop through sub array, two factors
for f2 in ${faclis2[@]};
do
if [ $f2 ] && \
[ "$f1" != "$f2" ];
then
echo "FACTOR $f1 \
AND $f2" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1;" metamorelia >> permcounts.txt;
# next sub-array
faclis3=( ${faclis2[@]//$f2} );
c=0
#echo "${faclis3[@]}";
# loop through sub-array
for f3 in ${faclis3[@]};
do
if [ $f3 ] && \
[ "$f1" != "$f3" ] && \
[ "$f2" != "$f3" ];
then
echo "FACTOR $f1 \
AND $f2 \
AND $f3" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1 and \
$f3 = 1;" metamorelia >> permcounts.txt;
# next sub-array
faclis4=( ${faclis3[@]//$f3} );
d=0
#echo "${faclis4[@]}";
# loop through sub-array
for f4 in ${faclis4[@]};
do
if [ $f4 ] && \
[ "$f1" != "$f4" ] && \
[ "$f2" != "$f4" ] && \
[ "$f3" != "$f4" ];
then
echo "FACTOR $f1 \
AND $f2 \
AND $f3 \
AND $f4" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1 and \
$f3 = 1 and \
$f4 = 1;" metamorelia >> permcounts.txt;
# next sub-array
faclis5=( ${faclis4[@]//$f4} );
e=0
#echo "${faclis5[@]}";
# loop through sub-array
for f5 in ${faclis5[@]};
do
if [ $f5 ] && \
[ "$f1" != "$f5" ] && \
[ "$f2" != "$f5" ] && \
[ "$f3" != "$f5" ] && \
[ "$f4" != "$f5" ];
then
echo "FACTOR $f1 \
AND $f2 \
AND $f3 \
AND $f4 \
AND $f5" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1 and \
$f3 = 1 and \
$f4 = 1 and \
$f5 = 1;" metamorelia >> permcounts.txt;
# next sub-array
faclis6=( ${faclis5[@]//$f5} );
f=0
#echo "${faclis6[@]}";
# loop through sub-array
for f6 in ${faclis6[@]};
do
if [ $f6 ] && \
[ "$f1" != "$f6" ] && \
[ "$f2" != "$f6" ] && \
[ "$f3" != "$f6" ] && \
[ "$f4" != "$f6" ] && \
[ "$f5" != "$f6" ];
then
echo "FACTOR $f1 \
AND $f2 \
AND $f3 \
AND $f4 \
AND $f5 \
AND $f6" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1 and \
$f3 = 1 and \
$f4 = 1 and \
$f5 = 1 and \
$f6 = 1;" metamorelia >> permcounts.txt;
# next sub-array
faclis7=( ${faclis6[@]//$f6} );
g=0
#echo "${faclis7[@]}";
# loop through sub-array
for f7 in ${faclis7[@]};
do
if [ $f7 ] && \
[ "$f1" != "$f7" ] && \
[ "$f2" != "$f7" ] && \
[ "$f3" != "$f7" ] && \
[ "$f4" != "$f7" ] && \
[ "$f5" != "$f7" ] && \
[ "$f6" != "$f7" ];
then
echo "FACTOR $f1 \
AND $f2 \
AND $f3 \
AND $f4 \
AND $f5 \
AND $f6 \
AND $f7" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1 and \
$f3 = 1 and \
$f4 = 1 and \
$f5 = 1 and \
$f6 = 1 and \
$f7 = 1;" metamorelia >> permcounts.txt;
# next sub-array
faclis8=( ${faclis7[@]//$f7} );
h=0
#echo "${faclis8[@]}";
# loop through sub-array
for f8 in ${faclis8[@]};
do
if [ $f8 ] && \
[ "$f1" != "$f8" ] && \
[ "$f2" != "$f8" ] && \
[ "$f3" != "$f8" ] && \
[ "$f4" != "$f8" ] && \
[ "$f5" != "$f8" ] && \
[ "$f6" != "$f8" ] && \
[ "$f7" != "$f8" ];
then
echo "FACTOR $f1 \
AND $f2 \
AND $f3 \
AND $f4 \
AND $f5 \
AND $f6 \
AND $f7 \
AND $f8" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1 and \
$f3 = 1 and \
$f4 = 1 and \
$f5 = 1 and \
$f6 = 1 and \
$f7 = 1 and \
$f8 = 1;" metamorelia >> permcounts.txt;
# next sub-array
faclis9=( ${faclis8[@]//$f8} );
i=0
#echo "${faclis9[@]}";
# loop through sub-array
for f9 in ${faclis9[@]};
do
if [ $f9 ] && \
[ "$f1" != "$f9" ] && \
[ "$f2" != "$f9" ] && \
[ "$f3" != "$f9" ] && \
[ "$f4" != "$f9" ] && \
[ "$f5" != "$f9" ] && \
[ "$f6" != "$f9" ] && \
[ "$f7" != "$f9" ] && \
[ "$f8" != "$f9" ];
then
echo "FACTOR $f1 \
AND $f2 \
AND $f3 \
AND $f4 \
AND $f5 \
AND $f6 \
AND $f7 \
AND $f8 \
AND $f9" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1 and \
$f3 = 1 and \
$f4 = 1 and \
$f5 = 1 and \
$f6 = 1 and \
$f7 = 1 and \
$f8 = 1 and \
$f9 = 1;" metamorelia >> permcounts.txt;
# next sub-array
faclis10=( ${faclis9[@]//$f9} );
j=0
#echo "${faclis10[@]}";
# loop through sub-array
for f10 in ${faclis10[@]};
do
if [ $f10 ] && \
[ "$f1" != "$f10" ] && \
[ "$f2" != "$f10" ] && \
[ "$f3" != "$f10" ] && \
[ "$f4" != "$f10" ] && \
[ "$f5" != "$f10" ] && \
[ "$f6" != "$f10" ] && \
[ "$f7" != "$f10" ] && \
[ "$f8" != "$f10" ] && \
[ "$f9" != "$f10" ];
then
echo "FACTOR $f1 \
AND $f2 \
AND $f3 \
AND $f4 \
AND $f5 \
AND $f6 \
AND $f7 \
AND $f8 \
AND $f9 \
AND $f10" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1 and \
$f3 = 1 and \
$f4 = 1 and \
$f5 = 1 and \
$f6 = 1 and \
$f7 = 1 and \
$f8 = 1 and \
$f9 = 1 and \
$f10 = 1;" metamorelia >> permcounts.txt;
# next sub-array
faclis11=( ${faclis10[@]//$f10} );
k=0
#echo "${faclis11[@]}";
# loop through sub-array
for f11 in ${faclis11[@]};
do
if [ $f11 ] && \
[ "$f1" != "$f11" ] && \
[ "$f2" != "$f11" ] && \
[ "$f3" != "$f11" ] && \
[ "$f4" != "$f11" ] && \
[ "$f5" != "$f11" ] && \
[ "$f6" != "$f11" ] && \
[ "$f7" != "$f11" ] && \
[ "$f8" != "$f11" ] && \
[ "$f9" != "$f11" ] && \
[ "$f10" != "$f11" ];
then
echo "FACTOR $f1 \
AND $f2 \
AND $f3 \
AND $f4 \
AND $f5 \
AND $f6 \
AND $f7 \
AND $f8 \
AND $f9 \
AND $f10 \
AND $f11" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
$f1 = 1 and \
$f2 = 1 and \
$f3 = 1 and \
$f4 = 1 and \
$f5 = 1 and \
$f6 = 1 and \
$f7 = 1 and \
$f8 = 1 and \
$f9 = 1 and \
$f10 = 1 and \
$f11 = 1;" metamorelia >> permcounts.txt;
unset faclis11[k];
k=$(($k + 1));
fi;
done;
unset faclis10[j];
j=$(($j + 1));
fi;
done;
unset faclis9[i];
i=$(($i + 1));
fi;
done;
unset faclis8[h];
h=$(($h + 1));
fi;
done;
unset faclis7[g];
g=$(($g + 1));
fi;
done;
unset faclis6[f];
f=$(($f + 1));
fi;
done;
unset faclis5[e];
e=$(($e + 1));
fi;
done;
unset faclis4[d];
d=$(($d + 1));
fi;
done;
unset faclis3[c];
c=$(($c + 1));
fi;
done;
# Remove analyzed factors from vector
unset faclis2[b];
b=$(($b + 1));
fi;
done;
# remove nth item from array (progressively remove one item)
unset faclis1[a];
# increment n for next round
a=$(($a + 1));
echo $a;
fi;
done;
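This script is somewhat inefficient, and I think I included a lot of unnecessary operations, but it gets the job done. (At least I think it does. My student will have to go through the output file to make sure everything is there.)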
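For readers who want the same counts without eleven nested loops, here is a rough sketch of one alternative, not the script the answer actually used. It assumes the table has been exported as a comma-separated text file with a header row and the article key in the first column (the function name `count_combinations` and the file layout are assumptions for illustration). Each row is encoded as an integer bitmask, and every non-empty subset of columns is an integer from 1 to 2^n - 1. Like the SQL in the script above, it counts articles that report at least the listed factors.

```bash
#!/usr/bin/env bash
# Sketch only: assumes a clean CSV (no quoting, no stray spaces or CRs),
# header on line 1, article key in column 1, factor columns holding 0 or 1.

count_combinations() {
    local file=$1
    local -a names cells masks
    local n i mask combo count label

    # Header: drop the key column, keep the factor names.
    IFS=, read -r -a names < "$file"
    names=("${names[@]:1}")
    n=${#names[@]}

    # Encode each data row as an integer: bit i is set when factor i is 1.
    while IFS=, read -r -a cells; do
        mask=0
        for ((i = 0; i < n; i++)); do
            [[ ${cells[i+1]} == 1 ]] && mask=$((mask | (1 << i)))
        done
        masks+=("$mask")
    done < <(tail -n +2 "$file")

    # Each integer 1 .. 2^n - 1 selects one non-empty subset of factors.
    for ((combo = 1; combo < (1 << n); combo++)); do
        count=0
        for mask in "${masks[@]}"; do
            # The row matches when it has every bit of the subset set.
            (( (mask & combo) == combo )) && count=$((count + 1))
        done
        label=""
        for ((i = 0; i < n; i++)); do
            (( combo & (1 << i) )) && label+="${label:+ AND }${names[i]}"
        done
        printf 'FACTOR %s\t%d\n' "$label" "$count"
    done
}

# Tiny made-up example with three factors instead of eleven.
demo=$(mktemp)
printf '%s\n' 'clave,d_inf,d_com,i_dem' 'A1,1,0,1' 'A2,1,1,1' 'A3,0,1,0' > "$demo"
count_combinations "$demo"
rm -f "$demo"
```

With eleven factors this prints 2047 lines, one per non-empty subset. Pure bash arithmetic is slow on large tables, so the sketch is only meant for a dataset of this size (one row per surveyed article).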
【Discussion】:
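The pattern-string idea from the comments (concatenate the binary digits, then count each string as a pivot table would) can also be approximated in the shell. This is a minimal sketch under the same assumed CSV layout as the export described in the answer: header row, key in the first column, 0/1 values after it. The function name `count_patterns` is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: turns each row into a string such as 10100110010 and counts
# how many articles share exactly the same string.

count_patterns() {
    # $1: comma-separated file, header on line 1, article key in column 1
    tail -n +2 "$1" |   # skip the header row
        cut -d, -f2- |  # drop the key column
        tr -d ',' |     # join the 0/1 digits into one pattern string
        sort |
        uniq -c |       # count identical patterns
        sort -rn        # most frequent pattern first
}

# Tiny made-up example with three factors instead of eleven.
demo=$(mktemp)
printf '%s\n' 'clave,d_inf,d_com,i_dem' \
    'A1,1,0,1' 'A2,1,1,1' 'A3,0,1,0' 'A4,1,0,1' > "$demo"
count_patterns "$demo"
rm -f "$demo"
```

Note that this answers a slightly different question from the script in the answer: it counts articles whose full set of drivers is exactly a given pattern (at most 2^11 = 2048 distinct strings), whereas the SQL queries count articles that report at least the listed drivers.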