Populate a new column when a list value matches a substring of a column value in a PySpark DataFrame

Question

I have a DataFrame in PySpark that looks like this:

df.show()

+---+----------------------+
| id|                   con|
+---+----------------------+
|  3|           mac,mac pro|
|  1|        iphone5,iphone|
|  1| android,android phone|
|  1|    windows,windows pc|
|  1| spy camera,spy camera|
|  2|               camera,|
|  3|             cctv,cctv|
|  2|   apple iphone,iphone|
|  3|           ,spy camera|
+---+----------------------+

I want to create new columns based on some lists. The lists are as follows:

phone_list = ['iphone', 'android', 'nokia']
pc_list = ['windows', 'mac']

Condition:

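If an element of a list matches a string or substring in the con column, flag that row with the label for that particular list.

For example, phone_list contains the element iphone, so it should match the row with id 1 whose con is iphone5,iphone, and that row should be flagged as phones, and so on.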

Expected result

+---+----------------------+------+----+
| id|                   con|   cat| abc|
+---+----------------------+------+----+
|  3|           mac,mac pro|  null|  pc|
|  1|        iphone5,iphone|phones|null|
|  1| android,android phone|phones|null|
|  1|    windows,windows pc|  null|  pc|
|  1| spy camera,spy camera|  null|null|
|  2|               camera,|  null|null|
|  3|             cctv,cctv|  null|null|
|  2|   apple iphone,iphone|phones|null|
|  3|           ,spy camera|  null|null|
+---+----------------------+------+----+

This is what I tried:

from pyspark.sql import functions as F

df1 = df.withColumn('cat', F.when(df.con.isin(phone_list), 'phones')).withColumn('abc', F.when(df.con.isin(pc_list), 'pc'))

Output

df1.show()

+---+----------------------+----+----+
| id|                   con| cat| abc|
+---+----------------------+----+----+
|  3|           mac,mac pro|null|null|
|  1|        iphone5,iphone|null|null|
|  1| android,android phone|null|null|
|  1|    windows,windows pc|null|null|
|  1| spy camera,spy camera|null|null|
|  2|               camera,|null|null|
|  3|             cctv,cctv|null|null|
|  2|   apple iphone,iphone|null|null|
|  3|           ,spy camera|null|null|
+---+----------------------+----+----+

How can I do this kind of comparison correctly?

Answer 1
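
The attempt in the question returns only nulls because isin tests whether the whole value of con equals one of the list entries, and a value such as iphone5,iphone is never equal to iphone. One possible approach, offered as a sketch rather than a definitive solution, is to join each list into a regex alternation and test it with Column.rlike, which matches anywhere inside the string. The helper names build_pattern and add_flags are hypothetical and introduced only for illustration; the sketch also assumes the list entries are plain words.

```python
import re

phone_list = ['iphone', 'android', 'nokia']
pc_list = ['windows', 'mac']


def build_pattern(words):
    """Join the words into one regex alternation, e.g. 'iphone|android|nokia'."""
    # re.escape guards against regex metacharacters; the words above are
    # plain alphanumerics, so it leaves them unchanged.
    return '|'.join(re.escape(w) for w in words)


def add_flags(df):
    """Return df with 'cat' and 'abc' columns; rows without a match stay null."""
    # Imported here so build_pattern can be used without a Spark installation.
    from pyspark.sql import functions as F

    return (
        df.withColumn('cat', F.when(F.col('con').rlike(build_pattern(phone_list)), 'phones'))
          .withColumn('abc', F.when(F.col('con').rlike(build_pattern(pc_list)), 'pc'))
    )
```

With the DataFrame from the question, df1 = add_flags(df) followed by df1.show() should give the cat and abc columns of the expected result, since F.when without an otherwise clause leaves unmatched rows null. Note that this is pure substring matching: mac would also match a value such as machine, so a stricter pattern (for example with word boundaries) may be needed on real data.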
