Python Pandas Regex

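I have a pandas DataFrame like the example below. Column 0 contains a lot of HTML markup, and I need to extract every URL from it, preserving row order, and add the URLs as columns of this DataFrame.

In this case, column 2 of row 0 would contain "https://sco...". In practice, column 0 can hold up to 10 URLs, and each one should go into its own column of the DataFrame. I tried Beautiful Soup, but could not get it to work properly with a DataFrame like this one.

I tried extracting all of the URLs with the regular expression below, but could not insert the results into the DataFrame.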

postsOnlyURL = re.findall('"(http.*?)"',all_text,re.IGNORECASE|re.DOTALL)


                                                    0                                                  1
0   src="https://sco ...                               publicado a 23/10/2019Ident...
1   Ativo</div></div><div class="_7jwu">Começou a ...  AtivoComeçou a ser publicado a 23/10/2019Ident...
2   Ativo</div></div><div class="_7jwu">Começou a ...  AtivoComeçou a ser publicado a 23/10/2019Ident...

Is there a way to make this work?

Answer
import pandas as pd
import re
from bs4 import BeautifulSoup

# Create sample df
a = ["""Ativo</div></div><div class="_7jwu">Começou a ser publicado a <span>23/10/2019</span></div><div class="_8jox"><div aria-describedby="js_m" aria-haspopup="true" class="_4rhp" role="tooltip" tabindex="0">Identificação: 411753089755204</div></div></div><div class="_8k-_"><div class="_3qn7 _61-0 _2fyi _3qng" style="max-width: 120px;"><span data-hover="tooltip"><i class="_3-8_ img sp_-Fn2d835eMD sx_39e484" alt=""></i></span><span data-hover="tooltip"><i class="_3-8_ img sp_-Fn2d835eMD sx_f3b669" alt=""></i></span><span data-hover="tooltip"><i class="img sp_-Fn2d835eMD sx_e31062" alt=""></i></span></div></div></div><div class="_7jwv"><div style="display: inline-block; width: auto;"><button aria-pressed="false" data-testid="SUIAbstractMenu/button" type="button" aria-disabled="false" class="_271k _271l _1o4e _271m _1qjd _7tvm _7tv2 _7tv4" style="width: auto; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 12px; font-weight: bold; font-family: Arial, sans-serif; line-height: 26px; text-align: center; background-color: transparent; border-color: transparent; height: 28px; padding-left: 7px; padding-right: 7px; border-radius: 2px;"><div class="_43rl"><i aria-hidden="true" class="_271o img sp_6UxJZoFesmZ sx_e4448e" alt=""></i><span class="accessible_elem">Abrir menu pendente</span></div></button></div></div></div><div class="_7jwy"><div class="_7jyg _7jyh"><div class="_7k71"><div class="_8nsi _8nqp"><div class="_3qn7 _61-0 _2fyi _3qng" style="width: 100%;"><img alt="imaginBank" class="_8nqq img" src="https://scontent.flis8-1.fna.fbcdn.net/v/t1.6435-9/56757490_843111089374606_3751796641934344192_n.png?_nc_cat=105&amp;_nc_oc=AQn_sfVuUVpGuXh9Xew56gOSFzdktA5s1xfEWBMkYzLNQ6m8zdOZve6xFIzu7IOEJL0&amp;_nc_ht=scontent.flis8-1.fna&amp;oh=ce8b3a3feea1162bc0874c260ab2b308&amp;oe=5EC4B78C"><div class="_3qn7 _61-0 _2fyh _3qnf" style="width: 100%;"><div class="_8nqr _3qn7 _61-3 _2fyi 
_3qng"><span style="font-family: Arial, sans-serif; font-size: 12px; line-height: 16px; letter-spacing: normal; font-weight: bold; overflow-wrap: normal; text-align: left; color: rgb(28, 30, 33);"><a data-hovercard="/ajax/hovercard/hovercard.php?id=197438223941899" target="_blank" href="https://www.facebook.com/imaginBank/">imaginBank</a></span></div><div class="_8nrv"><div class="_4ik4 _4ik5" style="-webkit-line-clamp: 2;"><div><span class="_8jos">Patrocinado</span></div></div></div></div></div></div></div><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"><div>Parking, peajes, impuestos, gasolina... Al final termina siendo una pasta. ¿Te has planteado recortar estos gastos? No, no hablamos de abandonar la conducción. Hablamos de enchufarnos al futuro. Conoce todos los beneficios de tener un coche un eléctrico y lo fácil que es conseguirlo con un Préstamo Auto de imaginBank. #Enchúfate<br> <br> *La concesión de la operación está sujeta al análisis de la solvencia y de la capacidad de devolución del solicitante, en función de las políticas de riesgo de la entidad. 
imaginBank de CaixaBank</div></div></div></div><div maxchangeamount="1" currentselectedindex="0" class="_23n-"><div class="_4u-c"><div index="0" class="_a28"><div class="_a2e"><div class="_2zgz"><div class="_7jy-"><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"> </div></div></div><a target="_blank" class="_231w _231z _4yee" href="http://play.google.com/store/apps/details?id=com.imaginbank.app" style="color: rgb(33, 111, 219);"><img class="_7jys _7jyt img" src="https://scontent.flis8-2.fna.fbcdn.net/v/t39.16868-6/s600x600/68872437_623249314832062_3424786237267902464_n.jpg?_nc_cat=107&amp;_nc_oc=AQnCOg6lOVmyYNmKW9TeJMIQqFnp__ENhA6b0IF9n6OOvKhuFdfBFFn5A-i6mv9Qs9A&amp;_nc_ht=scontent.flis8-2.fna&amp;_nc_tp=7&amp;oh=4c316693ef6d41f08a19c047bbef6ff5&amp;oe=5EC0B4DA" alt=""><div class="_8jgz _8jg_"><div class="_8jh1"><div class="_8jh2"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;">Préstamo desde 3.000€ hasta 30.000€. 
Solicita el tuyo desde la app</div></div></div><div class="_8jh3"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh4"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh5"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div></div><div class="_8jh0"><button type="button" aria-disabled="false" class="_271k _271m _1qjd _3-9a" style="max-width: 80px; letter-spacing: normal; color: rgb(68, 73, 80); font-size: 11px; font-weight: normal; font-family: Arial, sans-serif; line-height: 16px; text-align: center; background-color: rgb(245, 246, 247); border-color: rgb(218, 221, 225); height: 18px; padding-left: 4px; padding-right: 4px; background-clip: padding-box;"><div class="_43rl"><div data-hover="tooltip" data-tooltip-display="overflow" class="_43rm">Use App</div></div></button></div></div></a></div></div><div class="_2zgz"><div class="_7jy-"><div class="_7jyr"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 16px; max-height: 112px; -webkit-line-clamp: 7;"> </div></div></div><a target="_blank" class="_231w _231z _4yee" href="http://play.google.com/store/apps/details?id=com.imaginbank.app" style="color: rgb(33, 111, 219);"><img class="_7jys _7jyt img" src="https://scontent.flis8-1.fna.fbcdn.net/v/t39.16868-6/s600x600/69107399_623249321498728_5143385648069083136_n.jpg?_nc_cat=110&amp;_nc_oc=AQlHwBVTCf9XcxXVP4VH0YnbwivUgg1PXA8uYOxShCkbr9woauh1CiNiQTJbguBYmbc&amp;_nc_ht=scontent.flis8-1.fna&amp;_nc_tp=7&amp;oh=e52db30189225e3525fbef0cec013c31&amp;oe=5EFC734A" alt=""><div class="_8jgz _8jg_"><div class="_8jh1"><div class="_8jh2"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; 
-webkit-line-clamp: 2;">Préstamo desde 3.000€ hasta 30.000€. Solicita el tuyo desde la app</div></div></div><div class="_8jh3"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 14px; max-height: 28px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh4"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div><div class="_8jh5"><div tabindex="0" role="button"><div class="_4ik4 _4ik5" style="line-height: 12px; max-height: 24px; -webkit-line-clamp: 2;"></div></div></div></div><div class="_8jh0"><button type="button" aria-disabled="false" class="_271k _271m _1
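
One way to get the URLs into their own columns can be sketched as follows. This is a minimal sketch under assumptions, not necessarily the approach the answer above takes: it relies on a regular expression applied row by row rather than on Beautiful Soup, and the sample markup, the `extract_urls` helper, and the `url_1`, `url_2`, ... column names are all hypothetical. It assumes every URL of interest appears inside double quotes, as in the `src` and `href` attributes of the question's data.

```python
import html
import re

import pandas as pd

# Hypothetical sample data standing in for column 0 of the question's DataFrame.
df = pd.DataFrame({
    0: [
        '<img src="https://example.com/a.png?x=1&amp;y=2">'
        '<a href="http://example.com/page">link</a>',
        '<div class="_7jwu">no links here</div>',
        '<a href="https://example.org/">one</a>',
    ]
})

# Assumption: URLs are double-quoted attribute values starting with http:// or https://.
URL_PATTERN = re.compile(r'"(https?://[^"]+)"', re.IGNORECASE)


def extract_urls(markup):
    """Return every double-quoted http(s) URL in the markup, in order of appearance."""
    # html.unescape turns entities such as &amp; back into plain characters.
    return [html.unescape(url) for url in URL_PATTERN.findall(markup)]


# One list of URLs per row; apply keeps the original index, so row order is preserved.
url_lists = df[0].apply(extract_urls)

# Expand the lists into columns; rows with fewer URLs are padded with missing values.
url_cols = pd.DataFrame(url_lists.tolist(), index=df.index)
url_cols.columns = [f"url_{i + 1}" for i in range(url_cols.shape[1])]

# Attach the new columns to the original DataFrame, aligned on the index.
result = df.join(url_cols)
print(result)
```

Running the pattern once per row, instead of once over all of the text joined together, is what keeps each URL tied to the row it came from. The number of new columns follows the row with the most URLs, so a column holding up to 10 URLs would produce up to 10 columns.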
