Extracting specific XML from the node returned by a Goutte request

Posted: 2021-11-18 11:20:23

Question:

I'm using the Laravel Goutte package to do some web scraping. The code below works and returns a large amount of data, but I only need one part of it.

If I open the page in a browser (with jQuery injected into it), I can get the data I need by running jQuery('ea-proclub-overview')[0]; in the console. I'm trying to do the same thing in the Laravel/Goutte code below.

Running jQuery('ea-proclub-overview')[0].customCrestBaseUrl; in the console gives me the exact URL I need: https://fifa21.content.easports.com/fifa/fltOnlineAssets/05772199-716f-417d-9fe0-988fa9899c4d/2021/fifaweb/crests/256x256/l

Below is my PHP code. I'm currently dumping the $node variable, but I don't know how to get just customCrestBaseUrl so that it returns the URL.

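use Goutte\Client;

$client = new Client();
// Send browser-like headers. BrowserKit only forwards server parameters
// prefixed with HTTP_ as request headers, so the referer needs HTTP_REFERER.
$client->setServerParameter('HTTP_USER_AGENT', 'Mozilla/5.0 (X11; Linux i686; rv:78.0) Gecko/20100101 Firefox/78.0');
$client->setServerParameter('HTTP_REFERER', 'https://www.ea.com/');

$url = 'https://www.ea.com/en-gb/games/fifa/pro-clubs/ps5-xbsxs/overview?clubId=2552&platform=ps5';
$crawler = $client->request('GET', $url);

// Dump the <ea-proclub-overview> node(s) matched by the CSS selector
$crawler->filter('ea-proclub-overview')->each(function ($node) {
    dd($node);
});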

Expected result

<ea-proclub-overview endpoints="&quot;settingsEndpoint&quot;:&quot;https://proclubs.ea.com/api/fifa/settings&quot;,&quot;seasonalStatsEndpoint&quot;:&quot;https://proclubs.ea.com/api/fifa/clubs/seasonalStats&quot;,&quot;clubsInfoEndpoint&quot;:&quot;https://proclubs.ea.com/api/fifa/clubs/info&quot;,&quot;matchesEndpoint&quot;:&quot;https://proclubs.ea.com/api/fifa/clubs/matches&quot;,&quot;memberStatEndpoint&quot;:&quot;https://proclubs.ea.com/api/fifa/members/stats&quot;,&quot;memberCareerStatEndpoint&quot;:&quot;https://proclubs.ea.com/api/fifa/members/career/stats&quot;" colors="&quot;currentDivision&quot;:&quot;startColor&quot;:&quot;#FA4358&quot;,&quot;endColor&quot;:&quot;#FA4358&quot;,&quot;nextDivision&quot;:&quot;relegationColor&quot;:&quot;#FA4358&quot;,&quot;pointsColor&quot;:&quot;#FA4358&quot;,&quot;pieChart&quot;:&quot;winsColor&quot;:&quot;#19A863&quot;,&quot;lossesColor&quot;:&quot;#C4010D&quot;,&quot;tiesColor&quot;:&quot;#282D3B&quot;,&quot;stats&quot;:&quot;wins&quot;:&quot;startColor&quot;:&quot;#19A863&quot;,&quot;endColor&quot;:&quot;#94D85D&quot;,&quot;losses&quot;:&quot;startColor&quot;:&quot;#C4010D&quot;,&quot;endColor&quot;:&quot;#F80245&quot;,&quot;ties&quot;:&quot;startColor&quot;:&quot;#282D3B&quot;,&quot;endColor&quot;:&quot;#282D3B&quot;" match-type="[&quot;gameType9&quot;,&quot;gameType13&quot;]" headers-labels="&quot;points&quot;:&quot;Points&quot;,&quot;stats&quot;:&quot;wins&quot;:&quot;label&quot;:&quot;Wins&quot;,&quot;description&quot;:&quot;Wins&quot;,&quot;losses&quot;:&quot;label&quot;:&quot;Losses&quot;,&quot;description&quot;:&quot;Losses&quot;,&quot;ties&quot;:&quot;label&quot;:&quot;Draws&quot;,&quot;description&quot;:&quot;Draws&quot;" division-labels="&quot;title&quot;:&quot;Division Ranking&quot;,&quot;currentDivisionTitle&quot;:&quot;Current Division&quot;,&quot;nextDivisionTitle&quot;:&quot;Points To Next 
Division&quot;,&quot;seasons&quot;:&quot;Season&quot;,&quot;record&quot;:&quot;Record&quot;,&quot;points&quot;:&quot;Points&quot;,&quot;gamesPlayed&quot;:&quot;Games Played&quot;,&quot;gamesRemaining&quot;:&quot;Games Remaining&quot;,&quot;divisionImgBaseUrl&quot;:&quot;https://media.contentapi.ea.com/content/dam/eacom/fifa/pro-clubs/divisioncrest&quot;,&quot;stats&quot;:&quot;wins&quot;:&quot;W&quot;,&quot;losses&quot;:&quot;L&quot;,&quot;ties&quot;:&quot;D&quot;" progressbar-labels="&quot;div&quot;:&quot;Div&quot;,&quot;promotion&quot;:&quot;Promotion&quot;,&quot;relegation&quot;:&quot;Relegation&quot;,&quot;title&quot;:&quot;Title&quot;" members-labels="&quot;title&quot;:&quot;Members&quot;,&quot;linkText&quot;:&quot;View All Members&quot;,&quot;linkUrl&quot;:&quot;members&quot;,&quot;totalTitle&quot;:&quot;Total Members&quot;,&quot;totalCountsLabel&quot;:&quot;Total&quot;,&quot;memberDetails&quot;:&quot;proOverall&quot;:&quot;Overall Rating&quot;,&quot;ratingAve&quot;:&quot;Average Match Rating&quot;,&quot;gamesPlayed&quot;:&quot;Games Played&quot;,&quot;memberPosition&quot;:&quot;defender&quot;:&quot;Defender&quot;,&quot;forward&quot;:&quot;Forward&quot;,&quot;goalkeeper&quot;:&quot;Goalkeeper&quot;,&quot;midfielder&quot;:&quot;Midfielder&quot;,&quot;positions&quot;:&quot;defender&quot;:&quot;Defenders&quot;,&quot;forward&quot;:&quot;Forwards&quot;,&quot;goalkeeper&quot;:&quot;Goalkeepers&quot;,&quot;midfielder&quot;:&quot;Midfielders&quot;,&quot;defaultMemberAvatar&quot;:&quot;https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/avatar.png&quot;" match-labels="&quot;title&quot;:&quot;Last Match&quot;,&quot;linkText&quot;:&quot;View All Match History&quot;,&quot;linkUrl&quot;:&quot;match-history&quot;,&quot;altTitle&quot;:&quot;No match data was found&quot;" trophies-labels="&quot;title&quot;:&quot;Trophies&quot;,&quot;cupsLabel&quot;:&quot;leaguesWon&quot;:&quot;Leagues Won&quot;,&quot;titlesWon&quot;:&quot;Titles 
Won&quot;,&quot;totalCupsWon&quot;:&quot;Total Cups Won&quot;,&quot;cupsImg&quot;:&quot;leaguesWonImgUrl&quot;:&quot;https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/league-titles-21.png&quot;,&quot;titlesWonImgUrl&quot;:&quot;https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/all-tiles-21.png&quot;,&quot;totalCupsWonImgUrl&quot;:&quot;https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/cups-won-21.png&quot;" history-labels="&quot;title&quot;:&quot;Club History&quot;,&quot;subTitle&quot;:&quot;Overall Record&quot;,&quot;pts&quot;:&quot;Points&quot;,&quot;division&quot;:&quot;Division&quot;,&quot;historyDetails&quot;:&quot;seasons&quot;:&quot;Seasons Played&quot;,&quot;totalGames&quot;:&quot;Total Games&quot;,&quot;titlesWon&quot;:&quot;Titles Won&quot;,&quot;bestPoints&quot;:&quot;Highest Points Total&quot;,&quot;promotions&quot;:&quot;Promotions&quot;,&quot;relegations&quot;:&quot;Relegations&quot;,&quot;stats&quot;:&quot;wins&quot;:&quot;Wins&quot;,&quot;losses&quot;:&quot;Losses&quot;,&quot;ties&quot;:&quot;Draws&quot;,&quot;statsShort&quot;:&quot;wins&quot;:&quot;W&quot;,&quot;losses&quot;:&quot;L&quot;,&quot;ties&quot;:&quot;D&quot;,&quot;progressBar&quot;:&quot;title&quot;:&quot;Best Season Finish&quot;,&quot;tipLabel&quot;:&quot;DIV&quot;,&quot;startColor&quot;:&quot;#9B7801&quot;,&quot;endColor&quot;:&quot;#F9F1A5&quot;,&quot;divisionBaseUrl&quot;:&quot;https://media.contentapi.ea.com/content/dam/eacom/fifa/pro-clubs/divisioncrest&quot;" translations="&quot;4543827&quot;:&quot;East Coast US&quot;,&quot;5723475&quot;:&quot;West Coast US&quot;,&quot;5719381&quot;:&quot;Western Europe&quot;,&quot;4539733&quot;:&quot;Eastern Europe&quot;,&quot;5129557&quot;:&quot;Northern Europe&quot;,&quot;5457237&quot;:&quot;Southern Europe&quot;,&quot;4344147&quot;:&quot;British Isles&quot;,&quot;5456205&quot;:&quot;South 
America&quot;,&quot;4407629&quot;:&quot;Central America&quot;,&quot;4281153&quot;:&quot;Asia&quot;,&quot;4281683&quot;:&quot;Australia / New Zealand&quot;" crest-base-url="https://fifa21.content.easports.com/fifa/fltOnlineAssets/05772199-716f-417d-9fe0-988fa9899c4d/2021/fifaweb/crests/256x256/l" custom-crest-base-url="https://fifa21.content.easports.com/fifa/fltOnlineAssets/05772199-716f-417d-9fe0-988fa9899c4d/2021/fifaweb/crests/256x256/l" default-crest-url="https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/crest-default.png" loading-image="https://media.contentapi.ea.com/content/dam/eacom/fifa/pro-clubs/loading-animation.png" default-club-name="Disbanded"></ea-proclub-overview>

Actual result

Too much to post here, but it's all of the HTML and XML.

Here is a pastebin of the entire response from $crawler when dumped with dd(): https://pastebin.com/qxUTpu9p


Answer 1:

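According to the documentation, you can read the element's custom-crest-base-url attribute, which in the server-side HTML holds the same value as the customCrestBaseUrl property you used in the console: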

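// Select the first <ea-proclub-overview> element and read its
// custom-crest-base-url attribute. extract() returns an array with one
// value per matched node, so the URL itself is $customCrestBaseUrl[0].
$customCrestBaseUrl = $crawler
    ->filter('ea-proclub-overview')
    ->first()
    ->extract(['custom-crest-base-url'])
;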
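If you want to check the parsing offline, or do it without Goutte, the same attribute can be read with PHP's built-in DOM extension. This is a minimal sketch, assuming ext-dom is enabled: firstTagAttribute() is a hypothetical helper, and $html is a trimmed copy of the element from the expected result above rather than a live response. It also decodes match-type, one of the attributes whose value is HTML-escaped JSON.

```php
<?php
// Sketch: read attributes of <ea-proclub-overview> with PHP's DOM extension.
// $html is a trimmed, hypothetical copy of the element from the question;
// in practice it would be the fetched page body.

/**
 * Hypothetical helper: return the value of $attribute on the first <$tag>
 * element in $html, or null when the element or attribute is missing.
 */
function firstTagAttribute(string $html, string $tag, string $attribute): ?string
{
    $doc = new DOMDocument();
    // libxml's HTML parser does not know custom elements and reports them as
    // invalid tags; collect those errors internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementsByTagName($tag)->item(0);
    if (!$node instanceof DOMElement || !$node->hasAttribute($attribute)) {
        return null;
    }
    // getAttribute() returns the value with entities such as &quot; decoded.
    return $node->getAttribute($attribute);
}

$html = '<ea-proclub-overview'
    . ' match-type="[&quot;gameType9&quot;,&quot;gameType13&quot;]"'
    . ' custom-crest-base-url="https://fifa21.content.easports.com/fifa/fltOnlineAssets/05772199-716f-417d-9fe0-988fa9899c4d/2021/fifaweb/crests/256x256/l"'
    . ' default-club-name="Disbanded"></ea-proclub-overview>';

$crestBaseUrl = firstTagAttribute($html, 'ea-proclub-overview', 'custom-crest-base-url');

// match-type holds JSON once the entities are decoded, so json_decode it.
$matchTypes = json_decode(
    firstTagAttribute($html, 'ea-proclub-overview', 'match-type') ?? 'null',
    true
);

echo $crestBaseUrl, PHP_EOL;
print_r($matchTypes);
```

Within the Goutte code itself, $crawler->filter('ea-proclub-overview')->first()->attr('custom-crest-base-url') returns the URL as a plain string (or null if the attribute is missing), which avoids indexing into the array that extract() returns.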

Comments:

Perfect!! Thanks Enrico :)
