Extracting specific XML from the node returned by a Goutte request
[Title]: Extracting specific xml from a Goutte request from the node returning back [Posted]: 2021-11-18 11:20:23 [Question]: I am using the Laravel Goutte package to do some web scraping. The code below works and returns a large amount of data; I am trying to filter out only the part of the data that I need.
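If I load the page in a browser (with jQuery injected into the page), I can get the data I need from the console with the command jQuery('ea-proclub-overview')[0];
I am basically trying to replicate that in the Laravel/Goutte instance below.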
Running jQuery('ea-proclub-overview')[0].customCrestBaseUrl; in the console
gives me the exact URL I need: https://fifa21.content.easports.com/fifa/fltOnlineAssets/05772199-716f-417d-9fe0-988fa9899c4d/2021/fifaweb/crests/256x256/l
Below is my PHP code. I am returning the $node variable, but I am not sure how to return only customCrestBaseUrl so that it gives me the URL.
use Goutte\Client;

$client = new Client();
// Configure the user agent and referer parameters for the request.
$client->setServerParameter('HTTP_USER_AGENT', 'Mozilla/5.0 (X11; Linux i686; rv:78.0) Gecko/20100101 Firefox/78.0');
$client->setServerParameter('REFERER', 'https://www.ea.com/');
$url = 'https://www.ea.com/en-gb/games/fifa/pro-clubs/ps5-xbsxs/overview?clubId=2552&platform=ps5';
$crawler = $client->request('GET', $url);
// Dump every <ea-proclub-overview> node the crawler finds.
$crawler->filter('ea-proclub-overview')->each(function ($node) {
    dd($node);
});
Expected result
<ea-proclub-overview endpoints=""settingsEndpoint":"https://proclubs.ea.com/api/fifa/settings","seasonalStatsEndpoint":"https://proclubs.ea.com/api/fifa/clubs/seasonalStats","clubsInfoEndpoint":"https://proclubs.ea.com/api/fifa/clubs/info","matchesEndpoint":"https://proclubs.ea.com/api/fifa/clubs/matches","memberStatEndpoint":"https://proclubs.ea.com/api/fifa/members/stats","memberCareerStatEndpoint":"https://proclubs.ea.com/api/fifa/members/career/stats"" colors=""currentDivision":"startColor":"#FA4358","endColor":"#FA4358","nextDivision":"relegationColor":"#FA4358","pointsColor":"#FA4358","pieChart":"winsColor":"#19A863","lossesColor":"#C4010D","tiesColor":"#282D3B","stats":"wins":"startColor":"#19A863","endColor":"#94D85D","losses":"startColor":"#C4010D","endColor":"#F80245","ties":"startColor":"#282D3B","endColor":"#282D3B"" match-type="["gameType9","gameType13"]" headers-labels=""points":"Points","stats":"wins":"label":"Wins","description":"Wins","losses":"label":"Losses","description":"Losses","ties":"label":"Draws","description":"Draws"" division-labels=""title":"Division Ranking","currentDivisionTitle":"Current Division","nextDivisionTitle":"Points To Next Division","seasons":"Season","record":"Record","points":"Points","gamesPlayed":"Games Played","gamesRemaining":"Games Remaining","divisionImgBaseUrl":"https://media.contentapi.ea.com/content/dam/eacom/fifa/pro-clubs/divisioncrest","stats":"wins":"W","losses":"L","ties":"D"" progressbar-labels=""div":"Div","promotion":"Promotion","relegation":"Relegation","title":"Title"" members-labels=""title":"Members","linkText":"View All Members","linkUrl":"members","totalTitle":"Total Members","totalCountsLabel":"Total","memberDetails":"proOverall":"Overall Rating","ratingAve":"Average Match Rating","gamesPlayed":"Games 
Played","memberPosition":"defender":"Defender","forward":"Forward","goalkeeper":"Goalkeeper","midfielder":"Midfielder","positions":"defender":"Defenders","forward":"Forwards","goalkeeper":"Goalkeepers","midfielder":"Midfielders","defaultMemberAvatar":"https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/avatar.png"" match-labels=""title":"Last Match","linkText":"View All Match History","linkUrl":"match-history","altTitle":"No match data was found"" trophies-labels=""title":"Trophies","cupsLabel":"leaguesWon":"Leagues Won","titlesWon":"Titles Won","totalCupsWon":"Total Cups Won","cupsImg":"leaguesWonImgUrl":"https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/league-titles-21.png","titlesWonImgUrl":"https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/all-tiles-21.png","totalCupsWonImgUrl":"https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/cups-won-21.png"" history-labels=""title":"Club History","subTitle":"Overall Record","pts":"Points","division":"Division","historyDetails":"seasons":"Seasons Played","totalGames":"Total Games","titlesWon":"Titles Won","bestPoints":"Highest Points Total","promotions":"Promotions","relegations":"Relegations","stats":"wins":"Wins","losses":"Losses","ties":"Draws","statsShort":"wins":"W","losses":"L","ties":"D","progressBar":"title":"Best Season Finish","tipLabel":"DIV","startColor":"#9B7801","endColor":"#F9F1A5","divisionBaseUrl":"https://media.contentapi.ea.com/content/dam/eacom/fifa/pro-clubs/divisioncrest"" translations=""4543827":"East Coast US","5723475":"West Coast US","5719381":"Western Europe","4539733":"Eastern Europe","5129557":"Northern Europe","5457237":"Southern Europe","4344147":"British Isles","5456205":"South America","4407629":"Central America","4281153":"Asia","4281683":"Australia / New Zealand"" 
crest-base-url="https://fifa21.content.easports.com/fifa/fltOnlineAssets/05772199-716f-417d-9fe0-988fa9899c4d/2021/fifaweb/crests/256x256/l" custom-crest-base-url="https://fifa21.content.easports.com/fifa/fltOnlineAssets/05772199-716f-417d-9fe0-988fa9899c4d/2021/fifaweb/crests/256x256/l" default-crest-url="https://media.contentapi.ea.com/content/dam/ea/fifa/fifa-21/pro-clubs/common/pro-clubs/crest-default.png" loading-image="https://media.contentapi.ea.com/content/dam/eacom/fifa/pro-clubs/loading-animation.png" default-club-name="Disbanded"></ea-proclub-overview>
Actual result
-- Too much to post, but it is all of the HTML & XML
Below is a pastebin of the entire response from $crawler when dumped with dd(). https://pastebin.com/qxUTpu9p
[Answer 1]: According to the documentation:
$customCrestBaseUrl = $crawler
->filter('ea-proclub-overview')
->first()
->extract(['custom-crest-base-url'])
;
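Note that the browser property customCrestBaseUrl corresponds to the custom-crest-base-url attribute in the served markup, which is why the answer filters on the attribute name. The following is a minimal sketch of how extract() and attr() differ, not the answerer's exact method: it assumes symfony/dom-crawler and symfony/css-selector are installed via Composer (Goutte's request() returns a Symfony DomCrawler Crawler), and it runs against a trimmed-down, hypothetical stand-in for the page markup rather than the live site.

```php
<?php
// Assumption: Composer autoloader with symfony/dom-crawler and symfony/css-selector.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical, shortened stand-in for the real page; the real element
// carries many more attributes.
$html = <<<'HTML'
<html><body>
<ea-proclub-overview
    custom-crest-base-url="https://fifa21.content.easports.com/fifa/fltOnlineAssets/05772199-716f-417d-9fe0-988fa9899c4d/2021/fifaweb/crests/256x256/l"
    default-club-name="Disbanded"></ea-proclub-overview>
</body></html>
HTML;

$crawler = new Crawler($html);
$node = $crawler->filter('ea-proclub-overview')->first();

// extract() returns an array with one entry per matched node.
$asArray = $node->extract(['custom-crest-base-url']);

// attr() returns the attribute of the first matched node as a string.
$asString = $node->attr('custom-crest-base-url');

var_dump($asArray[0] === $asString);
```

If only a single URL string is needed, attr() may be the more convenient call; extract() is useful when several nodes or several attributes are wanted at once.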
[Comments]:
Perfect!! Thanks Enrico :)