Blocking junk crawlers with NGINX

Posted by ☆♂安♀★

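Preface: This article was compiled by the editors of 小常识网 (cha138.com) and covers how to block junk crawlers with NGINX. We hope you find it useful.

Put the following if block inside the relevant server block. The ~* operator performs a case-insensitive regular-expression match against the User-Agent request header, so any client whose User-Agent contains one of the listed names receives HTTP 410 (Gone) instead of the page. Note that the list also covers some legitimate crawlers and tools, such as BaiduSpider, Applebot, YandexBot, Slurp (Yahoo's crawler), Twitterbot, Curl and Wget, and that short, generic patterns such as Java and zh-CN match any User-Agent that merely contains those substrings; delete any entries you still want to allow.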

# Quoted because some patterns contain spaces (e.g. "Bolt 0", "Papa Foto"); ~* matches case-insensitively.
if ($http_user_agent ~* "(80legs.com|Abonti|AcoonBot|Acunetix|adbeat_bot|AddThis.com|adidxbot|ADmantX|AhrefsBot|AngloINFO|Antelope|Applebot|BaiduSpider|BeetleBot|billigerbot|binlar|bitlybot|BlackWidow|BLP_bbot|BoardReader|Bolt 0|BOT for JCE|Bot mailto:craftbot@yahoo.com|casper|CazoodleBot|CCBot|checkprivacy|ChinaClaw|chromeframe|Clerkbot|Cliqzbot|clshttp|CommonCrawler|comodo|CPython|crawler4j|Crawlera|CRAZYWEBCRAWLER|Curious|Curl|Custo|CWS_proxy|Default Browser 0|diavol|DigExt|Digincore|DIIbot|discobot|DISCo|DoCoMo|DotBot|Download Demon|DTS.Agent|EasouSpider|eCatch|ecxi|EirGrabber|Elmer|EmailCollector|EmailSiphon|EmailWolf|Exabot|ExaleadCloudView|ExpertSearchSpider|ExpertSearch|Express WebPictures|ExtractorPro|extract|EyeNetIE|Ezooms|F2S|FastSeek|feedfinder|FeedlyBot|FHscan|finbot|Flamingo_SearchEngine|FlappyBot|FlashGet|flicky|Flipboard|g00g1e|Genieo|GetRight|GetWeb!|GigablastOpenSource|GozaikBot|Go!Zilla|Go-Ahead-Got-It|GrabNet|grab|Grafula|GrapeshotCrawler|GTB5|GT::WWW|Guzzle|harvest|heritrix|HMView|HomePageBot|HTTP::Lite|HTTrack|HubSpot|ia_archiver|icarus6|IDBot|id-search|IlseBot|Image Stripper|Image Sucker|Indigonet|Indy Library|integromedb|InterGET|InternetSeer.com|Internet Ninja|IRLbot|ISC Systems iRc Search 2.1|jakarta|Java|JetCar|JobdiggerSpider|JOC Web Spider|Jooblebot|kanagawa|KINGSpider|kmccrew|larbin|LeechFTP|libwww|Lingewoud|LinkChecker|linkdexbot|LinksCrawler|LinksManager.com_bot|linkwalker|LinqiaRSSBot|LivelapBot|ltx71|LubbersBot|lwp-trivial|Mail.RU_Bot|masscan|Mass Downloader|maverick|Maxthon$|Mediatoolkitbot|MegaIndex|MFC_Tear_Sample|Microsoft URL Control|microsoft.url|MIDown tool|miner|Missigua Locator|Mister PiX|mj12bot|Mozilla.*Indy|Mozilla.*NEWT|MSFrontPage|msnbot|Navroad|NearSite|NetAnts|netEstate|NetSpider|NetZIP|Net Vampire|NextGenSearchBot|nutch|Octopus|Offline Explorer|Offline Navigator|OpenindexSpider|OpenWebSpider|OrangeBot|Owlin|PageGrabber|PagesInventory|panopta|panscient.com|Papa Foto|pavuk|pcBrowser|PECL::HTTP|PeoplePal|Photon|phpCrawl|planetwork|PleaseCrawl|PNAMAIN.EXE|PodcastPartyBot|prijsbest|proximic|psbot|purebot|pycurl|QuerySeekerSpider|R6_CommentReader|R6_FeedFetcher|RealDownload|ReGet|Riddler|Rippers 0|rogerbot|RSSingBot|rv:1.9.1|RyzeCrawler|SafeSearch|SBIder|Scrapy|Screaming|SeaMonkey$|search.goo.ne.jp|SearchmetricsBot|search_robot|SemrushBot|Semrush|SentiBot|SEOkicks|SeznamBot|ShowyouBot|SightupBot|SISTRIX|sitecheck.internetseer.com|siteexplorer.info|SiteSnagger|skygrid|Slackbot|Slurp|SmartDownload|Snoopy|Sogou|Sosospider|spaumbot|Steeler|sucker|SuperBot|Superfeedr|SuperHTTP|SurdotlyBot|Surfbot|tAkeOut|Teleport Pro|TinEye-bot|TinEye|Toata dragostea mea pentru diavola|Toplistbot|trendictionbot|TurnitinBot|turnit|Twitterbot|URI::Fetch|urllib|Vagabondo|vikspider|VoidEYE|VoilaBot|WBSearchBot|webalta|WebAuto|WebBandit|WebCollage|WebCopier|WebFetch|WebGo IS|WebLeacher|WebReaper|WebSauger|Website eXtractor|Website Quester|WebStripper|WebWhacker|WebZIP|Web Image Collector|Web Sucker|Wells Search II|WEP Search|WeSEE|Wget|Widow|WinInet|woobot|woopingbot|worldwebheritage.org|Wotbox|WPScan|WWWOFFLE|WWW-Mechanize|Xaldon WebSpider|XoviBot|yacybot|Yahoo|YandexBot|Yandex|YisouSpider|zermelo|Zeus|zh-CN|ZmEu|ZumBot|ZyBorg)") {
    return 410;
}
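As an alternative sketch (not taken from the original sources), the same check can be written with nginx's map directive, which keeps one pattern per line and is easier to maintain than a single very long alternation. The variable name $bad_bot and the server_name example.com are placeholders, only a few entries from the list are shown, and both blocks are assumed to sit inside the http {} context.

```nginx
# Sketch only: $bad_bot and example.com are placeholder names.
# map must be declared in the http {} context, outside any server block.
map $http_user_agent $bad_bot {
    default         0;
    # "~*" marks a case-insensitive regex, as in the original if condition.
    # Quoting lets a pattern contain spaces, e.g. "Bolt 0".
    "~*AhrefsBot"   1;
    "~*SemrushBot"  1;
    "~*mj12bot"     1;
    "~*Scrapy"      1;
    "~*Bolt 0"      1;
    # ...add the remaining entries from the list above, one per line.
}

server {
    listen      80;
    server_name example.com;

    # $bad_bot stays "0" (false) unless one of the patterns matched.
    if ($bad_bot) {
        return 410;
    }
}
```

After checking the configuration with nginx -t and reloading, a request such as curl -I -A "AhrefsBot" http://example.com/ should receive a 410 response, while a normal browser User-Agent should not. nginx evaluates map variables only when they are used, so declaring the map in http {} costs nothing for server blocks that never reference $bad_bot.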


Source: https://www.webfree.net/1165/

https://gist.github.com/hans2103/733b8eef30e89c759335017863bd721d

https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/blob/master/robots.txt/robots.txt
