javascript: A fast Node crawler that searches worldwide for open surveillance cameras. Inspired by, and using URL patterns from, http://i.document.m05

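Introduction: This article was compiled by the editors of 小常识网 (cha138.com). It covers a fast Node crawler, written in JavaScript, that searches worldwide for open surveillance cameras, inspired by and using URL patterns from http://i.document.m05. We hope it is a useful reference.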

var Crawler = require("crawler").Crawler; // https://github.com/sylvinus/node-crawler
var S = require('string');
var fs = require('fs');

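// A list of some patterns that will show up in webcam URLs.
// The backslash is doubled so that "?" reaches RegExp as a literal character;
// a single "\?" inside a string collapses to "?" and would make the "g" optional.
var patterns = ["jpg/image.jpg\\?r="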
	, "mjpg/video.mjpg"
	, "record/current.jpg"
	, "cgi-bin/faststream.jpg"
	, "oneshotimage.jpg"
	, "SnapshotJPEG"
	, "nphMotionJpeg"].join("|");
var regex = new RegExp( patterns ); 
var crawled = []; // This will contain all of the URLs that we've already crawled

var onContent = function(error, result, $) {
	if(error) {
		process.stdout.write("!");
		return;
	} 

	// Provide some feedback that we're still running
	process.stdout.write(".");
	crawled.push( result.uri );
	if(crawled.length % 100 === 0)
		console.log("\n\n"+crawled.length+" pages crawled\n");

	// Loop through all of the links (<a> tags) in the document and add them to the queue
	// There are some weird errors thrown here, so wrap it in a try/catch
	// Perhaps it's https://github.com/sylvinus/node-crawler/issues/69
	try {
		$("a").each(function(index, a) {

			// as long as it's an http uri && we haven't already crawled it
			if(S(a.href).startsWith("http") && crawled.indexOf(a.href) === -1) {

				// If the href matches one of our patterns, write it to a file
				if(a.href.match(regex))
					fs.appendFileSync('./matches.txt', a.href+"\n");
			
				c.queue( a.href );
			}
		});
	} catch (err) {
	    console.log(err);
	}
} 


// Kick off the crawler
var c = new Crawler({
	"maxConnections": 20,
	"timeout": 10000,
	"callback": onContent
});

// Add some initial URLs to crawl.
c.queue(["https://www.google.com/search?q=webcam", 
	"https://www.google.com/search?q=public+webcams",
	"https://www.google.com/search?q=open+webcams",
	"https://www.google.com/search?q=network+cameras"]);

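Two details in the script above are easy to get wrong: the URL fragments contain characters that RegExp treats specially (such as "." and "?"), and the `crawled` array is searched with `indexOf`, which gets slower as the list grows and only records a page once it has been fetched. The sketch below shows one possible way to handle both, under stated assumptions: `escapeRegExp`, `buildMatcher`, and `createSeenFilter` are hypothetical helper names that are not part of the original script or of node-crawler, and the example.com URLs are placeholders.

```javascript
// Escape regex metacharacters so each fragment is matched literally.
function escapeRegExp(text) {
	return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a single alternation regex from plain-text URL fragments.
// (Hypothetical helper, not part of the original script.)
function buildMatcher(fragments) {
	return new RegExp(fragments.map(escapeRegExp).join("|"));
}

// Return a function that is true only the first time a URL is seen.
// A Set gives constant-time lookups, unlike Array.prototype.indexOf.
// (Hypothetical helper, not part of the original script.)
function createSeenFilter() {
	var seen = new Set();
	return function firstVisit(url) {
		if (seen.has(url)) return false;
		seen.add(url);
		return true;
	};
}

var matcher = buildMatcher(["jpg/image.jpg?r=", "mjpg/video.mjpg"]);
var firstVisit = createSeenFilter();

console.log(matcher.test("http://example.com/jpg/image.jpg?r=1")); // true
console.log(matcher.test("http://example.com/jpg/imageXjpg"));     // false
console.log(firstVisit("http://example.com/a")); // true
console.log(firstVisit("http://example.com/a")); // false
```

If something like this were adopted, `firstVisit(a.href)` could be called just before a link is queued, so that a URL is marked as seen when it is queued and not only after it has been crawled.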