
Node crawler: writing a web crawler with Node.js

Date: 2023-05-03 23:55:43  Author: 2061

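First, make sure Node.js is installed, then set up the three modules the crawler uses. Two of them have to be installed from npm: request-promise (together with request, which it requires as a peer dependency) and cheerio, e.g. with npm install request request-promise cheerio. The third, fs, is built into Node.js and needs no installation.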

const rp = require('request-promise'); // load the request-promise module
const cheerio = require('cheerio');    // load the cheerio module
const fs = require('fs');              // load the built-in fs module

Below is a small crawler that fetches a page, finds every image on it, and downloads the images to a local folder.

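const rp = require('request-promise'); // load the request-promise module
const cheerio = require('cheerio');    // load the cheerio module
const fs = require('fs');              // load the built-in fs module
const path = require('path');          // built-in path module, for building file paths

const savePath = 'D:/blog/';                            // folder where the images are saved
const wormPath = 'http://vip66.sushenyue.cn/sh56/04/';  // the page we want to crawl

// Headers sent with every image request, to get past hotlink protection
const headers = {
  Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
  'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
  'Cache-Control': 'no-cache',
  Pragma: 'no-cache',
  Referer: wormPath, // present the request as coming from the page that embeds the image
  'Upgrade-Insecure-Requests': '1',
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.19 Safari/537.36'
  // Host is filled in automatically from each image URL
};

const getData = async (url) => {
  const html = await rp({ url });  // fetch the target page's HTML
  const $ = cheerio.load(html);    // parse it so it can be queried jQuery-style
  const tasks = [];
  $('img').each((i, el) => {       // iterate over every <img> element on the page
    const src = $(el).attr('src'); // image address as written in the page
    if (!src) return;              // skip <img> tags without a src
    const imgUrl = new URL(src, url);                          // resolve relative src against the page URL
    const name = path.basename(imgUrl.pathname) || `img_${i}`; // file name without folders or query string
    tasks.push(downLoad(imgUrl.href, name)
      .catch((err) => console.error(`Failed: ${imgUrl.href}`, err.message)));
  });
  await Promise.all(tasks);        // wait until every download has finished or failed
};

// Stream one image to disk; resolves once the file has been fully written
const downLoad = (imgUrl, name) => new Promise((resolve, reject) => {
  const req = rp({ url: imgUrl, headers });
  req.catch(reject);               // network errors and non-2xx status codes
  req.pipe(fs.createWriteStream(path.join(savePath, name)))
    .on('finish', resolve)
    .on('error', reject);
});

fs.mkdirSync(savePath, { recursive: true }); // make sure the target folder exists
getData(wormPath).catch((err) => console.error(err));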
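
The per-image bookkeeping (turning a src into a download URL and a file name) can be tried in isolation. This is a sketch with a hypothetical helper, resolveImage, which is not part of the original code. It uses the WHATWG URL constructor built into Node.js to resolve a src against the page address; that covers page-relative paths such as images/01.jpg as well as root-relative, protocol-relative and absolute URLs, whereas concatenating the page address and the src only works for the first case. The file name is the last path segment with the query string dropped.

```javascript
const path = require('path');

// Hypothetical helper: map an <img> src to an absolute URL and a file name.
function resolveImage(src, pageUrl, index) {
  const abs = new URL(src, pageUrl);              // resolve src relative to the page address
  const base = path.posix.basename(abs.pathname); // last path segment; the query string is not in pathname
  return { url: abs.href, name: base || `image_${index}` }; // fallback when the path is just "/"
}

const page = 'http://vip66.sushenyue.cn/sh56/04/';
console.log(resolveImage('images/01.jpg', page, 0));
console.log(resolveImage('/static/x.png?v=2', page, 1));
```

The first call gives the URL http://vip66.sushenyue.cn/sh56/04/images/01.jpg with the name 01.jpg; the second gives http://vip66.sushenyue.cn/static/x.png?v=2 with the name x.png.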

 
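
The Referer header in the crawler exists because of hotlink protection, which the header comment refers to. Sites implement this differently; a common pattern, assumed here, is that the image server compares the origin of the Referer with its own and refuses requests from anywhere else. The hypothetical function below sketches such a server-side check, which illustrates why sending the page's own address as Referer can make the difference between getting the image and getting an error.

```javascript
// Hypothetical server-side hotlink check (real sites vary): serve the image
// only when the Referer header comes from one of the allowed origins.
function isHotlinkAllowed(referer, allowedOrigins, allowMissing = false) {
  if (!referer) return allowMissing; // whether an empty Referer passes is a per-site policy
  try {
    return allowedOrigins.includes(new URL(referer).origin);
  } catch (err) {
    return false;                    // an unparseable Referer is rejected
  }
}

const site = ['http://vip66.sushenyue.cn'];
console.log(isHotlinkAllowed('http://vip66.sushenyue.cn/sh56/04/', site));
console.log(isHotlinkAllowed('https://other.example/page', site));
```

With this policy the first request is allowed and the second is refused; a request with no Referer at all is refused unless allowMissing is set.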
