Python WeChat API (scraping official account articles with Python)


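There are two common ways to scrape a WeChat official account.

The first goes through Sogou search. Its drawback is that you can only get the ten most recently pushed articles.

The second fetches the articles through the material management page of the official account platform. Its drawback is that you need to register an official account of your own.

Today I will introduce a third approach: capturing the traffic of the WeChat PC client to get an official account's articles. Compared with the other methods, it is very convenient.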

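As shown in the figure above, a packet capture tool lets us see the network requests made by WeChat. Every pull-down refresh of the article list sends a request to the endpoint mp.weixin.qq.com/mp/xxx (this platform does not allow links to an official account's home page, so xxx stands for profile_ext).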
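The query string of that captured request carries the values needed to replay it. Below is a minimal sketch for pulling them out of a URL copied from the capture tool. The helper name `extract_params` and the sample URL (including its https scheme and placeholder values) are assumptions for illustration, not something taken from a real capture.

```python
from urllib.parse import urlsplit, parse_qs


def extract_params(captured_url, wanted=('__biz', 'uin', 'key', 'offset', 'count')):
    """Pull selected query parameters out of a captured request URL."""
    query = parse_qs(urlsplit(captured_url).query, keep_blank_values=True)
    # parse_qs returns a list of values per key; keep the first one
    return {name: query[name][0] for name in wanted if name in query}


# Placeholder URL: every value below is made up for illustration
captured = ('https://mp.weixin.qq.com/mp/profile_ext?action=getmsg'
            '&__biz=EXAMPLE_BIZ&uin=EXAMPLE_UIN&key=EXAMPLE_KEY'
            '&offset=10&count=10&f=json')
print(extract_params(captured))
```

Because `parse_qs` also undoes percent-encoding, values copied straight from the capture tool come back in their decoded form.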

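After repeated testing and analysis, the following parameters are the ones that matter.

__biz: the unique id linking the user and the official account
uin: the user's private id
key: the request key, which is valid only for a limited time
offset: the offset
count: the number of items per request

The returned data looks like this: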

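{
    'ret': 0,
    'errmsg': 'ok',  # request status
    'msg_count': 10,  # number of messages
    'can_msg_continue': 1,  # whether more data follows: 1 means yes, 0 means this is the last page
    'general_msg_list': '{"list":[]}',  # article information for the official account
    'next_offset': 20,
    'video_count': 1,
    'use_video_tab': 1,
    'real_type': 0,
    'home_page_list': []
}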
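Note that general_msg_list is itself a JSON-encoded string nested inside the JSON response, so it needs a second decoding pass. Here is a minimal sketch of that step, assuming the response shape shown above; the helper name `parse_page` is hypothetical.

```python
import json


def parse_page(resp_json):
    """Return (articles, has_more, next_offset) from one decoded response."""
    if resp_json.get('errmsg') != 'ok':
        raise ValueError('unexpected status: %r' % resp_json.get('errmsg'))
    # general_msg_list is a JSON string, not a nested object
    inner = json.loads(resp_json.get('general_msg_list') or '{}')
    articles = inner.get('list', [])
    has_more = resp_json.get('can_msg_continue') == 1
    return articles, has_more, resp_json.get('next_offset')


sample = {
    'ret': 0,
    'errmsg': 'ok',
    'msg_count': 0,
    'can_msg_continue': 1,
    'general_msg_list': '{"list":[]}',
    'next_offset': 20,
}
print(parse_page(sample))
```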

Part of the code is shown below.

import json

import requests

# url, biz, uin, key, offset and count are taken from the captured request
params = {
    '__biz': biz,
    'uin': uin,
    'key': key,
    'offset': offset,
    'count': count,
    'action': 'getmsg',
    'f': 'json'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'
}
response = requests.get(url=url, params=params, headers=headers)
resp_json = response.json()
if resp_json.get('errmsg') == 'ok':
    # whether there is another page of data
    can_msg_continue = resp_json['can_msg_continue']
    # number of articles on the current page
    msg_count = resp_json['msg_count']
    # general_msg_list is a JSON string, so decode it again
    general_msg_list = json.loads(resp_json['general_msg_list'])
    list = general_msg_list.get('list')
    print(list, '**************')

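The list printed at the end holds the details of the official account's articles, including the title (title), summary (digest), article URL (content_url), "read the original" URL (source_url), cover image (cover), author (author), and so on.

The output looks like this: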

[{ "comm_msg_info": { "id": 1000000038, "type": 49, "datetime": 1560474000, "fakeid": "3881067844", "status": 2, "content": "" }, "app_msg_ext_info": { "title": "入门爬虫，这一篇就够了！！！", "digest": "入门爬虫，这一篇就够了！！！", "content": "", "fileid": 0, "content_url": "http:XXXXXX", "source_url": "", "cover": "I5kME6BVXeLibZDUhsiaEYiaX7zOoibxa9sb4stIwrfuqID5ttmiaoVAFyxKF6IjOCyl22vg8n2NPv98ibow\/0?wx_fmt=jpeg", "subtype": 9, "is_multi": 0, "multi_app_msg_item_list": [], "author": "Python3X", "copyright_stat": 11, "duration": 0, "del_flag": 1, "item_show_type": 0, "audio_fileid": 0, "play_url": "", "malicious_title_reason_id": 0, "malicious_content_type": 0 } },{...},{...},{...},{...},{...},{...},{...},{...},{...}]
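The excerpt above fetches a single page. To walk through the full history, the can_msg_continue and next_offset fields of each response can drive a loop. The following is a minimal sketch under that assumption. The name `iter_articles` and the `fetch_page` callback are hypothetical, and the callback is injected so the loop can be tried without touching the network.

```python
import json
import time


def iter_articles(fetch_page, start_offset=0, delay=0.0, max_pages=None):
    """Yield article entries page by page.

    fetch_page(offset) must return the decoded JSON response (a dict)
    for the given offset.
    """
    offset = start_offset
    pages = 0
    while max_pages is None or pages < max_pages:
        resp_json = fetch_page(offset)
        if resp_json.get('errmsg') != 'ok':
            break  # stop on any non-ok status, for example once the key has expired
        general_msg_list = json.loads(resp_json['general_msg_list'])
        for item in general_msg_list.get('list', []):
            yield item
        pages += 1
        if resp_json.get('can_msg_continue') != 1:
            break  # last page reached
        offset = resp_json['next_offset']
        if delay:
            time.sleep(delay)  # pause between requests


# With requests, the callback could look like this (not executed here):
# def fetch_page(offset):
#     params['offset'] = offset
#     return requests.get(url, params=params, headers=headers).json()
```

Setting `delay` to a second or more keeps the request rate modest, and `max_pages` is a simple guard against an unexpectedly long history.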

Once you have the data, you can save it to a database or save the articles as PDF files.

1. Saving to MongoDB

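import time

from pymongo import MongoClient

# Mongo configuration
conn = MongoClient('127.0.0.1', 27017)
db = conn.wx  # connect to the wx database; created automatically if it does not exist
mongo_wx = db.article  # use the article collection; created automatically if it does not exist
for i in list:
    app_msg_ext_info = i['app_msg_ext_info']
    # title
    title = app_msg_ext_info['title']
    # article URL
    content_url = app_msg_ext_info['content_url']
    # cover image
    cover = app_msg_ext_info['cover']
    # publication time
    datetime = i['comm_msg_info']['datetime']
    datetime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(datetime))
    # insert_one replaces Collection.insert, which was removed in PyMongo 4
    mongo_wx.insert_one({
        'title': title,
        'content_url': content_url,
        'cover': cover,
        'datetime': datetime
    })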
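Running the scraper twice with a plain insert stores every article twice. One possible way around that is an upsert keyed on content_url, sketched below. The helper names `to_document` and `save_articles` are hypothetical. Two assumptions are made here: that content_url is unique per article, and that some entries may lack app_msg_ext_info. Neither is confirmed by the captured data above.

```python
import time


def to_document(item):
    """Flatten one entry of the article list into a MongoDB document."""
    info = item.get('app_msg_ext_info')
    if not info:
        return None  # assumption: some entries may carry no article payload
    published = item['comm_msg_info']['datetime']
    return {
        'title': info['title'],
        'content_url': info['content_url'],
        'cover': info['cover'],
        'datetime': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(published)),
    }


def save_articles(collection, items):
    """Upsert each article keyed on content_url; return how many were written."""
    saved = 0
    for item in items:
        doc = to_document(item)
        if doc is None:
            continue
        # update_one with upsert=True inserts the document or overwrites the old copy
        collection.update_one({'content_url': doc['content_url']},
                              {'$set': doc}, upsert=True)
        saved += 1
    return saved
```

With PyMongo, `collection` would be the `db.article` collection from the code above. Any object that offers a compatible `update_one` method works as well, which makes the helper easy to try without a running database.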

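The result is as follows.

2. Exporting to a PDF file

Commonly used Python 3 libraries for working with PDF files include python-pdf and pdfkit. I used the pdfkit module to export the PDF files.

pdfkit is a wrapper around the wkhtmltopdf tool, so wkhtmltopdf must be installed before pdfkit can be used.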

You can visit

https://wkhtmltopdf.org/downloads.html

to download the package that matches your operating system.

The code is fairly simple: you only need to pass in the URL of the page to export.

Install the pdfkit library:

pip3 install pdfkit -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

import pdfkit

# replace the first argument with the article's content_url
pdfkit.from_url('official account article URL', 'out.pdf')

After running it, the PDF file is exported successfully.
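When exporting many articles, writing everything to out.pdf would overwrite the file each time. Here is a minimal sketch that names each PDF after the article title instead. The helpers `safe_filename` and `export_article` are hypothetical, and the `wkhtmltopdf_path` argument is only needed if the wkhtmltopdf binary is not on your PATH.

```python
import re


def safe_filename(title, max_length=80):
    """Turn an article title into a file name valid on common file systems."""
    # replace characters that Windows or Unix file systems reject
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]+', '_', title).strip(' ._')
    return (cleaned[:max_length] or 'untitled') + '.pdf'


def export_article(content_url, title, wkhtmltopdf_path=None):
    """Render one article URL to a PDF named after its title."""
    import pdfkit  # imported lazily so safe_filename works without pdfkit installed

    config = None
    if wkhtmltopdf_path:
        # point pdfkit at an explicit wkhtmltopdf binary
        config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
    output = safe_filename(title)
    pdfkit.from_url(content_url, output, configuration=config)
    return output
```

A call would pass the content_url and title fields of one entry from the article list, for example `export_article(info['content_url'], info['title'])`.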
