Python WeChat Official Account Scraping: Anti-Scraping Discussion
Posted: 2020-11-7
Published by: 葵宇科技
Scraping WeChat official accounts, looking for help getting past the anti-scraping
from selenium import webdriver
import time, json

driver = webdriver.Chrome()
driver.get('https://mp.weixin.qq.com/')
time.sleep(1)
# Switch to the account/password login form
driver.find_element_by_link_text("使用帳號登錄").click()
time.sleep(1)
driver.find_element_by_name("account").clear()
driver.find_element_by_name("account").send_keys('your_email')  # enter your email
time.sleep(2)
driver.find_element_by_name("password").clear()
driver.find_element_by_name("password").send_keys('your_password')  # enter your password
driver.find_element_by_class_name("icon_checkbox").click()
time.sleep(2)
driver.find_element_by_class_name("btn_login").click()
time.sleep(15)  # leave time to confirm the login (QR scan) before reading cookies

cookies = driver.get_cookies()
print(cookies)
# Flatten Selenium's list of cookie dicts into a plain name -> value mapping
cookie = {}
for item in cookies:
    cookie[item.get('name')] = item.get('value')
with open('cookies.txt', 'w') as file:
    file.write(json.dumps(cookie))
Dividing line: the code above obtains the login cookies.
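One practical note before the scraping half: the mp.weixin.qq.com login cookies expire after a while, and when they do, the homepage redirect no longer carries a token. A small sanity check saves a confusing crash later; this is a sketch built on the same redirect trick that find_token uses below, and the no-token-on-expiry behavior is an observation, not a documented contract:

# Minimal cookie sanity check; assumes an expired session redirects to a
# URL without a token parameter (observed behavior, not a documented API).
import json, re, requests

def cookies_still_valid(path='cookies.txt'):
    with open(path) as f:
        cookies = json.loads(f.read())
    resp = requests.get('https://mp.weixin.qq.com', cookies=cookies)
    # A live session lands on a URL like .../cgi-bin/home?...&token=1234567890
    return bool(re.search(r'token=\d+', str(resp.url)))

if not cookies_still_valid():
    print('cookies expired, re-run the Selenium login script above')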
import time, json, re, requests
from bs4 import BeautifulSoup

def find_cookies():
    # Load the cookie dict saved by the Selenium login script above
    with open("cookies.txt", "r") as file:
        cookie = file.read()
    cookies = json.loads(cookie)
    return cookies

def find_token(cookies):
    # With valid cookies, the homepage redirects to a URL carrying the session token
    url = "https://mp.weixin.qq.com"
    response = requests.get(url, cookies=cookies)
    token = re.findall(r'token=(\d+)', str(response.url))[0]
    return token

def find_account(token, cookies):
    # Resolve each target account name to its fakeid via the searchbiz endpoint
    headers = {
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36',
        "Referer": "https://mp.weixin.qq.com/cgi-bin/appmsg?t=media/appmsg_edit_v2&action=edit&isNew=1&type=10&token=" + token + "&lang=zh_CN",
        "Host": "mp.weixin.qq.com"
    }
    requests_url = 'https://mp.weixin.qq.com/cgi-bin/searchbiz'
    authors_list = ['占豪']  # add more account names here
    authorsnumberlist = []
    for author in authors_list:
        paras_author = {
            'action': 'search_biz',
            'begin': '0',
            'count': '5',
            'query': author,
            'token': token,
            'lang': 'zh_CN',
            'f': 'json',
            'ajax': '1'
        }
        res_choose = requests.get(requests_url, params=paras_author, cookies=cookies, headers=headers)
        json_choose = res_choose.json()
        names = json_choose['list']
        for name in names:
            author_name = name['nickname']
            if author_name == author:
                fakeid_number = name['fakeid']
                authorsnumberlist.append(fakeid_number)
        time.sleep(20)  # pause between lookups to soften the request rate
        print(author)
    return authorsnumberlist

def get_time(time_sj):
    # Turn a 'YYYY-MM-DD' string into the (start, end) Unix timestamps of that day
    data_sj = time.strptime(time_sj, "%Y-%m-%d")
    timeStamp = int(time.mktime(data_sj))
    return (timeStamp, timeStamp + 86400)

def find_article(fakeid_list, token, cookies, acquire_time):
    # List each account's recent posts and keep links published inside the time window
    headers = {
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36',
        "Referer": "https://mp.weixin.qq.com/cgi-bin/appmsg?t=media/appmsg_edit_v2&action=edit&isNew=1&type=10&token=" + token + "&lang=zh_CN",
        "Host": "mp.weixin.qq.com"
    }
    links = []
    for each_number in fakeid_list:
        params = {
            'action': 'list_ex',
            'begin': '0',
            'count': '5',
            'fakeid': str(each_number),
            'type': '9',
            'query': '',
            'token': token,
            'lang': 'zh_CN',
            'f': 'json',
            'ajax': '1'
        }
        account_url = 'https://mp.weixin.qq.com/cgi-bin/appmsg'
        res_account = requests.get(account_url, params=params, cookies=cookies, headers=headers)
        json_account = res_account.json()
        papers = json_account['app_msg_list']
        for each_paper in papers:
            create_time = each_paper['create_time']  # renamed so it no longer shadows the time module
            if acquire_time[0] < create_time < acquire_time[1]:
                links.append(each_paper['link'])
    return links

def findandstore_txt(links):
    # Fetch every article page and append its text to a txt file
    with open('爬文章.txt', 'a', encoding='utf-8') as file:
        for link in links:
            res = requests.get(link)
            soup = BeautifulSoup(res.text, 'html.parser')
            articlene = soup.find('div', id='img-content')
            content = articlene.text
            file.write(str(content))
            file.write('\n')
            file.write('-------------------------------------------------')

cookies = find_cookies()
token = find_token(cookies)
authorsnumberlist = find_account(token, cookies)
# your_time = input('Enter the date to scrape, e.g. 2020-08-11: ')
your_time = '2020-11-5'
acquire_time = get_time(your_time)
links = find_article(authorsnumberlist, token, cookies, acquire_time)
print(len(links))
try:
    findandstore_txt(links)
except UnicodeEncodeError:
    pass
This scrapes the articles and saves them in txt format.
Once I add more official accounts to the list and scrape more heavily, the account gets blocked. Could someone help flesh out the anti-scraping countermeasures, or PM me a solution? Thanks.
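Not a full answer, but one obvious gap above is that requests are fired back to back with no backoff. Below is a minimal throttling sketch under stated assumptions: the appmsg/searchbiz endpoints are commonly reported to signal throttling with a JSON body like {"base_resp": {"ret": 200013, "err_msg": "freq control"}}, but that field layout and return code are assumptions to verify against the real response, and polite_get is a hypothetical helper, not part of any WeChat API:

# Jittered delays plus exponential backoff; a sketch, not a guaranteed unban.
import time, random, requests

def polite_get(url, max_retries=3, **kwargs):
    for attempt in range(max_retries):
        time.sleep(random.uniform(10, 30))  # random pause before every request
        res = requests.get(url, **kwargs)
        data = res.json()
        # Assumption: throttling shows up as a nonzero base_resp.ret
        # (e.g. 200013 "freq control"); confirm before relying on it.
        ret = data.get('base_resp', {}).get('ret', 0)
        if ret == 0:
            return data
        wait = 60 * (2 ** attempt)  # back off for minutes, not seconds
        print('rate limited (ret=%s), sleeping %ss' % (ret, wait))
        time.sleep(wait)
    raise RuntimeError('still rate limited after %d retries' % max_retries)

With this in place, find_account and find_article would call, for example, json_choose = polite_get(requests_url, params=paras_author, cookies=cookies, headers=headers) instead of requests.get(...).json(). Beyond that, keeping 'count' small, scraping only a few accounts per session, and spreading runs across the day should reduce exposure, though nothing is guaranteed since the limits themselves are undocumented.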
Reference
https://blog.csdn.net/weixin_41267342/article/details/96729138
Done by following the article by 村西那條彎彎的河流.
If anything here infringes, please PM me.