Crawling Bilibili's Anime Index with Python
Tags: anime, python, video
2025-03-13 21:27:15
With some idle time on my hands, I crawled my favorite site, Bilibili. Start from Bilibili's bangumi (anime) index page. PS: I used to browse this index page all the time to find anime to watch, so the workflow is familiar.
Paging through the index, the URL never changes. The Network panel in Chrome DevTools shows that each page fires an XHR request that returns a JSON response.
Paste the response into Atom to see what the data looks like.
To handle pagination, look for a pattern in the query string: of all those parameters, only page changes between requests.
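Quicker than an editor round trip, you can also hit the endpoint directly with requests and print one entry (a sketch; same URL as captured in DevTools, with page=1):

import requests

# The XHR endpoint captured in DevTools; only the 'page' parameter varies
url = 'https://bangumi.bilibili.com/media/web_api/search/result?season_version=-1&area=-1&is_finish=-1&copyright=-1&season_status=-1&season_month=-1&pub_date=-1&style_id=-1&order=3&st=1&sort=0&page=1&season_type=1&pagesize=20'
data = requests.get(url).json()
print(data['result']['data'][0])  # one bangumi entry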
So the rest is straightforward. items.py:
import scrapy
from scrapy import Field

class BilibiliItem(scrapy.Item):
    title = Field()
    cover = Field()
    sum_index = Field()
    is_finish = Field()
    link = Field()
    follow = Field()
    plays = Field()
    score = Field()
    _id = Field()
The spider, spiders/bzhan.py:

import scrapy
import demjson  # third-party: pip install demjson
from bilibili.items import BilibiliItem

class BzhanSpider(scrapy.Spider):
    name = 'bzhan'
    allowed_domains = ['bilibili.com']
    start_urls = ['https://bangumi.bilibili.com/media/web_api/search/result?season_version=-1&area=-1&is_finish=-1&copyright=-1&season_status=-1&season_month=-1&pub_date=-1&style_id=-1&order=3&st=1&sort=0&page=1&season_type=1&pagesize=20']

    def parse(self, response):
        json_content = demjson.decode(response.text)  # decode the JSON text, not raw bytes
        datas = json_content["result"]["data"]
        for data in datas:
            item = BilibiliItem()  # a fresh item per entry
            cover = data['cover']
            sum_index = data['index_show']
            is_finish = data['is_finish']
            is_finish = '已完结' if is_finish == 1 else '未完结'  # finished / still airing
            link = data['link']
            follow = data['order']['follow']
            plays = data['order']['play']
            try:
                score = data['order']['score']
            except KeyError:  # some entries have no score yet
                score = '未知'  # 'unknown'
            title = data['title']
            item['_id'] = title  # the title doubles as the MongoDB primary key
            item['cover'] = cover
            item['sum_index'] = sum_index
            item['is_finish'] = is_finish
            item['link'] = link
            item['follow'] = follow
            item['plays'] = plays
            item['score'] = score
            item['title'] = title
            yield item
        # Queue pages 2-155; Scrapy's dupefilter drops the repeats this
        # loop re-generates on every callback
        urls = ['https://bangumi.bilibili.com/media/web_api/search/result?season_version=-1&area=-1&is_finish=-1&copyright=-1&season_status=-1&season_month=-1&pub_date=-1&style_id=-1&order=3&st=1&sort=0&page={0}&season_type=1&pagesize=20'.format(k) for k in range(2, 156)]
        for url in urls:
            yield scrapy.Request(url, callback=self.parse)
Parsing is just walking the decoded JSON as plain Python dicts. Nothing difficult.
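For reference, each element of result.data is shaped roughly like this (the values are illustrative, reconstructed from the keys the spider reads):

# One entry of json_content['result']['data'] (illustrative values)
entry = {
    'title': '...',
    'cover': 'https://i0.hdslb.com/...',   # cover image URL
    'index_show': '全26话',                 # episode-count text
    'is_finish': 1,                         # 1 = finished, 0 = still airing
    'link': 'https://www.bilibili.com/bangumi/media/md...',
    'order': {'follow': 123456, 'play': 9876543, 'score': 9.5},  # 'score' may be absent
}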
pipelines.py:

import pymongo

class BilibiliPipeline(object):
    def open_spider(self, spider):
        # Connect once when the spider starts, not once per item
        self.client = pymongo.MongoClient('localhost', 27017)
        self.collection = self.client['mydb']['bilibili']

    def process_item(self, item, spider):
        # Hand pymongo a plain dict; _id is the title, so duplicates raise DuplicateKeyError
        self.collection.insert_one(dict(item))
        print(item)
        return item
settings.py omitted.
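For completeness, a minimal sketch of the settings the pipeline needs (assuming the Scrapy project is named bilibili, as the imports above suggest):

# settings.py (minimal sketch)
BOT_NAME = 'bilibili'
SPIDER_MODULES = ['bilibili.spiders']
NEWSPIDER_MODULE = 'bilibili.spiders'

# Route yielded items into the MongoDB pipeline
ITEM_PIPELINES = {
    'bilibili.pipelines.BilibiliPipeline': 300,
}

# May be needed if robots.txt blocks the API endpoint
ROBOTSTXT_OBEY = False
# Be gentle with the API
DOWNLOAD_DELAY = 0.5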
The crawl ends up with 3,000-odd records.
A moment of sympathy for my dear Bilibili...
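To sanity-check the run, count what actually landed in MongoDB (the same local instance the pipeline writes to):

import pymongo

client = pymongo.MongoClient('localhost', 27017)
print(client['mydb']['bilibili'].count_documents({}))  # should report 3000+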