scrapy / scrapy Star 51.4k Scrapy, a fast high-level web crawling & scraping framework for Python. python crawler framework scraping crawling web-scraping hacktoberfest web-scraping-python Updated Jun 12, 2024 Python
jhao104 / proxy_pool Star 20.5k Python ProxyPool for web spider redis http crawler spider proxy Updated Feb 4, 2024 Python
binux / pyspider Star 16.4k A Powerful Spider(Web Crawler) System in Python. python crawler Updated Apr 30, 2024 Python
codelucas / newspaper Star 13.8k newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs: python crawler scraper news crawling news-aggregator Updated Apr 3, 2024 Python
shengqiangzhang / examples-of-web-crawlers Star 13.6k Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, WeRead, Douban, and QQ. python crawler spider example selenium multithreading stock wechat taobao pyquery tmall fund agent-pool wechat-report wereader Updated Dec 25, 2023 Python
s0md3v / Photon Star 10.6k Incredibly fast crawler designed for OSINT. python crawler osint spider information-gathering Updated Jan 4, 2024 Python
injetlee / Python Star 9.3k Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat official accounts, remote power-on. python crawler excel wechat Updated Oct 10, 2023 Python
Evil0ctal / Douyin_TikTok_Download_API Star 7.5k Douyin_TikTok_Download_API is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili, supporting API calls and online batch parsing and downloading. python api crawler scraper spider async web-scraping asyncio asgi douyin tiktok fastapi tiktok-scraper httpx pywebio no-watermark online-parsing douyin-tiktok-api douyin-tiktok-download douyin-scraper Updated Jun 6, 2024 Python
alirezamika / autoscraper Star 6k A Smart, Automatic, Fast and Lightweight Web Scraper for Python python crawler machine-learning scraper automation ai scraping artificial-intelligence web-scraping scrape webscraping webautomation Updated Apr 30, 2024 Python
chyroc / WechatSogou Star 5.8k A WeChat official account crawler interface based on Sogou WeChat search. python crawler pypi scrapy wechat sogou Updated Nov 15, 2023 Python
rmax / scrapy-redis Star 5.5k Redis-based components for Scrapy. redis crawler distributed scrapy Updated May 20, 2024 Python
SpiderClub / haipproxy Star 5.4k Highly available distributed IP proxy pool, powered by Scrapy and Redis redis crawler spider scheduler distributed scrapy high-availability ipproxy Updated Dec 26, 2022 Python
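DropsDevopsOrg / ECommerceCrawlers Star 4.5k Hands-on crawlers for many websites and e-commerce data. Includes: Taobao products, WeChat official accounts, Dianping, Qichacha, recruitment sites, Xianyu, Ali tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movie, Baotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, machine-learning text collection, FOFA asset collection, Autohome, the National Bureau of Statistics, Baidu keyword index counts, spider pan-directory, Toutiao, Douban film reviews, Ctrip, Xiaomi App Store, Anjuke, and Tujia homestays. WeChat crawler showcase project: crawler python3 boss scrapy wechat baidu lagou douban-movie baidu-tieba xianyu douban-music ctrip zhilianzhaopin sohu taobao-spider fofa dazhong-spider alitask baotu quanjing Updated May 22, 2024 Python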
madawei2699 / myGPTReader Star 4.4k A community-driven way to read and chat with AI bots - powered by chatGPT. crawler scraper ai prompt openai reader slack-bot embedding daily-news hot-news chatgpt gpt-35-turbo Updated Apr 25, 2024 Python
imWildCat / scylla Star 3.9k Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era python crawler scylla python3 proxy-pool Updated Jun 6, 2024 Python
constverum / ProxyBroker Star 3.8k Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS crawler privacy proxy proxy-server http-proxy socks proxies anonymity anonymous proxypool proxy-list proxy-checker Updated Mar 18, 2024 Python
elliotgao2 / toapi Star 3.5k Every web site provides APIs. python html api flask json crawler web spider toapi Updated Jul 5, 2022 Python
dataabc / weibo-crawler Star 3.2k Sina Weibo crawler: uses Python to scrape Sina Weibo data and download Weibo images and videos. crawler weibo weibo-spider Updated May 22, 2024 Python
adbar / trafilatura Star 3.1k Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments nlp crawler text-mining news html-to-markdown scraping corpus news-aggregator text-extraction web-scraping rss-feed readability tei html2text news-crawler corpus-builder corpus-tools article-extractor text-cleaning text-preprocessing Updated Jun 12, 2024 Python
wkunzhi / Python3-Spider Star 2.9k Hands-on Python crawlers: simulated logins for major websites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star. python crawler spider selenium crawl scrapy splash geek taobao scrapy-crawler meituan dianping pyppeteer Updated Nov 3, 2023 Python