code4craft / webmagic Star 11.3k A scalable web crawler framework for Java. java crawler framework scraping Updated Jun 3, 2024 Java
ssssssss-team / spider-flow Star 9.2k A next-generation crawler platform that defines crawl flows graphically, so crawling can be done without writing code. crawler spider web-crawler jsoup xpath webcrawler webspider web-spider spider-flow Updated Jun 14, 2023 Java
xtuhcy / gecco Star 2.5k Easy-to-use lightweight web crawler. java crawler dynamic jsoup gecco fastjson Updated Feb 22, 2024 Java
CatVodTVOfficial / CatVodTVSpider Star 1.8k player crawler spider tv catvod maotv Updated Jun 10, 2022 Java
dadoonet / fscrawler Star 1.3k Elasticsearch File System Crawler (FS Crawler) java elasticsearch crawler tika Updated Jun 13, 2024 Java
TeamNewPipe / NewPipeExtractor Star 1.1k NewPipe's core library for extracting data from streaming sites crawler scraper youtube extractor soundcloud bandcamp newpipe peertube mediaccc Updated May 24, 2024 Java
codelibs / fess Star 971 Fess is a very powerful and easily deployable Enterprise Search Server. search java search-engine elasticsearch crawler full-text-search lucene fulltext-search enterprise-search Updated May 30, 2024 Java
wycm / zhihu-crawler Star 912 zhihu-crawler is a high-performance, distributed crawler project based on Java, with support for a free HTTP proxy pool and horizontal scaling. java crawler spider zhihu Updated Apr 2, 2019 Java
xuxueli / xxl-crawler Star 684 A distributed web crawler framework (XXL-CRAWLER). java crawler web spider flexible distributed object-oriented xxl-crawler Updated Mar 23, 2023 Java
fanyong920 / jvppeteer Star 668 Headless Chrome for Java (Java crawler) java crawler chrome scraper chrome-headless puppeteer jvppeteer Updated Jun 9, 2024 Java
fengzhizi715 / NetDiscovery Star 639 NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks. kotlin redis middleware crawler kafka spider dsl coroutines selenium rxjava2 lettuce disruptor htmlunit vertx3 Updated Nov 28, 2020 Java
crawljax / crawljax Star 507 Crawljax javascript crawler dom dynamic crawling test-generation web-testing web-analysis event-driven-crawling Updated Sep 18, 2023 Java
jaeksoft / opensearchserver Star 499 Open-source Enterprise Grade Search Engine Software search java search-engine enterprise crawler ocr indexing synonyms lucene webcrawler custom-search webcrawling opensearchserver Updated Sep 3, 2022 Java
smuyyh / CrawlerForReader Star 389 An on-device web novel crawler for Android, based on jsoup and XPath. android crawler jsoup xpath bookreader Updated Sep 2, 2020 Java
commoncrawl / news-crawl Star 304 News crawling with StormCrawler - stores content as WARC crawler news web-crawler apache-storm warc commoncrawl common-crawl storm-crawler Updated Dec 13, 2023 Java
yAnXImIN / weiboPicDownloader Star 264 A crawler that downloads Weibo images without logging in. java crawler weibo Updated May 20, 2022 Java
tim232385 / WebVideoBot Star 224 Web crawler. crawler spider pornhub Updated Dec 1, 2019 Java
codesofun / web-bee Star 186 Web vertical crawler framework for fun java crawler framework java-8 webbee Updated Dec 16, 2023 Java
Norconex / crawlers Star 175 Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines. java search-engine crawler flexible web-crawler crawlers filesystem-crawler collector-http collector-fs Updated Jun 13, 2024 Java
xjtushilei / ScriptSpider Star 163 A componentized, distributed, general-purpose crawler framework written in Java. java redis distributed-systems crawler spider thread-pool Updated Dec 5, 2023 Java