Web Spider

A collection of web spiders for Reddit and Experience Project.

There are two versions of the Reddit spider: one built on Scrapy and one built on the Reddit API.

Before running the spiders, create the folders that the scripts expect and modify the path names in the scripts to match your environment.
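As a rough illustration of the setup step above, the folders can be created from Python before a spider runs. This is only a sketch: the helper name `ensure_output_dirs` and the folder names in the usage note are hypothetical, and the actual folders and paths are the ones hard-coded in each script.

```python
from pathlib import Path


def ensure_output_dirs(base_dir, names):
    """Create each named folder under base_dir if it does not exist yet.

    Returns the list of created (or already existing) directory paths.
    """
    created = []
    for name in names:
        path = Path(base_dir) / name
        # parents=True creates missing parent folders as well;
        # exist_ok=True makes repeated runs harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, `ensure_output_dirs("output", ["posts", "comments"])` would create `output/posts` and `output/comments` (hypothetical names) relative to the current working directory.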

Reddit API

There are three scripts in the redditapi folder:

(1) Crawl the posts (titles and text bodies) of a subreddit.

(2) Crawl the posts and their comments.

(3) Get the sequences of comments.
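The three tasks above can be sketched roughly as follows. This is not the repository's code: it assumes Reddit's public JSON listing endpoint (`/r/<subreddit>/new.json`), which may be rate limited or require authentication, and it interprets "sequences of comments" as root-to-leaf reply chains, which is an assumption. The function names and the `USER_AGENT` string are hypothetical.

```python
USER_AGENT = "webspider-example/0.1"  # hypothetical identifier for this sketch


def fetch_listing(subreddit, limit=25):
    """Download the newest posts of a subreddit as a Reddit 'Listing' dict."""
    import requests  # imported here so the parsing helpers work offline

    url = "https://www.reddit.com/r/{}/new.json".format(subreddit)
    response = requests.get(
        url,
        params={"limit": limit},
        headers={"User-Agent": USER_AGENT},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


def extract_posts(listing):
    """Pull the title and text body out of every post in a Listing."""
    posts = []
    for child in listing.get("data", {}).get("children", []):
        data = child.get("data", {})
        posts.append({"title": data.get("title", ""),
                      "body": data.get("selftext", "")})
    return posts


def comment_sequences(children, prefix=()):
    """Return every root-to-leaf chain of comment bodies in a comment tree."""
    sequences = []
    for child in children:
        if child.get("kind") != "t1":  # skip non-comment entries such as "more"
            continue
        data = child["data"]
        path = prefix + (data.get("body", ""),)
        replies = data.get("replies")
        # Reddit sends an empty string instead of a Listing when there are no replies.
        nested = replies["data"]["children"] if isinstance(replies, dict) else []
        deeper = comment_sequences(nested, path)
        sequences.extend(deeper if deeper else [list(path)])
    return sequences
```

A typical call would be `extract_posts(fetch_listing("learnpython"))`, where the subreddit name is only an example.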

Reddit Scrapy

Scrapy is used to crawl posts in one script and comments in another. XPath and CSS selectors are used to match specific information. Run the code:

python scrapy_reddit.py

Experience Project

Scrapy is used to crawl the posts from Experience Project. XPath and CSS selectors are used to match specific information. Run the code:

python scrapy_ep.py

Requirements

Python 3.6

scrapy 1.3.3

requests 2.18.4


shaoxiongji/webspider
