

shaoxiongji/webspider: Web spider for Reddit and Experience Project
This repository has been archived by the owner on Jun 8, 2020. It is now read-only.

Web Spider

Web spiders for two sites: Reddit and Experience Project.

There are two versions of the Reddit spider: one built on Scrapy and one using the Reddit API.

Before running these spiders, create the folders the scripts write to and modify the path names in the scripts to match your setup.
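The README does not name the folders or paths involved. As a minimal sketch, assuming one base directory (the hypothetical `OUTPUT_DIR`) with one subfolder per kind of output (the hypothetical names `posts` and `comments`), the setup step could be kept in one place like this:

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical base directory: set this to wherever the crawled data should go.
OUTPUT_DIR = Path("data")

# Hypothetical subfolder names, one per kind of crawled output.
SUBFOLDERS = ("posts", "comments")


def prepare_output_dirs(base: Path = OUTPUT_DIR,
                        names: Iterable[str] = SUBFOLDERS) -> List[Path]:
    """Create base/<name> for every name; existing folders are left untouched."""
    created = []
    for name in names:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling `prepare_output_dirs()` once before a crawl means only `OUTPUT_DIR` has to change when the data should live somewhere else.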

Reddit API

The redditapi folder contains three scripts, which:

(1) crawl the posts (titles and text bodies) of a subreddit;

(2) crawl both posts and comments;

(3) extract sequences of comments.
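The scripts themselves are not reproduced here. As a hedged sketch of what (1) and (3) involve, assuming Reddit's public JSON listings (`https://www.reddit.com/r/<subreddit>/new.json` for posts, `https://www.reddit.com/r/<subreddit>/comments/<post_id>.json` for a post plus its comment tree) rather than whatever client the original scripts use, and reading "sequences of comments" as root-to-leaf reply chains (also an assumption), the parsing could look like this. `USER_AGENT` and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Tuple

# Reddit asks clients to send a descriptive User-Agent; this value is a placeholder.
USER_AGENT = "webspider-sketch/0.1 (hypothetical)"


def fetch_json(url: str):
    """Download and decode one JSON document (network access required)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def parse_posts(listing: Dict) -> List[Tuple[str, str]]:
    """(1) Extract (title, selftext) pairs from a subreddit listing."""
    return [
        (child["data"]["title"], child["data"].get("selftext", ""))
        for child in listing["data"]["children"]
        if child["kind"] == "t3"  # t3 = a post (link or self post)
    ]


def comment_sequences(listing: Dict,
                      prefix: Tuple[str, ...] = ()) -> List[List[str]]:
    """(3) Flatten a comment tree into root-to-leaf chains of comment bodies."""
    sequences = []
    for child in listing["data"]["children"]:
        if child["kind"] != "t1":  # skip "more" stubs for collapsed replies
            continue
        path = prefix + (child["data"]["body"],)
        replies = child["data"].get("replies")
        # Reddit sends an empty string, not an empty listing, when there are no replies.
        deeper = comment_sequences(replies, path) if replies else []
        # A comment with no (loaded) replies ends a chain.
        sequences.extend(deeper or [list(path)])
    return sequences
```

The comments endpoint returns a two-element list (the post listing, then the comment listing), so `comment_sequences` would be applied to the second element. Unauthenticated requests may be rate-limited or refused, which is one reason to keep the network call separate from the parsing.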

Reddit Scrapy

Scrapy is used to crawl posts and comments, each in its own script. XPath and CSS selectors are used to extract the specific fields. Run the code:

python scrapy_reddit.py

Experience Project

Scrapy is used to crawl posts from Experience Project; a single script uses XPath and CSS selectors to match the specific fields. Run the code:

python scrapy_ep.py

Requirements

Python 3.6

Scrapy 1.3.3

requests 2.18.4
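Assuming pip is used (the README does not say how dependencies are installed), the two library versions could be pinned in a hypothetical requirements.txt and installed with `pip install -r requirements.txt`. A requirements file does not install Python itself, so the 3.6 interpreter has to come from the environment, for example a Python 3.6 virtual environment.

```text
scrapy==1.3.3
requests==2.18.4
```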
