132 Final Project

Topic

Student reading carnival

Goal

All-purpose search over books

Team Members

Linxuan Yang
Ye Hong
Chenfeng Fan
Limian Guo

Dependencies

Python 3, requests, Scrapy, and SQLite

Build Instructions

$ pip3 install requests
$ pip3 install Scrapy

Run Instructions

Get a JSON file of book titles and their categories from Goodreads:
$ python goodreads/goodreads_title_crawler.py

Get a JSON file of literature data from SparkNotes:

$ cd sparknotes
$ scrapy crawl titles    # get all book titles and links from the Literature tab
$ scrapy crawl details   # get all data for each book

Sample stats that Scrapy dumps at the end of a crawl:

    {'downloader/exception_count': 22,
     'downloader/exception_type_count/twisted.internet.error.NoRouteError': 16,
     'downloader/exception_type_count/twisted.internet.error.TimeoutError': 6,
     'downloader/request_bytes': 5375772,
     'downloader/request_count': 5985,
     'downloader/request_method_count/GET': 5985,
     'downloader/response_bytes': 48833687,
     'downloader/response_count': 5963,
     'downloader/response_status_count/200': 5667,
     'downloader/response_status_count/404': 296,
     'dupefilter/filtered': 9,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2019, 5, 9, 20, 52, 48, 906551),
     'item_scraped_count': 579,
     'log_count/DEBUG': 6565,
     'log_count/INFO': 396,
     'log_count/WARNING': 1,
     'memusage/max': 78176256,
     'memusage/startup': 50728960,
     'request_depth_max': 12,
     'response_received_count': 5963,
     'retry/count': 22,
     'retry/reason_count/twisted.internet.error.NoRouteError': 16,
     'retry/reason_count/twisted.internet.error.TimeoutError': 6,
     'robotstxt/request_count': 1,
     'robotstxt/response_count': 1,
     'robotstxt/response_status_count/200': 1,
     'scheduler/dequeued': 5984,
     'scheduler/dequeued/memory': 5984,
     'scheduler/enqueued': 5984,
     'scheduler/enqueued/memory': 5984,
     'start_time': datetime.datetime(2019, 5, 9, 5, 29, 9, 874055)}
    2019-05-09 16:52:48 [scrapy.core.engine] INFO: Spider closed (finished)

Data

What we achieve
Methods

Index

What we achieve
Methods
1. Clean and organize the raw data from SparkNotes with index.py
2. For each book, associate each key/field with a string of related text
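Storing each book as a set of fields, each holding related text, fits naturally into a positional inverted index. The sketch below is an illustration, not the project's index.py. It assumes each book is a dict from field name to text and uses a simple lowercase regex tokenizer. The book IDs, field names, and sample texts are all made up.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase `text` and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(books):
    """Build a positional inverted index.

    `books` maps a book ID to a dict of {field name: text}. The result maps
    each term to {(book_id, field): [token positions]}, so phrase queries can
    later check that terms appear next to each other.
    """
    index = defaultdict(lambda: defaultdict(list))
    for book_id, fields in books.items():
        for field, text in fields.items():
            for position, term in enumerate(tokenize(text)):
                index[term][(book_id, field)].append(position)
    # Convert to plain dicts so looking up an unknown term does not insert it.
    return {term: dict(postings) for term, postings in index.items()}


# Hypothetical sample records; real field names come from the SparkNotes data.
books = {
    "b1": {"title": "Animal Farm", "summary": "Farm animals overthrow the farmer."},
    "b2": {"title": "Nineteen Eighty-Four", "summary": "Big Brother is watching."},
}
index = build_index(books)
print(index["farm"])  # {('b1', 'title'): [1], ('b1', 'summary'): [0]}
```

Keying postings by (book, field) keeps field-level matches apart, so a title hit could later be weighted differently from a summary hit.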

Query

What we achieve
Methods
1. Words in quotes ("..."): phrase search
2. -word: difference search (exclude results containing the word)
3. +word: conjunctive search (require the word in every result)
4. "More like this" button
5. Input autocompletion
To run: $ python3 query.py
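The following is a minimal sketch of the query operators listed above. It assumes Lucene-like semantics that this README does not spell out: a quoted phrase and a +word must match, a -word must not, and when there are no required parts a result needs at least one plain word. It searches a made-up dict of document texts rather than the project's index, and it leaves out ranking, "More like this", and autocompletion.

```python
import re

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")
# Either a "quoted phrase" or a bare word with an optional leading + or -.
QUERY_RE = re.compile(r'"([^"]+)"|([+-]?)(\S+)')


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def parse_query(query):
    """Split a query string into phrases, required, excluded and plain terms."""
    parsed = {"phrases": [], "required": [], "excluded": [], "plain": []}
    for phrase, sign, word in QUERY_RE.findall(query):
        if phrase:
            tokens = tokenize(phrase)
            if tokens:
                parsed["phrases"].append(tokens)
            continue
        kind = {"+": "required", "-": "excluded"}.get(sign, "plain")
        parsed[kind].extend(tokenize(word))
    return parsed


def contains_phrase(tokens, phrase):
    """True if `phrase` (a token list) occurs as consecutive tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def search(docs, query):
    """Return the IDs of docs (ID -> text) that satisfy the query."""
    q = parse_query(query)
    if not (q["phrases"] or q["required"] or q["plain"]):
        return []  # empty or negation-only query: nothing positive to match
    must_match = q["required"] or q["phrases"]
    results = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        terms = set(tokens)
        if any(t in terms for t in q["excluded"]):
            continue  # -word: difference search
        if not all(t in terms for t in q["required"]):
            continue  # +word: conjunctive search
        if not all(contains_phrase(tokens, p) for p in q["phrases"]):
            continue  # "...": phrase search
        if not must_match and not any(t in terms for t in q["plain"]):
            continue  # only plain words: at least one must appear
        results.append(doc_id)
    return results


# Made-up sample documents.
docs = {
    "b1": "Animal Farm: farm animals overthrow the farmer",
    "b2": "Big Brother is watching in Nineteen Eighty-Four",
    "b3": "The animal kingdom and the old farm house",
}
print(search(docs, '"animal farm"'))     # ['b1']
print(search(docs, "farm -farmer"))      # ['b3']
print(search(docs, "+animal +kingdom"))  # ['b3']
```

Phrase matching here rescans each document's tokens. A positional index, like the one sketched in the Index section, would let the same check run on postings lists instead.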

Database

You need a working SQLite command-line tool (sqlite3) on your computer to access our database.
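If you prefer to inspect the database from Python rather than from the sqlite3 shell, the standard-library sqlite3 module works too. This is a sketch under assumptions: the file name books.db is hypothetical, since this README does not name the database file, and the helpers only list tables without assuming any particular schema.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open an existing SQLite file read-only.

    A `file:` URI with mode=ro makes a mistyped path raise
    sqlite3.OperationalError instead of silently creating an empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn):
    """Return the names of all tables in the connected database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Example with a hypothetical file name:
# conn = open_readonly("books.db")
# print(list_tables(conn))
# conn.close()
```

The command-line equivalent, with the same hypothetical file name, is:

$ sqlite3 books.db ".tables"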

UI

We use Bootstrap to build a progressive, single-page web application.
Methods

chauncyf/132FinalProject
