WebXplore offers a multitude of tools for web scraping, crawling, and performing computations on scraped information to determine sentiment values or the tone of the author.
This package retrieves information from the following sources:
- Google Search: Get links for any Google search query.
- Website Text: Use an intelligent parser to strip all HTML tags from a webpage's contents.
- Twitter: Given a word or phrase, get related tweets.
- Reddit: Get the hottest posts in a subreddit that match a key phrase.
- NewsAPI: Retrieve news articles on a given topic or phrase.
Install the package from PyPI:
$ pip install webxplore
or clone the repository:
$ git clone https://github.com/arnavn101/WebXplore.git
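You can confirm that the install worked with a standard-library check like the following. `is_installed` is a hypothetical helper and is not part of WebXplore.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


print(is_installed("webxplore"))  # True once `pip install webxplore` has succeeded
```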
The following examples show how to use each WebXplore module.
# Google Search: get links for a query (the second argument is presumably the number of links)
from webxplore.web_searcher import SearchWeb
search_query = SearchWeb('Artificial Intelligence', 5)
print(search_query.returnListLinks())
# Website Text: extract the article text from a webpage
from webxplore.web_scraper import ScrapeWebsite
scrape_query = ScrapeWebsite('https://en.wikipedia.org/wiki/Artificial_intelligence')
print(scrape_query.return_article())
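The Website Text feature relies on a parser that strips HTML tags, but this README does not describe how WebXplore's parser works. The following is only a minimal sketch of the general idea, built on Python's built-in `html.parser`. It assumes you want the visible text only and should skip `<script>` and `<style>` contents. `TextExtractor` and `strip_html` are hypothetical names and are not WebXplore's implementation.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script>/<style>
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


print(strip_html("<html><head><style>p {color: red}</style></head>"
                 "<body><h1>AI</h1><p>Machines that <b>learn</b> from data.</p>"
                 "<script>var x = 1;</script></body></html>"))
```

A production extractor would also need to handle boilerplate such as navigation menus and footers, which this sketch keeps.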
# Sentiment: compute a sentiment value for a piece of text
from webxplore.utils.sentiment import RetrieveSentiments
sentiment_analyzer = RetrieveSentiments('This is a good situation.')
print(sentiment_analyzer.returnFinalSentiment())
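This README does not say how `RetrieveSentiments` scores text. To show what a numeric sentiment value can mean, here is a deliberately simple lexicon-based sketch. This is a swapped-in technique, not WebXplore's method. The word lists and the `lexicon_sentiment` function are hypothetical.

```python
import re

# Hypothetical, tiny word lists for illustration only
POSITIVE = {"good", "great", "happy", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "sad", "awful", "hate", "scared"}


def lexicon_sentiment(text: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total opinion-word hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no opinion words found: treat as neutral
    return (pos - neg) / (pos + neg)


print(lexicon_sentiment("This is a good situation."))  # 1.0
```

Real analyzers also handle negation ("not good"), intensifiers, and much larger vocabularies. This sketch does none of that.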
# Summarization: the second argument is presumably the number of sentences to keep
from webxplore.utils.summarizer import SummarizeText
textSummarizer = SummarizeText('He feels very scared. He wants to protect himself.', 1)
print(textSummarizer.returnFinalSummary())
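The summarizer's algorithm is not stated here either. A common baseline is frequency-based extractive summarization:

1. Score each sentence by how often its non-stop words occur in the whole text.
2. Keep the top N sentences in their original order.

The sketch below assumes that the second argument means "number of sentences", as it appears to in `SummarizeText`. `summarize` and `STOP_WORDS` are hypothetical names, and this is not necessarily how WebXplore summarizes.

```python
import re
from collections import Counter

# Hypothetical, very small stop-word list; real summarizers use much larger ones
STOP_WORDS = {"a", "an", "and", "the", "he", "she", "it", "is", "very", "to", "of", "himself"}


def summarize(text: str, num_sentences: int) -> str:
    """Frequency-based extractive summary: top-scoring sentences, in original order."""
    # Naive sentence split after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count how often each non-stop word appears in the whole text
    freq = Counter(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS)

    def score(sentence: str) -> int:
        # Stop words contribute 0 because they were never counted
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # sorted() is stable even with reverse=True, so ties keep the earlier sentence
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


print(summarize("He feels very scared. He wants to protect himself.", 1))
```

Summing word frequencies favors longer sentences. Dividing each score by the sentence's word count is a common tweak.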
# Tone analysis (takes a Watson API key)
from webxplore.utils.analyzer import ToneAnalysis
textTone = ToneAnalysis('Laugh and the world laughs with you. ' +
                        'Weep and you weep alone.', "watsonApiKey")
print(textTone.returnTone())
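# NewsAPI: retrieve news articles on a topic (the second argument is presumably the number of articles; takes a NewsAPI key)
from webxplore.search.news import RetrieveNewsArticle
newsArticles = RetrieveNewsArticle('Politics', 5, 'newsApiKey')
print(newsArticles.return_articleSentences())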
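The crawlers return sentences, and the package's stated goal is to compute sentiment over scraped text. One way to combine the two is to average a per-sentence score.

This sketch makes several assumptions:
- The sentences arrive as a list of strings.
- The scorer returns a number.
- The return type of `returnFinalSentiment()` is not documented here, so `toy_scorer` is a hypothetical stand-in rather than a WebXplore call.
- The headlines are made-up sample strings.

```python
from statistics import mean
from typing import Callable, Iterable


def average_sentiment(sentences: Iterable[str], scorer: Callable[[str], float]) -> float:
    """Average a numeric sentiment score over non-blank sentences (0.0 if there are none)."""
    scores = [scorer(s) for s in sentences if s.strip()]
    return mean(scores) if scores else 0.0


def toy_scorer(sentence: str) -> float:
    # Hypothetical stand-in: +1 if "gain" appears, -1 if "loss" appears, otherwise 0
    s = sentence.lower()
    return float(("gain" in s) - ("loss" in s))


# Made-up sample headlines
headlines = ["Markets post a strong gain.", "Tech shares extend loss.", "Rates unchanged."]
print(average_sentiment(headlines, toy_scorer))
```

Passing the scorer in as a function keeps the aggregation independent of whichever sentiment backend you use.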
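# Reddit: presumably up to 10 of the hottest posts in r/stocks that mention 'amazon' (takes Reddit API credentials)
from webxplore.search.reddit import CrawlSubReddit
redditPosts = CrawlSubReddit('stocks', 'amazon', 10, 'RedditClientId',
                             'RedditClientSecret', 'RedditUserAgent')
print(redditPosts.return_listSentences())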
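Conceptually, "hottest posts matching a key phrase" is a filter followed by a ranking. The sketch below runs on hypothetical in-memory `Post` records instead of the Reddit API. It also approximates "hottest" by upvote score alone, whereas Reddit's actual hot ranking also accounts for post age. `Post` and `hottest_matching` are hypothetical and do not reflect how `CrawlSubReddit` is implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    # Hypothetical minimal record; real Reddit submissions carry many more fields
    title: str
    score: int


def hottest_matching(posts: List[Post], phrase: str, limit: int) -> List[str]:
    """Titles of up to `limit` posts whose title contains `phrase`, highest score first."""
    phrase = phrase.lower()
    matches = [p for p in posts if phrase in p.title.lower()]  # case-insensitive match
    matches.sort(key=lambda p: p.score, reverse=True)
    return [p.title for p in matches[:limit]]


# Made-up sample posts
posts = [
    Post("Amazon earnings beat", 120),
    Post("Apple event recap", 300),
    Post("Is AMAZON overvalued?", 450),
    Post("Amazon layoffs", 80),
]
print(hottest_matching(posts, "amazon", 2))
```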
# Twitter: presumably up to 10 tweets related to 'tesla' (takes Twitter API keys)
from webxplore.search.twitter import CrawlTwitter
retrieveTweets = CrawlTwitter('tesla', 10, 'TwitterConsumerKey', 'TwitterConsumerSecret',
                              'TwitterAccountKey', 'TwitterAccountSecret')
print(retrieveTweets.return_tweets())
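Several of the examples above take API keys as plain string literals. One common alternative is to read them from environment variables so that secrets stay out of source code. This is a minimal sketch under that assumption. The variable names and `load_credentials` are hypothetical, chosen here for illustration, and are not defined by WebXplore.

```python
import os
from typing import Dict, Iterable

# Hypothetical environment variable names, in the order CrawlTwitter takes its keys above
TWITTER_VARS = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCOUNT_KEY",
    "TWITTER_ACCOUNT_SECRET",
]


def load_credentials(names: Iterable[str]) -> Dict[str, str]:
    """Read each named secret from the environment, failing early if any is missing or empty."""
    names = list(names)  # accept any iterable, including generators
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {n: os.environ[n] for n in names}


# Usage (not executed here):
# creds = load_credentials(TWITTER_VARS)
# retrieveTweets = CrawlTwitter('tesla', 10, *(creds[v] for v in TWITTER_VARS))
```

Failing early with a list of every missing variable is usually easier to debug than an authentication error from the remote API.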
Contributions are welcome. To contribute, open a pull request and make sure it passes all CI tests.
MIT License Copyright (c) 2020, Arnav Nidumolu