WebHere is an Objective-C framework for web scraping, packaged for iOS 8+ and OS X 10.10+.
In short, web scraping is the parsing of a website's HTML pages and the extraction of data from them.
This work was inspired by RestKit, but it targets HTML data and works in a simpler way (no up-front mapping; model classes declare their own building strategy). It relies mostly on:
- AFNetworking to perform all network operations.
- GDataXML-HTML to extract data using XPath.
Both projects deserve attention in their own right: visit their pages and get to know their APIs, since WebHere mostly provides a unified facade over them.
- Downloads HTML pages and extracts data into user-defined classes.
- Lets you query the HTML document with XPath.
- Provides predefined methods to extract links and forms.
- Covered by unit tests.
- At the moment, only the GET and POST HTTP methods have been tested.
- Please be mindful of the legal issues involved in web scraping.
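WebHere relies on GDataXML-HTML for XPath extraction. To illustrate the kind of queries involved (for instance, pulling out links and forms), here is a minimal sketch that uses Apple's NSXMLDocument instead, which is available on OS X only. StringValuesForXPath is a hypothetical helper written for this example, not part of WebHere's API. The local-name() predicates in the queries sidestep any XHTML namespace that HTML tidying might introduce.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of WebHere or GDataXML-HTML): parses an HTML
// string with Foundation's NSXMLDocument (OS X only) and returns the string
// value of every node matched by an XPath query. Returns nil on failure.
static NSArray<NSString *> *StringValuesForXPath(NSString *html,
                                                 NSString *xpath,
                                                 NSError **error) {
    // NSXMLDocumentTidyHTML cleans up malformed HTML so it can be parsed as
    // XML. With this option *error may carry warnings even on success, so
    // success is judged by the return value, not by the error.
    NSXMLDocument *document =
        [[NSXMLDocument alloc] initWithXMLString:html
                                         options:NSXMLDocumentTidyHTML
                                           error:error];
    if (document == nil) {
        return nil;
    }

    NSArray<NSXMLNode *> *nodes = [document nodesForXPath:xpath error:error];
    if (nodes == nil) {
        return nil;
    }

    // For attribute nodes (e.g. @href), stringValue is the attribute's value;
    // fall back to an empty string so nil is never inserted into the array.
    NSMutableArray<NSString *> *values =
        [NSMutableArray arrayWithCapacity:nodes.count];
    for (NSXMLNode *node in nodes) {
        [values addObject:(node.stringValue ?: @"")];
    }
    return values;
}
```

For example, the query //*[local-name()='a']/@href returns every link target on a page, and //*[local-name()='form']/@action returns every form's submission URL.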
To run the example project, clone the repo and run pod install from the Example directory first.
Looking through the provided test cases should give you an overview of the API.
The Example folder contains an iOS app that queries Google and maps the resulting HTML.
Dependencies are managed automatically by CocoaPods. If you need to add WebHere to your source tree and use it without CocoaPods, you must also add the following projects:
- AFNetworking for network operations
- GDataXML-HTML for XPath extraction
WebHere is available through CocoaPods. To install it, simply add the following line to your Podfile:
pod "WebHere"
Alternatively, you can add the WebHere sources as a git submodule.
Rui Lopes, rui.d.lopes@me.com
WebHere is available under the MIT license. See the LICENSE file for more info.
The project is covered by unit tests, using:
- Specta/Expecta as the general-purpose testing framework.
- Nocilla for stubbing network requests.
Note that all tests run locally: every request is stubbed by Nocilla, so no actual network access is needed.
- Fork it
- Create your feature branch
- Commit your changes
- Push to the branch
- Create new Pull Request