It scrapes highlights from the kindle.amazon.com website (https://kindle.amazon.com/your_highlights).
- nokogiri
- jsonify
- selenium-webdriver
Firefox is the default Selenium driver. It may be possible to select another driver by passing the :driver_type option to the constructor.
$ git clone git://github.com/parroty/kindle-your-highlights.git
$ cd kindle-your-highlights
$ bundle
$ export KINDLE_USERNAME="username"
$ export KINDLE_PASSWORD="password"
$ rake update:all
The default task is "rake update:recent".
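The rake tasks appear to take the Amazon credentials from the KINDLE_USERNAME and KINDLE_PASSWORD environment variables exported above. As a minimal sketch, a script of your own could read them the same way. The helper name `kindle_credentials` is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper, not part of kindle-your-highlights.
# Reads the two environment variables used in the setup steps and
# raises KeyError with a clear message if either one is missing.
def kindle_credentials(env = ENV)
  username = env.fetch("KINDLE_USERNAME") { raise KeyError, "KINDLE_USERNAME is not set" }
  password = env.fetch("KINDLE_PASSWORD") { raise KeyError, "KINDLE_PASSWORD is not set" }
  [username, password]
end
```

The returned pair could then be passed to the constructor, for example `KindleYourHighlights.new(*kindle_credentials)`.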
rake convert
call convert:all
rake convert:all
load a local file and convert into xml/html format
rake convert:html
load a local file and convert into html format
rake convert:xml
load a local file and convert into xml format
rake open
call open:html (TODO : mac only solution)
rake open:html
open html file (TODO : mac only solution)
rake open:xml
open xml file (TODO : mac only solution)
rake print
load a local file and print highlight data
rake update
call update:new
rake update:all
retrieve all data from amazon server, and store them into a local file
rake update:new
retrieve only newly arrived items from amazon server, and store them into a local file
rake update:recent
retrieve the most recent month of data from amazon server, and store it into a local file
require 'kindle-your-highlights'
# to create a new KindleYourHighlights object, give it your Amazon email address and password
kindle = KindleYourHighlights.new("foo@bar.com", "password")
kindle.highlights.each do |highlight|
  highlight.annotation_id # => a unique value for each highlight, generated by Amazon
  highlight.content       # => the actual highlight text
  highlight.asin          # => the Amazon ASIN for the highlight's product
  highlight.author        # => author of the book from which the highlight is taken
  highlight.title         # => title of the book from which the highlight is taken
  highlight.location      # => highlight location in the book
  highlight.note          # => user's note added along with the highlight
end
kindle.books.each do |book|
  book.asin        # => the Amazon ASIN for the book
  book.author      # => author of the book
  book.title       # => title of the book
  book.last_update # => last update of the highlights for the book (last annotated at)
end
require 'kindle-your-highlights'
# to create a new KindleYourHighlights object, give it your Amazon email address and password
kindle = KindleYourHighlights.new("foo@bar.com", "password", { :page_limit => 100, :day_limit => 31, :wait_time => 2 }) do |h|
  puts "loading... [#{h.books.last.title}] - #{h.books.last.last_update}"
end
# XML output (the ./xml folder must already exist)
KindleYourHighlights::XML.new(:list => kindle.list, :file_name => "xml/out.xml").output
# HTML output (the ./html folder must already exist)
KindleYourHighlights::HTML.new(:list => kindle.list, :file_name => "html/out.html").output
require 'kindle-your-highlights'
# to create a new KindleYourHighlights object, give it your Amazon email address and password
kindle = KindleYourHighlights.new("foo@bar.com", "password", { :page_limit => 100, :wait_time => 2 }) do |h|
  puts "loading... [#{h.books.last.title}]"
end
# load the previous dump file if it exists and merge it with the newly fetched data
if File.exist?("out.dump")
  list = KindleYourHighlights::List.load("out.dump")
  kindle.merge!(list)
end
KindleYourHighlights::HTML.new(:list => kindle.list, :file_name => "out.html").output
kindle.list.dump("out.dump")
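The load, merge, and dump cycle above can be illustrated with plain Ruby. This is only a sketch under stated assumptions: the `Highlight` struct, the three helper methods, merging by `annotation_id`, and the use of Marshal are all hypothetical and chosen for illustration. The gem's actual `merge!` behavior and dump format may differ.

```ruby
require "tmpdir"

# Hypothetical stand-in for a highlight record; the gem's real classes may differ.
Highlight = Struct.new(:annotation_id, :content)

# Merge previously saved highlights with freshly fetched ones, keeping one
# entry per annotation_id (the fresh copy wins when both lists contain it).
def merge_highlights(previous, fresh)
  by_id = {}
  (previous + fresh).each { |h| by_id[h.annotation_id] = h }
  by_id.values
end

# Persist and restore with Marshal (an assumption, not the gem's documented format).
def dump_highlights(list, path)
  File.binwrite(path, Marshal.dump(list))
end

def load_highlights(path)
  Marshal.load(File.binread(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "out.dump")
  # Simulate a dump left behind by an earlier run.
  dump_highlights([Highlight.new("a1", "old text"), Highlight.new("a2", "kept")], path)

  previous = File.exist?(path) ? load_highlights(path) : []
  fresh = [Highlight.new("a1", "new text"), Highlight.new("a3", "added")]
  merged = merge_highlights(previous, fresh)
  dump_highlights(merged, path)
  puts merged.map(&:annotation_id).join(", ")
end
```

Keying on `annotation_id` is a natural choice here because it is described above as unique per highlight, so repeated runs would not accumulate duplicates.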
- page_limit : maximum number of pages (books) to load
- day_limit : maximum number of days to retrieve, counted back from today using each book's "Last annotated on" date
- stop_date : the "Last annotated on" date at which to stop collecting more data
- wait_time : wait time between page loads, in seconds (default is 5 seconds)
- block : callback invoked each time a page load completes
- driver_type : symbol identifying the Selenium driver
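To make the two date-based options concrete, here is a minimal sketch of how day_limit and stop_date could translate into a cutoff check. The helper `past_limit?` is hypothetical, and whether the gem treats the boundary date as inclusive is an assumption made here for illustration.

```ruby
require "date"

# Hypothetical helper, not part of the gem.
# Returns true when a book's "Last annotated on" date is older than the
# cutoff implied by day_limit and/or stop_date, i.e. when collection would stop.
def past_limit?(last_annotated, today: Date.today, day_limit: nil, stop_date: nil)
  cutoffs = []
  cutoffs << (today - day_limit) if day_limit # day_limit days back from today
  cutoffs << stop_date if stop_date
  return false if cutoffs.empty?              # no limit configured

  # When both options are given, the more recent cutoff is the stricter one.
  last_annotated < cutoffs.max
end
```

With `:day_limit => 31`, as in the example above, a book last annotated more than 31 days before today would fall past the limit.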
XML output example
<?xml version="1.0"?>
<books>
  <book>
    <asin>ASIN</asin>
    <title>TITLE</title>
    <author>AUTHOR</author>
    <highlights>
      <annotation_id>ANNOTATION_ID1</annotation_id>
      <content>CONTENT1</content>
    </highlights>
    <highlights>
      <annotation_id>ANNOTATION_ID2</annotation_id>
      <content>CONTENT2</content>
    </highlights>
  </book>
</books>
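A file in this shape can be read back with any XML parser. The sketch below uses REXML, which ships with Ruby, rather than the project's own nokogiri dependency, purely to keep the example self-contained. The constant `SAMPLE_XML` and the method `highlights_by_title` are hypothetical names for illustration.

```ruby
require "rexml/document"

# Sample document in the shape shown above (placeholder values).
SAMPLE_XML = <<~XML
  <?xml version="1.0"?>
  <books>
    <book>
      <asin>ASIN</asin>
      <title>TITLE</title>
      <author>AUTHOR</author>
      <highlights>
        <annotation_id>ANNOTATION_ID1</annotation_id>
        <content>CONTENT1</content>
      </highlights>
      <highlights>
        <annotation_id>ANNOTATION_ID2</annotation_id>
        <content>CONTENT2</content>
      </highlights>
    </book>
  </books>
XML

# Build a hash that maps each book title to the list of its highlight texts.
def highlights_by_title(xml)
  doc = REXML::Document.new(xml)
  result = {}
  doc.elements.each("books/book") do |book|
    title = book.elements["title"].text
    result[title] = book.elements.to_a("highlights/content").map(&:text)
  end
  result
end

p highlights_by_title(SAMPLE_XML)
```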
- 0.3.0
  - Switched the engine from Mechanize to Selenium, because the Mechanize-based scraper stopped working for unknown reasons.
- 0.2.0
  - Added client-side features for HTML output (searching, highlighting)
  - Changed the output directory in the Rakefile (e.g. ../html -> output/html)
- 0.1.0
  - Initial upload
This library is derived from "https://github.com/speric/kindle-highlights". It was split into a separate project in order to add features and change the code formatting.