Readlist app

This is a toy project and does not work yet.

A readability.com replacement to curate reading lists, create ebooks from the scraped articles, and send them to a Kindle.

Proof of concept

The idea is to hack together a PoC, reusing as much existing software as possible.

Starting the app

Generate a rails secret token:

echo "SECRET_KEY_BASE=`rails secret`" >> rails.env
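If Rails is not installed on the host, the same line can be produced with plain Ruby. This is a rough sketch, assuming that a 128-character hex string from `SecureRandom.hex(64)` is an acceptable secret (that is the shape `rails secret` prints). The helper name `secret_env_line` is made up for this example.

```ruby
require "securerandom"

# Builds the line expected in rails.env.
# `secret_env_line` is a hypothetical helper, not part of the app.
def secret_env_line(secret = SecureRandom.hex(64))
  "SECRET_KEY_BASE=#{secret}"
end

# Append the line to rails.env only when run directly as a script.
if __FILE__ == $PROGRAM_NAME
  File.open("rails.env", "a") { |file| file.puts(secret_env_line) }
end
```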

Start the app:

docker-compose up -d

The app is running on port 3000.

ToDo

  • Basic dev setup
  • Scrape article
  • Ebook conversion
  • Download lists as ebook
  • Send to kindle
  • Curate articles in lists
  • UI
  • Deployment & CI
  • Async processing
  • Better traceability of downstream errors
    • Log scraper responses

More ideas

  • Create articles from RSS feeds

Bugs

Deal with errors from scraper

HTTP/1.1 500 Internal Server Error
Access-Control-Allow-Headers: Origin, Requested-With, Content-Type, Accept
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 57
Content-Type: application/json; charset=utf-8
Date: Tue, 07 Feb 2017 10:13:36 GMT
ETag: W/"39-L1GaNx8z7kKPaYYYZih4+A"
X-Powered-By: Express

{
    "error": {
        "message": "Empty result from Readability.js."
    }
}

Other messages:

  • "Empty result from Readability.js."
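One way the app could surface these failures is to turn any non-200 scraper response into an exception that carries the scraper's own message. The following is a minimal sketch, assuming the error body always has the `error.message` shape shown above. `ScraperError` and `parse_scraper_response` are hypothetical names, not existing code.

```ruby
require "json"

# Hypothetical error class for failures reported by the scraper.
class ScraperError < StandardError; end

# Returns the parsed article hash for a 200 response.
# Raises ScraperError with the scraper's message for any other status,
# or when the body is not valid JSON.
def parse_scraper_response(status, body)
  payload = JSON.parse(body)
  return payload if status == 200

  message = payload.dig("error", "message") || "unknown scraper error"
  raise ScraperError, "scraper responded with #{status}: #{message}"
rescue JSON::ParserError
  raise ScraperError, "scraper responded with #{status} and a non-JSON body"
end
```

The caller can then rescue `ScraperError` in one place and log or display the message instead of failing later on a missing article body.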

Scraper can't access original page

Error: Unable to access http://www.theroot.com/the-national-interest-all-money-ain-t-good-money-the-1791979266
    at /usr/readable-proxy/scrape.js:37:29
    at ChildProcess.exithandler (child_process.js:197:7)
    at emitTwo (events.js:106:13)
    at ChildProcess.emit (events.js:191:7)
    at maybeClose (internal/child_process.js:877:16)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:226:5)
Error: Unable to access httpbin.org/post
    at /usr/readable-proxy/scrape.js:37:29
    at ChildProcess.exithandler (child_process.js:197:7)
    at emitTwo (events.js:106:13)
    at ChildProcess.emit (events.js:191:7)
    at maybeClose (internal/child_process.js:877:16)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:226:5)
{ Error: Empty result from Readability.js.
    at /usr/readable-proxy/scrape.js:37:29
    at ChildProcess.exithandler (child_process.js:197:7)
    at emitTwo (events.js:106:13)
    at ChildProcess.emit (events.js:191:7)
    at maybeClose (internal/child_process.js:877:16)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:226:5)
  sourceHTML: '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html><head><title>405 Method Not Allowed</title>\n</head><body><h1>Method Not Allowed</h1>\n<p>The method is not allowed for the requested URL.</p>\n</body></html>',
  consoleLogs: [] }
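The second trace was produced for a URL without a scheme (`httpbin.org/post`). Whether the missing scheme is what made the scraper fail is an assumption, but rejecting such URLs before calling the scraper is cheap. A minimal sketch using Ruby's standard `uri` library; `scrapable_url?` is a hypothetical helper name.

```ruby
require "uri"

# True only for absolute http(s) URLs with a host.
# `scrapable_url?` is a hypothetical helper, not part of the app.
def scrapable_url?(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end
```

This does not help with the third trace, where the target site answered with `405 Method Not Allowed`; that case still has to be handled as a scraper error.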

Cleanup result is empty

$ get "`dm ip`:3001/api/get?url=http://www.theroot.com/the-national-interest-all-money-ain-t-good-money-the-1791979266&userAgent=Mozilla%2F5.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2010_11_5)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F55.0.2883.95%20Safari%2F537.36"
HTTP/1.1 200 OK
Access-Control-Allow-Headers: Origin, Requested-With, Content-Type, Accept
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 849
Content-Type: application/json; charset=utf-8
Date: Tue, 07 Feb 2017 10:33:01 GMT
ETag: W/"351-9wwImqGkjUjo5g1COJE61g"
X-Powered-By: Express

{
    "byline": "Andre Perry ",
    "consoleLogs": [
        "FPO - setup failed: Can't find variable: TBWidgetGawkerComments",
        ""
    ],
    "content": "",
    "dir": "",
    "excerpt": "Editor’s note: Once a month, the National Interest column will tackle broader questions about what the country should do to increase educational opportunities for black youths.",
    "isProbablyReaderable": true,
    "length": 0,
    "rawText": "",
    "title": "The National Interest: All Money Ain’t Good Money: The Role of White Foundations in Social Justice Movements",
    "uri": {
        "host": "www.theroot.com",
        "pathBase": "http://www.theroot.com/",
        "prePath": "http://www.theroot.com",
        "scheme": "http",
        "spec": "http://www.theroot.com/the-national-interest-all-money-ain-t-good-money-the-1791979266"
    },
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36"
}
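In this case the scraper answers `200 OK` and even reports `isProbablyReaderable: true`, so the status code alone does not reveal the problem. A minimal sketch of a check the app could run on the parsed response, assuming that an empty `content` or a `length` of 0 always means the cleanup failed. `empty_article?` is a hypothetical helper name.

```ruby
require "json"

# True when the scraper returned no usable article body.
# `empty_article?` is a hypothetical helper, not part of the app.
def empty_article?(article)
  article["content"].to_s.strip.empty? || article["length"].to_i.zero?
end

# Trimmed-down version of the response shown above.
response = JSON.parse('{"content": "", "length": 0, "isProbablyReaderable": true}')
puts "cleanup result is empty" if empty_article?(response)
```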