medialab/artoo

news.ycombinator.com Refuses to load the script

ThinkDigitalSoftware opened this issue · 17 comments

This error occurs when following the initial tutorial and clicking the artoo bookmark:
VM4037:1 Refused to load the script 'https://medialab.github.io/artoo/public/dist/artoo-latest.min.js' because it violates the following Content Security Policy directive: "script-src 'self' 'unsafe-inline' https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/ https://cdnjs.cloudflare.com/".

Hum... That's unfortunate, but Hacker News just updated its headers to include Content-Security-Policy, thus forbidding arbitrary script execution. You'll have to use a browser extension that bypasses those headers, and I should probably find another site as example in my docs :)
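
To see why the browser rejects the bookmarklet's script, here is a rough sketch that checks a URL against the script-src list from the error above. The isScriptAllowed function is a hypothetical helper, not part of artoo or of any browser API, and it assumes a much simplified matching rule: only scheme://host/path sources are compared, while keywords such as 'self' and wildcards are ignored.

```javascript
// Simplified check of a script URL against a CSP "script-src" source list.
// Assumption: only plain URL sources are handled; quoted keywords
// ('self', 'unsafe-inline') and wildcard hosts are deliberately skipped.
function isScriptAllowed(policy, scriptUrl) {
  // Drop the directive name and keep the list of sources.
  const sources = policy.trim().split(/\s+/).slice(1);
  const target = new URL(scriptUrl);

  return sources.some(function(source) {
    if (source.startsWith("'")) return false; // keyword source, not a URL

    let allowed;
    try {
      allowed = new URL(source);
    } catch (e) {
      return false; // not something this sketch knows how to compare
    }

    if (allowed.origin !== target.origin) return false;

    // A source path ending in "/" matches by prefix, otherwise exactly.
    return allowed.pathname.endsWith('/')
      ? target.pathname.startsWith(allowed.pathname)
      : target.pathname === allowed.pathname;
  });
}

const policy = "script-src 'self' 'unsafe-inline' https://www.google.com/recaptcha/ " +
  "https://www.gstatic.com/recaptcha/ https://cdnjs.cloudflare.com/";
const artooUrl = 'https://medialab.github.io/artoo/public/dist/artoo-latest.min.js';

console.log(isScriptAllowed(policy, artooUrl));
```

The artoo script is served from medialab.github.io, which is not in the list, so the check fails in the same way the browser's does.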

No worries. I figured as much. Thanks for the response. Where can I ask for help with using artoo that's unrelated to this issue?

Well here seems to be a good place to do so :)

To select items by tag + class, here is what you need to write in CSS:

tagname.class

So, using artoo, you'd probably do something of the kind:

artoo.scrape('tagname.class', ...);

OK, so on this page, I'm running
artoo.scrape('li.card-btn square ', { text: {sel: 'span', method: 'text'}, url: {sel: 'a', attr: 'href'} });
and I'm getting an empty array. I isolated the element that's on the page and pasted it on this pastebin service.
https://dpaste.de/Oz5n
What I want to pull out from the page are results that look like this:

{
    name: 'Yelena M Stepanenko',
    address: 'Spc 157'
}

What am I doing wrong? I also realize that the selector is wrong; I haven't gotten to that part yet. I have no CSS background. I'm more of a desktop programmer, so it's a little slower for me to figure this out. Thanks for your patience.

The selector should be li.card-btn.square, since you are trying to match two classes.

artoo.scrape('li.card-btn.square', { text: {sel: 'span', method: 'text'}, url: {sel: 'a', attr: 'href'} });
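
The reason the original selector matched nothing: in an HTML class attribute, a space separates class names, but in a CSS selector a space means "descendant", so 'li.card-btn square' looks for a <square> element inside li.card-btn. As a rough illustration, here is a small hypothetical helper (compoundSelector is not an artoo or jQuery function) that turns a tag name and a class attribute value into a compound selector:

```javascript
// Hypothetical helper, not part of artoo: build a compound CSS selector
// from a tag name and the raw value of an element's class attribute.
function compoundSelector(tag, classAttribute) {
  // Class names are whitespace-separated in HTML; ignore empty pieces.
  const classes = classAttribute.trim().split(/\s+/).filter(Boolean);

  // In CSS, each class is prefixed with "." and chained without spaces.
  return tag + classes.map(function(name) {
    return '.' + name;
  }).join('');
}

console.log(compoundSelector('li', 'card-btn square '));
```

The resulting string is the one to pass as the first argument of artoo.scrape.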

Yes. You have several classes listed in your example. You should probably do a quick HTML/CSS tutorial before scraping. It will definitely help you achieve your goals. Scraping is basically HTML/CSS reverse engineering.

Just going to close this. Researching jQuery and CSS taught me a lot about selectors!

I should probably find another site as example in my docs

Please do @Yomguithereal -- I need a working example as a springboard to jump further. thx.

How about echojs.com?

Yeah, super.

While you are at it, changing the scraping code, please throw in some comments as well, as you did when you helped me before:

artoo.ajaxSpider(

  // This function is an iterator.
  // Its aim is to return the next url to fetch or false if you want to stop
  //-- 'i' is the index in the iteration of urls
  //-- '$data' is the jQuery-parsed data of the last fetched url
  function(i, $data) {

    // nextUrl is a function that takes a jQuery selection and returns
    // the next url to fetch

    // If !i then, we are only starting the spider meaning that the next url
    // is available on the current page rather than the last fetched one.
    return nextUrl(!i ? artoo.$(document) : $data);
  },

  // Spider's settings
  {

    // We want to fetch a maximum of two pages
    limit: 2,

    // We are going to scrape the pages using the scrape definition written above in the doc example
    scrape: scraper,

    // We want to concatenate results so we have [title1, title2, title3, title4]
    // rather than [[title1, title2], [title3, title4]]
    concat: true,

    // Final callback fired when the spider retrieved everything
    //-- 'data' is the scraped data
    done: function(data) {
      artoo.log.debug('Finished retrieving data. Downloading...');
      artoo.savePrettyJson(
        frontpage.concat(data),
        {filename: 'hacker_news.json'}
      );
    }
  }
);

thx
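
For anyone trying to follow how the iterator, limit and concat settings in the snippet above interact, here is a toy simulation that runs without a browser. It is only a sketch of the behaviour described in the comments, not artoo's implementation: toySpider, pages and currentPage are hypothetical names, and plain objects stand in for the jQuery-parsed pages.

```javascript
// Hypothetical in-memory "site" replacing real AJAX requests.
const pages = {
  '/news?p=2': {items: ['title3', 'title4'], next: '/news?p=3'},
  '/news?p=3': {items: ['title5', 'title6'], next: null}
};

// Toy stand-in for the spider loop; NOT artoo's actual code.
function toySpider(iterator, settings) {
  const results = [];
  let lastData = null;

  for (let i = 0; i < settings.limit; i++) {
    // Ask the iterator for the next url, given the last fetched page.
    const url = iterator(i, lastData);
    if (!url) break; // a falsy url stops the spider

    lastData = pages[url]; // "fetch" the page
    results.push(settings.scrape(lastData));
  }

  // With concat, [[a, b], [c, d]] becomes [a, b, c, d].
  settings.done(settings.concat ? [].concat.apply([], results) : results);
}

// The page we start from, scraped separately like `frontpage` above.
const currentPage = {items: ['title1', 'title2'], next: '/news?p=2'};
let collected;

toySpider(
  // On the first call (i === 0) the next url comes from the current page.
  function(i, data) {
    return (!i ? currentPage : data).next;
  },
  {
    limit: 2,
    scrape: function(page) { return page.items; },
    concat: true,
    done: function(data) {
      collected = currentPage.items.concat(data);
    }
  }
);

console.log(collected);
```

With limit set to 2, the loop fetches two pages after the starting one, and the final callback receives one flat list that is then joined with the titles from the starting page.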