EU-EDPS/website-evidence-collector

STDOUT (terminal) output format

gotar opened this issue · 2 comments

gotar commented

Hi, thanks for a great tool. I just have one question:

When I run the scanner with file output disabled, so that results go only to the terminal, I get a structure that is not valid JSON or any other format I recognize. It looks like a mix of different formats.

$ website-evidence-collector --no-output --json --quiet {{url}} -- --ignore-certificate-errors

{ type: 'Cookie.HTTP',
  stack: 
   [ { fileName: 'https://sesshinkan.pl/',
       source: 'set in Set-Cookie HTTP response header for https://sesshinkan.pl/' } ],
  location: 'about:blank',
  raw: '__cfduid=d41ec827fd4b52dd527010c050e046ba11571219993; expires=Thu, 15-Oct-20 09:59:53 GMT; path=/; domain=.sesshinkan.pl; HttpOnly; Secure',
  data: 
   [ { key: '__cfduid',
       value: 'd41ec827fd4b52dd527010c050e046ba11571219993',
       expires: '2020-10-15T09:59:53.000Z',
       domain: 'sesshinkan.pl',
       path: '/',
       secure: true,
       httpOnly: true,
       creation: '2019-10-16T09:59:52.657Z' } ],
  level: 'warn',
  message: '1 Cookie(s) (HTTP) set for host sesshinkan.pl with key(s) __cfduid.',
  timestamp: '2019-10-16T09:59:52.658Z' }
{ type: 'Browser',
  level: 'info',
  message: 'browsing now to https://sesshinkan.pl',
  timestamp: '2019-10-16T09:59:52.445Z' }
{ uri_ins: 'https://sesshinkan.pl',
  uri_refs: [ 'https://sesshinkan.pl' ],
  uri_dest: 'https://sesshinkan.pl/',
  uri_redirects: [],
  host: 'sesshinkan.pl',
  script: 
   { host: 'gotar',
     version: { npm: '0.3.0', commit: '2e18c83-dirty' },
     cmd_args: '--no-output --json --quiet https://sesshinkan.pl -- --ignore-certificate-errors',
     node_version: 'v8.12.0' },
  browser: 
   { name: 'Chromium',
     version: 'HeadlessChrome/78.0.3882.0',
     user_agent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3617.0 Safari/537.36',
     platform: { name: 'Linux', version: '4.18.5-gentoo' } },
  start_time: 2019-10-16T09:59:52.408Z,
  end_time: 2019-10-16T09:59:56.692Z,
  links: 
   { first_party: 
      [ { href: 'https://sesshinkan.pl/',
          inner_text: 'Sesshin Kan Dojo Gdynia',
          inner_html: 'Sesshin Kan Dojo Gdynia' },
        { href: 'https://sesshinkan.pl/plan_zajec.html',
          inner_text: 'Plan zajęć',
          inner_html: 'Plan zajęć' },
        { href: 'https://sesshinkan.pl/wydarzenia.html',
          inner_text: 'Wydarzenia',
          inner_html: 'Wydarzenia' },
        { href: 'https://sesshinkan.pl/slowniczek.html',
          inner_text: 'Słowniczek',
          inner_html: 'Słowniczek' },
        { href: 'https://sesshinkan.pl/wymagania_egzaminacyjne/kyu.html',
          inner_text: 'Stopnie Kyu',
          inner_html: 'Stopnie Kyu' },
        { href: 'https://sesshinkan.pl/wymagania_egzaminacyjne/dan.html',
          inner_text: 'Stopnie Dan',
          inner_html: 'Stopnie Dan' },
        { href: 'https://sesshinkan.pl/aikido/czym_jest.html',
          inner_text: 'Czym jest?',
          inner_html: 'Czym jest?' },
        { href: 'https://sesshinkan.pl/aikido/historia.html',
          inner_text: 'Historia',
          inner_html: 'Historia' },
        { href: 'https://sesshinkan.pl/aikido/korzysci.html',
          inner_text: 'Korzyści z treningu',
          inner_html: 'Korzyści z treningu' },
        { href: 'https://sesshinkan.pl/biografie/o-sensei.html',
          inner_text: 'O-Sensei',
          inner_html: 'O-Sensei' },
        { href: 'https://sesshinkan.pl/biografie/toyoda.html',
          inner_text: 'Shihan Fumio Toyoda',
          inner_html: 'Shihan Fumio Toyoda' },
        { href: 'https://sesshinkan.pl/biografie/germanov.html',
          inner_text: 'Sensei Edward Germanov',
          inner_html: 'Sensei Edward Germanov' },
        { href: 'https://sesshinkan.pl/kontakt.html',
          inner_text: 'Kontakt',
          inner_html: 'Kontakt' } ],
     social: [],
     keywords: [] },
  browsing_history: [ 'https://sesshinkan.pl' ],
  websockets: {},
  cookies: 
   [ { name: '__cfduid',
       value: 'd41ec827fd4b52dd527010c050e046ba11571219993',
       domain: 'sesshinkan.pl',
       path: '/',
       expires: 1602755992.652875,
       size: 51,
       httpOnly: true,
       secure: true,
       session: false,
       expiresUTC: 2020-10-15T09:59:52.652Z,
       expiresDays: 365,
       log: 
        { stack: 
           [ { fileName: 'https://sesshinkan.pl/',
               source: 'set in Set-Cookie HTTP response header for https://sesshinkan.pl/' } ],
          type: 'Cookie.HTTP',
          timestamp: '2019-10-16T09:59:52.658Z',
          location: 'about:blank' } } ],
  local_storage: {},
  beacons: [],
  hosts: 
   { requests: 
      { count: 3,
        entries: [ 'sesshinkan.pl', 'fonts.googleapis.com', 'fonts.gstatic.com' ] },
     beacons: { count: 0, entries: [] },
     cookies: { count: 1, entries: [ 'sesshinkan.pl' ] },
     local_storage: { count: 0, entries: [] },
     links: { count: 1, entries: [ 'sesshinkan.pl' ] } } }

Is there a possibility to get a valid JSON object? Or output in any other valid format?

Dear @gotar,

Thank you for your interest. While it runs, the tool writes line-delimited JSON logs to STDERR, intended for consumption with e.g. pino-pretty (see https://github.com/EU-EDPS/website-evidence-collector#use-pretty-printed-live-logs). At the end, it writes the result to STDOUT as a single JSON document.
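For readers who want to process the line-delimited logs themselves rather than pipe them to pino-pretty, here is a minimal sketch. It assumes each non-empty line is one complete JSON object. The function name `parseLogLines` and the two sample lines are hypothetical, modeled on the log entries quoted earlier in this issue, not captured from the tool.

```javascript
// Parse line-delimited JSON: one JSON object per non-empty line.
// parseLogLines is a hypothetical helper, not part of the tool.
function parseLogLines(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Hypothetical sample input, shaped like the log entries shown in this issue.
const sample = [
  '{"type":"Browser","level":"info","message":"browsing now to https://sesshinkan.pl","timestamp":"2019-10-16T09:59:52.445Z"}',
  '{"type":"Cookie.HTTP","level":"warn","message":"1 Cookie(s) (HTTP) set for host sesshinkan.pl with key(s) __cfduid.","timestamp":"2019-10-16T09:59:52.658Z"}',
].join('\n');

const entries = parseLogLines(sample);

// Example use: keep only the warnings.
const warnings = entries.filter((entry) => entry.level === 'warn');
console.log(warnings.map((entry) => entry.message));
```

Because every line is parsed independently, a consumer can also handle entries one at a time as they arrive instead of waiting for the scan to finish.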

In the output folder (if output is enabled) you can find a number of files, which are documented in the FAQ. To my knowledge, all output written to files is standards-compliant. It includes both a JSON and a YAML file.

The "json" output you posted is generated by this line:

console.dir(output, {maxArrayLength: null, depth: null});

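Indeed, this does not produce valid JSON: console.dir prints Node's object-inspection format (unquoted keys, single-quoted strings, bare dates) rather than JSON.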

I suggest disabling this output by default and, if --json is used, sending the content of the JSON file from the output folder to STDOUT instead.
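To illustrate the difference, here is a minimal sketch comparing the two ways of printing. The `output` object is a small hypothetical stand-in for the collector's real result, and `isValidJson` is a helper introduced only for this example. Using `JSON.stringify` is one possible way to implement the suggestion, not necessarily how the tool ends up doing it.

```javascript
const util = require('util');

// Hypothetical stand-in for the collector's result object.
const output = {
  host: 'sesshinkan.pl',
  start_time: new Date('2019-10-16T09:59:52.408Z'),
  uri_redirects: [],
};

// console.dir formats its argument with util.inspect, so this string is
// the kind of text that console.dir(output, {...}) prints.
const inspected = util.inspect(output, { maxArrayLength: null, depth: null });

// JSON.stringify emits strict JSON; Date values become ISO 8601 strings.
const json = JSON.stringify(output, null, 2);

// Helper for this example only: true if the text parses as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidJson(inspected)); // false
console.log(isValidJson(json)); // true

// Writing the JSON string to STDOUT makes it safe to pipe into other tools.
process.stdout.write(json + '\n');
```

Since the logs go to STDERR, printing only this string on STDOUT would keep STDOUT machine-readable.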

gotar commented

Yeah, it would be a lot better if, with just the --no-output switch, all the data that normally goes to the result file went to STDOUT.