STDOUT (terminal) output format
gotar opened this issue · 2 comments
Hi, thanks for a great tool. I just have a question:
When I run the scanner without file output, printing only to the terminal, I get a structure that is not JSON or any other format that looks valid. It seems to be a mix of different formats.
$ website-evidence-collector --no-output --json --quiet {{url}} -- --ignore-certificate-errors
{ type: 'Cookie.HTTP',
stack:
[ { fileName: 'https://sesshinkan.pl/',
source: 'set in Set-Cookie HTTP response header for https://sesshinkan.pl/' } ],
location: 'about:blank',
raw: '__cfduid=d41ec827fd4b52dd527010c050e046ba11571219993; expires=Thu, 15-Oct-20 09:59:53 GMT; path=/; domain=.sesshinkan.pl; HttpOnly; Secure',
data:
[ { key: '__cfduid',
value: 'd41ec827fd4b52dd527010c050e046ba11571219993',
expires: '2020-10-15T09:59:53.000Z',
domain: 'sesshinkan.pl',
path: '/',
secure: true,
httpOnly: true,
creation: '2019-10-16T09:59:52.657Z' } ],
level: 'warn',
message: '1 Cookie(s) (HTTP) set for host sesshinkan.pl with key(s) __cfduid.',
timestamp: '2019-10-16T09:59:52.658Z' }
{ type: 'Browser',
level: 'info',
message: 'browsing now to https://sesshinkan.pl',
timestamp: '2019-10-16T09:59:52.445Z' }
{ uri_ins: 'https://sesshinkan.pl',
uri_refs: [ 'https://sesshinkan.pl' ],
uri_dest: 'https://sesshinkan.pl/',
uri_redirects: [],
host: 'sesshinkan.pl',
script:
{ host: 'gotar',
version: { npm: '0.3.0', commit: '2e18c83-dirty' },
cmd_args: '--no-output --json --quiet https://sesshinkan.pl -- --ignore-certificate-errors',
node_version: 'v8.12.0' },
browser:
{ name: 'Chromium',
version: 'HeadlessChrome/78.0.3882.0',
user_agent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3617.0 Safari/537.36',
platform: { name: 'Linux', version: '4.18.5-gentoo' } },
start_time: 2019-10-16T09:59:52.408Z,
end_time: 2019-10-16T09:59:56.692Z,
links:
{ first_party:
[ { href: 'https://sesshinkan.pl/',
inner_text: 'Sesshin Kan Dojo Gdynia',
inner_html: 'Sesshin Kan Dojo Gdynia' },
{ href: 'https://sesshinkan.pl/plan_zajec.html',
inner_text: 'Plan zajęć',
inner_html: 'Plan zajęć' },
{ href: 'https://sesshinkan.pl/wydarzenia.html',
inner_text: 'Wydarzenia',
inner_html: 'Wydarzenia' },
{ href: 'https://sesshinkan.pl/slowniczek.html',
inner_text: 'Słowniczek',
inner_html: 'Słowniczek' },
{ href: 'https://sesshinkan.pl/wymagania_egzaminacyjne/kyu.html',
inner_text: 'Stopnie Kyu',
inner_html: 'Stopnie Kyu' },
{ href: 'https://sesshinkan.pl/wymagania_egzaminacyjne/dan.html',
inner_text: 'Stopnie Dan',
inner_html: 'Stopnie Dan' },
{ href: 'https://sesshinkan.pl/aikido/czym_jest.html',
inner_text: 'Czym jest?',
inner_html: 'Czym jest?' },
{ href: 'https://sesshinkan.pl/aikido/historia.html',
inner_text: 'Historia',
inner_html: 'Historia' },
{ href: 'https://sesshinkan.pl/aikido/korzysci.html',
inner_text: 'Korzyści z treningu',
inner_html: 'Korzyści z treningu' },
{ href: 'https://sesshinkan.pl/biografie/o-sensei.html',
inner_text: 'O-Sensei',
inner_html: 'O-Sensei' },
{ href: 'https://sesshinkan.pl/biografie/toyoda.html',
inner_text: 'Shihan Fumio Toyoda',
inner_html: 'Shihan Fumio Toyoda' },
{ href: 'https://sesshinkan.pl/biografie/germanov.html',
inner_text: 'Sensei Edward Germanov',
inner_html: 'Sensei Edward Germanov' },
{ href: 'https://sesshinkan.pl/kontakt.html',
inner_text: 'Kontakt',
inner_html: 'Kontakt' } ],
social: [],
keywords: [] },
browsing_history: [ 'https://sesshinkan.pl' ],
websockets: {},
cookies:
[ { name: '__cfduid',
value: 'd41ec827fd4b52dd527010c050e046ba11571219993',
domain: 'sesshinkan.pl',
path: '/',
expires: 1602755992.652875,
size: 51,
httpOnly: true,
secure: true,
session: false,
expiresUTC: 2020-10-15T09:59:52.652Z,
expiresDays: 365,
log:
{ stack:
[ { fileName: 'https://sesshinkan.pl/',
source: 'set in Set-Cookie HTTP response header for https://sesshinkan.pl/' } ],
type: 'Cookie.HTTP',
timestamp: '2019-10-16T09:59:52.658Z',
location: 'about:blank' } } ],
local_storage: {},
beacons: [],
hosts:
{ requests:
{ count: 3,
entries: [ 'sesshinkan.pl', 'fonts.googleapis.com', 'fonts.gstatic.com' ] },
beacons: { count: 0, entries: [] },
cookies: { count: 1, entries: [ 'sesshinkan.pl' ] },
local_storage: { count: 0, entries: [] },
links: { count: 1, entries: [ 'sesshinkan.pl' ] } } }
Is there a possibility to get a valid JSON object? Or output in any other valid format?
Dear @gotar,
thank you for your interest. The tool produces line-delimited JSON logs on STDERR for consumption with e.g. pino-pretty (see https://github.com/EU-EDPS/website-evidence-collector#use-pretty-printed-live-logs). At the end, the tool produces a JSON file on STDOUT.
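As a rough illustration of what "line-delimited JSON" means for a consumer, the sketch below parses such a stream one line at a time. The helper name parseLogLines and the two sample lines are hypothetical; the fields (type, level, message, timestamp) are borrowed from the log entries quoted in this issue, and the tool's real log lines may carry more fields.

```javascript
// Parse line-delimited JSON: each non-empty line is one complete JSON document.
// parseLogLines is a hypothetical helper, not part of the tool.
function parseLogLines(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Hypothetical sample input, modeled on the log entries shown in this issue.
const sample = [
  '{"type":"Browser","level":"info","message":"browsing now to https://sesshinkan.pl","timestamp":"2019-10-16T09:59:52.445Z"}',
  '{"type":"Cookie.HTTP","level":"warn","message":"1 Cookie(s) (HTTP) set for host sesshinkan.pl with key(s) __cfduid.","timestamp":"2019-10-16T09:59:52.658Z"}',
].join('\n');

const entries = parseLogLines(sample);

// Keep only the warnings and print their messages.
const warnings = entries.filter((entry) => entry.level === 'warn');
console.log(warnings.map((entry) => entry.message));
```

Because every line parses on its own, a consumer can process the log while the scan is still running instead of waiting for the final document.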
In the folder output (if output is enabled) you can find a number of files documented in the FAQ. To my knowledge, all output written to files is standards-compliant. It includes both a JSON and a YAML file.
The "json" output you posted is generated by this line:
console.dir(output, {maxArrayLength: null, depth: null});
Indeed, it seems to produce invalid JSON.
I would suggest disabling this output by default and sending the content of the JSON file from the output folder to STDOUT if --json is used.
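To make the difference concrete, here is a minimal sketch comparing the two serializations. console.dir formats its argument with util.inspect, which prints unquoted keys, single-quoted strings and bare dates, while JSON.stringify emits valid JSON. The small output object and the isValidJson helper are hypothetical stand-ins; this is one possible way to write the result to STDOUT, not necessarily how the tool will implement it.

```javascript
const util = require('util');

// Hypothetical, heavily reduced stand-in for the collector's `output` object.
const output = {
  uri_ins: 'https://sesshinkan.pl',
  uri_redirects: [],
  start_time: new Date('2019-10-16T09:59:52.408Z'),
};

// The same formatting console.dir applies: human-readable, but not JSON.
const inspected = util.inspect(output, { maxArrayLength: null, depth: null });

// Valid JSON: double-quoted keys and strings, Date serialized as an ISO 8601 string.
const json = JSON.stringify(output, null, 2);

// Hypothetical helper: true if the text parses as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidJson(inspected)); // false
console.log(isValidJson(json)); // true

// Emit the machine-readable result on STDOUT.
process.stdout.write(json + '\n');
```

With output like this, the result can be piped directly into tools that expect JSON, such as jq.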
Yeah, it would be a lot better if, with just the --no-content switch, all data that normally goes to the result file went to STDOUT.