
Not working with NixCraft

Closed this issue · 6 comments

rdrview does not work with NixCraft articles (https://www.cyberciti.biz/).
It produces this output: rdrview: no content could be extracted.

It would be nice to be able to read NixCraft articles on my terminal.

I'm using a function to download the articles first and then passing them to rdrview, something like this:

function rdr {
  # local instead of readonly, so the function can be called more than once per shell
  local u=${1:?"The url must be specified."}
  curl -A "Mozilla Firefox" -sL "$u" | rdrview -B lynx --disable-sandbox
}

I also found that cyberciti.biz uses Cloudflare, and you need to pass a JavaScript challenge before the content loads:

<!DOCTYPE html>
<html lang="en-US">
    <title>Just a moment...</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <meta name="robots" content="noindex,nofollow">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <link href="/cdn-cgi/styles/challenges.css" rel="stylesheet">

<body class="no-js">
    <div class="main-wrapper" role="main">
    <div class="main-content">
            <div id="challenge-error-title">
                <div class="h2">
                    <span class="icon-wrapper">
                        <div class="heading-icon warning-icon"></div>
                    <span id="challenge-error-text">
                        Enable JavaScript and cookies to continue
curl -A "Mozilla Firefox" -sL "$u" | rdrview -B lynx --disable-sandbox

This does not work. rdrview still can't extract the content.
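
A quick way to confirm that rdrview is being fed the challenge page rather than the article is to look for the markers from the HTML excerpt above. This is only a sketch: the function name is_cloudflare_challenge is made up for illustration, and it assumes the "Just a moment..." title and the challenge-error-text id are what the challenge page keeps serving.

```shell
# Hypothetical helper: succeeds (exit status 0) if the HTML on stdin looks like
# the Cloudflare challenge page quoted earlier in this thread.
# The two markers are taken from that excerpt; Cloudflare may change them.
is_cloudflare_challenge() {
  grep -qE '<title>Just a moment\.\.\.</title>|id="challenge-error-text"'
}
```

For example, `curl -A "Mozilla Firefox" -sL "$u" | is_cloudflare_challenge && echo "got the challenge page" >&2` would report the problem before rdrview is involved.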

Is there a way to pass the CloudFlare challenge? Or at least circumvent it?

I found a workaround for Cloudflare: I used curl-impersonate, and that seems to work.
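
For anyone who wants to try the same workaround, the function could be adapted roughly like this. It is a sketch under assumptions: curl-impersonate ships browser-specific wrapper scripts such as curl_ff109, but which ones exist depends on the installed version, and RDR_FETCH is a made-up variable name used here only to make the fetch command swappable.

```shell
# Sketch of the original rdr function with a pluggable fetch command.
# RDR_FETCH is a hypothetical variable; curl_ff109 is assumed to be one of the
# wrapper scripts installed by curl-impersonate on this machine.
rdr() {
  local u="${1:?The url must be specified.}"
  local fetch="${RDR_FETCH:-curl_ff109}"
  # The wrapper sets its own browser-like headers, so -A is not passed here.
  "$fetch" -sL "$u" | rdrview -B lynx --disable-sandbox
}
```

Calling `rdr https://www.cyberciti.biz/` uses the default wrapper, and `RDR_FETCH=curl_chrome116 rdr <url>` would switch to a different one if it is installed.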

Sorry for the long delay, but I guess you fixed this yourself and there's nothing much for me to say here.