Not working with NixCraft
Closed this issue · 6 comments
rdrview does not work with NixCraft articles (https://www.cyberciti.biz/).
It produces this output: rdrview: no content could be extracted.
It would be nice to be able to read NixCraft articles on my terminal.
I'm using a function to download the articles first and then pass them to rdrview, something like this:
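function rdr {
    # local -r keeps the variable read-only but scoped to this call,
    # so the function can be invoked more than once in the same shell.
    local -r u=${1:?"The url must be specified."}
    curl -A "Mozilla Firefox" -sL "$u" | rdrview -B lynx --disable-sandbox
}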
I also found that cyberciti.biz is behind Cloudflare, and you need to pass a JavaScript challenge before the content loads:
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>Just a moment...</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="robots" content="noindex,nofollow">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="/cdn-cgi/styles/challenges.css" rel="stylesheet">
</head>
<body class="no-js">
<div class="main-wrapper" role="main">
<div class="main-content">
<noscript>
<div id="challenge-error-title">
<div class="h2">
<span class="icon-wrapper">
<div class="heading-icon warning-icon"></div>
</span>
<span id="challenge-error-text">
Enable JavaScript and cookies to continue
</span>
</div>
</div>
</noscript>
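Since the response is a challenge page rather than the article, it may help to detect that case before handing the HTML to rdrview, so the failure is easier to diagnose. The following is only a sketch: the helper names is_cf_challenge and rdr_checked are made up for illustration, and the two markers it greps for are taken from the HTML shown here, which Cloudflare could change at any time.

```shell
# Hypothetical helper: succeeds if the HTML on stdin contains one of the
# markers seen in the challenge page (its title or the error-text span).
is_cf_challenge() {
    grep -qE '<title>Just a moment\.\.\.</title>|id="challenge-error-text"'
}

# Hypothetical wrapper: fetch the page, and only pass it to rdrview if it
# does not look like a challenge page. The body runs in a subshell so its
# variables do not leak into the calling shell.
rdr_checked() (
    url=${1:?"The url must be specified."}
    html=$(curl -A "Mozilla Firefox" -sL "$url") || exit 1
    if printf '%s\n' "$html" | is_cf_challenge; then
        echo "rdr: got a Cloudflare challenge page instead of the article" >&2
        exit 2
    fi
    printf '%s\n' "$html" | rdrview -B lynx --disable-sandbox
)
```

This does not get past the challenge; it only reports it with a distinct exit status (2) instead of rdrview's generic "no content could be extracted".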
curl -A "Mozilla Firefox" -sL "$u" | rdrview -B lynx --disable-sandbox
This does not work. rdrview still can't extract the content.
Is there a way to pass the Cloudflare challenge, or at least circumvent it?
I found a workaround for Cloudflare: I used curl-impersonate, and that seems to work.
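For anyone landing here later, the workaround might look roughly like the sketch below. It is not the exact command used in this thread. It assumes curl-impersonate is installed with its browser wrapper scripts on the PATH; the default wrapper name curl_ff117 depends on the installed version, and RDR_CURL is a made-up variable for overriding it.

```shell
# Hypothetical variant of the rdr function that fetches through a
# curl-impersonate wrapper instead of plain curl.
# RDR_CURL (an assumption, not part of any tool) names the wrapper to run;
# available names such as curl_ff117 or curl_chrome116 vary by version.
rdr_impersonate() (
    url=${1:?"The url must be specified."}
    fetch=${RDR_CURL:-curl_ff117}
    # The wrapper sets the browser-like headers itself, so no -A is needed.
    "$fetch" -sL "$url" | rdrview -B lynx --disable-sandbox
)
```

Usage would be something like rdr_impersonate "https://www.cyberciti.biz/", or RDR_CURL=curl_chrome116 rdr_impersonate "https://www.cyberciti.biz/" to try a different browser profile.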
Sorry for the long delay, but I guess you fixed this yourself and there's nothing much for me to say here.