Go cascadia package command line CSS selector

Primary language: Go. License: MIT.

cascadia

cascadia - CSS selector CLI tool

The Go Cascadia package implements CSS selectors for HTML. This is the command line tool for it. It started as a thin wrapper around that package, but is growing into a better tool for testing CSS selectors without writing Go code.

Usage

$ cascadia

cascadia wrapper
Version 1.3.0 built on 2023-06-30
Copyright (C) 2016-2023, Tong Sun

Command line interface to go cascadia CSS selectors package

Usage:
  cascadia -i in -c css -o [Options...]

Options:

  -h, --help        display help information 
  -i, --in         *The html/xml file to read from (or stdin) 
  -o, --out        *The output file (or stdout) 
  -c, --css        *CSS selectors (can provide more if not using --piece) 
  -t, --text        Text output for none-block selection mode 
  -R, --Raw         Raw text output, no trimming of leading and trailing white space 
  -p, --piece       sub CSS selectors within -css to split that block up into pieces
			format: PieceName=[PieceStyle:]selector_string
			 PieceStyle:
			  RAW : will return the selected as-is
			  ATTR : will return the value of attribute selector_string
			 Else the text will be returned 
  -d, --delimiter   delimiter for pieces csv output [=	]
  -w, --wrap-html   wrap up the output with html tags 
  -y, --style       style component within the wrapped html head 
  -b, --base        base href tag used in the wrapped up html 
  -q, --quiet       be quiet

Its output has two modes, none-block selection mode and block selection mode, depending on whether the --piece parameter is given on the command line or not.

For details about the concept of blocks and pieces, check out andrew-d/goscrape (in fact, cascadia was initially developed just for it, so that I don't need to tweak Go code, then build and run it, just to test out the block and piece selectors). Here is an excerpt:

  • Inside each page, there's 1 or more blocks - some logical method of splitting up a page into subcomponents.
  • Inside each block, you define some number of pieces of data that you wish to extract. Each piece consists of a name, a selector, and what data to extract from the current block.

This all sounds rather complicated, but in practice it's quite simple. See the next section for details.
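
To make the piece format from the usage text concrete, here is a minimal Go sketch of how a PieceName=[PieceStyle:]selector_string argument could be split into its parts. The Piece type and the parsePiece function are hypothetical names used only for illustration; this is not the tool's actual parsing code, just one way to read the format under the assumptions stated in the comments.

```go
package main

import (
	"fmt"
	"strings"
)

// Piece is a hypothetical holder for one --piece argument.
type Piece struct {
	Name     string // the PieceName part
	Style    string // "RAW", "ATTR", or "" for plain text output
	Selector string // selector_string (per the usage text, the attribute when Style is "ATTR")
}

// parsePiece splits "PieceName=[PieceStyle:]selector_string".
// Assumption: only the first "=" separates name from selector, because
// selectors such as input[name=Sex] contain "=" themselves.
func parsePiece(arg string) (Piece, error) {
	i := strings.Index(arg, "=")
	if i <= 0 || i == len(arg)-1 {
		return Piece{}, fmt.Errorf("invalid piece %q: want PieceName=[PieceStyle:]selector_string", arg)
	}
	p := Piece{Name: arg[:i], Selector: arg[i+1:]}
	// Only the two documented styles are treated as prefixes, so a
	// selector like a:first-child keeps its colon.
	for _, style := range []string{"RAW", "ATTR"} {
		if strings.HasPrefix(p.Selector, style+":") {
			p.Style = style
			p.Selector = strings.TrimPrefix(p.Selector, style+":")
			break
		}
	}
	return p, nil
}

func main() {
	args := []string{"Title=h2.title", "Body=RAW:div.body", "Link=ATTR:href", "Field=input[name=Sex]"}
	for _, arg := range args {
		p, err := parsePiece(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("name=%s style=%q selector=%s\n", p.Name, p.Style, p.Selector)
	}
}
```

Matching only the known style names, rather than splitting on any colon, is a deliberate choice in this sketch: CSS pseudo-classes also use colons.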

In summary,

  • The none-block selection mode outputs the selection as HTML source by default.
    • If the -t or --text cli option is provided, the none-block selection mode outputs text instead.
      • By default, such text output has its leading and trailing white space trimmed.
      • However, if the -R or --Raw cli option is provided, no trimming is done.
  • The block selection mode outputs the text of each piece in a tsv/csv table form by default.
    • If a --piece selection is prefixed with RAW:, then that specific piece is output as HTML instead. See the following for details.
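
The rules in this summary can be written down as a small decision function. The following Go sketch is only an illustration of the summary, not the tool's source: the Options type, outputForm, and formatText are hypothetical names, and using strings.TrimSpace for the trimming step is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Options is a hypothetical subset of the command line flags.
type Options struct {
	Text   bool     // -t, --text
	Raw    bool     // -R, --Raw
	Pieces []string // -p, --piece; any value switches to block selection mode
}

// outputForm names the output form that the summary describes for the given flags.
func outputForm(o Options) string {
	switch {
	case len(o.Pieces) > 0:
		return "table" // block selection mode: tsv/csv rows
	case o.Text:
		return "text" // none-block selection mode with -t
	default:
		return "html" // none-block selection mode default
	}
}

// formatText applies the trimming rule for text output: trimmed by
// default, left untouched when raw is set.
// Assumption: strings.TrimSpace stands in for the tool's trimming.
func formatText(s string, raw bool) string {
	if raw {
		return s
	}
	return strings.TrimSpace(s)
}

func main() {
	opts := Options{Text: true}
	fmt.Println(outputForm(Options{}))                               // no -t, no -p
	fmt.Println(outputForm(opts))                                    // -t
	fmt.Println(outputForm(Options{Pieces: []string{"Title=h2"}}))   // -p
	fmt.Printf("%q\n", formatText("  Female \n", opts.Raw))          // trimmed
	fmt.Printf("%q\n", formatText("  Female \n", true))              // as with -R
}
```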

Examples

All three options, -i, -o, and -c, are required. When -i or -o is given without a file name, cascadia reads from stdin or writes to stdout respectively:

$ echo '<input type="radio" name="Sex" value="F" />' | tee /tmp/cascadia.xml | cascadia -i -o -c 'input[name=Sex][value=F]'
1 elements for 'input[name=Sex][value=F]':
<input type="radio" name="Sex" value="F"/>

Either the input or the output can be followed by a file name:

$ cascadia -i /tmp/cascadia.xml -o -c 'input[name=Sex][value=F]'
1 elements for 'input[name=Sex][value=F]':
<input type="radio" name="Sex" value="F"/>
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html
1 elements for 'input[name=Sex][value=F]':

$ cat /tmp/out.html
<input type="radio" name="Sex" value="F"/>

Other options can be applied as well:

# using --wrap-html
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html -w
1 elements for 'input[name=Sex][value=F]':

$ cat /tmp/out.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<base href="">

</head>
<body>
<input type="radio" name="Sex" value="F"/>
</body>

# using --wrap-html with --style
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html -w -y '<link rel="stylesheet" href="styles.css">'
1 elements for 'input[name=Sex][value=F]':

$ cat /tmp/out.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<base href="">
<link rel="stylesheet" href="styles.css">
</head>
<body>
<input type="radio" name="Sex" value="F"/>
</body>
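
As a rough illustration of how --wrap-html, --style, and --base fit together, the Go sketch below rebuilds the same layout as the sample output above. The wrapHTML function is a hypothetical helper, not code from cascadia; the template simply mirrors the sample, including ending at the closing body tag, and escaping the base value is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"html"
)

// wrapHTML is a hypothetical helper that mirrors the sample output:
// base fills the <base href>, style is placed verbatim in the head,
// and body is the selected HTML.
func wrapHTML(base, style, body string) string {
	const layout = `<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<base href="%s">
%s
</head>
<body>
%s
</body>
`
	// Assumption: the base value is attribute text, so it is escaped;
	// style and body are already HTML and are inserted as-is.
	return fmt.Sprintf(layout, html.EscapeString(base), style, body)
}

func main() {
	fmt.Print(wrapHTML(
		"",
		`<link rel="stylesheet" href="styles.css">`,
		`<input type="radio" name="Sex" value="F"/>`,
	))
}
```

With an empty style, the line between the base tag and the closing head tag is left blank, which matches the first sample above.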

Install Debian/Ubuntu package

sudo apt install -y cascadia

Download/install binaries

  • The latest binary executables are available as the result of the Continuous-Integration (CI) process.
  • I.e., they are built automatically right from the source code at every git release by GitHub Actions.
  • There are two ways to get/install such binary executables
    • Using the binary executables directly, or
    • Using packages for your distro

The binary executables

  • The latest binary executables are directly available under
    https://github.com/suntong/cascadia/releases/latest
  • Pick & choose the one that suits your OS and its architecture. E.g., for Linux, it would be the cascadia_verxx_linux_amd64.tar.gz file.
  • Available OS for binary executables are
    • Linux
    • Mac OS (darwin)
    • Windows
  • If your OS and architecture combination is not in the download list, please let me know and I'll add it.
  • The manual installation is just to unpack it and move/copy the binary executable to somewhere in PATH. For example,
tar -xvf cascadia_*_linux_amd64.tar.gz
sudo mv -v cascadia_*_linux_amd64/cascadia /usr/local/bin/
rmdir -v cascadia_*_linux_amd64

Distro package

Distro packages are installed from a package repository that has to be set up first. For example, for Debian:

Debian package

curl -1sLf \
  'https://dl.cloudsmith.io/public/suntong/repo/setup.deb.sh' \
  | sudo -E bash

# That's it. You then can do your normal operations, like

sudo apt update
apt-cache policy cascadia

sudo apt install -y cascadia

Install Source

To build and install from the source code instead:

go install github.com/suntong/cascadia@latest

Author

Tong SUN
suntong from cpan.org

Powered by WireFrame
the one-stop wire-framing solution for Go cli based projects, from init to deploy.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • suntong 💻 🤔 🎨 🔣 ⚠️ 🐛 📖 📝 💡 🔧 📦 👀 💬 🚧 🚇
  • Hosh 💻 🐛 📓
  • mh-cbon 🐛 🤔 📓
  • 朱聖黎 (Zhu Sheng Li) 🐛 📓
  • himcc 💻 🐛 📓
  • Glenn 'devalias' Grant 💻 🐛 📓

This project follows the all-contributors specification. Contributions of any kind welcome!