Pure Clojure implementations of the wcwidth and wcswidth POSIX functions (plus some other useful Unicode functions).
When printing Unicode characters to a fixed-width display device (e.g. a terminal), many Unicode code points have a well-defined "column width". This has been standardised in Unicode Technical Report #11, and implemented as the POSIX functions wcwidth and wcswidth.
However, Java doesn't provide these functions, so applications that need to know these widths (e.g. for terminal screen formatting purposes) are left to their own devices. While there are Java libraries that have implemented this themselves (notably JLine), pulling in a large dependency in order to use only a very small part of it is sometimes overkill.
This library provides a small, zero-dependency, pure Clojure implementation of the rules described in UTR-11 (updated for recent Unicode versions), so that pulling in such a dependency isn't necessary. It also goes further by (optionally) taking ANSI escape sequences into account.
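To illustrate why ANSI escape sequences matter for width calculations, here is a minimal sketch that strips CSI sequences (such as colour codes) from a string before it is measured. This is not this library's implementation; the names ansi-csi-pattern and strip-ansi are hypothetical and chosen purely for illustration.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper, not part of clj-wcwidth.
;; Matches ANSI CSI sequences: ESC [ parameter bytes, intermediate bytes, final byte.
(def ansi-csi-pattern #"\u001B\[[0-?]*[ -/]*[@-~]")

(defn strip-ansi
  "Removes ANSI CSI escape sequences from s, leaving only the visible text."
  [s]
  (string/replace s ansi-csi-pattern ""))

(def red-hello "\u001B[1;31mhello\u001B[0m")

(count red-hello)              ; counts the escape sequence chars as well
(count (strip-ansi red-hello)) ; counts only the visible chars in "hello"
```

The escape sequences occupy no columns on a terminal that interprets them, so any width calculation that ignores them will overestimate the printed width.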
Why not count?
When supplied with a sequence of characters (normally a String, though also a Java char[]), count simply counts the number of Java chars in that sequence. Due to a historical oddity of the JVM, a char is not necessarily the same thing as a Unicode code point (what we generally now think of as a "character"). Specifically, a Java char is a 16-bit "code unit" from UTF-16, and Unicode code points in the supplementary planes are represented by two such code units (and therefore as 2 chars on the JVM).
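The difference between chars and code points can be seen using only the JDK. The sketch below uses a hypothetical helper named code-point-count (it is not part of this library) built on the standard String.codePointCount method:

```clojure
;; Hypothetical helper, not part of clj-wcwidth.
(defn code-point-count
  "Returns the number of Unicode code points (not UTF-16 code units) in s."
  [^String s]
  (.codePointCount s 0 (.length s)))

;; U+10400 lies in a supplementary plane, so the JVM stores it as a surrogate pair.
(def supplementary-string (String. ^chars (Character/toChars 0x10400)))

(count supplementary-string)            ; 2 (UTF-16 code units)
(code-point-count supplementary-string) ; 1 (Unicode code point)
```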
Furthermore, count doesn't account for non-printing and zero-width Unicode code points; it counts them as chars even though they take up zero width when printed.
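As a small illustration that needs nothing beyond clojure.core, consider a base letter followed by a combining mark. The name e-acute is just a label chosen for this sketch:

```clojure
;; "e" followed by U+0301 COMBINING ACUTE ACCENT.
;; A terminal that supports combining marks renders this in a single column.
(def e-acute (str "e" \u0301))

(count e-acute) ; 2, even though only one column is occupied when printed
```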
clj-wcwidth is available as a Maven artifact from Clojars.
$ clojure -Sdeps '{:deps {com.github.pmonks/clj-wcwidth {:mvn/version "#.#.#"}}}' # Where #.#.# is replaced with an actual version number (see badge above)
$ lein try com.github.pmonks/clj-wcwidth
(require '[wcwidth.api :as wcw] :reload-all)
(wcw/wcwidth \A)
; ==> 1
(wcw/wcwidth \©)
; ==> 1
(wcw/wcwidth 0x0000) ; ASCII NUL (zero width)
; ==> 0
(wcw/wcwidth 0x001B) ; ASCII ESC (non printing)
; ==> -1
(wcw/wcwidth 0x1F921) ; 🤡 (double width)
; ==> 2
(wcw/display-width "hello, world") ; all single width
; ==> 12
(wcw/display-width "hello, 🌏") ; mixed single and double width
; ==> 9
; Showing the difference between the POSIX wcswidth behaviour and the more
; useful in Clojure, but non-POSIX, display-width behaviour:
(let [example-string (str "hello, world" (wcw/code-point-to-string 0x0084))]  ; non-printing code point
  (wcw/display-width example-string)
  ; ==> 12
  (wcw/wcswidth example-string)
  ; ==> -1
  ; Also show why clojure.core/count is inappropriate for determining display width:
  (count example-string))
; ==> 13
; More examples showing why clojure.core/count is inappropriate for determining display width:
(let [example-string (wcw/code-point-to-string 0x10400)]  ; 𐐀
  (wcw/display-width example-string)
  ; ==> 1
  (count example-string))
; ==> 2
(let [example-string "👍👍🏻"]
  (wcw/display-width example-string)
  ; ==> 4
  (count example-string))
; ==> 6
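The difference between the two behaviours shown above can be sketched as two ways of combining per-code-point widths. This is an illustration only, not this library's implementation: the function names posix-style-width and lenient-width are hypothetical, and each takes a sequence of widths of the kind wcwidth returns (-1 for non-printing code points).

```clojure
;; Hypothetical helpers, not part of clj-wcwidth.
(defn posix-style-width
  "Like POSIX wcswidth: returns -1 if any width is negative (non-printing),
  otherwise the sum of the widths."
  [widths]
  (if (some neg? widths)
    -1
    (reduce + 0 widths)))

(defn lenient-width
  "Treats non-printing code points as occupying zero columns and sums the rest."
  [widths]
  (reduce + 0 (remove neg? widths)))

(posix-style-width [1 1 2 -1]) ; -1
(lenient-width     [1 1 2 -1]) ; 4
```

The lenient form is usually more convenient when laying out text, since one stray control character doesn't discard the whole measurement.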
The functionality is provided by the wcwidth.api namespace.
Require it in the REPL:
(require '[wcwidth.api :as wcw] :reload-all)
Require it in your application:
(ns my-app.core
  (:require [wcwidth.api :as wcw]))
API documentation is available here. The unit tests provide comprehensive usage examples.
This project uses the git-flow branching strategy, with the caveat that the permanent branches are called main and dev, and any changes to the main branch are considered a release and auto-deployed (JARs to Clojars, API docs to GitHub Pages, etc.).
For this reason, all development must occur either in branch dev, or (preferably) in temporary branches off of dev. All PRs from forked repos must also be submitted against dev; the main branch is only updated from dev via PRs created by the core development team. All other changes submitted to main will be rejected.
wcwidth uses tools.build. You can get a list of available tasks by running:
clojure -A:deps -T:build help/doc
Of particular interest are:
clojure -T:build test - run the unit tests
clojure -T:build lint - run the linters (clj-kondo and eastwood)
clojure -T:build ci - run the full CI suite (check for outdated dependencies, run the unit tests, run the linters)
clojure -T:build install - build the JAR and install it locally (e.g. so you can test it with downstream code)
Please note that the deploy task is restricted to the core development team (and will not function if you run it yourself).
Copyright © 2022 Peter Monks
Distributed under the Apache License, Version 2.0.
SPDX-License-Identifier: Apache-2.0