clj-wcwidth

Pure Clojure implementations of the wcwidth and wcswidth POSIX functions (plus some other useful Unicode functions).

Why?

When Unicode text is printed to a fixed-width display device (e.g. a terminal), many Unicode code points have a well-defined "column width". These widths were standardised in Unicode Technical Report #11 (since renamed Unicode Standard Annex #11, "East Asian Width"), and are exposed by the POSIX functions wcwidth and wcswidth.

Java doesn't provide these functions, however, so applications that need to know these widths (e.g. for terminal screen formatting) are left to their own devices. While some Java libraries have implemented this themselves (notably JLine), pulling in a large dependency in order to use a very small part of it is sometimes overkill.

This library avoids that by providing a small, zero-dependency, pure Clojure implementation of the rules described in UTR-11 (updated for recent Unicode versions). It also goes further by (optionally) taking ANSI escape sequences into account.
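To illustrate why ANSI escape sequences matter for width calculations, here is a minimal sketch that strips CSI sequences (such as colour codes) before measuring a string. The names ansi-csi-pattern and strip-ansi are hypothetical and are not part of this library's API, and the regular expression only covers CSI sequences, not every escape sequence a terminal understands.

```clojure
(require '[clojure.string :as str])

;; Hypothetical pattern, not taken from clj-wcwidth:
;; ESC, then '[', then parameter bytes, intermediate bytes and one final byte.
(def ansi-csi-pattern
  #"\u001B\[[0-?]*[ -/]*[@-~]")

(defn strip-ansi
  "Removes ANSI CSI sequences (e.g. colour codes) from s."
  [s]
  (str/replace s ansi-csi-pattern ""))

;; "red", wrapped in 'set foreground red' and 'reset' sequences
(def red-text "\u001B[31mred\u001B[0m")

(count red-text)               ; ==> 12 (the escape sequences add 9 chars)
(count (strip-ansi red-text))  ; ==> 3  (what actually appears on screen)
```

The terminal consumes the escape sequences rather than displaying them, so a naive measurement overstates the width of coloured text.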

Why not count?

When supplied with a sequence of characters (normally a String, though also a Java char[]), count simply counts the number of Java chars in that sequence. Due to a historical oddity of the JVM, a Java char is not necessarily the same thing as a Unicode code point (what we generally now think of as a "character"). Specifically, a Java char is a 16-bit "code unit" from UTF-16, and Unicode code points in the supplementary planes are represented by two such code units (and therefore as 2 chars on the JVM).

Furthermore, count doesn't account for non-printing and zero-width Unicode code points; it counts them as chars even though they take up zero width when printed.
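The distinction between chars and code points can be checked with plain Java interop, without this library. The following is a small sketch; code-point-count is a hypothetical helper name introduced here for illustration.

```clojure
(defn code-point-count
  "Hypothetical helper: the number of Unicode code points in s,
  as opposed to the number of UTF-16 code units that count returns."
  [^String s]
  (.codePointCount s 0 (.length s)))

;; U+10400 (𐐀) lies in a supplementary plane, so it needs two Java chars
(def deseret (String. (Character/toChars 0x10400)))

(count deseret)             ; ==> 2 (UTF-16 code units)
(code-point-count deseret)  ; ==> 1 (Unicode code points)

;; 'e' followed by U+0301 (combining acute accent): two chars and two
;; code points, even though the accent is drawn over the 'e'
(def e-acute "e\u0301")

(count e-acute)             ; ==> 2
(code-point-count e-acute)  ; ==> 2
```

Counting code points fixes the surrogate pair problem but not the zero-width one, which is why a width table is needed.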

Installation

clj-wcwidth is available as a Maven artifact from Clojars.

Trying it Out

Clojure CLI

$ clojure -Sdeps '{:deps {com.github.pmonks/clj-wcwidth {:mvn/version "#.#.#"}}}'  # Where #.#.# is replaced with an actual version number (see badge above)

Leiningen

$ lein try com.github.pmonks/clj-wcwidth

Simple REPL Session

(require '[wcwidth.api :as wcw] :reload-all)

(wcw/wcwidth \A)
; ==> 1
(wcw/wcwidth \©)
; ==> 1
(wcw/wcwidth 0x0000)   ; ASCII NUL (zero width)
; ==> 0
(wcw/wcwidth 0x001B)   ; ASCII ESC (non printing)
; ==> -1
(wcw/wcwidth 0x1F921)  ; 🤡 (double width)
; ==> 2

(wcw/display-width "hello, world")  ; all single width
; ==> 12
(wcw/display-width "hello, 🌏")     ; mixed single and double width
; ==> 9

; Showing the difference between the POSIX wcswidth behaviour and the more
; useful in Clojure, but non-POSIX, display-width behaviour:
(let [example-string (str "hello, world" (wcw/code-point-to-string 0x0084))]   ; non-printing code point
  (wcw/display-width example-string)
  ; ==> 12
  (wcw/wcswidth example-string)
  ; ==> -1

  ; Also show why clojure.core/count is inappropriate for determining display width:
  (count example-string))
  ; ==> 13

; More examples showing why clojure.core/count is inappropriate for determining display width:
(let [example-string (wcw/code-point-to-string 0x10400)]  ; 𐐀
  (wcw/display-width example-string)
  ; ==> 1
  (count example-string))
  ; ==> 2

(let [example-string "👍👍🏻"]
  (wcw/display-width example-string)
  ; ==> 4
  (count example-string))
  ; ==> 6
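The difference between the two behaviours shown in the session above can be sketched in a few lines. This is an illustration only, not the library's implementation: toy-wcwidth is a deliberately crude stand-in that knows nothing about wide or combining characters, and all of the names below are hypothetical.

```clojure
(defn code-points
  "Hypothetical helper: the Unicode code points in s, as a seq of integers."
  [^String s]
  (iterator-seq (.iterator (.codePoints s))))

(defn toy-wcwidth
  "Toy stand-in for a real wcwidth: NUL is 0, other C0/C1 control codes are
  -1 (non-printing), and everything else is assumed to be 1 column wide.
  A real implementation consults the Unicode width data instead."
  [cp]
  (cond
    (zero? cp)                          0
    (or (< cp 0x20) (<= 0x7F cp 0x9F))  -1
    :else                               1))

(defn posix-style-width
  "POSIX wcswidth semantics: -1 if any code point is non-printing,
  otherwise the sum of the individual widths."
  [s]
  (let [widths (map toy-wcwidth (code-points s))]
    (if (some neg? widths)
      -1
      (reduce + 0 widths))))

(defn lenient-width
  "Lenient semantics: non-printing code points contribute nothing."
  [s]
  (reduce + 0 (filter pos? (map toy-wcwidth (code-points s)))))

;; The same input as the session above: 12 printable chars plus U+0084
(def example (str "hello, world" (char 0x0084)))

(posix-style-width example)  ; ==> -1
(lenient-width example)      ; ==> 12
(count example)              ; ==> 13
```

A single non-printing code point makes the POSIX-style result useless for layout, whereas the lenient variant still reports the number of columns that are actually drawn.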

Usage

The functionality is provided by the wcwidth.api namespace.

Require it in the REPL:

(require '[wcwidth.api :as wcw] :reload-all)

Require it in your application:

(ns my-app.core
  (:require [wcwidth.api :as wcw]))
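A typical use is aligning text in columns. The sketch below pads a string to a given number of terminal columns. pad-right is a hypothetical helper, not part of the library, and it takes the measuring function as an argument so that the block runs without any dependencies.

```clojure
(defn pad-right
  "Hypothetical helper: pads s with trailing spaces until it occupies
  `columns` terminal columns, as measured by width-fn (a function from
  string to column count). Strings that are already wide enough are
  returned unchanged."
  [width-fn s columns]
  (let [padding (max 0 (- columns (width-fn s)))]
    (apply str s (repeat padding \space))))

;; With count as the measuring function (only correct for plain ASCII text):
(pad-right count "abc" 5)     ; ==> "abc  "
(pad-right count "abcdef" 5)  ; ==> "abcdef"
```

In an application that requires wcwidth.api as wcw, passing wcw/display-width instead of count, e.g. (pad-right wcw/display-width "hello, 🌏" 12), should pad according to the displayed width rather than the number of chars.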

API Documentation

API documentation is available here. The unit tests provide comprehensive usage examples.

Contributor Information

Contributing Guidelines

Bug Tracker

Code of Conduct

Developer Workflow

This project uses the git-flow branching strategy, with the caveat that the permanent branches are called main and dev, and any changes to the main branch are considered a release and auto-deployed (JARs to Clojars, API docs to GitHub Pages, etc.).

For this reason, all development must occur either in branch dev, or (preferably) in temporary branches off of dev. All PRs from forked repos must also be submitted against dev; the main branch is only updated from dev via PRs created by the core development team. All other changes submitted to main will be rejected.

Build Tasks

clj-wcwidth uses tools.build. You can get a list of available tasks by running:

clojure -A:deps -T:build help/doc

Of particular interest are:

  • clojure -T:build test - run the unit tests
  • clojure -T:build lint - run the linters (clj-kondo and eastwood)
  • clojure -T:build ci - run the full CI suite (check for outdated dependencies, run the unit tests, run the linters)
  • clojure -T:build install - build the JAR and install it locally (e.g. so you can test it with downstream code)

Please note that the deploy task is restricted to the core development team (and will not function if you run it yourself).

License

Copyright © 2022 Peter Monks

Distributed under the Apache License, Version 2.0.

SPDX-License-Identifier: Apache-2.0