/UnicodeBOMInputStream

Doing things right, in the name of Sun / Oracle

Primary LanguageJavaOtherNOASSERTION

UnicodeBOMInputStream

A helper class to skip Unicode BOMs at the beginning of input streams.

I initially released this class as a Stack Overflow answer and it apparently got copy-pasted into several Java projects already. However, code put as answers on Stack Overflow is licensed under CC-BY-SA 3.0 which may not suit everybody.


Why?

Many years have passed since I wrote this class and today Java still doesn't properly deal with UTF-8 Unicode Byte Order Marks (BOMs) at the beginning of data. In 2001, someone opened bug JDK-4508058 with the sound expectation Java should detect and skip UTF-8 BOMs at the beginning of UTF-8 streams, the same way it does for e.g. UTF-16.

Bug JDK-4508058 remained open for a while, then got fixed and ultimately reverted because some other great programmers relied on that exact same bug:

the Java EE 5 RI and SJSAS 9.0 has been relying on detecting a BOM, setting the appropriate encoding, and discarding the BOM bytes before reading the input

See, they're complaining because shipped code breaks if/when JDK behavior changes. And instead of fixing JDK-4508058 and accept this would be an annoyance only for Java EE 5 RI and SJSAS 9.0 users, people in charge at Sun decided we're all living in a better world if JDK-4508058 gets closed as "won't fix". Because fuck you, just skip the BOM yourself.


Usage

Wrap any InputStream with UnicodeBOMInputStream and use getBOM() and/or skipBOM() methods. See UnicodeBOMInputStreamUsage.java.


If you find this library useful and decide to use it in your own projects please drop me a line @gpakosz.

If you use it in a commercial project, consider using Gittip.