Just some C/C++ scraps, written or scraped from various sources, with the aim of understanding and experimenting with utf-8: a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. Everything here is a test, experimental, WIP...
For example, I found a header-only utf8 cpp source at https://sourceforge.net/projects/utfcpp/, and this small project generates a utf8-test app using that header source.
Others listed below...
- git - http://git-scm.com/downloads
- cmake - http://www.cmake.org/download/
- Native build tools to suit generator used.
This project uses the cmake build file generator.
With a makefile generator, typically in unix:
- cd build
- cmake ..
- make
With any generator, including MSVC in Windows:
- cd build
- cmake ..
- cmake --build . --config Release
The 'build' directory contains convenient build scripts - build-me.bat and build-me.sh - It should be relatively easy to modify these to suit your particular environment.
Of course the cmake GUI can also be used, setting the source directory, and the binary directory to the 'build' folder. And in Windows, the MSVC IDE can be used if this is the chosen generator.
All binaries are experimental. They started as an exercise to understand utf-8 character sequencing. All are WIP!
Given an input file, check if there are any invalid utf-8 sequences in the file.
20171102: Some small fixes and update.
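The check for invalid sequences described above could be sketched roughly as below. This is only an illustration, not the code used in this project or in utfcpp; the function name first_invalid_utf8 is made up for the example. It rejects stray continuation bytes, truncated sequences, overlong forms, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not taken from this project or from utfcpp.
// Returns the byte offset of the first invalid UTF-8 sequence in `data`,
// or std::string::npos if the whole buffer is well-formed.
std::size_t first_invalid_utf8(const std::string& data)
{
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(data[i]);
        std::size_t len = 0;       // total bytes in this sequence
        std::uint32_t cp = 0;      // code point being assembled
        std::uint32_t min_cp = 0;  // smallest value allowed for this length

        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return i;  // stray continuation byte or invalid lead byte

        if (len > n - i) return i;  // sequence runs past the end of the data

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            if ((b & 0xC0) != 0x80) return i;  // not a 10xxxxxx continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp) return i;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return i;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return i;                 // beyond the Unicode range
        i += len;
    }
    return std::string::npos;
}
```

Read the file in binary mode into a std::string, pass it in, and any result other than std::string::npos is the offset of the first bad sequence.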
Given a unicode input file, write it out as utf-8.
More tests on being able to recognise, and correctly step over utf-8 character sequences.
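Stepping over character sequences mostly comes down to reading the lead byte, which says how many bytes the sequence occupies. A minimal sketch, assuming the text has already been checked for validity; the names utf8_sequence_length and count_utf8_characters are invented for this example and are not from the project sources.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes in the sequence that starts with
// `lead`, or 0 if `lead` cannot start a sequence (for example a
// continuation byte of the form 10xxxxxx).
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx : ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: counts characters by hopping from lead byte to
// lead byte. A byte that cannot start a sequence is stepped over on its
// own and still counted, so the loop always makes progress.
std::size_t count_utf8_characters(const std::string& text)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(text[i]));
        if (len == 0) len = 1;
        i += len;
        ++count;
    }
    return count;
}
```

Note this does not look at the continuation bytes at all, so it is only a stepping aid, not a validity check.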
For WIN32 only - This is a WinMain app, but it opens a console, and re-sets stdout and stdin so printf output has somewhere to go.
Read up to the first 8 bytes of each input file, and attempt to identify whether it starts with a known BOM (byte order mark).
Some experiments with including, compiling and outputting UTF-8 characters. Used by chk-con only.
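As a rough idea of what BOM identification involves, here is a small sketch. The list of BOMs covers only the common UTF-8, UTF-16 and UTF-32 marks, which is an assumption; the project app may recognise more. The function name identify_bom is made up for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns the name of the BOM found at the start of
// `buf`, or "none". Only the common UTF-8/16/32 marks are listed here.
std::string identify_bom(const unsigned char* buf, std::size_t len)
{
    struct Bom {
        const char*   name;
        unsigned char bytes[4];
        std::size_t   size;
    };
    // The 4-byte marks come first: the UTF-32 LE mark begins with the
    // same two bytes (FF FE) as the UTF-16 LE mark.
    static const Bom known[] = {
        { "UTF-32 BE", { 0x00, 0x00, 0xFE, 0xFF }, 4 },
        { "UTF-32 LE", { 0xFF, 0xFE, 0x00, 0x00 }, 4 },
        { "UTF-8",     { 0xEF, 0xBB, 0xBF, 0x00 }, 3 },
        { "UTF-16 BE", { 0xFE, 0xFF, 0x00, 0x00 }, 2 },
        { "UTF-16 LE", { 0xFF, 0xFE, 0x00, 0x00 }, 2 },
    };
    for (const Bom& bom : known) {
        if (len >= bom.size && std::memcmp(buf, bom.bytes, bom.size) == 0)
            return bom.name;
    }
    return "none";
}
```

One catch: a UTF-16 LE file whose first character is U+0000 looks the same as a UTF-32 LE BOM, so the result is a best guess rather than a proof.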
Given a decimal, or hexadecimal code point input, show, and output the utf-8 sequence...
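Turning a code point into its utf-8 sequence is just bit shuffling. The sketch below shows one way to do it; parse_code_point and encode_utf8 are names invented for this example, and treating a leading "0x" as hexadecimal is an assumption about the input format, not necessarily what the app accepts.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: parses "8364" as decimal and "0x20AC" as hex.
// std::stoul throws std::invalid_argument if no digits can be parsed.
std::uint32_t parse_code_point(const std::string& text)
{
    const bool is_hex = text.size() > 2 && text[0] == '0' &&
                        (text[1] == 'x' || text[1] == 'X');
    return static_cast<std::uint32_t>(std::stoul(text, nullptr, is_hex ? 16 : 10));
}

// Hypothetical helper: returns the UTF-8 bytes for `cp`, or an empty
// string if `cp` is a surrogate or lies above U+10FFFF.
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return out;

    if (cp < 0x80) {            // 1 byte : 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, encode_utf8(parse_code_point("0x20AC")) gives the three bytes E2 82 AC, the euro sign.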
Have FUN!
Geoff.
20190912 - 20171102 - 20160127 - 20151209 - 20150420
; eof