This is a c library which encodes C string into various formats, however till now, only UTF-8 encoding-decoding has been implemented.
Current version is 0.0.4 ALPHA
.
It is a cross platform C11 library which has been built and run successfully in Windows 10
and Ubuntu Linux
. Please raise concern in case of any issues.
You can clone the repository and simply use make
to build the library.
git clone https://github.com/antaripchatterjee/CEncode.git
cd CEncode
make
The above commands will only build the debug version of the library, and it will also build an executable to test it's functionalities.
Use the below command to test them.
./test_utf8
Incase of any failed test case, please raise a concern.
If everything is fine, just execute the below command to build the stable static libraries.
make install
Use below command to clear all the builds and objects.
make clear
To encode your string into UTF-8 format, add the library and header files into your project and then you must include utf8.h
header in your source code. It contains minimal number of C functions which let us encode the string data.
#include <utf8.h>
...
The below enum
are some methods which are used to identify the encoding or decoing operation.
enum __utf_method {
UNKNOWN, // Unknown method
ENC_UTF_8, // Encoded to UTF-8 text
DEC_UTF_8 // Decoded from UTF-8 text
};
typedef enum __utf_method utf_method; // Type definition of __utf_method
The C structure utf_t
is used as the return type of the function utf8_encode
and utf8_decode
function.
struct __utf_t {
wchar_t *w_str; // Holds the actual encoded or decoded text
size_t w_size; // Can be used to get the lenght of the property w_str
utf_method (*method)(); // Can be used to check if the property w_str is UTF-8 encoded or decoded string.
};
typedef struct __utf_t utf_t; // Type definition for __utf_t
C function utf8_encode
can be used to encode a text into UTF-8 format. The function signature is as below.
extern utf_t* utf8_encode(const wchar_t*, size_t);
It takes two arguments, the first argument is the pointer of wchar_t
and the second argument is the size of the input string, but if we keep it as 0
, then the function will recalculate the size of the input string.
C function utf8_decode
can be used to decode an encoded text into UTF-8 format. The function signature is as below.
extern utf_t* utf8_decode(const wchar_t*, size_t);
It also takes two arguments, the first argument is the pointer of wchar_t
which indeed an encoded text, and the second argument is the size of the encoded string, but if we keep it as 0
, then the function will recalculate the size of the encoded string.
This C function is used to convert a pointer of wchar_t
to the pointer of unsigned char
.
extern int copy_as_ustring(const wchar_t*, unsigned char*, size_t);
This C function does a opposite job that the C function copy_as_u_string
.
extern int copy_as_wstring(const unsigned char*, wchar_t*, size_t);
P.S: For both of the above functions, always add +1 to the length of the input string and pass it as a third argument.
This function stringify the encoding or decoding method.
extern const char* stringify_utf_method(utf_method);
As of now, the methods can be one of the below.
- "UNKNOW"
- "ENC_UTF_8"
- "DEC_UTF_8"
This C function free the allocated memory of a valid pointer of utf_t
. Signature is given below.
extern void utf_free(utf_t*);
This C macro takes a valid pointer of unsigned char
to calculate it's length.
#define ucslen(__ucs) strlen((const char*) __ucs)
This C library comes with MIT License.
Some major and minor changes will be applied in the future versions, however this library can be used safely.