DjvuNet is a cross-platform, fully managed .NET library for working with DjVu documents that runs on Linux, macOS, and Windows. The library is written in C# and targets .NET Core v3.0 and .NET Standard v2.1 or later. We intend to provide DjVu document processing capabilities on par with, or better than, the DjVuLibre reference library.

DjvuNet Library

Introduction

DjvuNet is an open source library designed to process and create documents encoded in the DjVu format. The library is written in C# for the .NET platform and has no external dependencies. It supports version 3 of the DjVu format specification up to minor version 26 (v3.26). The so-called "Secure DjVu" format is not supported because its specification was never published. The project was started several years ago by Telavian and, after a period of inactivity, is now continued at the new DjvuNet repository on GitHub. The code is not production ready. There are known bugs, but it should work on a large number of DjVu files, although that is still only a subset of all DjVu files found in the wild. Therefore, use it at your own risk and do not blame us for any problems.

Current Status

The DjvuNet library is not ready for production use. Several known bugs need to be fixed and several missing features need to be implemented before the library can be treated as production ready or fully functional. Furthermore, some bugs in the image decoder leave some images distorted, making them useless.

The library supports the full .NET Framework 4.7.2 or newer on Windows and .NET Core 3.0.0 or newer on Windows, Linux, and macOS.

The project is undergoing several architectural and implementation changes, which are made in the "dev" branch.

  • The DjVu file format parser was optimized and refactored, which so far has resulted in a speedup of more than 10x.

  • Image data decoding and encoding with the Interpolated Dubuc-Deslauriers-Lemire (DDL) (4, 4) Discrete Wavelet Transform is close to being finished but still has a couple of bugs that need to be fixed.

  • Only very limited optimization work has been done in this area, yielding a performance improvement of some 30 - 40% and identifying several further optimization targets.

  • The ZP arithmetic coder and BZZ encoding/decoding are fully implemented and have reached binary compatibility with DjVuLibre. They still await final optimizations.

  • JB decoding is implemented but not optimized; encoding is not implemented.

  • Image segmentation for Mixed Raster Content, done in DjVuLibre with ColorPalette histogram calculation, will be entirely rewritten, as image segmentation algorithms have progressed significantly in the last two decades.

  • Support for some DjVuLibre masked image formats is not implemented yet.

  • The test framework is developed systematically and is composed of unit and functional tests. It covers the project top-down and provides around 85% code coverage using 2,586 test cases; the target is more than 90% code coverage.

  • Performance tests are based on the DjvuNetTest project, with some additional benchmarks planned for implementation soon.

DjVu Format Support Validation

Format handling of the whole library is validated against the DjVuLibre reference implementation of the DjVu format and its supporting tools. .NET bindings for the majority of the C API are available in the DjvuNet.DjvuLibre project. It builds for x86 and x64 targets only. An AnyCPU target may become available via NuGet packaging or, alternatively, by embedding native binaries in the managed assembly; the issue is still open.

DjVuLibre was modified in three ways: the libdjvulibre build was integrated with the DjvuNet projects, some C APIs were expanded with exports of memory management functions, and JSON-formatted output was added to some dump functions and tools (djvudump), together with functions that bypass the s-expression formatting used in text retrieval.

The modified library used for testing the DjvuNet implementation of the DjVu format is available here: DjVuLibre for DjvuNet.

Due to the more restrictive licensing conditions of DjVuLibre, the .NET bindings project DjvuNet.DjvuLibre is dual-licensed under the MIT and GPL v2 licenses.

DjvuNet is developed as part of a larger effort to create a framework for the analysis and understanding of scientific information.

The steps in data analysis comprise data retrieval, reading of data, and conversion of data into a format that can be processed further. This project covers a small part of the pipeline, dealing with the input of data encoded in the DjVu format.

License

DjvuNet is licensed under the MIT license.

DjvuNet.DjvuLibre is dual-licensed under the MIT license and GPL v2 or later.

DjVuLibre, used for format support validation, is licensed under GPL v2 or later.

Building

Windows for .NET Core 3.0 and later

Prerequisites

  • Visual Studio 2019 RTM v16.3 with at least the following workloads: .NET desktop development, desktop development with C++, .NET Core cross-platform development

  • Git

  • Internet access for restoring dependencies

Building

Building from the command line on Windows (tested on Windows 10 with Visual Studio 2019 v16.3 installed).

Open the Visual Studio 2019 developer command prompt and clone the repository:

git clone https://github.com/DjvuNet/DjvuNet.git

Change directory to your repo:

cd DjvuNet

From here, run the build.cmd script from the command line (the command accepts multiple configuration parameters):

build -p x86 -c Release -t Rebuild -Test -f netcoreapp3.0

Run build -h to see all available options.

Available configurations: Debug, Release (example: -c Debug); default value: Debug

Available platforms: x86, x64, arm, arm64 (example: -p x64); the default value, AnyCPU, is temporarily not supported for CI builds. DjvuNet.DjvuLibre and libdjvulibre are built only for the x86 and x64 platforms.

Available targets: Clean, Build, Rebuild, Restore (example: -t Clean); default value: Rebuild

To build with Visual Studio, open the DjvuNet.sln file located in the root directory of the cloned DjvuNet repository and build DjvuNet.csproj or the entire solution.

Testing

Test data are stored in a separate repository, artifacts. Clone that repository with the following git command (run it from the DjvuNet repo root directory):

git clone --depth 1 https://github.com/DjvuNet/artifacts.git

Tests can be run by building the DjvuNet.Tests.dll and DjvuNet.Wavelet.Tests.dll assemblies and running their tests either from Test Explorer in Visual Studio or with the xUnit test runner from the command line.

All tests should pass, except for those marked as skipped.

Performance tests can be run with the help of the DjvuNetTest project.

Windows for netstandard2.1 or netcoreapp3.0 target

Prerequisites

  • Visual Studio 2019 v16.3 with the following workloads: .NET desktop development, desktop development with C++, .NET Core cross-platform development. VS 2019 versions can be installed side by side, and a preview version can safely be used alongside RTM versions.

  • .NET Core 3.0.100 SDK

  • Git

  • Internet access for restoring dependencies

Building

Building from the command line on Windows (tested on Windows 10 with Visual Studio 2019 v16.3 installed).

Open the Visual Studio 2019 developer command prompt and clone the repository:

git clone https://github.com/DjvuNet/DjvuNet.git

Change directory to your repo:

cd DjvuNet

Clone DjVuLibre from the DjvuNet GitHub organization (this library was modified to integrate it into the DjvuNet project):

git clone https://github.com/DjvuNet/DjVuLibre.git

The DjVuLibre repo is now located in the DjVuLibre directory of your DjvuNet repo.

From command prompt in DjvuNet root directory run:

 build -c {Configuration} -p {Platform} -f {Framework}

Testing

Test setup for all targets is not streamlined and is somewhat involved. To avoid problems, use the build script as follows:

build -c {Configuration} -p {Platform} -f {Framework} -Test

Building and testing of the libdjvulibre and DjvuNet.DjvuLibre libraries can be skipped by passing the -sn or -SkipNative command line switch to the build script:

build -c {Configuration} -p {Platform} -f {Framework} -Test -sn

Linux for netstandard2.1 / netcoreapp3.0 target

Temporarily not supported by the build scripts; however, one can build manually (the instructions below are out of date).

Prerequisites

Tested on Ubuntu 18.04.

Install required tools and dependencies:

sudo apt-get update
sudo apt-get install git zip unzip curl libgdiplus

Install the .NET Core 3.0.100 SDK (check the official .NET Core installation instructions for the latest steps):

  1. Remove any previous preview versions of .NET Core from your system.
  2. Register the Microsoft Product key as trusted.
curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.gpg
sudo mv microsoft.gpg /etc/apt/trusted.gpg.d/microsoft.gpg
  3. Set up the desired version host package feed.

Ubuntu 17.04

sudo sh -c 'echo "deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-zesty-prod zesty main" > /etc/apt/sources.list.d/dotnetdev.list'
sudo apt-get update

Ubuntu 16.04 / Linux Mint 18

sudo sh -c 'echo "deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-xenial-prod xenial main" > /etc/apt/sources.list.d/dotnetdev.list'
sudo apt-get update

Ubuntu 14.04 / Linux Mint 17

sudo sh -c 'echo "deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-trusty-prod trusty main" > /etc/apt/sources.list.d/dotnetdev.list'
sudo apt-get update
  4. Install .NET Core.
sudo apt-get install dotnet-sdk-2.0.0
  5. Run the dotnet --version command to verify that the installation succeeded.
dotnet --version
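
The version check in the last step can also be scripted. The following is a minimal sketch and not part of the DjvuNet build scripts: the helper name version_at_least is hypothetical, the 3.0.100 minimum is taken from the SDK version named above, and the comparison assumes that sort -V (version sort from GNU coreutils) is available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to
# version $2. Relies on `sort -V` (version sort) from GNU coreutils.
version_at_least() {
    # After an ascending version sort, the first line is the smaller version;
    # $1 >= $2 holds exactly when that smaller version is $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.0.100"   # assumption: minimum SDK version named in this README

if command -v dotnet >/dev/null 2>&1; then
    installed="$(dotnet --version)"
    if version_at_least "$installed" "$required"; then
        echo "dotnet SDK $installed satisfies the $required minimum"
    else
        echo "dotnet SDK $installed is older than $required"
    fi
else
    echo "dotnet was not found on PATH"
fi
```

Preview version strings may not sort as expected, so treat the result as a hint rather than a guarantee.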

Building

Clone repository:

git clone https://github.com/DjvuNet/DjvuNet.git

Change directory to cloned DjvuNet repo:

cd DjvuNet

Restore dependencies

dotnet restore DjvuNet.Core.sln

Build either the netstandard2.0 or the netcoreapp2.0 target:

# For netstandard2.0 build
cd DjvuNet.NETStandard2.0
dotnet build -c Release

# For netcoreapp2.0 build
cd DjvuNet.Core
dotnet build -c Release

Testing

Download required test artifacts into DjvuNet repository root and extract them to artifacts directory:

curl -L -o artifacts.zip -s https://github.com/DjvuNet/artifacts/releases/download/v0.7.0.11/artifacts.zip
unzip -q artifacts.zip -d artifacts
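
When the setup is repeated, the download can be skipped if the artifacts directory is already present. The sketch below wraps the two commands above in a function; the name fetch_artifacts and its two parameters are hypothetical, and the only change to the original commands is the added -f flag, which makes curl fail on HTTP errors instead of saving an error page.

```shell
#!/bin/sh
# Hypothetical helper: download and unpack the test artifacts only when the
# destination directory does not exist yet.
#   $1 - URL of the artifacts archive
#   $2 - destination directory
fetch_artifacts() {
    url="$1"
    dest="$2"
    if [ -d "$dest" ]; then
        echo "skip: $dest already exists"
        return 0
    fi
    # -L follows redirects, -f fails on HTTP errors, -s silences progress.
    curl -L -f -s -o artifacts.zip "$url" || return 1
    unzip -q artifacts.zip -d "$dest"
}
```

From the DjvuNet repository root it would be called as fetch_artifacts https://github.com/DjvuNet/artifacts/releases/download/v0.7.0.11/artifacts.zip artifacts.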

Build and run the DjvuNet tests (commands start from the repo root):

cd DjvuNet.Tests.Core
dotnet build -c Release
dotnet xunit -configuration Release -parallel none -nobuild -notrait Category=SkipNetCoreApp -framework netcoreapp2.0

# Return to repo root
cd ..

cd DjvuNet.Wavelet.Tests.Core
dotnet build -c Release
dotnet xunit -configuration Release -parallel none -nobuild -notrait Category=SkipNetCoreApp -framework netcoreapp2.0
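
Both test projects are built and run with the same two commands, so the sequence can be expressed once. This is a sketch under assumptions rather than a script shipped with DjvuNet: the function names run and test_project and the DRY_RUN variable are hypothetical, while the dotnet arguments are copied from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: build and test one project directory ($1).
# The subshell keeps the caller's working directory unchanged, so no
# "cd .." is needed between projects.
test_project() {
    (
        cd "$1" || exit 1
        run dotnet build -c Release &&
        run dotnet xunit -configuration Release -parallel none -nobuild \
            -notrait Category=SkipNetCoreApp -framework netcoreapp2.0
    )
}
```

From the repo root, test_project DjvuNet.Tests.Core followed by test_project DjvuNet.Wavelet.Tests.Core runs the same steps as the commands above. Setting DRY_RUN=1 prints the commands without executing them.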

macOS for netstandard2.1 / netcoreapp3.0 target

Temporarily not supported by the build scripts; however, one can build manually (the instructions below are out of date).

Supported on macOS 10.12 "Sierra" and later versions.

Prerequisites

Download and install the .NET Core SDK from .NET Downloads.

Building and Testing

Follow the Linux instructions for Building and Testing.

Usage

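using System.Drawing.Imaging; // provides ImageFormat
using DjvuNet;

// Create an empty document, then load a file into it.
using (DjvuDocument doc = new DjvuDocument())
{
    doc.Load("Document.djvu");
    if (doc.Pages.Length > 0)
    {
        var firstPage = doc.Pages[0];
        var lastPage = doc.Pages[doc.Pages.Length - 1];

        // Render the first page to a bitmap and save it as a PNG file.
        using (System.Drawing.Bitmap pageImage = firstPage.BuildPageImage())
            pageImage.Save("DocumentTestImage1.png", ImageFormat.Png);

        // Extract the text of the first and last pages.
        string firstPageText = firstPage.Text;
        string lastPageText = lastPage.Text;
    }
}

Alternatively, the document path can be passed to the constructor:

using System.Drawing.Imaging; // provides ImageFormat
using DjvuNet;

using (DjvuDocument doc = new DjvuDocument("Mcguffey's_Primer.djvu"))
{
    var page = doc.Pages[0];
    using (System.Drawing.Bitmap pageImage = page.BuildPageImage())
    {
        pageImage.Save("TestImage1.png", ImageFormat.Png);
        string pageText = page.Text;
    }
}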

Known Issues

  • Tests for .NET Core cannot be run from Visual Studio.

  • Tests for libraries targeting .NET Standard have to be compiled as netcoreapp3.0 binaries, as xunit does not support testing netstandard2.1 binaries.

Reporting Issues

In case of build, test, or DjvuNet library usage problems, open a new issue in the DjvuNet repo on GitHub and provide detailed information on the error (logs, command line output, stack trace, minidump) and on the system used.

We will try to address all problems quickly, unless they depend on missing features or known bugs, which will be implemented or fixed according to our roadmap.