

Primary language: Jupyter Notebook. License: MIT.

Bijoy To Avro

a.k.a. ANSI to Unicode conversion


Intro

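ANSI and Unicode are text-encoding standards used around the world by writers and everyday users. ANSI is the older scheme, used in operating systems such as Windows 95/98 and earlier. Unicode is the newer standard used by present-day operating systems. Bangla text typed with Bijoy is stored in the legacy ANSI form, while Avro produces Unicode, so Bijoy documents must be converted before Unicode-based software can read them correctly.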
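To make the difference concrete, here is a minimal Python sketch using only the standard library. It shows that Unicode Bangla text uses code points from the Bengali block and round-trips through UTF-8. By contrast, cp1252 cannot represent those characters at all. cp1252 is the Windows code page usually meant by "ANSI" for Western text, and picking it here is an assumption for illustration. Legacy Bijoy documents typically work around this limit by drawing Bangla glyphs in a special font on top of ordinary Latin code points.

```python
# Compare a Unicode Bangla word in UTF-8 with a single-byte "ANSI" code page.
word = "\u09AC\u09BE\u0982\u09B2\u09BE"  # "বাংলা" (Bangla) written as explicit code points

# Every character lies in the Unicode Bengali block, U+0980 to U+09FF.
code_points = [f"U+{ord(ch):04X}" for ch in word]
print(code_points)  # → ['U+09AC', 'U+09BE', 'U+0982', 'U+09B2', 'U+09BE']

# UTF-8 stores each of these code points in 3 bytes and decodes back losslessly.
utf8_bytes = word.encode("utf-8")
print(len(utf8_bytes), utf8_bytes.decode("utf-8") == word)  # → 15 True

# cp1252, the Western-European Windows "ANSI" code page, has no Bengali characters.
try:
    word.encode("cp1252")
    fits_in_ansi = True
except UnicodeEncodeError:
    fits_in_ansi = False
print(fits_in_ansi)  # → False
```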

Installation

Open a terminal and run:

    git clone https://github.com/mohsin-riad/automation-bijoy-to-avro.git
    cd automation-bijoy-to-avro/Source/

Workflow

This repository converts legacy ANSI-encoded .doc documents into Unicode .docx documents, which are then converted into .txt files.
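Below is a minimal sketch of the ANSI-to-Unicode step. It assumes the approach commonly used for Bijoy text and is not this repository's actual code. The approach has two steps:

1. Replace each legacy glyph code with its Unicode character through a lookup table, matching longer codes first.
2. Move pre-base vowel signs such as ি to after their consonant, where Unicode expects them. Bijoy text stores these signs before the consonant.

`BIJOY_TO_UNICODE` is a hypothetical, deliberately tiny subset of a SutonnyMJ-style table, and `bijoy_to_unicode` is an illustrative name.

```python
# Hypothetical, deliberately tiny subset of a Bijoy (SutonnyMJ-style) glyph table.
# A real converter needs the complete table, including conjunct glyphs.
BIJOY_TO_UNICODE = {
    "Av": "\u0986",  # আ: two legacy glyph codes form one independent vowel
    "A": "\u0985",   # অ
    "B": "\u0987",   # ই
    "M": "\u0997",   # গ
    "b": "\u09A8",   # ন
    "e": "\u09AC",   # ব
    "g": "\u09AE",   # ম
    "j": "\u09B2",   # ল
    "q": "\u09DF",   # য়
    "s": "\u0982",   # ং (anusvara)
    "v": "\u09BE",   # া (vowel sign AA)
    "w": "\u09BF",   # ি (vowel sign I, stored before its consonant in Bijoy text)
}

# Vowel signs ি, ে, ৈ are displayed before the consonant but follow it in Unicode order.
PRE_BASE_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}


def is_consonant(ch: str) -> bool:
    # Bengali consonants: U+0995..U+09B9 plus ড়, ঢ়, য় (U+09DC, U+09DD, U+09DF).
    return "\u0995" <= ch <= "\u09B9" or ch in "\u09DC\u09DD\u09DF"


def bijoy_to_unicode(text: str) -> str:
    # Step 1: greedy longest-match lookup of legacy glyph codes.
    max_len = max(len(key) for key in BIJOY_TO_UNICODE)
    chars = []
    i = 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            if chunk in BIJOY_TO_UNICODE:
                chars.append(BIJOY_TO_UNICODE[chunk])
                i += len(chunk)
                break
        else:
            chars.append(text[i])  # unknown code (space, digit, ...): pass through
            i += 1

    # Step 2: move each pre-base vowel sign after the consonant it precedes.
    out = []
    j = 0
    while j < len(chars):
        if chars[j] in PRE_BASE_SIGNS and j + 1 < len(chars) and is_consonant(chars[j + 1]):
            out.extend([chars[j + 1], chars[j]])
            j += 2
        else:
            out.append(chars[j])
            j += 1
    return "".join(out)
```

Under that assumed table, `bijoy_to_unicode("Avwg evsjvq Mvb MvB")` returns the Unicode sentence আমি বাংলায় গান গাই. The reordering only handles a single following consonant. Conjuncts and the two-part vowel signs ো and ৌ need extra rules in a complete converter.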

Methodology

  • First, choose the target directory, /Target_Path.

  • Run all notebook cells, following the instructions in the notebook.

  • Go to /Target_Path and, in the terminal, run cd .. to move up to its parent directory.

  • There you will find a directory whose name starts with mod- (/mod-*).

  • Sample target directory structure:

    --- Root directory (name doesn't matter)
        |- Target_directory
            |- a
            |  |- a1
            |  |  |- a11
            |  |  |- a12
            |  |- a2
            |- b
            |  |- b1
            |  |- b2
            |  |- b3
            |- c
            |- d
            |- e
            |- f
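The steps above suggest that the results are written beside the target directory, in a folder that mirrors its subfolders. Below is a minimal pathlib sketch of that behaviour. The naming rule `mod-<target name>`, the helper name `mirror_tree`, and the decision to collect only `.doc` files are assumptions, not the notebook's confirmed logic.

```python
from __future__ import annotations

from pathlib import Path


def mirror_tree(target: Path, prefix: str = "mod-") -> tuple[Path, list[Path]]:
    """Recreate target's folder layout in a sibling directory named prefix + target.name.

    Hypothetical helper: returns the output root and the .doc files found,
    as paths relative to target. The real notebook may name or walk things differently.
    """
    target = target.resolve()
    output_root = target.parent / f"{prefix}{target.name}"  # reachable via `cd ..`
    output_root.mkdir(parents=True, exist_ok=True)

    doc_files = []
    for path in sorted(target.rglob("*")):
        relative = path.relative_to(target)
        if path.is_dir():
            # Mirror every subdirectory (a, a/a1, a/a1/a11, b, ...).
            (output_root / relative).mkdir(parents=True, exist_ok=True)
        elif path.suffix.lower() == ".doc":
            # Each legacy .doc would later become a .txt at the mirrored location.
            doc_files.append(relative)
    return output_root, doc_files
```

For the sample tree, the output root would be a sibling folder named mod- followed by the target directory's name. It would contain a, a/a1, a/a1/a11 and the rest of the mirrored subfolders.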

Sample input directory visualization

Sample output directory visualization

  • Here you can see that the files have been converted to .txt files containing Unicode text.
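The final .docx to .txt step can be approximated with the Python standard library alone. A .docx file is a ZIP archive whose body text lives in `word/document.xml`, inside `w:t` runs grouped into `w:p` paragraphs. This sketch is an assumption about how such an export could work, not the repository's code. The names `docx_to_text` and `export_txt` are hypothetical, and the sketch ignores headers, footnotes, and table layout. Writing with `encoding="utf-8"` keeps the Unicode Bangla text intact.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# WordprocessingML namespace used in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path) -> str:
    """Return the body text of a .docx, one line per paragraph (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(f"{W_NS}p"):
        # Concatenate all text runs (w:t) inside this paragraph.
        lines.append("".join(t.text or "" for t in paragraph.iter(f"{W_NS}t")))
    return "\n".join(lines)


def export_txt(docx_path, txt_path) -> None:
    # UTF-8 keeps the Unicode Bangla characters intact in the .txt output.
    Path(txt_path).write_text(docx_to_text(docx_path), encoding="utf-8")
```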

Sample Input File

Sample Output File


Conclusion

ভাষা হোক উন্মুক্ত (Let language be open.)