XML (Extensible Markup Language) is one of the most widely used formats for storing and sharing information between different devices. Some text editors, such as Sublime Text, can parse such files and perform basic operations on them. In this project, a GUI (Graphical User Interface) based program parses and visualizes an XML file and performs several operations on it:
- Detecting Errors
- Fixing the Errors that were detected
- Formatting (or Prettifying)
- Converting the XML to JSON (JavaScript Object Notation)
- Minifying
- Compressing the file
YOUTUBE LINK DEMONSTRATING THE GUI:
https://youtu.be/NnOtau86Bkg
The program can view, edit and save XML files. In addition, it can detect many errors in an XML file, including missing or incorrect closing tags (tag terminators), and fix them. It can also minify and prettify the XML file: minifying removes the indentation before each tag to reduce the file size, while prettifying adds it back to restore the format. It can also compress the XML file to roughly 50% of its original size. Finally, it can convert the XML file to a JSON file.
XML is a markup language created by the World Wide Web Consortium (W3C) to define a syntax for encoding documents that both humans and machines can read. It does this through tags that define the structure of the document, and it is designed for storing and transporting data.
It is probably easiest to compare XML with another markup language you may already know: the Hypertext Markup Language (HTML), used to encode web pages. HTML uses a predefined set of markup symbols (short codes) that describe the format of content on a web page, whereas XML lets authors define their own tags to describe the structure of their data.
Like XML, JSON is a format for representing data, and web applications widely use it to exchange data with each other.
Compared with XML, however, it is a lightweight, text-based data-interchange format, which makes it simpler to read and write.
The JSON format closely follows the syntax for creating JavaScript object literals, so a JavaScript program can easily convert JSON data into native JavaScript objects.
The program is called XML-Editor; it can view, edit and save XML files.
XML-Editor is designed and implemented in C++11 using QtCreator v5.15.1.
Several IDEs were used during development, including Eclipse, VS Code and Visual Studio.
Qt is used to read the XML file into a QString, which is then converted to a standard std::string.
XML_Parser function:
The resulting string is then passed to the XML_Parser function, which extracts its contents into a vector of strings (the Read vector).
Each string in the vector represents one line of the text, and each line is either an opening tag, body text or a closing tag (tag terminator).
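To make the splitting rule concrete, here is a minimal sketch. It assumes that a tag runs from `<` to the next `>` and does not special-case comments, CDATA sections or `>` inside attribute values. It also treats whitespace-only runs between tags as indentation and drops them. The names `xmlParserSketch` and `trim` are hypothetical and are not the project's XML_Parser code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Removes leading and trailing whitespace from a piece of text.
static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Hypothetical sketch: splits raw XML text into entries that are each
// a single tag (opening, closing or declaration) or a run of body text.
std::vector<std::string> xmlParserSketch(const std::string& xml) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] == '<') {
            // A tag runs up to the next '>'; an unterminated tag takes the rest of the input.
            std::size_t close = xml.find('>', i);
            if (close == std::string::npos) close = xml.size() - 1;
            out.push_back(xml.substr(i, close - i + 1));
            i = close + 1;
        } else {
            // Body text runs up to the next '<'; whitespace-only runs (indentation) are dropped.
            std::size_t next = xml.find('<', i);
            if (next == std::string::npos) next = xml.size();
            std::string body = trim(xml.substr(i, next - i));
            if (!body.empty()) out.push_back(body);
            i = next;
        }
    }
    return out;
}
```

For example, `"<a>\n  <b> hello </b>\n</a>"` yields the five entries `<a>`, `<b>`, `hello`, `</b>`, `</a>`.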
Error Checking and Error Fixing (XML_fixError function):
The Read vector is passed to the XML_fixError function, which is responsible for detecting different types of errors, including missing closing tags and incorrect closing tags.
The XML_fixError function returns two vectors. The first, XML_Original, contains the same lines as the Read vector, but with a string flag marking each line that has an error.
The second, XML_fixed, contains the same content with all errors fixed.
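The usual way to detect these two error types is with a stack of open tag names. The sketch below shows that approach under several assumptions: the input is a parser-style vector with one tag or body per entry, and the flag text is the `< ------error here` marker shown in the error dialog. It also adopts a repair policy of its own. If a missing terminator is detected, closers are inserted for the inner tags. A wrong terminator is replaced by the closer of the innermost open tag. A stray terminator is dropped, and tags still open at the end of the input are closed. The names `FixResult`, `tagName` and `xmlFixErrorSketch` are hypothetical; the project's XML_fixError may behave differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the flagged original and the repaired copy.
struct FixResult {
    std::vector<std::string> original;  // input lines, error lines flagged
    std::vector<std::string> fixed;     // lines with closing tags repaired
};

// Extracts "name" from "<name attr='x'>" or "</name>".
static std::string tagName(const std::string& tag) {
    std::size_t start = (tag.size() > 1 && tag[1] == '/') ? 2 : 1;
    std::size_t end = tag.find_first_of(" \t\r\n/>", start);
    if (end == std::string::npos) end = tag.size();
    return tag.substr(start, end - start);
}

FixResult xmlFixErrorSketch(const std::vector<std::string>& lines) {
    const std::string flag = "   < ------error here";
    FixResult r;
    std::vector<std::string> open;  // stack of currently open tag names
    for (const std::string& line : lines) {
        bool isTag = line.size() > 1 && line[0] == '<';
        bool special = isTag && (line[1] == '?' || line[1] == '!');
        bool selfClosing = isTag && line[line.size() - 2] == '/';
        if (!isTag || special || selfClosing) {   // body text, declarations, <x/>
            r.original.push_back(line);
            r.fixed.push_back(line);
        } else if (line[1] != '/') {              // opening tag
            open.push_back(tagName(line));
            r.original.push_back(line);
            r.fixed.push_back(line);
        } else {                                  // closing tag
            std::string name = tagName(line);
            if (!open.empty() && open.back() == name) {  // correct terminator
                open.pop_back();
                r.original.push_back(line);
                r.fixed.push_back(line);
                continue;
            }
            r.original.push_back(line + flag);
            // Look for the name further down the stack: if found, the inner
            // tags above it are missing their terminators.
            std::size_t depth = open.size();
            while (depth > 0 && open[depth - 1] != name) --depth;
            if (depth > 0) {
                while (open.size() >= depth) {  // close inner tags, then `name`
                    r.fixed.push_back("</" + open.back() + ">");
                    open.pop_back();
                }
            } else if (!open.empty()) {         // wrong name: close innermost tag
                r.fixed.push_back("</" + open.back() + ">");
                open.pop_back();
            }                                   // else: stray terminator, dropped
        }
    }
    // Tags still open at the end of the input are missing their terminators.
    while (!open.empty()) {
        r.fixed.push_back("</" + open.back() + ">");
        open.pop_back();
    }
    return r;
}
```

For example, `<a>`, `<b>`, `text`, `</a>` is flagged on `</a>`. Its fixed version becomes `<a>`, `<b>`, `text`, `</b>`, `</a>`.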
XML_Indent function:
Lines in both vectors carry no indentation, which is where the XML_Indent function comes in.
This function is responsible for producing correct indentation for the XML file: it takes the desired vector (Original or Fixed) and outputs a vector of spaces that can be combined with the desired vector to produce a properly formatted XML file.
As a result, indentation is fixed automatically whenever an XML file is viewed, which improves readability without permanently overwriting the original XML file.
The user can keep the corrected indentation by clicking the Save button.
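A minimal sketch of the "vector of spaces" idea follows. It assumes one tag or body per line and uses a default of two spaces per nesting level, which is a choice made here rather than something taken from the project. A closing tag is dedented before its indent is recorded, and an opening tag increases the depth after its indent is recorded. `xmlIndentSketch` and `prettifySketch` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: one indentation string per line, `width` spaces per level.
std::vector<std::string> xmlIndentSketch(const std::vector<std::string>& lines,
                                         std::size_t width = 2) {
    std::vector<std::string> indents;
    std::size_t depth = 0;
    for (const std::string& line : lines) {
        bool isTag = line.size() > 1 && line[0] == '<';
        bool closing = isTag && line[1] == '/';
        bool opening = isTag && !closing && line[1] != '?' && line[1] != '!' &&
                       line[line.size() - 2] != '/';   // skip <x/> and declarations
        if (closing && depth > 0) --depth;  // a terminator sits at its parent's level
        indents.push_back(std::string(depth * width, ' '));
        if (opening) ++depth;               // children go one level deeper
    }
    return indents;
}

// Combines the indentation vector with the lines to produce prettified text.
std::string prettifySketch(const std::vector<std::string>& lines,
                           const std::vector<std::string>& indents) {
    std::string out;
    for (std::size_t i = 0; i < lines.size() && i < indents.size(); ++i)
        out += indents[i] + lines[i] + "\n";
    return out;
}
```

Keeping the indents in a separate vector means the lines themselves never change, so the same lines can be prettified or minified without re-parsing.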
XML_Minify function:
Produces an XML file with no indentation, which decreases the total file size.
Because the indentation is stored in a separate vector from the lines it belongs to, minifying is straightforward.
The XML_Minify function simply writes the content of the desired vector (Original or Fixed) directly to a file without the space vector, which results in an XML file with no indentation or extra spaces.
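The next sketch shows how little work remains once indentation lives in its own vector. It assumes that minified output omits newlines as well as indentation, and it trims any stray whitespace at the ends of each line. Spaces inside body text are kept. `xmlMinifySketch` is a hypothetical name, and the function returns a string rather than writing a file.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: concatenates the lines with no indentation or newlines,
// trimming whitespace at both ends of each line.
std::string xmlMinifySketch(const std::vector<std::string>& lines) {
    std::string out;
    for (const std::string& line : lines) {
        std::size_t b = 0, e = line.size();
        while (b < e && std::isspace(static_cast<unsigned char>(line[b]))) ++b;
        while (e > b && std::isspace(static_cast<unsigned char>(line[e - 1]))) --e;
        out.append(line, b, e - b);  // append the trimmed slice only
    }
    return out;
}
```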
PrintCompressedTree Function:
The function joins the fixed vector back into a single string and compresses it with the Huffman coding algorithm.
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, where the length of each code depends on the frequency of the corresponding character: the most frequent character gets the shortest code and the least frequent character gets the longest code.
The variable-length codes assigned to input characters are prefix codes, meaning that the codes (bit sequences) are assigned so that the code of one character is never a prefix of the code of any other character. This is how Huffman coding ensures there is no ambiguity when decoding the generated bitstream.
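The prefix property is what lets a decoder read the bitstream greedily, without separators between codes. The sketch below illustrates this with a made-up three-symbol table (`a = 0`, `b = 10`, `c = 11`) rather than the project's table. The decoder accumulates bits until they match a code, emits that symbol, and starts over. `decodePrefix` is a hypothetical name.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Decodes a '0'/'1' string with a prefix-free code table. Since no code is a
// prefix of another, the first match while reading left to right is always
// the correct symbol.
std::string decodePrefix(const std::string& bits,
                         const std::map<std::string, char>& table) {
    std::string out, current;
    for (char bit : bits) {
        current += bit;
        auto it = table.find(current);
        if (it != table.end()) {  // complete code found: emit and reset
            out += it->second;
            current.clear();
        }
    }
    if (!current.empty())
        throw std::invalid_argument("trailing bits do not form a code");
    return out;
}
```

With that table, `"01011"` splits unambiguously into `0`, `10`, `11` and decodes to `"abc"`.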
Steps to build the Huffman tree:
The input is an array of unique characters along with their frequencies of occurrence, and the output is a Huffman tree.
1-Create a leaf node for each unique character and build a min heap of all leaf nodes. (The min heap is used as a priority queue: the frequency field is used to compare two nodes, so initially the least frequent character is at the root.)
2-Extract the two nodes with the minimum frequency from the min heap.
3-Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child. Add this node to the min heap.
4-Repeat steps #2 and #3 until the heap contains only one node. The remaining node is the root node, and the tree is complete.
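The following C++ sketch implements the steps above. It uses `std::priority_queue` with a greater-than comparator as the min heap. After the tree is built, it walks the tree and appends `0` for each left edge and `1` for each right edge. `HuffNode`, `huffmanCodes` and `assignCodes` are hypothetical names for the textbook construction, not the project's PrintCompressedTree. The 0/1 edge convention and the tie-breaking between equal frequencies are arbitrary. The resulting codes can therefore differ from the project's table while giving the same total encoded length.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical tree node for this sketch.
struct HuffNode {
    char ch;           // meaningful only for leaves
    std::size_t freq;  // character count, or sum of children for internal nodes
    std::shared_ptr<HuffNode> left, right;
    bool isLeaf() const { return !left && !right; }
};
using NodePtr = std::shared_ptr<HuffNode>;

// Puts the node with the smallest frequency on top, making the queue a min heap.
struct ByFreq {
    bool operator()(const NodePtr& a, const NodePtr& b) const { return a->freq > b->freq; }
};

// Left edges append '0', right edges append '1'.
static void assignCodes(const NodePtr& node, const std::string& prefix,
                        std::map<char, std::string>& codes) {
    if (node->isLeaf()) {
        codes[node->ch] = prefix.empty() ? "0" : prefix;  // one-symbol input still needs a bit
        return;
    }
    assignCodes(node->left, prefix + "0", codes);
    assignCodes(node->right, prefix + "1", codes);
}

std::map<char, std::string> huffmanCodes(const std::string& text) {
    std::map<char, std::size_t> freq;  // frequency of each unique character
    for (char c : text) ++freq[c];

    // Step 1: one leaf per unique character, all placed in the min heap.
    std::priority_queue<NodePtr, std::vector<NodePtr>, ByFreq> heap;
    for (const auto& kv : freq)
        heap.push(std::make_shared<HuffNode>(HuffNode{kv.first, kv.second, nullptr, nullptr}));

    std::map<char, std::string> codes;
    if (heap.empty()) return codes;

    // Steps 2-4: merge the two least frequent nodes until one root remains.
    while (heap.size() > 1) {
        NodePtr left = heap.top(); heap.pop();
        NodePtr right = heap.top(); heap.pop();
        heap.push(std::make_shared<HuffNode>(
            HuffNode{'\0', left->freq + right->freq, left, right}));
    }
    assignCodes(heap.top(), "", codes);
    return codes;
}
```

For `"aaaabbc"` the sketch merges `c` (1) with `b` (2) first, then merges that node with `a` (4). The resulting codes are `a = 1`, `b = 01` and `c = 00`.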
Some of the binary codes assigned to the characters of the file after encoding:
m => 111111    E => 001000011010    7 => 11111011    j => 1000001100
9 => 11111010    H => 001000010    _ => 1111100    , => 00100000111
= => 111001    8 => 0010001    / => 111000    v => 1000111
i => 11011    ' => 001000001101    > => 11010    ` => 0010000011001
< => 11001    S => 0010000000    a => 11000    ) => 10111101010
2 => 1011111    x => 100001    D => 101111011    C => 00100001100
I => 10111101001    h => 0101110    O => 10111101000    e => 000
- => 101111001    k => 100000111    . => 1011110001    P => 0010000111
N => 10111100001111    u => 100010    G => 10111100001110    ? => 00100000011
V => 10111100001101    M => 0010000011000    U => 10111100001100    r => 0110
F => 1011110000101    A => 0010000010    B => 1011110000100    f => 01010
: => 101111000001    ! => 00100000010    R => 1011110000001    p => 00101
" => 11101    t => 0011    q => 1011110000000    n => 0100
y => 101110    c => 010110    s => 10110    5 => 0101111
l => 10101    1 => 011100    d => 10100    4 => 0111010
o => 11110    6 => 0111011    (space) => 1001    0 => 01111
3 => 1000110    z => 10000011010    ( => 10111101011    g => 1000000
w => 001001    b => 10000010    W => 001000011011    T => 10000011011
Because most characters receive codes shorter than the 8 bits of a plain byte, this compresses the file to around 50% of its original size.
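The size reduction only materializes if the code bits are packed eight to a byte. Writing each bit as a `'0'` or `'1'` character would make the file larger than the original. The sketch below shows one way to do the packing, using hypothetical helpers `encodeBits` and `packBits`. It writes the most significant bit first and pads the final byte with zero bits. A real compressed format would also need to store the bit count and the code table so that the file can be decoded; that part is omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Concatenates the code of every character of `text` into one '0'/'1' string.
std::string encodeBits(const std::string& text, const std::map<char, std::string>& codes) {
    std::string bits;
    for (char c : text) bits += codes.at(c);  // throws if a character has no code
    return bits;
}

// Packs a '0'/'1' string into bytes, most significant bit first; the last byte
// is padded with zero bits.
std::vector<std::uint8_t> packBits(const std::string& bits) {
    std::vector<std::uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i] == '1')
            bytes[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
    return bytes;
}
```

For example, with codes `a = 1`, `b = 01`, `c = 00`, the 7-byte text `"aaaabbc"` encodes to the 10 bits `1111010100`. These pack into 2 bytes of payload.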
XML_JSON function:
JSON (JavaScript Object Notation) is another format used to represent data. Converting XML to JSON is helpful, especially when working with JavaScript, since many libraries and tools use JSON notation.
The function takes the fixed vector of strings, where each string represents one line of the XML file, and returns a vector of strings, where each string represents one line of the JSON file.
Each opening tag is written, followed by its contents enclosed in braces. Repeated tags are not written again; instead, the braces of the first tag are closed, followed by a comma and a new pair of braces for the content of the repeated tag. Closing tags are not written; their braces are closed instead.
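The conversion can be sketched as a two-stage process: build an element tree from the fixed lines, then emit it as JSON. The sketch below makes several assumptions. It puts the brace groups of repeated sibling tags inside a JSON array so the output stays valid JSON, a common convention that is assumed here. Leaf elements become strings, and body text of elements that also have children is ignored. Only `"` and `\` are escaped. The sketch also returns a single string rather than a vector of lines. `Elem`, `parseElem` and `xmlToJsonSketch` are hypothetical names, not the project's XML_JSON function.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory element built from the line vector.
struct Elem {
    std::string name;
    std::string text;            // body text (emitted only for leaves)
    std::vector<Elem> children;
};

static std::string nameOf(const std::string& tag) {  // "<name ...>" or "</name>"
    std::size_t start = (tag.size() > 1 && tag[1] == '/') ? 2 : 1;
    std::size_t end = tag.find_first_of(" \t/>", start);
    return tag.substr(start, end == std::string::npos ? std::string::npos : end - start);
}

static bool isSelfClosing(const std::string& s) {
    return s.size() > 2 && s.compare(s.size() - 2, 2, "/>") == 0;
}

// Builds the element whose opening tag is lines[i]; leaves i past its closing tag.
static Elem parseElem(const std::vector<std::string>& lines, std::size_t& i) {
    Elem e;
    e.name = nameOf(lines[i]);
    if (isSelfClosing(lines[i++])) return e;
    while (i < lines.size() && lines[i].compare(0, 2, "</") != 0) {
        if (!lines[i].empty() && lines[i][0] == '<') e.children.push_back(parseElem(lines, i));
        else e.text += lines[i++];
    }
    ++i;  // skip the closing tag (assumes the lines were already fixed)
    return e;
}

static std::string quote(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out + "\"";
}

static std::string toJson(const Elem& e) {
    if (e.children.empty()) return quote(e.text);
    // Group children by tag name, keeping the order in which names first appear.
    std::vector<std::string> order;
    std::map<std::string, std::vector<const Elem*>> groups;
    for (const Elem& c : e.children) {
        if (groups.find(c.name) == groups.end()) order.push_back(c.name);
        groups[c.name].push_back(&c);
    }
    std::string out = "{";
    for (std::size_t k = 0; k < order.size(); ++k) {
        const std::vector<const Elem*>& g = groups[order[k]];
        out += (k ? ", " : "") + quote(order[k]) + ": ";
        if (g.size() == 1) {
            out += toJson(*g[0]);
        } else {  // repeated sibling tags: one brace group per occurrence
            out += "[";
            for (std::size_t j = 0; j < g.size(); ++j) out += (j ? ", " : "") + toJson(*g[j]);
            out += "]";
        }
    }
    return out + "}";
}

// Skips declarations such as <?xml ...?>, then wraps the root element in an object.
std::string xmlToJsonSketch(const std::vector<std::string>& lines) {
    std::size_t i = 0;
    while (i < lines.size() &&
           (lines[i].compare(0, 2, "<?") == 0 || lines[i].compare(0, 2, "<!") == 0)) ++i;
    if (i >= lines.size()) return "{}";
    Elem root = parseElem(lines, i);
    return "{" + quote(root.name) + ": " + toJson(root) + "}";
}
```

With this sketch, a `<users>` root containing two `<user>` elements, each holding an `<id>`, becomes `{"users": {"user": [{"id": "1"}, {"id": "2"}]}}`.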
- Starting the program.
- Opening an existing XML.
- Using (Filter by Extension) option.
- Opening an XML file that has some format errors.
- Pressing the "Check For Errors" button.
A dialog appears in which each error is marked with "< ------error here", with the option to Fix the errors or Cancel and keep the file as it is.
- Selecting the "Fix" button.
- Selecting the "Minify" button.
- Using the "Format (Prettify)" button to add the correct indentation again.
- Using the "Compress" button.
- Using the "Convert To JSON" button.
XML converted to JSON
Time complexity of each function:
XML_Parser (Reading XML): O(n)
XML_fixError (Checking errors and fixing them): O(n)
XML_Indent (Adding indentations): O(n)
XML_Minify (Removing indentations and extra spaces): O(n)
XML_to_JSON (Converting XML to JSON): O(n)
PrintCompressedTree (Compressing XML): O(n log n)