XML - Extensible Markup Language

  1. SGML - Standard Generalized Markup Language
  2. W3C - World Wide Web Consortium
  3. IDL - Interface Description Language, any computer language used to describe a software component's interface

https://www.w3schools.com/xml/default.asp

XML versus HTML

  1. XML was designed to carry data - with focus on what data is
  2. HTML was designed to display data - with focus on how data looks
  3. XML tags are not predefined like HTML tags are

http://courses.cs.vt.edu/~cs1204/XML/htmlVxml.html
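
As a small illustration of the points above, the sketch below parses a document whose tag names (bookstore, book, title, price) are invented for this example, then pulls the data out of it. Nothing in it says how the data should look on screen. It uses Python's standard xml.etree.ElementTree module as one possible parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these tag names are made up by the author of the
# data, not predefined by any standard.
xml_text = """<bookstore>
  <book category="web">
    <title>Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(xml_text)
book = root.find("book")

# The XML only carries the data; what to do with it is up to the program.
record = {
    "category": book.get("category"),
    "title": book.findtext("title"),
    "price": float(book.findtext("price")),
}
print(record)
```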

XML Simplifies Things

  1. It simplifies data sharing
  2. It simplifies data transport
  3. It simplifies platform changes
  4. It simplifies data availability

http://xml.silmaril.ie/whyxml.html

The document type (DOCTYPE) declaration

Contains an internal Document Type Definition (DTD), references an external one, or combines both. The DTD defines the constraints on the structure of an XML document: it declares the document's element types, the child elements each may contain, and the order and number of those children. It also declares attributes, entities, and notations, and it may contain processing instructions, comments, and parameter-entity (PE) references.

https://en.wikipedia.org/wiki/Document_type_declaration
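
A minimal sketch of a document with an internal DTD, assuming a made-up "note" vocabulary. The DTD declares the element types, the order of the children, one attribute, and one entity. Python's standard xml.dom.minidom parser is used here; it reads the DOCTYPE and expands the internal entity but does not validate the document against the DTD.

```python
from xml.dom import minidom

# Hypothetical document with an internal DTD (the part between [ and ]).
xml_text = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note priority CDATA #IMPLIED>
  <!ENTITY signoff "See you soon.">
]>
<note priority="high"><to>Tove</to><from>Jani</from><body>&signoff;</body></note>"""

doc = minidom.parseString(xml_text)

# The DOCTYPE names the root element type of the document.
doctype_name = doc.doctype.name

# The entity declared in the DTD is expanded inside <body>.
body = doc.getElementsByTagName("body")[0]
body_text = "".join(node.data for node in body.childNodes)

print(doctype_name)
print(body_text)
```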

A Uniform Resource Identifier (URI)

Is a string of characters that identifies a resource. The most common kind of URI is the Uniform Resource Locator (URL), which identifies a resource by its location and access scheme, such as a web address. A less common kind is the Uniform Resource Name (URN), which identifies a resource by name.

https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
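
To make the URL and URN distinction concrete, the sketch below splits one of each into its parts with urlparse from Python's standard urllib.parse module. The ISBN in the URN is only an example value.

```python
from urllib.parse import urlparse

# A URL: the scheme says how to reach the resource, the rest says where it is.
url = urlparse("https://www.w3schools.com/xml/default.asp")

# A URN: names a resource without saying where to fetch it (example value).
urn = urlparse("urn:isbn:0451450523")

print(url.scheme, url.netloc, url.path)
print(urn.scheme, urn.netloc, urn.path)
```

The URN has a scheme and a path but no network location, which is why it identifies a resource without locating it.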

The XML DOM is a standard

For how to get, change, add, or delete XML elements.

  1. xmlDoc - the XML DOM object created by the parser.
  2. getElementsByTagName("title")[0] - get the first <title> element
  3. childNodes[0] - the first child of the <title> element (the text node)
  4. nodeValue - the value of the node (the text itself)

https://www.w3schools.com/xml/dom_intro.asp
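
The four steps listed above, plus change, add, and delete, can be sketched with Python's standard xml.dom.minidom module, which exposes the same DOM names (getElementsByTagName, childNodes, nodeValue). The document content, the replacement title, and the added year element are assumptions made for this example.

```python
from xml.dom import minidom

# Hypothetical input document.
xml_text = (
    "<bookstore><book>"
    "<title>Everyday Italian</title><price>30.00</price>"
    "</book></bookstore>"
)

# xmlDoc: the XML DOM object created by the parser.
xmlDoc = minidom.parseString(xml_text)

# Get: first <title> element, its first child (the text node), and its value.
title = xmlDoc.getElementsByTagName("title")[0]
text_node = title.childNodes[0]
value = text_node.nodeValue

# Change: overwrite the text of <title>.
text_node.nodeValue = "Easy Cooking"

# Add: append a new <year> element to <book>.
book = title.parentNode
year = xmlDoc.createElement("year")
year.appendChild(xmlDoc.createTextNode("2005"))
book.appendChild(year)

# Delete: remove the <price> element.
price = xmlDoc.getElementsByTagName("price")[0]
book.removeChild(price)

result = xmlDoc.documentElement.toxml()
print(value)
print(result)
```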

UTF-8 is the default character encoding for XML documents.

A character in UTF-8 is 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard and is backwards compatible with ASCII. It is the preferred encoding for e-mail and web pages.

https://en.wikipedia.org/wiki/UTF-8
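
A short sketch of these properties in Python. The four sample characters are chosen for illustration, one for each possible byte length. The last step parses XML bytes that carry no encoding declaration, which the parser reads as UTF-8.

```python
import xml.etree.ElementTree as ET

# One sample character per UTF-8 byte length.
samples = {
    "A": "A",                   # U+0041, Latin capital A
    "e-acute": "\u00e9",        # U+00E9
    "euro": "\u20ac",           # U+20AC
    "emoji": "\U0001F600",      # U+1F600
}
lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(lengths)

# Backwards compatibility: pure ASCII text has identical bytes in UTF-8.
same_bytes = "XML".encode("utf-8") == "XML".encode("ascii")

# No encoding declaration, so the bytes are interpreted as UTF-8.
xml_bytes = "<price>\u20ac 30</price>".encode("utf-8")
price_text = ET.fromstring(xml_bytes).text
print(same_bytes, price_text)
```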

Shift JIS (Shift Japanese Industrial Standards)

Is a character encoding for the Japanese language.

https://en.wikipedia.org/wiki/Shift_JIS
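
As a rough comparison, the sketch below encodes the same Japanese text with Python's built-in shift_jis and utf-8 codecs. The sample string is an assumption chosen for illustration.

```python
# Sample text: the three kanji that spell "Japanese language" (nihongo).
text = "\u65e5\u672c\u8a9e"

sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")

# Each of these kanji takes 2 bytes in Shift JIS and 3 bytes in UTF-8.
print(len(sjis_bytes), len(utf8_bytes))

# The bytes only decode correctly with the codec that produced them.
round_trip = sjis_bytes.decode("shift_jis")
print(round_trip == text)
```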

Tools for parsing

  1. Through the Java API for XML Processing (JAXP), which provides two parser interfaces: the Simple API for XML (SAX) and the Document Object Model (DOM).
  2. Through the Java Architecture for XML Binding (JAXB), which maps XML documents to Java objects.
  3. Through open-source libraries such as JDOM and Apache Xerces.
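
The difference between the SAX and DOM styles can be sketched without a Java setup. The example below uses Python's standard xml.sax and xml.dom.minidom modules as stand-ins for the two models. It is not JAXP code, and the catalog document and the TitleHandler class are invented for this illustration. SAX pushes events to a handler as it reads the input. DOM builds the whole tree first and then lets the program query it.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical input document.
xml_bytes = (
    b"<catalog>"
    b"<cd><title>Empire Burlesque</title></cd>"
    b"<cd><title>Hide your heart</title></cd>"
    b"</catalog>"
)


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element from SAX events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


# SAX style: event driven, no tree is kept in memory.
handler = TitleHandler()
xml.sax.parseString(xml_bytes, handler)
sax_titles = handler.titles

# DOM style: parse into a tree, then navigate it.
doc = minidom.parseString(xml_bytes)
dom_titles = [
    node.childNodes[0].nodeValue for node in doc.getElementsByTagName("title")
]

print(sax_titles)
print(dom_titles)
```

Both approaches yield the same titles. SAX tends to suit large inputs read once, while DOM suits documents that need random access or modification.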

Transform XML into XHTML using XSLT

https://www.w3schools.com/xml/xsl_transformation.asp

Parsing, Searching elements

https://www.tutorialspoint.com/java_xml/index.htm
http://homepage.cs.latrobe.edu.au/mjsutherland/WS/current/notes/lecture060_XML_050.html