/xml-to-jsonl

Primary LanguagePythonMIT LicenseMIT

Converting XML to JSONL with Python

This script converts an XML file to JSONL format, with each item represented as a JSON object on a separate line.

Required Libraries

  • xml.etree.ElementTree: Parses the XML file.
  • json: Converts Python dictionaries to JSON strings.

Function: xml_to_jsonl

This function takes the paths to the XML and JSONL files as arguments and performs the conversion:

  1. Parses the XML file using ET.parse.
  2. Gets the root element of the XML tree.
  3. Opens the JSONL file for writing.
  4. Iterates through each item element within the channel:
    • Creates an empty dictionary item_dict to store the item's data.
    • Iterates through each child element of the item:
      • Extracts the tag name, removing any namespace prefix.
      • Adds the tag name and its text content to item_dict.
    • Converts item_dict to a JSON string using json.dumps.
    • Writes the JSON string to the JSONL file, followed by a newline character (\n).

Usage

  1. Set the xml_file and jsonl_file paths to your actual file paths.
  2. Call the xml_to_jsonl function to perform the conversion.

Example:

xml_file = "your_xml_file.xml"
jsonl_file = "output.jsonl"

xml_to_jsonl(xml_file, jsonl_file)

This will create a JSONL file where each line contains a JSON object representing an item from the XML.