This script converts an XML file to JSONL format, with each item represented as a JSON object on a separate line.
xml.etree.ElementTree
: Parses the XML file.json
: Converts Python dictionaries to JSON strings.
This function takes the paths to the XML and JSONL files as arguments and performs the conversion:
- Parses the XML file using
ET.parse
. - Gets the root element of the XML tree.
- Opens the JSONL file for writing.
- Iterates through each
item
element within thechannel
:- Creates an empty dictionary
item_dict
to store the item's data. - Iterates through each child element of the
item
:- Extracts the tag name, removing any namespace prefix.
- Adds the tag name and its text content to
item_dict
.
- Converts
item_dict
to a JSON string usingjson.dumps
. - Writes the JSON string to the JSONL file, followed by a newline character (
\n
).
- Creates an empty dictionary
- Set the
xml_file
andjsonl_file
paths to your actual file paths. - Call the
xml_to_jsonl
function to perform the conversion.
Example:
xml_file = "your_xml_file.xml"
jsonl_file = "output.jsonl"
xml_to_jsonl(xml_file, jsonl_file)
This will create a JSONL file where each line contains a JSON object representing an item from the XML.