XML to JSON Converter: Transform XML Data to JSON
From XML's Dominance to JSON's Takeover — and Why You Still Need to Convert Between Them
1998: W3C publishes XML 1.0. It becomes the universal format for data interchange. SOAP web services, RSS feeds, WSDL service descriptions, SVG graphics, and enterprise middleware all run on XML. Every major platform — Java, .NET, SAP — builds XML processing at its core.
Early 2000s: JSON emerges. Douglas Crockford popularizes JSON as a lightweight alternative. json.org launches in 2002. RFC 4627 formalizes the format in 2006. Early adopters: Yahoo!, del.icio.us, Flickr.
Late 2000s–2010s: JSON displaces XML in new APIs. Twitter, GitHub, Stripe, Twilio all launch with JSON APIs. REST replaces SOAP as the dominant API style. Stack Overflow's developer surveys begin tracking JSON as the most common data format.
Today: XML remains entrenched in large swaths of the industry. Healthcare (HL7 FHIR XML, CDA documents), finance (FpML, XBRL financial reporting), publishing (DITA, DocBook), government (NIEM, GML geospatial data), and any enterprise system built before 2010 still run on XML. RSS and Atom feeds are XML. SVG is XML. Maven's pom.xml is XML. Android layouts are XML.
The result: developers in 2026 regularly need to convert XML to JSON — reading a legacy SOAP API response, parsing an RSS feed, consuming an external partner's data in XML format, or migrating data from an enterprise system to a modern JSON API. The conversion is not always straightforward. XML has structural features that have no direct JSON equivalent, and the conversion decisions you make affect how downstream code consumes the data.
Key Takeaways
- ▸XML-to-JSON conversion is inherently lossy for XML attributes, namespaces, comments, and processing instructions — JSON has no equivalent constructs for these.
- ▸The biggest conversion trap: an XML element with a single child becomes an object, but the same element with two children becomes an array — most libraries have an option to force arrays to avoid this inconsistency.
- ▸Use fast-xml-parser for Node.js (5–10× faster than xml2js), xmltodict for Python, and encoding/xml for Go.
- ▸For multi-GB XML files, never load the entire document into memory — use SAX-based streaming parsers (node-expat, saxes in Node.js; xml.sax in Python).
- ▸Always validate your JSON output against your intended schema — XML attributes and element values merge in unexpected ways depending on which library you use.
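The streaming takeaway can be sketched with Python's standard-library xml.sax: a handler that reacts to elements one at a time, never building a full tree. The element name `item` and the tiny inline document are illustrative only.

```python
import xml.sax

class ItemCounter(xml.sax.ContentHandler):
    """Counts <item> elements without loading the whole document into memory."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the input
        if name == 'item':
            self.count += 1

handler = ItemCounter()
xml.sax.parseString(b'<cart><item>Laptop</item><item>Mouse</item></cart>', handler)
print(handler.count)  # 2
```

The same handler works with `xml.sax.parse(open('large.xml', 'rb'), handler)` on a multi-GB file, since only the current event is ever in memory.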
XML vs. JSON: The Structural Mismatch
XML and JSON were designed for different purposes. XML was designed for document markup — it inherits from SGML and HTML, where mixed content (text interleaved with markup tags) is fundamental. JSON was designed for data interchange — pure data structures with no document semantics. This philosophical difference creates several structural mismatches:
| XML Feature | JSON Equivalent | Conversion Approach | Information Loss? |
|---|---|---|---|
| Attributes | None | Prefix with @ or $, or merge into element object | No (if preserved) |
| Namespaces | None | Keep prefix in key name, or strip prefixes entirely | Yes (if stripped) |
| Comments | None | Dropped — JSON has no comment syntax | Yes (always) |
| Processing instructions | None | Dropped — no JSON equivalent | Yes (always) |
| Mixed content | None (complex) | Special "#text" key for text nodes alongside element children | Structural change |
| CDATA sections | String | Delimiters stripped, content becomes string value | No |
| Document type (DOCTYPE) | None | Dropped — JSON has no schema reference syntax | Yes (schema info) |
| Element ordering | Arrays only | Object keys are unordered (per spec); preserve order via arrays | Yes (order semantics) |
| Multiple same-name children | Array | Collected into array | No (if done correctly) |
| Text + attributes on same element | None | Special "#text" key + "@attr" keys | Structural change |
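The mixed-content row is the hardest to see abstractly. A quick look with Python's standard-library ElementTree shows why: the text is split across the parent's `.text` and each child's `.tail`, and no flat JSON shape keeps that interleaving. A sketch for illustration:

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with markup
root = ET.fromstring('<p>Hello <b>world</b>!</p>')

print(repr(root.text))     # 'Hello '  (text before the first child)
print(root[0].tag)         # 'b'
print(root[0].text)        # 'world'
print(repr(root[0].tail))  # '!'  (text after the child, still inside <p>)
# A naive {"p": {"b": "world", "#text": "Hello !"}} loses where the text
# sat relative to <b>: the "structural change" noted in the table above.
```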
The Three Conversion Problems That Break Downstream Code
Problem 1: The One-Child-vs-Many-Children Array Inconsistency
This is the most common source of bugs in XML-to-JSON conversion. When an XML element has a single child of a given type, most libraries represent it as an object. When it has multiple children, they become an array. The result: your JSON structure changes based on the data, not the schema.
<!-- XML: one item vs many items -->
<cart>
<item>Laptop</item> <!-- single item -->
</cart>
<cart>
<item>Laptop</item> <!-- multiple items -->
<item>Mouse</item>
</cart>
// xml2js default output — structure changes!
// Single item → object:
{ cart: { item: "Laptop" } }
// Multiple items → array:
{ cart: { item: ["Laptop", "Mouse"] } }
// Your downstream code that worked for one item:
cart.item.toUpperCase() // works when item is a string
cart.item.toUpperCase() // TypeError: cart.item.toUpperCase is not a function
// when item becomes an array
The fix: use the library's "force array" option, or normalize after conversion:
// xml2js: explicitArray: true forces all values to arrays
const xml2js = require('xml2js')
const parser = new xml2js.Parser({ explicitArray: true })
// Result is always an array — consistent regardless of child count:
{ cart: { item: ["Laptop"] } } // one item
{ cart: { item: ["Laptop", "Mouse"] } } // two items
// fast-xml-parser: isArray callback for targeted forcing
const { XMLParser } = require('fast-xml-parser')
const parser = new XMLParser({
isArray: (tagName) => ['item', 'product', 'order'].includes(tagName),
ignoreAttributes: false,
attributeNamePrefix: '@_',
})
Problem 2: Attributes vs. Element Values on the Same Element
XML elements can have both attributes and text content simultaneously. JSON objects cannot natively express this:
<!-- XML: element with attributes AND text content -->
<price currency="USD" vat="false">19.99</price>
<!-- The challenge: currency and vat are attributes; 19.99 is the element value.
JSON has no "element + attributes" concept. -->
// Approach 1: @-prefix for attributes, #text for text content
{
"price": {
"@currency": "USD",
"@vat": "false",
"#text": "19.99"
}
}
// Approach 2: Flat merge (loses attribute/element distinction)
{
"price": {
"currency": "USD",
"vat": "false",
"_value": "19.99"
}
}
// Approach 3: Discard attributes (most lossy but simplest)
{ "price": "19.99" }The @-prefix convention (used by BadgerFish, JAXB, and many libraries) is the most portable choice — it preserves information while being predictable for downstream consumers.
Problem 3: XML Namespaces
<!-- SOAP envelope with namespaces -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<GetOrderResponse xmlns="http://example.com/orders">
<Order>
<xsi:type>StandardOrder</xsi:type>
<orderId>12345</orderId>
</Order>
</GetOrderResponse>
</soap:Body>
</soap:Envelope>
// If you strip namespace prefixes:
{
"Envelope": {
"Body": {
"GetOrderResponse": {
"Order": { "type": "StandardOrder", "orderId": "12345" }
}
}
}
}
// Problem: if another namespace uses a "type" element, it collides with xsi:type
// If you preserve namespace prefixes as keys:
{
"soap:Envelope": {
"soap:Body": {
"GetOrderResponse": {
"Order": { "xsi:type": "StandardOrder", "orderId": "12345" }
}
}
}
}
// Problem: colon in JSON key requires bracket notation: obj["soap:Envelope"]
// Most languages handle this, but it's awkward
For SOAP API integration, the most practical approach is to parse the XML with a namespace-aware library and explicitly map the elements you care about to a clean JSON structure — rather than relying on generic XML-to-JSON conversion.
XML to JSON Conversion Code in Three Languages
Python: xmltodict
xmltodict is the most Pythonic XML-to-JSON library — it produces a dictionary that mirrors the XML structure using the @-prefix convention for attributes. Per its PyPI page, xmltodict has over 5 million monthly downloads.
# pip install xmltodict
import xmltodict
import json
def xml_to_json(xml_string: str, indent: int = 2) -> str:
"""Convert XML string to JSON string using xmltodict."""
# force_list ensures elements always become lists, not dicts
# when there's only one child — prevents the inconsistency bug
force_list = ('item', 'product', 'order', 'entry', 'record')
data = xmltodict.parse(
xml_string,
force_list=force_list,
attr_prefix='@', # attributes prefixed with @
cdata_key='#text', # text content stored under #text key
)
return json.dumps(data, indent=indent, ensure_ascii=False)
# Example: Parse an RSS feed
rss_xml = """
<rss version="2.0">
<channel>
<title>My Blog</title>
<item>
<title>First Post</title>
<link>https://example.com/post/1</link>
<pubDate>Mon, 21 Apr 2026 10:00:00 GMT</pubDate>
</item>
<item>
<title>Second Post</title>
<link>https://example.com/post/2</link>
<pubDate>Tue, 22 Apr 2026 10:00:00 GMT</pubDate>
</item>
</channel>
</rss>
"""
json_output = xml_to_json(rss_xml)
data = json.loads(json_output)
# Access feed items safely:
items = data['rss']['channel']['item'] # always a list (due to force_list)
for item in items:
print(item['title'], item['link'])
# Convert from file:
with open('feed.xml', 'rb') as f:
data = xmltodict.parse(f)
json_output = json.dumps(data, indent=2)
# Reverse: JSON dict back to XML
xml_output = xmltodict.unparse(data, pretty=True, indent=' ')
For SOAP APIs, use xml.etree.ElementTree from the standard library — it gives you explicit namespace handling:
import xml.etree.ElementTree as ET
import json
def parse_soap_order(xml_string: str) -> dict:
"""Parse a SOAP GetOrderResponse without generic conversion."""
ns = {
'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
'orders': 'http://example.com/orders'
}
root = ET.fromstring(xml_string)
# Navigate with explicit namespace awareness
body = root.find('soap:Body', ns)
response = body.find('orders:GetOrderResponse', ns)
order = response.find('orders:Order', ns)
return {
'orderId': order.findtext('orders:orderId', namespaces=ns),
'status': order.findtext('orders:status', namespaces=ns),
'amount': float(order.findtext('orders:amount', namespaces=ns, default='0')),
}
Node.js: fast-xml-parser
fast-xml-parser benchmarks 5–10× faster than the older xml2js library for large documents and has no external dependencies. It is the recommended choice for new Node.js projects. According to its npm page, it sees over 90 million downloads per month as of 2025.
// npm install fast-xml-parser
import { XMLParser, XMLBuilder, XMLValidator } from 'fast-xml-parser'
// Validate before parsing
const isValid = XMLValidator.validate(xmlString)
if (isValid !== true) {
throw new Error(`Invalid XML: ${isValid.err.msg} at line ${isValid.err.line}`)
}
const parser = new XMLParser({
ignoreAttributes: false, // preserve attributes
attributeNamePrefix: '@_', // prefix attributes with @_
allowBooleanAttributes: true,
parseAttributeValue: true, // parse "123" attribute values as numbers
parseTagValue: true, // parse element text as numbers/booleans where appropriate
// Force specific tags to always be arrays — critical for stability
isArray: (tagName: string) =>
['item', 'product', 'entry', 'record', 'row'].includes(tagName),
})
const result = parser.parse(xmlString)
const json = JSON.stringify(result, null, 2)
// xml2js (older, slower, but widely used in legacy code):
import xml2js from 'xml2js'
const parser2 = new xml2js.Parser({
explicitArray: true, // always arrays — avoids the one-child bug
mergeAttrs: false, // keep attributes separate from element values
attrkey: '@', // attribute key prefix
charkey: '#text', // text content key
explicitCharkey: true, // always include #text even when no attributes
})
const result2 = await parser2.parseStringPromise(xmlString)
// Streaming for large XML files (SAX-based):
// npm install sax
import { createReadStream } from 'fs'
import sax from 'sax'
// sax.createStream returns a writable stream you can pipe into;
// a bare SAXParser instance is not pipeable
const saxStream = sax.createStream(true) // strict mode
saxStream.on('opentag', (node) => { /* handle element start */ })
saxStream.on('text', (text) => { /* handle text content */ })
saxStream.on('closetag', (name) => { /* handle element end */ })
createReadStream('large-data.xml').pipe(saxStream)
Go: encoding/xml
Go's standard library encoding/xml provides the most control — you define Go structs that exactly match the XML structure, ensuring type-safe conversion. This is more code but produces the cleanest, most predictable JSON output:
package main
import (
"encoding/json"
"encoding/xml"
"fmt"
"strings"
)
// Define structs matching the XML structure
type RSSFeed struct {
XMLName xml.Name `xml:"rss" json:"-"`
Version string `xml:"version,attr" json:"version"`
Channel Channel `xml:"channel" json:"channel"`
}
type Channel struct {
Title string `xml:"title" json:"title"`
Link string `xml:"link" json:"link"`
Description string `xml:"description" json:"description"`
Items []Item `xml:"item" json:"items"`
}
type Item struct {
Title string `xml:"title" json:"title"`
Link string `xml:"link" json:"link"`
Description string `xml:"description" json:"description"`
PubDate string `xml:"pubDate" json:"pubDate"`
GUID string `xml:"guid" json:"guid"`
}
func ConvertRSSToJSON(xmlData string) (string, error) {
var feed RSSFeed
if err := xml.NewDecoder(strings.NewReader(xmlData)).Decode(&feed); err != nil {
return "", fmt.Errorf("xml decode: %w", err)
}
jsonData, err := json.MarshalIndent(feed, "", " ")
if err != nil {
return "", fmt.Errorf("json marshal: %w", err)
}
return string(jsonData), nil
}
// Generic approach using xml.Token for unknown XML:
func GenericXMLToJSON(xmlData string) (map[string]interface{}, error) {
decoder := xml.NewDecoder(strings.NewReader(xmlData))
// Note: generic XML-to-map conversion in Go requires building
// a recursive decoder — use the etree or xmlquery library for this:
// go get github.com/beevik/etree
// go get github.com/antchfx/xmlquery
return nil, fmt.Errorf("generic conversion not implemented: use etree or xmlquery")
}
Online XML to JSON Converter Tools Compared
For one-off conversions or testing, online tools are faster than writing code. Here's an honest comparison of the main options based on features, not ranking:
| Tool | Strengths | Weaknesses | Privacy |
|---|---|---|---|
| FreeFormatter.com | Widely known, many output options, handles large inputs | Sends data to server; ad-heavy UI | Server-processed |
| ConvertSimple.com | Clean UI, fast, handles CDATA and attributes | Limited error messages on malformed XML | Server-processed |
| CodeBeautify.org | Many format tools in one place, batch conversion | Cluttered UI, slow on large files | Server-processed |
| BytePane XML to JSON | In-browser processing (no upload), fast, clean output with attribute support | Less configurable than CLI tools; no namespace handling | Client-side only |
| transform.tools | Open source, many format conversions, handles namespaces | Occasional inconsistencies with edge cases | Client-side |
For sensitive data (internal API responses, PII-containing XML), prefer client-side tools or local CLI conversion. Server-processed tools upload your data to a third-party server.
After converting XML to JSON, use the BytePane JSON formatter to validate the JSON syntax and explore the structure. For the reverse operation, the YAML to JSON guide covers similar conversion patterns with YAML-specific edge cases.
Real-World XML to JSON Use Cases
RSS and Atom Feed Aggregation
RSS (Really Simple Syndication, now at version 2.0) and Atom are XML formats. RSS 2.0 remains the most widely deployed feed format, with millions of active feeds. Aggregators, podcast apps, and content pipeline tools all convert RSS XML to JSON for storage and APIs.
# Fetch and convert an RSS feed to JSON
import feedparser # pip install feedparser
import json
# feedparser handles both RSS and Atom, normalizes the output
feed = feedparser.parse('https://example.com/feed.xml')
# feedparser already gives you a dict — just serialize to JSON
articles = [
{
'title': entry.title,
'link': entry.link,
'published': entry.published,
'summary': entry.summary,
'author': getattr(entry, 'author', None),
}
for entry in feed.entries
]
print(json.dumps(articles, indent=2))
SOAP to REST API Migration
Enterprise integration projects frequently involve wrapping a SOAP (XML) service with a REST (JSON) facade — so modern frontend applications can consume legacy backend services. According to the MuleSoft Connectivity Benchmark 2025, 72% of enterprises still maintain at least one SOAP-based integration. A typical facade pattern:
// Express REST endpoint wrapping a SOAP service
import express from 'express'
import axios from 'axios'
import { XMLParser } from 'fast-xml-parser'
const router = express.Router()
const xmlParser = new XMLParser({
ignoreAttributes: false,
attributeNamePrefix: '@_',
isArray: (tagName) => ['LineItem'].includes(tagName), // only LineItem repeats; forcing Order into an array would break order.OrderId below
})
router.get('/orders/:id', async (req, res) => {
const soapEnvelope = `
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetOrder xmlns="http://example.com/orders">
<OrderId>${req.params.id}</OrderId>
</GetOrder>
</soap:Body>
</soap:Envelope>
`
const soapResponse = await axios.post(process.env.SOAP_ENDPOINT!, soapEnvelope, {
headers: { 'Content-Type': 'text/xml; charset=utf-8', 'SOAPAction': 'GetOrder' },
})
const parsed = xmlParser.parse(soapResponse.data)
// Navigate to the response body, stripping SOAP envelope
const order = parsed?.['soap:Envelope']?.['soap:Body']?.GetOrderResponse?.Order
if (!order) {
return res.status(404).json({ error: 'Order not found' })
}
// Return clean JSON, not raw XML structure
res.json({
id: order.OrderId,
status: order.Status,
total: parseFloat(order.Total),
createdAt: order.CreatedDate,
items: (order.LineItem || []).map((item: any) => ({
sku: item.SKU,
quantity: parseInt(item.Quantity),
unitPrice: parseFloat(item.UnitPrice),
}))
})
})
Maven/Gradle Build Configuration Parsing
Maven's pom.xml files are XML. Build analysis tools, dependency audit systems, and monorepo management tools often convert pom.xml to JSON for programmatic inspection. The Apache Maven Project has over 400,000 published artifacts in Maven Central, all with XML descriptors.
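A minimal sketch of that inspection step with Python's standard library. Note the Maven POM default namespace, which must be spelled out before any lookup finds anything; the pom snippet here is illustrative:

```python
import json
import xml.etree.ElementTree as ET

POM = '''<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>com.example</groupId>
  <artifactId>demo-app</artifactId>
  <dependencies>
    <dependency><groupId>junit</groupId><artifactId>junit</artifactId></dependency>
  </dependencies>
</project>'''

# Every element in a pom lives in the POM default namespace
ns = {'m': 'http://maven.apache.org/POM/4.0.0'}
root = ET.fromstring(POM)
summary = {
    'artifactId': root.findtext('m:artifactId', namespaces=ns),
    'dependencies': [
        d.findtext('m:artifactId', namespaces=ns)
        for d in root.findall('m:dependencies/m:dependency', ns)
    ],
}
print(json.dumps(summary, indent=2))
```

Without the namespace map, `root.findtext('artifactId')` returns None — a common first stumble when converting pom.xml programmatically.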
When NOT to Convert XML to JSON
Not every XML document should be converted to JSON. Some XML is genuinely document-centric and loses critical information in conversion:
- SVG files — SVG is XML, but converting SVG to JSON destroys the document. Browsers render SVG natively; embed it as-is.
- DocBook and DITA documentation — technical documentation formats where element ordering, mixed content, and semantic markup are all meaningful. Converting to JSON loses document structure.
- XHTML — HTML content in XML format. Use a DOM parser or HTML-specific tools, not generic XML-to-JSON conversion.
- XML with complex namespaces — When namespace URIs carry semantic meaning (XBRL financial data, HL7 CDA clinical documents), generic conversion loses the namespace information required to interpret the data correctly.
- XSD-validated enterprise XML — If the consuming system can accept XML natively, don't convert. Conversion introduces a transformation layer that can go wrong.
For a broader comparison of data formats including when XML, JSON, and YAML each make sense, the JSON vs YAML vs XML comparison covers the trade-offs in detail. If you need to validate the XML before converting, use the BytePane XML formatter to check syntax first.
Frequently Asked Questions
Can all XML be converted to JSON?
Not without loss. XML features with no JSON equivalent — attributes (handled via @-prefix conventions), namespaces (kept as key prefixes or stripped), comments (dropped), processing instructions (dropped), and mixed content (element text interleaved with child elements) — require lossy conversion decisions. Simple data-oriented XML converts cleanly; document-centric XML does not.
How are XML attributes converted to JSON?
No single standard exists. Common approaches: prefix with @ (most widely adopted — JAXB, BadgerFish, fast-xml-parser's attributeNamePrefix option), merge into the element object (flat), or create a special $ key for attributes. The @-prefix is recommended — it preserves information and is predictable for downstream consumers.
Why do APIs still use XML in 2026?
Enterprise systems built on SOAP and legacy middleware remain in production. Healthcare (HL7, CDA), finance (FpML, XBRL), government (NIEM), and publishing (DITA) standardized on XML before JSON existed. Per the MuleSoft Connectivity Benchmark 2025, 72% of enterprises maintain at least one SOAP integration. These ecosystems move slowly.
What is the fastest XML to JSON converter for Node.js?
fast-xml-parser benchmarks 5–10× faster than xml2js for large documents, processes ~40MB/s on typical XML, and has zero external dependencies. For multi-GB files, use a SAX-based streaming approach with node-expat or saxes to avoid loading the full document into memory. For most production workloads under 10MB, any maintained library works adequately.
What is CDATA in XML and how does it convert to JSON?
CDATA sections (written as <![CDATA[...]]>) mark text that should not be parsed as XML markup — useful for embedding HTML, JavaScript, or SQL inside XML. When converting to JSON, the CDATA delimiters are stripped and the content becomes a plain JSON string. No information is lost; only the CDATA wrapping disappears.
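A quick standard-library check of that behavior: the CDATA wrapper disappears during parsing and the unescaped content survives as the element's plain text, which is exactly the string that ends up in JSON.

```python
import xml.etree.ElementTree as ET

# < and && would be illegal as raw XML text; CDATA makes them safe
xml_doc = '<script><![CDATA[if (a < b && c > d) { run(); }]]></script>'
elem = ET.fromstring(xml_doc)
print(elem.text)  # if (a < b && c > d) { run(); }
```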
How do XML namespaces affect JSON conversion?
XML namespaces have no JSON equivalent. Converters either strip namespace prefixes (losing info when same local names exist across namespaces) or keep prefixes as part of key names ("soap:Body" becomes a JSON key "soap:Body"). For complex namespace usage like SOAP or XBRL, use a namespace-aware parser and explicitly map the fields you need rather than relying on generic conversion.
Convert XML to JSON Instantly
Paste your XML and get clean JSON output in your browser — no server uploads, processes entirely client-side. Handles attributes, CDATA, and nested elements.