What is XML?

XML is the markup language behind feeds, documents, and config the world over. This guide covers the syntax, well-formed versus valid documents, namespaces, and where XML still beats JSON.

What XML is

XML (Extensible Markup Language) is a text format for representing structured data as a tree of elements. Unlike HTML, it has no fixed tag set: you invent element names that describe your data. For two decades it was the backbone of data interchange, document formats, and config, and it still runs huge swaths of the web: RSS and Atom feeds, office document formats, SVG, SOAP web services, and countless enterprise systems are all XML.

The syntax: elements & attributes

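An XML document has exactly one root element containing nested child elements. Each element is a start tag and a matching end tag around its content (an empty element may instead be written as a single self-closing tag such as <empty/>). The start tag may carry attributes as quoted name/value pairs.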

<?xml version="1.0" encoding="UTF-8"?>
<book category="reference">
  <title lang="en">Structure and Interpretation</title>
  <authors>
    <author>Abelson</author>
    <author>Sussman</author>
  </authors>
  <year>1985</year>
</book>

Here book is the root, category is an attribute, and title, authors, and year are child elements. The convention: data in elements, metadata in attributes.
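As a rough illustration of how a program sees that tree, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The variable names (BOOK_XML, summary) are made up for this example; the document itself is the one shown above.

```python
import xml.etree.ElementTree as ET

# The sample document from above. The XML declaration must be the very
# first thing in the string, so the literal starts right after the quotes.
BOOK_XML = """<?xml version="1.0" encoding="UTF-8"?>
<book category="reference">
  <title lang="en">Structure and Interpretation</title>
  <authors>
    <author>Abelson</author>
    <author>Sussman</author>
  </authors>
  <year>1985</year>
</book>"""

root = ET.fromstring(BOOK_XML)  # root is the <book> element

summary = {
    "root": root.tag,                         # element name
    "category": root.get("category"),         # attribute on the root
    "title": root.findtext("title"),          # text of a direct child
    "lang": root.find("title").get("lang"),   # attribute on a child
    "authors": [a.text for a in root.iter("author")],  # all descendants named author
    "year": int(root.findtext("year")),       # XML text is always a string; convert yourself
}

print(summary)
```

Note that the year arrives as text: plain XML carries no data types, so any conversion is the reader's job unless a schema says otherwise.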

Well-formed vs valid

|                 | Well-formed | Valid |
|-----------------|-------------|-------|
| Means           | Follows XML syntax rules | Well-formed and matches a schema |
| Checks          | One root, all tags closed, proper nesting, quoted attributes | Allowed elements, order, and types (via XSD/DTD) |
| Needs a schema? | No | Yes (XSD or DTD) |

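Every XML parser requires well-formed input. Validation is an extra step you opt into when the exact shape matters. XML predefines five entities for escaping: &amp;, &lt;, &gt;, &quot;, and &apos;. In text, & and < must always be escaped. Quotes need escaping only inside an attribute value delimited by the same quote character, and escaping > is optional except in the sequence ]]>.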

Namespaces

When one document combines vocabularies, namespaces keep element names from colliding, so a title defined by one standard never clashes with a title from another. Each namespace is identified by a URI and declared with xmlns, optionally bound to a short prefix. Namespaces are what let formats like Atom, SOAP, and SVG mix cleanly. They are also a common source of confusion for newcomers, because the same tag name can mean different things under different namespaces.
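To make the collision concrete, here is a small sketch in Python's standard-library ElementTree. The document is a pared-down, illustrative fragment (not a complete Atom feed) that mixes the Atom namespace with Dublin Core; the prefix map NS is an assumption local to this example, since prefixes are only shorthand for the namespace URIs.

```python
import xml.etree.ElementTree as ET

# Two vocabularies in one document: Atom is the default namespace,
# Dublin Core is bound to the "dc" prefix. Both define a "title".
DOC = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <entry>
    <title>Entry title (Atom)</title>
    <dc:title>Entry title (Dublin Core)</dc:title>
  </entry>
</feed>"""

# Our own prefix map for lookups; the prefixes need not match the document's.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(DOC)
entry = root.find("atom:entry", NS)

atom_title = entry.findtext("atom:title", namespaces=NS)
dc_title = entry.findtext("dc:title", namespaces=NS)

# ElementTree stores the full name as {namespace-uri}localname.
print(root.tag)
print(atom_title, "|", dc_title)

# A lookup without a namespace finds nothing: a bare "entry" is a different name.
print(root.find("entry"))
```

The last lookup returning nothing is the classic newcomer trap: the element is there, but its real name includes the namespace.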

The XML behind RSS

Feeds are the most familiar XML most people meet. An RSS feed is just an XML document with an agreed vocabulary, and that shared structure is exactly why any reader can parse any feed:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>KB Cafe</title>
    <item>
      <title>A new reference is live</title>
      <link>https://kbcafe.com/what-is-xml</link>
    </item>
  </channel>
</rss>

Check one against the spec with the RSS validator, or read the full guide to RSS, Atom & OPML.
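Because the vocabulary is fixed, reading the feed above takes only a few lines. This is a minimal sketch with Python's standard-library ElementTree; the function name read_items is hypothetical, and a real feed reader would also handle missing elements, dates, and network errors.

```python
import xml.etree.ElementTree as ET

# The sample feed from above.
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>KB Cafe</title>
    <item>
      <title>A new reference is live</title>
      <link>https://kbcafe.com/what-is-xml</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml: str):
    """Return (channel title, [(item title, item link), ...]) for an RSS 2.0 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


channel_title, items = read_items(FEED)
print(channel_title, items)
```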

XML vs JSON

|            | XML | JSON |
|------------|-----|------|
| Shape      | Document tree | Objects & arrays |
| Attributes | Yes | No |
| Namespaces | Yes | No |
| Comments   | Yes | No |
| Verbosity  | Heavier | Lighter |
| Best for   | Documents, feeds, mixed content, validation | APIs, config, data interchange |

Same data, different jobs. See what JSON is for the other side. The rule of thumb: documents and feeds lean XML, program-to-program data leans JSON.
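One way to feel the difference is to push the same record through both formats. The sketch below converts a small book document to JSON in Python. JSON has no attributes, so the converter has to pick a convention: the "@name" and "#text" keys used here are one common choice, not a standard, and the helper to_plain is hypothetical. It deliberately ignores mixed content and namespaces.

```python
import json
import xml.etree.ElementTree as ET

BOOK_XML = """<book category="reference">
  <title lang="en">Structure and Interpretation</title>
  <authors>
    <author>Abelson</author>
    <author>Sussman</author>
  </authors>
  <year>1985</year>
</book>"""


def to_plain(elem):
    """Convert an element to plain Python data (assumed convention, see above)."""
    children = list(elem)
    if not children and not elem.attrib:
        return elem.text  # simple leaf: just its text
    # Attributes have no JSON counterpart, so mark them with an "@" prefix.
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    if not children:
        node["#text"] = elem.text  # leaf that also carries attributes
        return node
    for child in children:
        value = to_plain(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            # Second occurrence of the same tag: switch to a JSON array.
            node[child.tag] = [node[child.tag], value]
    return node


root = ET.fromstring(BOOK_XML)
book_json = json.dumps({root.tag: to_plain(root)}, indent=2)
print(book_json)
```

The JSON is shorter, but the conversion had to invent rules for attributes and repeated elements, which is the kind of ambiguity that keeps document-shaped data in XML.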

FAQ

What does XML stand for?

Extensible Markup Language. 'Extensible' is the point: unlike HTML, XML has no fixed set of tags. You define element names that fit your data, and the structure is a tree of those elements.

What is the difference between an element and an attribute?

An element is a start tag and an end tag with content between them, like <title>Hello</title>. An attribute is a name="value" pair inside the start tag, like <title lang="en">. Rule of thumb: data goes in elements, metadata about that data goes in attributes.

What does 'well-formed' vs 'valid' mean?

Well-formed means the XML follows the basic syntax rules: one root element, every tag closed, proper nesting, quoted attributes. Valid is stronger: the document is well-formed AND conforms to a schema (XSD or DTD) that defines which elements and types are allowed. All valid XML is well-formed; not all well-formed XML is valid.

Is RSS just XML?

Yes. RSS and Atom feeds are XML documents with an agreed-upon set of elements (channel, item, title, and link in RSS; feed, entry, title, and link in Atom). That is why a feed reader can parse any feed: it is parsing XML against a known vocabulary.

XML vs JSON: when do I still use XML?

Use XML when you need its specific strengths: mixed content (markup embedded in text), attributes alongside elements, namespaces to combine vocabularies, comments, or mature schema validation via XSD. For lightweight data interchange between programs, JSON is usually the better default.
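Mixed content is the strength that is hardest to reproduce elsewhere, so a short illustration may help. In Python's standard-library ElementTree, the text before the first child lives in .text and the text after each child lives in that child's .tail. The sample sentence and variable names are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Markup embedded in running text: the paragraph has text AND child elements.
para = ET.fromstring(
    "<p>XML handles <em>mixed</em> content <code>inline</code> well.</p>"
)

em, code = list(para)

# Walk the pieces in document order.
pieces = [para.text, em.text, em.tail, code.text, code.tail]

# itertext() yields the same text runs without the markup.
plain = "".join(para.itertext())

print(pieces)
print(plain)
```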

What is a namespace in XML?

A way to avoid name collisions when documents mix vocabularies. A namespace (declared with xmlns) ties element names to a URI, usually through a short prefix, so a <title> from one standard does not clash with a <title> from another. This is essential in formats like Atom, SOAP, and SVG.

Why does my XML fail to parse?

Almost always a well-formedness slip: an unclosed tag, more than one root element, an unescaped & or < in text (use &amp; and &lt;), mismatched nesting, or an unquoted attribute. A parser will usually point you at the line and column where it gave up.
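As a sketch of how to get at that location in Python, xml.etree.ElementTree raises ParseError with a position attribute holding a (line, column) tuple. The helper name locate_error and the broken samples are assumptions for illustration. In CPython's expat-backed parser, lines are counted from 1 and columns from 0.

```python
import xml.etree.ElementTree as ET


def locate_error(text: str):
    """Return (line, column) of the first parse error, or None if well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position  # (line, column) as reported by the parser
    return None


# A bare & on the second line.
bare_amp = "<feed>\n  <title>Tom & Jerry</title>\n</feed>"

# <b> is never closed; the parser notices at the mismatched </a> on line 3.
unclosed = "<a>\n<b>\n</a>"

print(locate_error(bare_amp))
print(locate_error(unclosed))
print(locate_error("<ok/>"))
```

The reported position is where the parser noticed the problem, which for an unclosed tag can be later than where the mistake was made.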

☕ KB Cafe Classic

XML was the water KB Cafe swam in: the original site ran on RSS, Atom, and the XML feed tooling of the open web. This is the modern restoration: what XML is, how it stays well-formed, and why the feed layer it powers is still here in 2026.