JSON vs YAML: when to use which, and the footguns of each

JSON and YAML model the same data, but their failure modes differ sharply: YAML's type coercion and whitespace traps versus JSON's missing comments and verbosity.


JSON and YAML describe the same shapes (maps, lists, and scalars), and the choice between them is usually framed as a style preference. It is not. They have different failure modes, and those failures land in different places: JSON's bite loudly at edit time, YAML's bite silently at parse time. This post covers where each format earns its keep, the side-by-side mechanics, and the footguns that turn a working config into a 2 a.m. incident.

The relationship

YAML 1.2 is, for practical purposes, a superset of JSON. Any valid JSON document is also a valid YAML document, because YAML's spec adopted JSON's syntax as a subset. A YAML parser will read this verbatim:

{"service": "api", "replicas": 3, "ports": [80, 443]}

That matters more than it sounds. It means YAML inherits JSON's data model entirely — there is nothing JSON can represent that YAML cannot. The differences are not in what the formats can hold but in how humans write them and how parsers interpret the ambiguity that YAML's looser grammar introduces.

Side by side

The same document, first in JSON:

{
  "service": "api",
  "replicas": 3,
  "image": "registry.example.com/api:1.4.0",
  "ports": [80, 443],
  "env": {
    "LOG_LEVEL": "info",
    "REGION": "us-east-1"
  }
}

And in YAML:

# deployment config for the api service
service: api
replicas: 3
image: registry.example.com/api:1.4.0
ports:
  - 80
  - 443
env:
  LOG_LEVEL: info
  REGION: us-east-1

The YAML version drops the braces, the brackets, the quotes around keys and most strings, the commas, and adds a comment. Structure is carried by indentation instead of punctuation. For a human editing a config by hand this is genuinely more pleasant. The catch is that every piece of that convenience is also a place where a parser has to guess, and the guesses are where things go wrong.

Feature comparison

Feature             JSON                 YAML
Comments            No (spec forbids)    Yes (#)
Trailing commas     No (spec forbids)    N/A (no commas)
Multiline strings   Escaped \n only      Block scalars (`|`, `>`)
Anchors / aliases   No                   Yes (&, *, <<)
Parsing strictness  High (one grammar)   Low (context-sensitive, version-dependent)
Type inference      None (explicit)      Aggressive (implicit)
Ubiquity            Universal            Wide but uneven
Attack surface      Minimal              Anchors, custom tags, deep nesting

YAML's footguns

This is the part worth memorizing, because the failures are quiet.

The Norway problem

YAML 1.1 treats a long list of bare words as booleans: yes, no, on, off, true, false, y, n. Many widely deployed parsers still default to 1.1 behavior. So this:

country: NO

parses NO as the boolean false, not the string "NO" (Norway's country code — hence the name). The same class of bug:

version: 1.0        # parses as the float 1.0, not the string "1.0"
build: 1.20         # becomes 1.2
zip: 02134          # leading zero: octal in 1.1, or stripped to 2134
git_sha: 1234567    # an all-digit SHA prefix becomes an integer
mac: 12:34:56       # 1.1 reads colons as base-60 (sexagesimal)
enabled: off        # the string "off" becomes false

A ZIP code, a git SHA, a version string, a MAC address, a phone number, a serial: anything that looks numeric but is semantically a string is a candidate for silent coercion. The fix is mechanical: quote it. country: "NO", version: "1.0", zip: "02134". Quoting forces the scalar to a string regardless of parser version. Beyond that, prefer a parser configured for YAML 1.2 and its core schema, which narrows the boolean set to spellings of true and false and removes the sexagesimal and leading-zero octal traps. It does not stop 1.20 or 1234567 from resolving to numbers, so quoting stays the primary defense.
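To make the coercion concrete, here is a rough Python sketch of how a YAML 1.1-style resolver classifies a plain (unquoted) scalar. It is a simplified illustration written for this post, not the code of any real parser: resolve_yaml11 and its regexes are hypothetical names, and they cover only the cases listed above (no exponents, hex, underscores, nulls, or timestamps).

```python
import re

# Simplified stand-in for YAML 1.1 implicit typing. Hypothetical helper that
# covers only the booleans, integers, sexagesimals and floats discussed above.
_WORDS_TRUE = ("y", "yes", "true", "on")
_WORDS_FALSE = ("n", "no", "false", "off")


def _spellings(words):
    # YAML 1.1 accepts lower, Capitalized and UPPER spellings of each word.
    return {form for w in words for form in (w, w.capitalize(), w.upper())}


TRUE_FORMS = _spellings(_WORDS_TRUE)
FALSE_FORMS = _spellings(_WORDS_FALSE)

OCTAL = re.compile(r"[-+]?0[0-7]+")
DECIMAL = re.compile(r"[-+]?(0|[1-9][0-9]*)")
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9]*(:[0-5]?[0-9])+")
FLOAT = re.compile(r"[-+]?[0-9]+\.[0-9]*")


def resolve_yaml11(text):
    """Guess the Python value a YAML 1.1-style resolver gives a plain scalar."""
    if text in TRUE_FORMS:
        return True
    if text in FALSE_FORMS:
        return False
    if OCTAL.fullmatch(text):
        return int(text, 8)
    if DECIMAL.fullmatch(text):
        return int(text)
    if SEXAGESIMAL.fullmatch(text):
        sign = -1 if text.startswith("-") else 1
        total = 0
        for part in text.lstrip("+-").split(":"):
            total = total * 60 + int(part)  # each colon shifts by base 60
        return sign * total
    if FLOAT.fullmatch(text):
        return float(text)
    return text  # everything else stays a string


for raw in ["NO", "1.20", "02134", "1234567", "12:34:56", "off", "us-east-1"]:
    print(f"{raw!r:12} -> {resolve_yaml11(raw)!r}")
```

Under these rules the ZIP code 02134 comes back as the integer 1116 (octal) and 12:34:56 as 45296, while us-east-1 survives as a string only because nothing happens to match it.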

Significant whitespace

Indentation is structure, so indentation errors are structural errors. Tabs are not allowed for indentation in YAML at all: a stray tab from an editor that didn't convert it is a hard parse error, and the message does not always point at the real line. That failure is at least loud. The quieter ones come from spaces: copy-pasting a block into a different indentation context shifts its meaning, and a nested key or list item that loses two spaces can silently become a sibling of its former parent rather than a child. Nothing flags these, because the result is still valid YAML; it parses fine, just into a different document than you meant.
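The tab case is cheap to catch before a parser ever sees the file. The sketch below is a minimal, hypothetical pre-commit style check (find_tab_indentation is a name made up for this post); it only looks for tabs in leading whitespace and does nothing about mis-indented spaces, which need a schema or a real linter.

```python
def find_tab_indentation(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    offenders = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Slice off everything after the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            offenders.append(number)
    return offenders


sample = "env:\n\tLOG_LEVEL: info\n  REGION: us-east-1\n"
print(find_tab_indentation(sample))  # prints [2]
```

Because it reports the offending line number directly, a check like this can give a clearer message than the parser's own error.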

Anchors, aliases, and the merge key

YAML lets you define a node once and reuse it:

defaults: &defaults
  retries: 3
  timeout: 30

prod:
  <<: *defaults
  timeout: 60

&defaults anchors the map, *defaults references it, and << merges it in, with keys written locally (here timeout: 60) overriding the merged ones. The merge key comes from the YAML 1.1 type repository and is not part of the 1.2 core schema, so parser support varies. This is genuinely useful for DRY config, and it is also a readability cliff once a file leans on it heavily, because the effective value of a key now lives somewhere else in the document. Worse, nested aliases enable the "billion laughs" denial-of-service attack: a handful of anchors that each reference the previous one several times expand exponentially and can exhaust memory once a parser or consumer expands them. Parsers that don't cap expansion are vulnerable. If you accept YAML from untrusted sources, use a parser with alias-expansion limits and consider disabling anchors entirely; a "safe" loader that only restricts custom tags does not necessarily cap aliases.
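Both halves of that paragraph can be sketched in plain Python without a YAML parser. The first lines show the dictionary a merge key is roughly equivalent to; the rest models nested aliases as shared list references and counts what a naive, fully expanding consumer would have to materialize. The helper names and the 9-by-9 shape are assumptions chosen for illustration.

```python
# Merge key: roughly a dict merge in which locally written keys win.
defaults = {"retries": 3, "timeout": 30}
prod = {**defaults, "timeout": 60}


def build_nested_aliases(levels, fanout):
    """Mimic anchors that each alias the previous level `fanout` times."""
    node = "lol"
    for _ in range(levels):
        node = [node] * fanout  # shared references, like *alias in YAML
    return node


def expanded_leaf_count(node, memo=None):
    """Count the leaves a fully expanding consumer would materialize."""
    if memo is None:
        memo = {}
    if not isinstance(node, list):
        return 1
    if id(node) not in memo:
        # Memoize per object so counting itself stays cheap.
        memo[id(node)] = sum(expanded_leaf_count(child, memo) for child in node)
    return memo[id(node)]


bomb = build_nested_aliases(levels=9, fanout=9)
print(prod)                       # prints {'retries': 3, 'timeout': 60}
print(expanded_leaf_count(bomb))  # prints 387420489
```

Nine short lines of input stand for 9 to the power 9 leaves, which is why a cap on alias expansion matters more than a cap on document size.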

Type coercion, generally

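The unifying theme is that YAML tries to be helpful by inferring types, and inference is a guess. The defenses are the same in every case: quote anything that is conceptually a string, and load with a strict 1.2 parser instead of the permissive default. In Python, note that PyYAML's yaml.safe_load is not that parser: it is the right choice over yaml.load because it refuses to construct arbitrary objects, but it still applies YAML 1.1 type resolution, so NO still becomes false. For 1.2 behavior, use a library that implements it, such as ruamel.yaml; in other ecosystems, look for the schema or strict mode the library exposes.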

JSON's limits

JSON's strictness is the reason it almost never surprises you at parse time — but the same strictness makes it a poor format for humans to maintain.

  • No comments. The spec has none, full stop. This is why JSONC (VS Code settings), JSON5, and the // comment hacks exist. A config you can't annotate is a config future-you can't safely change.
  • No trailing commas. Add a line to the end of an array or object and you must remember to add a comma to the line above. This is one of the most common errors in hand-edited JSON.
  • Verbose for hand editing. Braces, brackets, and quoted keys are noise when a human is the editor. At config scale it adds up.
  • No native date type. Dates are strings by convention, and nothing in the document marks them as dates.

None of these matter for machine-to-machine traffic, where nobody is hand-editing the payload. They matter enormously for files a person opens in an editor every week.
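The first two limits are easy to see with Python's standard json module, which follows the strict grammar. This is a small sketch; try_parse is a hypothetical helper, and the exact error wording varies between Python versions, so the code only reports whether the text was rejected.

```python
import json


def try_parse(text):
    """Report whether Python's json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"rejected: {exc.msg}"
    return "accepted"


print(try_parse('{"ports": [80, 443]}'))         # plain JSON
print(try_parse('{"ports": [80, 443,]}'))        # trailing comma
print(try_parse('{"ports": [80, 443]} // web'))  # comment
```

The failure is immediate and unambiguous, which is the trade JSON makes: it is harder to edit by hand, but a mistake does not pass through as a different document.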

When to use which

The split follows the failure modes. Use JSON wherever a machine produces and consumes the data and a human rarely touches it: API request and response bodies, interchange between services, log records, anything serialized programmatically. Strictness is a feature there — you want exactly one interpretation, and you don't need comments because the schema is the documentation.

Use YAML for configuration that humans edit by hand: CI pipelines, Kubernetes manifests, Ansible playbooks, application config. Comments, multiline strings, and the lighter syntax pay off precisely where a person is in the loop.

The uncomfortable caveat is that YAML's footguns bite hardest in exactly this case. The human-edited config files where YAML shines are the same files where an unquoted NO, a tab, or a mis-indented list item slips through review and ships. So the discipline that makes YAML safe — quote your stringy scalars, lint indentation in CI, validate against a schema — is non-negotiable for the workloads YAML is best at.

That validation step is worth building into the pipeline: a schema catches the coercion bugs a yaml.load will never warn you about. We cover the mechanics in validating with JSON Schema, which works against either format once parsed.
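As a rough idea of what such a check buys you, here is a hand-rolled stand-in written in plain Python. It is not JSON Schema and not any library's API: EXPECTED_TYPES and type_errors are hypothetical, and the parsed dictionary is what a YAML 1.1 parser would plausibly return for an unquoted country: NO and version: 1.0.

```python
EXPECTED_TYPES = {"country": str, "version": str, "replicas": int}


def type_errors(config, expected):
    """List keys that are missing or whose value is not exactly the expected type."""
    errors = []
    for key, wanted in expected.items():
        if key not in config:
            errors.append(f"{key}: missing")
        elif type(config[key]) is not wanted:  # exact match, so bool never passes as int
            got = type(config[key]).__name__
            errors.append(f"{key}: expected {wanted.__name__}, got {got}")
    return errors


# Assumed parser output for unquoted `country: NO` and `version: 1.0`.
parsed = {"country": False, "version": 1.0, "replicas": 3}
print(type_errors(parsed, EXPECTED_TYPES))
# prints ['country: expected str, got bool', 'version: expected str, got float']
```

A real schema validator does the same job with a declarative schema and better error paths; the point is that the check runs on the parsed values, so it sees the coercion the YAML text hides.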

If you're moving a document between the two — porting a JSON API fixture into a YAML config, or flattening a manifest back to JSON for a tool that demands it — our JSON ⇄ YAML converter handles the round-trip and preserves the structure, and the JSON formatter will lint and pretty-print the JSON side before you commit it.