The LLM JSON problem: why output won't parse

The shape of the failure

You write a prompt asking for structured data. "Return a JSON object with three fields." The model obliges. Your code does this:

const text = response.choices[0].message.content;
const data = JSON.parse(text);
// SyntaxError: Unexpected token 'H', "Here is yo..."
//   is not valid JSON

What happened is that the model returned "Here is your JSON: {\"foo\": \"bar\"}" instead of "{\"foo\": \"bar\"}". JSON.parse is correct to refuse. The string is not, in fact, JSON. It's English prose containing a JSON expression as a substring.

This is the most basic version of the failure. The actual range of failure modes is wider than most teams realize. Over the past 18 months building tools that consume LLM output as structured input, four categories cover almost everything we've seen:

Wrapper bugs. The JSON is correct; it's just wrapped in something that isn't.
Syntax bugs. The JSON tries to be JSON, but bends rules in ways the parser refuses.
Schema bugs. The JSON parses fine, but the structure isn't what you asked for.
Truncation bugs. The JSON was correct and well-formed, but the model ran out of token budget mid-emission.

Each category has a different fix. Mixing the responses up — applying a syntax fix to a schema bug, applying retries to a truncation problem — wastes tokens and leaves the actual issue intact.

Wrapper bugs

The most common failure, and almost always the easiest to fix. The model returns valid JSON wrapped in something that isn't valid JSON. Three things wrap it, almost always:

Markdown code fences. Every model is heavily fine-tuned on documentation, where JSON appears wrapped in triple-backticks. Even when you tell it not to, it will sometimes still emit ```json...```. ChatGPT does this. Claude does this. Gemini does this. The smaller open-source models do this even more enthusiastically. Strip the fences before parsing.

Conversational preamble. "Sure, here's the data you requested:" — followed by the JSON. Sometimes the model will go further: "I've structured this according to your three required fields. Please let me know if you need any changes!" The opening clause is the dangerous one because it's hard to detect generally; the closing clause is similar. The reliable signal is that the JSON itself starts at { or [, so anything before the first { or [ that looks like prose can be stripped, and anything after the matching closing brace that looks like prose can be stripped too.

Reasoning blocks. Reasoning-capable models (Claude with extended thinking, GPT o1, DeepSeek-R1, Gemini Thinking) sometimes leak their internal reasoning into the output. You'll see <thinking>...</thinking>, <reasoning>...</reasoning>, <scratchpad>...</scratchpad>, or sometimes a custom tag specific to the model's training. These should be stripped first, before you even attempt to find the JSON, because the reasoning text often itself contains { characters that throw off naive bracket-finding.

The fix for all three: a preprocessing layer that runs before JSON.parse(). Detect each wrapper, strip it, repeat until the input stabilizes, then parse. The four-step pipeline that handles the majority of real-world output is: strip reasoning tags, strip markdown fences, strip preamble, strip trailing prose. Run it in that order so that an outer fence's enclosure of inner reasoning gets handled correctly. (This is exactly what the JSON Repair tool does as its first pass — the wrappers are stripped, the fixes are listed for transparency, then the actual repair logic runs on the cleaned text.)

Syntax bugs

The model emits something close to JSON but not quite. The four common bends:

Trailing commas. Models trained on JavaScript (which is most of them) often emit {"a": 1, "b": 2,}. JS allows this. Strict JSON does not. Some parsers (json5, jsonc) accept it; JSON.parse() does not.

Single quotes. Same root cause. The model writes {'a': 1} because Python and JavaScript both accept it. JSON requires double quotes for strings and double quotes for keys. A find-and-replace on ' isn't safe (apostrophes inside strings need different handling), but a tokenizer-aware replacement is straightforward.

Python literals. True, False, None instead of true, false, null. Models that have seen Python dictionaries serialized as if they were JSON will emit this. Lower-cased substitution is safe as long as you're not inside a string.

Unquoted keys. {name: "Alice"} instead of {"name": "Alice"}. JavaScript object literal syntax leaking through. The fix requires a real parser, not a regex — you have to track string boundaries to avoid quoting things that shouldn't be quoted.

For all of these, the right tool is a forgiving parser. The npm package jsonrepair handles all four cases (and several more obscure ones) with about 200KB of code and no dependencies. Roll-your-own is possible but tedious; the edge cases compound. If you ship LLM-consuming code to production, vendor a forgiving parser as the second-pass fallback after strict JSON.parse() fails.

Schema bugs

The output parses cleanly. But you asked for {"users": [{...}, {...}]} and got [{...}, {...}]. Or you asked for {"price": 12.50} and got {"price": "12.50"}. Or you asked for an integer ID and got the string "42". Or you specified that tags is a list of strings and got a comma-separated string instead.

This is a fundamentally different problem from syntax. The JSON is valid. Your validator is the only thing that knows the shape is wrong. The fix is structural validation — a JSON Schema validator, a Zod schema, a Pydantic model — applied right after parse. The validator's error message tells you exactly which field has the wrong type or which required field is missing.

Three escalation paths from there, in order of preference:

Coerce. If you got "42" instead of 42, and the value cleanly parses as a number, accept the coercion silently. Document it in your schema (Zod's z.coerce.number(), Pydantic's strict-vs-lax mode). Most schema bugs are coercion-fixable. Don't retry the model for them — coerce, log, move on.

Repair-prompt. If the structure is meaningfully wrong (missing required fields, extra fields, list instead of object), send a follow-up message to the model: "Your previous response had this issue: [validator error]. Please return the corrected JSON only." Almost all current models handle a single round of repair-prompt successfully. Two rounds get you to >99%. Three is wasted tokens; if you're still failing after three, the prompt itself is the bug.

Use structured outputs. OpenAI (since GPT-4o-mini-2024-07-18), Anthropic (since Claude 3.5), and Google (Gemini 1.5+) all support a "constrained generation" mode where you provide a JSON Schema at request time and the model is forced to emit only token sequences that conform to that schema. This is a much stronger guarantee than prompting alone — you trade a small latency cost (the model has to check schema constraints during decoding) for elimination of the entire schema-bug category. If you control the API call and the model supports it, use it. The remaining 10% of LLM JSON problems disappear.

Truncation bugs

The output starts well. The first 95% of the JSON parses cleanly. Then the string ends mid-array, mid-object, or mid-value, with no closing brace. JSON.parse() fails with "Unexpected end of JSON input."

The cause is almost always the model hitting its max_tokens limit before finishing the response. The fix is the cause:

Raise max_tokens. Inspect the request. If you're capped at 1024 tokens and the response is a list of 50 items, you'll get truncation almost every time. Most providers default to a fairly low ceiling; raising it explicitly is usually the only fix needed.

Check finish_reason. Every reasonable LLM API returns a stop reason: "stop", "length", "content_filter". If finish_reason === "length", the model didn't finish — it got cut off. This is a much stronger signal than parsing the output: you know it's truncated before you even try to use it. Treat it like a network error and retry with a larger budget.

Stream and accumulate. If your output is large and the model emits structured items one at a time (a list of products, a series of records), use streaming and accumulate. Detect each complete item as it streams (} at depth 1) and emit it to your downstream consumer immediately. This eliminates the truncation problem entirely for streamable workloads.

One specific anti-pattern worth naming: trying to "complete" a truncated JSON string by appending closing braces. It almost works for the simplest cases — one missing }. It catastrophically fails the moment the truncation happens inside a string, inside an escape sequence, or partway through a Unicode codepoint. If you're tempted to write this code, use a forgiving parser that does it correctly (jsonrepair handles truncation as a first-class case) or fix the underlying token-budget problem instead.

The order of operations

For production code that consumes LLM JSON, the right pipeline is:

Make the request with structured outputs enabled if your model supports it. (Skip steps 2–3 if so.)
Check finish_reason. If "length", retry with more tokens.
Preprocess: strip reasoning tags, strip markdown fences, strip preamble, strip trailing prose.
Try strict JSON.parse().
If that fails, try forgiving parse (jsonrepair or equivalent).
Validate against schema (Zod, Pydantic, JSON Schema).
Coerce types where possible.
If structure still wrong, repair-prompt (max 2 attempts).
If still failing, log everything and surface a real error.

The pipeline is verbose because the failure modes are heterogeneous. A single try/catch around JSON.parse() hides all four bug categories under one error message — you'll spend hours debugging "invalid JSON" before realizing the actual issue was that you forgot to set response_format: { type: "json_object" }.

Companion tool

If you're holding a broken LLM response right now and want to see what category of bug it is: JSON Repair. Paste the raw output, including any wrappers and prose. It detects each category, lists the fixes it applied (so you can adjust your prompt or pipeline upstream), and emits the cleaned JSON. Everything runs in your browser; the input never leaves your machine, which matters for tokens that may contain customer PII or proprietary prompts.

For the wider context — what's actually inside a JSON document, when to use JSON vs YAML vs TOML — see the JSON vs YAML vs TOML piece. For converting between JSON and other formats with strict validation, the Format Flipper tool handles 90 conversion directions including JSON↔YAML↔CSV↔XML.