PDF → RAG

How Parsade compares.

LlamaParse and Marker are the other main tools built for turning PDFs into RAG-ready output. We note where we lose as clearly as where we win.

Feature comparison

Parsade has two tiers: a free browser app and a paid self-host deployment. The table shows what each unlocks.

| | Parsade Free | Parsade Private (self-host) | LlamaParse | Marker / Datalab |
|---|---|---|---|---|
| Cost | Free | Paid license | $1.25–$56 / 1k pages | ~$3 / 1k pages |
| Data location | Your device, never uploaded | Your cloud, VPC, or on-prem | LlamaIndex cloud | Datalab cloud |
| Setup | Open a URL | Docker / Helm | API key | API key |
| Input formats | PDF, images | PDF, Office, HTML, MD, CSV, AsciiDoc, LaTeX, images, A/V | 90+ formats | PDF, Office, HTML, EPUB |
| Output formats | Markdown, JSON, DocTags, chunks | Markdown, JSON, HTML, DocTags | Markdown, JSON | Markdown, JSON, HTML |
| Processing engine | Granite Docling 258M · WebGPU | Full Docling pipeline · your GPU | Frontier VLM (cloud) | Surya OCR + optional LLM |
| Scanned PDFs / OCR | Limited · 258M VLM | Full OCR pipeline | Best on hard docs | Surya OCR · 90+ langs |
| Bbox provenance | On-device | | Yes · non-fast tiers | Yes · in JSON |
| Structure-aware chunking | HybridChunker | | JSON, you chunk it | JSON, you chunk it |
| Deterministic output | | | LLM variance | Yes (no-LLM mode) |
| Table extraction | Via VLM | TableFormer | High (VLM) | Good (+ --use_llm) |
| No vendor dependency | Once cached | Your cloud / air-gap | | |
| License | MIT + Apache 2.0 | MIT + CDLA / Apache | Proprietary SaaS | GPL-3 + RAIL-M (commercial license above $2M revenue) |
| SLA / compliance | | Standard; SLA + BAA on request | SOC2 partial | SOC2 Type 2 · BAA |

Parsade Private (self-host) is in early access; air-gapped deployment, a custom SLA, and a BAA are available on request. Need a hosted API for non-sensitive workloads? Ask us.
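To turn the per-page rates in the table into a monthly budget, here is a minimal Python sketch. It uses only the list prices quoted above (the low and high ends of LlamaParse's range, and Datalab's approximate ~$3 per 1,000 pages). The 1M-page volume is a hypothetical workload. Parsade Private is left out because its license is not priced per page. Vendor pricing changes, so check each vendor's current pricing before relying on these numbers.

```python
# Metered parsing cost from the per-1k-page rates in the comparison table.
# Rates are the list prices quoted on this page (USD); Datalab's is approximate.
RATES_PER_1K_USD = {
    "LlamaParse (low end)": 1.25,
    "LlamaParse (high end)": 56.0,
    "Marker / Datalab (approx.)": 3.0,
}


def cost_usd(pages: int, rate_per_1k: float) -> float:
    """Cost of parsing `pages` pages at `rate_per_1k` USD per 1,000 pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / 1000 * rate_per_1k


if __name__ == "__main__":
    monthly_pages = 1_000_000  # hypothetical monthly volume
    for name, rate in RATES_PER_1K_USD.items():
        print(f"{name}: ${cost_usd(monthly_pages, rate):,.2f}")
```

At 1M pages a month, that works out to $1,250 to $56,000 for LlamaParse, depending on where in its range you land, and about $3,000 for Datalab.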

Why it matters

Where Parsade has an edge.

Most of the feature list is at parity. Provenance, structure-aware output, and deterministic parsing all exist elsewhere. The real edge is data control: your documents never leave your hands, and the stack is yours under a permissive license.

LlamaParse

Cloud only

No on-device or self-host option. Documents go to LlamaIndex's servers, and provenance lives behind a paid cloud tier.

Marker / Datalab

Restricted self-host

On-prem deployment exists, but the code is GPL-3 and the weights are RAIL-M: copyleft, use-restricted, and a paid commercial license is required above $2M in revenue. There is no in-browser tier.

Clear win

Parsade

Yours, end to end

On-device in the browser for free, or self-hosted in your cloud, VPC, or air-gapped network on the paid tier, all under MIT + CDLA-Permissive / Apache-2.0. Same provenance and structure-aware chunking, running on your hardware instead of someone else's.

Honest scope: Datalab leads on compliance certifications and support maturity. The win is data control and license freedom, not a claim to out-support them.

Honest assessment

Where others win.

Better to say it plainly than have you find out after switching.

Hard scans · complex charts

LlamaParse with a frontier VLM sets the bar on visually complex or heavily scanned documents. Parsade's free tier uses Granite Docling 258M; the paid tier uses the full Docling pipeline. We have not benchmarked either tier head-to-head against LlamaParse, so test on your own documents before deciding.

Format breadth

The free tier is PDF and images. The paid self-host tier adds Office, HTML, Markdown, CSV, AsciiDoc, LaTeX, and audio/video through the full Docling pipeline. LlamaParse still claims the widest long-tail list (90+ formats).

A note on Docling

Parsade is built on IBM's open-source Docling (MIT + Apache 2.0). The free tier runs Granite Docling 258M in-browser via WebGPU. The paid tier runs the full pipeline inside your own cloud or infrastructure, so documents never leave your boundary.

Try it before deciding.

Free, no account. Drop a PDF and see the Markdown, JSON, and chunk output for yourself. The first load downloads ~1.15 GB of model weights; after that they are cached.