PDF → RAG
LlamaParse and Marker are the other tools built for turning PDFs into RAG-ready output. We note where we lose as clearly as where we win.
Parsade has two tiers: a free browser app and a paid self-host deployment. The table shows what each unlocks.
| | Parsade Free | Parsade Private Self-host | LlamaParse (competitor) | Marker / Datalab (competitor) |
|---|---|---|---|---|
| Cost | Free | License | $1.25–$56 / 1k pages | ~$3 / 1k pages |
| Data location | Your device, never uploaded | Your cloud, VPC, or on-prem | LlamaIndex cloud | Datalab cloud |
| Setup | Open a URL | Docker / Helm | API key | API key |
| Input formats | PDF, images | PDF, Office, HTML, MD, CSV, AsciiDoc, LaTeX, images, A/V | 90+ formats | PDF, Office, HTML, EPUB |
| Output formats | Markdown, JSON, DocTags, chunks | Markdown, JSON, HTML, DocTags | Markdown, JSON | Markdown, JSON, HTML |
| Processing engine | Granite Docling 258M · WebGPU | Full Docling pipeline · your GPU | Frontier VLM (cloud) | Surya OCR + optional LLM |
| Scanned PDFs / OCR | Limited · 258M VLM | Full OCR pipeline | Best on hard docs | Surya OCR · 90+ langs |
| Bbox provenance | On-device | Yes | Yes · non-fast tiers | Yes · in JSON |
| Structure-aware chunking | HybridChunker | Yes | JSON, you chunk it | JSON, you chunk it |
| Deterministic output | Yes | Yes | No · LLM variance | Yes (no-LLM mode) |
| Table extraction | Via VLM | TableFormer | High (VLM) | Good (optional --use_llm) |
| No vendor dependency | Once cached | Your cloud / air-gap | — | — |
| License | MIT + Apache 2.0 | MIT + CDLA / Apache | Proprietary SaaS | GPL-3 + RAIL-M (commercial license >$2M rev) |
| SLA / compliance | — | Standard; SLA + BAA on request | SOC2 partial | SOC2 Type 2 · BAA |
Parsade Private (self-host) is in early access; air-gap, custom SLA, and BAA are available on request. Need a hosted API for non-sensitive workloads? Ask us. See pricing →
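To put the per-page prices in the Cost row in perspective, here is a minimal sketch that scales them to a page volume. The prices are copied from the table above; the 250,000-page monthly volume and the function names are hypothetical, chosen only for illustration. Parsade's self-host license is not priced in this comparison, so it is left out rather than guessed.

```python
# Per-1,000-page prices in USD, copied from the Cost row of the table above.
# The Marker / Datalab figure is approximate ("~$3") in the source.
PRICE_PER_1K_PAGES = {
    "LlamaParse (cheapest tier)": 1.25,
    "LlamaParse (priciest tier)": 56.00,
    "Marker / Datalab (approx.)": 3.00,
}


def parsing_cost(pages: int, price_per_1k: float) -> float:
    """Return the USD cost of parsing `pages` pages at a per-1,000-page price."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / 1000 * price_per_1k


def cost_table(pages: int) -> dict[str, float]:
    """Return the cost for each listed price point, rounded to cents."""
    return {
        name: round(parsing_cost(pages, price), 2)
        for name, price in PRICE_PER_1K_PAGES.items()
    }


if __name__ == "__main__":
    monthly_pages = 250_000  # hypothetical volume, not a figure from the source
    for name, cost in cost_table(monthly_pages).items():
        print(f"{name:<28} ${cost:>10,.2f}")
```

At that hypothetical volume, LlamaParse comes to between $312.50 and $14,000 depending on tier, and Marker / Datalab to roughly $750. The Parsade free tier costs nothing per page, since parsing runs on your own device.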
Why it matters
Most of the feature list is parity. Provenance, structure-aware output, and deterministic parsing all exist elsewhere. The real edge is data control: your documents never leave your hands, and the stack is yours under a permissive license.
Cloud only
LlamaParse has no on-device or self-host option. Documents go to LlamaIndex's servers, and provenance lives behind a paid cloud tier.
Restricted self-host
Marker / Datalab can run on-prem, but it ships as GPL-3 code plus RAIL-M weights: copyleft, use-restricted, and subject to a paid commercial license above $2M in revenue. There is no in-browser tier.
Yours, end to end
Parsade runs on-device in the browser for free, or self-hosted in your cloud, VPC, or air-gapped network on the paid tier, all under MIT + CDLA-Permissive / Apache-2.0. You get the same provenance and structure-aware chunking, running on your hardware instead of someone else's.
Honest scope: Datalab leads on compliance certifications and support maturity. Parsade's win is data control and license freedom, not a claim to out-support them.
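"Structure-aware chunking" can be made concrete with a small sketch. The code below is not Docling's HybridChunker and does not reproduce its behavior; it is a simplified, standard-library illustration of the underlying idea, in which chunk boundaries follow the document's headings and each chunk carries its heading path as metadata. The `Chunk` class and `chunk_markdown` function are hypothetical names introduced for this example.

```python
import re
from dataclasses import dataclass

# Matches ATX-style Markdown headings such as "## Setup".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    heading_path: list[str]  # e.g. ["Guide", "Setup"]
    text: str                # body text found under that heading


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split Markdown at headings, keeping each chunk's heading path.

    Simplification: fenced code blocks are not special-cased, so a line
    starting with '#' inside a code fence would be read as a heading.
    """
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # stack of (level, title)
    body: list[str] = []

    def flush() -> None:
        # Emit the text collected so far under the current heading path.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk([title for _, title in path], text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Guide\nIntro text.\n## Setup\nInstall it.\n## Usage\nRun it.\n"
    for chunk in chunk_markdown(sample):
        print(" > ".join(chunk.heading_path), "|", chunk.text)
```

Keeping the heading path alongside each chunk lets a retriever return context such as "Guide > Setup" with the passage. That context is lost when a parser returns flat JSON and chunking is done by character count.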
Honest assessment
Better to say it plainly than have you find out after switching.
LlamaParse with a frontier VLM is the benchmark on visually complex or heavily scanned documents. Parsade's free tier uses Granite Docling 258M; the paid tier uses the full Docling pipeline. Neither Parsade tier has been benchmarked head-to-head against LlamaParse, so test on your own documents before deciding.
The free tier is PDF and images. The paid self-host tier adds Office, HTML, Markdown, CSV, AsciiDoc, LaTeX, and audio/video through the full Docling pipeline. LlamaParse still claims the widest long-tail list (90+ formats).
A note on Docling
Parsade is built on IBM's open-source Docling (MIT + Apache 2.0). The free tier runs Granite Docling 258M in-browser via WebGPU. The paid tier runs the full pipeline inside your own cloud or infrastructure, so documents never leave your boundary.
Free, no account. Drop a PDF and see the Markdown, JSON, and chunk output for yourself. First load fetches ~1.15 GB of model weights, cached after that.