WYSIWYD API
Deterministic PDF table extraction. Same PDF in, byte-identical output out. Base URL:
https://wysiwyd-api.fly.dev. All endpoints under /api/v1/.
Designed for headless callers including AI coders, CLI scripts, and direct HTTP integrations.
Quickstart
Three ways to use WYSIWYD. Pick whichever fits your stack.
# Upload, scan page 1, download CSV — three calls, ~3 seconds.
SID=$(curl -s -X POST -F "file=@invoice.pdf" \
https://wysiwyd-api.fly.dev/api/v1/upload | jq -r .session_id)
curl -s -X POST -H "Content-Type: application/json" \
-d "{\"session_id\":\"$SID\",\"page_number\":1}" \
https://wysiwyd-api.fly.dev/api/v1/detect-boxes
curl -s -X POST -H "Content-Type: application/json" \
-d "{\"session_id\":\"$SID\"}" \
https://wysiwyd-api.fly.dev/api/v1/download-csv

# Python (requests)
import requests
API = "https://wysiwyd-api.fly.dev"
with open("invoice.pdf", "rb") as fh:
    r = requests.post(f"{API}/api/v1/upload", files={"file": fh})
session_id = r.json()["session_id"]
r = requests.post(f"{API}/api/v1/detect-boxes",
                  json={"session_id": session_id, "page_number": 1})
print(r.json())
csv_text = requests.post(f"{API}/api/v1/download-csv",
                         json={"session_id": session_id}).text
print(csv_text)

// JavaScript (fetch)
const API = 'https://wysiwyd-api.fly.dev';
const fd = new FormData();
fd.append('file', file); // file = a File or Blob (browser)
// in Node 18+: fd.append('file', new Blob([fs.readFileSync('invoice.pdf')]), 'invoice.pdf')
const upload = await fetch(`${API}/api/v1/upload`, { method: 'POST', body: fd });
const { session_id } = await upload.json();
const detect = await fetch(`${API}/api/v1/detect-boxes`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ session_id, page_number: 1 }),
});
console.log(await detect.json());
const csvRes = await fetch(`${API}/api/v1/download-csv`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ session_id }),
});
console.log(await csvRes.text());

MCP server (the AI coder path)
The fastest way to give Claude Desktop, Cursor, Cline, Continue, or any other MCP-compatible
client native access to WYSIWYD. After the one-time setup, the AI can call
extract_pdf_tables(file_path, page=1) as a built-in tool.
Install
pip install mcp
curl -L https://raw.githubusercontent.com/ekras-doloop/wysiwyd/main/mcp/wysiwyd_mcp.py \
-o ~/.local/bin/wysiwyd-mcp
chmod +x ~/.local/bin/wysiwyd-mcp
Claude Desktop config
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or the equivalent on your OS:
{
"mcpServers": {
"wysiwyd": {
"command": "python",
"args": ["/Users/you/.local/bin/wysiwyd-mcp"],
"env": {
"WYSIWYD_API_URL": "https://wysiwyd-api.fly.dev"
}
}
}
}
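Hand-editing this file risks overwriting other MCP servers that are already configured. A minimal sketch of merging the entry programmatically follows; `add_wysiwyd_server` is a hypothetical helper, not part of WYSIWYD, and the script path is whatever you chose during install.

```python
import copy

def add_wysiwyd_server(config, script_path,
                       api_url="https://wysiwyd-api.fly.dev"):
    """Return a copy of a Claude Desktop config with a 'wysiwyd' entry added."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    servers["wysiwyd"] = {
        "command": "python",
        "args": [script_path],
        "env": {"WYSIWYD_API_URL": api_url},
    }
    return merged

# Example: an existing config that already lists another (made-up) server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_wysiwyd_server(existing, "/Users/you/.local/bin/wysiwyd-mcp")
print(sorted(updated["mcpServers"]))
```

In practice you would read the file with `json.load`, pass the result through the helper, and write it back with `json.dump(..., indent=2)`.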
Tools exposed
| Tool | Args | Returns |
|---|---|---|
| extract_pdf_tables | file_path, page=1, output_format='csv' | CSV or JSON text |
| check_usage | (none) | {ip, month, pages_used, pages_limit, pdfs_uploaded} |
| api_health | (none) | {status, version, features, free_tier, ...} |
CLI
Standalone Python command-line client. Stdlib only — zero pip dependencies.
curl -L https://raw.githubusercontent.com/ekras-doloop/wysiwyd/main/cli/wysiwyd \
-o ~/.local/bin/wysiwyd
chmod +x ~/.local/bin/wysiwyd
wysiwyd invoice.pdf -o invoice.csv # CSV to file
wysiwyd invoice.pdf --json # JSON to stdout
wysiwyd invoice.pdf --all-pages # every page
wysiwyd --usage # check monthly quota
wysiwyd --health # API status
Authentication
Free tier requires no authentication. Rate limits apply per source IP (see below).
When the Pro tier ships, pass an API key in the Authorization: Bearer <token>
header, or set the WYSIWYD_API_KEY environment variable for the CLI and MCP server.
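A client can be written today so that it picks up a key once one exists. A minimal sketch, assuming the header format stays as described above; `auth_headers` is a hypothetical helper, and the Pro tier has not shipped, so nothing here is confirmed behavior.

```python
import os

def auth_headers(env=None):
    """Return request headers, adding a bearer token only when a key is set."""
    env = os.environ if env is None else env
    key = env.get("WYSIWYD_API_KEY", "").strip()
    # Free tier: no key configured, so send no Authorization header at all.
    return {"Authorization": f"Bearer {key}"} if key else {}
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.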
Rate limits
Free tier:
- 10 pages per IP per month. First page of any PDF counts as 1.
- Counter resets on the first day of each calendar month (UTC).
- Each POST /detect-boxes call increments the counter by 1.
- Upload, create-box, templates, and download endpoints do NOT count against pages.
- When the limit is reached, /detect-boxes returns HTTP 429 with a structured body.
Source IP is resolved in this order: Fly-Client-IP, then CF-Connecting-IP, then X-Forwarded-For, then remote_addr.
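The header precedence can be sketched as a small function. This illustrates the stated order and is not the server's actual code; `client_ip` is a hypothetical name, and taking the left-most X-Forwarded-For entry is an assumption.

```python
def client_ip(headers, remote_addr):
    """Pick the caller's IP using the header precedence listed above."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("fly-client-ip", "cf-connecting-ip"):
        value = lowered.get(name, "").strip()
        if value:
            return value
    # Assumption: the left-most X-Forwarded-For entry is the original client.
    first = lowered.get("x-forwarded-for", "").split(",")[0].strip()
    return first or remote_addr
```

One practical consequence of per-IP counting is that callers behind the same NAT or proxy likely share a single monthly quota.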
Errors
All error responses are JSON with at least an error field:
{"error": "Invalid session ID"} // 400
{"error": "Template not found"} // 404
{
"error": "free_tier_limit_reached",
"pages_used": 10,
"pages_limit": 10,
"message": "Free tier limit reached..."
} // 429
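Because every error body carries an error field, a client can map responses to typed exceptions. A minimal sketch for the JSON endpoints; the class and function names are hypothetical, and only the status codes and fields shown above are taken from this page.

```python
import json

class WysiwydError(Exception):
    """Any non-success response from the API."""
    def __init__(self, status, payload):
        self.status = status
        self.payload = payload
        super().__init__(f"HTTP {status}: {payload.get('error', 'unknown error')}")

class QuotaExceeded(WysiwydError):
    """HTTP 429 with error == 'free_tier_limit_reached'."""

def check_response(status, body_text):
    """Return parsed JSON on success; raise a typed error otherwise."""
    if status < 400:
        return json.loads(body_text)
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Defensive: a proxy in front of the API might return a non-JSON body.
        payload = {"error": body_text.strip() or "non-JSON error body"}
    if not isinstance(payload, dict):
        payload = {"error": str(payload)}
    if status == 429 and payload.get("error") == "free_tier_limit_reached":
        raise QuotaExceeded(status, payload)
    raise WysiwydError(status, payload)
```

Catching `QuotaExceeded` separately lets a batch job stop cleanly at the monthly cap instead of retrying a request that cannot succeed until the counter resets.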
GET /api/v1/health
Health check + feature flags + free-tier config. No auth.
{
"status": "healthy",
"service": "WYSIWYD Production API (smart_detector)",
"version": "3.2.0",
"features": {
"pattern_detection": true,
"pdfplumber_tables": true,
"ocr_tesseract": true,
"confidence_scoring": true,
"user_drawn_boxes": true,
"template_save_match": true,
"free_tier_rate_limit": true
},
"free_tier": {"pages_per_month": 10},
"templates_loaded": 0,
"active_sessions": 0
}
GET /api/v1/usage
Current-month free-tier usage for the calling IP. No auth.
{
"ip": "2600:1700:4a30:1690:...",
"month": "2026-06",
"pages_used": 3,
"pages_limit": 10,
"pdfs_uploaded": 3,
"first_seen": 1780640971,
"last_seen": 1780640979
}
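A caller can turn this response into "pages left" and "next reset" before starting a batch. A minimal sketch; `quota_status` is a hypothetical helper, and it assumes month is always formatted as YYYY-MM, as in the sample.

```python
from datetime import datetime, timezone

def quota_status(usage):
    """Summarise a /usage response: pages left and when the counter resets."""
    remaining = max(usage["pages_limit"] - usage["pages_used"], 0)
    year, month = (int(part) for part in usage["month"].split("-"))
    # The counter resets on the first day of the next calendar month (UTC).
    if month == 12:
        year, month = year + 1, 1
    else:
        month += 1
    return {
        "remaining": remaining,
        "resets_at": datetime(year, month, 1, tzinfo=timezone.utc),
    }
```

For the sample above (3 of 10 pages used in 2026-06), this gives 7 remaining pages and a reset at 2026-07-01 00:00 UTC.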
POST /api/v1/upload
Upload a PDF. Returns a session_id used by subsequent calls. Files are retained for at most 24 hours.
Request: multipart/form-data with file field set to a PDF.
Response 200:
{
"session_id": "9f9f6ea0-1b4a-4adf-be4c-564533de2369",
"filename": "invoice.pdf",
"n_pages": 1
}
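For callers that want to stay stdlib-only, the multipart body can be built by hand. A minimal sketch; `encode_pdf_upload` is a hypothetical helper (not taken from the WYSIWYD CLI), and it assumes the filename contains no quotes or line breaks.

```python
import uuid

def encode_pdf_upload(filename, pdf_bytes, boundary=None):
    """Build a multipart/form-data body with a single 'file' field.

    Returns (body, content_type); send both in one POST request.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + pdf_bytes + tail, f"multipart/form-data; boundary={boundary}"
```

The result can be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")` and `urllib.request.urlopen`.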
POST /api/v1/detect-boxes
Detect table cells on a page. Counts 1 page against the monthly cap.
// Request body
{"session_id": "uuid", "page_number": 1}
// Response 200
{
"boxes": [
{
"box_id": "box_0_0_0", "bbox": [481, 31, 510, 41],
"x": 481, "y": 31, "w": 29, "h": 10,
"text": "Global", "confidence": 95,
"row": 0, "col": 0, "table": 0
},
...
],
"vertical_lines": [],
"horizontal_lines": [],
"method": "smart_detector",
"n_boxes": 162,
"n_words": 162
}
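The row, col, and table fields make it possible to rebuild each table as a grid on the client. A minimal sketch, assuming those fields are zero-based indices as the sample suggests; `boxes_to_tables` is a hypothetical helper.

```python
from collections import defaultdict

def boxes_to_tables(boxes):
    """Group detected cells by table, then lay them out by row and column."""
    cells = defaultdict(dict)
    for box in boxes:
        cells[box["table"]][(box["row"], box["col"])] = box["text"]
    tables = {}
    for table_id, by_pos in cells.items():
        n_rows = max(row for row, _ in by_pos) + 1
        n_cols = max(col for _, col in by_pos) + 1
        # Missing positions become empty strings so every row has equal width.
        tables[table_id] = [
            [by_pos.get((row, col), "") for col in range(n_cols)]
            for row in range(n_rows)
        ]
    return tables
```

Each value in the returned dict is a list of rows that can be passed straight to `csv.writer(...).writerows`.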
POST /api/v1/create-box
Extract text inside a user-drawn bbox. Does NOT count against the page cap.
// Request
{"session_id": "uuid", "page_number": 1, "bbox": [0, 0, 400, 80], "label": "header"}
// Response 200
{
"box_id": "user_e293e3a1",
"text": "Azure Interior 4557 De Silva St",
"label": "header", "n_words": 6,
"bbox": [0.0, 0.0, 400.0, 80.0]
}
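In the detect-boxes sample, bbox [481, 31, 510, 41] pairs with x=481, y=31, w=29, h=10, which suggests bbox is [x0, y0, x1, y1]. A minimal sketch under that assumption; both helpers are hypothetical, and whether label is required is not stated here.

```python
def bbox_from_xywh(x, y, w, h):
    """Convert a top-left corner plus size into [x0, y0, x1, y1]."""
    return [x, y, x + w, y + h]

def create_box_request(session_id, page_number, bbox, label):
    """Build a create-box request body, rejecting empty or inverted boxes."""
    x0, y0, x1, y1 = bbox
    if x1 <= x0 or y1 <= y0:
        raise ValueError("bbox must be [x0, y0, x1, y1] with x1 > x0 and y1 > y0")
    return {
        "session_id": session_id,
        "page_number": page_number,
        "bbox": [x0, y0, x1, y1],
        "label": label,
    }
```

Validating on the client avoids spending a round trip on a box that was dragged right-to-left or bottom-to-top in a UI.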
GET / POST /api/v1/templates
GET: list saved templates (in-memory, per-server).
[
{"id": "tpl_8f6dca60", "name": "Invoice layout", "n_cols": 12,
"page_width": 595.0, "page_height": 842.0,
"created_at": "2026-06-04T21:00:42", "use_count": 1}
]
POST: save the current page's layout as a reusable template.
// Request
{"session_id": "uuid", "page_num": 1, "name": "Invoice layout"}
// Response
{"id": "tpl_8f6dca60", "name": "Invoice layout", "n_cols": 12, "saved": true}
POST /api/v1/match-template
Check whether a saved template matches the current page; if so, return its detected boxes.
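A common pattern is to try a saved template first and fall back to detect-boxes when the response reports matched: false. A minimal sketch; `boxes_for_page` is a hypothetical helper, and `post(path, payload)` is a caller-supplied function that sends a JSON POST and returns the decoded response.

```python
def boxes_for_page(post, session_id, page_number, template_id=None):
    """Return (boxes, source), preferring a saved template when it matches."""
    if template_id is not None:
        result = post("/api/v1/match-template", {
            "session_id": session_id,
            "page_number": page_number,
            "template_id": template_id,
        })
        if result.get("matched"):
            return result["boxes"], "template"
    # No template given, or it did not match: run normal detection.
    # Each detect-boxes call counts 1 page against the monthly cap.
    result = post("/api/v1/detect-boxes",
                  {"session_id": session_id, "page_number": page_number})
    return result["boxes"], "detect-boxes"
```

Injecting `post` keeps the decision logic independent of the HTTP library, so it can be exercised with a stub instead of live requests.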
// Request
{"session_id": "uuid", "page_number": 1, "template_id": "tpl_8f6dca60"}
// Response 200 (matched)
{
"matched": true,
"match_confidence": 1.0,
"template_id": "tpl_8f6dca60",
"template_name": "Invoice layout",
"boxes": [...],
"n_boxes": 162
}
// Response 200 (not matched)
{"matched": false, "match_confidence": 0.32,
"reason": "column structure differs (mean IoU 0.32 < 0.5)"}
POST /api/v1/download-csv
Stream the extracted CSV for all pages scanned in the session.
// Request
{"session_id": "uuid"}
// Response 200 — text/csv stream
POST /api/v1/download-json
Same data as CSV but as structured JSON with coordinates per cell.
Determinism
Same input file (byte-identical) returns byte-identical output. Verified across 90
extractions on 9 documents (10 runs each) with zero variance. See
determinism_certificate.json in the repo. The deterministic guarantee
covers the standard extraction flow (Tier 3 word-clustering); future Tier 4 LLM
fallback would carry a labeled non-deterministic flag.
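The claim is straightforward to re-check locally: download the output for the same file several times and compare digests. A minimal sketch; the function names are hypothetical, and this is not the procedure behind determinism_certificate.json.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 of one extraction output, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_deterministic(outputs):
    """True when every run produced exactly the same bytes."""
    if len(outputs) < 2:
        raise ValueError("need at least two runs to compare")
    return len({digest(output) for output in outputs}) == 1
```

Comparing raw bytes rather than parsed rows matters here: a change in line endings or quoting would still count as a difference.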
SR 26-2 framing
On April 17, 2026, the Federal Reserve, OCC, and FDIC issued SR 26-2 (guidance PDF), replacing SR 11-7. The new framework explicitly excludes deterministic rule-based processes and software from the definition of a "model" and therefore from the full model-validation burden. WYSIWYD is built to qualify for that exclusion by design.
If your model-risk team needs audit-grade artifacts (the determinism certificate, a signed accuracy audit, change-history exports), email hello@doloop.io.