Paperless-ngx is a self-hosted document management system that OCRs, tags, and indexes your scanned paper documents into a searchable digital archive. Drop a scanned PDF or photo of a receipt, bill, tax form, or letter into Paperless-ngx, and it automatically extracts the text via OCR, suggests tags and correspondents based on content, assigns a date, and files it in your archive. Every document becomes searchable by keyword, date range, tag, or correspondent, which eliminates the filing cabinet and the “where did I put that document” problem.
Paperless-ngx is the community-maintained successor to Paperless and Paperless-ng, with active development, a modern web interface, and an extensive feature set that handles everything from single-page receipts to multi-hundred-page contracts. Running it with Docker Compose takes about 10 minutes and gives you a complete document management system that rivals commercial solutions costing $15 to $30 per month. What follows is a setup guide covering the Docker Compose configuration and workflow optimization tips.
What Paperless-ngx Does After You Set It Up
The daily workflow is simple: scan a document (using your phone’s camera, a flatbed scanner, or a document scanning app like Adobe Scan or Genius Scan), save the PDF to Paperless-ngx’s consumption folder, and walk away. Paperless-ngx handles everything else automatically.
The processing pipeline runs in this order: the file is detected in the consumption folder, the OCR engine (Tesseract) extracts all text from the document, the content analysis engine matches the text against your existing tags and correspondents to suggest classifications, the date extraction engine finds dates in the document text and assigns the most likely document date, and finally the processed document appears in your web dashboard ready for review.
The web dashboard shows all your documents in a filterable, sortable list. Filter by tag (tax, medical, insurance, receipts), by correspondent (landlord, employer, utility company), by date range (all 2025 documents), or by full-text search (find every document containing “invoice 2847”). The search function works across OCR’d text in every document, so even handwritten text that Tesseract recognizes becomes searchable.
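The same full-text search is also reachable from scripts. Here is a minimal sketch in Python, assuming the REST API exposes the document list at /api/documents/ with a query parameter and accepts token authentication, which matches the Paperless-ngx API documentation as far as I know. The function names are illustrative, and you would create the token yourself in the web interface.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url, query, page=1):
    """Return the document-list URL for a full-text query."""
    params = urllib.parse.urlencode({"query": query, "page": page})
    return f"{base_url.rstrip('/')}/api/documents/?{params}"


def search_documents(base_url, token, query):
    """Fetch the first page of matches from a reachable Paperless-ngx instance.

    Assumes the response is JSON with a "results" list, as the API docs describe.
    """
    request = urllib.request.Request(
        build_search_url(base_url, query),
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["results"]
```

For example, search_documents("http://192.168.1.10:8000", "YOUR_API_TOKEN", "invoice 2847") would return the first page of documents whose content matches that phrase. The host address and token here are placeholders.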
Installing Paperless-ngx With Docker Compose
Paperless-ngx’s official documentation provides a Docker Compose file that deploys three services: the Paperless-ngx web application, a PostgreSQL database for metadata storage, and a Redis instance for task queue management. The Gotenberg and Tika services are optional additions that improve document conversion capabilities.
Create a directory for your Paperless-ngx installation on your server. Inside it, create the Docker Compose file and an environment file. The compose file defines the three core services with their port mappings, volume mounts, and dependencies. The environment file sets your timezone, OCR language (English by default, multiple languages supported), admin credentials, and the consumption directory path.
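To make the environment file less abstract, here is a small Python sketch that renders one. The variable names (PAPERLESS_TIME_ZONE, PAPERLESS_OCR_LANGUAGE, PAPERLESS_ADMIN_USER, PAPERLESS_ADMIN_PASSWORD) are the ones I understand the Paperless-ngx configuration documentation to use. The values are placeholders, and the helper itself is purely illustrative, so check the names against the documentation for your version.

```python
def render_env_file(settings):
    """Render KEY=value lines in the format Docker Compose env files use."""
    lines = []
    for key, value in settings.items():
        if "\n" in str(value):
            raise ValueError(f"{key} must be a single line")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Placeholder values: change the credentials and timezone before use.
EXAMPLE_SETTINGS = {
    "PAPERLESS_TIME_ZONE": "Europe/Berlin",
    "PAPERLESS_OCR_LANGUAGE": "eng",
    "PAPERLESS_ADMIN_USER": "admin",
    "PAPERLESS_ADMIN_PASSWORD": "change-me",
}
```

Writing the output of render_env_file(EXAMPLE_SETTINGS) to the environment file next to your compose file gives you a starting point that you can extend with further settings.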
Volume configuration matters for long-term use. Mount three directories from your host system: the data directory (where Paperless-ngx keeps application state such as the search index and the trained classification model), the media directory (where the original and archived versions of your documents live), and the consumption directory (where you drop new documents for processing). Place the data and media directories on reliable storage with backups. The consumption directory can be anywhere accessible, including a network share or a Syncthing-synced folder from your phone.
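The directory layout can be prepared with a few lines of Python. This is a sketch under two assumptions: the host directory names (data, media, consume) are my own choice, and the container paths under /usr/src/paperless are the ones the official Docker image uses as far as I know. Compare the output with the volumes section of your compose file before relying on it.

```python
from pathlib import Path, PurePosixPath

# Assumed container root of the official image; verify against your compose file.
CONTAINER_ROOT = PurePosixPath("/usr/src/paperless")
HOST_DIRS = ("data", "media", "consume")


def create_host_dirs(base):
    """Create the three host directories under base and return their paths."""
    created = []
    for name in HOST_DIRS:
        path = Path(base) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def volume_mappings(base):
    """Return host:container strings suitable for a compose volumes list."""
    host_root = PurePosixPath(base)
    return [f"{host_root / name}:{CONTAINER_ROOT / name}" for name in HOST_DIRS]
```

Keeping all three directories under one base path makes it easier to see what needs backing up, although the consumption directory can just as well live elsewhere.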
After configuring the compose file, bring up the stack by running docker compose up -d (the -d flag starts the containers in detached mode). The first startup takes 2 to 3 minutes as Docker pulls the images and initializes the database. Access the web interface at your server’s IP address on port 8000 (configurable). Log in with the admin credentials from your environment file.
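Because the first startup takes a few minutes, a script that runs right after the stack comes up may want to wait until the web port accepts connections. The sketch below only checks that a TCP connection succeeds, which is a weaker signal than a finished database migration. The function name and the default timings are illustrative choices.

```python
import socket
import time


def wait_for_port(host, port, timeout=180.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            pass  # Nothing is listening yet, or the host is unreachable.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_port("192.168.1.10", 8000) with your own server address blocks for up to three minutes and tells you whether the login page is worth trying yet.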
Configuring OCR for Maximum Accuracy
Paperless-ngx uses Tesseract OCR, the most widely used open-source text recognition engine. Default OCR works well for clearly printed documents in English. For non-English documents, handwritten text, or poor-quality scans, additional configuration significantly improves recognition accuracy.
Set the OCR language in your environment file. Tesseract supports 100+ languages. For multilingual document collections, specify multiple language codes separated by plus signs (for example, eng+deu). Tesseract then considers all of the listed languages during recognition, which helps with mixed-language documents but makes processing slower, so list only the languages you actually need.
The OCR mode setting controls how Paperless-ngx handles documents that already contain embedded text (like digitally created PDFs from email). The default “skip” mode preserves existing text layers and only OCRs pages that contain no text. The “redo” mode re-OCRs all pages and replaces existing text layers, which is useful when those layers are incomplete or corrupted. The “skip_noarchive” mode behaves like “skip” but does not create an archived version for documents that already contain text, which saves disk space when most of your documents are digitally created PDFs.
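The two OCR settings discussed above can be validated before they go into the environment file. This sketch assumes the variables are named PAPERLESS_OCR_LANGUAGE and PAPERLESS_OCR_MODE and that the accepted modes are skip, redo, force, and skip_noarchive, which is my reading of the configuration documentation. The "force" mode is not covered in this article, and newer releases may handle skip_noarchive differently, so check the documentation for your version.

```python
# Modes as I understand the Paperless-ngx documentation; verify for your version.
OCR_MODES = {"skip", "redo", "force", "skip_noarchive"}


def ocr_settings(languages, mode="skip"):
    """Build the OCR-related environment settings from Tesseract language codes."""
    if not languages:
        raise ValueError("at least one language code is required")
    if mode not in OCR_MODES:
        raise ValueError(f"unknown OCR mode: {mode!r}")
    return {
        "PAPERLESS_OCR_LANGUAGE": "+".join(languages),  # e.g. eng+deu
        "PAPERLESS_OCR_MODE": mode,
    }
```

For an English and German archive with unreliable embedded text, ocr_settings(["eng", "deu"], "redo") yields the two lines to add to the environment file.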
For the best OCR results on phone-scanned documents, use a scanning app that applies perspective correction, contrast enhancement, and binarization before saving the PDF. Adobe Scan (free), Genius Scan, and Apple’s built-in document scanner in Notes all produce scanner-quality PDFs from phone camera captures. These pre-processed images give Tesseract dramatically better recognition than raw camera photos.
Setting Up Automatic Document Ingestion
The consumption folder is Paperless-ngx’s inbox. Any PDF, image (JPEG, PNG, TIFF), or Office document dropped into this folder gets automatically processed. The power of Paperless-ngx emerges when you connect multiple document sources to this single consumption folder.
Phone scanning: Install Syncthing on your phone and your Paperless-ngx server. Configure a shared folder that maps your phone’s scan output to Paperless-ngx’s consumption directory. Scan a document on your phone, and it appears in Paperless-ngx within seconds over your local network. Tailscale enables this sync to work from anywhere, not just your home WiFi.
Email ingestion: Paperless-ngx can monitor an IMAP email account and automatically consume PDF attachments. Set up a dedicated mailbox for this purpose and add it in Paperless-ngx’s mail settings. Forward receipts, invoices, and statements to that address, and they appear in your document archive automatically. Filter by sender, subject line, or attachment type to control which emails get processed.
Scanner integration: Configure your flatbed or ADF scanner to save directly to a network share that maps to the consumption folder. Dedicated document scanners from Fujitsu (ScanSnap series), Brother, and Epson support scan-to-folder functionality that deposits PDFs directly into your Paperless-ngx consumption pipeline.
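Whichever source feeds the consumption folder, it helps if files arrive there complete rather than half-written. A common technique, sketched here in Python, is to copy the file into a staging directory first and then move it into place with a single rename. The function and directory names are hypothetical, and the rename is only atomic when the staging and consumption directories sit on the same filesystem.

```python
import os
import shutil
from pathlib import Path


def drop_into_consume(source, staging_dir, consume_dir):
    """Copy source into staging_dir, then rename it into consume_dir in one step.

    Raises OSError if the two directories are on different filesystems.
    An existing file with the same name in consume_dir is overwritten.
    """
    source = Path(source)
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / source.name
    shutil.copy2(source, staged)  # The slow part happens outside the watched folder.
    target = Path(consume_dir) / source.name
    os.replace(staged, target)  # The file appears in the watched folder all at once.
    return target
```

This reduces the chance that a large scan is picked up while it is still being copied, which matters most on slow network shares.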
Organizing Documents With Tags and Correspondents
Paperless-ngx uses three primary organization methods: tags (categories like “tax,” “medical,” “insurance”), correspondents (entities like “Amazon,” “State Farm,” “City Water”), and document types (classifications like “receipt,” “invoice,” “contract,” “letter”).
The auto-tagging engine learns from your manual classifications. After you tag the first 5 to 10 documents from Amazon as correspondent “Amazon” with tag “receipts,” Paperless-ngx automatically suggests these classifications for future Amazon receipts. The matching algorithms use content analysis (looking for keywords like “amazon.com” or “order confirmation” in the OCR text) and learn from your correction patterns.
Create a tag hierarchy that reflects how you search for documents. A practical starting set: financial (tax, receipts, invoices, bank statements), personal (medical, insurance, identity, property), work (contracts, pay stubs, employment), and household (utilities, repairs, warranties). Add tags as needed rather than pre-creating an elaborate taxonomy. Paperless-ngx’s full-text search reduces the importance of perfect tagging because you can always find documents by searching their content.
Backing Up Your Document Archive
Your Paperless-ngx instance stores two types of critical data: the document files themselves (PDFs and images in the media directory) and the metadata database (tags, correspondents, notes, and processing history in PostgreSQL).
Back up the media directory using any file-level backup tool: rsync to a second drive, Syncthing to a remote location, or Backblaze B2 for cloud backup. The media directory contains the original uploaded files and the archived (OCR’d) versions. Losing this directory means losing the actual documents.
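As a stand-in for rsync or a cloud backup tool, here is a minimal Python sketch that mirrors the media directory to a second location and checks that no files went missing. It copies everything on each run instead of transferring only changes, so treat it as an illustration of the idea and not a replacement for an incremental tool. The function names are my own.

```python
import shutil
from pathlib import Path


def count_files(root):
    """Count regular files below root, including nested directories."""
    return sum(1 for path in Path(root).rglob("*") if path.is_file())


def backup_media(media_dir, backup_dir):
    """Mirror media_dir into backup_dir and return the number of source files."""
    shutil.copytree(media_dir, backup_dir, dirs_exist_ok=True)  # Python 3.8+
    expected = count_files(media_dir)
    if count_files(backup_dir) < expected:
        raise RuntimeError("backup holds fewer files than the media directory")
    return expected
```

Pointing backup_dir at a second drive or a mounted remote share gives you a copy of the original files and their archived versions.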
Back up the metadata using Paperless-ngx’s built-in document_exporter command, which writes a manifest file in JSON format containing the database contents alongside copies of your documents. Schedule this export to run nightly via a cron job. The manifest is small (typically under 50MB even for thousands of documents) and contains the metadata, tags, and correspondents needed to rebuild the entire archive from the document files.
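The nightly export can be expressed as a single crontab line. The sketch below builds that line in Python. It assumes the Paperless-ngx service is called webserver (as in the official compose files) and that the exporter is invoked as document_exporter with ../export as the target, which is how I understand the administration documentation. The project directory and schedule are placeholders.

```python
import shlex


def exporter_command(service="webserver", target="../export"):
    """Return the argument list that runs the exporter inside the container."""
    # -T disables pseudo-TTY allocation, which cron jobs do not have.
    return ["docker", "compose", "exec", "-T", service, "document_exporter", target]


def cron_line(project_dir, hour=2, minute=0):
    """Return a crontab entry that runs the export once per day."""
    command = shlex.join(exporter_command())
    return f"{minute} {hour} * * * cd {shlex.quote(project_dir)} && {command}"
```

With the compose file in the hypothetical directory /opt/paperless, cron_line("/opt/paperless") produces an entry that runs at 02:00 every night. Depending on your system, cron may need the full path to the docker binary.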
Is Paperless-ngx free?
Paperless-ngx is 100 percent free and open-source under the GPL-3.0 license. No subscription, no premium tier, no feature limits. Active community development on GitHub with regular releases. The only cost is the hardware to run it, which can be as simple as a Raspberry Pi 4 or an existing home server.
Can Paperless-ngx read handwritten documents?
Tesseract OCR can recognize printed text and some clearly written handwriting. Cursive handwriting and messy handwriting have lower recognition accuracy. For critical handwritten documents, the text may not be fully searchable, but the document itself is stored and viewable. Consider adding manual tags to handwritten documents for searchability.
How many documents can Paperless-ngx handle?
Paperless-ngx handles tens of thousands of documents without performance issues on modest hardware. Users on the Paperless-ngx subreddit report smooth operation with 50,000+ documents on Raspberry Pi 4 hardware. The PostgreSQL database and full-text search index scale well. Processing speed for new documents is the main bottleneck on slower hardware.
Does Paperless-ngx work on a Raspberry Pi?
Yes. Paperless-ngx runs on Raspberry Pi 4 (4GB+ recommended) and Pi 5 via Docker. OCR processing is slower on ARM hardware compared to x86 (roughly 2 to 3 times slower per page), but since processing runs in the background, this delay is not noticeable in daily use. The web interface remains responsive regardless of background processing load.
Can multiple people use one Paperless-ngx instance?
Yes. Paperless-ngx supports multiple user accounts with permission controls. Each user can have their own document visibility settings, allowing shared household document management where some documents are visible to all users and others are restricted to specific accounts. The admin account manages user creation and permission levels.




