Syncs scanned documents from Paperwork to Nextcloud via WebDAV.
François Schmidts 8f18159d79 feat(sync): add tag-based filter-in/filter-out for selective sync
Filter-in keeps only documents matching specified tags (unmatched are
deleted from remote). Filter-out excludes documents from syncing but
leaves them untouched on remote. Overlap between both lists fails fast.

Uses the_conf list parameters (SYNC_FILTERIN_0, SYNC_FILTEROUT_0, ...).
Also bumps the-conf to 1.1.0 for improved list handling, extracts
_ensure_base_directory helper, and demotes a noisy tagged-path warning
to debug level.
2026-02-24 12:19:44 +01:00
.claude feat(sync): add tag-based filter-in/filter-out for selective sync 2026-02-24 12:19:44 +01:00
papernext feat(sync): add tag-based filter-in/filter-out for selective sync 2026-02-24 12:19:44 +01:00
.gitignore chore(utilities) 2026-02-05 21:30:54 +01:00
CLAUDE.md feat(cli): add validation checks and command-line args 2026-02-05 21:58:14 +01:00
Makefile chore(utilities) 2026-02-05 21:30:54 +01:00
poetry.lock feat(sync): add tag-based filter-in/filter-out for selective sync 2026-02-24 12:19:44 +01:00
pyproject.toml feat(sync): add tag-based filter-in/filter-out for selective sync 2026-02-24 12:19:44 +01:00
README.md feat(cli): add validation checks and command-line args 2026-02-05 21:58:14 +01:00

Paperwork to Nextcloud Sync

A Python tool to sync scanned documents from Paperwork to Nextcloud, preserving tags and OCR data.

Features

  • Unidirectional sync: Paperwork → Nextcloud (local is source of truth)
  • Tag preservation: Paperwork labels are converted to Nextcloud tags with colors
  • OCR text sync: .words files are uploaded as .txt for Nextcloud full-text search
  • Deletion propagation: Files deleted locally are removed from Nextcloud
  • Dry-run mode: Test sync without making changes
  • WebDAV-based: Works with Nextcloud AIO and standard installations

Why This Tool?

Paperwork performs many small filesystem accesses during normal operation, so running it directly on a GVFS/WebDAV mount is slow. This tool:

  1. Keeps Paperwork on fast local storage
  2. Syncs to Nextcloud in one batch operation
  3. Makes documents searchable and accessible via Nextcloud's web interface

Installation

Requirements

  • Python 3.10+
  • Paperwork installed and configured
  • Nextcloud instance (tested with AIO)

Install Dependencies

pip install requests the_conf

Or with pipx (recommended):

pipx install papernext  # Once the package is published
# Until then:
pip install --user requests the_conf

Configuration

The tool uses the_conf for flexible configuration via:

  1. Environment variables
  2. Configuration files
  3. Command-line arguments

Configuration Files

Config files are checked in order:

  • /etc/papernext/config.json
  • ~/.config/papernext.json

Configuration Options

| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| check | CHECK | false | Validate configuration and exit (check paths and credentials) |
| paperwork.dir | PAPERWORK_DIR | ~/Documents/papers | Local Paperwork documents directory |
| nextcloud.url | NEXTCLOUD_URL | required | Nextcloud base URL (e.g., https://cloud.example.com) |
| nextcloud.username | NEXTCLOUD_USERNAME | required | Nextcloud username |
| nextcloud.password | NEXTCLOUD_PASSWORD | required | Nextcloud password (use env var) |
| nextcloud.path | NEXTCLOUD_PATH | /Paperwork | Target path in Nextcloud |
| sync.thumbnails | SYNC_THUMBNAILS | false | Sync thumbnail files (*.thumb.jpg) |
| sync.ocrtext | SYNC_OCRTEXT | true | Sync OCR text files (*.words as *.txt) |
| sync.deletemissing | SYNC_DELETEMISSING | true | Delete remote files not present locally |
| sync.dryrun | SYNC_DRYRUN | false | Simulate sync without making changes |
| logging.level | LOGGING_LEVEL | INFO | Log level (DEBUG, INFO, WARNING, ERROR, FATAL) |
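
The names in the table follow a regular pattern: the environment variable is the dotted parameter name in upper case with dots replaced by underscores, and command-line flags such as --sync-dryrun replace the dots with hyphens. A minimal sketch of that convention, inferred from the table rather than taken from the_conf's source (both helper names are hypothetical):

```python
def env_name(param: str) -> str:
    """Map a dotted parameter name to its environment variable name."""
    return param.replace(".", "_").upper()


def cli_flag(param: str) -> str:
    """Map a dotted parameter name to its command-line flag."""
    return "--" + param.replace(".", "-")


for param in ("check", "paperwork.dir", "sync.dryrun"):
    print(f"{param:15} {env_name(param):15} {cli_flag(param)}")
```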

Example Configuration File

~/.config/papernext.json:

{
  "paperwork": {
    "dir": "~/Documents/papers"
  },
  "nextcloud": {
    "url": "https://cloud.example.com",
    "username": "myuser",
    "path": "/Documents/Paperwork"
  },
  "sync": {
    "thumbnails": false,
    "ocrtext": true,
    "deletemissing": true,
    "dryrun": false
  },
  "logging": {
    "level": "INFO"
  }
}

Environment Variables

Recommended: Store password as an environment variable:

export NEXTCLOUD_PASSWORD="your_password_here"

All config options can be set via env vars (see table above for names):

export NEXTCLOUD_URL="https://cloud.example.com"
export NEXTCLOUD_USERNAME="myuser"
export NEXTCLOUD_PASSWORD="your_password_here"
export NEXTCLOUD_PATH="/Documents/Paperwork"
export PAPERWORK_DIR="$HOME/Documents/papers"  # "~" is not expanded inside quotes
export SYNC_DRYRUN="True"           # Boolean: use "True"/"False" or "1"/"0"
export SYNC_OCRTEXT="True"
export SYNC_DELETEMISSING="False"
export LOGGING_LEVEL="DEBUG"
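
Environment variables are always strings, so boolean options have to be parsed before use. A minimal sketch of a parser that accepts the spellings listed above; `parse_bool` and `dry_run_enabled` are hypothetical helpers, and the_conf's own parsing may accept more or fewer values:

```python
import os

TRUE_VALUES = {"true", "1"}
FALSE_VALUES = {"false", "0"}


def parse_bool(raw: str) -> bool:
    """Parse "True"/"False" or "1"/"0" (case-insensitive); reject anything else."""
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def dry_run_enabled(environ=os.environ) -> bool:
    """Read SYNC_DRYRUN, falling back to the documented default (false)."""
    return parse_bool(environ.get("SYNC_DRYRUN", "False"))
```

Rejecting unknown spellings instead of treating them as false avoids a typo such as "Ture" silently disabling dry-run mode.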

Usage

Check Configuration

Before running your first sync, validate your configuration:

papernext --check True

Or with environment variable:

CHECK="True" papernext

This will:

  • Verify the Paperwork directory exists and contains valid documents
  • Test Nextcloud credentials and connectivity
  • Exit with status 0 if all checks pass, 1 if any fail
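
The local half of this check can be sketched as follows. The function names and the definition of a "valid document" (a subdirectory containing at least one paper.N.jpg page) are assumptions made for illustration, not the tool's actual code; the credentials half would additionally issue an authenticated WebDAV request.

```python
import re
import tempfile
from pathlib import Path

PAGE_RE = re.compile(r"^paper\.\d+\.jpg$")


def check_paperwork_dir(root: Path) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    if not root.is_dir():
        return [f"{root} is not a directory"]
    documents = [
        d for d in root.iterdir()
        if d.is_dir() and any(PAGE_RE.match(f.name) for f in d.iterdir())
    ]
    if not documents:
        return [f"{root} contains no Paperwork documents"]
    return []


def exit_status(problems: list[str]) -> int:
    """0 if all checks pass, 1 if any fail."""
    return 1 if problems else 0


# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(exit_status(check_paperwork_dir(root)))  # 1: no documents yet
    (root / "20221118_0000_00_1").mkdir()
    (root / "20221118_0000_00_1" / "paper.1.jpg").touch()
    print(exit_status(check_paperwork_dir(root)))  # 0: one valid document
```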

Basic Sync

papernext

Dry Run (Test Without Changes)

Using command-line flag:

papernext --sync-dryrun True

Or environment variable:

SYNC_DRYRUN="True" papernext

Or config file:

# Add to config file: "sync": {"dryrun": true}
papernext

Command-Line Options

All configuration options can be passed via command line:

papernext --help  # Show all available options

# Examples:
papernext --nextcloud-url https://cloud.example.com \
          --nextcloud-username myuser \
          --nextcloud-password mypass \
          --paperwork-dir ~/Documents/papers \
          --sync-dryrun True

papernext --check True  # Validate configuration
papernext --logging-level DEBUG  # Enable debug logging (parameter logging.level)

Priority: Command-line args > Environment variables > Config files
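
This precedence is the usual layered-lookup pattern. A minimal sketch using the standard library's ChainMap, in which earlier mappings shadow later ones; the dictionaries hold illustrative values, and the_conf's internal mechanism may differ:

```python
from collections import ChainMap

defaults = {"sync.dryrun": False, "nextcloud.path": "/Paperwork"}
config_file = {"nextcloud.path": "/Documents/Paperwork"}
environment = {"sync.dryrun": True}
command_line = {"sync.dryrun": False}

# Highest priority first: command line, environment, config file, defaults.
settings = ChainMap(command_line, environment, config_file, defaults)

print(settings["sync.dryrun"])     # False: the command line overrides the environment
print(settings["nextcloud.path"])  # /Documents/Paperwork: the config file overrides the default
```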

Automated Sync with Cron

Run sync every hour:

crontab -e

Add:

0 * * * * /usr/bin/python3 /path/to/papernext.py

Alternatively, use a systemd user timer instead of cron. Create ~/.config/systemd/user/paperwork-sync.service:

[Unit]
Description=Paperwork to Nextcloud Sync

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /home/youruser/devel/papernext/papernext.py
EnvironmentFile=%h/.config/papernext.env

Create ~/.config/systemd/user/paperwork-sync.timer:

[Unit]
Description=Paperwork Sync Timer

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

Enable and start:

systemctl --user enable --now paperwork-sync.timer
systemctl --user status paperwork-sync.timer

How It Works

Document Structure

Paperwork stores documents in directories like 20221118_0000_00_1/:

20221118_0000_00_1/
├── paper.1.jpg           # Scanned image (synced)
├── paper.1.thumb.jpg     # Thumbnail (optional)
├── paper.1.words         # OCR text (synced as .txt)
└── labels                # Tags: "bank,rgb(252,233,79)"
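
The mapping from local file names to remote file names implied by this layout can be sketched as below. `remote_name` is a hypothetical helper whose two flags mirror sync.thumbnails and sync.ocrtext. Two details are assumptions: that paper.1.words is uploaded as paper.1.txt (the README only says .words files become .txt), and that the labels file itself is not uploaded because its content is applied as tags.

```python
from typing import Optional


def remote_name(filename: str, thumbnails: bool = False, ocrtext: bool = True) -> Optional[str]:
    """Return the name to upload under, or None if the file is skipped."""
    if filename.endswith(".thumb.jpg"):
        return filename if thumbnails else None
    if filename.endswith(".words"):
        return filename[: -len(".words")] + ".txt" if ocrtext else None
    if filename.endswith(".jpg"):
        return filename
    return None  # assumption: anything else (e.g. "labels") is not uploaded


for name in ("paper.1.jpg", "paper.1.thumb.jpg", "paper.1.words", "labels"):
    print(name, "->", remote_name(name))
```

The thumbnail check comes first because paper.1.thumb.jpg also ends in .jpg.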

Sync Process

  1. Scan local Paperwork directory

    • Find all document directories
    • Parse labels files
    • Identify files to sync
  2. Create Nextcloud structure

    • Create document directories via WebDAV
    • Upload files (images + OCR text)
  3. Apply tags

    • Create tags in Nextcloud if they don't exist
    • Apply tags to all files in each document
    • Preserve label colors
  4. Handle deletions

    • List remote documents
    • Delete documents not present locally
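
Steps 2 and 4 amount to a set comparison between local and remote document names. A minimal sketch, assuming documents are identified by their directory name and that every local document is re-uploaded on each run (the real tool may skip unchanged files). `plan_sync` and `describe` are hypothetical; the [DRY RUN] prefix is the marker the tool prints in dry-run mode.

```python
def plan_sync(local: set[str], remote: set[str], delete_missing: bool = True):
    """Return (documents to upload, documents to delete remotely)."""
    to_upload = sorted(local)  # local is the source of truth
    to_delete = sorted(remote - local) if delete_missing else []
    return to_upload, to_delete


def describe(to_upload: list[str], to_delete: list[str], dry_run: bool = False) -> list[str]:
    """Render the plan as log lines, marking them when nothing will be changed."""
    prefix = "[DRY RUN] " if dry_run else ""
    return [f"{prefix}upload {d}" for d in to_upload] + [f"{prefix}delete {d}" for d in to_delete]


local = {"20221118_0000_00_1", "20230102_1200_00"}
remote = {"20221118_0000_00_1", "20200101_0000_00"}
for line in describe(*plan_sync(local, remote), dry_run=True):
    print(line)
```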

Tag Mapping

Paperwork labels are converted to Nextcloud system tags:

| Paperwork | Nextcloud |
|---|---|
| bank,rgb(252,233,79) | Tag "bank" with color #fce94f |
| relevé,rgb(196,160,0) | Tag "relevé" with color #c4a000 |
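
The conversion shown above is a parse of the label line followed by an RGB-to-hex conversion. A minimal sketch, assuming each line of the labels file has the form name,rgb(R,G,B); `parse_label` is a hypothetical helper, not the tool's code:

```python
import re

LABEL_RE = re.compile(r"^(.+),rgb\((\d+),\s*(\d+),\s*(\d+)\)$")


def parse_label(line: str) -> tuple[str, str]:
    """Split a Paperwork label line into (tag name, "#rrggbb" color)."""
    match = LABEL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised label line: {line!r}")
    name, r, g, b = match.groups()
    return name, "#{:02x}{:02x}{:02x}".format(int(r), int(g), int(b))


print(parse_label("bank,rgb(252,233,79)"))   # ('bank', '#fce94f')
print(parse_label("relevé,rgb(196,160,0)"))  # ('relevé', '#c4a000')
```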

Tags are:

  • User-visible in Nextcloud UI
  • User-assignable
  • Searchable via Nextcloud search

OCR Text Handling

Paperwork's .words files are uploaded as .txt files:

  • Benefit: Nextcloud's full-text search indexes .txt files automatically
  • No duplicate OCR: Paperwork's OCR work is reused
  • Searchable: Find documents by content in Nextcloud search

Troubleshooting

Authentication Errors

Error: 401 Unauthorized

Solution: Check credentials:

export NEXTCLOUD_PASSWORD="your_password"
test -n "$NEXTCLOUD_PASSWORD" && echo "password is set"  # Verify it's set without printing it

WebDAV Errors

Error: Failed to create directory: 405

Reason: The directory already exists. This is expected on repeat runs and can be ignored.
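
In WebDAV, MKCOL returns 201 Created for a new collection and 405 Method Not Allowed when the target already exists (RFC 4918), which is why the 405 is harmless here. A sketch of how a client might classify the response; `mkcol_outcome` is a hypothetical helper, not the tool's code:

```python
def mkcol_outcome(status: int) -> str:
    """Classify the HTTP status of a WebDAV MKCOL request."""
    if status == 201:
        return "created"
    if status == 405:
        return "exists"          # collection already present: safe to continue
    if status == 409:
        return "missing-parent"  # an intermediate collection must be created first
    if status in (401, 403):
        return "denied"
    return "error"


for code in (201, 405, 409, 401, 500):
    print(code, mkcol_outcome(code))
```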

Tag Creation Fails

Error: Failed to create tag: 403

Reason: User doesn't have permission to create tags

Solution:

  • Check Nextcloud admin settings
  • Ensure user can create/manage tags
  • Try creating a tag manually in Nextcloud UI first

Slow Sync

Symptoms: Sync takes a long time

Solutions:

  • Don't sync thumbnails: "sync.thumbnails": false
  • Run less frequently (e.g., every 4 hours instead of hourly)
  • Check Nextcloud server logs for bottlenecks

Dry Run Not Working

Symptom: Changes are made despite dry-run mode

Check: Ensure environment variable is set:

export SYNC_DRYRUN="True"
papernext

Look for [DRY RUN] in log output.

Development

Project Structure

papernext/
├── papernext.py   # Main script
├── README.md      # This file
└── CLAUDE.md      # Design documentation

Running Tests

# Dry run to test configuration
export SYNC_DRYRUN="True"
export LOGGING_LEVEL="DEBUG"
papernext

Debug Mode

export LOGGING_LEVEL="DEBUG"
papernext

License

MIT License (or specify your license)

Contributing

Contributions welcome! Please:

  1. Test with dry-run mode first
  2. Add logging for new features
  3. Update documentation

FAQ

Q: Will this work with ownCloud?
A: Likely yes (Nextcloud was forked from ownCloud), but this is untested.

Q: Can I sync from Nextcloud back to Paperwork?
A: No, this is unidirectional only (Paperwork → Nextcloud).

Q: What happens if I edit a document in Nextcloud?
A: The next sync will overwrite it with the local version. Paperwork is the source of truth.

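Q: Can I exclude certain documents?
A: Yes, by tag. Filter-in (SYNC_FILTERIN_0, ...) keeps only documents matching the listed tags and deletes unmatched documents from the remote. Filter-out (SYNC_FILTEROUT_0, ...) excludes matching documents from syncing but leaves them untouched on the remote. A tag that appears in both lists makes the tool fail immediately.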

Q: Does this require Nextcloud full-text search?
A: No, but it is highly recommended for searching OCR text.
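
The commit message at the top of this page describes tag-based filter-in and filter-out lists. A minimal sketch of the semantics stated there; the function and parameter names are hypothetical, and two details are assumptions: a document matches a list if it shares at least one tag with it, and filter-out is evaluated before filter-in.

```python
def select_documents(docs: dict[str, set[str]], filter_in: set[str], filter_out: set[str]):
    """Split documents into (synced, removed from remote, left untouched)."""
    overlap = filter_in & filter_out
    if overlap:
        # Fail fast when a tag appears in both lists.
        raise ValueError(f"tags in both filter lists: {sorted(overlap)}")
    synced, removed, untouched = [], [], []
    for name, tags in sorted(docs.items()):
        if tags & filter_out:
            untouched.append(name)  # not synced, remote copy left as-is
        elif filter_in and not (tags & filter_in):
            removed.append(name)    # unmatched by filter-in: deleted from remote
        else:
            synced.append(name)
    return synced, removed, untouched


docs = {"doc1": {"bank"}, "doc2": {"taxes"}, "doc3": {"private"}}
print(select_documents(docs, filter_in={"bank"}, filter_out={"private"}))
```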