Paperwork to Nextcloud Sync
A Python tool to sync scanned documents from Paperwork to Nextcloud, preserving tags and OCR data.
Features
- Unidirectional sync: Paperwork → Nextcloud (local is source of truth)
- Tag preservation: Paperwork labels are converted to Nextcloud tags with colors
- OCR text sync: .words files are uploaded as .txt for Nextcloud full-text search
- Deletion propagation: Files deleted locally are removed from Nextcloud
- Dry-run mode: Test sync without making changes
- WebDAV-based: Works with Nextcloud AIO and standard installations
Why This Tool?
Paperwork performs many small filesystem accesses while running, so pointing it directly at a GVFS/WebDAV mount is slow. This tool:
- Keeps Paperwork on fast local storage
- Syncs to Nextcloud in one batch operation
- Makes documents searchable and accessible via Nextcloud's web interface
Installation
Requirements
- Python 3.10+
- Paperwork installed and configured
- Nextcloud instance (tested with AIO)
Install Dependencies
pip install requests the_conf
Or with pipx (recommended):
pipx install papernext # When packaged
# For now:
pip install --user requests the_conf
Configuration
The tool uses the_conf for flexible configuration via:
- Environment variables
- Configuration files
- Command-line arguments
Configuration Files
Config files are checked in order:
1. /etc/papernext/config.json
2. ~/.config/papernext.json
Configuration Options
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| check | CHECK | false | Validate configuration and exit (check paths and credentials) |
| paperwork.dir | PAPERWORK_DIR | ~/Documents/papers | Local Paperwork documents directory |
| nextcloud.url | NEXTCLOUD_URL | required | Nextcloud base URL (e.g., https://cloud.example.com) |
| nextcloud.username | NEXTCLOUD_USERNAME | required | Nextcloud username |
| nextcloud.password | NEXTCLOUD_PASSWORD | required | Nextcloud password (use env var) |
| nextcloud.path | NEXTCLOUD_PATH | /Paperwork | Target path in Nextcloud |
| sync.thumbnails | SYNC_THUMBNAILS | false | Sync thumbnail files (*.thumb.jpg) |
| sync.ocrtext | SYNC_OCRTEXT | true | Sync OCR text files (*.words as *.txt) |
| sync.deletemissing | SYNC_DELETEMISSING | true | Delete remote files not present locally |
| sync.dryrun | SYNC_DRYRUN | false | Simulate sync without making changes |
| logging.level | LOGGING_LEVEL | INFO | Log level (DEBUG, INFO, WARNING, ERROR, FATAL) |
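The table suggests a regular naming pattern: a dotted parameter such as sync.dryrun becomes the environment variable SYNC_DRYRUN and, judging by the command-line examples in this README, the flag --sync-dryrun. The sketch below only illustrates that pattern; env_name and cli_flag are hypothetical helpers, not functions provided by the_conf or papernext.

```python
def env_name(param: str) -> str:
    """Map a dotted parameter name to the environment variable shown in the table."""
    return param.replace(".", "_").upper()


def cli_flag(param: str) -> str:
    """Map a dotted parameter name to a command-line flag (assumed pattern)."""
    return "--" + param.replace(".", "-")


if __name__ == "__main__":
    for param in ("paperwork.dir", "nextcloud.url", "sync.dryrun"):
        print(f"{param:15} {env_name(param):15} {cli_flag(param)}")
```

When in doubt about a flag name, papernext --help remains the authoritative list.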
Example Configuration File
~/.config/papernext.json:
{
"paperwork": {
"dir": "~/Documents/papers"
},
"nextcloud": {
"url": "https://cloud.example.com",
"username": "myuser",
"path": "/Documents/Paperwork"
},
"sync": {
"thumbnails": false,
"ocrtext": true,
"deletemissing": true,
"dryrun": false
},
"logging": {
"level": "INFO"
}
}
Environment Variables
Recommended: Store password as an environment variable:
export NEXTCLOUD_PASSWORD="your_password_here"
All config options can be set via env vars (see table above for names):
export NEXTCLOUD_URL="https://cloud.example.com"
export NEXTCLOUD_USERNAME="myuser"
export NEXTCLOUD_PASSWORD="your_password_here"
export NEXTCLOUD_PATH="/Documents/Paperwork"
export PAPERWORK_DIR="$HOME/Documents/papers" # The shell does not expand "~" inside quotes
export SYNC_DRYRUN="True" # Boolean: use "True"/"False" or "1"/"0"
export SYNC_OCRTEXT="True"
export SYNC_DELETEMISSING="False"
export LOGGING_LEVEL="DEBUG"
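Boolean options arrive from the environment as strings. As a rough illustration of how "True"/"False" and "1"/"0" could be interpreted, here is a minimal sketch. parse_bool and env_bool are hypothetical names, and the case-insensitive matching is an assumption; the actual conversion is done by the_conf and may differ.

```python
import os
from typing import Mapping

_TRUE = {"true", "1"}
_FALSE = {"false", "0"}


def parse_bool(raw: str) -> bool:
    """Interpret "True"/"False" or "1"/"0" as a boolean (case-insensitive by assumption)."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean value: {raw!r}")


def env_bool(name: str, default: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean setting from the environment, falling back to a default."""
    raw = environ.get(name)
    return default if raw is None else parse_bool(raw)
```

Rejecting unknown strings instead of treating them as false avoids silently running a real sync when SYNC_DRYRUN contains a typo.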
Usage
Check Configuration
Before running your first sync, validate your configuration:
papernext --check True
Or with environment variable:
CHECK="True" papernext
This will:
- Verify the Paperwork directory exists and contains valid documents
- Test Nextcloud credentials and connectivity
- Exit with status 0 if all checks pass, 1 if any fail
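The local half of this check can be pictured with the sketch below. It assumes that a "valid document" is a subdirectory holding at least one paper.N.jpg page, which is a guess based on the document layout described later in this README; has_scanned_page and check_paperwork_dir are hypothetical names, and the credential test against Nextcloud is left out.

```python
import re
from pathlib import Path
from typing import Iterable

# Assumption: scanned pages are named paper.<number>.jpg
PAGE_RE = re.compile(r"^paper\.\d+\.jpg$")


def has_scanned_page(filenames: Iterable[str]) -> bool:
    """True if any filename looks like a scanned page (thumbnails do not count)."""
    return any(PAGE_RE.match(name) for name in filenames)


def check_paperwork_dir(root: Path) -> int:
    """Return exit status 0 if root holds at least one document directory, else 1."""
    if not root.is_dir():
        return 1
    for entry in root.iterdir():
        if entry.is_dir() and has_scanned_page(p.name for p in entry.iterdir()):
            return 0
    return 1
```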
Basic Sync
papernext
Dry Run (Test Without Changes)
Using command-line flag:
papernext --sync-dryrun True
Or environment variable:
SYNC_DRYRUN="True" papernext
Or config file:
# Add to config file: "sync": {"dryrun": true}
papernext
Command-Line Options
All configuration options can be passed via command line:
papernext --help # Show all available options
# Examples:
papernext --nextcloud-url https://cloud.example.com \
--nextcloud-username myuser \
--nextcloud-password mypass \
--paperwork-dir ~/Documents/papers \
--sync-dryrun True
papernext --check True # Validate configuration
papernext --logging-level DEBUG # Enable debug logging
Priority: Command-line args > Environment variables > Config files
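The priority rule can be modelled as a layered lookup in which the first source that defines a key wins. The following is a minimal sketch of that idea using the standard library's ChainMap; layered_settings is a hypothetical helper and does not reflect how the_conf implements precedence internally.

```python
from collections import ChainMap
from typing import Any, Mapping


def layered_settings(
    cli: Mapping[str, Any],
    env: Mapping[str, Any],
    files: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> ChainMap:
    """Combine sources so that lookups try cli, then env, then files, then defaults."""
    return ChainMap(dict(cli), dict(env), dict(files), dict(defaults))


if __name__ == "__main__":
    settings = layered_settings(
        cli={"sync.dryrun": True},
        env={"sync.dryrun": False, "logging.level": "DEBUG"},
        files={"nextcloud.path": "/Documents/Paperwork"},
        defaults={"nextcloud.path": "/Paperwork", "logging.level": "INFO"},
    )
    print(settings["sync.dryrun"], settings["logging.level"], settings["nextcloud.path"])
```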
Automated Sync with Cron
Run sync every hour:
crontab -e
Add:
0 * * * * /usr/bin/python3 /path/to/papernext.py
Systemd Timer (Recommended)
Create ~/.config/systemd/user/paperwork-sync.service:
[Unit]
Description=Paperwork to Nextcloud Sync
[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /home/youruser/devel/papernext/papernext.py
EnvironmentFile=%h/.config/papernext.env
Create ~/.config/systemd/user/paperwork-sync.timer:
[Unit]
Description=Paperwork Sync Timer
[Timer]
OnCalendar=hourly
Persistent=true
[Install]
WantedBy=timers.target
Enable and start:
systemctl --user enable --now paperwork-sync.timer
systemctl --user status paperwork-sync.timer
How It Works
Document Structure
Paperwork stores documents in directories like 20221118_0000_00_1/:
20221118_0000_00_1/
├── paper.1.jpg # Scanned image (synced)
├── paper.1.thumb.jpg # Thumbnail (optional)
├── paper.1.words # OCR text (synced as .txt)
└── labels # Tags: "bank,rgb(252,233,79)"
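Based on the example above, each line of a labels file appears to hold a label name and an rgb(...) color separated by the first comma. A minimal parsing sketch under that assumption follows; Label and parse_labels are hypothetical names, and the real file format may carry details not shown here.

```python
from typing import NamedTuple


class Label(NamedTuple):
    name: str
    color: str  # raw color string, e.g. "rgb(252,233,79)"


def parse_labels(text: str) -> list[Label]:
    """Parse the contents of a labels file, one "name,color" entry per line."""
    labels = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first comma only: the color itself contains commas.
        name, _, color = line.partition(",")
        labels.append(Label(name.strip(), color.strip()))
    return labels
```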
Sync Process
1. Scan local Paperwork directory
   - Find all document directories
   - Parse labels files
   - Identify files to sync
2. Create Nextcloud structure
   - Create document directories via WebDAV
   - Upload files (images + OCR text)
3. Apply tags
   - Create tags in Nextcloud if they don't exist
   - Apply tags to all files in each document
   - Preserve label colors
4. Handle deletions
   - List remote documents
   - Delete documents not present locally
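The deletion step amounts to a set difference between the remote and local document names. The sketch below shows that comparison only; plan_deletions is a hypothetical helper, and the delete_missing argument stands in for the sync.deletemissing option.

```python
from typing import Iterable


def plan_deletions(
    local_docs: Iterable[str],
    remote_docs: Iterable[str],
    delete_missing: bool = True,
) -> list[str]:
    """Return remote document names that no longer exist locally, sorted for stable logs."""
    if not delete_missing:
        return []
    return sorted(set(remote_docs) - set(local_docs))


if __name__ == "__main__":
    local = ["20221118_0000_00_1", "20230102_1200_00"]
    remote = ["20221118_0000_00_1", "20210505_0930_00"]
    print(plan_deletions(local, remote))
```

Because local storage is the source of truth, documents that exist only on the remote side are the ones scheduled for removal, never the reverse.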
Tag Mapping
Paperwork labels are converted to Nextcloud system tags:
| Paperwork | Nextcloud |
|---|---|
| bank,rgb(252,233,79) | Tag "bank" with color #fce94f |
| relevé,rgb(196,160,0) | Tag "relevé" with color #c4a000 |
Tags are:
- User-visible in Nextcloud UI
- User-assignable
- Searchable via Nextcloud search
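The color conversion in the mapping table is a plain decimal-to-hexadecimal translation of the three channels. Here is a minimal sketch of it; rgb_to_hex is a hypothetical name, and the tolerance for spaces inside rgb(...) is an assumption.

```python
import re

RGB_RE = re.compile(r"^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$")


def rgb_to_hex(color: str) -> str:
    """Convert "rgb(r,g,b)" with 0-255 channels to a "#rrggbb" string."""
    match = RGB_RE.match(color.strip())
    if match is None:
        raise ValueError(f"unsupported color format: {color!r}")
    channels = [int(group) for group in match.groups()]
    if any(channel > 255 for channel in channels):
        raise ValueError(f"channel out of range in {color!r}")
    return "#{:02x}{:02x}{:02x}".format(*channels)


if __name__ == "__main__":
    print(rgb_to_hex("rgb(252,233,79)"))  # prints #fce94f
    print(rgb_to_hex("rgb(196,160,0)"))   # prints #c4a000
```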
OCR Text Handling
Paperwork's .words files are uploaded as .txt files:
- Benefit: Nextcloud's full-text search indexes .txt files automatically
- No duplicate OCR: Paperwork's OCR work is reused
- Searchable: Find documents by content in Nextcloud search
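The renaming can be sketched as a function that decides, per local file, which name (if any) it gets on the remote side. remote_name is a hypothetical helper; its two flags mirror the sync.thumbnails and sync.ocrtext options, and skipping the labels file is an assumption based on labels being applied as tags.

```python
from typing import Optional


def remote_name(
    filename: str,
    sync_thumbnails: bool = False,
    sync_ocrtext: bool = True,
) -> Optional[str]:
    """Return the name to upload the file under, or None if it should be skipped."""
    if filename.endswith(".thumb.jpg"):
        return filename if sync_thumbnails else None
    if filename.endswith(".words"):
        # OCR text is uploaded with a .txt extension so it can be indexed.
        return (filename[: -len(".words")] + ".txt") if sync_ocrtext else None
    if filename.endswith(".jpg"):
        return filename
    return None  # anything else (e.g. labels) is not uploaded in this sketch


if __name__ == "__main__":
    for name in ("paper.1.jpg", "paper.1.thumb.jpg", "paper.1.words", "labels"):
        print(name, "->", remote_name(name))
```

The thumbnail test comes first because .thumb.jpg files also end in .jpg.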
Troubleshooting
Authentication Errors
Error: 401 Unauthorized
Solution: Check credentials:
export NEXTCLOUD_PASSWORD="your_password"
[ -n "$NEXTCLOUD_PASSWORD" ] && echo "NEXTCLOUD_PASSWORD is set" # Verify it's set without printing it
WebDAV Errors
Error: Failed to create directory: 405
Reason: The directory already exists. WebDAV answers a directory-creation request (MKCOL) on an existing path with 405, so this is normal and can be ignored.
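To make the status codes concrete, here is a small sketch of how a client might build the WebDAV URL for a user's files and interpret the response to MKCOL. dav_url and mkcol_outcome are hypothetical helpers; the remote.php/dav/files/USERNAME layout is Nextcloud's standard WebDAV endpoint, and the status meanings follow the WebDAV specification rather than papernext's own code.

```python
from urllib.parse import quote


def dav_url(base_url: str, username: str, path: str) -> str:
    """Build the WebDAV URL of a path inside a user's Nextcloud files."""
    return (
        f"{base_url.rstrip('/')}/remote.php/dav/files/"
        f"{quote(username)}/{quote(path.strip('/'))}"
    )


def mkcol_outcome(status: int) -> str:
    """Classify the HTTP status returned by a MKCOL (create directory) request."""
    if status == 201:
        return "created"
    if status == 405:
        return "exists"  # the path is already mapped; safe to continue
    if status == 409:
        return "missing-parent"  # an intermediate directory does not exist yet
    return "error"


if __name__ == "__main__":
    print(dav_url("https://cloud.example.com/", "myuser", "/Documents/Paperwork"))
    print(mkcol_outcome(405))
```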
Tag Creation Fails
Error: Failed to create tag: 403
Reason: User doesn't have permission to create tags
Solution:
- Check Nextcloud admin settings
- Ensure user can create/manage tags
- Try creating a tag manually in Nextcloud UI first
Slow Sync
Symptoms: Sync takes a long time
Solutions:
- Don't sync thumbnails: "sync.thumbnails": false
- Run less frequently (e.g., every 4 hours instead of hourly)
- Check Nextcloud server logs for bottlenecks
Dry Run Not Working
Symptom: Changes are made despite dry-run mode
Check: Ensure environment variable is set:
export SYNC_DRYRUN="True"
papernext
Look for [DRY RUN] in log output.
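For orientation, dry-run support is usually implemented by routing every mutating operation through one gate that logs instead of acting. The sketch below illustrates that pattern with the [DRY RUN] marker mentioned above; perform and the logger name are hypothetical and not taken from papernext's source.

```python
import logging
from typing import Callable

logger = logging.getLogger("papernext.example")  # hypothetical logger name


def perform(description: str, action: Callable[[], None], dry_run: bool) -> bool:
    """Run action unless dry_run is set; return True only if it was executed."""
    if dry_run:
        logger.info("[DRY RUN] %s", description)
        return False
    logger.info("%s", description)
    action()
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    uploaded = []
    perform("Upload paper.1.jpg", lambda: uploaded.append("paper.1.jpg"), dry_run=True)
    print(uploaded)  # prints []
```

If changes happen without a [DRY RUN] line in the log, the option most likely never reached the tool, which points back at how the variable was set.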
Development
Project Structure
papernext/
├── papernext.py # Main script
├── README.md # This file
└── CLAUDE.md # Design documentation
Running Tests
# Dry run to test configuration
export SYNC_DRYRUN="True"
export LOGGING_LEVEL="DEBUG"
papernext
Debug Mode
export LOGGING_LEVEL="DEBUG"
papernext
Related Projects
- Paperwork - Document management system
- Nextcloud - Self-hosted cloud platform
- the_conf - Configuration management library
License
MIT License (or specify your license)
Contributing
Contributions welcome! Please:
- Test with dry-run mode first
- Add logging for new features
- Update documentation
FAQ
Q: Will this work with ownCloud? A: Likely yes (Nextcloud forked from ownCloud), but untested.
Q: Can I sync from Nextcloud back to Paperwork? A: No, this is unidirectional only (Paperwork → Nextcloud).
Q: What happens if I edit a document in Nextcloud? A: Next sync will overwrite it with the local version. Paperwork is the source of truth.
Q: Can I exclude certain documents? A: Not currently, but it could be added (e.g., via a .syncignore file).
Q: Does this require Nextcloud full-text search? A: No, but highly recommended for searching OCR text.