Replace shared volume with database for QGIS job file exchange¶
Context and Problem Statement¶
The backend and qfield-qgis containers exchanged input/output files for QGIS
project generation via a shared Docker volume (/opt/qfield). This worked for
Docker Compose but required a ReadWriteMany PVC in Kubernetes - meaning an
EFS driver, NFS provisioner, or equivalent. That added deployment complexity
and infrastructure cost, especially for self-hosted instances.
Considered Options¶
- S3 / MinIO object storage - both containers read/write files via S3 API
- PostgreSQL as shared state - store files in a `qgis_jobs` table
- HTTP request/response body - send input files in the POST body, receive output files in the response
Decision Outcome¶
PostgreSQL as shared state, via a `qgis_jobs` table.
How it works¶
- Backend writes input files (XLSForm bytes, features/tasks GeoJSON) to the `qgis_jobs` table, keyed by a UUID job ID.
- Backend POSTs just the `job_id` and processing parameters to the QGIS wrapper HTTP service.
- QGIS wrapper connects to the same database, reads inputs to a local temp directory, runs QGIS processing, then writes the output files back to the `output_files` JSONB column (as a `{filename: base64_content}` dict).
- Backend reads the output files from the DB row, writes them to a temp directory for QFieldCloud SDK upload, then deletes the row.
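The backend side of this exchange can be sketched with a generic DB-API connection. The ADR only specifies a `qgis_jobs` table keyed by a UUID, with BYTEA inputs and a JSONB `output_files` column; the column names and function names below are illustrative assumptions, not the project's actual code:

```python
import base64
import uuid


def create_job(conn, xlsform: bytes, features_geojson: bytes) -> str:
    """Persist input files and return the job_id to POST to the QGIS wrapper.

    Column names (xlsform, features_geojson) are assumed for illustration.
    """
    job_id = str(uuid.uuid4())
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO qgis_jobs (id, xlsform, features_geojson, created_at)"
            " VALUES (%s, %s, %s, NOW())",
            (job_id, xlsform, features_geojson),
        )
    conn.commit()
    return job_id


def decode_outputs(output_files: dict[str, str]) -> dict[str, bytes]:
    """Turn the {filename: base64_content} dict from output_files back into bytes."""
    return {name: base64.b64decode(content) for name, content in output_files.items()}
```

The QGIS wrapper performs the mirror image: read the row by `job_id`, process, then base64-encode each output file into the JSONB column.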
Why not S3¶
S3 (or MinIO/Garage) would work well but requires users to deploy and configure an object store. Field-TM aims to be simple to self-host; adding an S3 dependency for a once-a-day operation was not justified.
Why not HTTP request/response body¶
Initially implemented but rejected for future-proofing:
- Memory: base64-in-JSON loads entire files into memory multiple times. A 300 MB .mbtiles basemap (planned feature) would cause ~1-1.5 GB peak memory.
- No resilience: the HTTP connection stays open for the entire processing time. If anything fails, all data must be re-sent from scratch.
- Not streamable: `BaseHTTPRequestHandler.rfile.read()` buffers the full body.
The DB approach keeps memory flat (reads/writes happen in controlled chunks), survives container crashes (inputs persist for retry), and scales to larger files without architectural changes.
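The memory estimate above can be checked with rough arithmetic (a sketch only; real peaks depend on the HTTP server and JSON library, and on which copies are alive simultaneously):

```python
raw_mb = 300              # .mbtiles basemap size on disk
b64_mb = raw_mb * 4 / 3   # base64 inflates by 4/3 -> ~400 MB string
json_body_mb = b64_mb     # the serialised JSON request body is another copy
decoded_mb = raw_mb       # bytes decoded again on the receiving side

# Worst case, with all copies held at once:
peak_mb = raw_mb + b64_mb + json_body_mb + decoded_mb
print(round(peak_mb))  # 1400 -> squarely in the ~1-1.5 GB range
```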
Crash recovery¶
Rows are short-lived (seconds to minutes) and always deleted by the backend after processing. If a crash leaves orphaned rows, they can be cleaned up with:
DELETE FROM qgis_jobs WHERE created_at < NOW() - INTERVAL '1 hour';
This can be run on application startup or as a periodic task.
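A startup hook for this cleanup might look roughly like the following (the function name and connection handling are assumptions; only the DELETE statement comes from this ADR):

```python
STALE_JOB_CLEANUP_SQL = (
    "DELETE FROM qgis_jobs WHERE created_at < NOW() - INTERVAL '1 hour'"
)


def cleanup_stale_jobs(conn) -> int:
    """Remove orphaned qgis_jobs rows left behind by a crash.

    Intended to run on application startup or as a periodic task;
    returns the number of rows removed.
    """
    with conn.cursor() as cur:
        cur.execute(STALE_JOB_CLEANUP_SQL)
        removed = cur.rowcount
    conn.commit()
    return removed
```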
Future: large basemap files¶
When basemap support is added (100-300 MB .mbtiles), the BYTEA column can be
replaced with PostgreSQL large objects (lo_create/lo_read/lo_write) which
stream in chunks instead of loading the full blob into memory. The current
BYTEA approach works fine for files under ~50 MB.
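A chunked write using large objects could look like the sketch below. It assumes psycopg2's `connection.lobject()` API (large objects must be accessed inside a transaction); the helper and function names are illustrative:

```python
import io

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk keeps memory flat


def iter_chunks(stream, chunk_size: int = CHUNK_SIZE):
    """Yield fixed-size chunks from any binary file object."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def store_as_large_object(conn, fileobj) -> int:
    """Stream a file into a PostgreSQL large object, returning its OID.

    Assumes a psycopg2 connection; oid=0 asks the server to allocate
    a new OID. The full file is never held in memory at once.
    """
    lobj = conn.lobject(0, "wb")
    for chunk in iter_chunks(fileobj):
        lobj.write(chunk)
    lobj.close()
    return lobj.oid
```

Reading back is symmetric: open the object with `conn.lobject(oid, "rb")` and read in the same chunk size while writing to the temp directory.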
Consequences¶
- Good: eliminates the ReadWriteMany PVC requirement in Kubernetes.
- Good: no new infrastructure - reuses the existing PostgreSQL database.
- Good: crash-resilient - inputs survive container restarts for retry.
- Good: simpler deployment for self-hosted instances.
- Good: HTTP payloads are now tiny (just job ID + parameters).
- Neutral: QGIS container now depends on psycopg (small addition to Dockerfile).
- Neutral: rows are transient and cleaned up immediately; no DB bloat at current scale (≤1 project/day).
- Bad: if files grow very large (>500 MB), the BYTEA column should be migrated to large objects for streaming reads/writes.