Documents in godocs move through three phases: ingestion, active editing, and archival. This document describes the full lifecycle and the archival design.
```
Ingress folder     Document folder             Archive folder
┌─────────────┐    ┌──────────────────────┐    ┌──────────────────────┐
│ New files   │───>│ Active documents     │───>│ Archive pending      │
│             │    │ (view, tag, edit)    │    │ (files + metadata)   │
└─────────────┘    └──────────────────────┘    └──────────────────────┘
Ingestion          Active phase                           │
(existing)         (existing)                             │
                                                          v
                                                   External backup
                                                  tool moves files
                                                          │
                                                          v
                                                 ┌──────────────────┐
                                                 │ Archived         │
                                                 │ (metadata only   │
                                                 │ in DB, frozen)   │
                                                 └──────────────────┘
```
Files arrive in the ingress folder and are processed in three steps:
- … `L/00/12/34/001234.orig.pdf`), hash verified, source deleted
- … `.tags.json` applied, search index updated

No changes needed here.
Documents in the document folder can be viewed, tagged, and edited.
No changes needed here.
Archival removes documents from day-to-day use while preserving all metadata for audit and recovery. It is a two-stage process: archive pending, then archived.
A new config value, `ARCHIVE_PATH`, sets the archive folder location (default: `archive/`, a sibling of `DOCUMENT_PATH`). The archive folder mirrors the nested directory structure of the document folder:
```
documents/L/00/12/34/001234.orig.pdf  → archive/L/00/12/34/001234.orig.pdf
documents/L/00/12/34/001234.ocr.txt   → archive/L/00/12/34/001234.ocr.txt
documents/L/00/12/34/001234.thumb.png → archive/L/00/12/34/001234.thumb.png
documents/L/00/12/34/001234.tags.json → archive/L/00/12/34/001234.tags.json
                                        archive/L/00/12/34/001234.lifecycle.json  (new)
```
A new `.lifecycle.json` sidecar is written at archive time. This keeps the `.tags.json` file unchanged (frozen) and records archive-specific metadata separately:
```json
{
  "archived_at": "2026-02-25T14:30:00Z",
  "archived_by": "godocs",
  "archive_reason": "user-initiated",
  "original_path": "L/00/12/34/001234.orig.pdf",
  "hash": "d41d8cd98f00b204e9800998ecf8427e",
  "ulid": "01JFXYZ...",
  "db_id": 1234,
  "schema_version": "1"
}
```
This means:

- `.tags.json` is copied as-is (frozen at archive time).
- `.lifecycle.json` records when, why, and the document identity.
- External tools can read `.lifecycle.json` to verify integrity (hash) and track provenance.

Archival uses two states, tracked via a dedicated `archive_status` column on the `documents` table (not a tag; see the rationale below):
| State | `archive_status` | Files on disk | Visible in UI | Editable |
|---|---|---|---|---|
| Active | `NULL` | Document folder | Yes | Yes |
| Archive pending | `'pending'` | Archive folder | Only in archive view | No (frozen) |
| Archived | `'archived'` | Removed from archive folder (by external tool) | No | No (frozen) |
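The table implies a small state machine: archiving moves Active to pending, `archive-confirm` moves pending to archived, unarchive moves pending back to Active, and archived is terminal. A Go sketch of that rule, with hypothetical names and the empty string standing in for SQL `NULL`:

```go
package main

import "fmt"

// ArchiveStatus mirrors the archive_status column; the empty string stands
// in for SQL NULL (Active). The type and helpers are hypothetical.
type ArchiveStatus string

const (
	StatusActive   ArchiveStatus = ""         // NULL: files in document folder
	StatusPending  ArchiveStatus = "pending"  // files in archive folder
	StatusArchived ArchiveStatus = "archived" // files removed by external tool
)

// allowed holds the transitions described in this design: archive
// (Active -> pending), archive-confirm (pending -> archived) and unarchive
// (pending -> Active). Archived has no outgoing transitions.
var allowed = map[ArchiveStatus][]ArchiveStatus{
	StatusActive:  {StatusPending},
	StatusPending: {StatusArchived, StatusActive},
}

// Transition validates a status change and returns the new status.
func Transition(from, to ArchiveStatus) (ArchiveStatus, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("invalid archive transition %q -> %q", from, to)
}

// Editable reports whether edits are allowed; only Active documents are.
func Editable(s ArchiveStatus) bool { return s == StatusActive }
```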
Why a column, not a tag? The “Archive Pending” concept needs to:

- … (`archived_at`)

However, an “Archive Pending” system tag is also created (like the existing “Hide” tag) so the archive state is visible in the tag UI and in `.tags.json` exports. The tag is applied automatically when archival begins and is the mechanism by which users can select documents for archival via the existing bulk-edit multi-select.
Selection uses the existing multi-select system:

- … `?select=1`

When the user confirms:
1. Set `archive_status = 'pending'`, `archived_at = NOW()` in the DB
2. Write `.tags.json` (includes the Archive Pending tag)
3. Write the `.lifecycle.json` sidecar
4. Move all files (`.orig.*`, `.ocr.txt`, `.thumb.png`, `.tags.json`, `.lifecycle.json`) to the archive folder, preserving the nested structure
5. Update `documents.path` to point to the archive location
6. … (`archive_status IS NOT NULL`)

An external program (backup tool, rsync script, cloud uploader) is responsible for moving files from the archive folder to long-term storage. Once files are moved:
- The external tool calls `PUT /api/document/{ulid}/archive-confirm`
- The server sets `archive_status = 'archived'`
- … (`archive_status IS NOT NULL`)

A new `/archive` page lists archived documents (metadata only, no file access). It shows: name, date, tags, hash, `archived_at`, `archive_status`.

If files need recovery before the external tool has moved them (i.e. while they are still in the archive folder):
- Call `PUT /api/document/{ulid}/unarchive`
- Clear `archive_status`, `archived_at`
- Not possible once `archive_status = 'archived'` (files already gone)

```sql
ALTER TABLE documents ADD COLUMN archive_status TEXT;   -- NULL, 'pending', 'archived'
ALTER TABLE documents ADD COLUMN archived_at TIMESTAMP; -- when archival began
```
Migration creates the “Archive Pending” tag:
```sql
INSERT INTO tags (name, color, description, tag_group, sort_order, created_at, updated_at)
VALUES ('Archive Pending', '#95a5a6', 'Document queued for archival', 'System', 10,
        CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
ON CONFLICT (name) DO NOTHING;
```
```
ARCHIVE_PATH=archive   # relative or absolute; default: sibling of DOCUMENT_PATH
```
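The setting may be relative or absolute, but the design does not say what a relative value is relative to. The Go sketch below assumes it is resolved against the parent of `DOCUMENT_PATH`, which makes the default `archive` land beside the document folder as described; both the rule and the function names are assumptions:

```go
package main

import (
	"os"
	"path/filepath"
)

// resolveArchivePath turns the ARCHIVE_PATH setting into a directory path.
// Assumption: an empty value means "archive", and a relative value is
// resolved against the parent of DOCUMENT_PATH, so the default becomes a
// sibling of the document folder.
func resolveArchivePath(archivePath, documentPath string) string {
	if archivePath == "" {
		archivePath = "archive"
	}
	if filepath.IsAbs(archivePath) {
		return filepath.Clean(archivePath)
	}
	parent := filepath.Dir(filepath.Clean(documentPath))
	return filepath.Join(parent, archivePath)
}

// archivePathFromEnv is a hypothetical convenience that reads both settings
// from environment variables of the same names.
func archivePathFromEnv() string {
	return resolveArchivePath(os.Getenv("ARCHIVE_PATH"), os.Getenv("DOCUMENT_PATH"))
}
```

With `DOCUMENT_PATH=/srv/godocs/documents` and no `ARCHIVE_PATH`, this yields `/srv/godocs/archive`; an absolute value such as `/mnt/cold` is used unchanged.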
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/documents/archive` | Archive documents (body: `{"ulids": [...]}`) |
| `PUT` | `/api/document/{ulid}/archive-confirm` | External tool confirms files moved |
| `PUT` | `/api/document/{ulid}/unarchive` | Undo archive-pending (if files still exist) |
| `GET` | `/api/documents/archived` | List archived document metadata |
| Page | Change |
|---|---|
| Bulk edit | Add “Archive Selected” button |
| Home/search | Filter out `archive_status IS NOT NULL` (like Hide filtering) |
| New `/archive` page | Read-only list of archived documents with metadata |
| Document edit | Reject edits if `archive_status` is set; show “Archived” banner |
1. Add `archive_status` and `archived_at` columns (migration)
2. Add `ARCHIVE_PATH` to config
3. Archive action (… `.lifecycle.json`, update DB)
4. Add the `/archive` list page
5. Add the `archive-confirm` endpoint for external tools

- Skip documents with `archive_status IS NOT NULL` during orphan scanning. Do not delete archive-pending files from the archive folder.
- The `/archive` page could have its own search.

The external tool is expected to:
1. Scan the archive folder for `.lifecycle.json` files
2. Read each `.lifecycle.json` to get the hash, ULID, and document identity
3. Transfer all files (`.orig.*`, `.ocr.txt`, `.thumb.png`, `.tags.json`, `.lifecycle.json`) to backup storage
4. Verify that the transferred `.orig.*` matches the `.lifecycle.json` hash
5. Call `PUT /api/document/{ulid}/archive-confirm` to mark the document as archived

The tool never needs to understand nested paths or canonical naming; it just processes whatever it finds in the archive folder.