Instructions for building a client that uploads only new documents to godocs.
POST /api/document/upload
Content-Type: multipart/form-data
Form field: file (the document file)
Responses:
| Status | Meaning | Body |
|---|---|---|
| 201 | Created — new document ingested | {"ulid": "01J...", "name": "file.pdf", "hash": "abc123...", "id": 42} |
| 409 | Conflict — duplicate already exists | {"error": "duplicate document", "hash": "abc123...", "ulid": "01J...", "name": "file.pdf", "id": 42} |
| 400 | Bad request — sidecar file rejected | {"error": "cannot upload sidecar files directly; ..."} |
| 200 | Ingested, but the server could not resolve the new document's ULID; only the ingress path is returned | {"path": "/ingress/file.pdf"} |
The server computes the MD5 hash of the uploaded bytes and checks the database before writing to disk. A 409 response includes the existing document’s ULID, so the client can proceed with metadata/tag operations without re-uploading.
Supported file types: .pdf, .jpg, .jpeg, .png, .tiff, .doc, .docx, .odf, .rtf, .txt
Rejected sidecar extensions: .ocr.txt, .thumb.png, .tags.json, .tn_256.png
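A client can skip sidecar files locally instead of round-tripping a 400. A minimal suffix check, sketched in Go (the helper name is mine, not part of godocs):

```go
package main

import "strings"

// sidecarSuffixes are the generated-file suffixes the server rejects with 400.
var sidecarSuffixes = []string{".ocr.txt", ".thumb.png", ".tags.json", ".tn_256.png"}

// isSidecar reports whether a filename ends in one of the rejected suffixes.
func isSidecar(name string) bool {
	lower := strings.ToLower(name)
	for _, s := range sidecarSuffixes {
		if strings.HasSuffix(lower, s) {
			return true
		}
	}
	return false
}
```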
GET /api/document/lookup?hash=<md5_hex>
Responses:
| Status | Meaning | Body |
|---|---|---|
| 200 | Found | {"ulid": "01J...", "name": "file.pdf", "path": "L/00/00/42/000042.orig.pdf", "id": 42, "hash": "abc123..."} |
| 404 | Not found | — |
PUT /api/document/:ulid/ocr
Content-Type: application/json
Body: {"text": "extracted text content"}
PUT /api/document/:ulid/metadata
Content-Type: application/json
Body: {"author": "...", "source": "scanner", ...}
This call also triggers thumbnail generation.
POST /api/documents/:ulid/tags
Content-Type: application/json
Body: {"tag_id": 1}
GET /api/tags
Returns array of {"id": 1, "name": "Finance", "color": "#3273dc", ...}.
Hash format: MD5, lowercase hex string (32 characters). Example: d41d8cd98f00b204e9800998ecf8427e.
Go: crypto/md5 — the server uses github.com/drummonds/godocs-hash.
import (
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

// hashFile returns the MD5 of the file's contents as a lowercase hex string,
// streaming through the hash so large documents are never fully in memory.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}
Shell: md5sum file.pdf | cut -d' ' -f1
MD5 throughput is roughly 1.5 GB/s on a modern desktop CPU, so a 50 MB file hashes in about 30 ms. On a Raspberry Pi, expect 50–200 MB/s, so even a 50 MB scan hashes in about a second at worst.
Upload every file. If the server returns 409, use the ULID from the response body to continue with metadata/tag operations. No client-side hashing needed.
for each file:
    resp = POST /api/document/upload with file
    if resp.status == 201:
        ulid = resp.body.ulid        # new document
    elif resp.status == 409:
        ulid = resp.body.ulid        # already exists
    else:
        handle error
    # continue with OCR, metadata, tags using ulid
This is simplest but transfers every file over the network.
Compute MD5 locally, check via lookup endpoint, skip upload if the document already exists.
for each file:
    hash = md5(file)
    resp = GET /api/document/lookup?hash={hash}
    if resp.status == 200:
        ulid = resp.body.ulid        # already on server
    else:
        resp = POST /api/document/upload with file
        ulid = resp.body.ulid        # 201 created
    # continue with OCR, metadata, tags using ulid
For repeated syncs, maintain a local manifest (path → {size, mtime, md5}) to skip files that haven’t changed since the last sync.
for each file:
    stat = os.Stat(file)
    if manifest[path].mtime == stat.mtime && manifest[path].size == stat.size:
        skip                         # unchanged since last sync
    hash = md5(file)
    if manifest[path].hash == hash:
        update manifest mtime; skip  # content unchanged despite mtime change
    resp = GET /api/document/lookup?hash={hash}
    if resp.status == 200:
        ulid = resp.body.ulid
    else:
        resp = POST /api/document/upload with file
        ulid = resp.body.ulid
    update manifest {path, size, mtime, hash}
    # continue with OCR, metadata, tags using ulid
import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
	"path/filepath"
)

// uploadDocument hashes the file locally, skips the transfer when the server
// already has it, and otherwise uploads it. It returns the document's ULID.
func uploadDocument(client *http.Client, baseURL, filePath string) (string, error) {
	// 1. Hash locally.
	hash, err := hashFile(filePath)
	if err != nil {
		return "", err
	}

	// 2. Check whether the document is already on the server.
	resp, err := client.Get(baseURL + "/api/document/lookup?hash=" + hash)
	if err != nil {
		return "", err
	}
	if resp.StatusCode == http.StatusOK {
		var doc struct {
			ULID string `json:"ulid"`
		}
		err := json.NewDecoder(resp.Body).Decode(&doc)
		resp.Body.Close()
		if err != nil {
			return "", err
		}
		return doc.ULID, nil // already exists; skip the upload
	}
	resp.Body.Close()

	// 3. Upload via multipart/form-data.
	body := &bytes.Buffer{}
	writer := multipart.NewWriter(body)
	part, err := writer.CreateFormFile("file", filepath.Base(filePath))
	if err != nil {
		return "", err
	}
	f, err := os.Open(filePath)
	if err != nil {
		return "", err
	}
	_, copyErr := io.Copy(part, f)
	f.Close()
	if copyErr != nil {
		return "", copyErr
	}
	if err := writer.Close(); err != nil {
		return "", err
	}

	resp, err = client.Post(baseURL+"/api/document/upload", writer.FormDataContentType(), body)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var result struct {
		ULID  string `json:"ulid"`
		Error string `json:"error"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", err
	}
	switch resp.StatusCode {
	case http.StatusCreated: // 201: new document
		return result.ULID, nil
	case http.StatusConflict: // 409: duplicate appeared between lookup and upload
		return result.ULID, nil
	default:
		return "", fmt.Errorf("upload failed: %d %s", resp.StatusCode, result.Error)
	}
}
Filename + filesize pre-filter — Not reliable. Same content can have different names; different content can share names.
Streaming hash during upload — Wastes bandwidth on duplicates. Hash-before-upload avoids the transfer entirely.
Partial hashing (first N bytes) — Not collision-safe for scanned documents with identical headers. Full MD5 is already milliseconds.