Does SongKong create lots of HTML files for each report? I’d imagine they are large and inefficient. I can see why you decided on that approach, so that the files can link to one another, but for users with large numbers of files this isn’t going to be a good method. I’m wondering whether generating reports purely from the database would work better and reduce the size? Then only the database would need drastically reducing.
I asked AI for solutions and it gave some really effective suggestions:
Proposal: A Scalable, Low-Footprint Database & Reporting Architecture for SongKong
Context
For users with very large media libraries, SongKong’s database and reports can grow to impractical sizes. This affects long-term use, undo capabilities, and the deduplication subsystem. If users are expected to clear these files regularly to reclaim space, the value of having an undo history and a persistent database is significantly reduced.
The following proposal outlines a set of architectural improvements designed to:
• Maintain full undo capability long-term
• Reduce database size by 70–95%
• Reduce or eliminate massive report files
• Improve performance for large libraries
• Keep SongKong viable for collections with hundreds of thousands or millions of files
⸻
- Replace Full Snapshots with Delta-Based Undo Logs
Currently, SongKong appears to store large metadata snapshots per file per operation. For big libraries, this can quickly grow to multiple gigabytes.
Proposed approach:
• Store only the changed fields between versions (a delta)
• Use a version chain: v1 → v2 → v3, reconstructable on demand
• Store deltas in compressed form (Zstandard or LZ4)
Benefits:
• Reduces undo storage dramatically
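• Enables long-term undo with a far smaller storage footprint
• Comparable techniques are already used elsewhere, e.g. Git (delta-compressed packfiles), Lightroom (non-destructive edit history), and database journaling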
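To make the delta idea concrete, here is a minimal Python sketch. It assumes metadata can be treated as a flat field-to-string mapping; the names `make_delta`, `undo` and `undo_chain` are hypothetical and not part of SongKong.

```python
from typing import Dict, List, Optional, Tuple

Metadata = Dict[str, str]
# A delta maps field name -> (old value, new value); None means "field absent".
Delta = Dict[str, Tuple[Optional[str], Optional[str]]]


def make_delta(before: Metadata, after: Metadata) -> Delta:
    """Record only the fields that differ between two versions."""
    delta: Delta = {}
    for key in before.keys() | after.keys():
        old, new = before.get(key), after.get(key)
        if old != new:
            delta[key] = (old, new)
    return delta


def undo(current: Metadata, delta: Delta) -> Metadata:
    """Reverse a single delta, returning the previous version."""
    previous = dict(current)
    for key, (old, _new) in delta.items():
        if old is None:
            previous.pop(key, None)  # field did not exist before the change
        else:
            previous[key] = old
    return previous


def undo_chain(current: Metadata, deltas: List[Delta]) -> Metadata:
    """Walk the version chain backwards (v3 -> v2 -> v1)."""
    for delta in reversed(deltas):
        current = undo(current, delta)
    return current
```

If a fix changes only the artist tag, the stored delta holds that one field rather than a full copy of the file's metadata, and older versions are rebuilt on demand by replaying deltas in reverse.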
⸻
- Use Compressed Database Storage (Zstandard/LZ4)
Metadata compresses extremely well. Integrating compression at the storage layer offers huge wins.
Two simple options:
• Store text/JSON metadata as compressed blobs within SQLite
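• Use a key-value store with built-in compression, such as RocksDB, for undo logs (LMDB has no built-in compression, so values would need to be compressed before being written)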
Expected savings:
70–95% reduction in storage footprint.
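The first option could look roughly like the sketch below. It uses Python's standard `zlib` as a stand-in for Zstandard or LZ4 (which need third-party packages), and the `undo_log` table and function names are assumptions for illustration only. Actual savings depend on how repetitive the metadata is, and very small records compress less well individually.

```python
import json
import sqlite3
import zlib


def open_db(path=":memory:"):
    """Open a database with a hypothetical undo_log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS undo_log ("
        " id INTEGER PRIMARY KEY,"
        " file_path TEXT NOT NULL,"
        " payload BLOB NOT NULL)"
    )
    return conn


def save_metadata(conn, file_path, metadata):
    """Serialise metadata to JSON, compress it, and store it as a BLOB."""
    raw = json.dumps(metadata, sort_keys=True).encode("utf-8")
    blob = zlib.compress(raw, 9)
    cur = conn.execute(
        "INSERT INTO undo_log (file_path, payload) VALUES (?, ?)",
        (file_path, blob),
    )
    conn.commit()
    return cur.lastrowid


def load_metadata(conn, row_id):
    """Fetch a row and reverse the compression and JSON encoding."""
    row = conn.execute(
        "SELECT payload FROM undo_log WHERE id = ?", (row_id,)
    ).fetchone()
    return json.loads(zlib.decompress(row[0]).decode("utf-8"))
```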
⸻
- Deduplicate Internal Metadata
Large libraries repeat a huge amount of metadata (labels, genres, release countries, formats, etc.).
Introduce a dedupe table:
• Store each unique string once
• Reference via numeric IDs
• Apply to artists, albums, genres, labels, musicbrainz IDs, etc.
Result:
Database size shrinks massively with no behavioural changes.
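A dedupe table of this kind might be sketched as follows. The schema and the `intern_string`, `set_field` and `get_field` helpers are hypothetical, shown only to illustrate storing each unique string once and referencing it by ID.

```python
import sqlite3

# Hypothetical schema: one row per unique string, referenced by numeric ID.
SCHEMA = """
CREATE TABLE strings (
    id    INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE
);
CREATE TABLE track_fields (
    file_path TEXT NOT NULL,
    field     TEXT NOT NULL,
    string_id INTEGER NOT NULL REFERENCES strings(id),
    PRIMARY KEY (file_path, field)
);
"""


def intern_string(conn, value):
    """Return the ID for a string, inserting it only if it is new."""
    conn.execute("INSERT OR IGNORE INTO strings (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM strings WHERE value = ?", (value,)).fetchone()
    return row[0]


def set_field(conn, file_path, field, value):
    """Store a field for a file as a reference into the strings table."""
    conn.execute(
        "INSERT OR REPLACE INTO track_fields (file_path, field, string_id)"
        " VALUES (?, ?, ?)",
        (file_path, field, intern_string(conn, value)),
    )


def get_field(conn, file_path, field):
    """Resolve the reference back to the original string, or None."""
    row = conn.execute(
        "SELECT s.value FROM track_fields t"
        " JOIN strings s ON s.id = t.string_id"
        " WHERE t.file_path = ? AND t.field = ?",
        (file_path, field),
    ).fetchone()
    return row[0] if row else None
```

With this layout, a genre shared by 100,000 tracks is stored as one string plus 100,000 small integer references.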
⸻
- Store Heavy Objects Externally (Cover Art, Fingerprints)
If SongKong stores any of the following:
• Cover art
• Acoustic fingerprints
• Full MusicBrainz release objects
…these should be externalized to a cache folder instead of being embedded in the DB.
Store hashes + pointers in SQLite rather than the full objects.
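One way to picture "hashes + pointers" is a content-addressed cache folder, sketched below. The `blob_refs` table, the folder layout and the function names are assumptions made for this example, not a description of how SongKong stores anything today.

```python
import hashlib
import sqlite3
from pathlib import Path

# Hypothetical table: the database keeps only a hash per (file, kind).
SCHEMA = """
CREATE TABLE IF NOT EXISTS blob_refs (
    file_path TEXT NOT NULL,
    kind      TEXT NOT NULL,
    sha256    TEXT NOT NULL,
    PRIMARY KEY (file_path, kind)
);
"""


def _blob_path(cache_dir, digest):
    # Spread files over subfolders named after the first two hex digits.
    return Path(cache_dir) / digest[:2] / digest


def store_blob(conn, cache_dir, file_path, kind, data):
    """Write the bytes to the cache once and record only the hash in the DB."""
    digest = hashlib.sha256(data).hexdigest()
    target = _blob_path(cache_dir, digest)
    if not target.exists():  # identical cover art is written a single time
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    conn.execute(
        "INSERT OR REPLACE INTO blob_refs (file_path, kind, sha256)"
        " VALUES (?, ?, ?)",
        (file_path, kind, digest),
    )
    return digest


def load_blob(conn, cache_dir, file_path, kind):
    """Look up the hash in the DB and read the bytes from the cache folder."""
    row = conn.execute(
        "SELECT sha256 FROM blob_refs WHERE file_path = ? AND kind = ?",
        (file_path, kind),
    ).fetchone()
    if row is None:
        return None
    return _blob_path(cache_dir, row[0]).read_bytes()
```

A side effect of addressing by hash is that every track on an album sharing the same cover image points at a single cached file.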
⸻
- Auto-Cleanup + Retention Policies
SongKong should ship with optional cleanup logic:
• Automatically purge orphaned undo entries
• Retain full undo for recent sessions
• Retain compressed deltas for older sessions
• Purge logs older than X days if desired
This allows users to preserve undo history forever without runaway growth.
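A retention policy along these lines could be as small as the sketch below. The `sessions` and `undo_entries` tables and the `purge_old_sessions` function are hypothetical; the age limit corresponds to the "X days" above.

```python
import sqlite3
import time

# Hypothetical schema: each undo entry belongs to one session.
SCHEMA = """
CREATE TABLE sessions (
    id         INTEGER PRIMARY KEY,
    started_at REAL NOT NULL          -- Unix timestamp
);
CREATE TABLE undo_entries (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL,
    payload    BLOB
);
"""


def purge_old_sessions(conn, max_age_days, now=None):
    """Delete sessions older than max_age_days, then any orphaned undo entries.

    Returns the number of undo entries removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    with conn:  # run both deletes in a single transaction
        conn.execute("DELETE FROM sessions WHERE started_at < ?", (cutoff,))
        cur = conn.execute(
            "DELETE FROM undo_entries"
            " WHERE session_id NOT IN (SELECT id FROM sessions)"
        )
    return cur.rowcount
```

Deleting the sessions first means the second statement removes both the entries of expired sessions and any entries that were already orphaned.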
⸻
- Lightweight Reports Generated From the Database
Currently SongKong produces heavy HTML reports that can reach hundreds of MB.
Proposal:
• Store zero or minimal report data
• Generate reports dynamically from the DB when the user wants them
• Use lazy-loading UI (load details only when expanded)
• Provide an optional “export full report to HTML” button for users who really need a full file
Outcome:
• Report files remain small
• The database becomes the single source of truth
• Reports can be regenerated at any time
• Undo remains consistent
This addresses the “report bloat” problem at its source rather than relying on users to delete old reports.
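As a rough illustration of generating a report on demand, the sketch below pages through a hypothetical `changes` table and renders only the requested slice as HTML. The table, its columns and the function names are assumptions for this example.

```python
import html
import sqlite3

# Hypothetical table of per-field changes recorded for each session.
SCHEMA = """
CREATE TABLE changes (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL,
    file_path  TEXT NOT NULL,
    field      TEXT NOT NULL,
    old_value  TEXT,
    new_value  TEXT
);
"""


def report_page(conn, session_id, offset=0, limit=50):
    """Fetch one page of changes so the UI loads details only when needed."""
    return conn.execute(
        "SELECT file_path, field, old_value, new_value FROM changes"
        " WHERE session_id = ? ORDER BY id LIMIT ? OFFSET ?",
        (session_id, limit, offset),
    ).fetchall()


def render_html(rows):
    """Render a page of rows as an HTML table, escaping all values."""
    lines = ["<table>"]
    for row in rows:
        cells = "".join(
            "<td>{}</td>".format(html.escape("" if c is None else str(c)))
            for c in row
        )
        lines.append("<tr>{}</tr>".format(cells))
    lines.append("</table>")
    return "\n".join(lines)
```

The optional "export full report to HTML" button could reuse the same two functions, looping over pages and writing the result to a file only when the user asks for it.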
⸻
- Add a “Lite Database Mode” for Users Who Don’t Need Heavy History
To support all types of users:
Lite Mode
• Only stores recent changes
• No long-term undo
• Near-zero storage impact
Standard Mode
• Full undo
• Delta-based
• Compressed storage
This avoids forcing heavy storage requirements on casual users.
⸻
- Optional Remote Storage for Historical Reports
Large installations may prefer storing:
• Undo logs
• Reports
• Historical sessions
…on SMB/NFS/S3 instead of local disk.
SongKong could support remote storage paths for these assets.
This would keep local disk usage for history and reports to a minimum.
⸻
Core Benefits of This Entire Approach
Improvement and impact:
• Delta-based undo: massive reduction in database size
• Compressed metadata: 70–95% storage reduction
• DB-driven reports: eliminates huge HTML report files
• Externalised heavy data: smaller DB and faster load times
• Dedupe tables: minimal repeated metadata
• Auto-cleanup policies: long-term, sustainable usage
• Lite mode: ideal for small collections
This is a balanced architecture that supports both casual users and power users with multi-TB libraries, without requiring constant manual database purges.
I hope this helps you, Paul, and gets you excited about the potential of the ideas here.