SongKong and Jaikoz Music Tagger Community Forum

Noob help: Can I achieve my goals with SongKong, and if so, how?

Does SongKong create tons of HTML files for each report? I'd imagine they are large and inefficient. I can see why you might have chosen that approach, so that the files can work together, but for users with large numbers of files this isn't going to be a good method. I'm wondering whether generating reports purely from the database would work better and reduce the size. Then only the database would need drastically reducing.

I asked an AI for solutions and it gave some really effective suggestions:

Proposal: A Scalable, Low-Footprint Database & Reporting Architecture for SongKong

Context
For users with very large media libraries, SongKong’s database and reports can grow to impractical sizes. This affects long-term use, undo capabilities, and the deduplication subsystem. If users are expected to clear these files regularly to reclaim space, the value of having an undo history and persistent database becomes significantly reduced.

The following proposal outlines a set of architectural improvements designed to:
• Maintain full undo capability long-term
• Reduce database size by 70–95%
• Reduce or eliminate massive report files
• Improve performance for large libraries
• Keep SongKong viable for collections with hundreds of thousands or millions of files

  1. Replace Full Snapshots with Delta-Based Undo Logs

Currently, SongKong appears to store large metadata snapshots per file per operation. For big libraries, this rapidly becomes multi-gigabyte.

Proposed approach:
• Store only the changed fields between versions (a delta)
• Use a version chain: v1 → v2 → v3, reconstructable on demand
• Store deltas in compressed form (Zstandard or LZ4)

Benefits:
• Reduces undo storage dramatically
• Enables infinite undo with a tiny storage footprint
• Already proven in: Git, Lightroom, Adobe DAMs, and DB journaling
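To make the delta idea concrete, here is a minimal Java sketch (Java because SongKong itself runs on Java). It assumes a file's metadata is a simple map of field names to values; the class and method names are hypothetical, and this is not how SongKong works internally (the reply below notes SongKong already tracks changes with Hibernate Envers). Version 0 holds the full metadata, each later version stores only the changed fields, and any version is rebuilt by replaying deltas. Compressing the deltas is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/**
 * Minimal delta-based undo log for one file's metadata (hypothetical names).
 * Version 0 is the full metadata captured on first load; each later version
 * stores only the fields that changed. A null value in a delta means the
 * field was removed.
 */
class DeltaUndoLog {
    private final Map<String, String> base;
    private final List<Map<String, String>> deltas = new ArrayList<>();

    DeltaUndoLog(Map<String, String> initialMetadata) {
        this.base = new HashMap<>(initialMetadata);
    }

    /** Records a new version by storing only its differences from the latest one. */
    void commit(Map<String, String> newMetadata) {
        Map<String, String> current = reconstruct(latestVersion());
        Map<String, String> delta = new HashMap<>();
        for (Map.Entry<String, String> e : newMetadata.entrySet()) {
            if (!Objects.equals(e.getValue(), current.get(e.getKey()))) {
                delta.put(e.getKey(), e.getValue()); // changed or added field
            }
        }
        for (String key : current.keySet()) {
            if (!newMetadata.containsKey(key)) {
                delta.put(key, null); // removed field
            }
        }
        deltas.add(delta);
    }

    /** Number of committed versions after the initial load. */
    int latestVersion() {
        return deltas.size();
    }

    /** Number of fields stored for a given version (1-based). */
    int deltaSize(int version) {
        return deltas.get(version - 1).size();
    }

    /** Rebuilds a version by replaying deltas 1..version on top of the base. */
    Map<String, String> reconstruct(int version) {
        Map<String, String> result = new HashMap<>(base);
        for (int i = 0; i < version; i++) {
            for (Map.Entry<String, String> e : deltas.get(i).entrySet()) {
                if (e.getValue() == null) {
                    result.remove(e.getKey());
                } else {
                    result.put(e.getKey(), e.getValue());
                }
            }
        }
        return result;
    }
}
```

Undoing back to version 1 is then just `reconstruct(1)`. The trade-off is that rebuilding a late version means replaying every earlier delta, which real systems usually offset with periodic full snapshots.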

  2. Use Compressed Database Storage (Zstandard/LZ4)

Metadata compresses extremely well. Integrating compression at the storage layer offers huge wins.

Two simple options:
• Store text/JSON metadata as compressed blobs within SQLite
• Use a compression-enabled database such as RocksDB or LMDB for undo logs

Expected savings:
70–95% reduction in storage footprint.
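A rough sketch of storage-layer compression, assuming metadata is held as a text blob. The JDK has no built-in Zstandard or LZ4 codec, so this uses `java.util.zip.Deflater`/`Inflater` (DEFLATE) as a stand-in; the class name is hypothetical and no particular savings figure is implied.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Compresses a metadata text blob before storage (hypothetical names).
 * DEFLATE from the JDK stands in for Zstandard/LZ4, which would need
 * third-party libraries.
 */
final class MetadataCompression {
    private MetadataCompression() {}

    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    static String decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or invalid input");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            inflater.end();
        }
    }
}
```

Repetitive text such as tag data tends to compress well, but blob compression only helps if metadata is actually stored as blobs; the reply below points out that SongKong stores name-value pairs instead.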

  3. Deduplicate Internal Metadata

Large libraries repeat a huge amount of metadata (labels, genres, release countries, formats, etc.).

Introduce a dedupe table:
• Store each unique string once
• Reference via numeric IDs
• Apply to artists, albums, genres, labels, MusicBrainz IDs, etc.

Result:
Database size shrinks massively with no behavioural changes.
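The dedupe table amounts to a dictionary that maps each distinct string to an integer ID. A minimal in-memory Java sketch with hypothetical names; in a relational database the same idea would be a lookup table referenced by foreign keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Dictionary ("dedupe") table sketch: each distinct string is stored once
 * and referenced elsewhere by a small integer ID (hypothetical names).
 */
final class StringDictionary {
    private final Map<String, Integer> idsByValue = new HashMap<>();
    private final List<String> valuesById = new ArrayList<>();

    /** Returns the existing ID for value, or assigns the next free one. */
    int idFor(String value) {
        Integer existing = idsByValue.get(value);
        if (existing != null) {
            return existing;
        }
        int id = valuesById.size();
        valuesById.add(value);
        idsByValue.put(value, id);
        return id;
    }

    /** Resolves an ID back to its stored string. */
    String valueOf(int id) {
        return valuesById.get(id);
    }

    int distinctCount() {
        return valuesById.size();
    }
}
```

The cost is indirection: every read of a genre or label now needs a lookup (a join, in SQL), which is part of the extra complexity the reply below mentions.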

  4. Store Heavy Objects Externally (Cover Art, Fingerprints)

If SongKong stores any of the following:
• Cover art
• Acoustic fingerprints
• Full MusicBrainz release objects

…these should be externalized to a cache folder instead of being embedded in the DB.

Store hashes + pointers in SQLite rather than the full objects.
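Storing "hashes + pointers" usually means content addressing: the cached file is named after a hash of its bytes, so identical cover art is only written once. A minimal sketch assuming SHA-256 and a flat cache directory; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Content-addressed cache sketch for heavy objects such as cover art
 * (hypothetical names). Each blob is written once under its SHA-256 hex
 * digest; the database would store only that digest as a pointer.
 */
final class ContentAddressedCache {
    private final Path cacheDir;

    ContentAddressedCache(Path cacheDir) throws IOException {
        this.cacheDir = Files.createDirectories(cacheDir);
    }

    /** Stores the bytes if not already present and returns their hash. */
    String put(byte[] data) throws IOException {
        String hash = sha256Hex(data);
        Path target = cacheDir.resolve(hash);
        if (!Files.exists(target)) { // not safe against concurrent writers
            Files.write(target, data);
        }
        return hash;
    }

    /** Loads the bytes previously stored under the given hash. */
    byte[] get(String hash) throws IOException {
        return Files.readAllBytes(cacheDir.resolve(hash));
    }

    private static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The database row would then keep only the 64-character digest. As the reply below says, the cache itself still needs disk space.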

  5. Auto-Cleanup + Retention Policies

SongKong should ship with optional cleanup logic:
• Automatically purge orphaned undo entries
• Retain full undo for recent sessions
• Retain compressed deltas for older sessions
• Purge logs older than X days if desired

This allows users to preserve undo history forever without runaway growth.
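The retention rules above could look something like this. It is only a sketch: the session model and the `keepRecent` and `maxAge` parameters are hypothetical, and it covers only the "retain recent sessions" and "purge logs older than X days" bullets, not re-compressing older deltas. The record requires Java 16 or later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Undo-history retention policy sketch (hypothetical names).
 * The newest keepRecent sessions are always kept; any older session that
 * finished more than maxAge before "now" is selected for purging.
 */
final class RetentionPolicy {
    record Session(String id, Instant finishedAt) {}

    private final int keepRecent;
    private final Duration maxAge;

    RetentionPolicy(int keepRecent, Duration maxAge) {
        this.keepRecent = keepRecent;
        this.maxAge = maxAge;
    }

    /** Returns the sessions that should be purged, newest first. */
    List<Session> toPurge(List<Session> sessions, Instant now) {
        List<Session> newestFirst = new ArrayList<>(sessions);
        newestFirst.sort(Comparator.comparing(Session::finishedAt).reversed());
        Instant cutoff = now.minus(maxAge);
        List<Session> purge = new ArrayList<>();
        for (int i = keepRecent; i < newestFirst.size(); i++) {
            Session s = newestFirst.get(i);
            if (s.finishedAt().isBefore(cutoff)) {
                purge.add(s);
            }
        }
        return purge;
    }
}
```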

  6. Lightweight Reports Generated From the Database

Currently SongKong produces heavy HTML reports that can reach hundreds of MB.

Proposal:
• Store zero or minimal report data
• Generate reports dynamically from the DB when the user wants them
• Use lazy-loading UI (load details only when expanded)
• Provide an optional “export full report to HTML” button for users who really need a full file

Outcome:
• Report files remain small
• The database becomes the single source of truth
• Reports can be regenerated at any time
• Undo remains consistent

This completely solves the “report bloat” problem long-term.
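To illustrate what generating reports dynamically with lazy-loaded details would mean, here is a sketch in which nothing is written to disk: the summary lists files only, and per-file details are fetched through a hypothetical `changesForFile` function (standing in for a database query) only when a detail page is requested. All names, and the `details?file=` link format, are hypothetical; the records require Java 16 or later.

```java
import java.util.List;
import java.util.function.Function;

/**
 * On-demand report sketch (hypothetical names). The summary is cheap;
 * change details are looked up only when a detail page is rendered.
 */
final class OnDemandReport {
    record Change(String field, String oldValue, String newValue) {}

    private final List<String> files;
    private final Function<String, List<Change>> changesForFile;

    OnDemandReport(List<String> files, Function<String, List<Change>> changesForFile) {
        this.files = files;
        this.changesForFile = changesForFile;
    }

    /** Lightweight summary: one link per file, no change details loaded. */
    String renderSummary() {
        StringBuilder html = new StringBuilder("<ul>\n");
        for (int i = 0; i < files.size(); i++) {
            html.append("<li><a href=\"details?file=").append(i).append("\">")
                .append(escape(files.get(i))).append("</a></li>\n");
        }
        return html.append("</ul>\n").toString();
    }

    /** Detail page for one file, built only when the user asks for it. */
    String renderDetails(int fileIndex) {
        StringBuilder html = new StringBuilder("<table>\n");
        for (Change c : changesForFile.apply(files.get(fileIndex))) {
            html.append("<tr><td>").append(escape(c.field()))
                .append("</td><td>").append(escape(c.oldValue()))
                .append("</td><td>").append(escape(c.newValue()))
                .append("</td></tr>\n");
        }
        return html.append("</table>\n").toString();
    }

    /** Minimal HTML escaping; null values render as empty cells. */
    private static String escape(String s) {
        if (s == null) {
            return "";
        }
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

This also shows the costs raised in the reply below: the pages only exist while the application is running to render them, and every view costs a query.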

  7. Add a “Lite Database Mode” for Users Who Don’t Need Heavy History

To support all types of users:

Lite Mode
• Only stores recent changes
• No long-term undo
• Near-zero storage impact

Standard Mode
• Full undo
• Delta-based
• Compressed storage

This avoids forcing heavy storage requirements on casual users.

  8. Optional Remote Storage for Historical Reports

Large installations may prefer storing:
• Undo logs
• Reports
• Historical sessions

…on SMB/NFS/S3 instead of local disk.

SongKong could support remote storage paths for these assets.

This keeps local disk usage near zero.

Core Benefits of This Entire Approach

Improvement and impact:
• Delta-based undo: massive reduction in database size
• Compressed metadata: 70–95% storage reduction
• DB-driven reports: eliminates huge HTML report files
• Externalised heavy data: smaller DB and faster load times
• Dedupe tables: minimal repeated metadata
• Auto-cleanup policies: long-term, sustainable usage
• Lite mode: ideal for small collections

This is a balanced architecture that supports both casual users and power users with multi-TB libraries, without requiring constant manual database purges.

I hope this helps you, Paul, and gets you excited about the potential of these ideas.

Hi, generating reports from the database is probably possible, but it has a number of disadvantages and only one advantage (reduced disk space). The disadvantages all stem from the reports no longer existing standalone:

  • They cannot be viewed outside of SongKong.
  • When accessed within SongKong, they will be slower because the report pages have to be generated; they don't already exist.
  • When you upgrade SongKong, the database is re-created (because the database schema changes between versions), so existing reports would no longer be available. Allowing an upgrade to modify the existing database rather than re-creating it is tricky and error-prone.
  • Create Support Files sends the reports to help diagnose and resolve users' issues (I would not have been able to resolve your previous issues without the reports), so if we implemented this change and no longer sent reports, I would not be able to support my customers properly. Alternatively, if physical reports were created from the database at that point, Create Support Files would take much longer to run, and would fail if you did not have sufficient disk space to create the reports.
  • It would be an awful lot of work, preventing me from working on more useful new features, whereas the solution I suggest would resolve your disk space issue and is quite easy to implement.

The AI response is interesting and has some useful ideas, but it is flawed:

Currently, SongKong appears to store large metadata snapshots per file per operation
Incorrect. When songs are first loaded into SongKong their metadata is stored, but we already use a delta system to track changes in metadata: https://hibernate.org/orm/envers/
We use a relational database called H2, which is a totally different thing from the suggested RocksDB or LMDB; it is also pure Java, so it is available on all platforms without issue.

Use Compressed Database Storage (Zstandard/LZ4)
It seems to think the metadata for any one song is stored as a single chunk, but it is stored as name-value pairs so that individual edits can be retrieved.

Deduplicate Internal Metadata
I could possibly do this, but it would require substantial changes and add extra complexity to the database model, and I do not think it would lead to a major reduction in database size.

Store Heavy Objects Externally (Cover Art, Fingerprints)
These are already stored in a cache, but the cache still requires disk space.

Auto-Cleanup + Retention Policies
This is essentially what I am recommending as the solution.

Lightweight Reports Generated From the Database
This has all the problems I listed at the start.

Add a “Lite Database Mode” for Users Who Don’t Need Heavy History
This doesn't help heavy users, who are the ones most likely to have the issue.

Optional Remote Storage for Historical Reports
This can already be done, either by pointing the reports folder to a remote location (e.g. with a symbolic link) or by simply moving the reports somewhere else, since they are standalone.
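For the symbolic-link option, a minimal Java sketch. Both paths are hypothetical (SongKong's actual reports location is not assumed here), existing local reports are kept by renaming the folder to a `.local` sibling rather than copying them, and creating symbolic links on Windows may require administrator rights or Developer Mode.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch: point a reports folder at another location via a symbolic link
 * (hypothetical names and paths). Existing local reports are kept by
 * renaming the folder to a ".local" sibling.
 */
final class ReportsFolderLink {
    private ReportsFolderLink() {}

    static void pointTo(Path reportsDir, Path remoteDir) throws IOException {
        Files.createDirectories(remoteDir);
        if (Files.isSymbolicLink(reportsDir)) {
            Files.delete(reportsDir); // removes the old link, not its target
        } else if (Files.exists(reportsDir)) {
            Path backup = reportsDir.resolveSibling(reportsDir.getFileName() + ".local");
            Files.move(reportsDir, backup); // a rename within the same folder
        }
        // Absolute target so the link resolves regardless of working directory.
        Files.createSymbolicLink(reportsDir, remoteDir.toAbsolutePath());
    }
}
```

After this, anything written to the reports folder lands in the remote folder, and the old reports can be moved across at leisure since they are standalone files.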

So, thanks. I think there are a few things for me to consider further, but my proposed solution offers a relatively easy fix that lets me concentrate my effort on things that would benefit users more.