SongKong and Jaikoz Music Tagger Community Forum

steveyg777 · 16 November 2025 10:15

Does SongKong create tons of HTML files for each report? I’d imagine they are large and inefficient. I can see why you decided to use that approach, so that the files can interact with one another, but for users with large numbers of files this isn’t going to be a good method. I’m wondering whether referencing data purely from the database would work better and reduce the size? Then only the database would need drastically reducing.

I asked an AI for solutions and it gave some really effective suggestions:

Proposal: A Scalable, Low-Footprint Database & Reporting Architecture for SongKong

Context
For users with very large media libraries, SongKong’s database and reports can grow to impractical sizes. This affects long-term use, undo capabilities, and the deduplication subsystem. If users are expected to clear these files regularly to reclaim space, the value of having an undo history and persistent database becomes significantly reduced.

The following proposal outlines a set of architectural improvements designed to:
• Maintain full undo capability long-term
• Reduce database size by 70–95%
• Reduce or eliminate massive report files
• Improve performance for large libraries
• Keep SongKong viable for collections with hundreds of thousands or millions of files

⸻

Replace Full Snapshots with Delta-Based Undo Logs

Currently, SongKong appears to store large metadata snapshots per file per operation. For big libraries, this rapidly becomes multi-gigabyte.

Proposed approach:
• Store only the changed fields between versions (a delta)
• Use a version chain: v1 → v2 → v3, reconstructable on demand
• Store deltas in compressed form (Zstandard or LZ4)

Benefits:
• Reduces undo storage dramatically
• Enables infinite undo with tiny storage footprint
• Already proven in: Git, Lightroom, Adobe DAMs, and DB journaling
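To make the delta idea concrete, here is a minimal Python sketch of a field-level delta chain for one file’s tags. It is only an illustration of the proposal, not how SongKong works (SongKong has its own delta tracking, as Paul explains further down). The function names are hypothetical, tag values are assumed to be plain strings, `None` marks a removed field, and the standard library’s `zlib` stands in for Zstandard/LZ4.

```python
import json
import zlib

# Hypothetical sketch of a field-level delta chain for one file's tags.
# Assumes tag values are strings; None marks a field that was removed.


def make_delta(old: dict, new: dict) -> dict:
    """Return only the fields that differ between two versions."""
    delta = {key: value for key, value in new.items() if old.get(key) != value}
    delta.update({key: None for key in old if key not in new})
    return delta


def pack(delta: dict) -> bytes:
    """Serialise and compress a delta (zlib stands in for Zstandard/LZ4)."""
    return zlib.compress(json.dumps(delta, sort_keys=True).encode("utf-8"))


def unpack(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def reconstruct(base: dict, chain: list, steps: int) -> dict:
    """Rebuild a version by replaying the first `steps` deltas onto the base (v1)."""
    state = dict(base)
    for blob in chain[:steps]:
        for key, value in unpack(blob).items():
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


v1 = {"title": "Track 01", "artist": "Unknown", "genre": "Rock"}
v2 = {"title": "Intro", "artist": "Unknown", "genre": "Rock"}
v3 = {"title": "Intro", "artist": "The Band"}
chain = [pack(make_delta(v1, v2)), pack(make_delta(v2, v3))]
print(reconstruct(v1, chain, 1) == v2)  # True
```

Under this scheme, undoing the latest change amounts to reconstructing with one fewer step, and only the base snapshot plus the small deltas are ever stored.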

⸻

Use Compressed Database Storage (Zstandard/LZ4)

Metadata compresses extremely well. Integrating compression at the storage layer offers huge wins.

Two simple options:
• Store text/JSON metadata as compressed blobs within SQLite
• Use a compression-enabled database such as RocksDB or LMDB for undo logs

Expected savings:
70–95% reduction in storage footprint.
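A minimal sketch of the first option (metadata stored as a compressed blob in SQLite), using Python’s built-in `sqlite3` and `zlib` as a stand-in for Zstandard/LZ4. The `song_meta` table and the helper names are hypothetical, and SongKong itself uses H2 rather than SQLite, so this only illustrates the proposal.

```python
import json
import sqlite3
import zlib

# Hypothetical schema: one compressed JSON blob of metadata per song.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE song_meta (song_id INTEGER PRIMARY KEY, meta BLOB NOT NULL)")


def save_meta(conn: sqlite3.Connection, song_id: int, meta: dict) -> None:
    """Serialise the metadata to JSON, compress it and store it as a BLOB."""
    blob = zlib.compress(json.dumps(meta, sort_keys=True).encode("utf-8"), level=9)
    conn.execute(
        "INSERT OR REPLACE INTO song_meta (song_id, meta) VALUES (?, ?)",
        (song_id, blob),
    )


def load_meta(conn: sqlite3.Connection, song_id: int):
    """Return the decompressed metadata dict, or None if the song is unknown."""
    row = conn.execute(
        "SELECT meta FROM song_meta WHERE song_id = ?", (song_id,)
    ).fetchone()
    return None if row is None else json.loads(zlib.decompress(row[0]).decode("utf-8"))


save_meta(conn, 1, {"title": "Intro", "artist": "The Band", "genre": "Rock"})
print(load_meta(conn, 1)["artist"])  # The Band
```

One trade-off worth noting: once a record is a single compressed blob, the database can no longer query or index individual fields inside it without decompressing every row.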

⸻

Deduplicate Internal Metadata

Large libraries repeat a huge amount of metadata (labels, genres, release countries, formats, etc.).

Introduce a dedupe table:
• Store each unique string once
• Reference via numeric IDs
• Apply to artists, albums, genres, labels, musicbrainz IDs, etc.

Result:
Database size shrinks massively with no behavioural changes.
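The dedupe table is essentially string interning. Below is a minimal sketch with Python’s `sqlite3`; the `strings` and `song_field` tables and the helper names are hypothetical. How much it saves in practice depends on how long the repeated strings are compared with the integer IDs that replace them.

```python
import sqlite3

# Hypothetical schema: every distinct string is stored once in `strings`,
# and per-song fields reference it by integer ID.
SCHEMA = """
CREATE TABLE strings (id INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE);
CREATE TABLE song_field (
    song_id  INTEGER NOT NULL,
    field    TEXT    NOT NULL,
    value_id INTEGER NOT NULL REFERENCES strings(id),
    PRIMARY KEY (song_id, field)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def intern_string(conn: sqlite3.Connection, value: str) -> int:
    """Return the ID for `value`, inserting it only if it is not already stored."""
    conn.execute("INSERT OR IGNORE INTO strings (value) VALUES (?)", (value,))
    return conn.execute("SELECT id FROM strings WHERE value = ?", (value,)).fetchone()[0]


def set_field(conn: sqlite3.Connection, song_id: int, field: str, value: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO song_field (song_id, field, value_id) VALUES (?, ?, ?)",
        (song_id, field, intern_string(conn, value)),
    )


def get_field(conn: sqlite3.Connection, song_id: int, field: str):
    """Look up a field's value by joining back to the shared string table."""
    row = conn.execute(
        "SELECT s.value FROM song_field AS f JOIN strings AS s ON s.id = f.value_id "
        "WHERE f.song_id = ? AND f.field = ?",
        (song_id, field),
    ).fetchone()
    return None if row is None else row[0]


# Example: 1,000 songs tagged "Rock" store the string "Rock" only once.
conn = open_db()
for song_id in range(1000):
    set_field(conn, song_id, "genre", "Rock")
print(conn.execute("SELECT COUNT(*) FROM strings").fetchone()[0])  # 1
```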

⸻

Store Heavy Objects Externally (Cover Art, Fingerprints)

If SongKong stores any of the following:
• Cover art
• Acoustic fingerprints
• Full MusicBrainz release objects

…these should be externalized to a cache folder instead of being embedded in the DB.

Store hashes + pointers in SQLite rather than the full objects.
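“Hashes + pointers” usually means a content-addressed cache: each object is saved under a file name derived from its SHA-256 hash, so identical cover art shared by every track on an album is written to disk once, and the database keeps only the hash. A minimal Python sketch follows; the `artwork_ref` table, the two-character subfolder layout and the helper names are all hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: the DB stores only the hash, never the image bytes.
    conn.execute("CREATE TABLE artwork_ref (song_id INTEGER PRIMARY KEY, sha256 TEXT NOT NULL)")
    return conn


def _cache_path(cache_dir: Path, digest: str) -> Path:
    # Spread files over subfolders named after the first two hex characters.
    return cache_dir / digest[:2] / f"{digest}.bin"


def store_artwork(conn: sqlite3.Connection, cache_dir: Path, song_id: int, data: bytes) -> str:
    """Write the bytes to the cache once per unique content and record the hash."""
    digest = hashlib.sha256(data).hexdigest()
    path = _cache_path(cache_dir, digest)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    conn.execute(
        "INSERT OR REPLACE INTO artwork_ref (song_id, sha256) VALUES (?, ?)",
        (song_id, digest),
    )
    return digest


def load_artwork(conn: sqlite3.Connection, cache_dir: Path, song_id: int):
    row = conn.execute("SELECT sha256 FROM artwork_ref WHERE song_id = ?", (song_id,)).fetchone()
    return None if row is None else _cache_path(cache_dir, row[0]).read_bytes()


# Example: two songs sharing the same cover result in a single cached file.
with tempfile.TemporaryDirectory() as tmp:
    conn = open_db()
    cache = Path(tmp)
    cover = b"placeholder image bytes"
    store_artwork(conn, cache, 1, cover)
    store_artwork(conn, cache, 2, cover)
    print(len(list(cache.rglob("*.bin"))))  # 1
```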

⸻

Auto-Cleanup + Retention Policies

SongKong should ship with optional cleanup logic:
• Automatically purge orphaned undo entries
• Retain full undo for recent sessions
• Retain compressed deltas for older sessions
• Purge logs older than X days if desired

This allows users to preserve undo history forever without runaway growth.
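Two of these rules (purging orphaned undo entries and purging entries older than X days) can be sketched in a few lines of Python with `sqlite3`. The `song` and `undo_log` tables and `apply_retention` are hypothetical, and the “keep compressed deltas for older sessions” tier is not covered here.

```python
import sqlite3
import time
from typing import Optional

SECONDS_PER_DAY = 86_400

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE song (id INTEGER PRIMARY KEY);
CREATE TABLE undo_log (
    id         INTEGER PRIMARY KEY,
    song_id    INTEGER NOT NULL,
    created_at REAL    NOT NULL  -- Unix timestamp in seconds
);
"""


def apply_retention(conn: sqlite3.Connection, max_age_days: float,
                    now: Optional[float] = None) -> int:
    """Purge orphaned undo entries and entries older than max_age_days.

    Returns the number of rows deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    # Orphans: undo entries for songs that are no longer in the library.
    orphaned = conn.execute(
        "DELETE FROM undo_log WHERE song_id NOT IN (SELECT id FROM song)"
    ).rowcount
    expired = conn.execute(
        "DELETE FROM undo_log WHERE created_at < ?", (cutoff,)
    ).rowcount
    conn.commit()
    return orphaned + expired


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
now = 1_700_000_000.0
conn.execute("INSERT INTO song (id) VALUES (1)")
conn.executemany(
    "INSERT INTO undo_log (song_id, created_at) VALUES (?, ?)",
    [
        (1, now - 1 * SECONDS_PER_DAY),    # recent: kept
        (1, now - 100 * SECONDS_PER_DAY),  # older than 30 days: purged
        (2, now),                          # song 2 no longer exists: purged
    ],
)
print(apply_retention(conn, max_age_days=30, now=now))  # 2
```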

⸻

Lightweight Reports Generated From the Database

Currently SongKong produces heavy HTML reports that can reach hundreds of MB.

Proposal:
• Store zero or minimal report data
• Generate reports dynamically from the DB when the user wants them
• Use lazy-loading UI (load details only when expanded)
• Provide an optional “export full report to HTML” button for users who really need a full file

Outcome:
• Report files remain small
• The database becomes the single source of truth
• Reports can be regenerated at any time
• Undo remains consistent

This completely solves the “report bloat” problem long-term.
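As a rough illustration of “generate on demand + lazy loading”, here is a Python sketch that renders one page of a report summary from the database and renders a row’s details only when that row is expanded. The `report_row` and `report_change` tables and both functions are hypothetical; note that Paul’s objections below (reports no longer standalone, slower access, lost on upgrade) apply to any approach like this.

```python
import html
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE report_row (id INTEGER PRIMARY KEY, title TEXT NOT NULL, status TEXT NOT NULL);
CREATE TABLE report_change (
    row_id    INTEGER NOT NULL REFERENCES report_row(id),
    field     TEXT NOT NULL,
    old_value TEXT,
    new_value TEXT
);
"""


def render_summary(conn: sqlite3.Connection, page: int = 0, page_size: int = 50) -> str:
    """Render one page of the summary list; per-row details are not included."""
    rows = conn.execute(
        "SELECT id, title, status FROM report_row ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    items = "".join(
        f'<li data-row="{row_id}">{html.escape(title)} ({html.escape(status)})</li>'
        for row_id, title, status in rows
    )
    return f"<ul>{items}</ul>"


def render_detail(conn: sqlite3.Connection, row_id: int) -> str:
    """Render the changes for one row, only when the user expands it."""
    changes = conn.execute(
        "SELECT field, old_value, new_value FROM report_change WHERE row_id = ? ORDER BY field",
        (row_id,),
    ).fetchall()
    cells = "".join(
        f"<tr><td>{html.escape(field)}</td><td>{html.escape(old or '')}</td>"
        f"<td>{html.escape(new or '')}</td></tr>"
        for field, old, new in changes
    )
    return f"<table>{cells}</table>"
```

Only the page being viewed and the rows actually expanded are ever turned into HTML, which is what keeps nothing large on disk; the cost is that every view now needs a database query.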

⸻

Add a “Lite Database Mode” for Users Who Don’t Need Heavy History

To support all types of users:

Lite Mode
• Only stores recent changes
• No long-term undo
• Near-zero storage impact

Standard Mode
• Full undo
• Delta-based
• Compressed storage

This avoids forcing heavy storage requirements on casual users.

⸻

Optional Remote Storage for Historical Reports

Large installations may prefer storing:
• Undo logs
• Reports
• Historical sessions

…on SMB/NFS/S3 instead of local disk.

SongKong could support remote storage paths for these assets.

This keeps local disk usage near zero.

⸻

Core Benefits of This Entire Approach

Improvement: Impact
• Delta-based undo: massive reduction in database size
• Compressed metadata: 70–95% storage reduction
• DB-driven reports: eliminates huge HTML report files
• Externalised heavy data: smaller DB and faster load times
• Dedupe tables: minimal repeated metadata
• Auto-cleanup policies: long-term, sustainable usage
• Lite mode: ideal for small collections

This is a balanced architecture that supports both casual users and power users with multi-TB libraries, without requiring constant manual database purges.

I hope this helps you Paul and gets you excited about the potential of the ideas here.

paultaylor · 16 November 2025 13:46

Hi, generating reports from the database is probably possible, but it has a number of disadvantages and only one advantage (reduced disk space). The disadvantages arise because the reports would no longer exist standalone:

• They cannot be viewed outside of SongKong.
• When accessed within SongKong, they will be slower because the report pages have to be generated; they don’t already exist.
• When you upgrade SongKong, the database is re-created (because the database schema changes between versions), so existing reports would no longer be available. Allowing an upgrade to modify the existing database rather than recreating it is tricky and error prone.
• Create Support Files sends the reports to help with diagnosing and resolving users’ issues (I would not have been able to resolve your previous issues without reports), so if we implemented this change and no longer sent reports, I would not be able to support my customers properly. Alternatively, if physical reports were created from the database, Create Support Files would take much longer to run, and would fail if you did not have sufficient disk space to create the reports.
• It would be an awful lot of work, preventing me from working on more useful new features, whereas the solution I suggest would resolve your disk space issue and is quite easy to implement.

The AI response is interesting and has some useful ideas, but it is flawed:

Currently, SongKong appears to store large metadata snapshots per file per operation
Incorrect. When songs are first loaded into SongKong their metadata is stored, but we already use a delta system to track changes in metadata: https://hibernate.org/orm/envers/
We use a relational database called H2, which is a totally different thing from the suggested RocksDB or LMDB. It is also pure Java, so it is available on all platforms without issue.

Use Compressed Database Storage (Zstandard/LZ4)
It seems to think the metadata for any one song is stored as a single chunk, but it is stored as name-value pairs so that individual edits can be retrieved.

Deduplicate Internal Metadata
We could possibly do this, but it would require substantial changes and add extra complexity to the database model, and I do not think it would lead to a major reduction in database size.

Store Heavy Objects Externally (Cover Art, Fingerprints)
These are already stored in a cache, but the cache still requires disk space.

Auto-Cleanup + Retention Policies
This is essentially what I am recommending as a solution.

Lightweight Reports Generated From the Database
This has all the problems I listed at the start.

Add a “Lite Database Mode” for Users Who Don’t Need Heavy History
This doesn’t help heavy users, who are the most likely to have the issue.

Optional Remote Storage for Historical Reports
This can already be done, either by pointing the reports folder to a remote location (e.g. with a symbolic link) or simply by moving the reports to a new place, since they are standalone.

So, thanks. I think there are a few things for me to consider further, but my proposed solution offers a relatively easy fix that lets me concentrate my effort on things that would benefit users more.

paultaylor · 28 November 2025 10:54

So we can’t compress the reports, but we can remove the whitespace and newlines. They were useful to me when reading the source but make no difference to how the report looks, and removing them has reduced the size by 75%. This will be in the next release.
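Paul doesn’t say exactly how the whitespace is removed, so the following is just one possible approach, sketched in Python: drop indentation-only whitespace between tags and collapse other whitespace runs to a single space, while leaving `<pre>`, `<textarea>`, `<script>` and `<style>` blocks untouched. `minify_html` is a hypothetical name. One caveat: removing whitespace between inline elements can remove a visible space, so this is only safe where the whitespace was purely formatting.

```python
import re

# Blocks whose whitespace is (or may be) significant are copied through unchanged.
_KEEP = re.compile(r"<(pre|textarea|script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Whitespace between two tags that contains a line break: typically just indentation.
_INDENT_BETWEEN_TAGS = re.compile(r">\s*\n\s*<")
# Any other whitespace run: browsers render it as a single space anyway.
_WS_RUN = re.compile(r"\s+")


def _squeeze(chunk: str) -> str:
    chunk = _INDENT_BETWEEN_TAGS.sub("><", chunk)
    return _WS_RUN.sub(" ", chunk)


def minify_html(text: str) -> str:
    """Strip formatting whitespace from HTML outside of preserved blocks."""
    out = []
    pos = 0
    for match in _KEEP.finditer(text):
        out.append(_squeeze(text[pos:match.start()]))
        out.append(match.group(0))  # keep <pre>/<script>/... verbatim
        pos = match.end()
    out.append(_squeeze(text[pos:]))
    return "".join(out)


page = "<html>\n  <body>\n    <p>Hello   world</p>\n    <pre>a\n  b</pre>\n  </body>\n</html>\n"
print(len(minify_html(page)) < len(page))  # True
```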

steveyg777 · 11 December 2025 21:56

Sounds good! I can’t believe how much cruft the whitespace was causing! I’m glad you discovered it.

It would also be useful to have a sticky toggle to enable or disable reports when I’m starting a task, and even to specify where reports are stored, so I can keep them on my hard drives, which have more space available (I run my containers on my SSDs).

paultaylor · 12 December 2025 08:57

I was equally surprised.

Actually, you can store the reports in a different location on macOS by using a symbolic link. I already wrote an article on how to move the database on macOS to a different location, so take a look at that; all you have to do differently is change the location from the Database location to the Reports location.

steveyg777 · 11 December 2025 21:56

I will bear this in mind for my Mac, although neither my system drive nor my external drive has much space left. I’d appreciate being able to do this in the container in the future, though, if it’s something you would be willing to implement?

steveyg777 · 12 December 2025 00:48

I wondered something: say I had a Docker image and I rebuilt the container, I would need to save my license inside the app again. But if my license had expired (which should mean I can’t update any further but should still be able to use the latest version available before my license expired, right?), will SongKong still accept my license and allow me to use THAT version of SK? Likewise with the Mac or Windows version?

paultaylor · 12 December 2025 11:22

Hi, sorry I forgot you were primarily using Docker.

With the Docker version, the instructions already recommend that you always configure /songkong so that it is mounted onto a real directory. This means the configuration files and license are preserved even if you destroy the container and install a new version.

But /songkong is also where the database and reports are stored (songkong/Database and songkong/Reports), so you can simply configure /songkong to point to a location on a large drive with plenty of space, and then your SongKong reports, database and preferences are all stored in that location.

However, if you are currently using SongKong with /songkong not configured, or pointing to a different location, you might want to copy the existing contents before you change it so you don’t lose your existing configuration.

You would not need to save your license again if you had /songkong mounted to a real folder in the original container and pointed /songkong in the new container to the same place, as described above.

No, download/install and licensing are two different things. For example, you could install the latest version even if your license had expired, and you could still use that new version, just in Lite mode. This should not be prevented, and because SongKong is available for a number of platforms, we don’t attempt to do any in-place updates anyway because it would get too complex.

For all platforms, we only provide the latest version of the software, since we would prefer customers to move to the best version of the software, with bugs resolved and new features. However old your license is, you can always get full access to the latest version of SongKong with a single purchase of One Year of Updates at a low price.

If you really want to be able to install old versions of SongKong in the future, you need to make a backup of the installer or Docker image as required.
