SongKong and Jaikoz Music Tagger Community Forum

Can Jaikoz fix my 'mostly right' MP3 collection?

Hi, new Jaikoz customer here.

I have a large collection of MP3s - 14,412 files at last count. I just finished retrieving MusicIP Acoustic IDs for all files (about 12 hours!), MusicBrainz IDs and MusicBrainz data.

Most of my MP3 files started out with accurate tags, but many were missing some fields (e.g. genre, album cover art, year of recording), so I thought Jaikoz would be a great way to fix those problems automatically.

What I found: MusicIP / MusicBrainz is about 95% accurate at identifying my music. That’s good, but in a collection this size it means there are about 720 incorrectly tagged files, randomly distributed throughout my collection. Simply applying the changes as-is will mess up those files’ tags. On the other hand, reviewing each file one by one is proving tedious and error-prone. Plus, on a collection this size, simply navigating from folder to folder in the view takes about 5 seconds (on an Athlon X2 6000+ / Raptor HD system).

What I’d like to see is for Jaikoz to apply some level of fuzzy matching between the data retrieved from MusicBrainz and the existing MP3 tags, and/or the folder hierarchy and file name where the file is located. I’d like it to use the following update logic:

If the existing ID3 tags “fuzzy match” the MusicBrainz data:
Update missing ID3 tags with MusicBrainz data and leave existing tags alone.
Or, if the filename/folder structure “fuzzy matches” the MusicBrainz data but the ID3 tags do not, update all ID3 tags to match what MusicBrainz has.

If the MusicBrainz data does not fuzzy match the ID3 tags, yet the audio fingerprints are the same, then flag the file so that its ID3 tags and directory location can be displayed for quick bulk review and a decision to apply or discard the MusicBrainz tags.
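To make the three cases above concrete, here is a minimal sketch in Python. It is only an illustration of the requested logic, not anything Jaikoz does: the function names, the dictionary layout, and the 0.8 threshold are all assumptions, and the similarity measure is the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off for "similar enough"; not a Jaikoz setting.
THRESHOLD = 0.8


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def fuzzy_match(existing, candidate, fields=("artist", "album", "title")):
    """True if every field filled in on both sides is similar enough."""
    shared = [f for f in fields if existing.get(f) and candidate.get(f)]
    if not shared:
        return False  # nothing to compare, so no evidence of a match
    return all(similarity(existing[f], candidate[f]) >= THRESHOLD for f in shared)


def decide(tags, path_info, mb):
    """Choose an action for one file.

    tags      -- existing ID3 tags
    path_info -- fields guessed from the folder hierarchy and file name
    mb        -- data retrieved from MusicBrainz
    """
    if fuzzy_match(tags, mb):
        return "fill_missing"      # keep existing tags, fill only the gaps
    if fuzzy_match(path_info, mb):
        return "overwrite_all"     # trust MusicBrainz over the ID3 tags
    return "flag_for_review"       # leave the file for a bulk manual decision
```

For example, a file whose tags already say "the beatles" / "Abbey Road" / "Come Together" would come back as `fill_missing` against the matching MusicBrainz record, while a file with junk tags in a correctly named folder would come back as `overwrite_all`.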

Hi, I think you can do most of what you want with the existing version of Jaikoz, but like any program it can take a while to fully understand. The algorithm you describe above could be useful, but the difficulty is that everybody wants a slightly different algorithm, so the idea with Jaikoz is to break the algorithm down into tasks, allowing you to modify how it works to suit your requirements.

The first thing to note is that the Acoustic Ids created by MusicIP are ALWAYS correct. Whether or not they match to any (or to incorrect) MusicBrainz tags, the Acoustic Id for a file will not change. So, with the number of files you have, I would save the changes to your files immediately after getting the Acoustic Ids, so you don’t have to repeat this step. I am going to make this an option in a future version of Jaikoz.

In Action Settings/MusicBrainz Settings/Format you can specify for each field whether it should be overwritten always, overwritten if empty or never populated. The default setting is Overwritten Always, but I think you want Overwritten if Empty.
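As a rough illustration of what the three per-field options mean, here is a small Python sketch. It is not how Jaikoz stores or applies these settings; the constant names, the `merge_tags` function, and the dictionary layout are all made up for the example.

```python
# Hypothetical names for the three per-field modes described above.
ALWAYS, IF_EMPTY, NEVER = "always", "if_empty", "never"


def merge_tags(existing, retrieved, policy, default=IF_EMPTY):
    """Return a copy of the existing tags updated according to a per-field policy."""
    merged = dict(existing)
    for field, new_value in retrieved.items():
        if not new_value:
            continue  # nothing was retrieved for this field
        mode = policy.get(field, default)
        if mode == ALWAYS or (mode == IF_EMPTY and not existing.get(field)):
            merged[field] = new_value
        # NEVER: the field is never populated from the retrieved data
    return merged
```

With album and genre set to `IF_EMPTY`, a file that already has an album title keeps it, but an empty genre is filled in from the retrieved data.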

There is already a fuzzy match against the MusicBrainz data, and you can set how similar the match must be using Action Settings/MusicBrainz Settings/AutoMatch/Minimum Rating required if AutoMatch only. This doesn’t take the folder or filename into account, but if these contain better data than the tags you can run Action/File and Folder Correct/Correct Tags from Filename to extract info from the filename into the tags before you run the MusicBrainz correct.

However, Jaikoz doesn’t apply the fuzzy match when it has an Acoustic Id match, because in my experience the Acoustic Id match always seems to identify (approximately) the correct track. It may not always get the album you want, but it would not incorrectly identify the title. If you are seeing different results, it would be straightforward for me to change Jaikoz to apply a fuzzy match to records that were matched by Acoustic Id as well.

You say MusicBrainz correctly identifies 95%. Does that mean it incorrectly identifies 5%, or that it doesn’t identify 5%? If it is the latter, it would be easy to identify the files that require more work by simply sorting or filtering records that have no MusicBrainzId or MusicIPId. If it is the former, I would be interested if you could give me some examples of where the match was poor. You might find the poor matches are where the Acoustic Id did not match and MusicBrainz had to match by metadata alone; you can disable matching of this sort by enabling Action Settings/MusicBrainz Settings/AutoMatch/Do not match if unable to find Acoustic Id Match.
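Outside Jaikoz, the "filter records that have no id" idea can be sketched in a few lines of Python. The record layout and the key names `musicbrainz_id` and `musicip_id` are assumptions for the example, not actual Jaikoz or ID3 field names.

```python
def needs_more_work(records):
    """Return the records that are missing either identifier."""
    return [
        r for r in records
        if not r.get("musicbrainz_id") or not r.get("musicip_id")
    ]


# Made-up example data: the second file was fingerprinted but never matched.
library = [
    {"file": "a.mp3", "musicip_id": "puid-1", "musicbrainz_id": "mbid-1"},
    {"file": "b.mp3", "musicip_id": "puid-2", "musicbrainz_id": ""},
]
unmatched = needs_more_work(library)
```

Here `unmatched` contains only the entry for `b.mp3`.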

Paul, thank you for your detailed response.

OK now I understand that the MusicIP values are calculated and not the result of a matching exercise. So the MusicIP values are just fine but the MusicBrainz database matches are not always fine.

As you have noted, it almost always gets the track right, but it may get the album wrong. This is what I’m calling the 95% right / 5% wrong situation. A common scenario is to have most tracks on an album tagged correctly, but one or two tagged with the name of a “Greatest Hits” album rather than the original album. This happens with a lot of my Beatles MP3s, for example. I’ll switch to “Overwritten if empty”; that should fix that problem. If I can get new album art based on accurate album titles, I’m happy.

By the way, about the 12-hour process to download 14K MusicIP tags: I did save the files immediately after getting the MusicIP tags, so I’m in good shape there. I noted that the process utilized both my CPUs at approximately 50% for the entire time. Does it deliberately throttle CPU usage, or is it I/O bound?

[quote=dschwarz]As you have noted, it almost always gets the track right, but it may get the album wrong. but one or two are tagged with the name of the “Greatest Hits” album rather than the original album. Happens with a lot of my Beatles MP3s for example. I’ll switch to “Overwritten if empty”, that should fix that problem.
[/quote]
The problem with popular artists such as the Beatles is that exactly the same track may have appeared on many albums, but the MusicIP Id may only be linked to some of these albums; usually there is a link to the original album. If it links to multiple albums, Jaikoz will use the existing metadata to find the best match, so if the metadata contains the original album name it should select that one. But there may NOT be a link from the MusicIP Id to the original album, in which case another will be selected. The whole issue of album matching is a complex one, and I will be adding some new options to refine this aspect of matching.
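The idea of using existing metadata to choose among several linked albums can be sketched like this in Python. This is not Jaikoz’s actual matching code: `pick_album` is a hypothetical name, and `difflib.SequenceMatcher` is just a convenient standard-library stand-in for whatever similarity measure is really used.

```python
from difflib import SequenceMatcher


def pick_album(existing_album, linked_albums):
    """Pick the linked album title closest to the album already in the tag.

    linked_albums stands for the albums the acoustic id is linked to;
    it may or may not include the original album.
    """
    if not linked_albums:
        return None
    if not existing_album:
        return linked_albums[0]  # no metadata to guide the choice
    return max(
        linked_albums,
        key=lambda title: SequenceMatcher(
            None, existing_album.lower(), title.lower()
        ).ratio(),
    )
```

If the original album is among the linked albums and the tag already names it, it wins. If it is not linked at all, some other album is necessarily returned, which is the “Greatest Hits” situation described above.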

There is no throttling. Actually, the MusicIP tag generation doesn’t utilise multiple CPUs (at the moment); probably one CPU runs the main Jaikoz program and one CPU runs the Genpuid program (which is used to generate the ids). Generating Acoustic Ids is CPU intensive, but then the analysis is sent to the MusicIP server, and it has to wait for the results. So depending on what is happening, it can be CPU bound or I/O bound.