SongKong and Jaikoz Music Tagger Community Forum

Help tidying up Acoustid Database

SongKong and Jaikoz make extensive use of Acoustid to audio fingerprint songs. If the fingerprint is already in the Acoustid database we can then get metadata for the song. Hopefully this will include a link to a MusicBrainz recording id, and then we can make use of the data provided by MusicBrainz. Otherwise Acoustid only provides us with basic artist, title, and album names.

Acoustid/MusicBrainz pairings are submitted to Acoustid by users via various programs, and each time a particular pairing is submitted its Sources count is increased. These programs include SongKong and Jaikoz, but most submissions come from the dedicated fingerprinter tool provided by Acoustid.

Usually an Acoustid links to one MusicBrainz recording, but sometimes it can link to multiple MusicBrainz recordings. These may be different MusicBrainz Recordings that are actually the same song or they may be for a completely different song.

Sometimes applications submit incorrect pairings. This is fairly obvious when you look at an Acoustid page, because the valid links will have similar titles and a high source count, whereas the invalid ones will match a completely different title and have a source count of one.

For example in this screenshot below it is clear that Plasticine by Placebo is the correct track with 1083 sources, whereas Running up That Hill and Something Rotten only have 1 source, and they have already been disabled.

But it is difficult to track down these invalid links.

So I have now created a series of Albunack Acoustid Reports that find the links most likely to be bad.

The report shows cases where an Acoustid is linked to multiple MusicBrainz recordings with significantly different names, and one of the links only has 1 source.
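
As a rough illustration of the kind of filter described above (this is not the actual Albunack report code), the sketch below flags links with a single source whose name differs a lot from the best-sourced link on the same Acoustid. The `Link` type, the `simplify` helper, the use of `difflib` to compare names and the 0.5 similarity threshold are all assumptions made for this example. The titles and source counts are taken from the Placebo example above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Link:
    """One Acoustid to MusicBrainz recording pairing (hypothetical structure)."""
    title: str
    sources: int


def simplify(name: str) -> str:
    """Lower-case the name and keep only letters, digits and single spaces (assumed rule)."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def potential_bad_links(links: list[Link], threshold: float = 0.5) -> list[Link]:
    """Return single-source links whose name differs a lot from the best-sourced link."""
    if len(links) < 2:
        return []
    good = max(links, key=lambda link: link.sources)
    flagged = []
    for link in links:
        # Only links submitted exactly once are candidates for being bad.
        if link is good or link.sources != 1:
            continue
        similarity = SequenceMatcher(
            None, simplify(good.title), simplify(link.title)
        ).ratio()
        if similarity < threshold:
            flagged.append(link)
    return flagged


links = [
    Link("Plasticine", 1083),
    Link("Running up That Hill", 1),
    Link("Something Rotten", 1),
]
for link in potential_bad_links(links):
    print(link.title)
```

With the assumed threshold, both single-source links are flagged as potential bad matches. A flagged link still needs checking by hand before it is disabled.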

It shows both a good match and a potential bad match for these Acoustids in the form Artist Credit/Simplified Recording Name, sorted alphabetically by the Good Match column. So if you have an interest in a particular artist, all the Acoustids linked to potentially bad MusicBrainz recordings can be found together in the report.

But these potential bad links need checking manually before they are removed from the database. It would be great if everyone could pick a couple of favourite artists and go through the report removing the bad links. That way, in no time at all, we could improve the Acoustid database and as a result improve SongKong and Jaikoz results.

The procedure is as follows:

  1. Create a free account on Acoustid - easiest to use your MusicBrainz account if you have one.
  2. Find your artist in the report.
  3. For each row in the report where Good Artist starts with your chosen artist, compare Good Artist with Bad Artist.
  4. If they look significantly different (different artist and/or song name), click on the link.
  5. Find the bad link. If it is still enabled and has Sources set to 1, select Disable. If Sources is more than 1 then it has been submitted more than once, so it may be valid and should probably be left untouched.
  6. The comment can be left blank.
  7. Select Submit.
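
The rule in step 5 can be written down as a tiny function, just to make the three outcomes explicit. This is only a sketch of the manual decision: the function name and the returned strings are made up for illustration and are not part of Acoustid.

```python
def action_for_link(enabled: bool, sources: int) -> str:
    """Suggest what to do with a link that looks wrong, following step 5."""
    if not enabled:
        # Someone has already disabled this link.
        return "skip"
    if sources == 1:
        # Submitted only once and looks wrong: disable it.
        return "disable"
    # Submitted more than once, so it may be valid.
    return "leave"


print(action_for_link(enabled=True, sources=1))
print(action_for_link(enabled=True, sources=3))
print(action_for_link(enabled=False, sources=1))
```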

Note: if, having checked a few matches for an artist, you find the bad matches have already been disabled, then it is likely someone else has already fixed the songs for this artist, and it would be best to look at a different artist.

e.g. using the screenshot above, Cliffs of Dover by Eric Johnson is clearly a different song to Tribute to Jerry Reed. So I click on the Disable button next to the Tribute to Jerry Reed link

and then enter a comment, and submit

Now shown as disabled with a strike through line

We have many potential bad links in this series of reports. It clearly is not possible for me to do this myself as it would take years, but together I think we could make a big dent in this quite quickly. If you have 30 minutes to spare to fix a few artists that would be fantastic.

Please note this change is not fixing the Albunack database, it is fixing the live Acoustid database, so it will benefit all applications that use Acoustid, not just SongKong and Jaikoz users.

We have now broken the report into some smaller reports, please see http://www.albunack.net/reports.jsp for the full list.

In the first report it is dead easy to identify the bad match most of the time. It only shows cases where more than one song by the same artist is linked to the same Acoustid, and one of those has only one source.

The chances of an artist having two different songs that both resolve to the same Acoustid are virtually zero, so at least one of the songs is going to be a bad match unless they are both actually the same song but with a different name. This can happen, most commonly with translations of a song name, but it is rare and usually obvious when this is the case.

So how do you decide which are the wrong pairing(s)? Usually it is the one with only one source, but if the song with the most sources has fewer than five sources it may be that one that is wrong.

Comparing the length of the fingerprints for this Acoustid to the length of the linked MusicBrainz recordings is very useful. If the difference is more than 10 seconds then the pairing is wrong (unless the length stored in MusicBrainz for the recording is actually wrong, which again is rare).
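
As a minimal sketch of this length check, assuming both lengths are available in seconds, the comparison could look like the code below. The function name and the sample lengths are hypothetical; only the 10 second tolerance comes from the rule above.

```python
def length_mismatch(
    fingerprint_seconds: float,
    recording_seconds: float,
    tolerance: float = 10.0,
) -> bool:
    """True when the two lengths differ by more than the tolerance."""
    return abs(fingerprint_seconds - recording_seconds) > tolerance


# Hypothetical lengths, for illustration only.
print(length_mismatch(201.0, 203.0))  # close enough, pairing is plausible
print(length_mismatch(201.0, 260.0))  # far apart, pairing is suspect
```

A mismatch is strong evidence rather than proof, since the length stored in MusicBrainz can occasionally be wrong.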

Looking at the user submitted data is also useful. If it matches one pairing but not the other, then the other is likely the wrong pairing.

Today I have processed nearly a whole page (1000 records) and found that in 90% of cases the errors are very obvious, in 5% a little bit of thought/pattern recognition was required but it was still quite easy, and in about 5% of cases it was unclear and I have left those alone.

e.g.

In this case we can clearly see Who’s Gonna Take Me Home (The Rise and Fall of a Budding Gigolo) is the bad match, and there is a lot of evidence to help us.

The combined number of sources for songs called Shake is 22, whereas the bad match only has 1
Fingerprints range from 2:33 to 2:39, whereas bad match is 4:40
Additional user submitted metadata clearly points towards Shake as the good matches
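
To put numbers on the length evidence, the times quoted above can be converted to seconds and compared. This is just worked arithmetic on the figures in this example; the helper name `to_seconds` is made up for the sketch.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'm:ss' string such as '2:33' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


shortest_fingerprint = to_seconds("2:33")  # 153 seconds
longest_fingerprint = to_seconds("2:39")   # 159 seconds
bad_match_length = to_seconds("4:40")      # 280 seconds

# Gap between the bad match and the nearest fingerprint length.
gap = bad_match_length - longest_fingerprint
print(gap)  # 121
```

A gap of 121 seconds is far beyond the 10 second difference mentioned earlier, so the length alone already marks this pairing as wrong.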

So as you can see, it is easy to determine the bad match in most cases.

Just updated the reports so they have latest Acoustid/MusicBrainz data

Good to see we are making steady progress. For example, the Multiple recordings for same artist with different name report has gone down from over 21,000 to just under 15,000 entries, so thanks to everyone who has helped with this.

Added the average fingerprint length (can span 7 seconds) and the Mb Recording lengths to the Acoustid reports.

I have been going through the first report, which shows Acoustids linked to multiple songs with different names by the same artist. This is clearly wrong in almost all cases. However, when there is little difference in track lengths and number of sources between the good and bad match, it is not always so clear which is the bad match.

So I have now split the first report into

Multiple songs by same artist, song length doesn’t match fingerprint length

Multiple songs by same artist, song length matches fingerprint length

The first report has the easiest cases, the second report has potentially more difficult cases

Glad to report that the bulk of this has now been tidied up, so this should help improve matching by preventing bad Acoustid matches.