SongKong and Jaikoz Music Tagger Community Forum

Improved matching algorithm for Jaikoz

[quote=paultaylor]
No, these options are NOT acceptable. What the customer wants is to select ‘Prefer Original releases …’ and for Jaikoz to do just that. [/quote]
OK, but if I am understanding the requirements correctly, the tracks that he wants to associate to the Original Release could initially contain meta data that are from a completely different release. Is that right?

In which case, the existing meta data fields (TrackNo, AlbumName, etc.) are going to mislead the matching algorithm rather than help it match to the desired “original” release. So I would argue that this use case requires a different scoring algorithm than the use case I discussed previously. Here I believe that TrackNo, AlbumName, etc. should be completely excluded from the matching score, since they are irrelevant to the choice of the correct release. Do you see my point?

He wants to associate all tracks with the original release, so some/most could have metadata that is for the correct release and some won’t; we don’t know ahead of time.

I mean, even if you don’t want to match the earliest release, the existing metadata in any file could help or hinder the process.

I’ve renamed this discussion and made it sticky because it’s important that others chime in before I start on this.

[quote=paultaylor]He wants to associate all tracks with the original release, so some/most could have metadata that is for the correct release and some won’t; we don’t know ahead of time.

I mean, even if you don’t want to match the earliest release, the existing metadata in any file could help or hinder the process. [/quote]

No, I actually think some of the meta-data is less helpful in his case than my case. In my case I am taking tracks that have been ripped from an Album to contain a subset of meta data tags and then am trying to match those tags against a corresponding release in MB. If someone has submitted that Album/Release to MB in the past then there’s a good chance that my meta data will match best with that specific Release in MB rather than any others that might exist.
In his case, he is taking a set of tracks that have been ripped from various Albums. The meta data inside is going to match best with a whole plethora of different MB Releases. He doesn’t want that to happen, so he needs a different algorithm than I do.

Since he doesn’t want the tracks to be matched to the MB releases that they actually came from, the TrackNo, AlbumName and AlbumArtist meta data will almost certainly hinder (unless they happen to match by sheer chance). But luckily it sounds like it might not even be necessary to use those meta data fields in order to find the Original Release. I would need to clarify something first: for a given Track in MB, is there a lookup function that would allow you to traverse to one and only one Original Release?

If this constraint generally holds true then I would propose that his algorithm does something like this:

  1. For each Track in the collection, match the local track to the MB track that gives the highest track-level score, with TrackNo, AlbumName and AlbumArtist excluded from the scoring (i.e. 0 points). Points awarded for matching other release-independent fields (e.g. MusicIP ID) would be beneficial and help ensure that we match to the correct track.
  2. Use the MusicBrainz API to look up the Original MB Release for each MB Track and map the local track to the Original MB Release.
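
Step 1 above could be sketched roughly like this. It is only an illustration: the field names and weights are made up for the example and are not Jaikoz's real scoring values. The one thing it is meant to show is that the release-dependent fields contribute 0 points. Step 2 is left out because it depends on a lookup that may not exist in MB.

```python
# Hypothetical weights, chosen only for illustration (not Jaikoz's real values).
# Release-dependent fields are listed explicitly with 0 points.
FIELD_WEIGHTS = {
    "title": 40,
    "artist": 30,
    "musicip_id": 20,
    "duration": 10,
    "track_no": 0,
    "album": 0,
    "album_artist": 0,
}


def track_score(local, candidate):
    """Sum the weights of all fields that are present locally and match exactly."""
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        value = local.get(field)
        if value is not None and value == candidate.get(field):
            score += weight
    return score


def best_track(local, candidates):
    """Return the candidate with the highest track-level score, or None if empty."""
    return max(candidates, key=lambda c: track_score(local, c), default=None)
```

With this weighting, a candidate on a different album with the right title and artist beats a candidate that only shares the album name and track number.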

By the way, here’s a query in MB to find all the Tracks called “Love Me Tender” where the Artist is “Elvis Presley”
http://musicbrainz.org/search/textsearch.html?query="Love+Me+Tender"+AND+artist%3A"Elvis+Presley"&type=track&limit=25&adv=on&handlearguments=1

It looks to me like MB treats each “Love Me Tender” track as a separate Track object because they can each have a different TrackNo and Duration. Perhaps they can even be in a different language?!

How do you tell which one is the Original Release? Is there some way of finding the earliest release that contains a Track with a matching TrackName from the same Artist?

I would say this one, http://musicbrainz.org/release/8af66f2d-f867-4805-861d-e280c17f75a7.html, because it’s not a compilation and it’s the release with the earliest release date. But yes, it is difficult and not clear-cut.

@mjw

We have two different approaches. You collect albums, and they have to be consistent. I collect only songs; I’m not so interested in albums, but I am interested in the first release date of a specific song, because I like to listen to music from a specific time interval (or genre, mood, etc.).

It certainly does seem difficult. I found this page that describes the current MB search syntax
http://musicbrainz.org/doc/TextSearchSyntax

but ideally you would want a much richer syntax for finding the earliest release. In SQL you would need to use expressions similar to this:

SELECT * FROM Track
WHERE Track.Name = 'Love Me Tender'
AND Track.Artist = 'Elvis Presley'
AND Track.Type != 'Compilation'
ORDER BY (SELECT Date FROM Release WHERE Release.ReleaseId = Track.ReleaseId)

I suppose that complex queries like this are probably not possible via the current MB API. Presumably you have to make multiple queries and then refine the results on the client side.
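
The client-side refinement might look something like the sketch below. It assumes the search results have already been fetched and flattened into dicts; the keys `release_type` and `date` are invented for the example and are not the real MB response format. Dates are assumed to be ISO-style strings ("1956" or "1956-10-05"), so plain string comparison orders them.

```python
def earliest_original(results):
    """Drop compilations and undated entries, then return the earliest result."""
    originals = [r for r in results if r.get("release_type") != "Compilation"]
    dated = [r for r in originals if r.get("date")]
    # ISO-style date strings sort chronologically as plain strings.
    return min(dated, key=lambda r: r["date"], default=None)


# Made-up sample data, only to show the call.
sample = [
    {"release": "Release A", "release_type": "Compilation", "date": "1950"},
    {"release": "Release B", "release_type": "Album", "date": "1962-03"},
    {"release": "Release C", "release_type": "Single", "date": "1960"},
    {"release": "Release D", "release_type": "Album", "date": None},
]
print(earliest_original(sample)["release"])
```

Entries without a date are skipped here rather than guessed at, which seems safer than treating them as early or late.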

Apparently MB are in the midst of implementing a richer relational data model, and it sounds like the “Work” entity might be helpful in achieving Alfg’s objective, since it provides a link between all tracks that are variants of the same song.
http://wiki.musicbrainz.org/NextGenerationSchema

Sounds like the query language post NGS will still be a bit too primitive (uses Lucene) though:
http://wiki.musicbrainz.org/Next_Generation_Schema/SearchServerXML

[quote=Alfg]@mjw

We have two different approaches. You collect albums, and they have to be consistent. I collect only songs; I’m not so interested in albums, but I am interested in the first release date of a specific song, because I like to listen to music from a specific time interval (or genre, mood, etc.).[/quote]
Yes, I think I understand now and agree that two approaches are required.

No, probably not. In most cases the metadata is probably the same as yours and will come from original albums; it’s just that when it doesn’t, he still wants to map to the original album. The information that you are saying not to use might be required to get any match. If we can’t match the track on name (because it’s missing, misspelt, etc.) or PUID, then we fall back to the release.

Actually, I think I’ve misled you slightly. With the ‘Prefer original albums’ option I’m not necessarily trying to return the earliest release that the track was on, just a studio album or single rather than a compilation. If there are two releases with the same tracks, the earliest one would be preferred, but if they differ slightly and one better matches the user’s tracks, that would be the one picked. So this option only comes into play if the best release returned by your original algorithm is a compilation. This is why in the current version of Jaikoz this functionality is basically implemented by denying compilation releases 13% of the max score, but this method is too crude.

So we could actually use your algorithm, but if the top match is a compilation and the ‘Prefer original album’ option is set, disregard the compilation, as long as there is an original album with a decent ‘Track score’.
One trouble with this is representing it in manual match, where the highest score would be expected to be at the top; now that ‘Is original release’ isn’t part of the scoring, that might not be so. Perhaps in this particular case it could be recalculated by just discarding the release metadata from the score for the compilation matches only.
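
To make the selection rule concrete, here is a rough sketch. The candidate keys (`total_score`, `track_score`, `is_compilation`) and the threshold of 70 for a "decent" track score are assumptions for the example only; nothing in this thread fixes those values.

```python
def pick_release(candidates, prefer_original, min_track_score=70):
    """Pick the top-scoring release, skipping a compilation when allowed to.

    min_track_score is a hypothetical cut-off for a "decent" track score.
    """
    ranked = sorted(candidates, key=lambda c: c["total_score"], reverse=True)
    if not ranked:
        return None
    top = ranked[0]
    if prefer_original and top["is_compilation"]:
        # Take the best-ranked non-compilation whose track score is decent.
        for candidate in ranked[1:]:
            if (not candidate["is_compilation"]
                    and candidate["track_score"] >= min_track_score):
                return candidate
    # Otherwise the top match stands, even if it is a compilation.
    return top
```

Note that the compilation is only set aside when a good enough original exists, so a track that only matches a compilation still gets that match.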

No, this doesn’t exist. What we might be able to do in the next NGS release, though, is find all the releases that a particular track is on and sort them by date.

I’m talking about the recording entity in NGS. In current MusicBrainz a track is only associated with one album, but in NGS a recording links to all releases it is found on. Work is too high-level, as all versions and remixes of the same song would link to one work, but we want to match on the correct recording.

[quote=paultaylor]Perhaps in this particular case it could be recalculated by just discarding the release metadata from the score for the compilation matches only.
[/quote]
Another idea: it might make sense in the Manual Correct popup window to leave the scores as-is, but instead have a checkbox to hide/display tracks from compilation releases. If “Prefer Original” is enabled, then by default the compilation matches would be hidden, but a user could bring them back by disabling the “Hide Compilation Releases” checkbox.

That sounds like it would be exactly what Alfg is looking for.

Yes, makes sense. I was just thinking that you might have trouble finding the earliest Recording if the Duration of Alfg’s MP3 file doesn’t match that of the earliest Recording. It probably requires some experimentation when NGS comes out.

A question about MB querying - Do you know if it’s possible in NGS to do queries like:
“Track=‘Love Me Tender’ AND Duration > 2:43 AND Duration < 2:50”

[quote=mjw][quote=paultaylor]Perhaps in this particular case it could be recalculated by just discarding the release metadata from the score for the compilation matches only.
[/quote]
Another idea: it might make sense in the Manual Correct popup window to leave the scores as-is, but instead have a checkbox to hide/display tracks from compilation releases. If “Prefer Original” is enabled, then by default the compilation matches would be hidden, but a user could bring them back by disabling the “Hide Compilation Releases” checkbox.
[/quote]
Yes, that works better, although it needs finessing: if the only match is a compilation I would want to display it, for one because it would be better than nothing, and also because some tracks are only released on a compilation (usually as an incentive to buy the compilation even if you own the original albums).
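
The finessed checkbox behaviour could be expressed along these lines. This is a sketch only; `is_compilation` is an assumed key on each match and the function name is made up.

```python
def visible_matches(matches, hide_compilations):
    """Return the matches to show in the manual match list.

    Compilations are hidden only when at least one non-compilation
    match remains; otherwise everything is shown.
    """
    if not hide_compilations:
        return list(matches)
    originals = [m for m in matches if not m["is_compilation"]]
    return originals if originals else list(matches)
```

The scores themselves are untouched, so the list stays sorted by score whichever way the checkbox is set.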

You can already do this in the current system, but you have to convert the duration into 2-second segments (qdurs):

http://musicbrainz.org/ws/1/track/?type=xml&query=track:"Love%20Me%20Tender"%20AND%20qdur:[81%20TO%2085]
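
For anyone building these queries by hand, here is a small sketch of the conversion. It assumes a qdur is the duration in milliseconds divided by 2000 and rounded down, which is what the 81 to 85 range in the URL above implies for 2:43 to 2:50. The helper names are made up, and the endpoint is simply the one from the URL above, so it may not respond on other versions of the web service.

```python
from urllib.parse import quote


def to_qdur(minutes, seconds):
    """Convert a duration to a quantized duration (assumed: ms // 2000)."""
    return (minutes * 60 + seconds) * 1000 // 2000


def duration_query(title, shortest, longest):
    """Build a Lucene-style query for a title within a duration range.

    shortest and longest are (minutes, seconds) tuples.
    """
    return (f'track:"{title}" AND '
            f"qdur:[{to_qdur(*shortest)} TO {to_qdur(*longest)}]")


def search_url(query):
    """Percent-encode the query and append it to the ws/1 track search URL."""
    return "http://musicbrainz.org/ws/1/track/?type=xml&query=" + quote(query)


query = duration_query("Love Me Tender", (2, 43), (2, 50))
print(query)
print(search_url(query))
```

Because the range is inclusive and each qdur covers two seconds, the search is slightly wider than the exact 2:43 to 2:50 window, so results may still need a final duration check on the client.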

Paul wrote

Don’t hide a compilation; give it a different background color instead (maybe yellow).