Good news: I have been working on a new task, Correct Metadata from Discogs, that matches songs to Discogs releases. Unlike Match Songs to One Discogs Release, introduced in the previous release, which handles just one release at a time, you can run it over all your releases at once. Additionally, the algorithm goes a step further than Match Songs to One Discogs Release: it calculates both a track-level and a release-level score, very much as outlined in http://www.jthink.net/jaikozforum/posts/list/1136.page, whereas Match Songs to One Discogs Release only calculates a track-level score. The track-level score works out which release best matches the tracks. The release-level score considers properties of the release itself, such as whether it was released in the user's preferred country or on the user's preferred media, and whether it was the earliest version of that release.
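As a rough illustration of how the two levels of scoring could fit together, here is a minimal Python sketch. It is not the actual Correct Metadata from Discogs code. The Song and Release fields, the 0.7/0.3 title/length weights, the 5-second length tolerance, the release-property weights and the shortlist tolerance are all hypothetical, and songs are assumed to be listed in track order. The track-level score shortlists the releases whose tracklists match best, then the release-level score (preferred country, preferred media, earliest year within the shortlist) picks between them.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Song:
    """A local file, assumed to be listed in track order (hypothetical fields)."""
    title: str
    length_secs: int


@dataclass
class Release:
    """A candidate Discogs release, heavily simplified (hypothetical fields)."""
    release_id: int
    track_titles: list[str]
    track_lengths: list[int]
    country: str
    media: str
    year: int


def track_score(songs, release):
    """Average per-track similarity (0..1) between the songs and the release's tracklist."""
    if not songs or len(songs) != len(release.track_titles):
        return 0.0
    total = 0.0
    for song, title, length in zip(songs, release.track_titles, release.track_lengths):
        title_sim = SequenceMatcher(None, song.title.lower(), title.lower()).ratio()
        length_ok = 1.0 if abs(song.length_secs - length) <= 5 else 0.0
        total += 0.7 * title_sim + 0.3 * length_ok
    return total / len(songs)


def release_score(release, earliest_year, preferred_country, preferred_media):
    """Score (0..1) for properties of the release itself, ignoring its tracks."""
    score = 0.0
    if release.country == preferred_country:
        score += 0.4
    if release.media == preferred_media:
        score += 0.3
    if release.year == earliest_year:
        score += 0.3
    return score


def best_release(songs, candidates, preferred_country, preferred_media, tolerance=0.05):
    """Shortlist releases by track score, then choose among them by release score."""
    scored = [(track_score(songs, r), r) for r in candidates]
    if not scored:
        return None
    top = max(s for s, _ in scored)
    if top == 0.0:
        return None
    shortlist = [(s, r) for s, r in scored if s >= top - tolerance]
    earliest = min(r.year for _, r in shortlist)
    _, best = max(
        shortlist,
        key=lambda sr: (release_score(sr[1], earliest, preferred_country, preferred_media), sr[0]),
    )
    return best


songs = [Song("Intro", 60), Song("Song Two", 200)]
candidates = [
    Release(1, ["Intro", "Song Two"], [61, 199], "UK", "CD", 1995),
    Release(2, ["Intro", "Song Two"], [60, 200], "US", "Vinyl", 1997),
    Release(3, ["Other", "Thing"], [60, 200], "UK", "CD", 1990),
]
print(best_release(songs, candidates, "UK", "CD").release_id)  # 1
```

Releases 1 and 2 have identical tracklists, so the user's preferred country and media decide between them. Release 3 never makes the shortlist, however well it scores at release level, because its tracklist does not match.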
After getting some feedback from you guys, I expect to change ‘Correct Metadata from Musicbrainz’ to use a similar algorithm.
My one problem is how to deal with the ‘Fix Song by Song where Possible’ option in the Autocorrecter. Correct Metadata from Discogs does not need every song to be processed before it saves changes to some of them; it only needs to process all the songs that appear to be from the same album, based on their album metadata or subfolder. But if another task such as ‘Correct from Musicbrainz’ runs before it and we don't wait for that task to finish completely, ‘Correct Metadata from Discogs’ will give differing results, because Correct from Musicbrainz could affect the album field and hence which songs are considered potential members of the same group.
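To make the grouping problem concrete, here is a minimal Python sketch of one possible grouping rule: songs with album metadata are grouped by it (ignoring case and surrounding whitespace), and songs without it fall back to their subfolder. The rule and the SongFile type are assumptions for illustration, not Jaikoz's actual logic. The sketch shows how rewriting the album field of only some songs, as a still-running earlier task might, splits what was one group into two.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SongFile:
    path: str
    album: str = ""  # album metadata; may be empty


def group_key(song):
    """Hypothetical rule: group by album metadata if present, else by subfolder."""
    album = song.album.strip().lower()
    if album:
        return ("album", album)
    return ("folder", str(PurePosixPath(song.path).parent))


def group_songs(songs):
    """Map each group key to the songs that would be treated as one album."""
    groups = defaultdict(list)
    for song in songs:
        groups[group_key(song)].append(song)
    return dict(groups)


songs = [
    SongFile("/music/a/01.mp3", "Blue"),
    SongFile("/music/a/02.mp3", "Blue"),
    SongFile("/music/b/01.mp3"),
]
before = group_songs(songs)

# An earlier task has so far rewritten the album field of only the first song.
partly_corrected = [replace(songs[0], album="Blue (Remastered)")] + songs[1:]
after = group_songs(partly_corrected)
```

Before the partial correction there are two groups. Afterwards the two ‘Blue’ songs land in different groups, so a Discogs match attempted at that moment would only see part of the album.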
If we dropped the Fix Song by Song where Possible option, then a user with a simple Autocorrecter task list like:
Retrieve Acoustic Ids
Correct Metadata from Musicbrainz
Correct Metadata from Discogs
would have to wait for all songs to be analysed before even starting the Correct Metadata from Musicbrainz stage. This isn't much of a problem when running unattended, but it is when working interactively.
If we kept it, then a second run of the Autocorrecter would make a lot more fixes at the Correct Metadata from Discogs stage than the first run, because only on the second run would it benefit from Musicbrainz data for all the songs.
The best solution I have come up with so far is to change ‘Fix Song by Song’ to ‘Fix Album by Album’, which might work as follows:
Retrieve Acoustic Ids fixes all songs in the first subfolder encountered, plus any songs with the same album metadata value as that subfolder; these songs are then passed to Correct Metadata from Musicbrainz, and once they have been processed they are passed to Correct Metadata from Discogs.
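The album-by-album flow can be sketched in Python as follows. This is a hypothetical illustration, not the Autocorrecter's implementation: the Song type, next_album_group, fix_album_by_album and the make_task stand-ins are made-up names, and a real task would change the metadata rather than just log what it saw. The key property is that every task runs over a whole album group before the next group starts, so Correct Metadata from Discogs always sees the Musicbrainz results for every song in that group.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Song:
    path: str
    album: str = ""


def next_album_group(remaining):
    """Songs in the first subfolder encountered, plus any songs sharing an album value with them."""
    first_folder = PurePosixPath(remaining[0].path).parent
    in_folder = [s for s in remaining if PurePosixPath(s.path).parent == first_folder]
    albums = {s.album for s in in_folder if s.album}
    return [
        s for s in remaining
        if PurePosixPath(s.path).parent == first_folder or (s.album and s.album in albums)
    ]


def fix_album_by_album(songs, tasks):
    """Run every task over one album group before moving on to the next group."""
    remaining = list(songs)
    fixed = []
    while remaining:
        group = next_album_group(remaining)
        remaining = [s for s in remaining if s not in group]
        for task in tasks:
            group = task(group)  # the whole group finishes this stage before the next starts
        fixed.extend(group)
    return fixed


def make_task(name, log):
    """Hypothetical stand-in for an Autocorrecter task that only records what it saw."""
    def task(group):
        log.append((name, len(group)))
        return group
    return task


log = []
tasks = [make_task(n, log) for n in ("acoustic_ids", "musicbrainz", "discogs")]
songs = [Song("/m/a/1.mp3"), Song("/m/a/2.mp3"), Song("/m/b/1.mp3", "X"), Song("/m/c/1.mp3", "X")]
result = fix_album_by_album(songs, tasks)
```

Here the two songs in /m/a form the first group. The song in /m/b pulls in /m/c/1.mp3 because both carry the album value X, so those two are processed together as the second group.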
Of course, if you are processing a load of files dumped into a single folder then the subfolder concept might not work so well, but then again you probably need to consider all the files together to fix them properly.
Comments please
Threading Musings (multiple CPUs):
When used with the Autocorrecter, different subfolders/albums will run on different threads, so whilst album1 is being processed on one thread, album2 will be processed on another. Because albums will be of different lengths, the threads should end up at different stages, e.g. album1 might be at Correct Metadata from Musicbrainz while album2 is still on Retrieve Acoustic Ids. This is useful because some tasks, such as Correct from Musicbrainz, are limited by the web service terms of service to one query per second, so multiple CPUs don't speed those tasks up much; staggering the albums lets the other threads do useful work in the meantime.
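A minimal Python sketch of the per-album threading idea, under stated assumptions: each album is submitted to a ThreadPoolExecutor worker, and all workers share one RateLimiter, so the rate-limited stage never sends more than one query per interval overall (1 second by default, matching the one-query-per-second limit). RateLimiter, process_album and run_albums are hypothetical names, and both stages are stand-ins rather than real acoustic-id or Musicbrainz calls. In CPython the global interpreter lock stops pure-Python CPU-bound work from running in parallel across threads, so this shows the structure rather than a real speed-up for the local stage.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Lets at most one call through per `interval` seconds, shared by all threads."""

    def __init__(self, interval):
        self.interval = interval
        self.granted_times = []  # when each call was let through, for inspection
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        with self._lock:  # callers queue here, so queries are serialised
            delay = self._next_allowed - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            now = time.monotonic()
            self._next_allowed = now + self.interval
            self.granted_times.append(now)


def process_album(name, songs, limiter):
    """One album's pass through the stages, on whichever worker thread picked it up."""
    # Stage 1 stand-in (e.g. Retrieve Acoustic Ids): local work, no rate limit.
    fingerprints = [hash(song) for song in songs]
    # Stage 2 stand-in (e.g. Correct from Musicbrainz): one rate-limited query per song.
    for _ in fingerprints:
        limiter.wait()
    return name, len(songs)


def run_albums(albums, interval=1.0, workers=4):
    """Process each album on its own worker; all workers share one rate limiter."""
    limiter = RateLimiter(interval)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_album, name, songs, limiter)
                   for name, songs in albums.items()]
        results = dict(f.result() for f in futures)
    return results, limiter.granted_times


# A short interval is used here only to keep the demo quick.
results, times = run_albums({"album1": ["a", "b", "c"], "album2": ["d", "e"]}, interval=0.05)
```

However many worker threads there are, the queries from both albums come out spaced at least one interval apart, while each worker remains free to run its own unlimited stages when it is not queued behind the limiter.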
When used standalone, threading can again work at album level, although if you are processing only one album then that is going to be single-threaded, which isn't the best use of resources, so this may need to be a special case.