SongKong and Jaikoz Music Tagger Community Forum

Reduce delay in MusicBrainz queries

Is there a manual override for MusicBrainz lookups, for those of us who run a mirror/local musicbrainz server?

No, but I should get this done as it has been requested before.

Been a few months… Any chance this is getting some attention?

Actually (as an afterthought), I suggest this should probably be a non-GUI-interface command line flag… It might help reduce defaulters from messing with the actual musicbrainz server unintentionally, and I presume that anyone with skill enough to run a local musicbrainz server would have the know-how to run jaikoz via the command line.

When you say local MB server, do you mean a virtual machine build?
I was going to make the change but unfortunately realised that with the virtual machines only lookups are done locally; search still uses the webservice on the live server.

So I can only improve the lookup queries, and I cannot just do that without some rather messy code changes.

Yes, I am referring to a VM build. I’m trying to find the reference to ‘search vs lookups’, but I’m not seeing anything in musicbrainz regarding any difference. (So maybe I’m just being thick). How are the two different?

According to MB, the VM server is supposed to be 100% stand-alone (depending on which databases you install). Again, this comes down to knowledge of running a VM server; and I can see you preferring to keep the software ‘safe’ from breaking terms of service.

Someone did make an interesting suggestion in the MB forums for Picard; why not have throttle overrides only for Private IP Addresses? Shouldn’t be much more difficult than giving direct control over the throttle itself… ?

Just an idea.
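
The private-address idea could be sketched roughly as below. This is only an illustration using Python's standard `ipaddress` module; `is_private_host` and `should_throttle` are hypothetical names, not Jaikoz or Picard code. Hostnames are not resolved, so anything that is not a literal private IP address is treated as public and stays throttled.

```python
import ipaddress
from urllib.parse import urlparse


def is_private_host(host):
    """Return True only if host is an IP literal in a private or loopback range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. "musicbrainz.org"): treat as public.
        return False
    return addr.is_private or addr.is_loopback


def should_throttle(server_url):
    """Hypothetical rule: throttle every server except private IP addresses."""
    host = urlparse(server_url).hostname or ""
    return not is_private_host(host)
```

With this rule a mirror at `http://192.168.1.10:5000/` would be unthrottled, while `http://musicbrainz.org/` would keep the limit.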

Lookups are done by a query to the MusicBrainz database.
Searches are done by a query to a Lucene index, which on the live server is built four times a day from the database.

The VM build doesn't contain a built Lucene index, so when you use the webservice in the VM it defers to the live server for a search but uses its local database for a lookup.

So if I changed Jaikoz to skip throttling when connecting to a local MusicBrainz VM, the problem would be that the VM would still connect to the live server for searches, and that would be out of my control.

Got it. And yes, we can install our own search servers.

http://forums.musicbrainz.org/viewtopic.php?id=2314

This comes down to a musicbrainz problem, not a Jaikoz one. (Including the search server with the virtual machine would show more forethought, on their part). In fact, I may just submit a new (empty) virtual machine with those very changes, once I get this up and running. Considering the last update was done 2 years ago, this might be a simplified global solution.

…So, I’m working my end of the problem. Can I convince you to try throwing this into your next beta?

Hmm, I worked out an easy way to identify whether a request is a lookup or a search, so I could make a change so that lookups do not have the one-second limit enforced when they are not going to musicbrainz.org or test.musicbrainz.org, though I am still a bit concerned about this being abused.

What I don't really want to do is remove the one-second limit for all queries in the hope that a customer's virtual machine does have a search server installed, because most don't.
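
A rough sketch of that rule, not the actual Jaikoz change. How Jaikoz tells a lookup from a search is not stated in this thread; the sketch assumes a search can be recognised by a `query` parameter in the URL, as in the MusicBrainz version 2 web service. `required_delay` and `LIVE_HOSTS` are hypothetical names.

```python
from urllib.parse import parse_qs, urlparse

# Hosts that always keep the one-second limit.
LIVE_HOSTS = {"musicbrainz.org", "test.musicbrainz.org"}
MIN_INTERVAL_SECONDS = 1.0


def is_search(url):
    """Assumption: a search request carries a 'query' parameter."""
    return "query" in parse_qs(urlparse(url).query)


def required_delay(url):
    """Return the minimum delay in seconds to enforce before sending url."""
    host = (urlparse(url).hostname or "").lower()
    if host in LIVE_HOSTS or host.endswith(".musicbrainz.org"):
        return MIN_INTERVAL_SECONDS
    if is_search(url):
        # A local VM may forward searches to the live server, so keep the limit.
        return MIN_INTERVAL_SECONDS
    return 0.0
```

Only lookups sent to a non-MusicBrainz host come back with a delay of zero; everything else keeps the one-second limit.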

The virtual machine has not been updated for 2 years because there hasn't been a new MusicBrainz release for 2 years, but there will be one soon: MusicBrainz NGS. I am pretty sure that there will be no more Jaikoz releases before MusicBrainz NGS is released, and I expect to make Jaikoz 4.0 NGS available within the next couple of weeks, which you can use against test.musicbrainz.org.

So if I were you I would look at NGS and the NGS Virtual Machine http://blog.musicbrainz.org/?p=807

I’ve been trying to work with NGS, but have been hitting some problems as far as using the ova file with VirtualBox. (Or any other emulator). I’ll try a previous version, but I’ve had nothing but errors with 20110221. I’d much rather boil-down my own version under Ubuntu, but I’ll get to re-learn perl module installations :stuck_out_tongue:

I agree with your assessment of potential abuse, but I truly believe that if the primary bases are covered, most people won’t abuse inadvertently.

  1. Impose perma-limits for the musicbrainz.org domains.
  2. Prevent casual-users from changing the limits (Command line option?)
  3. Error Dialogues for ‘refused’ queries and searches, warning against potential abuse.

…What more could be done?

I appreciate the efforts, and I look forward to trying Jaikoz 4.0 NGS!

I have another solution: add a Musicbrainz Search Server field to the preferences. For the public Musicbrainz releases this would be set the same as Musicbrainz Server (i.e. http://musicbrainz.org).

If you have installed the virtual MB build and done nothing else, then once again these preferences will both have the same value. But if you have built the search server and can provide a URL direct to the search server (because in NGS Musicbrainz just returns the results untouched), then this can be used as the basis of removing the limit for searches.
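
As a sketch of how the two preferences might drive the limit (an illustration only; `ServerPreferences` and the two functions are hypothetical, not Jaikoz settings): searches stay throttled unless the search server URL differs from the main server URL and does not point at a live MusicBrainz host.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

LIVE_HOSTS = {"musicbrainz.org", "test.musicbrainz.org"}
DEFAULT_SERVER = "http://musicbrainz.org"


@dataclass
class ServerPreferences:
    # Both fields default to the public server, as described in the post.
    server_url: str = DEFAULT_SERVER
    search_server_url: str = DEFAULT_SERVER


def is_live(url):
    host = (urlparse(url).hostname or "").lower()
    return host in LIVE_HOSTS


def lookups_throttled(prefs):
    return is_live(prefs.server_url)


def searches_throttled(prefs):
    # Same URL for both means no dedicated search server was configured,
    # so a VM would still forward searches to the live server.
    same = prefs.search_server_url.rstrip("/") == prefs.server_url.rstrip("/")
    return same or is_live(prefs.search_server_url)
```

A plain VM install (both fields set to the VM address) would get unthrottled lookups but throttled searches; only a separately configured search server lifts the search limit.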

Following through the thread here, keen to participate in any solutions you are cooking up (both on the Jaikoz and MB vm / NGS vm).

My preference is ubuntu for both desktop and server. I use vmware server (free) patched to run on the newer kernels. I currently have a fully operational MB vm slave without the search service.

Currently don’t have a lot of time to get my head round all the perl modules, additional products/libraries and porting needed to crowbar MB onto Ubuntu (went down that path once and ended up in a dark place), but am happy to test, beta test, QA, whatever. I can find my way around network diags if necessary too - useful to see what’s really going across the wire.

Let me know if I can help.

Cheers
Graf

Just touching on this…

Currently, NGS RC2 is (to my knowledge) still unavailable for VM testing due to a corrupted image. I, also, am not savvy enough with the perl modules to create a ‘custom’ server box without just batching them all together in some ugly mess, so I’m stuck waiting for the pre-fab in order to modify and test.

As of the 14th, there are still some problems with merging of data, and they’re (hopefully) going to have this sorted out by today or tomorrow. (Keep your eye out for RC3).

Most of the concerns I’m seeing relate to how data will be merged into NGS, and not its overall functionality, so I’m pretty confident we can nail this all down long before its official release. Still, there isn’t any hurry to get it done (the main point of the thread, that is) without a fully functioning NGS Jaikoz, so first things first, right?

Also another followup…

Any movement on image availability for you? I tried downloading recently and got errors.

Paul: any further thoughts on speed/search local MB server override?

Yeah, this will be looked at soon once things have settled down.

Posted an idea in another topic for a local DB search…

Have been developing in python (was thinking about perl originally but have since decided to learn python).

I have a local MySQL version of the whole MB database.

Currently assumes that files already have a PUID (MusicIP) id (sourced through Jaikoz).

Paul: had a thought that this code so far is basically what your “Do extra searches” option currently does?

#--------------------------------------
# Main
#--------------------------------------
import sys

import MySQLdb as mdb  # assumption: "mdb" in the original is the MySQLdb module

# ProcessCommandLine, ReadConfigFile, EnvironmentCheck, ScanForTargets, ReadTags,
# ExtractPUIDString, GetPossibleReleases and PrintReleases are defined elsewhere.
ProcessCommandLine()
ReadConfigFile()
EnvironmentCheck()

target_file_names = []

start_dir = "/Data/Music/00AA Dev"
print('Starting in Folder: %s' % start_dir)

# Fills target_file_names with the files found under start_dir
ScanForTargets(start_dir, target_file_names)

print('Found Files:')
for t in target_file_names:
    print(str(t))

print('Opening Database')
try:
    conn = mdb.connect('localhost', 'mbdb', 'mbdb', 'ngsdb')
    #print 'Connecting to Database'
    cursor = conn.cursor()

    for target_file_name in target_file_names:
        tag_set = ReadTags(target_file_name)
        puid_str = ExtractPUIDString(tag_set)
        print('%s has PUID: %s and is on: ' % (target_file_name, puid_str))
        if puid_str:
            possible_releases = GetPossibleReleases(puid_str)
            PrintReleases(possible_releases)
    print('Closing Database Connection')
    cursor.close()
    conn.close()

except mdb.Error as e:
    print("Error %d: %s" % (e.args[0], e.args[1]))
    sys.exit(1)

sys.exit(0)
#-------------------------

All the work gets done in: possible_releases = GetPossibleReleases(puid_str)

Which does a table walk to match every possible release that may have that song's PUID. The interesting thing is the non-precise match to PUID (that's where other metadata will need to come in).

Now starting on the hard part: analysis logic to work out the matches (exact, best, overmatch, missing song, duplicates, etc etc).

Paul, I have a funny feeling I’m starting to build Jaikoz :slight_smile:

Can you confirm the difference in logic? I think Jaikoz is matching best “first” or “earliest” release based on combo of meta data, PUID etc. Routines eg “match to single release” take the track count into account.

Where I am heading is to build up a full match landscape across all input files, then analyse that landscape to work out the most likely full albums (having regard to codec, quality and source folder grouping etc).

Effectively this would allow it to be pointed at any collection, no matter how ordered, and produce the best release gid map for that collection.

From there “update meta data from MB” and the rest of Jaikoz could sort out.

Is this making sense?

Not really, what are you asking of Jaikoz here ?

I’m not really asking anything specific of Jaikoz other than asking you to validate my logic for a tool to be used in conjunction with or as an extension to Jaikoz - if you have time and inclination.

By contrast:

  1. A basic tagger would look at the PUID (however generated) and in conjunction with other meta data already assigned to the file, try to match it to a track on a release.
  2. Picard (for example), extends this by flagging releases that get involved in an overall matching batch (per method 1 above) and then allows the user to manually match / move tracks to get the best match to a release. I.e quasi-album centric.
  3. Jaikoz is a hybrid, whereby target releases are identified with much more sophistication. It essentially uses method 1 with more smarts - eg, balance of meta strength, release type preference, country preference, and matching by source folder etc. to try to pin down the right release.

The failing with all approaches is the resolution of many to many matching - ie. an arbitrary list of files could potentially match multiple releases, either fully, in part, or over match.

I increasingly find that I have folders that incorrectly match a release, either having orphan files ("non album tracks") or not enough files (partially complete albums).

Therefore what my logic is attempting is to identify ALL possible releases for a given PUID through walking PUID->Recording->Track->Tracklist->Release (this is the point where the code I posted is up to). After then processing for a group of files (intent is to throw all files for a particular artist at it per batch), then by analysing the possible releases that have been matched (comparing track counts, TOC etc), to select the best match (or matches) of complete releases without orphans or partial albums.
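
The walk described above can be illustrated against a toy schema. The table and column names below are simplified stand-ins, not the real NGS schema (the tracklist step is folded into a column on the release table), and SQLite is used in place of MySQL only so the sketch runs without a server. `get_possible_releases` and `build_demo_db` are hypothetical helpers.

```python
import sqlite3

# Simplified stand-in tables: PUID -> recording -> track -> tracklist -> release.
SCHEMA = """
CREATE TABLE mb_recording_puid (puid TEXT, recording INTEGER);
CREATE TABLE mb_track (id INTEGER PRIMARY KEY, recording INTEGER, tracklist INTEGER);
CREATE TABLE mb_release (id INTEGER PRIMARY KEY, gid TEXT, name TEXT, tracklist INTEGER);
"""

WALK = """
SELECT DISTINCT r.gid, r.name
FROM mb_recording_puid rp
JOIN mb_track t ON t.recording = rp.recording
JOIN mb_release r ON r.tracklist = t.tracklist
WHERE rp.puid = ?
ORDER BY r.gid
"""


def get_possible_releases(conn, puid):
    """Return (gid, name) for every release containing a recording with this PUID."""
    return conn.execute(WALK, (puid,)).fetchall()


def build_demo_db():
    """Create an in-memory database with two releases sharing one recording."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO mb_recording_puid VALUES (?, ?)",
                     [("puid-1", 10), ("puid-2", 11)])
    conn.executemany("INSERT INTO mb_track VALUES (?, ?, ?)",
                     [(1, 10, 100), (2, 11, 100), (3, 10, 200)])
    conn.executemany("INSERT INTO mb_release VALUES (?, ?, ?, ?)",
                     [(1, "gid-a", "Album A", 100), (2, "gid-b", "Best Of", 200)])
    return conn
```

In the demo data, `puid-1` appears on both releases while `puid-2` appears only on the first, which is the many-to-many situation the analysis step has to resolve.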

This amount of database searching really requires local DB access (though it could conceivably be done through the web-service).

Greatest outcome would be for you to include such functionality in Jaikoz (though I’m not sure if such scope is in your road map?). So instead, am sharing this all with you as a collaboration. I’m not sure yet if what I am trying is really achievable in practice? If it is, then is it a concept or option worth bringing into / making available through Jaikoz?

Cheers

I think you don't fully appreciate how Jaikoz works. PUIDs are not central to Jaikoz:

  1. It groups files into possible releases.
  2. It finds potential releases for each grouping using PUIDs and/or metadata.
  3. It then scores every potential release it finds against the file grouping to find the best match.
  4. If it finds a good match, it uses it.
  5. If not, it tries track matching.
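
The actual Jaikoz scoring is not described in this thread. Purely to illustrate the "score every candidate, use it if good enough, otherwise fall back" shape of the steps above, here is a sketch that uses a simple set-overlap (Jaccard) score; the score, the threshold and the function names are all assumptions.

```python
def score_release(group_recordings, release_recordings):
    """Overlap between a file grouping and a release, from 0.0 to 1.0."""
    group = set(group_recordings)
    release = set(release_recordings)
    if not group or not release:
        return 0.0
    return len(group & release) / len(group | release)


def best_release(group_recordings, candidates, threshold=0.8):
    """Pick the best-scoring candidate release.

    candidates maps a release id to the recordings on that release.
    Returns (release_id, score), or None when nothing reaches the
    threshold, in which case the caller would fall back to track matching.
    """
    scored = [(score_release(group_recordings, recs), rid)
              for rid, recs in candidates.items()]
    if not scored:
        return None
    score, rid = max(scored)
    return (rid, score) if score >= threshold else None
```

Because the score penalises both missing and extra tracks, a compilation that happens to contain one matching track scores poorly against a folder holding a full album.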

So I think it already does similar to your own idea.

If you are not getting the matches you want try matching release by release using Match Songs to One Musicbrainz Release, or run Autocorrect and then tidy up afterwards using the improved Manual Correct.

OK, so that’s how I understand Jaikoz to work too. I think where I am coming unstuck is on the first step - grouping files into possible releases:

Through using Picard and then later Jaikoz (as a far better tool), but with initially limited understanding, I have screwed my collection.

So now I have heaps of incomplete releases, strewn across releases I never had in the first place.

  1. It would be great in any case if Jaikoz would iteratively “match to single release” per sub folder - I think this was a topic somewhere else.

  2. Where my logic was heading and to fix my incomplete release issues:

  • add all files from (say) a single artist.
  • ignore existing meta data and ignore underlying folder structure
  • so just assuming a big pool of files and only starting with puid:
    – get a universe of all possible releases derived from the puids
    – do some sort of magic “best fit” analysis to determine the most complete set of releases with all tracks matched that can be built from the pool of input tracks.
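
One naive reading of the "best fit" step is a greedy pass that repeatedly picks the largest release that can be built in full from the files still unassigned. This is a heuristic sketch, not an optimal solver and not anything Jaikoz is known to do; `pick_complete_releases` is a hypothetical name.

```python
def pick_complete_releases(pool, releases):
    """Greedily choose releases that are fully covered by the pool.

    pool: iterable of PUIDs for the input files.
    releases: dict mapping release id -> iterable of PUIDs on that release.
    Returns (chosen release ids in pick order, PUIDs left unassigned).
    """
    remaining = set(pool)
    track_sets = {rid: set(tracks) for rid, tracks in releases.items()}
    chosen = []
    while True:
        # Releases whose every track is still available in the pool.
        complete = [(len(tracks), rid) for rid, tracks in track_sets.items()
                    if rid not in chosen and tracks and tracks <= remaining]
        if not complete:
            break
        _, rid = max(complete)  # prefer the release with the most tracks
        chosen.append(rid)
        remaining -= track_sets[rid]
    return chosen, remaining
```

Anything left in the second return value is an orphan that no complete release could absorb, which is where other metadata or manual matching would have to take over.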

Feasible? Something that you could build into Jaikoz?

Cheers (and apologies for the extended dialogue on this - clean music collection, happy life!)