
SongKong and Jaikoz Music Tagger Community Forum

batching by

Something that could be nice: I give a folder to Jaikoz (let's say 30,000 MP3s) and it automatically batches them by 300 or 5,000 or whatever the fastest number is. Once those, say, 5,000 MP3s are processed and moved to a different directory, it takes the next 5,000 and continues…

I guess the only issue would be if you want to eye-check before saving, but otherwise this could be a huge time saver. The MP3s that fail autocorrection could be moved to a different directory to be reprocessed later, automatically or manually…

I think this would really be better served by a different product, one where you just specify a folder and then let it get on with it; it is always going to be difficult to do this with Jaikoz as it currently works.

Maybe something like this in the Autocorrector (see the sketch after the list):

  • Load next 100 files from folder “c:\My Music To Tag”
  • … (current autocorrector calls)
  • Move & save to folder “c:\My Music”; move failed files to “c:\My Music Failed”
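
A minimal sketch of what such a batching loop could look like, purely as an illustration; loadNextFiles, runAutoCorrect and saveTo are hypothetical placeholders, not real Jaikoz calls:

    import java.io.File;
    import java.util.List;

    // Sketch of the batching idea above. The three helper methods at the bottom
    // are hypothetical placeholders for the real loading/tagging/saving code.
    public class BatchTagger {

        static final int BATCH_SIZE = 100;

        public static void main(String[] args) {
            File source = new File("c:\\My Music To Tag");
            File done   = new File("c:\\My Music");
            File failed = new File("c:\\My Music Failed");

            List<File> batch;
            while (!(batch = loadNextFiles(source, BATCH_SIZE)).isEmpty()) {
                for (File song : batch) {
                    boolean ok = runAutoCorrect(song);   // current autocorrector calls
                    saveTo(song, ok ? done : failed);    // failed songs go to the failed folder
                }
            }
        }

        // --- hypothetical helpers, to be backed by the real tagging code ---
        static List<File> loadNextFiles(File folder, int max) { return List.of(); }
        static boolean runAutoCorrect(File song)              { return true; }
        static void saveTo(File song, File targetFolder)      { }
    }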

Could it be possible with Jaikoz?

thanks

Wil

Not really, because everything works based on the files being loaded into Jaikoz. I can see what you want to do, but how would the interface work? Would any songs currently loaded just be dropped? If you didn't elect to save the files in the Autocorrector, then what would happen to them when you loaded the next batch…

When you save, it just removes them as it does today. If you do not save, the files are not removed (they are appended) and Jaikoz keeps them in memory until either it finishes processing all the files or the memory blows up. This is how it works today if you do not save: try to load 100,000 songs and the memory may blow up at some point, but if you batch by 100 and include a save in your Autocorrector then you are fine. If you think about it, this is what we are doing today: we cut our big library into small pieces to process it in more manageable chunks so the memory does not blow up. Automating those steps would be a great help and a huge time saver.
Later on you could add a database and push the data into it, but I think this could be a quick solution to batching in the meantime. It also lets you keep memory usage low (as long as you do not forget to put the save in your Autocorrector script).

The trouble is that most people seem to want to load their whole library every time, even if most of the songs have already been fixed. If we did what you say, they would select the folder containing 100,000 files but just load 1,000, fix, unload and then load the next…

So there is no point loading the files into the display in the first place, as the user never interacts with the files.

If you want to do it automatically, it would be better for me to create another application that forgoes the Jaikoz GUI but is able to cope with any number of songs; I am thinking about doing this.

With regard to the database, Jaikoz does have a database, but I had problems when trying to serve the screen directly from it, and if no filter is being used you still need a lot of memory to hold all the songs in the table. I will have another go at this at some point, but it was a difficult issue to solve.

One app for tagging is nice; if you start to split it up, it might become confusing. For example, I can experiment with the filter, and once I have something I like I add it to the Autocorrector. This will make even more sense once you have the add-ins: then we could experiment, and batch once we are happy with the experiment.

One Autocorrector instruction could be “load in table”, though this may or may not be useful, it depends. I can see batching and the table being used together for the failed ones: I can really see myself asking to load 100,000 files 1,000 at a time, and then having all the failed ones either kept loaded or loaded from a failed folder at the end (“load failed from c:\My Music Failed”), so I can go to the table, correct them manually and submit them again.

Ultimately people want to tag their library and update it once in a while (batched or not). All this discussion is mostly about getting around the memory problem. I do not feel a second app is necessary to solve it; batching or having a database could solve it, and I feel batching could solve it quickly in the meantime until a database arrives.

For the table and the database, the best solution (but a bit longer to implement) is a table with a buffer above and below that points to the database. You only display, say, a window of 1,000 records and pre-fetch 2,000 more: 1,000 records after and 1,000 records before the ones currently displayed, so if you move the slider up or down they are already cached (a cache window of 1,000 up and 1,000 down). Then when you arrive at the end of the current window, you display the lower cache window, pre-fetch the next 1,000, and drop the 1,000 in the upper cache window, replacing them with what was previously displayed, and so on. Sorry if I did not explain it well, but this works really well: it is basically a windowed view of the data, with a cache window up (for previous data) and a cache window down (for next data), and you “slide” those three windows up and down following what the user wants to see.
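
To make the three-window idea concrete, here is a minimal sketch; the RecordStore interface is a hypothetical stand-in for whatever database access would back the table, and the background-thread part is left out for brevity:

    import java.util.List;

    // One page above, the current page, one page below, all sliding over the
    // backing store. RecordStore is a hypothetical stand-in for the database.
    interface RecordStore {
        List<String> fetch(int offset, int count);   // rows [offset, offset + count)
    }

    class WindowedView {
        private static final int PAGE = 1000;
        private final RecordStore store;
        private int offset;                          // first row of the current page
        private List<String> above, current, below;

        WindowedView(RecordStore store) {
            this.store   = store;
            this.offset  = 0;
            this.above   = List.of();                // nothing before row 0
            this.current = store.fetch(0, PAGE);
            this.below   = store.fetch(PAGE, PAGE);  // pre-fetched next page
        }

        // Called when the user scrolls past the end of the current page.
        void slideDown() {
            offset += PAGE;
            above   = current;                          // old page becomes the "up" cache
            current = below;                            // pre-fetched page is shown instantly
            below   = store.fetch(offset + PAGE, PAGE); // ideally done in a background thread
        }

        // Called when the user scrolls back above the current page.
        void slideUp() {
            offset -= PAGE;
            below   = current;
            current = above;
            above   = offset >= PAGE ? store.fetch(offset - PAGE, PAGE) : List.of();
        }

        List<String> visibleRows() { return current; }
    }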

If the database is already there then we do not need the batching. To handle 100,000 files we just need to make sure they are not all loaded in memory and use the window cache mechanism for the table display; this will solve all the issues. Batching is just a shortcut (quick win) to get the functionality in case the database (and the cached table display) is not there, but ultimately the database plus cached table can replace the batching.

Do you have more knowledge about this?

I implemented two in the past for a huge chemical database. I might need to re-implement one in a few months for another field.

There may already be some free implementations on the web for the technology you are using. Search for “buffered table”, “table with cache”, or “table with huge dataset to display”… things along those lines, and maybe add the keywords of the technology you are using: for .NET, for instance, “ListView table with cache buffer”; for Java…

I have attached a screenshot to explain the concept.

Cheers,

Wil

Yes I don’t have this at the moment.

My concerns with it would be to do with sorting and maintaining the record number. For example, if they sort on a new column, what does it do: query the database for the next 1,000 rows, performing the sort with SQL?

What happens as they scroll down, approaching the 1,000th record…

But as you say, I need to do some background reading.

This is a common visualization issue for huge databases with billions of records (DNA, biological info, or even financial data), where the database is shared by many users and is remote.

Sorting can certainly be done; there are different ways (re-sort on a temporary table holding only the PKs (primary keys), etc.…). The best is if you can find one already implemented.
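
Just as an illustration of pushing the sort and the paging to the database, a generic JDBC sketch; the songs table, its columns and the LIMIT/OFFSET syntax (as in H2 or HSQLDB) are assumptions, not the actual Jaikoz schema:

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;

    // Each page is fetched already sorted, so re-sorting on a new column just
    // means issuing the same query with a different ORDER BY column.
    public class SortedPager {

        private static final int PAGE = 1000;

        // sortColumn must come from a fixed whitelist of column names,
        // never from free user input, to avoid SQL injection.
        public static List<String> fetchPage(Connection con, String sortColumn,
                                             int pageIndex) throws SQLException {
            String sql = "SELECT title FROM songs ORDER BY " + sortColumn
                       + " LIMIT ? OFFSET ?";
            List<String> rows = new ArrayList<>();
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setInt(1, PAGE);                  // page size
                ps.setInt(2, pageIndex * PAGE);      // skip the earlier pages
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows.add(rs.getString("title"));
                    }
                }
            }
            return rows;
        }
    }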

If you approach the 1,000th record there are at least two implementations. One is to simply slide the three orange rectangles of the cache (from the screenshot) through the database, basically filling up the lower cache once you arrive at, say, 1,000 minus 10%. The other is that as soon as you reach the 1,000th record, you drop the upper cache, the current view becomes the upper cache, the lower cache becomes the current view, and you load the next 1,000 from the database in the background to fill up the new lower cache… This can go really fast since the database is local. Some testing will be needed to find the best values, e.g. 1,000 versus 100 versus 500…
You need some kind of producer/consumer to fill up the cache in the background… The nice thing is the user can look at the current 1,000 records while the next 1,000 are being loaded…
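
A minimal sketch of that producer/consumer prefetch, assuming a hypothetical loader function that stands in for the real database query:

    import java.util.List;
    import java.util.concurrent.*;
    import java.util.function.BiFunction;

    // The UI thread keeps showing the current page while a single background
    // worker (the producer) fills the next cache window.
    class BackgroundPrefetcher {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        // Loads 'count' rows starting at 'offset'; stands in for the real query.
        private final BiFunction<Integer, Integer, List<String>> loader;
        private Future<List<String>> nextPage;       // the "down" cache being produced

        BackgroundPrefetcher(BiFunction<Integer, Integer, List<String>> loader) {
            this.loader = loader;
        }

        // Producer: start fetching the page that begins at 'offset' in the background.
        void prefetch(int offset, int count) {
            nextPage = worker.submit(() -> loader.apply(offset, count));
        }

        // Consumer: when the user reaches the end of the current page, take the
        // prefetched rows (this blocks only if the fetch has not finished yet).
        List<String> takeNextPage() throws ExecutionException, InterruptedException {
            return nextPage.get();
        }

        void shutdown() { worker.shutdown(); }
    }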

The batch could be a (temporary) quick solution while a buffered table is built (or you extend an existing one). I have not looked into this for a long time, but I would be surprised if there were no implementations out there; Java is pretty popular and I know it is used for huge dataset display…

Another solution could be to have “Prev” and “Next” buttons under the table. You basically display 1,000 (with 1,000 cached above and 1,000 below), so we can browse the current 1,000; if we press Next you just display the next 1,000, unload the upper cache and replace it with the rows that were displayed, load the next 1,000 into the lower cache in the background, and so on. I guess the difference is that the button triggers the event rather than the table slider…
Note: somewhere in the UI you will need to display which thousand-row page we are on; for example, if I am viewing records 2,000 to 2,999 out of 100,000 it could display “Pages: 2/100” (otherwise it is more work to inherit from the table and fake the slider to make it continuous).
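
A tiny sketch of that Prev/Next variant, using the same zero-based page numbering as the “Pages: 2/100” example above (the Pager class is purely illustrative):

    // The buttons only move a page index; the table then shows the rows
    // starting at firstRow(), and the label shows where we are.
    class Pager {
        private final int pageSize;
        private final int totalRows;
        private int page;                            // zero-based page index

        Pager(int pageSize, int totalRows) {
            this.pageSize  = pageSize;
            this.totalRows = totalRows;
        }

        void next()     { if ((page + 1) * pageSize < totalRows) page++; }
        void previous() { if (page > 0) page--; }

        int firstRow()  { return page * pageSize; }  // offset to load from the database
        int pageCount() { return (totalRows + pageSize - 1) / pageSize; }

        // Viewing rows 2000-2999 of 100,000 with a page size of 1000 gives "Pages: 2/100".
        String label()  { return "Pages: " + page + "/" + pageCount(); }
    }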

Hmm, that sounds simpler; suddenly it sounds doable.
I assume when you say the first 1000 you mean the first thousand according to the active filters and sort criteria, rather than just the first 1000 songs loaded.

Exactly, according to the active filter and criteria (the simplest is to push this onto the database).

The important thing is filling up the caches below (if we go down) and above (if we go up) in a background thread so it feels instantaneous (after viewing a few records of the current page the lower cache should already be complete). You can also cache more than one page up and down if you want, e.g. 5 pages total in memory: 2 up, 1 current, 2 down; a page can be 1,000 or 100 records, it really depends on performance…

Hi Paul, I was wondering what the status of this feature is? I believe you mentioned you would work on it.

Best Regards,

Thanks

Wil

Hi Paul, regarding these two topics:
http://www.jaikoz.net/jaikozforum/posts/list/1530.page
http://www.jaikoz.net/jaikozforum/posts/list/1357.page

Nowadays most computer languages include tables (ListView) with a VirtualMode. I made a test in C# with 100 million objects of 2 columns each, and it is super fast when implemented with VirtualMode. I know you use Java and I'm sure there is an equivalent; I have put the C# code here.

Last answer in the thread:
http://social.msdn.microsoft.com/Forums/en-US/csharplanguage/thread/b91ffdc2-12e0-4d8b-860a-f0506f058af0/
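
For reference, Swing's JTable can behave much like a ListView in VirtualMode when given a custom TableModel, because it only asks the model for the cells that are currently visible; a minimal self-contained sketch, with the 100 million rows generated on the fly rather than coming from Jaikoz's database:

    import javax.swing.JFrame;
    import javax.swing.JScrollPane;
    import javax.swing.JTable;
    import javax.swing.table.AbstractTableModel;

    // Rough Java/Swing analogue of a C# ListView in VirtualMode: the table pulls
    // each visible cell from the model on demand, so rows never need to be held
    // in memory all at once.
    public class VirtualTableDemo {

        static class VirtualModel extends AbstractTableModel {
            @Override public int getRowCount()    { return 100_000_000; } // 100 million "rows"
            @Override public int getColumnCount() { return 2; }
            @Override public String getColumnName(int col) {
                return col == 0 ? "Index" : "Value";
            }
            @Override public Object getValueAt(int row, int col) {
                // Only called for cells that are actually visible on screen;
                // in Jaikoz this would read from the database / cache window.
                return col == 0 ? row : "Row " + row;
            }
        }

        public static void main(String[] args) {
            JTable table = new JTable(new VirtualModel());
            JFrame frame = new JFrame("Virtual table sketch");
            frame.add(new JScrollPane(table));
            frame.setSize(400, 300);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        }
    }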

Regards

W