SongKong and Jaikoz Music Tagger Community Forum

Questions to improve my automation script

Hi @paultaylor,

So, on your recommendation, I’ve written a script that does the following:

  • checks my base dir for folders larger than 550GB and, if there are any, splits them into folders of 500GB each while keeping their subfolders intact (so that every album folder stays whole)
  • scans my main folder, which now holds monthly folders of about 500GB each, and lists all the monthly folders it finds
  • starts a Fix Songs task, followed by a Delete Duplicates task and a Rename Files task, for each monthly folder
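
For anyone wanting to do something similar, the splitting step can be sketched roughly like this. It is a minimal sketch and not my exact script: the function names `folder_size` and `plan_batches` are made up for illustration, and the greedy grouping is just one simple way to keep album folders whole.

```python
from pathlib import Path


def folder_size(path: Path) -> int:
    """Total size in bytes of all regular files below `path`."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def plan_batches(album_sizes: dict[str, int], limit: int) -> list[list[str]]:
    """Greedily group album folders into batches that stay within `limit` bytes.

    Album folders are never split; an album bigger than `limit`
    simply ends up in a batch of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in sorted(album_sizes.items()):
        # Close the current batch if adding this album would exceed the limit.
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

The sizes would come from calling `folder_size` on each album folder, and each resulting batch would then be moved into its own folder of roughly 500GB.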

The script records each fully processed folder in a log file, and skips any folder that is already listed in the songkong_log file. This is pretty convenient for me, as I can simply restart from where I left off should anything bad happen to my server (e.g. a crash).
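
The resume logic is roughly the following. This is a simplified sketch, assuming the log holds one folder path per line; the helper names `load_done`, `mark_done` and `pending` are only illustrative.

```python
from pathlib import Path


def load_done(log_file: Path) -> set[str]:
    """Return the folders already recorded as fully processed."""
    if not log_file.exists():
        return set()
    lines = log_file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_done(log_file: Path, folder: str) -> None:
    """Append a folder to the log once all its tasks have finished."""
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(folder + "\n")


def pending(folders: list[str], log_file: Path) -> list[str]:
    """Folders still to process, in their original order."""
    done = load_done(log_file)
    return [f for f in folders if f not in done]
```

A folder is only written to the log after its last task has ended, so after a crash the folder that was in progress is simply processed again from the start.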

The script will also, if set to True, send regular notifications about the progress of a task, and another one at the end of a task (for example, when a Rename task has ended).

Using this “small folder” approach, I was able to drastically reduce the impact on my server resources, as it only loads a very small part of my music files at a time (as you recommended to me several times, thanks for this).

My running docker container, which the script starts separately for each task, now barely uses 3GB of RAM, and the impact on the CPU is negligible.

My main concern is that it’s still running too slowly, in my humble opinion. I am seriously thinking about running several instances of SongKong in parallel, in order to process not a single monthly dir but 3 or 4 at once.

This brings me to the question I have for you: there is an option in SongKong to get rid of the database and the reports. I know there is no way to roll back if the database gets wiped, but this is a risk I am willing to take. Apart from losing rollback, what is the impact on subsequent runs if the reports and databases are wiped before each start of SongKong? As far as I understand, SongKong reads each file’s tags and writes a fingerprint ID for each of them. So, once the file itself is fingerprinted, matched to MusicBrainz, and the MusicBrainz or Discogs ID is present in the tags, what is the added value of keeping the database and reports?

From that screenshot the CPU usage for each CPU is very low, whereas in the past it has been very high, so it seems the bottleneck may be file I/O or getting results from the Internet (MusicBrainz, Discogs, etc.).

But it also depends on the task, i.e. the Fix Songs task is going to perform differently than Delete Duplicates. So it would be worth monitoring that during a run, and if we see a change and you send me your logs, I can determine from them what each thread is doing at ten-minute intervals and see if we can work out where the delay is. Maybe even temporarily automate it by sending the logs to support@jthink.net automatically after each task (Fix Songs, Delete Duplicates, etc.) completes.

BTW, have you got Rename Files working? I thought it didn’t work from the command line.

That is an unsupported configuration and I think it is a very bad idea. Firstly, if the instances use the same shared location for the database and reports, that will break things, because it is only designed for one instance of SongKong running. So then you have to configure a separate location for each instance, and then you’ll have no continuity for logs, reports, etc. And if the slowness is due to file I/O or Internet results then you will still have that problem, because increasing the number of tasks is not going to increase your file I/O capability or the speed of Internet results.

So we should monitor the existing processes as described above.

The database is needed while a task is running because that is where we store details of the songs and interim results. But you could delete the database at the end of processing each folder if you wanted. This would have some impact on performance, but not much if you had finished processing that folder, i.e. it would have more impact if you deleted the database after running Fix Songs but before running Delete Duplicates on the same set of files.

There is no command line option to empty the database, but you can do it manually if SongKong is not running. Within the /songkong folder is a Database folder; you could just delete that and an empty one will be created automatically the next time you start SongKong. Actually there are two databases: Database.mv.db is a relational read/write database where we store details of your songs and the running task, and EhCache is a read-only cache of releases downloaded from Discogs and MusicBrainz. So you could elect to delete both or either of these.
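
In a script, the manual deletion could look something like this. It is only a sketch: `reset_database` is a hypothetical helper, not part of SongKong, it removes the whole Database folder rather than picking one of the two databases, and it is up to the calling script to make sure SongKong is not running at that point.

```python
import shutil
from pathlib import Path


def reset_database(songkong_dir: Path) -> bool:
    """Delete the Database folder inside the SongKong folder.

    Must only be called while SongKong is NOT running.
    Returns True if a folder was removed, False if there was none.
    """
    db_dir = songkong_dir / "Database"
    if not db_dir.is_dir():
        return False
    shutil.rmtree(db_dir)
    return True
```

Calling `reset_database(Path("/songkong"))` between folders, after all tasks for a folder have completed, fits the advice above about not deleting the database between Fix Songs and Delete Duplicates on the same files.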

Reports are created at the end of processing from information stored in the database. Reports are read-only and they are not needed for the actual processing of files, but there is no option to skip creating reports because the reports are very useful. I guess you do not much like the reports since, despite me asking a number of times, you never show me screenshots from them, but a lot of work has gone into them and I always find them the best way to resolve matching issues. With such a large library that you clearly would like to get into pristine condition, it would make sense to create and keep these reports, so that you always have a clear record of what SongKong has done in case of any issues later on.

Again, there is no command line option, but you can manually delete reports after a task has completed. However, you cannot just delete the whole /songkong/Reports folder. You should not delete the style and webhelp folders or the index.html, reports.html and local_reports.html files. The images folder stores thumbnails of album covers that are shared between reports.
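
A cleanup along those lines could be scripted roughly as follows. This is a sketch under assumptions: `clean_reports` and the `KEEP` set are illustrative names, the file name local_reports.html is assumed, and the images folder is kept as the cautious choice because its thumbnails are shared.

```python
import shutil
from pathlib import Path

# Entries that should stay in the Reports folder (images kept as a precaution).
KEEP = {"style", "webhelp", "images", "index.html", "reports.html", "local_reports.html"}


def clean_reports(reports_dir: Path, keep: set[str] = KEEP) -> list[str]:
    """Remove everything in `reports_dir` except the entries named in `keep`.

    Returns the names of the entries that were removed.
    """
    removed = []
    for entry in sorted(reports_dir.iterdir()):
        if entry.name in keep:
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed
```

As with the database, this should only be run after the task has completed and while SongKong is not running.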

3 posts were split to a new topic: Loaded Songs Count dont match when run FixSongs from Cmdline