Just as an example: the main app is thread 0.
Population phase (staggered thread start):
Thread 1 reads the folder/subfolder structure and writes it into the database.
Thread 2 reads those database entries, goes to each folder/subfolder, and writes the filenames into the database.
Thread 3 gets each file location from the database, reads the tag info from the file, and writes it back to the database.
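A rough sketch of that staggered start in Python. To keep it short, it assumes in-memory queues stand in for the database hand-off between threads, and that "tag info" is just the file size. The names `walk_folders`, `list_files`, `read_tags` and `populate` are made up for illustration, not taken from the actual app.

```python
import os
import queue
import threading

DONE = object()  # sentinel marking the end of a stage's output


def walk_folders(root, folders_q):
    # Thread 1: discover every folder/subfolder under root.
    for dirpath, _dirnames, _filenames in os.walk(root):
        folders_q.put(dirpath)
    folders_q.put(DONE)


def list_files(folders_q, files_q):
    # Thread 2: list the files in each folder as soon as thread 1 reports it.
    while (folder := folders_q.get()) is not DONE:
        for entry in os.scandir(folder):
            if entry.is_file():
                files_q.put(entry.path)
    files_q.put(DONE)


def read_tags(files_q, results):
    # Thread 3: read "tag info" per file (hypothetical stub: file size only).
    while (path := files_q.get()) is not DONE:
        results[path] = {"size": os.path.getsize(path)}


def populate(root):
    folders_q, files_q, results = queue.Queue(), queue.Queue(), {}
    threads = [
        threading.Thread(target=walk_folders, args=(root, folders_q)),
        threading.Thread(target=list_files, args=(folders_q, files_q)),
        threading.Thread(target=read_tags, args=(files_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each stage starts working as soon as the previous one has produced its first item, so none of them waits for the whole tree to be scanned.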
Generation phase:
By now thread 1 is done reading the folder structure and starts generating MB Ids, writing to each database entry as it goes (this is where most memory is used and released).
Thread 2 starts the lengthy process of online MB Id data retrieval, writing results as it goes.
Thread 3 finishes writing tag info and starts generating MB Ids halfway down the folder tree.
I don't know whether two threads can each access MB online data at the same time, but if they can, then either or both of threads 1 and 3 can go online to retrieve MB tag data after finishing MB Id generation.
To avoid data corruption you have to keep threads from writing to the same entry or file at the same time (this you already know), so the idea is to spread the load across multiple threads working at different locations in the database and the filesystem simultaneously. Thread 0 could monitor progress and divide the database into sections for each thread to chew on.
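Something like this is what I mean by dividing up sections, sketched in Python. It assumes the database entries can be addressed by index, uses a plain list in place of the database, and uses `upper()` as a stand-in for the real MB Id work. `split_sections`, `process_section` and `run` are hypothetical names.

```python
import threading


def split_sections(n_entries, n_workers):
    """Split indexes 0..n_entries-1 into contiguous, non-overlapping ranges."""
    base, extra = divmod(n_entries, n_workers)
    sections, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        sections.append(range(start, start + size))
        start += size
    return sections


def process_section(entries, section, progress, lock):
    # Each worker only touches indexes inside its own section,
    # so no two workers ever write to the same entry.
    for i in section:
        entries[i] = entries[i].upper()  # stand-in for generating the MB Id
        with lock:
            progress["done"] += 1  # shared counter that thread 0 could poll


def run(entries, n_workers=3):
    progress, lock = {"done": 0}, threading.Lock()
    workers = [
        threading.Thread(target=process_section,
                         args=(entries, section, progress, lock))
        for section in split_sections(len(entries), n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return progress["done"]
```

The only shared state that needs a lock here is the progress counter; the entries themselves are safe because the sections never overlap.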
To ease up on read/write access to the database, you could cache the entries in memory and read or write them in blocks. In any case, the file itself only stays in memory long enough to generate the MB Id (and again to commit changes later). Everything else is database access, which can be done with SQL.
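Writing in blocks could look roughly like this with Python's built-in `sqlite3` module. I'm assuming SQLite only for the sake of the example, and the `files` table with its `path` and `mb_id` columns is made up.

```python
import sqlite3

INSERT_SQL = "INSERT INTO files (path, mb_id) VALUES (?, ?)"  # hypothetical schema


def write_in_blocks(conn, rows, block_size=500):
    """Buffer rows in memory and commit them one block at a time.

    Returns the number of commits that were made.
    """
    block, commits = [], 0
    for row in rows:
        block.append(row)
        if len(block) >= block_size:
            conn.executemany(INSERT_SQL, block)
            conn.commit()
            commits += 1
            block.clear()
    if block:  # flush whatever is left over
        conn.executemany(INSERT_SQL, block)
        conn.commit()
        commits += 1
    return commits
```

With a block size of 500, a library of 1201 files costs 3 commits instead of 1201.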
On multi-core systems this would scream compared to the top-down approach. On older single-core machines, I guess it depends on how well they can multi-thread.
Food for thought… wish I knew how to do it myself.