Could this be accomplished by hashing the music files? Or could we use FLAC’s native checksum? It’s in the METADATA_BLOCK_STREAMINFO header inside a FLAC file. The FLAC command-line tool supports using it to test files.
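To make the STREAMINFO idea concrete, here is a rough sketch of pulling that checksum out of a file by hand. It assumes the file starts directly with the `fLaC` marker (no ID3 tag prepended) and that STREAMINFO is the first metadata block, 34 bytes long, with the 16-byte MD5 of the unencoded audio at the end. The function name `read_flac_streaminfo_md5` is made up for illustration.

```python
def read_flac_streaminfo_md5(path):
    """Return the MD5 stored in a FLAC STREAMINFO block as a hex string."""
    with open(path, "rb") as f:
        # Every native FLAC stream begins with this 4-byte marker.
        if f.read(4) != b"fLaC":
            raise ValueError("not a FLAC stream (missing 'fLaC' marker)")

        # Metadata block header: 1 bit last-block flag, 7 bits block type,
        # 24 bits big-endian length of the block body.
        header = f.read(4)
        if len(header) != 4:
            raise ValueError("truncated metadata block header")
        block_type = header[0] & 0x7F
        length = int.from_bytes(header[1:4], "big")
        if block_type != 0 or length < 34:
            raise ValueError("first metadata block is not a valid STREAMINFO")

        body = f.read(34)
        if len(body) != 34:
            raise ValueError("truncated STREAMINFO block")

    # The first 18 bytes hold block sizes, frame sizes, sample rate, channels,
    # bits per sample and total samples; the last 16 bytes are the MD5.
    return body[18:34].hex()
```

An all-zero result most likely means the encoder did not record a checksum, so a caller would probably want to treat that as "unknown" rather than as a real hash.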
It’s fairly clear that we don’t need this to be cryptographically sound, so algorithms such as MD5 are wasteful. xxHash may be a nice solution: its hashing speed is extremely fast, and it should be reliable enough for our needs. That said, I suspect the speed difference between MD5 and xxHash won’t matter much, since disk speed will be the bottleneck, but we’ll have to wait and see.
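Since the choice of algorithm is still open, one way to keep it swappable is to hash in chunks and pass the hash constructor in. This is only a sketch; `hash_file` and its parameters are names invented here, and MD5 from the standard library is just the default so the example runs without extra dependencies.

```python
import hashlib

def hash_file(path, hash_factory=hashlib.md5, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    h = hash_factory()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Assuming the third-party `xxhash` Python package is installed, `hash_file(path, xxhash.xxh64)` should work the same way, since its hash objects follow the `update()`/`hexdigest()` interface. Timing both variants on a real library would show whether the disk really is the bottleneck.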
Another interesting question is how we store this. I see two ways to go about it:
- We checksum the entire file, which means we can’t store the sum as a metadata tag (writing the tag would change the file and therefore its checksum); instead we would have to keep something like one large file with a song->hash association list.
- We checksum only the audio data, which raises the question of whether to sum the encoded data (FLAC, ALAC) or the decoded PCM stream. I’m not sure why we would sum over the PCM stream, but I’m putting it on the table.
I for one prefer solution 2. A built-in sum is more portable, and it’s simpler as well.
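For the second option applied to the encoded FLAC data, a sketch could skip every metadata block and hash only what follows. This assumes a native FLAC file beginning with `fLaC`, and it treats everything after the last metadata block as audio, so trailing junk such as an appended ID3v1 tag would be included. `flac_audio_offset` and `hash_flac_audio` are hypothetical names.

```python
import hashlib

def flac_audio_offset(f):
    """Advance f past the 'fLaC' marker and all metadata blocks; return the offset."""
    if f.read(4) != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    while True:
        header = f.read(4)
        if len(header) != 4:
            raise ValueError("truncated metadata block header")
        is_last = header[0] & 0x80                   # top bit marks the final block
        length = int.from_bytes(header[1:4], "big")  # 24-bit body length
        f.seek(length, 1)                            # skip the block body
        if is_last:
            return f.tell()

def hash_flac_audio(path, hash_factory=hashlib.md5, chunk_size=1 << 20):
    """Hash only the bytes that follow the metadata blocks (the encoded frames)."""
    with open(path, "rb") as f:
        flac_audio_offset(f)
        h = hash_factory()
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Because tags live in the metadata blocks, editing them should leave this value unchanged, which is what would make it safe to store the sum as a tag in the same file.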
Robert