SongKong and Jaikoz Music Tagger Community Forum

Delete duplicates deletes original

Hello,

Judging by the similar topics listed, this may well have been brought up before. From what I have observed, it seems to be a serious issue, and people should really do a dry run, or run the Delete Duplicates task with “move the duplicates to a duplicates folder” instead of deleting, to see whether they get the same behavior.

From the Delete Duplicates report, I’ve identified several albums with duplicates that were aggressively cleaned up, with both the original and the duplicate removed.

For the record, this is an archive I’m cleaning up for a relative. I’m a software developer with crypto and professional media management and indexing experience. Anyhow, looking in the duplicates folder I can find the original file, so no harm done beyond the annoyance.

Then I wondered whether the same file was perhaps also on a compilation (typical). Using fdupes, I found that many of the dupes did still have their original file in the library. But about half of them did not.

I understand that a dupe isn’t determined only by file size or hash, as the same song can be present in different qualities/codecs, but in these cases, after further investigation, it really does appear that both the original and the duplicate were moved.
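The fdupes cross-check can be scripted. Below is a minimal sketch, assuming the moved files sit in a single duplicates folder (possibly inside the library) and that "original still in place" means a byte-identical copy exists somewhere else in the library. The function names are hypothetical, not part of SongKong or fdupes. As noted above, this cannot recognize the same song in a different codec or quality, so a file it reports means "no byte-identical copy left", not proof that the song is gone.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file's bytes, read in 1 MiB chunks to keep memory flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def library_digests(library: Path, duplicates: Path) -> set[str]:
    """Digests of every file in the library, skipping the duplicates folder itself."""
    dup_root = duplicates.resolve()
    digests: set[str] = set()
    for p in library.rglob("*"):
        if p.is_file() and not p.resolve().is_relative_to(dup_root):
            digests.add(file_digest(p))
    return digests


def orphaned_duplicates(library: Path, duplicates: Path) -> list[Path]:
    """Moved 'duplicates' that have no byte-identical copy left in the library."""
    remaining = library_digests(library, duplicates)
    return [
        p for p in sorted(duplicates.rglob("*"))
        if p.is_file() and file_digest(p) not in remaining
    ]


if __name__ == "__main__":
    import sys

    # Usage: python orphans.py /path/to/library /path/to/duplicates
    for p in orphaned_duplicates(Path(sys.argv[1]), Path(sys.argv[2])):
        print(p)
```

Every file it prints is one whose original may also have been moved, which narrows down the manual search through the duplicates folder.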

Had it been a destructive operation, these files would be gone. Yes, I have a backup (hey, I kinda expect these things to happen from time to time), but many clients may not.

Sorry for the noise if it’s a known issue. I can help diagnose it further. It would be fantastic if this could work correctly.

Thanks!

SongKong v6.8 Rumours, docker container edition, running on Synology DS-218+.

Hi, could you please give an example of the song(s) you believe were erroneously moved, and run Create Support Files so I can take a look at the report?

Hi Paul,

Here is a sample:

Original:
16 Bach_ Jauchzet Gott In Allen Landen, BWV 51 - Aria, _Jauchzet Gott In Allen Landen!_.m4a

Duplicates:
16 Bach_ Jauchzet Gott In Allen Landen, BWV 51 - Aria, _Jauchzet Gott In Allen Landen!_(1).m4a
16 Bach_ Jauchzet Gott In Allen Landen, BWV 51 - Aria, _Jauchzet Gott In Allen Landen!_ 2(1).m4a
16 Bach_ Jauchzet Gott In Allen Landen, BWV 51 - Aria, _Jauchzet Gott In Allen Landen!_ 2.m4a

The support files are being created. The music archive was quite a mess, hopefully not too much of one :slight_smile:

Hi, I haven’t looked at the support files yet, but I’m not clear why you think it is wrong to treat these as duplicates. From the filenames they look like duplicates?

The duplicates are indeed duplicates. My problem is that the original was also moved to the duplicates folder… If it had been a “Delete” instead of a “Move”, the original would now be purged…

Also, I have to go through all duplicates to confirm whether the original is still in place… which means finding the original folder.

May I propose that when duplicates are moved to the duplicates folder, the tree structure be preserved?

foo/bar/baz.mp3
foo/bar/baz 1.mp3 ----> duplicates/foo/bar/baz 1.mp3
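The proposed mapping is just "path relative to the library root, re-rooted under the duplicates folder". A minimal sketch follows, assuming every song lives under a single library root. The function names are hypothetical and this is not how SongKong moves files.

```python
import shutil
from pathlib import Path


def duplicate_destination(song: Path, library_root: Path, duplicates_root: Path) -> Path:
    """Mirror the song's location relative to the library under the duplicates folder."""
    # relative_to raises ValueError if the song is not inside library_root.
    return duplicates_root / song.relative_to(library_root)


def move_preserving_tree(song: Path, library_root: Path, duplicates_root: Path) -> Path:
    """Move a duplicate into the mirrored folder tree and return its new path."""
    dest = duplicate_destination(song, library_root, duplicates_root)
    if dest.exists():
        # Never silently overwrite an earlier move with the same relative path.
        raise FileExistsError(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # shutil.move renames, or copies then deletes when crossing filesystems.
    return Path(shutil.move(str(song), str(dest)))
```

With a library rooted at `/music`, `duplicate_destination(Path("/music/foo/bar/baz 1.mp3"), Path("/music"), Path("/music/duplicates"))` gives `/music/duplicates/foo/bar/baz 1.mp3`, so the source folder can be read straight off the duplicate's path.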

It would save time finding where the duplicates came from!

Hi, you have found a bug! It is now logged as https://jthink.atlassian.net/browse/SONGKONG-1973

The problem occurs when you use the Same song and same album (metadata only) option and it finds duplicates for tracks where the albumArtist is different but the album is the same.

The giveaway is here:

The Artist line says:

Artist: Gächinger Kantorei Stuttgart, Bach-Collegium Stuttgart, Helmuth Rilling; there are 3 duplicate keys, 9 songs to be deleted

Yet when you expand it, the Album line says there are 19 duplicate keys and 40 songs to be deleted:

Album: Oratorio De l’Ascension & Cantates BWV50 Et BWV71; there are 19 duplicate keys, 40 songs to be deleted

Since the Artist line just totals up the deletions for all albums by that artist, it should never be lower than the total of all the albums listed under that artist.

If I then go to the next line, I find basically the same duplicates listed again.

We also have

A duplicate should only be listed once.

So basically, because the artist links to the same album, we end up grabbing all the duplicate keys for that album rather than just those for that specific artist/album combination, and that then causes errors in determining which songs to delete.
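To make the invariants above concrete, here is a minimal sketch of duplicate grouping keyed on the artist/album combination. The `Song` record and the (album artist, album, title) key are hypothetical stand-ins for a metadata-only mode, not SongKong's actual code. Because each song lands in exactly one group, a duplicate can only be listed once; `check_plan` rejects any plan that lists a file twice or would remove every copy of a song; and the artist total is, by construction, the sum of its album totals.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Song(NamedTuple):
    album_artist: str
    album: str
    title: str
    path: str


def song_key(s: Song) -> tuple[str, str, str]:
    # Hypothetical metadata-only key that includes the album artist, so albums
    # sharing a name but credited to different artists never merge.
    return (s.album_artist, s.album, s.title)


def plan_deletions(songs: list[Song]) -> dict[tuple[str, str, str], list[Song]]:
    groups: dict[tuple[str, str, str], list[Song]] = defaultdict(list)
    for s in songs:
        groups[song_key(s)].append(s)
    # Keep the first song of each group; the rest are the duplicates.
    return {k: g[1:] for k, g in groups.items() if len(g) > 1}


def check_plan(songs: list[Song], plan: dict) -> None:
    deleted = [s.path for extras in plan.values() for s in extras]
    if len(deleted) != len(set(deleted)):
        raise ValueError("a duplicate is listed more than once")
    gone = set(deleted)
    survivors = {song_key(s) for s in songs if s.path not in gone}
    for key in plan:
        if key not in survivors:
            raise ValueError(f"every copy of {key} would be deleted")


def report_totals(plan: dict) -> tuple[Counter, Counter]:
    album_totals: Counter = Counter()
    for (artist, album, _title), extras in plan.items():
        album_totals[(artist, album)] += len(extras)
    artist_totals: Counter = Counter()
    for (artist, _album), n in album_totals.items():
        artist_totals[artist] += n  # Artist line = sum of its Album lines
    return artist_totals, album_totals
```

Keying on the album alone would put songs credited to different album artists into one group, which is the situation the report above shows.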

I will fix this for the next release.

I don’t believe the problem occurs with any of the other Song is a duplicate if has same modes, but I will double check. It is generally better to run Fix Songs first and then run Delete Duplicates afterwards, using a Song is a duplicate if has same mode that considers MusicBrainz ids/AcoustIDs rather than just basic metadata.

Regarding your proposal that the tree structure be preserved when duplicates are moved to the duplicates folder:

This is a valid point, and we already have an enhancement request for it: https://jthink.atlassian.net/projects/SONGKONG/issues/SONGKONG-1460

We have now fixed the problem with Delete Duplicates. We have also fixed another problem whereby the report listed only the deleted duplicates and not the duplicate file that was kept.


Thank you Paul!

I installed the latest version just before I came down with the flu. Now slowly back on my feet, I’ve installed v6.8.1 and will give it a shot soon. For now, I’m just creating new reports!