
SongKong and Jaikoz Music Tagger Community Forum

Jaikoz fails to read playlist entries with non-ASCII characters on MacOS

May I ask MacOS users in particular to export a playlist of their whole library from iTunes, note how many songs it contains, then open that playlist in Jaikoz and see how many actually load? Please post those numbers, your OS, and your language in this thread.

I was editing some data and realized that several songs from a playlist I had loaded were missing. So I exported a big playlist from Music (the app that replaced iTunes in Catalina) and told Jaikoz to open it. The playlist file lists 31416 files; Jaikoz indicated it was importing 30399 files, told me twenty were corrupt and couldn’t be opened, and finished with 30379 open. I can live with the twenty; they will probably work after I open or preview them. The other 1000+ files (that’s 3%) that did not load mostly have accented, non-ASCII characters such as ñ, á or ö somewhere in their name or path, but some instead have a decimal point, dashes, an exclamation mark or other punctuation. If English isn’t one’s first language, the failure rate could be very high.

The files load in Jaikoz if one adds/opens their directories or the files directly. There is nothing wrong with the files, and the playlist opens every file in Music/iTunes, so there is nothing wrong with the playlist either. Apple has an issue in that the playlist exported from iTunes/Music is encoded differently from the folder listing produced by the command line/system, but playlists and filenames using either encoding open properly and are recognized and processed by the system; they only fail when one compares them. Both are supposedly UTF-8, but only the iTunes/Music playlist actually is. Jaikoz fails regardless of the playlist encoding, which suggests it is not reading the playlist as UTF-8, and there is a good chance the fix is trivial.

If someone knows a quick fix I would appreciate it, but based on my investigation this issue can’t be fixed by changing one’s code page, terminal settings, locale or some other easy adjustment by the user; it must be fixed by the developer.

Are you saying the playlist exported by iTunes is not actually UTF-8? If so, that is the problem: when Jaikoz reads the filenames listed in the playlist, they will not be read correctly if the playlist claims to be UTF-8 but isn’t. This is the biggest problem for non-ASCII characters such as accented characters.

I have been looking into this in depth and have determined the cause, and a very easily implemented solution with virtually no time penalty. The playlist is UTF-8, but not the variant used in Linux and other OSes, as I had believed; it is UTF-8-MAC. I tested it, and it’s close but not the same. As far as I could tell, it uses the long (decomposed: base letter plus a separate combining accent) encoding for all accented characters. If you run a shell script with find, you get a result that has the short (precomposed) encoding for some characters, like ñ, and the long encoding for others, and they aren’t usually the same long encoding. The fix is to convert the characters coming from anything off the command line (the shell, actually), and possibly from some programming libraries.
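The "short" and "long" encodings described above appear to correspond to Unicode's precomposed (NFC) and decomposed (NFD) forms; iconv's UTF-8-MAC target is, as far as I know, roughly NFD. The sketch below uses Python's `unicodedata` module purely as an illustration (it is not what Jaikoz or iTunes run, and `same_name` is a hypothetical helper): both forms are valid UTF-8 and look identical on screen, but they compare unequal until both are normalized to the same form.

```python
import unicodedata

# "ñ" has two Unicode spellings:
#   precomposed (NFC): U+00F1               -> the "short" form
#   decomposed  (NFD): U+006E + U+0303 (~)  -> the "long" form
precomposed = unicodedata.normalize("NFC", "Espa\u00f1a")
decomposed = unicodedata.normalize("NFD", "Espa\u00f1a")

print(precomposed.encode("utf-8"))  # b'Espa\xc3\xb1a'
print(decomposed.encode("utf-8"))   # b'Espan\xcc\x83a'

# Both are valid UTF-8 and display the same, yet a plain comparison fails.
print(precomposed == decomposed)  # False


def same_name(a, b):
    """Hypothetical helper: compare names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(precomposed, decomposed))  # True
```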

Below is an AppleScript program I wrote to show the audio files in the Music folder that are not in iTunes/Music. It requires BBEdit (a free download). You have to export your Library playlist and change the locations in the script to point to it and to your Music folder. It is a perfect demonstration of this issue.

The program relies on iconv, a command-line tool built into MacOS, to do the conversion. With the pipe ( | iconv -c -t UTF-8-MAC ) removed, there are a thousand mismatches in my library, even though the file names look identical. Those are the files Jaikoz fails to load from a playlist. With the pipe in place, as printed below, I have ten unmatched files, because they are missing from the playlist completely.
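The comparison the script performs can be sketched in Python, assuming the folder listing and the playlist entries have already been read into lists of strings (the `unmatched` function and the sample paths are hypothetical). Normalizing both sides to NFD plays the same role as the `iconv -c -t UTF-8-MAC` pipe; without it, names that differ only in how an accent is encoded show up as mismatches.

```python
import unicodedata


def unmatched(folder_listing, playlist_entries, normalize=True):
    """Return paths in the folder listing that are missing from the playlist.

    With normalize=True both sides are converted to NFD first, which plays
    the role of the `| iconv -c -t UTF-8-MAC` pipe in the script.
    """
    def key(path):
        return unicodedata.normalize("NFD", path) if normalize else path

    in_playlist = {key(p) for p in playlist_entries}
    return sorted(p for p in folder_listing if key(p) not in in_playlist)


# Hypothetical data: the folder listing has a precomposed "ó",
# the exported playlist has "o" followed by a combining accent.
folder = ["/Music/Canci\u00f3n.mp3", "/Music/Song.mp3", "/Music/Extra.mp3"]
playlist = ["/Music/Cancio\u0301n.mp3", "/Music/Song.mp3"]

print(unmatched(folder, playlist))                   # ['/Music/Extra.mp3']
print(unmatched(folder, playlist, normalize=False))  # ['/Music/Canción.mp3', '/Music/Extra.mp3']
```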

I think all you need to do is pipe some of the data through iconv and things will work perfectly.

Thank you for your attention.

Gary

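The forum software converts straight quotes to smart quotes (and double hyphens to dashes), which AppleScript won’t accept. If they appear when you copy the script, replace them with straight quotes (there are four pairs) and two hyphens.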

(*
Program to generate a list of the audio files in the Music folder and compare it to the contents of an exported Library playlist.

The difference represents audio files which have not been added to Music/iTunes.

The shell script runs while BBEdit is waking up, which cuts cold-start time almost in half. The right window is opened first because it is a file that may already exist somewhere in BBEdit’s window tabs, which could leave a blank window for the comparison.

© 2020 Gary Hillis
*)

-- Edit these paths to point to your exported Library playlist and your Music folder.
set LibraryPlayList to "Volumes:Share:Library.m3u"
set MusicFolder to "Volumes:Share:Media:Music"

set StartTime to current date

tell application "BBEdit"
	activate
	--set TempClipboard to the clipboard
	
	-- List the audio files in the Music folder (trailing "/" removed from the path), converting the names to UTF-8-MAC.
	set the clipboard to (do shell script "find -E " & quoted form of (text 1 thru -2 of (POSIX path of (MusicFolder as alias))) & " -type f -iregex '.*\\.(aa|aif|aiff|mp3|m4a|WAV)$' | iconv -c -t UTF-8-MAC")
	
	-- Right window: the exported playlist, sorted.
	make new text window
	open LibraryPlayList
	select text 1 of project window 1
	sort lines selection of project window 1 output options {replacing target:true}
	set RightWin to ID of front text document
	select line 1 of text 1 of project window 1
	
	-- Left window: the folder listing from the clipboard, sorted.
	make new text window
	paste
	select text 1 of project window 1
	sort lines selection of project window 1 output options {replacing target:true}
	set LeftWin to ID of front text document
	select line 1 of text 1 of project window 1
	
	compare text document id LeftWin against text document id RightWin options {case sensitive:false, ignore RCS keywords:true}
end tell

--set the clipboard to TempClipboard

set EndTime to current date
set ElapsedTime to (EndTime - StartTime)

Ok thanks for looking into it.

UTF-8-MAC is not UTF-8, so this is a bug in how iTunes generates the file, not a bug in how Jaikoz reads it.

i.e. if the file says it is UTF-8, then we have to read it as UTF-8, not UTF-8-MAC.

Written communication is notably imperfect for dealing with some issues, so please accept my assurance that I am not trying to be confrontational, I am simply trying to be direct, thorough and concise.

I did some additional investigation with Jaikoz and playlists. For the purposes of this discussion we can ignore the difference between UTF-8 and UTF-8-MAC: none of those differences actually appear in the playlists of my library generated by Music/iTunes, and I suspect they are unlikely to appear in any real playlist.

Music/iTunes in MacOS generates identical files when exporting .m3u and .m3u8 playlists; the only difference is the file extension. There is no option to select the encoding, as there is in TextEdit. Aside from UTF-8, UTF-16 big endian and ASCII are the encodings I would most expect to see in files on a Mac. When Jaikoz imports the .m3u8 file, all but 2 of my songs (31405) are opened; those 2 don’t have editable metadata. When Jaikoz imports the same playlist file with an .m3u extension, only 30399 are loaded. Clearly the program is capable of dealing with a UTF-8 encoded file, but it doesn’t do so unless it expects the file to be UTF-8 encoded. What encoding does Jaikoz expect an .m3u file on a Mac to have?

Here’s the problem statement as I would frame it if we could start over: the m3u format is not well defined. Each entry may consist of as little as the local pathname, or it may include other information which one may properly ignore. Jaikoz knows that a .m3u8 file is UTF-8 encoded and processes it properly. Jaikoz cannot know the encoding of a .m3u file without asking the OS, because it doesn’t know whether the playlist was generated by iTunes, by some other application, or typed by hand. It shouldn’t care, it shouldn’t make assumptions, it shouldn’t behave differently with equivalent inputs, and it should not fail unexpectedly without alerting the user. The program should be as flexible as possible in dealing with such unknowns.
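To illustrate the "pathname plus optional extra information" point: in the common extended-M3U convention, lines starting with `#` (for example `#EXTM3U` or `#EXTINF:...`) carry extra information, and every other non-blank line is a path. A minimal Python sketch, assuming the playlist text has already been decoded and that paths have no meaningful leading or trailing spaces; `playlist_paths` is a hypothetical name, not Jaikoz's parser.

```python
def playlist_paths(text):
    """Return the file paths listed in decoded M3U/M3U8 playlist text.

    Lines starting with '#' (such as #EXTM3U or #EXTINF:...) carry optional
    extra information and are skipped, as are blank lines.
    """
    paths = []
    for line in text.lstrip("\ufeff").splitlines():  # drop a leading BOM, if any
        line = line.strip()  # assumes paths have no meaningful outer spaces
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


sample = (
    "#EXTM3U\n"
    "#EXTINF:215,Artist - Canci\u00f3n\n"
    "/Music/Canci\u00f3n.mp3\n"
    "\n"
    "/Music/Song.mp3\n"
)
print(playlist_paths(sample))  # ['/Music/Canción.mp3', '/Music/Song.mp3']
```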

If Jaikoz were to convert all playlist files to UTF-8 as it reads them and then treat them as it treats .m3u8 files, they would always work. If Jaikoz currently converts .m3u8 files to some other format as they are read, then it should do that to all playlists as they are read so they will all work properly. Since iconv doesn’t require the input encoding to be specified (it defaults to the current locale’s encoding), either conversion is easily incorporated into Jaikoz. Doing so actually simplifies your program, as it doesn’t have to inquire about the encoding and it eliminates the two separate cases for .m3u and .m3u8.

OK, I misunderstood you. As I now understand it, it works perfectly fine if you name the file with a .m3u8 extension; the problem only occurs if the file has a .m3u extension, because then the file should be in the default encoding for that platform, but it is in fact still encoded as UTF-8.

Sorry, I don’t understand the point of the AppleScript you wrote. Isn’t the solution simply to change the extension of your files from .m3u to .m3u8?

I have looked at the Jaikoz code: it assumes the encoding is ISO-8859-1, because this is what is commonly used and there is no consistent way to work out the encoding. I could default it to UTF-8, but that would prevent reading .m3u playlists that are ISO-8859-1.
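A small Python sketch of why this matters, modeling the rule described in this thread (.m3u8 read as UTF-8, .m3u read as ISO-8859-1); `decode_by_extension` is a hypothetical helper, not Jaikoz code. ISO-8859-1 maps every byte to some character, so decoding UTF-8 bytes that way never raises an error. It silently turns "ó" (bytes C3 B3) into "Ã³", and the resulting path doesn't exist on disk. Pure-ASCII paths decode identically either way, which fits the observation that mostly the accented names went missing.

```python
from pathlib import PurePath


def decode_by_extension(filename, data):
    """Decode playlist bytes using the encoding implied by the extension.

    Models the rule described in the thread: .m3u8 -> UTF-8, .m3u -> ISO-8859-1.
    """
    suffix = PurePath(filename).suffix.lower()
    encoding = "utf-8" if suffix == ".m3u8" else "iso-8859-1"
    return data.decode(encoding)


# The bytes iTunes/Music writes for these two entries (UTF-8).
raw = "/Music/Canci\u00f3n.mp3\n/Music/Song.mp3\n".encode("utf-8")

as_m3u8 = decode_by_extension("Library.m3u8", raw).splitlines()
as_m3u = decode_by_extension("Library.m3u", raw).splitlines()

print(as_m3u8)  # ['/Music/Canción.mp3', '/Music/Song.mp3']
print(as_m3u)   # ['/Music/CanciÃ³n.mp3', '/Music/Song.mp3']
```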

  1. Yes, if Jaikoz reads the playlist exported from iTunes as .m3u8, or as .m3u renamed with a .m3u8 extension, it works perfectly, both in Windows and MacOS. Jaikoz and iTunes both behave the same in each OS.

Technically there is no guaranteed default for any OS: not MacOS, not Windows, not Linux, not even BSD. Applications like Jaikoz can choose any encoding, but very few choose ANSI/ISO-8859-1 anymore. Apple applications use UTF-8, and some, like TextEdit, allow the user to change it. And the command line may yield UTF-16. I looked at some of my oldest MacOS apps, and they all used UTF-8, even 15- to 16-year-old ones (I had to boot an older version of OS X to check 32-bit apps). I wanted to do some additional checking before making any recommendations, so I borrowed a Windows 10 computer. iTunes in Windows 10 exports playlists encoded in UTF-8, and most Microsoft applications also use UTF-8; Notepad defaults to UTF-8 but allows others. Windows 10 uses UTF-16 internally. Linux used to use ANSI/ISO-8859-1 the most, but for over a decade it has mostly shipped with UTF-8. Users often use a local variant, such as Russian, which is a European language and will therefore convert nicely to UTF-8.

The most common character encodings on the Web as of February 25, 2020 (http://w3techs.com/technologies/overview/character_encoding/all):

  1. UTF-8 (95.0%)
    (https://en.wikipedia.org/wiki/UTF-8, https://w3techs.com/technologies/details/en-utf8)
  2. ISO-8859-1 (2.4%)
    (https://en.wikipedia.org/wiki/ISO/IEC_8859-1, https://w3techs.com/technologies/details/en-iso885901)
  3. Windows-1251 (1.0%)
    (https://en.wikipedia.org/wiki/Windows-1251, https://w3techs.com/technologies/details/en-windows1251)
  4. Windows-1252 (0.5%)
    (https://en.wikipedia.org/wiki/Windows-1252, https://w3techs.com/technologies/details/en-windows1252)
  5. Shift JIS (0.2%)
    (https://en.wikipedia.org/wiki/Shift_JIS, https://w3techs.com/technologies/details/en-shiftjis)

Discogs uses UTF-8

MusicBrainz uses UTF-8
https://wiki.musicbrainz.org/History:Character_Encodings
https://picard.musicbrainz.org/docs/options/

FreeDB uses US-ASCII, ISO-8859-1 and UTF-8
http://ftp.freedb.org/pub/freedb/latest/DBFORMAT

My testing in Windows 10:
I opened a playlist with 18 songs in Notepad (7 valid songs with accented characters in their names, 7 valid songs without, and 4 nonexistent songs) and saved it as ANSI, UTF-8, UTF-16 Big Endian, UTF-16 Little Endian, and UTF-8 with BOM (Byte Order Mark, which writes a few bytes at the beginning of the file so it can be recognized as UTF-8). I then opened each of the playlists in Jaikoz. Ideally 14 should be loaded. Results: ANSI 14, UTF-8 7, UTF-16 Big Endian 0, UTF-16 Little Endian 0, UTF-8 with BOM 7, and UTF-8 saved as .m3u8 14. This agrees with the ISO-8859-1 default you described.
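For reference, a Byte Order Mark is just a fixed byte prefix: EF BB BF for UTF-8, FF FE for UTF-16 little endian, FE FF for UTF-16 big endian. The reply below notes that M3U doesn't define BOM handling, so the following is only a hypothetical Python sketch of how a reader could honor a BOM when one is present (`encoding_from_bom` is an illustrative name), not a description of Jaikoz. It also shows how different UTF-16 is: each ASCII character takes two bytes, one of them zero, so a reader expecting a single-byte encoding sees paths with embedded NUL characters, which would be consistent with none of the UTF-16 entries loading.

```python
import codecs

# Known BOMs and the Python codec that decodes (and strips) each one.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),  # "utf-16" reads the BOM to pick byte order
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    return None


text = "/Music/Canci\u00f3n.mp3\n"
samples = {
    "UTF-8 with BOM": codecs.BOM_UTF8 + text.encode("utf-8"),
    "UTF-16 LE": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "UTF-16 BE": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    "UTF-8 (no BOM)": text.encode("utf-8"),
}
for label, data in samples.items():
    codec = encoding_from_bom(data)
    decoded = data.decode(codec) if codec else None
    print(label, codec, decoded == text)
```

Without a BOM the sketch returns `None`, which is exactly the ambiguous case the thread is about.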

I opened the UTF-8 playlist in Notepad and tried to save it as ANSI (this is actually a recommended method), but it warned that some characters couldn’t be converted, and it wrote an empty file. I tried converting it to ISO-8859-1 with iconv; it warned that the conversion failed and saved only the lines before the first line containing a character it could not convert. You cannot convert to a smaller character set unless all the characters exist in that set.
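One detail worth checking (an assumption, since the post doesn't say which characters failed): ñ, á and ö all exist in ISO-8859-1, so converting them from UTF-8 should succeed when they are stored precomposed. Conversion fails for characters with no ISO-8859-1 equivalent, which includes the separate combining accents of the decomposed form discussed earlier in the thread. A Python illustration, not a reproduction of the Notepad or iconv runs:

```python
import unicodedata

name = "Canci\u00f3n"  # precomposed "ó" (U+00F3), which exists in ISO-8859-1
print(name.encode("iso-8859-1"))  # b'Canci\xf3n'

# Decomposed form: plain "o" followed by U+0301 COMBINING ACUTE ACCENT,
# which has no ISO-8859-1 equivalent.
decomposed = unicodedata.normalize("NFD", name)
try:
    decomposed.encode("iso-8859-1")
except UnicodeEncodeError as err:
    print(f"cannot convert U+{ord(err.object[err.start]):04X}")  # cannot convert U+0301

# Recomposing first lets the conversion succeed.
print(unicodedata.normalize("NFC", decomposed).encode("iso-8859-1"))  # b'Canci\xf3n'
```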

  2. If I simply change the extension to .m3u8, that is a workaround I can use. But what about the rest of your customers who don’t know about this issue? They fail to notice that not all of the songs loaded, edit most but not all of the songs they wanted to work on, and end up finishing their editing in iTunes. This has been one of the biggest frustrations I have encountered.

  3. The AppleScript compares a listing of the songs in the music folder/directory with a playlist of all songs in the library exported from iTunes/Music. Songs that are in the directory listing but not in the library listing are not in iTunes/Music; they may be leftovers (undeleted old copies of songs), or songs that the user would want to add back into iTunes/Music. Songs that are in the library but not in the directory are generally in some other directory; the user may want to move them into the music folder so that they are all together. If you edit out the iconv pipe, the files with accented characters show up as unmatched, because the directory listing and the playlist then represent those characters differently; those are the songs Jaikoz won’t load from a playlist. The conversion always works for my library, which contains only ordinary and accented Latin characters.

  4. Now for a recommendation: I definitely think you should be using Unicode, since ANSI/ISO-8859-1 is no longer the common encoding. Converting the playlist to UTF-8 as it is read and then treating it as if it were an .m3u8 playlist makes good sense. In MacOS and Linux this requires adding an iconv pipe to the routine; for Windows, add Cygwin or use libiconv, which is a little more complicated because you have to bundle the routine instead of using a built-in one.
    Jaikoz would take a playlist in ISO-8859-1, upconvert it to UTF-8 and work perfectly.
    Jaikoz would take a DOS-encoded playlist, upconvert it to UTF-8 and work perfectly.
    Jaikoz would take a UTF-8 playlist, try to convert it, and work perfectly.
    Jaikoz would take a UTF-16 playlist, convert it to UTF-8 and work perfectly, since UTF-8 can represent every Unicode character. Nobody loses functionality.
    Even if you choose not to convert playlists as you load them, I think most of your customers would be better off if Jaikoz assumed playlists were UTF-8 instead of ANSI/ISO-8859-1, based on the apps they are most likely to use.
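The "convert as it is read" step amounts to decoding the bytes with the source encoding and re-encoding them as UTF-8, which is what `iconv -f SOURCE -t UTF-8` does. A minimal Python sketch (`to_utf8` is a hypothetical helper, and code page 437 stands in for a "DOS-encoded" playlist); note that the source encoding still has to be supplied or guessed, just as iconv falls back to the locale's encoding when `-f` is omitted.

```python
def to_utf8(data, source_encoding):
    """Re-encode bytes as UTF-8, like `iconv -f SOURCE -t UTF-8`."""
    return data.decode(source_encoding).encode("utf-8")


entry = "/Music/Canci\u00f3n.mp3"
expected = entry.encode("utf-8")

for source in ("iso-8859-1", "cp437", "utf-16", "utf-8"):
    converted = to_utf8(entry.encode(source), source)
    print(source, converted == expected)  # True for each

# The same bytes read with the wrong source encoding give different text.
print(to_utf8(entry.encode("cp437"), "iso-8859-1") == expected)  # False
```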

You could use UTF-16, but it uses more memory than UTF-8 for European languages, and Jaikoz already uses a lot of memory. Unless you have customers in the Far East, I can see no reason to use UTF-16, and those customers would not be harmed by this change, since their files would fail today anyway. According to my research, Java uses UTF-16 internally, but UTF-16 is not without potential issues (https://en.wikipedia.org/wiki/UTF-16, https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows, https://javarevisited.blogspot.com/2015/02/difference-between-utf-8-utf-16-and-utf.html). I can’t believe I just referenced Wikipedia pages. TWICE.
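The memory point can be checked by counting encoded bytes. A short Python sketch (`encoded_sizes` is a hypothetical helper; `utf-16-le` is used so the count excludes a BOM): accented Latin text is close to half the size in UTF-8, while a CJK character such as 歌 takes 3 bytes in UTF-8 but 2 in UTF-16.

```python
def encoded_sizes(text):
    """Return (UTF-8 bytes, UTF-16 bytes) for text; UTF-16 counted without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for sample in ("Hello", "Canci\u00f3n", "\u6b4c"):  # the last one is 歌 ("song")
    utf8, utf16 = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```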

Hi, I’m impressed by the amount of work you have put into this, but I’m not sure why this is such an issue, and a lot of what you have said is irrelevant, as the issue is about reading a playlist file, not writing data to files.

Nobody else has ever reported this problem !

Jaikoz uses Unicode internally (all Java applications do) and fully supports writing metadata to the file as UTF-8/UTF-16, depending on the audio tagging format. When it reads a playlist file, it uses UTF-8 if the file is .m3u8 and ISO-8859-1 if it is .m3u, but the filenames read are then stored in Unicode. Your Windows testing using a BOM is all very well but irrelevant: the .m3u playlist standard does not say to read the files based on a BOM at the start, it says to use the local encoding, which unfortunately is poorly defined and makes no sense if files are transferred between machines.

If you are creating a playlist for filenames known to contain non-ASCII characters, then you should be using UTF-8 and .m3u8, simple as that.

Jaikoz does not edit or create playlists, it only reads them so your suggestion of upconverting playlists is out of scope.

Regarding .m3u my options are:

  1. Keep things as they are, so it only works for ISO-8859-1
  2. Always read as a different encoding instead, e.g. UTF-8, so it would work for UTF-8 but would now fail for ISO-8859-1
  3. Read the file in a different encoding based on the platform Jaikoz is running on; not clearly defined, and not guaranteed to work because the playlist may have been copied from a different machine
  4. Try UTF-8, and if that fails try a different encoding

You are suggesting option 4. It is possible, but it’s quite a lot of work and not 100% achievable. It seems to me that the issue is with iTunes’ initial playlist creation, and to be honest I would prefer to spend time working on something that is useful to more customers.
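Option 4 can be sketched in a few lines of Python (`decode_m3u` is a hypothetical helper, not Jaikoz code). Strict UTF-8 decoding rejects most ISO-8859-1 text containing accented letters, because a byte such as F3 ("ó") followed by an ordinary letter is not valid UTF-8; ISO-8859-1 then serves as a fallback that accepts any byte sequence. The "not 100%" caveat is real: Latin-1 text that happens to contain byte pairs forming valid UTF-8 (for example "Ã³", bytes C3 B3) would be misread as "ó".

```python
def decode_m3u(data):
    """Try UTF-8 first (stripping a UTF-8 BOM if present), else ISO-8859-1."""
    try:
        # "utf-8-sig" is strict UTF-8 that also drops a leading BOM.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Every byte is a valid ISO-8859-1 character, so this never fails,
        # though it may not be the encoding the file was really written in.
        return data.decode("iso-8859-1")


entry = "/Music/Canci\u00f3n.mp3"
print(decode_m3u(entry.encode("utf-8")) == entry)       # True
print(decode_m3u(entry.encode("iso-8859-1")) == entry)  # True

# The ambiguous case: ISO-8859-1 text whose bytes are also valid UTF-8.
print(decode_m3u("Canci\u00c3\u00b3n".encode("iso-8859-1")))  # Canción
```

In Java the strict step would use a decoder from `StandardCharsets.UTF_8.newDecoder()`, which by default reports malformed input by throwing `CharacterCodingException`, whereas `new String(bytes, StandardCharsets.UTF_8)` silently substitutes replacement characters.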

What applications export an m3u playlist encoded as ANSI/ISO-8859-1? Perhaps I should consider using one of them to avoid this issue.

A pipe changes a data stream as it is read from a file; it doesn’t change the file. If, instead of reading the file directly, Jaikoz reads it through an iconv pipe ( | iconv -t UTF-8 ), Jaikoz will receive UTF-8 even if the file is ISO-8859-1, UTF-16 or something else (provided iconv is given the source encoding with -f, since otherwise it assumes the locale’s encoding). iconv is the universal translator from Star Trek. My suggestion to use it was so that you could process a playlist in any encoding and none of your users would have this issue, rather than fixing it for almost all of your customers, which assuming UTF-8 would do.

You can use iconv yourself or simply rename the file to solve your issue. It would not be possible or desirable for Jaikoz to rely on a third-party product like iconv, which is not available on all platforms, may have licensing issues, and is not Java.

I have raised an enhancement request for future consideration - https://jthink.atlassian.net/browse/JAIKOZ-1247