Issues with unicode filenames on macOS Safari in Web Edition

EDIT: The issue is actually with Safari, not macOS. See my comment below.
EDIT2: Updated the subject line to include Safari.

Just did an initial backup on a Big Sur mac. Some filenames are in Swedish so they have Å, Ä and Ö characters.

The filenames look correct in the logs, but if I try to restore from the web UI there are two issues:

  1. If I try to open a folder in the restore browser with a non-ASCII character in the name, I just get a popup saying “Invalid path”.
  2. If I select a file with a non-ASCII character in the name and click restore, it runs but nothing actually gets restored. As you can see from the log below, it just silently skips the file. This only happens when I select a unicode filename.

2021-09-03 19:38:10.321 INFO SNAPSHOT_FILTER Loaded 0 include/exclude pattern(s)
2021-09-03 19:38:11.574 INFO RESTORE_START Restoring /Users/p/Documents to revision 1
2021-09-03 19:38:11.578 INFO RESTORE_END Restored /Users/p/Documents to revision 1
2021-09-03 19:38:11.578 INFO RESTORE_STATS Files: 0 total, 0 bytes
2021-09-03 19:38:11.578 INFO RESTORE_STATS Downloaded 0 file, 0 bytes, 0 chunks
2021-09-03 19:38:11.578 INFO RESTORE_STATS Skipped 0 file, 0 bytes
2021-09-03 19:38:11.578 INFO RESTORE_STATS Total running time: 00:00:01

However, restoring these files from CLI seems to work.

Tried a restore from Web UI on a Windows PC and to my surprise it worked there!

Which got me thinking. I went back to the mac and tried Firefox instead of Safari, and now it worked there too. So the actual issue is Safari support.


I couldn’t reproduce this bug. My Safari version is 14.0.3. What is yours?

Version 14.1.2 (16611.

I ran Wireshark on the requests to list_restore_directory when I click on a folder name. And I found a difference in UTF8 encoding between Safari and Firefox.

The character “ä” is two bytes in Safari (c3 a4) but three bytes in FF (61 cc 88). Both are valid of course, but Safari uses the single code point for ä while FF uses two code points (a+ ̈).

The encoding in FF matches the actual filename on the filesystem.
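The two byte sequences are easy to reproduce outside the browser. Here is a minimal sketch in Python (chosen only for illustration; it is not the language Duplicacy is written in) using the standard unicodedata module. The variable names are mine.

```python
import unicodedata

# Composed form (NFC): the single code point U+00E4.
composed = unicodedata.normalize("NFC", "\u00e4")
# Decomposed form (NFD): 'a' followed by U+0308 (combining diaeresis).
decomposed = unicodedata.normalize("NFD", "\u00e4")

composed_hex = composed.encode("utf-8").hex(" ")
decomposed_hex = decomposed.encode("utf-8").hex(" ")

print(composed_hex)    # c3 a4
print(decomposed_hex)  # 61 cc 88
```

The first line matches what Safari sent and the second matches what Firefox sent.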

I don’t know Go, but if you compare these strings byte by byte they will not match, whereas if you normalize both to the same Unicode form before comparing, they should match.
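To illustrate the difference between a raw comparison and a normalization-aware one, here is a small Python sketch. The helper name same_name and the sample folder name are made up for the example, and this is not how Duplicacy compares paths.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


from_safari = "B\u00e4r"   # composed: one code point for the a-umlaut
on_disk = "Ba\u0308r"      # decomposed: 'a' + combining diaeresis

print(from_safari == on_disk)           # False
print(same_name(from_safari, on_disk))  # True
```

Normalizing both sides to NFD instead would work equally well; what matters is that both strings go through the same form.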

I found out that it is the combination of Safari and HFS that causes this strange behavior. Safari is known to normalize POST data to the composed form (NFC), but HFS stores filenames in the decomposed form (NFD).

On the other hand, APFS stores files in the NFC form so this issue doesn’t happen if you’re running macOS High Sierra or higher.

I’m not sure if this issue should be fixed. Of course we could trap the “Invalid path” error and try again with the NFD form of the path, but I feel this is not a clean solution.
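For reference, the retry idea could look roughly like the Python sketch below. It is only an illustration under assumptions: resolve_path and known_paths are hypothetical names, and the real lookup in Duplicacy works against a snapshot rather than an in-memory set.

```python
import unicodedata
from typing import Iterable, Optional


def resolve_path(requested: str, known_paths: Iterable[str]) -> Optional[str]:
    """Return the stored path matching `requested`, retrying in NFD and NFC.

    Hypothetical helper: tries the path exactly as received first, then
    falls back to its decomposed and composed forms.
    """
    known = set(known_paths)
    candidates = (
        requested,
        unicodedata.normalize("NFD", requested),
        unicodedata.normalize("NFC", requested),
    )
    for candidate in candidates:
        if candidate in known:
            return candidate
    return None  # the caller would report "Invalid path" here


# The backup stored the folder in NFD; Safari asks for it in NFC.
stored = ["Documents/Ba\u0308r"]
found = resolve_path("Documents/B\u00e4r", stored)
print(found == "Documents/Ba\u0308r")  # True
```

One limitation of this sketch: it normalizes the whole path at once, so a path whose components are in different forms (one folder in NFC, a subfolder in NFD) would still not be found.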

Unfortunately I don’t think it’s that easy. I’m on the latest macOS (Big Sur) and the drive is APFS.

Looking into this issue now, I believe that HFS+ normalizes filenames to NFD while APFS doesn’t normalize at all. And when I create a directory/file in Finder or the standard macOS save file panel then it uses NFD (for legacy reasons I assume). However, if I create a file or directory from zsh then it uses NFC. So APFS will apparently accept both and doesn’t care either way.
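For anyone who wants to check which form their own filenames are in, a small Python sketch follows. The function name normalization_form is made up for the example, and unicodedata.is_normalized needs Python 3.8 or later.

```python
import unicodedata


def normalization_form(name: str) -> str:
    """Classify a filename as "NFC", "NFD", "both" or "mixed"."""
    is_nfc = unicodedata.is_normalized("NFC", name)
    is_nfd = unicodedata.is_normalized("NFD", name)
    if is_nfc and is_nfd:
        return "both"   # e.g. plain ASCII, identical in either form
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"      # contains composed and decomposed characters


print(normalization_form("B\u00e4r"))    # NFC
print(normalization_form("Ba\u0308r"))   # NFD
print(normalization_form("plain.txt"))   # both
```

Applying it to each name returned by os.listdir() for a folder should show which entries were created in which form.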

I should also note that it’s only the Restore dialog that has a problem with this in Duplicacy. I can go into NFD folders with no issue in the Restore To dialog, and the Folder dialog when creating a new backup.

In other words, list_restore_directory chokes but list_local_directory does not.

I was wrong to say that APFS always normalizes to NFC. It does look like APFS accepts both forms, so this bug can happen with APFS if the file/directory to be restored is stored in NFD.

list_local_directory works because there appears to be another layer that automatically checks for filenames in both forms, according to this talk:

The relevant part is:

So we’re also introducing a new runtime normalization mechanism, and the runtime normalization will automatically convert between the NFC content versus the NFD for the purposes of file comparison, being able to do lookups.

If it doesn’t find one, it will automatically look up with the other to make sure that your app doesn’t receive an ENOENT [phonetic] error back from the file system.