Using VSS when backing up files (Windows)

It seems like when a file is open in Word, PowerPoint, etc., the file is locked and thus cannot be backed up. However, Cloudberry backs up those files using VSS, and Google Drive also backs up those files (not sure what trick it uses, though).

This may not be a high-priority feature, but it would be really nice if Duplicacy could back up locked files too.

VSS is supported; see the Backup command details. Just add the -vss option when running a backup (you'll need to run as administrator).
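From the CLI that looks roughly like this (the repository path is just a placeholder; run it from an elevated prompt so the shadow copy can be created):

cd C:\path\to\repository
duplicacy backup -vss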

I’m using duplicacy web. Can I set it to run as administrator on boot?

I’d like to know this as well. I found the topic Backup with VSS failing, which indicates the installer can install for the current user or for all users, but for me, duplicacy_web_installer_win64_1.5.0.exe doesn’t actually prompt and installs/runs as the current user by default.

Any suggestions?

Edit: Ah, I had to start the installer as Administrator; then the option to install as a service showed up. The problem now is that even with this done, I’m still seeing permission issues in the log, like:

2021-11-26 01:35:52.074 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\AppData\Local\Temp\_MEI249922: Access is denied.
2021-11-26 01:35:56.228 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\Application Data: Access is denied.
2021-11-26 01:35:56.230 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\Cookies: Access is denied.
2021-11-26 01:36:09.944 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\Local Settings: Access is denied.
2021-11-26 01:36:09.946 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\My Documents: Access is denied.
2021-11-26 01:36:09.947 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\NetHood: Access is denied.
2021-11-26 01:36:09.956 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\PrintHood: Access is denied.
2021-11-26 01:36:09.956 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\Recent: Access is denied.
2021-11-26 01:36:09.966 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\SendTo: Access is denied.
2021-11-26 01:36:09.966 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\Start Menu: Access is denied.
2021-11-26 01:36:09.966 WARN LIST_FAILURE Failed to list subdirectory: open \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\0\.duplicacy\shadow\Users\Artem Russakovskii\Templates: Access is denied.

Delete everything under that folder.

Ensure only one instance of duplicacy, the one that is installed as administrator, is running.

Give the user Duplicacy is running as (SYSTEM? Administrator?) read access to your files.

Edit: Actually, never mind that. Those are symlinks and reparse points. Ignore them, or add them to the exclusion list to avoid seeing the warnings.

Thanks, looks like those are all warnings and I can indeed either ignore them or just not pay attention to them in the logs.

However, it’s been several days, and there are still no successful backups.

Upon examining the log file, I see that, inexplicably, one of the “cannot access” lines is an error and not a warning, and I presume this kills the whole backup. Why? What makes it so special that it has to abort? Is this a bug, @gchen?

2021-11-29 04:03:32.735 WARN OPEN_FAILURE Failed to open file for reading: open \\?\C:\Users\Artem Russakovskii\AppData\Roaming\Slack\Cache\data_2: The process cannot access the file because it is being used by another process.
2021-11-29 04:03:32.735 WARN OPEN_FAILURE Failed to open file for reading: open \\?\C:\Users\Artem Russakovskii\AppData\Roaming\Slack\Cache\data_3: The process cannot access the file because it is being used by another process.
2021-11-29 04:03:35.731 WARN OPEN_FAILURE Failed to open file for reading: open \\?\C:\Users\Artem Russakovskii\AppData\Roaming\Slack\GPUCache\data_1: The process cannot access the file because it is being used by another process.
2021-11-29 04:03:53.086 ERROR CHUNK_MAKER Failed to read 0 bytes: read \\?\C:\Users\Artem Russakovskii\AppData\Roaming\Telegram Desktop\tdata\user_data\cache\0\binlog: The process cannot access the file because another process has locked a portion of the file.
2021-11-29 04:03:57.151 INFO INCOMPLETE_SAVE Incomplete snapshot saved to C:\ProgramData\.duplicacy-web\repositories\localhost\0/.duplicacy/incomplete
Failed to read 0 bytes: read \\?\C:\Users\Artem Russakovskii\AppData\Roaming\Telegram Desktop\tdata\user_data\cache\0\binlog: The process cannot access the file because another process has locked a portion of the file.

I understand I can probably just ignore that directory, but that makes me nervous that a random read error will result in complete backup failure in the future, and I’d have to keep babysitting the exclusion list rather than set and forget.

Does the log actually end with that? I.e., does it abort the backup after the failure to read a file? Then it’s a bug. Summoning @gchen.

Regardless, you seem to be backing up a lot of transient stuff. At least add

-*/AppData/*Cache*/

to your exclusion patterns; this should address the worst offenders.
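For context, that pattern would sit in the backup's include/exclude list (the filters file for the CLI) alongside whatever else you exclude. A sketch, with the extra patterns purely illustrative and the exact wildcard semantics worth double-checking against the filters documentation:

-*/AppData/*Cache*/
-*/AppData/Local/Temp/
-*/AppData/Local/CrashDumps/

Lines starting with - are exclusions, matched against paths relative to the repository root.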

Unfortunately, many so-called “application developers” should not be allowed anywhere near a computer, including whoever hacked together Slack and Telegram, as is evident from your log. Putting caches and transient data into a user’s roaming profile is the equivalent of spitting in their customers’ faces and demonstrates sheer ignorance of OS conventions, a lack of desire to make a quality product, and complete disregard for customer needs.

This is not even the worst example: I have seen developers dump gigabytes of GPU shader caches into the Documents folder, which is often synced to OneDrive. It wasn’t a mom-and-pop shop either; it was a rather large game franchise.

Anyway, because of such a desperate state of affairs in the Windows world (it’s a bit better on macOS, but not by much), I would exclude the contents of AppData by default and explicitly add the specific things that you positively do want to back up from there (because the same crappy software developers may decide to keep important user data under, say, AppData/Local, even though that is not what it was intended for. But why read the documentation when you need to ship crap ASAP?).
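A sketch of that default-deny approach (the Thunderbird profile is just a hypothetical example of something worth keeping; Duplicacy evaluates patterns in order with the first match winning, and include patterns generally need their parent directories included as well, so verify against the filters documentation before adopting this):

+*/AppData/
+*/AppData/Roaming/
+*/AppData/Roaming/Thunderbird/
+*/AppData/Roaming/Thunderbird/*
-*/AppData/*

Everything else under AppData falls through to the final exclusion, while everything outside AppData is unaffected.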

This is unavoidable, unfortunately: even though it’s easy to blanket-select everything, backing up tons of transient crap increases data turnover, bandwidth requirements, and storage costs, and reduces battery life, not to mention it hurts the reliability of your backup when important data is stuck in the queue waiting for temporary shader caches to get uploaded…

Yes, that’s indeed the end of the log, so it must be a bug then.

Regarding the rest of your rant, I agree, but products with low friction that “just work” are very valuable, and getting Duplicacy to this state would be very good for the product.

In my case, I have unlimited bandwidth, storage costs aren’t a concern, data turnover isn’t much of a concern due to my rotation policy, battery life isn’t a concern since it’s a desktop, and the backup should still complete relatively quickly.

For example, I set up Crashplan to back up my PC years ago and I don’t really need to worry about babysitting it. The only reasons I’m looking into alternatives to Crashplan, as an already active user of Duplicacy on my servers, are cost, memory and CPU footprint, and restore speed. I’d like Duplicacy to become the best backup solution out there.

I agree, no doubt about that. It would be extremely valuable if Duplicacy maintained a list of known default exclusions for each OS (just like the Crashplan you mentioned, and pretty much every other backup program (Arq, Kopia, even Backblaze Personal), evolved to do).

On macOS this is already accomplished by the Time Machine exclusion extended attribute; there it indeed works as you describe: set it to back up everything and it will do the right thing. I’m using it on macOS and don’t bother with filters, other than Logs/Caches and .Trash, which still must be excluded manually…

On Windows and other OSes, unfortunately, there is no way to do that other than via a centralized exclusion list or by peppering your filesystem with .nobackup markers.

Anyway, this is a candidate for another feature request.

If a file cannot be opened, Duplicacy will skip the file. However, if a file can be opened but can’t be read, then Duplicacy will simply quit. In this case it is caused by a poorly implemented VSS writer, but it can also indicate a bigger problem like disk corruption. So I think it is better to just give up on the backup.

Respectfully, I disagree, at least in this case. There’s no disk corruption: it’s a binlog file, presumably used by the database Telegram uses, so it must be a case that isn’t handled properly by VSS, by Duplicacy, or by something else. Telegram is one of the most popular messengers on the planet, so it would make sense not to abort the entire backup here, as it’ll be happening to many users.

Furthermore, I think any sort of read issues should be treated as skips with warnings, just like they’re already treated for all the other cases listed. It should be on the user to deal with real potential corruption, not Duplicacy. Duplicacy should skip and move on.

Why can’t Duplicacy just skip the problematic file (regardless of the underlying problem), log the error/warning, and continue backing up the rest of the repository that can be read and needs backing up?

Fully agree here with @fisowiw784 and @archon810.

It could also be, as the message seems to indicate, that a portion of the file is locked (e.g. memory-mapped) as part of an absolutely legitimate operation.

Let’s think about it this way: there are only two answers to the question “Can Duplicacy read this file atomically and successfully in its entirety?”:

  • YES: Pack it up, move to the next file.
  • NO: Skip it, make a note, move to the next file.

The exact reasons why it couldn’t read the file are irrelevant. Maybe the disk has a bad sector. Maybe the file is open in exclusive mode by another program. Maybe something unexpected is going on that ultimately results in a file read failure. Duplicacy shall make a note of the incident and continue.

Having a bad file planted between my important documents shall never preclude the backup of said documents.
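A minimal sketch of that skip-and-continue policy, in Go since that’s what Duplicacy is written in. This is not Duplicacy’s actual code: backupFile and chunkUpload are made-up names, and a real implementation would stream the file in chunks rather than read it whole.

package main

import (
    "io"
    "log"
    "os"
)

// backupFile applies the "any read problem means warn, skip, and move on"
// policy: open failures and read failures are both recorded as warnings,
// and only a failure to store the data aborts the backup.
func backupFile(path string, chunkUpload func([]byte) error) (skipped bool) {
    f, err := os.Open(path)
    if err != nil {
        log.Printf("WARN OPEN_FAILURE %s: %v", path, err)
        return true // note the incident, move to the next file
    }
    defer f.Close()

    data, err := io.ReadAll(f) // a real chunker would stream instead
    if err != nil {
        // Bad sector, byte-range lock, flaky VSS writer... the exact cause
        // is irrelevant: warn and skip, exactly like an open failure.
        log.Printf("WARN READ_FAILURE %s: %v", path, err)
        return true
    }

    if err := chunkUpload(data); err != nil {
        // Failing to reach the storage backend is a different class of
        // problem and may still justify aborting the whole backup.
        log.Fatalf("ERROR UPLOAD_FAILURE %s: %v", path, err)
    }
    return false
}

func main() {
    // Placeholder path and a do-nothing uploader, just to show the call.
    skipped := backupFile(`C:\Users\example\Documents\report.docx`,
        func(data []byte) error { return nil })
    log.Printf("INFO skipped: %v", skipped)
}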

On the flakiness of VSS: it happens all the time. On some Windows machines the file system gets corrupted routinely. It does not matter; Duplicacy must do its best to suck in as much of the user data as possible.

I’m still not convinced that read errors should be skipped. Do you report an error, or merely a warning, at the end of the backup? If you report an error, then you still need to exclude these files anyway in order to silence the error.

Just a warning, i.e. something like:

3882 new and changed files backed up
12 files skipped due to read errors.

Skipping a few files does not make the backup a failure. In fact, it’s almost expected when running without VSS.

In other words, it’s hard to imagine that just because some files failed, the user would decide that they don’t want the other files backed up anymore.

In yet other words: backup software must do its best to protect as much data as possible. This implies not giving up on minor failures. Lose a few fights but win the war.

What happens when VSS is enabled and it then fails? Should Duplicacy plough on, potentially encounter more failures, and create an empty snapshot, making the next incremental take eons? That behaviour has already been reported by users, even recently.

And then it keeps happening because there’s a borked VSS writer or a stuck service (often resolved by a reboot, in my experience), and the user is none the wiser because all their backups complete and yet are empty. :grimacing:

I’d rather it aborted and retried again at the next scheduled run…

No difference.

Yes, and it should keep skipping failures and saving successes.

If everything fails, then yes. That’s the state of the filesystem now. The backup program does not get to decide what and when to back up. It shall keep trying.

That would not be the case. Duplicacy does not make incremental backups based on the previous one; they are incremental with respect to the entire dataset.

Duplicacy could warn the user when the number of files picked up or failed changes drastically between backup sets.
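A rough sketch of what such a courtesy check could look like, again in Go; the function name and the 50% threshold are invented purely for illustration:

package main

import "fmt"

// driftWarning compares the file count of the snapshot just completed with
// the previous one and flags a drastic drop for the user to review.
func driftWarning(prevFiles, curFiles int) (string, bool) {
    if prevFiles == 0 {
        return "", false // first backup: nothing to compare against
    }
    drop := float64(prevFiles-curFiles) / float64(prevFiles)
    if drop > 0.5 { // hypothetical threshold: more than half the files vanished
        return fmt.Sprintf("WARN FILE_COUNT_DROP %d -> %d files; verify the source is healthy",
            prevFiles, curFiles), true
    }
    return "", false
}

func main() {
    if msg, warn := driftWarning(120000, 37); warn {
        fmt.Println(msg)
    }
}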

Why would this be any different? If VSS is screwed, it’s screwed. The user needs to repair the filesystem. And the next backup will be attempted anyway when the time comes; this does not justify giving up on the current one.

This can’t always be known, because Duplicacy may not see the files due to the error.

The SnapRAID helper script I use to schedule syncs and scrubs has a protection measure where a customisable threshold (e.g. 300) can be configured to detect, during the pre-diff run, whether that many files were deleted, and to abort if the threshold is exceeded.

Often I have to manually bump it up when curating the array, but it’s a pain to deal with in a backup program. Much, much simpler to abort and retry later. As a user, I want to know as soon as major errors occur, not after my backup program glosses over them.

That’s not true.

Duplicacy indexes the filesystem and iterates through it, comparing against the list of files made during the last backup to determine what should be hashed and chunked. Normally this is just new or modified files, but the whole lot can be forced with -hash.

An empty previous snapshot would be the same as adding all the same data again, in a different directory or whatever, or the same as a backup with -hash, because all that data is ‘new’ according to Duplicacy.
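For reference, forcing that full re-hash from the CLI would look roughly like this, using the -hash flag mentioned above:

duplicacy backup -hash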

And that’s great. If it is only possible to read 0 files from the file system now, then that’s what this filesystem is now. It must be backed up as a filesystem with zero files. It’s accurate. Do you also expect the backup program to fix the file system, re-flow the RAM, and order you a new UPS when the old one can’t switch reliably?

Duplicacy may, as a courtesy, report to the user when the number of files in the data set changes drastically, like OneDrive notifies users when it notices that many files are being deleted in bulk. But it’s just that: a courtesy notification.

Incremental, as in “no data will be transferred to the target when files re-appear”.

This is correct. Data went away and now some data reappeared. Maybe old, maybe new. Duplicacy can’t know. It shall re-hash. This is correct behavior.

After a filesystem crash, trying to optimize rehashing time is the least of users’ worries. They need to salvage data and replace the drive, and they need their backup to be up to date. It would be infuriating to find that Duplicacy did not pick up the document they were editing for the last couple of hours because the disk sector holding some temporary trash from the browser cache rotted and Duplicacy “gave up”.

Edit:
Duplicacy could also, if VSS fails, attempt to back up the file(s) directly without VSS, again with a warning. Knowing what a piece of unstable turd VSS on Windows is, this would be a prudent, customer-friendly approach.

Duplicacy already aborts a backup, with an error, when 0 files are backed up. Quite happy with this arrangement, and no, I don’t expect it to fix anything. Just report the error, because that’s what it is.

What about when 10% is excluded, or 90%? And Duplicacy happily creates partial backups, potentially forever, without generating an error!

Not relevant to what you said:

Which is technically incorrect, since Duplicacy has to rehash all the data again: untouched data that was already in a previous backup but is now disregarded because of an intermediate filesystem hiccup. That’s simply not correct behaviour.