Duplicacy using 3x storage of source for backup

Apologies if this has been answered before (I searched).

I’m currently evaluating Duplicacy with the latest WebUI version. I configured a backup of my Windows user directory to my fileserver every 15 minutes. Source storage usage is 57GB with 650k files. However, destination file usage (reported through the web UI and using du) is 176GB with 43k files after the first day.

I’m concerned about data storage costs going forward with a 3x size increase, plus revisions. Is there something I have configured improperly, or is this expected storage usage of Duplicacy?

It’s not normal Duplicacy behavior, but I also think you shouldn’t back up the whole Users directory without having some filters in place.

All your programs, plus Windows itself, create all sorts of files in there: temp files and plenty of other things. So it’s natural that a great many files have changed by the time 15 minutes pass and the next backup starts.

Here are the filters that I use on my Windows machines. Try adding those (of course, check what I have in there, since you may want to keep more things than I do), delete all the data in the storage, start a fresh backup, and see how the storage is consumed afterwards.
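For illustration only (these specific paths are examples, not the actual list referenced above), a Duplicacy filters file is one pattern per line, with - marking excludes and + marking includes, and patterns relative to the backup root; a minimal sketch for a Windows profile might look like this:

# Illustrative excludes for a Windows user profile; adjust the paths to taste
-AppData/Local/Temp/
-AppData/Local/Microsoft/Windows/INetCache/
-AppData/Local/Google/Chrome/User Data/Default/Cache/
-*.tmp
# With only exclude patterns present, everything not matched above is backed up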


You verified that size with WinDirStat and not Windows, right? I have a server with 3.8 TB and Windows reports 1.3 TB.

Did you run the first backup by itself and not on a schedule?

57GB isn’t small… what did you send it to? An external drive or the cloud?

What’s your upload pipe?

I’ll remove the backup and try it again with tighter filters, which I originally kept loose because I’d rather not accidentally miss backing up something important. For comparison, when CrashPlan supported P2P backups, this same backup set was ~150GB on disk with five years of version history.

Edit: I did the backup with your exclude list with slight modifications and am still seeing a roughly 2x space usage from the backup, with 28k chunks. The GUI is reporting storage usage of 98.51GB (du on the fileserver reports 96.4GB).

Even with this smaller backup set, that’s still not the result I was hoping to see from a single backup with no revisions (I haven’t set up the schedule). Is there a way to see how much data Duplicacy thinks the source set contains?

What @WordtotheBird says regarding WinDirStat is probably worth investigating. In all likelihood, your Windows user profile directory is much bigger than the 57GB Windows is reporting.

My own recommendation would be to run WizTree as an administrator (so it can see all files regardless of ownership).
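If you want a command-line cross-check as well (purely illustrative, not something anyone in this thread ran), PowerShell can sum file sizes while including hidden and system files:

# Sum the size of every file in the profile, including hidden/system files
# (can take a while on a tree with 650k+ files)
Get-ChildItem -Path $env:USERPROFILE -Recurse -Force -File -ErrorAction SilentlyContinue |
    Measure-Object -Property Length -Sum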

Also, did you wipe the storage when you recreated the backup job with filters?

Quite frankly, I’d say it’s technically impossible for the storage with just one snapshot to be any larger than the original repository.

Edit: What do the logs say when you run a scheduled check job on that storage? Edit #2: Additionally, what does the bottom of your backup log say about how much was backed up?


Ah, I misunderstood and thought @WordtotheBird was suggesting I run it on my (Linux) fileserver. WizTree reports that the whole directory, without any excludes, is 173GB.

Total wipe and recreate of the storage before I started it again. WebUI reports 1 revision, instead of the ~70 I had had before.

The backup log has this at the end:
BACKUP_STATS Files: 817130 total, 149,868M bytes; 817072 new, 149,252M bytes
2019-07-16 03:10:56.589 INFO BACKUP_STATS File chunks: 30191 total, 149,875M bytes; 7378 new, 38,585M bytes, 24,403M bytes uploaded
2019-07-16 03:10:56.589 INFO BACKUP_STATS Metadata chunks: 62 total, 259,603K bytes; 62 new, 259,603K bytes, 78,809K bytes uploaded
2019-07-16 03:10:56.589 INFO BACKUP_STATS All chunks: 30253 total, 150,128M bytes; 7440 new, 38,838M bytes, 24,480M bytes uploaded

Check log:
             snap | rev |                          |  files |   bytes  | chunks |   bytes |  uniq |   bytes |   new |   bytes |
[Redacted]_Backup |   1 | @ 2019-07-16 02:01 -hash | 817130 | 149,868M |  27591 | 92,566M | 27591 | 92,566M | 27591 | 92,566M |
[Redacted]_Backup | all |                          |        |          |  27591 | 92,566M | 27591 | 92,566M |       |         |

So what this seems to suggest is that I was misled by Windows being bad at reporting how much disk space is actually in use, coupled with the Duplicacy web UI lacking a way to show how large the backup source is?

It tells you in the first line:

The source backup is 149,868 megabytes.
The backup to disk is 150,128 megabytes.

I am guessing you don’t have compression enabled?

I haven’t enabled or disabled compression. Whatever the default is. Given a reported size of 150GB and disk usage of 98GB, it seems like compression may be enabled.

Either way, I think this answers my initial question. Duplicacy needs a way in the web UI to show what the storage usage of the backup source is.

Is there a process to submit a feature request for Duplicacy?

I could be wrong, but I don’t think the number for “File/All chunks” total is an accurate representation of the backup storage.

In fact, some of my own repositories say they’re roughly twice what I know for certain they actually are. Which is slightly puzzling but I think there’s a good explanation somewhere… :stuck_out_tongue:

But when you run a check, it gives all the gory details:

             snap | rev |                          |  files |   bytes  | chunks |   bytes |  uniq |   bytes |   new |   bytes |
[Redacted]_Backup |   1 | @ 2019-07-16 02:01 -hash | 817130 | 149,868M |  27591 | 92,566M | 27591 | 92,566M | 27591 | 92,566M |
[Redacted]_Backup | all |                          |        |          |  27591 | 92,566M | 27591 | 92,566M |       |         |

817,130 files were backed up, amounting to 149,868M. On the storage, there are 27,591 chunks amounting to 92,566M. These check stats become even more illuminating after many backup jobs.
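As a point of reference (assuming the CLI here; the Web UI check job accepts extra options as well), this per-revision table is what the check command’s tabular statistics produce, along these lines:

# Hedged example: ask check for the per-revision usage/deduplication table
duplicacy check -tabular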

@BobertMcBob - If you haven’t done so already, add -vss to your options for the backup job. Since this is a Windows user profile, many files might be in use, so activating Volume Shadow Copy will capture more of these files.

BTW, I agree it would be nice to see a rough estimate of the number and total size of files for each repository - in the actual UI. Perhaps it can be listed in the Backup tab, with the stats taken from the last check.


Well, CrashPlan is a garbage product to begin with… I’m not surprised it managed to exclude half of your profile…

There is really only one way to use CrashPlan, and that’s on Ubuntu… or whatever it’s called…

It took 67 days to send 2 TB over a 400/20 connection to CrashPlan…

Stupid customer didn’t pay their bill and lost their account… so I created a VM with Ubuntu, gave it access to the OS, and backed it up with CrashPlan… it took about 8 days.

Also, if you use the -vss option, make sure you set a timeout.
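As a sketch of how the two flags combine on the CLI (the 600-second value is just an example, not a recommendation from this thread):

# Back up with Volume Shadow Copy, allowing up to 10 minutes for the snapshot
duplicacy backup -vss -vss-timeout 600

In the Web UI, as mentioned above, the same flags go in the backup job’s options field.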

Yeah, things look much more in line with my expectations now. It also seems like the “File/All chunks” total isn’t an accurate measure of storage use. At least the overview on the Storage page is within a couple of percent of the actual usage on disk (100.24GB reported versus 98.98GB actual disk usage from du).

Even an estimate of the amount being backed up would have eliminated this confusion. The info must be there in the backend, because I see both a time-to-completion estimate and a transfer rate, so the total amount is known. Somewhere…


As another data point, I added a second set of data to be backed up, reported as 207GB by WizTree and consuming 222GB of storage. I’m glad this was more a UI limitation than an actual problem.
