Odd Sizing Issue

I have a little over 2TB to back up. Size on backup is 3.6TB. Huh?
I’ve tried pruning with exclusive and it barely makes a dent?

You may want to run prune -exhaustive to get rid of any potentially orphaned chunks.

If still no dice, then that’s the difference between versions: perhaps you are backing up some high-turnover data.

You can run check with the -stats argument to see statistics.
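
If you are on the CLI, a minimal sketch (run from the initialized repository folder; nothing here beyond the standard flags):

    # remove any chunks no longer referenced by any snapshot
    duplicacy prune -a -exhaustive

    # print per-revision and total statistics for the storage
    duplicacy check -stats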

I don’t need a ton of versions here. As long as each file has one or two versions it’s good. It’s all music. I had run prune -a -exhaustive. My standard prune is -keep 0:45 -keep 7:30 -keep 1:7 -a, which might even be too much. I want to keep the backup size close to the actual size.
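
For reference, assuming Duplicacy’s documented -keep n:m semantics (keep one revision every n days for revisions older than m days; n = 0 means delete them entirely), that policy reads as:

    # delete revisions older than 45 days,
    # keep one revision every 7 days for revisions older than 30 days,
    # keep one revision per day for revisions older than 7 days
    duplicacy prune -a -keep 0:45 -keep 7:30 -keep 1:7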

Music is immutable. If you create multiple versions of files that don’t change, storage size does not increase: all data is already on the storage, only the file manifest will be uploaded.

In other words, regardless of the number of versions of static files backed up, the backup size will be very close to the source size.

So, something else is at play here.

What is your backup destination? One possibility I can think of is that the target is on a filesystem with a massive block (allocation unit) size. Duplicacy creates a lot of small files, 4 MiB on average, and there could be a lot of per-file overhead.
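
If you can get a shell on the destination, something like this would show the block size (this assumes GNU coreutils stat; the path is a placeholder):

    # look at "Fundamental block size" in the output: that is the allocation unit
    stat -f /path/to/duplicacy/datastore

With ~4 MiB chunks the average per-file waste is about half a block, so the block size would have to be in the megabytes to add up to anything close to a terabyte here.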

Destination is a NAS, dedicated folder.

Ah, plot thickens. What filesystem is on the volume the backup is located on?

Can you ssh to the NAS and run this:

du -hd 0 /path/to/duplicacy/datastore
du -hd 0 --apparent-size /path/to/duplicacy/datastore

to see the difference between the size of the data vs the size it takes on disk.

And you are sure that there is no possibility that you have backed up some files other than immutable media files?

du -sh was showing 3.6 TB
-hd 0 shows 3.6
--apparent-size is not recognized.

Try -A instead of --apparent-size.
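
With a BSD-style du that would be something like (path is a placeholder):

    du -Ahd 0 /path/to/duplicacy/datastore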

But yeah, if du shows 3.6 – then that’s what you have there.

Did duplicacy check -stats reveal anything interesting?

-A still shows 3.6 T
Check -stats shows up as not initialized? I’m primarily using the web interface, so I am within the backed-up folder where the command line would normally work.

Web interface sizes look normal for that specific backup job.

Prune -a -exhaustive is still running though.

IIRC the backup command is run from the /0/, /1/, etc. subpaths, but check, prune and the others from the /all/ subpath.

Or you can create a schedule, untick all days, add the -stats flag, and launch the schedule manually.
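
Or, if you prefer to run it by hand, the web UI keeps its CLI repositories in its own directory; the path below is an assumption (the usual default is under ~/.duplicacy-web/repositories/localhost/):

    # run check from the "all" repository that the web UI maintains
    cd ~/.duplicacy-web/repositories/localhost/all
    duplicacy check -a -stats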

Let’s see if it finds and removes orphans once it’s done.

Finished and nothing cleared :frowning:

Great, the fact that there were no orphans means it is all working correctly.

So, the size difference is then due to some data that has either been backed up in one of the backup runs and still sits there, or is maybe being constantly picked up, e.g. some hidden folder with cache data or other transient stuff.

Run duplicacy check -stats or check -tabular; among other things, it will show you the amount of new data between snapshots.
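
For example (a sketch, run from the initialized repository):

    # per-revision statistics, including new and unique chunk sizes
    duplicacy check -tabular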

If you only back up media, the amount of new data should only increase after you actually add new data.

You can also run list -files to see all the files that have been backed up. Confirm that these are only the files you have intended to have backed up.
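
For example (the revision number and the extension filter below are only placeholders):

    # list every file in revision 1 of this repository's snapshot
    duplicacy list -r 1 -files

    # or surface anything that does not look like an audio file
    duplicacy list -r 1 -files | grep -viE '\.(mp3|flac|m4a|ogg|wav)'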

Let’s see where it goes! Thanks. Running and waiting!

Total chunk size shows 2414G but size on disk shows 3.4TB. Huh?

Ok, so the 2.4T seems more or less reasonable, so we need to figure out where this extra terabyte goes :)

Since we ruled out the orphaned chunks (-exhaustive did not change anything) and a large block size on the filesystem (the actual size is close to the apparent size as reported by du), I see two possibilities:

  1. Filesystem corruption on your NAS. Check if you can run some sort of diagnostic/repair tool, like fsck or chkdsk. Is the disk you back up to part of an array or a USB disk? What filesystem is on it?

  2. Perhaps some other data got moved into the duplicacy folder? Accidental drag and drop? Stuff like that? You can check for extraneous files visually, or with a shell one-liner that finds files that don’t match the expected chunk-name pattern.

    find -E /path/to/duplicacy/folder/chunks  -type f ! -iregex '.*/[0-9a-f]{62}$'
    

    Explanation:

    • -E turns on extended regex
    • -type f looks only for files
    • ! negates the condition, i.e. finds files that don’t match the pattern
    • -iregex matches a case-insensitive regular expression against the full path
    • '.*/[0-9a-f]{62}$' matches exactly:
    • .* a sequence of arbitrary characters of arbitrary length
    • followed by the path separator /
    • followed by exactly 62 hexadecimal characters (digits or letters ‘a’ to ‘f’)
    • followed by the end of the string.

    This will find and print any files in the chunks folder that don’t look like Duplicacy chunks. This works on FreeBSD; on Linux the GNU version of find needs a slightly different invocation (see below).
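
    On GNU find (most Linux distributions) there is no -E flag; the equivalent is -regextype posix-extended, which has to appear before the -iregex test:

    find /path/to/duplicacy/folder/chunks -regextype posix-extended -type f ! -iregex '.*/[0-9a-f]{62}$'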

I would not be able to check for corruption since this is a cloud-based storage box.

For reference, the Duplicacy instance only has access to the folder it is in. The find command kicked back:

$ > find -E chunks -type f ! -iregex '.*/[0-9a-f]{62}$'
Command not found. Use 'help' to get a list of available commands.

This is weird. You can check what shell they are giving you with echo $SHELL. But even a bare sh environment normally has find available.

haha, yea. No go on echo or find.

To note, this folder is not used for anything other than this specific Duplicacy instance and is not accessed by any other user or process, so it would be weird if something were out of place.

Indeed. If you can’t even use any tools, how could you have copied anything there…

Since duplicacy thinks that all chunks are accounted for and are of reasonable size, and there are no unaccounted-for chunks, it must be something else. Maybe some internal "lost+found" folder, something server-side? Can you ask the provider to send you a list of the files in your folder? I’m all out of other ideas…
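
Since du did work in that restricted shell, one more thing worth trying is a per-directory breakdown, which would at least show whether the extra terabyte sits under chunks/, snapshots/, or something else entirely (path is a placeholder):

    du -hd 1 /path/to/duplicacy/datastore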