When deleting zero sized chunks, should I delete every chunk found?

wim-olbright · 20 April 2023 00:17

I have over 1000 zero-sized chunks in storage. Should I just delete every one of them manually, or would it be better to run prune -exhaustive? They are all from an unfinished backup.

Are there any metadata chunks, or Duplicacy’s internal technical chunks, that might legitimately have zero size?

lukehmcc · 20 April 2023 01:20

I’d say that it’s always safer to use an internal command like prune as opposed to manually going into the chunks yourself. There is just too much to mess up.

Droolio · 23 April 2023 12:56

Late reply, but wanted to point out… Duplicacy prune - -exhaustive or otherwise - won’t handle zero-byte chunks. (It probably should, though there is an edge case where an in-progress backup is running and there are zero-byte partially uploaded chunks during a prune - so maybe it would only be safe in conjunction with -exhaustive.)

In this particular instance, manually deleting the zero byte chunks is best, and may reveal that some of your snapshots reference missing chunks (which were corrupted prior) when you next run a check - so it’s better to find out sooner than later.

With Linux, you could do it with find $dir/chunks -size 0 -type f -delete
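
For systems without find (Windows, for example), the same search can be sketched in Python. This is only a rough equivalent of the one-liner above, not anything Duplicacy ships: the function names are made up for illustration, and the storage path in the usage note is a placeholder. It defaults to a dry run so you can review the list before anything is removed.

```python
from pathlib import Path


def find_zero_byte_files(chunks_dir):
    """Return regular files of size 0 under chunks_dir, searched recursively."""
    root = Path(chunks_dir)
    return sorted(
        p for p in root.rglob("*")
        # Skip symlinks so this matches `find -type f`, which does not follow them.
        if p.is_file() and not p.is_symlink() and p.stat().st_size == 0
    )


def delete_files(paths, dry_run=True):
    """Delete the given files; with dry_run=True only print what would be removed."""
    for p in paths:
        if dry_run:
            print(f"would delete {p}")
        else:
            p.unlink()
```

Usage would be along the lines of delete_files(find_zero_byte_files(r"D:\storage\chunks")) to preview, then the same call with dry_run=False once the list looks right. Run a check afterwards to see which snapshots reference the chunks that are now missing.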

wim-olbright · 23 April 2023 18:33

Thank you. I use Windows, so my way of finding them is a bit more complex. I use Everything, which indexes my mounts, and then I filter for all the zero-sized chunks with “size:=0”.

In my case prune handled the zero-sized chunks, because the corresponding snapshot file hadn’t been uploaded yet.

My main question was whether it’s safe to delete all zero-sized files in the “chunks” folder, and you seem to have answered it.

sevimo · 23 April 2023 21:19

Not sure what you mean by that. Prune works only on file names, so if these chunks are not referenced, prune will remove them even if they’re broken (zero size), as prune doesn’t need to read non-metadata chunks. Exhaustive will remove all files under /chunks whose names are not referenced chunk names, even if you put some random files in there (e.g. not chunks at all). If the chunks folder happens to contain a file with a referenced chunk name, then prune indeed won’t remove it, which is true whether it is zero-size or not. Prune is not check, so if it sees a referenced chunk name, it will leave it alone; the chunk may be corrupted (e.g. zero size), but prune won’t know about it; that is check’s job.
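
To illustrate the point, here is a toy model of that name-only decision. It is not Duplicacy’s code; the function name, the chunk names and the sizes are all invented for the example. The sizes are carried along only to show that they play no part in the outcome.

```python
def plan_exhaustive_prune(stored_files, referenced_names):
    """Split stored file names into (keep, remove) by name alone.

    stored_files maps file name -> size in bytes; the size is never consulted.
    """
    referenced = set(referenced_names)
    keep = sorted(name for name in stored_files if name in referenced)
    remove = sorted(name for name in stored_files if name not in referenced)
    return keep, remove


# Hypothetical storage: one healthy chunk, two zero-byte chunks, one stray file.
stored = {"aa11": 4096, "bb22": 0, "cc33": 0, "notes.txt": 12}
keep, remove = plan_exhaustive_prune(stored, {"aa11", "bb22"})
print(keep)    # ['aa11', 'bb22']
print(remove)  # ['cc33', 'notes.txt']
```

The zero-byte but still referenced bb22 survives, while the unreferenced cc33 and the stray notes.txt go, which is why zero-byte chunks from an unfinished backup can disappear in a prune while corrupted referenced ones stay put.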

Don’t know if that’s true for all backends, but normally you shouldn’t see properly named chunks that are partially uploaded, as the upload first goes into a temp file, which is renamed to the proper chunk name on completion. So an exhaustive prune may preemptively remove a partially uploaded file from a concurrent upload, but that will only make the chunk upload fail, because the partially uploaded file won’t be there at rename time. It shouldn’t leave partial chunks that are properly named. But again, I could be wrong here for some backends, I don’t know.
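
The temp-file-then-rename pattern can be sketched for a local directory like this. It is a generic illustration of the idea, assuming a local filesystem backend, and says nothing about how Duplicacy or any particular backend actually implements uploads; store_chunk and its parameters are hypothetical names.

```python
import os
import tempfile


def store_chunk(chunks_dir, chunk_name, data):
    """Write data to a .tmp file, then rename it to its final chunk name."""
    os.makedirs(chunks_dir, exist_ok=True)
    final_path = os.path.join(chunks_dir, chunk_name)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=chunks_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Raises FileNotFoundError if something removed the temp file meanwhile.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial temp file so only complete chunks carry real names.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

With this shape, an interrupted write leaves at most a .tmp file behind, never a properly named partial chunk, and a temp file deleted by a concurrent cleanup shows up as a failed rename rather than a truncated chunk.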

Droolio · 23 April 2023 21:48

This is what I meant by Duplicacy not ‘handling’ zero byte chunks. From the way OP phrased his question, it sounded like he thought it’d remove them because they were zero bytes.

Under normal circumstances, regardless of backend, you shouldn’t end up with 0 byte chunks (or snapshots) that aren’t .tmp files. From experience, this is a bad situation to be in.

Agreed 100% with that, but either way, neither check nor prune will fix a storage where you have 0 byte chunks lying about, and it’s best not to assume a normal prune will take care of it - because it very likely wouldn’t. If, after you’ve pruned and checked, you still have 0 byte files - you should delete them manually, because Duplicacy won’t tell you they’re corrupt unless you attempt to read the chunk content (restore, check -files etc.). This was the point I was making.
