To locate all the corrupted chunks at once, I ran `duplicacy check -persist -chunks`, which ultimately returned “10 out of 436782 chunks are corrupted.” However, it did not display the filenames of all the corrupted chunks. Does this mean I have to print out the logs and manually search through them one by one to identify these corrupted chunks? That seems far too time-consuming.
You can use tools like grep, or text editors and viewers with search and filter functionality, such as BBEdit or the Console app on macOS. It’s no different from parsing any other log file – you never want to read it in its entirety, but you do want to filter out the interesting bits.
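For example, a minimal grep pass over the check output – this assumes the per-chunk messages contain the word “corrupted”, so adjust the pattern to whatever your log lines actually say:

```
# Capture the check output to a file while it runs
duplicacy check -persist -chunks 2>&1 | tee check.log

# Show only the lines mentioning corruption (case-insensitive)
grep -i "corrupt" check.log
```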
What storage backend are you using that corrupts data left and right? I would consider moving to more reliable storage.
Thank you for your assistance. I am using Alibaba Cloud Drive, which is mounted as a local drive via the WebDAV protocol. After troubleshooting, I found that the corrupted chunks were caused by network issues during the download process. I have now re-downloaded the files, and the issue has been resolved.
Let me make sure I understand: you have a third-party solution that “mounts” Alibaba Drive via WebDAV and presents it as a virtual filesystem, and you have duplicacy back up into that filesystem? Is that correct?
If so, this is a recipe for guaranteed data loss: duplicacy can no longer ensure that the data has been uploaded intact. It only knows that the data was written to the virtual filesystem – or, more precisely, that the filesystem told it the data was written; at that point it likely hasn’t reached the target intact, and there is no way for duplicacy to ensure integrity.
As a bare minimum, remove the virtual filesystem from the picture and back up directly to Alibaba via the WebDAV endpoint in duplicacy.
Next, I would get rid of the WebDAV protocol in the first place. It is designed for document exchange; it’s not robust enough to handle millions of files. If you can, switch to Alibaba Cloud OSS – I believe it supports the S3 protocol properly.
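Roughly like this – a hedged sketch using duplicacy’s storage URL syntax, where the repository ID, user, host, bucket, and region are all placeholders (check duplicacy’s storage backend docs for the exact URL forms and credentials setup):

```
# Option 1: point duplicacy straight at the WebDAV endpoint (no virtual filesystem)
duplicacy init my-backups webdav://user@webdav.example.com/duplicacy-storage

# Option 2 (preferred): use an S3-compatible endpoint such as Alibaba Cloud OSS
duplicacy init my-backups s3://region@oss-endpoint.example.com/my-bucket/duplicacy-storage
```

Either way, duplicacy itself talks to the remote end and can verify that each chunk was uploaded, instead of trusting a local mount.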
Your understanding is correct. I use the open-source tool alist to mount Alibaba Cloud Drive as a local drive, and then I back up the data to that virtual filesystem using duplicacy. Since there’s an additional third-party tool (alist) in the chain, duplicacy cannot guarantee that the backup files are properly uploaded to the cloud drive.
I previously tried backing up directly via duplicacy using the WebDAV protocol, but I kept encountering the error ‘GET config’ returned status code 302. As a result, I had to resort to this workaround of mounting the WebDAV share as a local disk.
Using the WebDAV protocol for duplicacy backups is not the best choice. During the backup process, I’ve encountered several issues with missing chunks, which forced me to create new backup IDs to fix the problem. It’s quite cumbersome. If my budget allows in the future, I plan to switch to using Alibaba Cloud OSS for duplicacy backups.
However, the issue this time was not caused by WebDAV itself. The backup data was downloaded via the Alibaba Cloud Drive client, and then I ran a `check -chunks` operation. It turned out that some of the downloaded chunks were corrupted. After re-downloading and replacing the damaged chunks, the check no longer flagged any issues. This confirms that the corruption occurred during the download process.