Review of using GoogleDriveFS as local storage, and some problems I need help with

I used GDFS as described in this post. It worked smoothly and the speed was great, until GDFS reported some vague errors and some files appeared in a “lost_and_found” folder under C:\…\Local\DriveFS.

I checked, and the folder contained 52 chunks. 21 of them had been uploaded correctly, but 31 were missing on the remote. I decided to leave them as they were.

I started a new initial backup with a changed ID directly to Google Drive (“gcd://” as the storage). According to the log file, it uploaded exactly 31 chunks.

Then I found some zero-size files with .tmp extensions on the GDrive remote. I assumed these were partially uploaded chunks. The file names (before the extension) didn’t duplicate any other names anywhere.

I ran check -chunks; it ran for half a day and returned this error:

"ERROR DOWNLOAD_CHUNK Failed to download the chunk chunkname: stream error: stream ID 90737; INTERNAL_ERROR; received from peer*

There is no such chunk on GDrive or under the lost_and_found folder.

I restarted the check and it reported that all chunks were successfully verified.

My questions:

  1. Does check -chunks compare uploaded chunks with the actual corresponding files in the repository? Does it use checksums?
  2. Does check take into account the filters file under the .duplicacy folder?
  3. Can I do anything else to be completely sure that the backup is solid?
  4. Is it necessary?

Thank you, gentlemen.

Not files, no. check -chunks just validates the existence and integrity of referenced chunks. Checksum: yes.
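For reference, roughly how the two variants behave (a sketch, assuming the repository is already initialized against the storage):

```
# Plain check: verifies that every chunk referenced by the snapshots exists on the storage
duplicacy check -all

# With -chunks: additionally downloads the chunks and verifies their integrity
duplicacy check -all -chunks
```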

No, the filters file is only used during backup.
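To illustrate (a made-up example, with a hypothetical cache/ folder): the file lives at .duplicacy/filters in the repository and is only read when backup scans the repository; check never looks at it.

```
# .duplicacy/filters - patterns are applied when backup scans the repository
# exclude the top-level cache/ directory; everything else is included by default
-cache/
```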

Personally, I abide by the 3-2-1 rule, so having a local copy to sync to/from is super useful. One thing you could do - perhaps once a year - is to copy from GD down to a local storage. Using the copy command effectively tests chunk validity, and it's a good alternative to -chunks if you have the disk space.
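A rough sketch of that setup, with made-up storage names and a made-up local path (assuming the Google Drive storage was added as default):

```
# Add a local storage that is copy-compatible with the existing Google Drive storage
duplicacy add -copy default local_copy my_backup_id /mnt/backup/duplicacy

# Pull everything down from Google Drive; copy has to read every chunk it transfers,
# so it doubles as an integrity test of the source storage
duplicacy copy -from default -to local_copy
```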

Testing backups? Absolutely! Though in your situation it might've been quicker to do a simple check without -chunks, just to make sure everything is properly referenced and exists. Follow up with a check -files, a restore, or a copy as desired. Good to do every once in a while.
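Something along these lines, with a placeholder revision number:

```
# Quick referential check: are all chunks referenced by the snapshots present on the storage?
duplicacy check -all

# Deeper verification of a single revision: re-reads the chunks and verifies the file contents
duplicacy check -files -r 1
```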

Hey, thank you for the response, it's very helpful.
Just to make things clear, a couple of questions.

Does it make sense, for the sake of complete peace of mind, to run a new initial backup after the first one, with a new ID? Let's say BC2 after BC1.

As far as I understand, :d: calculates checksums for chunks, and if this process is deterministic, then BC2 should produce the same chunks as BC1. Same chunks, same sizes, everything (even with encryption enabled?). And they should all be skipped very quickly.

So if there are any new (non-metadata) chunks in BC2, then something must have gone wrong reading or writing BC1. Am I wrong? This assumes no new files were added to the repository.
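I'm imagining something like this, however the BC2 ID would actually be set up:

```
# Second initial backup under the new ID; -stats prints chunk/upload statistics at the end,
# which should make it obvious whether any new chunks actually got uploaded
duplicacy backup -stats
```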

In my particular case I need to upload some of the files and delete them from the hard drive afterwards. I probably should have used rclone for that, but it doesn't make sense to switch now. That's why I'm overdoing it with this problem.

That’s certainly the case.

Though in terms of validating the integrity of the existing chunks - it won't quite do that. It'll see the chunk ID (filename) on the destination and skip it without ever reading the contents. So sure, changing the ID can fill in missing chunks, but it won't fix corrupt chunks (which, to be fair, is unlikely on GCD - but always test at least once).

Only a check -chunks, check -files, restore, or copy -from will read chunk data.

Again, it's always good to have more than one (ideally two) copies on top of your original data.