Webdav Indexing / Listing chunks

stefan1 · 1 June 2020 20:30

Hi there,

I have a short question:

My webdav storage provider limits the number of files per folder. So I added a nesting file to extend the write-level from 1 to 2:

{
    "read-levels": [1, 2],
    "write-level": 2
}
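As a rough illustration of what the write-level changes, here is a minimal sketch of how a chunk ID might map to a storage path. It assumes each nesting level consumes two hex characters of the chunk ID, which matches paths like chunks/6d/2a/ in the log further down. The function name `chunk_path` is made up for this example and is not taken from the duplicacy source.

```python
def chunk_path(chunk_id: str, write_level: int) -> str:
    """Return the storage path for a chunk under a given nesting level.

    Assumption: every level takes the next two hex characters of the
    chunk ID as a folder name; the remainder becomes the file name.
    """
    if write_level < 0 or 2 * write_level >= len(chunk_id):
        raise ValueError("write_level does not fit the chunk ID")
    # One two-character folder name per nesting level.
    parts = [chunk_id[2 * i:2 * i + 2] for i in range(write_level)]
    # Whatever is left of the ID is used as the file name.
    parts.append(chunk_id[2 * write_level:])
    return "chunks/" + "/".join(parts)


# Shortened, made-up chunk ID for illustration only.
print(chunk_path("6d2a0123", 1))
print(chunk_path("6d2a0123", 2))
```

With write-level 2 there can be up to 256 * 256 leaf folders instead of 256, which is why the per-folder file count drops but the number of folders to list goes up.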

My problem is that when a backup starts, it first lists all chunks, and it retrieves the folder listings one at a time. In my case that is one folder every 1-2 seconds. If I calculated correctly, this should take about 18 hours. During that time nothing happens except listing.

I already set -threads 10, but it seems to have nothing to do with the listing process itself. Is there any way to speed things up?

Thank you very much!

Example output from log file during the process:

2020-06-01 22:28:40.113 TRACE LIST_FILES Listing chunks/6d/2a/
2020-06-01 22:28:41.069 TRACE LIST_FILES Listing chunks/6d/09/
2020-06-01 22:28:42.003 TRACE LIST_FILES Listing chunks/6d/05/
2020-06-01 22:28:42.940 TRACE LIST_FILES Listing chunks/6d/45/
2020-06-01 22:28:43.854 TRACE LIST_FILES Listing chunks/6d/14/
2020-06-01 22:28:44.767 TRACE LIST_FILES Listing chunks/6d/0b/
2020-06-01 22:28:45.613 TRACE LIST_FILES Listing chunks/6d/66/
2020-06-01 22:28:46.510 TRACE LIST_FILES Listing chunks/6d/3e/
2020-06-01 22:28:47.460 TRACE LIST_FILES Listing chunks/6d/dc/
2020-06-01 22:28:48.341 TRACE LIST_FILES Listing chunks/6d/bd/
2020-06-01 22:28:49.281 TRACE LIST_FILES Listing chunks/6d/a2/
2020-06-01 22:28:50.196 TRACE LIST_FILES Listing chunks/6d/d5/
2020-06-01 22:28:51.080 TRACE LIST_FILES Listing chunks/6d/29/
2020-06-01 22:28:51.932 TRACE LIST_FILES Listing chunks/6d/be/
2020-06-01 22:28:52.871 TRACE LIST_FILES Listing chunks/6d/62/
2020-06-01 22:28:53.707 TRACE LIST_FILES Listing chunks/6d/fe/
2020-06-01 22:28:54.644 TRACE LIST_FILES Listing chunks/6d/1a/
2020-06-01 22:28:55.564 TRACE LIST_FILES Listing chunks/6d/0c/
2020-06-01 22:28:56.468 TRACE LIST_FILES Listing chunks/6d/d6/
2020-06-01 22:28:57.321 TRACE LIST_FILES Listing chunks/6d/bc/
2020-06-01 22:28:58.224 TRACE LIST_FILES Listing chunks/6d/f3/
2020-06-01 22:28:59.102 TRACE LIST_FILES Listing chunks/6d/38/
2020-06-01 22:28:59.979 TRACE LIST_FILES Listing chunks/6d/20/
2020-06-01 22:29:00.883 TRACE LIST_FILES Listing chunks/6d/a6/
2020-06-01 22:29:01.762 TRACE LIST_FILES Listing chunks/6d/3a/
2020-06-01 22:29:02.641 TRACE LIST_FILES Listing chunks/6d/9b/
2020-06-01 22:29:03.499 TRACE LIST_FILES Listing chunks/6d/73/
2020-06-01 22:29:04.419 TRACE LIST_FILES Listing chunks/6d/4f/
2020-06-01 22:29:05.337 TRACE LIST_FILES Listing chunks/6d/0e/
2020-06-01 22:29:06.205 TRACE LIST_FILES Listing chunks/6d/93/
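
For anyone who wants to reproduce the estimate, here is a small sketch that measures the average gap between LIST_FILES lines and extrapolates to all folders. It assumes the timestamp format shown above and that all 256 * 256 two-level folders exist, so the result is an upper bound. The helper name `mean_interval_seconds` is only for this example.

```python
from datetime import datetime


def mean_interval_seconds(log_lines):
    """Average gap in seconds between consecutive LIST_FILES timestamps."""
    stamps = [
        # The first 23 characters hold "YYYY-MM-DD HH:MM:SS.mmm".
        datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
        for line in log_lines
        if "LIST_FILES" in line
    ]
    if len(stamps) < 2:
        raise ValueError("need at least two LIST_FILES lines")
    total = (stamps[-1] - stamps[0]).total_seconds()
    return total / (len(stamps) - 1)


sample = [
    "2020-06-01 22:28:40.113 TRACE LIST_FILES Listing chunks/6d/2a/",
    "2020-06-01 22:28:41.069 TRACE LIST_FILES Listing chunks/6d/09/",
    "2020-06-01 22:28:42.003 TRACE LIST_FILES Listing chunks/6d/05/",
]
per_folder = mean_interval_seconds(sample)
folders = 256 * 256  # assumption: every two-level hex folder exists
hours = per_folder * folders / 3600
print(f"{per_folder:.3f} s per folder, roughly {hours:.1f} h in total")
```

If only some of the folders exist on the storage, the real listing time will be shorter than this figure.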

stefan1 · 2 June 2020 06:59

Additional Info:
It took 8.5 hours. That still seems too long to me.

(Or could this be a one-time thing because this backup hasn’t finished yet? I had to pause it because of a system update.)

gchen · 2 June 2020 14:35

It is possible to pass a different depth parameter to the webdav server to list files under the entire folder at once. I’ll be working on a fix.

For a backup job this only happens with an initial backup. A subsequent backup will not list the existing chunks. But, a check or prune job may still need to list all chunks so the fix is needed.

Just curious, which storage provider is this and what is the maximum number of files per folder?

stefan1 · 2 June 2020 16:53

Thank you for your fast reply. That sounds great!

I am using OpenDrive (https://www.opendrive.com/).

They provide webdav access, but all in all it seems a little bit strange. I have read a lot about it on the Internet. What I know so far is:

- Not all special characters are supported (I only read about this).
- Some special characters are rewritten so that they look the same to a human but not to the machine (I only read about this).
- There is a limit of 25’000 files per folder (using Duplicati, it returned an error like: 403 exceeded 25000 files in folder).
- I have no clue about the maximum file size.
- SSLv3 is not supported, only TLS 1.2 or TLS 1.1. Some programs like Duplicati have trouble with that even if you force the use of TLS 1.2.

Calculating with an average file size of 2 MB, the limit would be around 12 TB (I am new to duplicacy so please tell me if I am wrong - 255 folders * 2 MB * 25’000 files/folder = 12,1…TB). As I do not want to hit that limit, I activated the nesting.
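
The same estimate as a small sketch, so the inputs are easy to change. It assumes 256 first-level folders (two hex characters, 00 to ff) rather than 255, decimal units (1 MB = 10^6 bytes), and the 2 MB average from above, which is a guess and not a measured chunk size. `capacity_bytes` is just a name for this example.

```python
def capacity_bytes(folders: int, files_per_folder: int, avg_chunk_bytes: int) -> int:
    """Upper bound on stored data if every folder is filled to its limit."""
    return folders * files_per_folder * avg_chunk_bytes


MB = 10**6
TB = 10**12
FILES_PER_FOLDER = 25_000  # limit reported in the 403 error
AVG_CHUNK = 2 * MB         # assumed average chunk size

one_level = capacity_bytes(256, FILES_PER_FOLDER, AVG_CHUNK)
two_level = capacity_bytes(256 * 256, FILES_PER_FOLDER, AVG_CHUNK)

print(f"write-level 1: {one_level / TB:.1f} TB")
print(f"write-level 2: {two_level / TB:.1f} TB")
```

Chunks will not be spread perfectly evenly over the folders, so the practical limit at write-level 1 would be somewhat lower than the first figure.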

stefan1 · 5 June 2020 13:04

Do you have any idea when this feature is going to be implemented in a new version? The reason I am asking is that the backup was aborted due to an SSL verification error (the old SSL certificate reached its end date and duplicacy did not recognize that there was a new valid one). Now, with ~2.5 TB of files, it has been listing for 15 hours.

(After 15.5 hours it started to copy.)

stefan1 · 14 June 2020 13:44

@gchen I switched my storage provider. The webdav access at OpenDrive seemed to have some trouble. Some files were uploaded, but the web interface of the cloud host said they were “temporary not available”. A backup should always be available, so I canceled my subscription and switched to OneDrive.

(So no need for me at this point any more.)

gchen · 15 June 2020 15:40

I actually tried to set Depth: infinity in the HTTP header, which is supposed to list the remote directory recursively. Unfortunately, this parameter isn’t supported by all webdav providers.
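
For readers unfamiliar with the header, here is a minimal sketch of a PROPFIND request with a configurable Depth value, written in Python rather than taken from duplicacy's code. The URL is a placeholder and `build_propfind` is a hypothetical helper. The request is only built, not sent. WebDAV (RFC 4918) allows a server to reject Depth: infinity on collections, so a client would need a fallback to Depth: 1.

```python
import urllib.request

# Minimal PROPFIND body asking for size and resource type of each entry.
PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<D:propfind xmlns:D="DAV:"><D:prop>'
    b"<D:getcontentlength/><D:resourcetype/>"
    b"</D:prop></D:propfind>"
)


def build_propfind(url: str, depth: str = "1") -> urllib.request.Request:
    """Build (but do not send) a PROPFIND request with the given Depth."""
    if depth not in ("0", "1", "infinity"):
        raise ValueError("Depth must be 0, 1 or infinity")
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )


# Placeholder URL; a real storage would also need authentication.
req = build_propfind("https://webdav.example.com/chunks/", depth="infinity")
print(req.get_method(), req.get_header("Depth"))
```

With Depth: 1 the server returns only the direct children of the folder, which is why a nested chunks directory needs one request per folder.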

If this issue keeps coming up for other users, I’ll add a multi-threaded option for the webdav backend to list the chunks directory using multiple threads.
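
A rough sketch of what such a multi-threaded listing could look like, again in Python and not based on the actual implementation. It assumes a callback `list_dir(path)` that returns the entry names of one folder, with folder names ending in "/". Both the callback and `list_chunks_parallel` are hypothetical, and the in-memory dictionary stands in for a real webdav storage.

```python
from concurrent.futures import ThreadPoolExecutor


def list_chunks_parallel(list_dir, top_dirs, threads=10):
    """List all files below top_dirs, walking each top-level folder in its own task."""

    def walk(path):
        files = []
        for name in list_dir(path):
            if name.endswith("/"):
                # Subfolder: descend into it within the same task.
                files.extend(walk(path + name))
            else:
                files.append(path + name)
        return files

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() keeps the order of top_dirs in its results.
        results = list(pool.map(walk, top_dirs))
    return [f for sub in results for f in sub]


# In-memory stand-in for a remote storage, for illustration only.
fake = {
    "chunks/": ["6d/", "7a/"],
    "chunks/6d/": ["2a/", "09/"],
    "chunks/6d/2a/": ["aaa"],
    "chunks/6d/09/": [],
    "chunks/7a/": ["01/"],
    "chunks/7a/01/": ["bbb", "ccc"],
}
top = ["chunks/" + d for d in fake["chunks/"]]
found = list_chunks_parallel(fake.__getitem__, top, threads=4)
print(found)
```

Since each request mostly waits on the network, running several listings at once should cut the total time roughly in proportion to the thread count, as long as the provider does not throttle parallel requests.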

Droolio · 15 June 2020 23:28

Please please look into other back-ends as well - particularly SFTP and ~~Google Drive~~ - if this is feasible. Thanks.

Edit: Never mind! Just saw your other post. Awesome.