WebEdition Scheduled Backup Index runs longtime

bauer · 11 February 2020 12:03

Hi,

currently I have the following problem that during backup the indexing is taking a long time for one snapshot.

Note that the first backup indexing of the problematic snapshot has been finished after ~10 minutes.

Running backup command from /.duplicacy-web/repositories/drobo/2 to back up /archiv
Options: [-log backup -storage B2-duplicacy-europe -stats]
2020-01-29 06:24:23.029 INFO REPOSITORY_SET Repository set to /archiv
2020-01-29 06:24:23.030 INFO STORAGE_SET Storage set to b2://...
2020-01-29 06:24:25.834 INFO BACKUP_START No previous backup found
2020-01-29 06:24:25.834 INFO BACKUP_INDEXING Indexing /archiv
2020-01-29 06:24:25.834 INFO SNAPSHOT_FILTER Parsing filter file .duplicacy-web/repositories/drobo/2/.duplicacy/filters
2020-01-29 06:24:25.835 INFO SNAPSHOT_FILTER Loaded 0 include/exclude pattern(s)
2020-01-29 06:32:12.967 INFO LIST_ATTRIBUTES Discarding file attributes
2020-01-29 06:41:00.325 INFO INCOMPLETE_LOAD Incomplete snapshot loaded from .duplicacy-web/repositories/drobo/2/.duplicacy/incomplete

The second, third backup was aborted by me after ~12 hours.

Running backup command from .duplicacy-web/repositories/drobo/2 to back up /archiv
Options: [-log backup -storage B2-duplicacy-europe -stats]
2020-02-10 21:04:59.148 INFO REPOSITORY_SET Repository set to /archiv
2020-02-10 21:04:59.148 INFO STORAGE_SET Storage set to b2://...
2020-02-10 21:26:02.994 INFO BACKUP_START Last backup at revision 1 found
2020-02-10 21:26:02.999 INFO BACKUP_INDEXING Indexing /archiv
2020-02-10 21:26:03.107 INFO SNAPSHOT_FILTER Parsing filter file /.duplicacy-web/repositories/drobo/2/.duplicacy/filters
2020-02-10 21:26:03.256 INFO SNAPSHOT_FILTER Loaded 0 include/exclude pattern(s)
2020-02-10 21:26:03.604 INFO LIST_ATTRIBUTES Discarding file attributes
Duplicacy was aborted

Is there any way to debug this behavior for example a detailed log to se what is happening?

Droolio · 11 February 2020 12:14

I don’t have an explanation for your issue, but the newly-released Web Edition v1.2.0 now supports global options such as -v or -d.

At first glance, I don’t think the problem is indexing, because that log line appears after the 20+ minute delay.

bauer · 11 February 2020 12:32

Hi Droolio,

thank you for the fast answer. Currently I have started the backup with gloabal -d again. Thank you for the information. “Indexing” is the message showing in the progress bar the whole time. It is possible that this results in a false assumption on my side.

bauer · 14 February 2020 08:40

Hi,

I have further informations about the run with the debug options.

The runtime of the job was longer than one day

2020-02-13 19:35:55.965 INFO BACKUP_END Backup for .../archiv at revision 3 completed
2020-02-13 19:35:56.042 INFO BACKUP_STATS Files: 1580925 total, 485,057M bytes; 0 new, 0 bytes
2020-02-13 19:35:56.043 INFO BACKUP_STATS File chunks: 97854 total, 485,057M bytes; 0 new, 0 bytes, 0 bytes uploaded
2020-02-13 19:35:56.044 INFO BACKUP_STATS Metadata chunks: 112 total, 553,871K bytes; 0 new, 0 bytes, 0 bytes uploaded
2020-02-13 19:35:56.044 INFO BACKUP_STATS All chunks: 97966 total, 485,598M bytes; 0 new, 0 bytes, 0 bytes uploaded
2020-02-13 19:35:56.044 INFO BACKUP_STATS Total running time: 1 day 11:10:45

The backup source has no changed files so there is nothing to upload.
If I take a closer look in the log file I can see gaps of multiple hours. For example after the last LIST_ENTRIES. (The last folder is empty).

2020-02-12 12:05:22.784 DEBUG LIST_ENTRIES Listing virtualbox/20110210/sbs2011/Snapshots/
2020-02-12 16:32:16.456 DEBUG CHUNK_CACHE Skipped snapshot chunk 6551cf0275a6415fad0024b531ef84fd1dc46d6063b723f7eeef01dbeedd1892 in cache

The whole process of skipping chunks needs at least 24 hours.

2020-02-13 19:30:55.699 DEBUG CHUNK_CACHE Skipped snapshot chunk 5aad28c68debad875df37d8b5a93a0a2809bc36b7eefbe6f7ecda8010a6b902f in cache
2020-02-13 19:30:58.596 DEBUG UPLOAD_FILE_CACHE Saved file snapshots/nas_archiv/3 to the snapshot cache

The next backup job in the schedule from the same source (different directory) which is larger with more chunks needs only 1:30

Running backup command from .duplicacy-web/repositories/drobo/3 to back up ...Shares/backup
Options: [-log backup -storage B2-duplicacy-europe -stats]
2020-02-13 19:35:57.976 INFO REPOSITORY_SET Repository set to Shares/backup
2020-02-13 19:35:57.976 INFO STORAGE_SET Storage set to b2://....
2020-02-13 19:40:02.274 INFO BACKUP_START Last backup at revision 3 found
2020-02-13 19:40:02.280 INFO BACKUP_INDEXING Indexing Shares/backup
2020-02-13 19:40:02.290 INFO SNAPSHOT_FILTER Parsing filter file Shares/backup/duplicacy/.duplicacy-web/repositories/drobo/3/.duplicacy/filters
2020-02-13 19:40:02.407 INFO SNAPSHOT_FILTER Loaded 11 include/exclude pattern(s)
2020-02-13 19:40:02.448 INFO LIST_ATTRIBUTES Discarding file attributes
2020-02-13 20:57:34.626 INFO BACKUP_END Backup for Shares/backup at revision 4 completed
2020-02-13 20:57:34.795 INFO BACKUP_STATS Files: 813120 total, 657,743M bytes; 0 new, 0 bytes
2020-02-13 20:57:34.795 INFO BACKUP_STATS File chunks: 134758 total, 657,743M bytes; 0 new, 0 bytes, 0 bytes uploaded
2020-02-13 20:57:34.795 INFO BACKUP_STATS Metadata chunks: 61 total, 314,335K bytes; 0 new, 0 bytes, 0 bytes uploaded
2020-02-13 20:57:34.795 INFO BACKUP_STATS All chunks: 134819 total, 658,050M bytes; 0 new, 0 bytes, 0 bytes uploaded
2020-02-13 20:57:34.795 INFO BACKUP_STATS Total running time: 01:21:33

The question is waht is the difference between the two backup jobs?
What is happening in during the gaps in the log file?
And as a further hint the WebUI is still showing “Indexing” the whole time.

This behaviour is reproducible. This was the second try.

Droolio · 14 February 2020 14:02

Your first job mentions VirtualBox - do you have large virtual machine images within that repository? Such images will trigger a rehash if even the VM was just switched on and off again. It’s also recommended to backup such images to a separate Duplicacy storage with fixed size chunks.

Even then, it’ll take longer to backup multi-hundred-MB repositories where most of the data effectively ‘changes’, as it has to rehash it all. De-duplication still works fine, and chunks will be skipped, it just has to sift through a lot more data. For that reason, I run my VM backups manually.

However, I believe you’ll see a benefit to using -threads for both your backups.

bauer · 14 February 2020 14:42

Thank you for your response.

The first job has only old images which are not active. The whole directory of this job is only for archiving purposes and has no active data living in there. The directory left unchanged for months. (not mounted by other machines). As you can see in the summary of the first there where no data uploaded.

gchen · 14 February 2020 16:56

I guess that is because the directory to be backed up is on a networked share, so the speed really depends on the network condition. The faster run was still quite slow, for a backup of 657 GB.

bauer · 14 February 2020 17:32

No both directories are on the same hard disk and duplicacy is running on the same machine where the discs are installed. That is the cause why I am wondering about the differences in speed.

The Machine is a Drobo 5N SAN. I looked for memory consumption and other things. I haven’t seen anything on the machine during the run what could result in the difference. For example no memory swapping occurred.

gchen · 14 February 2020 18:18

Can you run the second job (.duplicacy-web/repositories/drobo/3) again with -d and send me the full log? It appears to spend all the time processing the snapshot so I believe it is memory related.

bauer · 18 February 2020 18:35

Hi,

sorry for the delay in my response. I was on holiday. I have send the log file by e-mail.

Thanks,
Martin

gchen · 26 February 2020 20:49

From the log it still looks like a memory issue to me. The Skipped snapshot chunk messages indicates Duplicacy was trying to upload snapshot chunks but all of which already existed in the storage. These messages spanned over 30 minutes, which is the time to create and upload the new snapshot. With enough memory this step shouldn’t take this long.

bauer · 12 March 2020 14:50

Hi,

thank you for your help.

Now I have divided the backup into four parts. The backup is now much faster than before. Therefore it could be possible that memory is not sufficient.