What happened in this backup?

Hi all,

Here is part of the backup report from my scheduled Sunday USB backup:

Back up /share/Backup@localhost to storage USB-Backup, status: Completed, log file: backup-20250223-010139.log

--------------------------------------------------------------------------------------------------------------

Running backup command from /.duplicacy-web/repositories/localhost/1 to back up /share/Backup

Options: [-log backup -storage USB-Backup -threads 4 -stats]

2025-02-23 01:01:39.111 INFO REPOSITORY_SET Repository set to /share/Backup

2025-02-23 01:01:39.112 INFO STORAGE_SET Storage set to /srv/dev-disk-by-uuid-2dc66f1b-ee1e-4473-b188-3ac4e6ac9ab6

2025-02-23 01:01:39.243 INFO BACKUP_START Last backup at revision 294 found

2025-02-23 01:01:39.553 INFO BACKUP_INDEXING Indexing /share/Backup

2025-02-23 01:01:39.553 INFO SNAPSHOT_FILTER Parsing filter file /.duplicacy-web/repositories/localhost/1/.duplicacy/filters

2025-02-23 01:01:39.553 INFO SNAPSHOT_FILTER Loaded 1 include/exclude pattern(s)

2025-02-23 02:25:50.089 INFO UPLOAD_FILE Uploaded GH Phone/Internal Storage/DCIM/Camera/PXL_20250218_215103985.jpg (1802055)

2025-02-23 02:25:50.089 INFO UPLOAD_FILE Uploaded GH Phone/Internal Storage/DCIM/Camera/PXL_20250218_215123336.jpg (1474299)

2025-02-23 02:25:50.089 INFO UPLOAD_FILE Uploaded GH Phone/Internal Storage/DCIM/Camera/PXL_20250218_215137823.jpg (1410309)

What happened between 01:01:39.553 and 02:25:50.089?

Whatever was happening was using a lot of CPU resources.

TIA

Each job backs up to a different snapshot ID, correct?

Enumerating changes and generating chunks. You can add the global flag -v or -d to get more logging if you want.

@saspus - thanks for your reply. So there’s no way of finding out, after the backup, what was happening without the global flags.

Thanks again.

The bit before 02:25 is the actual backup - the files get chunked and the chunks get uploaded, that’s pretty much it. The ‘INFO UPLOAD_FILE’ lines from 02:25 onward are just a report of which files were uploaded.
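For intuition, the quiet phase is dominated by chunking: every byte of every file is fed through a rolling hash to find chunk boundaries, which is why it eats CPU with nothing to log. A toy sketch of generic content-defined chunking (the hash, mask, and parameters here are illustrative, not Duplicacy’s actual implementation):

```python
import os

def chunk_boundaries(data: bytes, mask: int = 0xFFF) -> list[int]:
    """Return chunk end offsets using a toy rolling hash.

    A boundary is declared whenever the low bits of the hash are zero;
    since every byte must be hashed, this phase is CPU-bound.
    """
    boundaries, h = [], 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF  # toy rolling hash, not Duplicacy's
        if h & mask == 0:
            boundaries.append(i + 1)
            h = 0
    if not boundaries or boundaries[-1] != len(data):
        boundaries.append(len(data))  # final partial chunk
    return boundaries

data = os.urandom(1 << 20)  # 1 MiB of random input
cuts = chunk_boundaries(data)
print(len(cuts), "chunks, average size", len(data) // len(cuts), "bytes")
```

With a 12-bit mask the average chunk comes out around 4 KiB on random input; Duplicacy’s real defaults (per the -d output later in this thread: 4 MiB average, 1 MiB minimum, 16 MiB maximum) additionally enforce minimum and maximum chunk sizes.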

Thanks for explaining that, as I had wrongly assumed those were the times the files were uploaded.

If that’s the case – it’s a bad user experience and needs to be fixed.

Along with the other UX bug where prune logs “Deleting chunk” when, in reality, the chunk is just being collected for deletion, which happens at the end.

What? Why? This is not a “UX” thing, it’s a LOG… you don’t want the log filled up with thousands of useless pack, chunking, and cache nonsense when doing a simple backup without -v|-d. It produces a list of files which were backed up, which is all you need to know. Nothing to fix.

Logging files as they are processed, as opposed to all at once in a blob at the end, would have been significantly friendlier. Including for dry-run scenarios.

Yep, it’s not about this.

I need to know sooner and not have to wait till the end. There is no reason to delay showing the available information to the user.

Literally no difference between dry-run or not with respect to logs. You can always do -v if you’re fussed, but it’s chunks that are uploaded during this stage, not files. Files aren’t uploaded / backed up until the snapshot has been saved… like, at the very end. There really is nothing to fix here.

Never said there was.

Why should I?

Hard disagree. Duplicacy takes my files and uploads them to the storage. One by one. Chunking and skipping transfers for duplicate chunks are implementation details the user should not care about.

That’s true. And yet, prune in the same scenario behaves differently: prints which revisions would be deleted before actually deleting them … “at the very end”. So whichever camp you are in — duplicacy behaves differently in different operations. One of them needs to be fixed.

Besides the aforementioned lack of parity with prune behavior, the glaring issue is an app sitting for hours with no indication of what it is doing. Listing files one by one as it processes them would fill the awkward silence here.

You don’t think it’s a problem, but the mere existence of this topic means it is a problem for some users, including myself. I’m killing apps that consume CPU with no evidence of doing anything.

You’ve just undermined your own argument here. Make up your mind, either the user cares or they don’t.

The fact is, chunking - which is what it’s actually doing at this time - is hidden from the logs by default. They have the option of -v if they care. I feel I need to repeat: it’s NOT uploading files, it’s uploading chunks. Reporting it as such, before it’s had a chance to fail, would cause even more confusion IMO.

I would much prefer a definitive backup report at the very end, not intermingled with other log lines.

Edit: I’ll also add, this summary comes by way of the -stats flag, which is always enabled by the GUI. If your backups have a tendency of being aborted for some reason, you don’t really want your logs filled with junk too.

We’ve already debated prune’s tendency to break things if aborted, so yes, that needs fixing. Chunks are deleted before revision files, which should be reversed. Logs have nothing to do with this though, and randomly killing processes seems pretty dumb if you ask me.

I think it was obvious, but explicitly:

  • users don’t care about implementation details, such as the progress of chunk uploads, or the existence of chunks in the first place.
  • users do care about the progress of the application handling their data. It’s also useful at the configuration stage – to see whether files from an accidentally included folder start being processed, so you can abort without waiting for the entire dataset to be processed.
  • And lastly, users do care to know what the application is doing at any given time. Sitting there chewing CPU, with no indication of what is going on or how long it will keep happening, is asking for an uninstall. Battery is a limited resource.
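The per-file progress argued for above is cheap to produce; a minimal sketch of the suggested behaviour (the log lines and file names are hypothetical, not Duplicacy’s actual output):

```python
files = ["Documents/file1.txt", "Documents/file2.txt"]

processed = []
for name in files:
    # Emit the line *before* the heavy chunking/upload work, so the user
    # immediately sees which file is being handled and can abort early.
    print(f"Processing {name}")
    processed.append(name)

print(f"Processed {len(processed)} file(s)")  # end-of-run summary
```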

How so?

Processing Documents/file1.txt
Processing Documents/file2.txt

What is so confusing about it?

Prune does this in exactly the way you dislike when collecting revisions to delete (albeit with the misleading text “deleting” as opposed to “collected”). So, let’s fix prune logging then, let it also sit silent for hours?

They do. “prune” logs “Deleting snapshot X” while collecting snapshots to delete. It’s the exact same thing. “backup” could log “Processing file X” while doing backup.

Why? It’s not random, it’s for cause.

If I don’t know what the process is doing and for how long it will be doing it – I’m killing it. I think that’s a very prudent approach. Again, battery life is a limited resource, and I’m not wasting it on apps that can’t even articulate what they are doing while gorging on it.

yes, this is a separate bug entirely, orthogonal to the need of logging what’s going on.

To condense the disagreement here as I see it – I believe users have the right to know what the app is doing for any non-trivial amount of time, or when any non-trivial amount of resources is consumed. You think that users shall just obediently wait for the app to finish in an indeterminate amount of time while it consumes limited resources, because the app knows better.

Then they don’t have to use -v, duh.

Then they can pay attention to the progress bar in the GUI, or use -v if they like looking at logs or watching paint dry.

[screenshot: GUI progress bar]

The confusion arises when the user assumes files were uploaded when the backup job is aborted.

I’m not disputing prune logging behaviour here - AFAIC, it deletes snapshots and chunks in the wrong order anyway, so the whole logging situation is moot. Prune also doesn’t deal with files, so its verbosity is especially irrelevant as far as what the user can do with that info.

Most processes don’t have a log. If you know a process belonging to a backup application is consuming CPU, and you kill it just because, that’s pretty dumb yes.

If you need to shut down the computer, need the bandwidth, or whatever, feel free to kill it - but you don’t need logs for that. Even with -v, that doesn’t tell you ‘how long it’ll take’, because the complete index of changed files isn’t logged beforehand (unless using -d).

So unless you want massively bloated logs, they’re of no help to the user. Besides, never have I seen Duplicacy ‘stuck’ - while consuming CPU - so this dreamed up scenario makes no sense.

Correct. And if they want to know more, they’re free to use -v. The default shall be clutter-free logs.

Same with similar such tools. You don’t get a list of files being transferred with Rclone either, without using -v. Remember, the file report at the end in Duplicacy doesn’t come from -v, it comes from -stats.

Ah, that’s what I was missing. Then yes, it’s a correct behaviour, nothing to fix.

Still, none of the duplicacy flags result in a list of files processed being logged, not even -d. That’s a separate issue then.

demo
% touch meow{20..30}{10..50}.txt
% duplicacy -d backup -stats
Storage set to /tmp/2
Chunk read levels: [1], write level: 1
Compression level: 100
Average chunk size: 4194304
Maximum chunk size: 16777216
Minimum chunk size: 1048576
Chunk seed: 6475706c6963616379
Hash key: 6475706c6963616379
ID key: 6475706c6963616379
top: /tmp/1, quick: true, tag:
Downloading latest revision for snapshot ddd
Listing revisions for snapshot ddd
Downloaded file snapshots/ddd/6
Last backup at revision 6 found
Chunk ba1e216d4b7fd6d034019fd6ac7cea35c63227b10ce71a516e1359d0576386e2 has been loaded from the snapshot cache
Chunk 3bd1332085ddf729fee230ee55b3772a034aceae063b274c97b0f32d5f35178d has been loaded from the snapshot cache
Chunk ba1e216d4b7fd6d034019fd6ac7cea35c63227b10ce71a516e1359d0576386e2 has been loaded from the snapshot cache
Indexing /tmp/1
Parsing filter file /tmp/1/.duplicacy/filters
There are 0 compiled regular expressions stored
Loaded 0 include/exclude pattern(s)
Listing
Chunk 15f383e593b8bf660e7844fd94d3a9b184099cb3ffc46b36690454a0dc826bf1 has been downloaded
Skipped metadata chunk ba1e216d4b7fd6d034019fd6ac7cea35c63227b10ce71a516e1359d0576386e2 in cache
Skipped metadata chunk 3bd1332085ddf729fee230ee55b3772a034aceae063b274c97b0f32d5f35178d in cache
Chunk 0aae60bebc8f838b9b509e08bf750902e2f5b8fb9340d74a6e2d897ef9a4dc00 has been uploaded
Uploaded file snapshots/ddd/7
Chunk ba1e216d4b7fd6d034019fd6ac7cea35c63227b10ce71a516e1359d0576386e2 has been loaded from the snapshot cache
Listing snapshots/
Listing snapshots/ddd/
Delete cached snapshot file ddd/6 not found in the storage
Listing chunks/
Listing chunks/ba/
Listing chunks/8d/
Listing chunks/7e/
Listing chunks/78/
Listing chunks/45/
Listing chunks/3b/
Listing chunks/15/
Delete chunk 15f383e593b8bf660e7844fd94d3a9b184099cb3ffc46b36690454a0dc826bf1 from the snapshot cache
Backup for /tmp/1 at revision 7 completed
Files: 1414 total, 0 bytes; 0 new, 0 bytes
File chunks: 0 total, 0 bytes; 0 new, 0 bytes, 0 bytes uploaded
Metadata chunks: 3 total, 38K bytes; 1 new, 38K bytes, 7K bytes uploaded
All chunks: 3 total, 38K bytes; 1 new, 38K bytes, 7K bytes uploaded
Total running time: 00:00:01
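(As an aside, the touch command at the top of the demo expands via shell brace expansion to 11 × 41 = 451 empty files; a quick sanity check of that arithmetic:)

```python
# Mirror the shell brace expansion meow{20..30}{10..50}.txt
first = range(20, 31)   # {20..30} -> 11 values
second = range(10, 51)  # {10..50} -> 41 values
names = [f"meow{a}{b}.txt" for a in first for b in second]
print(len(names))
```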

How do I see files that should have been picked up?

Good point. Agreed. But it can be mitigated by correct wording. “Processed” as opposed to “Uploaded”

Well, I kill it if I notice it (and duplicacy can’t throttle, so it always goes full speed and is therefore very noticeable) and I can’t easily see that it isn’t stuck.

Agree in general, but these are different circumstances. The WebUI provides a UI to show logs from the duplicacy CLI, and therefore the expectation is that it will contain the current status. If the logs are empty for three hours – the app is stuck. Or has a logging bug.

I don’t share the faith in the infallibility of duplicacy. If it sits there chewing through CPU with no indication of progress (well, the progress bar you posted is indeed helpful), it gets murdered.

To go deeper into the woods – I would send a SIGINFO to the process (Control+T on FreeBSD and macOS) and expect current status. Duplicacy ignores SIGINFO. So how am I supposed to know it’s not stuck?
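For reference, wiring up on-demand status reporting is only a few lines in most languages. A Python sketch using SIGUSR1 (SIGINFO itself exists only on BSD/macOS; the handler, counters, and messages here are all hypothetical, not anything Duplicacy implements):

```python
import os
import signal
import time

status = {"files_done": 0}
reports = []

def report(signum, frame):
    # Print current progress whenever the user asks for it.
    msg = f"processed {status['files_done']} files so far"
    reports.append(msg)
    print(msg)

signal.signal(signal.SIGUSR1, report)

for _ in range(3):
    status["files_done"] += 1  # stand-in for real backup work

os.kill(os.getpid(), signal.SIGUSR1)  # simulate the user's status request
time.sleep(0.1)  # give the handler a chance to run
```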

If I’m very persistent I would run spindump (or whatever the analogue is on Linux) on the process, out of curiosity, to see what’s going on, but I would not expect most users to bother.

Either way, the progress bit of it seems to be addressed in the web UI via a progress bar. I don’t know how well it tracks actual progress, but it’s there.

Aren’t these empty files? TBH I’m not sure if -d is supposed to include -v log lines though…

I mean that’s certainly a wise stance for any app, but there’s no evidence it’s ever got stuck while consuming CPU. I’ve seen it hang on connections (rare) but never eating cycles.

The GUI has this also:

[screenshot]

It does it surprisingly well, though I’m not clear on how, exactly…

The one issue I sometimes have with the GUI is that it doesn’t tail the log consistently while the job’s running. Most times it tracks, other times it’s an empty loading page til the backup finishes. :slight_smile:

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.