Duplicacy should respect 429 responses (Dropbox)

bhigh · 2 April 2020 23:27

Please describe what you are doing to trigger the bug:
Backup or copy with multiple threads

Please describe what you expect to happen (but doesn’t):
The backup or copy should succeed.

Please describe what actually happens (the wrong behaviour):
The backup fails with a “too many write operations” error.

Dropbox will respond with a 429 response code and a Retry-After header. Sleeping for that many seconds and retrying will allow the operation to succeed.
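A minimal sketch of the behaviour requested above, not Duplicacy's actual code: it turns a Retry-After value into a wait time. The function name `parseRetryAfter` and the fallback parameter are assumptions made for illustration. The header is normally a whole number of seconds, but HTTP also allows a date, so both forms are handled.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// parseRetryAfter converts a Retry-After header value into a wait duration.
// The value is normally a whole number of seconds; HTTP also allows a date.
// fallback is returned when the value is missing, negative, or unparseable.
// (Hypothetical helper, not part of Duplicacy or the Dropbox SDK.)
func parseRetryAfter(value string, fallback time.Duration) time.Duration {
	value = strings.TrimSpace(value)
	if value == "" {
		return fallback
	}
	if secs, err := strconv.Atoi(value); err == nil {
		if secs < 0 {
			return fallback
		}
		return time.Duration(secs) * time.Second
	}
	if when, err := http.ParseTime(value); err == nil {
		if d := time.Until(when); d > 0 {
			return d
		}
		return 0 // the date is already in the past, retry immediately
	}
	return fallback
}

func main() {
	h := http.Header{}
	h.Set("Retry-After", "15")
	fmt.Println(parseRetryAfter(h.Get("Retry-After"), 500*time.Millisecond)) // 15s
	fmt.Println(parseRetryAfter("", 500*time.Millisecond))                   // 500ms
}
```

A caller would sleep for the returned duration after a 429 and then retry the same request.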

bhigh · 5 April 2020 03:11

I’ve sent a pull request with a very simple fix. I’ve been using it for a few days and it will pause and resume for 429 errors.

It could be better when working with multiple threads by disallowing simultaneous writers, but for now it will prevent a Too Many Requests error from causing a failure.
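One possible way to approach the multi-thread improvement mentioned above is a pause shared by all upload goroutines, so that a 429 seen by one thread holds back the others as well. This is only a sketch under that assumption; `backoffGate` and its methods are hypothetical names, not something in the pull request.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// backoffGate is a hypothetical helper shared by all upload goroutines.
// When any goroutine sees a 429 it calls Pause, and every goroutine calls
// Wait before its next request.
type backoffGate struct {
	mu    sync.Mutex
	until time.Time
}

// Pause extends the shared deadline by d from now; it never shortens it.
func (g *backoffGate) Pause(d time.Duration) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if t := time.Now().Add(d); t.After(g.until) {
		g.until = t
	}
}

// Remaining reports how long callers should still wait (0 if the gate is open).
func (g *backoffGate) Remaining() time.Duration {
	g.mu.Lock()
	defer g.mu.Unlock()
	if d := time.Until(g.until); d > 0 {
		return d
	}
	return 0
}

// Wait blocks until the shared deadline has passed.
func (g *backoffGate) Wait() {
	for {
		d := g.Remaining()
		if d == 0 {
			return
		}
		time.Sleep(d)
	}
}

func main() {
	gate := &backoffGate{}
	start := time.Now()
	gate.Pause(50 * time.Millisecond) // pretend one thread just received a 429

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			gate.Wait() // every thread checks the gate before uploading
			fmt.Printf("thread %d proceeds after %v\n", id, time.Since(start).Round(10*time.Millisecond))
		}(i)
	}
	wg.Wait()
}
```

This does not serialize writers; it only makes the threads back off together instead of each discovering the rate limit on its own.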

zaheer22 · 1 December 2021 00:48

I recently came upon this software and set up Dropbox. I noticed this still seems to be an issue almost 2 years later. When I set Dropbox threads to 4 in the web GUI, the initial backup kept failing until I dropped it all the way down to 2.

I saw many log entries like this:

2021-11-29 15:24:19.230 INFO DROPBOX_RETRY POST content.dropboxapi.c**/2/files/upload returned 429; retrying after 0.50 seconds
2021-11-29 15:24:22.762 INFO DROPBOX_RETRY POST content.dropboxapi.c**/2/files/upload returned 429; retrying after 0.50 seconds

As well as this:

content.dropboxapi.c**/2/files/upload: dial tcp: lookup content.dropboxapi.com: no such host; retrying after 0.75 seconds

It seems like the second error appears after several hundred of the RETRY attempts. This seems to be a logic error of some kind, as 2 threads does seem to work.

zaheer22 · 1 December 2021 00:58

I would also note that the “no such host” error seems to appear as a consequence of not respecting the 429. When I initially looked up this error it seemed to be DNS related, but I tested several DNS servers, all of which resolved this address successfully, and I checked the DNS logs to confirm the lookups were indeed successful. Once I dropped the threads to two, no such errors occurred. Are there any plans to get this issue resolved?

saspus · 1 December 2021 01:20

OneDrive and Dropbox are very sensitive to API load; they should be used with 1 thread. This avoids these issues altogether.

zaheer22 · 1 December 2021 01:26

I was able to work successfully with 2 threads. Is this a known design issue? Or is it documented anywhere that 1 thread is a limitation?

saspus · 1 December 2021 01:42

Technically, using those file sharing/collaboration services for bulk storage is abuse: these services are not optimized for this access pattern, and design decisions there were driven by totally different requirements. Whether the API rate limiting is done out of necessity to maintain quality of service or as a deterrent does not matter; we can’t control that.

When you access dedicated bulk storage services (Amazon S3 for example) you usually pay for api calls separately. Make a lot of calls — pay a lot. With *box/ *Drive services api calls are free — and there is no incentive for the company to optimize for this nonstandard use-case to let you get more of free stuff. You already pay them fixed cost — why help you use more resources? As long as their client works correctly — they are good.

Some services are more tolerant than others. From my personal observation, both the OneDrive and Dropbox clients talk to their services in one thread, and both services happen to be very picky when more are used. 2 may be OK, but why push it? I’m glad that single-thread API calls aren’t throttled further.

Google drive seems to work fine with 4, I did not test more; but I still set it to one, as a courtesy.

bhigh · 21 December 2021 23:10

I haven’t looked at the code since April 2020 so things might have changed.

The dial tcp: lookup content.dropboxapi.com: no such host error is probably caused by running out of file handles. It looks like the response body is not being closed when a 429 or 5xx response is encountered. I’ll send a pull request with a fix this week.
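To illustrate the kind of leak described above, here is a sketch of a retry loop that drains and closes the body of every response it discards. It is not the Duplicacy code or the promised fix. `getWithRetry`, `fakeTransport` and `countingBody` are hypothetical names, the transport is a stand-in so the example runs without network access, and a GET is used for simplicity even though the real upload is a POST.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// countingBody counts Close calls so the demo can show nothing is leaked.
type countingBody struct {
	io.Reader
	closed *int
}

func (b countingBody) Close() error { *b.closed++; return nil }

// fakeTransport answers 429 for the first `failures` requests, then 200.
// It stands in for the real network so the example is self-contained.
type fakeTransport struct {
	failures, calls, closed int
}

func (t *fakeTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.calls++
	status := http.StatusOK
	if t.calls <= t.failures {
		status = http.StatusTooManyRequests
	}
	return &http.Response{
		StatusCode: status,
		Header:     http.Header{},
		Body:       countingBody{Reader: strings.NewReader("body"), closed: &t.closed},
		Request:    req,
	}, nil
}

// getWithRetry retries on 429 and 5xx. Every response it throws away is
// drained and closed, so its connection (a file handle) is released.
func getWithRetry(c *http.Client, url string, maxAttempts int) (*http.Response, error) {
	for attempt := 1; ; attempt++ {
		resp, err := c.Get(url)
		if err != nil {
			return nil, err
		}
		retryable := resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500
		if !retryable || attempt >= maxAttempts {
			return resp, nil // the caller is responsible for closing this one
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
		// A real client would sleep here (for example per Retry-After).
	}
}

// demo runs one retried request and reports status, requests made, bodies closed.
func demo(failures, maxAttempts int) (status, calls, closed int) {
	ft := &fakeTransport{failures: failures}
	resp, err := getWithRetry(&http.Client{Transport: ft}, "http://example.invalid/upload", maxAttempts)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return resp.StatusCode, ft.calls, ft.closed
}

func main() {
	fmt.Println(demo(3, 5)) // 200 4 4
}
```

With a real transport, each unclosed body keeps its connection open, so a few hundred unclosed 429 responses could plausibly exhaust the process's file descriptors.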

bhigh · 21 December 2021 23:26

I worked as an SRE supporting Magic Pocket at Dropbox, and I can tell you that it’s much better suited for bulk storage than for lots of read access to limited sets of files.

A paid Dropbox account is not subject to API rate limits, and it’s not that expensive for what it gets you.

I may look into using batch uploads with DBX which should address rate limit concerns in a majority of cases. That will also require me to add it to the client, and I haven’t been using golang much in the last 18 months.

saspus · 22 December 2021 22:29

This is actually a great point, since this is an in-house developed storage system it could have been optimized for any use; however it’s hard to imagine that when the compromise needed to be made in the dropbox api implementation the preference would not be given to the path more aligned to the intended use, as opposed to S3 replacement.

In other words, I don’t doubt that Magic Pocket is a very efficient block storage service; however, the dropbox API that is exposed to the user is not a block storage api; it’s a file sharing and collaboration API, optimized for that, as opposed to serving raw block storage.

Did this change recently (in the last 2-3 years)? Or maybe this only applies to business/team account? I had a paid storage upgrade on my Dropbox account and it was throttling just the same.

That would be really useful! Another useful usecase is speeding up files enumeration (which is part of all workflows)

The benefit of go is that it’s very learnable. I’ve been developing in C++ most of my life, and when I had to write a small utility I picked Go just for fun, and it’s been awesome. It’s a rather delightful language to use: you don’t have to fight it, and instead it helps you.

bhigh · 24 December 2021 01:29

MP stores data in 4MiB blocks. (4MiB was chosen since that was the average size of a smartphone picture at the time.) Files that are larger are split, and smaller files are tail-packed until it reaches 4MiB. This is done in a staging area where the more costly operations are not performed. Blocks are written to multiple spindles across machines to improve durability.

These blobs are compressed and MD5 and SHA1 hashed, and the hashes are used for deduplication. Then the blocks are put into a larger container and written to disk. The filesystem overlay and metadata are stored separately. The ingestion pipeline has a few other steps to maintain durability without bottlenecking.
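As a rough illustration of hash-based deduplication in general (not Dropbox's implementation), the sketch below stores each block under its digest, so identical blocks are kept only once. The `blockStore` type is hypothetical, and only SHA-1 is used here to keep the example short.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// blockStore is a toy content-addressed store: the key of a block is its
// SHA-1 digest, so writing the same bytes twice stores them only once.
type blockStore struct {
	blocks map[[sha1.Size]byte][]byte
}

func newBlockStore() *blockStore {
	return &blockStore{blocks: make(map[[sha1.Size]byte][]byte)}
}

// Put stores data under its digest and reports whether it was new.
func (s *blockStore) Put(data []byte) (key [sha1.Size]byte, stored bool) {
	key = sha1.Sum(data)
	if _, ok := s.blocks[key]; ok {
		return key, false // duplicate: nothing new is written
	}
	s.blocks[key] = append([]byte(nil), data...) // keep a private copy
	return key, true
}

// Len returns the number of unique blocks held.
func (s *blockStore) Len() int { return len(s.blocks) }

func main() {
	s := newBlockStore()
	for _, b := range []string{"photo", "photo", "document", "photo"} {
		key, stored := s.Put([]byte(b))
		fmt.Printf("%-8s %x stored=%v\n", b, key[:4], stored)
	}
	fmt.Println("unique blocks:", s.Len()) // unique blocks: 2
}
```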

I don’t remember if the tail-packed blocks are compressed and hashed as part of the 4MiB block, or if each fragment gets its own.

Since the data is deduped at the block level there’s not much cost (to DBX) if users upload lots of copies of the same thing. The downside of this is that after a block has gone through the ingestion pipeline it is only stored on one physical spindle in two different geographic regions. There is (or was) no facility for migrating busy blocks from erasure coding to multiple copies to improve read performance. There is some caching in the access pipeline from the user-facing API to the backend, but its hit rate is limited by its small size relative to the volume of data in the system.

There is a migration to cold storage that happens after a period of time, where the data is split and stored in two locations with an XOR in a third. See “How we optimized Magic Pocket for cold storage” (Dropbox).
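The split-plus-XOR idea can be sketched in a few lines. This shows only the general technique, with hypothetical helper names; the real system's fragment layout is not described in this thread. The data is cut into two halves, a third fragment holds their XOR, and any one missing fragment can be rebuilt from the other two.

```go
package main

import "fmt"

// xorBytes returns the byte-wise XOR of two equal-length slices.
func xorBytes(x, y []byte) []byte {
	out := make([]byte, len(x))
	for i := range x {
		out[i] = x[i] ^ y[i]
	}
	return out
}

// splitWithParity cuts data into two equal-length halves (the second is
// zero-padded when the length is odd) and returns them with their XOR.
func splitWithParity(data []byte) (a, b, parity []byte) {
	half := (len(data) + 1) / 2
	a = make([]byte, half)
	b = make([]byte, half)
	copy(a, data[:half])
	copy(b, data[half:])
	return a, b, xorBytes(a, b)
}

func main() {
	data := []byte("cold block")
	_, b, parity := splitWithParity(data)

	// Pretend the location holding the first half is unavailable:
	// it can be rebuilt from the second half and the parity fragment.
	recovered := xorBytes(b, parity)
	joined := append(recovered, b...)
	fmt.Printf("%q\n", joined[:len(data)]) // "cold block"
}
```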

This is all from memory, but there’s a good write-up at “Inside the Magic Pocket” (Dropbox), although details are glossed over. As of March 2020 they were using LRC erasure coding rather than Reed-Solomon. I think it was 12-2-2 (12 data disks in two groups of 6; each group has a single local parity disk, with two global parity disks covering the 12 data disks). There was some investigation into a migration to 14-2-2, but I don’t know if that ever took place.

If you look at the documentation for /upload_session/append, you’ll recognize this: concurrent upload sessions must use a 4MiB offset, except for the last piece.
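A small sketch of what that constraint means for a client: the file is cut into pieces that start on 4MiB boundaries, and only the final piece may be shorter. The `piece` type and `pieces` function are hypothetical and not part of any Dropbox SDK.

```go
package main

import "fmt"

// blockSize is the 4 MiB alignment discussed above.
const blockSize = 4 << 20

// piece describes one chunk of a file to send in an upload session.
type piece struct {
	Offset int64
	Length int64
}

// pieces splits a file of the given size into chunks that start on 4 MiB
// boundaries; only the last chunk may be shorter than 4 MiB.
func pieces(size int64) []piece {
	var out []piece
	for off := int64(0); off < size; off += blockSize {
		n := int64(blockSize)
		if size-off < n {
			n = size - off
		}
		out = append(out, piece{Offset: off, Length: n})
	}
	return out
}

func main() {
	// A 10 MiB file yields two full pieces and one 2 MiB tail.
	for _, p := range pieces(10 << 20) {
		fmt.Printf("offset=%d length=%d\n", p.Offset, p.Length)
	}
}
```

For a 10 MiB file this produces offsets 0, 4194304 and 8388608 with lengths 4194304, 4194304 and 2097152.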

the dropbox API that is exposed to the user is not a block storage api

It is an object store under the covers and if you take the implementation into account you can use it as such, you just need to keep your object size <= 4MiB. I was trying to build an object store interface for internal use but didn’t get much traction. Due to the deduplication and lack of spindle diversity it wouldn’t have had acceptable performance.

Or maybe this only applies to business/team account

I had nothing to do with the business side of things. Looking at the site, the Standard and Advanced business accounts get 1 billion api calls/mo, but you can pay for more too.

Another useful usecase is speeding up files enumeration

The existing client already uses /list_folder and /list_folder/continue for reading directories.

/get_file_metadata/batch might speed up fetching file info, but I think the implementation of GetFileInfo will need to be updated to allow for batching. Fetching metadata is probably another significant cause of 429 errors.

The benefit of go is that it’s very learnable

It’s not about learning it. I actually landed diffs in the internal version of /x/sync/singleflight while at Google, specifically adding the dups member to call, since I needed to know how many of our calls were being deduplicated. It’s small, but it’s something! That was in 2015 or 2016, before the library was made public.

I’ve been using a lot of Python asyncio lately, which isn’t terribly different in concept, but the syntax differences trip me up. I find it easier to switch between Python and C++.

Icydog · 15 March 2023 05:05

The dial tcp: lookup content.dropboxapi.com: no such host error is probably caused by running out of file handles. It looks like the response body is not being closed when a 429 or 5xx response is encountered. I’ll send a pull request with a fix this week.

Ah, I wish you had gotten around to that. Would have saved me a day! Cannot “check -chunks” — socket: too many open files - #2 by Icydog