go: store/datas/pull: clone.go: Improve robustness of Clone for certain remoteapi implementations when the remote Conjoins. #9306
The clone code works by listing the remote table files and downloading them into the local table file store. When the remote is a remoteapi implementation, like a DoltHub repository, this results in listing the remote table files and using URLs to fetch each of them.
The URLs returned from these APIs can expire, and they then need to be refreshed. This refresh can happen in two ways:
1. There is explicit support in the TableFileSource representation returned by the API to include a mechanism to refresh it. DoltHub uses this, and the Dolt client will make use of that support to refresh expired URLs.
2. The heavy-handed approach is to list the table files again and use the newly returned URLs.
The Clone code has explicit support for the second approach, re-listing the table files, and it is necessary for remoteapi implementations with expiring URLs but without explicit RefreshTableFileUrl support. dolt itself, when running a remote as part of sql-server for example, does not implement RefreshTableFileUrl support, and so the re-list support is still necessary.
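The two refresh paths described above can be sketched roughly as follows. This is a minimal illustration, not the code in clone.go: `fileSource`, its `Refresh` field, `freshURL`, and the `relist` callback are all hypothetical names standing in for the real TableFileSource and ListTableFiles machinery.

```go
package main

import (
	"errors"
	"fmt"
)

// fileSource is a hypothetical stand-in for a remote table file entry.
// Refresh is nil when the remote has no explicit URL refresh support.
type fileSource struct {
	FileID  string
	URL     string
	Refresh func() (string, error)
}

// freshURL returns a usable URL for src after its current URL has expired.
// It prefers the explicit refresh hook and falls back to re-listing.
func freshURL(src fileSource, relist func() (map[string]string, error)) (string, error) {
	if src.Refresh != nil {
		// Path 1: the source carries its own refresh mechanism.
		return src.Refresh()
	}
	// Path 2: heavy-handed fallback, list everything again.
	urls, err := relist()
	if err != nil {
		return "", err
	}
	u, ok := urls[src.FileID]
	if !ok {
		return "", errors.New("file " + src.FileID + " missing from new listing")
	}
	return u, nil
}

func main() {
	// Hypothetical re-list result, keyed by file ID.
	relist := func() (map[string]string, error) {
		return map[string]string{"abc": "https://example.invalid/abc?sig=new"}, nil
	}
	withHook := fileSource{FileID: "abc", Refresh: func() (string, error) {
		return "https://example.invalid/abc?sig=refreshed", nil
	}}
	withoutHook := fileSource{FileID: "abc"}

	fmt.Println(freshURL(withHook, relist))
	fmt.Println(freshURL(withoutHook, relist))
}
```

Note that the fallback path in this sketch fails when the file ID is absent from the new listing, which is the situation that arises when the remote Conjoins between listings.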
This PR changes the Clone implementation so that, on a retry, it makes all the newly returned table file sources available for the next try, but it keeps the old sources around if they no longer come back from ListTableFiles. In this way, we get strictly more robust behavior than before.
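The retry behavior described here amounts to a union of the old and new listings, with new entries taking precedence. The sketch below assumes sources are keyed by a file ID; `tableFileSource` and `mergeSources` are hypothetical names for illustration and are not taken from the actual change.

```go
package main

import "fmt"

// tableFileSource is a hypothetical, simplified stand-in for the real type.
type tableFileSource struct {
	FileID string
	URL    string
}

// mergeSources returns the sources to use on the next attempt.
// Every freshly listed source replaces its older counterpart, and any
// previously known source that is absent from the new listing is kept.
func mergeSources(known map[string]tableFileSource, listed []tableFileSource) map[string]tableFileSource {
	merged := make(map[string]tableFileSource, len(known)+len(listed))
	for id, src := range known {
		merged[id] = src
	}
	for _, src := range listed {
		merged[src.FileID] = src
	}
	return merged
}

func main() {
	known := map[string]tableFileSource{
		"a": {FileID: "a", URL: "old-a"},
		"b": {FileID: "b", URL: "old-b"},
	}
	// On retry the remote has conjoined: "b" is gone and "c" is new.
	listed := []tableFileSource{
		{FileID: "a", URL: "new-a"},
		{FileID: "c", URL: "new-c"},
	}
	merged := mergeSources(known, listed)
	for _, id := range []string{"a", "b", "c"} {
		fmt.Println(id, merged[id].URL)
	}
}
```

Because nothing is ever dropped from the merged set, a retry can only see more sources than the previous attempt, never fewer.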
The downside is that, when the remote file is actually gone, the Clone code will continue attempting to download it until it reaches a terminal download failure. This is less disruptive than the current behavior, and so we accept this new trade-off for now.