Skip to content

Conversation

@reltuk
Copy link
Contributor

@reltuk reltuk commented Jun 3, 2025

The clone code works by listing the remote table files and downloading them into the local table file store. When the remote is a remoteapi implementation, like a DoltHub repository, this resulting in listing the remote table files and using URLs to fetch each of them.

The URLs returned from these APIs can expire and they need to be refreshed. This refresh can happen in two ways:

  1. There is explicit support in the TableFileSource representation returned by the API to include a mechanism to refresh it. DoltHub uses this, and the Dolt client will make use of that support to refresh expired URLs.

  2. The heavy handed approach is to list the table files again and use the newly returned URLs.

The Clone code has explicit support for doing (2), and it is necessary for remoteapi implementations with expiring URLs but without explicit RefreshTableFileUrl support. dolt itself, when running a remote as part of sql-server for example, does not implement RefreshTableFileUrl support, and so the re-list support is still necessary.

This PR changes the Clone implementation so that, on a retry, it makes all the newly returned table file sources available for the next try, but it keeps the old sources around if they no longer come back from ListTableFiles. In this way, we get strictly more robust behavior than before.

The downside is that, when the remote file is actually gone, the Clone code will continue attempting to download it until it reaches a terminal download failure. This change in behavior is not as disruptive as the current behavior, and so we make this new trade off for now.

…emoteapi implementations when the remote Conjoins.

The clone code works by listing the remote table files and downloading them into the local table file store. When the remote is a remoteapi implementation, like a DoltHub repository, this resulting in listing the remote table files and using URLs to fetch each of them.

The URLs returned from these APIs can expire and they need to be refreshed. This refresh can happen in two ways:

1) There is explicit support in the TableFileSource representation returned by the API to include a mechanism to refresh it. DoltHub uses this, and the Dolt client will make use of that support to refresh expired URLs.
2) The heavy handed approach is to list the table files again and use the newly returned URLs.

The Clone code has explicit support for doing #2, and it is necessary for remoteapi implementations with expiring URLs but without explicit RefreshTableFileUrl support. dolt itself, when running a remote as part of sql-server for example, does not implement RefreshTableFileUrl support, and so the re-list support is still necessary.

This PR changes the Clone implementation so that, on a retry, it makes all the newly returned table file sources available for the next try, but it keeps the old sources around if they no longer come back from ListTableFiles. In this way, we get strictly more robust behavior than before.

The downside is that, when the remote file is actually gone, the Clone code will continue attempting to download it until it reaches a terminal download failure. This change in behavior is not as disruptive as the current behavior, and so we make this new trade off for now.
Copy link
Contributor

@coffeegoddd coffeegoddd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@coffeegoddd
Copy link
Contributor

@reltuk DOLT

comparing_percentages
100.000000 to 100.000000
version result total
c5a5ec8 ok 5937457
version total_tests
c5a5ec8 5937457
correctness_percentage
100.0

@reltuk reltuk merged commit 9850933 into main Jun 4, 2025
21 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants