
Conversation

@JEETDESAI25
Collaborator

What kind of change does this PR introduce?

Adds retry handling for network settings updates.

What is the current behavior?

Closes #239. Applying supabase_settings immediately after supabase_project frequently fails with HTTP 500 (error adding pooler tenant…) because the pooler tenant isn’t fully provisioned yet; users must rerun manually.

What is the new behavior?

  • updateNetworkConfig now wraps the API call in retry.RetryContext, retrying transient 500s for up to 5 minutes with debug logging on each retry (see the sketch below).
  • Added TestAccSettingsResource_NetworkRetry, which mocks a 500→201 flow to validate the retry path end-to-end.

Additional context

The retry window is five minutes to give the pooler tenant time to finish provisioning; the issue noted that one-minute sleeps were no longer sufficient, and other Terraform providers use 2–5 minute waits for similar propagation delays.
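
For illustration, a minimal sketch of that retry wrapper follows. The updateWithRetry helper and its doUpdate callback are hypothetical stand-ins for the provider's actual settings call; retry and tflog are the terraform-plugin-sdk and terraform-plugin-log packages.

package provider

import (
	"context"
	"fmt"
	"net/http"
	"time"

	"github.com/hashicorp/terraform-plugin-log/tflog"
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/retry"
)

// networkRetryTimeout mirrors the five-minute window described above.
const networkRetryTimeout = 5 * time.Minute

// updateWithRetry retries the supplied update call while the API keeps
// returning HTTP 500, and stops on success or on any other failure.
func updateWithRetry(ctx context.Context, doUpdate func(context.Context) (int, error)) error {
	return retry.RetryContext(ctx, networkRetryTimeout, func() *retry.RetryError {
		status, err := doUpdate(ctx)
		if err != nil {
			return retry.NonRetryableError(err)
		}
		switch status {
		case http.StatusOK, http.StatusCreated:
			return nil
		case http.StatusInternalServerError:
			// The pooler tenant may still be provisioning; log and retry.
			tflog.Debug(ctx, "network settings update returned 500, retrying")
			return retry.RetryableError(fmt.Errorf("transient HTTP 500"))
		default:
			return retry.NonRetryableError(fmt.Errorf("unexpected HTTP status %d", status))
		}
	})
}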

@JEETDESAI25 JEETDESAI25 requested a review from a team as a code owner December 4, 2025 01:57
@savme savme self-requested a review December 10, 2025 07:41
Collaborator

@savme savme left a comment

Thanks so much for the PR, @JEETDESAI25! This definitely solves the core issue. I think we should tweak the approach slightly, because retrying on any 500 might capture unrelated errors we probably don't want to retry.

We might be better off verifying that the project is in the ACTIVE state before moving forward with any related resources (in both createProject and updateProject). That way we'd avoid broad retries and still solve the underlying problem.

Happy to chat through the details if you'd like - just let me know!

@JEETDESAI25
Collaborator Author

@savme Thanks for the review! You're right about the blanket 500 retry; it could hide unrelated errors. My plan is to add a waitForProjectActive helper that polls GET /v1/projects/{ref} until the status is ACTIVE_HEALTHY or ACTIVE_UNHEALTHY, call it at the end of createProject and after updateInstanceSize, and then remove the retry from updateNetworkConfig. That matches the current 5-minute window but makes the readiness check explicit. Does this sound good?

@savme
Collaborator

savme commented Dec 12, 2025

Yeah, this is pretty much exactly what I was thinking @JEETDESAI25 👍

@JEETDESAI25 JEETDESAI25 force-pushed the feat/issue-239-network-retry branch from 43c93fd to e9e5fbf Compare December 12, 2025 22:33
@JEETDESAI25
Collaborator Author

Thank you for guiding me; I really appreciate your feedback. Please let me know if there's anything else I can adjust. @savme

const projectActiveTimeout = 5 * time.Minute

func waitForProjectActive(ctx context.Context, projectRef string, client *api.ClientWithResponses) diag.Diagnostics {
	err := retry.RetryContext(ctx, projectActiveTimeout, func() *retry.RetryError {
Collaborator

I'm wondering if the more specific WaitForStateContext would work better here?

Collaborator Author

We now use StateChangeConf.WaitForStateContext with explicit Pending/Target states. I also moved the helper to utils.go so both project_resource and settings_resource can share it.
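
In sketch form, that pattern looks roughly like this; the V1GetProjectWithResponse method and the COMINGUP constant are assumed names following the generated API's conventions, not necessarily the exact diff:

// Assumes the imports from the sketch above plus the generated api package.
func waitForProjectActive(ctx context.Context, projectRef string, client *api.ClientWithResponses) error {
	stateConf := &retry.StateChangeConf{
		Pending: []string{string(api.V1ProjectWithDatabaseResponseStatusCOMINGUP)},
		Target: []string{
			string(api.V1ProjectWithDatabaseResponseStatusACTIVEHEALTHY),
			string(api.V1ProjectWithDatabaseResponseStatusACTIVEUNHEALTHY),
		},
		Timeout: projectActiveTimeout,
		Refresh: func() (interface{}, string, error) {
			httpResp, err := client.V1GetProjectWithResponse(ctx, projectRef)
			if err != nil {
				return nil, "", err
			}
			if httpResp.JSON200 == nil {
				return nil, "", fmt.Errorf("unexpected status %s fetching project", httpResp.Status())
			}
			return httpResp.JSON200, string(httpResp.JSON200.Status), nil
		},
	}
	_, err := stateConf.WaitForStateContext(ctx)
	return err
}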

})

switch status {
case api.V1ProjectWithDatabaseResponseStatusACTIVEHEALTHY, api.V1ProjectWithDatabaseResponseStatusACTIVEUNHEALTHY:
Collaborator

Have you come across a project being in an unhealthy state during testing?

I’m not 100% sure, but I assume that subsequent updates to an unhealthy project would be rejected. If that’s the case, we probably shouldn’t treat this status as a successful update.

Collaborator Author

You're right: an unhealthy project might reject subsequent updates. I changed the helper to target only ACTIVE_HEALTHY; ACTIVE_UNHEALTHY now stays in the Pending list, so the wait keeps polling until the project becomes fully healthy.
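
The resulting split, roughly (the COMINGUP constant is again an assumed name):

// Only a fully healthy project counts as ready; an active-but-unhealthy
// project keeps the wait polling until it recovers or the timeout fires.
var (
	projectPendingStates = []string{
		string(api.V1ProjectWithDatabaseResponseStatusCOMINGUP),
		string(api.V1ProjectWithDatabaseResponseStatusACTIVEUNHEALTHY),
	}
	projectTargetStates = []string{
		string(api.V1ProjectWithDatabaseResponseStatusACTIVEHEALTHY),
	}
)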

switch status {
case api.V1ProjectWithDatabaseResponseStatusACTIVEHEALTHY, api.V1ProjectWithDatabaseResponseStatusACTIVEUNHEALTHY:
	return nil
case api.V1ProjectWithDatabaseResponseStatusINITFAILED, api.V1ProjectWithDatabaseResponseStatusREMOVED:
Collaborator

Suggested change
- case api.V1ProjectWithDatabaseResponseStatusINITFAILED, api.V1ProjectWithDatabaseResponseStatusREMOVED:
+ case api.V1ProjectWithDatabaseResponseStatusINITFAILED, api.V1ProjectWithDatabaseResponseStatusREMOVED, api.V1ProjectWithDatabaseResponseStatusGOINGDOWN:

Collaborator Author

Done. I added GOING_DOWN and also included INACTIVE, PAUSE_FAILED, and RESTORE_FAILED, since these are terminal states that require operator intervention.

var knownProjectStatuses = map[api.V1ProjectWithDatabaseResponseStatus]bool{
	// Target
	api.V1ProjectWithDatabaseResponseStatusACTIVEHEALTHY: true,
	// Pending
Collaborator

let's move this to a separate array for easier reuse
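
For instance, a shared slice along these lines; the INACTIVE, PAUSEFAILED, and RESTOREFAILED constant spellings are assumed to follow the enum pattern above:

// Terminal statuses shared by project_resource and settings_resource;
// a project in any of these needs operator intervention to recover.
var projectTerminalStatuses = []api.V1ProjectWithDatabaseResponseStatus{
	api.V1ProjectWithDatabaseResponseStatusINITFAILED,
	api.V1ProjectWithDatabaseResponseStatusREMOVED,
	api.V1ProjectWithDatabaseResponseStatusGOINGDOWN,
	api.V1ProjectWithDatabaseResponseStatusINACTIVE,
	api.V1ProjectWithDatabaseResponseStatusPAUSEFAILED,
	api.V1ProjectWithDatabaseResponseStatusRESTOREFAILED,
}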


const projectActiveTimeout = 5 * time.Minute

const statusUnknownTransient = "UNKNOWN_TRANSIENT"
Collaborator

we can remove this custom status

})

switch httpResp.JSON200.Status {
case api.V1ProjectWithDatabaseResponseStatusGOINGDOWN,
Collaborator

replace with a check using terminal states array
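
With a terminal-states slice in place, the refresh check might become (using the standard library slices package, Go 1.21+):

// Fail fast when the project has entered a terminal status.
if slices.Contains(projectTerminalStatuses, httpResp.JSON200.Status) {
	return nil, "", fmt.Errorf("project %s is in terminal status %q", projectRef, httpResp.JSON200.Status)
}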

Comment on lines +98 to +105
if !knownProjectStatuses[httpResp.JSON200.Status] {
	tflog.Warn(ctx, "Unrecognized project status, treating as transient", map[string]interface{}{
		"project_ref": projectRef,
		"status":      status,
	})
	return httpResp.JSON200, statusUnknownTransient, nil
}

Collaborator

Suggested change
- if !knownProjectStatuses[httpResp.JSON200.Status] {
- 	tflog.Warn(ctx, "Unrecognized project status, treating as transient", map[string]interface{}{
- 		"project_ref": projectRef,
- 		"status":      status,
- 	})
- 	return httpResp.JSON200, statusUnknownTransient, nil
- }

we can assume all statuses returned by the API have a corresponding enum value


Development

Successfully merging this pull request may close these issues.

Creating supabase_settings with network settings directly after supabase_project almost always fails

3 participants