ECH: Move nodes off allocator doc updated #1619

eedugon · 2025-06-05T09:59:53Z

As described in #1527, this PR promotes a knowledge base article into our existing doc, at the request of @kunisen and the support team.

Preview:

Understanding node moves and system maintenance

Changes:

Title updated
Email notifications section fixed (it wasn't valid).
Content of mentioned KB integrated into the doc.

Links to existing KB:

Closes #1527

eedugon · 2025-06-05T12:18:36Z

@kunisen, about the comment you shared:

> But one thing I just noticed, is maybe we could add a "Frequently Asked Questions (FAQs)" sub heading in the page so that readers can understand we included a bunch of FAQs.

The headings are already in "Q&A" format, but I wasn't sure that was the right approach, and I wanted to double-check it with other docs folks.

I agree that if the headings stay in this Q&A format, a "Frequently Asked Questions" heading would make complete sense, but maybe we should rewrite the headings in a different format.

cc: @shainaraskas , what would you say?

troubleshoot/monitoring/node-moves-outages.md

jakommo

Left a few small comments, but other than that LGTM!

troubleshoot/monitoring/node-moves-outages.md

shainaraskas · 2025-06-05T15:35:28Z

> The headings are already in "Q&A" format, but I wasn't sure that was the right approach, and I wanted to double-check it with other docs folks.
>
> I agree that if the headings stay in this Q&A format, a "Frequently Asked Questions" heading would make complete sense, but maybe we should rewrite the headings in a different format.

this isn't really in our style (reasoning) and could be reworked

a couple of them should be removed (e.g. the support CTA), or integrated into the doc ("Could such a system maintenance be avoided or skipped?" should just be introductory information about why this happens and its inevitability)

some could be pulled into an "Availability during system maintenance" section and perhaps "Data loss risk for non-HA deployments"

some of them could be reworded ("How can I be notified when a node is changed?" > "Notifications for moved or changed nodes" [more task-based]).

I do think that if we want to keep these together, they do need a heading of their own so they're not nested below "Possible causes and impact"


eedugon · 2025-06-05T19:48:57Z

@shainaraskas: I'll rework this to avoid the FAQ style while keeping all the key points we want to communicate to users. Thanks a lot for your feedback!

kunisen · 2025-06-06T05:55:22Z

Thanks for being patient and all the help! 🙏

[1]

I made a bunch of updates based on internal ticket comments - https://github.com/elastic/support-tech-lead/issues/1576#issuecomment-2948156720.

Here's the preview:
https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/1619/troubleshoot/monitoring/node-moves-outages

[2]

@eedugon I totally get what you and @shainaraskas said above #1619 (comment). Please feel free to make any updates from docs perspective based on your writing standard.

I still added an FAQ heading because, without it, the page is logically unbalanced and not ready to be merged.
I know it doesn't meet the docs criteria, but if reorganizing the wording takes a long time or a lot of effort, the page is technically and logically ready to merge, so we could merge first and improve the wording afterward.

Again, please feel free to make your change even including the removal of that one.

[3]

Also, I believe it's technically clear now, so there's no need to discuss anything further internally. But if anything is still technically unclear, or unclear regarding expectations, let's still discuss it internally ha :)

troubleshoot/monitoring/node-moves-outages.md

eedugon · 2025-06-10T09:43:51Z

@shainaraskas: I've worked through your suggestions and removed the FAQ style. I'm pretty happy with the outcome and the final sections and subsections. Let me know your thoughts.

I also updated some minor paragraphs and added a couple of introductory sentences that felt needed (mainly in performance considerations during system maintenance).

The content is 90% similar to the KB article but I think it reads better and it's organized by topic more than by questions.

@kunisen , please share your thoughts too!

kunisen · 2025-06-10T11:48:37Z

Thanks @eedugon, looks nice from my side: https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/1619/troubleshoot/monitoring/node-moves-outages#why-data-loss-can-occur-even-with-multiple-zones. I'm still a bit unfamiliar with this non-FAQ approach, but let's try it.

Some small things:

Availability during node vacate

WDYT we say "Service availability during node vacate"

Why data loss can occur even with multiple zones

WDYT we say "Data loss risk without replica shards"?

If you are good with it, then I am good to merge :)

eedugon · 2025-06-10T14:54:53Z

@kunisen, very good suggestions. Next time, feel free to add them directly in the code (as suggestions) and we can discuss them there.

I've applied the changes, thanks a lot!

kunisen · 2025-06-11T12:19:35Z

Thanks @eedugon, indeed, I will use suggestions next time. 🙏

@shainaraskas could you kindly help us double check if we are good to go please?
Once we merge the public doc PR, I will tweak our KB a little to align it with the public doc.

Then I think we should be good to go :)

shainaraskas · 2025-06-11T18:05:20Z

troubleshoot/monitoring/node-moves-outages.md


-**What is the impact?**
+This document explains the "`Move nodes off of allocator...`" message that appears on the [activity page](../../deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) in {{ech}} deployments, helping you understand its meaning, implications, and what to expect.


suggest splitting this apart so the error message is in its own codeblock and the full text is present. just put [allocatorname] or something as a placeholder

shainaraskas · 2025-06-11T18:06:01Z

troubleshoot/monitoring/node-moves-outages.md


-During the routine system maintenance, having replicas and multiple availability zones ensures minimal interruption to your service. When nodes are vacated, as long as you have high availability, all search and indexing requests are expected to work within the reduced capacity until the node is back to normal.
+![Move nodes off allocator](images/move_nodes_ech_allocator.jpeg)


I don't think this screenshot adds value if we share the entire error message on the page. it's also very small and hard to read so I'd prefer to skip it


shainaraskas · 2025-06-11T18:18:52Z

troubleshoot/monitoring/node-moves-outages.md

+::::{admonition} Availability zones and performance
+Increasing the number of zones should not be used to add more resources. The concept of zones is meant for High Availability (2 zones) and Fault Tolerance (3 zones), but neither will work if the cluster relies on the resources from those zones to be operational.
+
+The recommendation is to **scale up the resources within a single zone until the cluster can take the full load (add some buffer to be prepared for a peak of requests)**, then scale out by adding additional zones depending on your requirements: 2 zones for High Availability, 3 zones for Fault Tolerance.
 ::::
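The sizing reasoning in the admonition above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the abstract capacity units, and the 20% buffer are hypothetical and are not part of this PR or of any Elastic API.

```python
def zone_handles_peak(per_zone_capacity: float, peak_load: float,
                      buffer: float = 0.2) -> bool:
    """Return True if a single zone can carry the peak load plus a buffer.

    The 20% default buffer is an illustrative assumption, not a documented value.
    """
    return per_zone_capacity >= peak_load * (1 + buffer)


def survives_zone_loss(per_zone_capacity: float, zones: int,
                       peak_load: float) -> bool:
    """Return True if the remaining zones can still carry the peak load
    after one zone becomes unavailable (for example, during a node vacate)."""
    if zones < 1:
        raise ValueError("zones must be at least 1")
    remaining_capacity = (zones - 1) * per_zone_capacity
    return remaining_capacity >= peak_load


# Two undersized zones: the cluster depends on both zones to serve the load,
# so losing one zone leaves it short of capacity.
print(survives_zone_loss(per_zone_capacity=8, zones=2, peak_load=10))

# Scale up within a zone first, then add a second zone for availability.
print(zone_handles_peak(per_zone_capacity=13, peak_load=10))
print(survives_zone_loss(per_zone_capacity=13, zones=2, peak_load=10))
```

The first check fails because the two zones only cover the load together, which is the situation the admonition warns about. Once a single zone is sized for the full load, adding a second zone provides redundancy rather than extra capacity.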


be careful about repeating info that's elsewhere - this concept is something we should probably use a snippet for

you should also avoid bolding and brackets generally. "high availability" and "fault tolerance" also do not need Title Case.

"the recommendation is" is not an ideal sentence structure. Try "You should [blank]"

shainaraskas · 2025-06-11T18:20:10Z

troubleshoot/monitoring/node-moves-outages.md

+
+1. Enable [Stack monitoring](/deploy-manage/monitor/stack-monitoring/ece-ech-stack-monitoring.md#enable-logging-and-monitoring-steps) (logs and metrics) on your deployment. Only metrics collection is required for these notifications to work.
+
+In the deployment used as the destination of Stack monitoring:


this needs to be integrated into the list of steps. this should be step 2 and steps 2-4 should be made children


github-actions · 2025-06-12T08:46:13Z

🔍 Preview links for changed docs:

troubleshoot/monitoring/node-moves-outages.md

🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.

move nodes doc updated

4e92abc

github-actions bot deployed to docs-preview June 5, 2025 10:00

title update

7aa65ab

github-actions bot deployed to docs-preview June 5, 2025 10:06

eedugon marked this pull request as ready for review June 5, 2025 10:09

eedugon requested review from a team as code owners June 5, 2025 10:09

alert name updated

f75d235

github-actions bot deployed to docs-preview June 5, 2025 10:14

jakommo reviewed Jun 5, 2025

troubleshoot/monitoring/node-moves-outages.md

jakommo reviewed Jun 5, 2025

troubleshoot/monitoring/node-moves-outages.md

jakommo reviewed Jun 5, 2025

troubleshoot/monitoring/node-moves-outages.md

jakommo approved these changes Jun 5, 2025

stefnestor reviewed Jun 5, 2025

troubleshoot/monitoring/node-moves-outages.md

stefnestor reviewed Jun 5, 2025

troubleshoot/monitoring/node-moves-outages.md

stefnestor reviewed Jun 5, 2025

troubleshoot/monitoring/node-moves-outages.md

stefnestor reviewed Jun 5, 2025

troubleshoot/monitoring/node-moves-outages.md

Apply suggestions from code review

92ea67a

Co-authored-by: Stef Nestor

github-actions bot deployed to docs-preview June 5, 2025 19:32

applying other suggestions by reviewers

1f43345

github-actions bot deployed to docs-preview June 5, 2025 19:45

Update node-moves-outages.md

785a285

github-actions bot deployed to docs-preview June 6, 2025 05:46

eedugon commented Jun 6, 2025

troubleshoot/monitoring/node-moves-outages.md

Update troubleshoot/monitoring/node-moves-outages.md

1ddf86c

github-actions bot had a problem deploying to docs-preview June 7, 2025 05:23 Failure

kunisen reviewed Jun 7, 2025

troubleshoot/monitoring/node-moves-outages.md

Update troubleshoot/monitoring/node-moves-outages.md

2df3c2d

github-actions bot deployed to docs-preview June 7, 2025 05:25

FAQ style removed and minor introductory paragraphs

f4e9240

github-actions bot deployed to docs-preview June 10, 2025 09:42

eedugon requested a review from shainaraskas June 10, 2025 09:44

titles updated per Kuni suggestion

bb9f4f0

github-actions bot deployed to docs-preview June 10, 2025 14:53

Merge branch 'main' into ech_node_moves_troubleshoot

ea69dea

github-actions bot deployed to docs-preview June 11, 2025 09:31

shainaraskas reviewed Jun 11, 2025

Apply suggestions from code review

c6a4d23

Co-authored-by: shainaraskas

github-actions bot deployed to docs-preview June 12, 2025 08:46

