Improving diff3 conflict quality -- reducing the prevalence of nested conflicts with recursive merges #1855

newren · 2025-01-21T16:41:35Z

Nested conflicts are confusing and painful. We can't entirely get rid of them, but we can avoid them when:

the nested conflict would have been found in the merge base (due to a recursive merge that merges the merge bases)
there were exactly two merge bases

As a reminder, when we have more than one merge base, we merge the merge bases to get a virtual merge base (instead of just picking one of the merge bases), because we want as much information in our virtual merge base as possible. Getting as much information as possible into the virtual merge base reduces conflicts and reduces surprises in the resolution of the outer merge. Also as a reminder, when we merge the merge bases and there isn't a clean resolution, we need to pick something that is unlikely to match either side (otherwise the outer merge can accidentally cleanly merge without conflict to the side not picked for the virtual merge base).

The idea for reducing nested conflicts in recursive merges is somewhat simple: We currently do that by choosing to use a conflicted region with contents from both sides, but an alternative is to simply take the version from the merge base of the merge bases.

In fact, there is prior art for this. For binary conflicts, modify/delete conflicts, submodule conflicts, symlink conflicts, and perhaps others, there isn't a way to insert a "conflicted text region" for the resolution, so these conflict types simply take the merge base (of the merge bases) as the resolution. This series simply extends this logic of using the merge base (of the merge bases) to individual text conflict regions as well.

This is a change that has been discussed a few times before, but only with an untested (and not-quite-correct) patch and incomplete rationale; see below for links. The implementation of this change is pretty small and straightforward, but my rationale in the second patch is pretty lengthy, in part to try to explain why it's not just a valid way to merge, but why it's specifically expected to generally be superior to using conflicted regions for the virtual merge base.

There is one part of these patches that feels suboptimal to me -- it only applies when there are two merge bases; not when there are more than two. That could be viewed as a limitation of the current implementation, but it seems somewhat tied to our pairwise merging of merge bases (see the second commit message for more details about why that limitation arises), and coming up with an alternative N-way merge of merge-bases seems a little unpractical. Having exactly two merge bases, though, is very common and

Naturally, this change also required modifying the two testcases in our testsuite that explicitly tested what the expected conflicts were, since now the "base" version for those conflicts will be simpler.

== Previous discussions ==

I originally posted this idea at
https://lore.kernel.org/git/[email protected]/
but I didn't have a good framework for evaluating whether it was a good idea in general; that took me a while. In the mean time, Hannes in a later thread tested it out (or at least the original buggy version I posted with a small compile fix) and found it
gave good results for his case:
https://lore.kernel.org/git/[email protected]/
mhagger also expressed some interest in the idea when I talked with him about it at Git Merge '22:
https://lore.kernel.org/git/CABPp-BFzOs7e61JZocjW0=kZYms74uqhzyqPhAL3ZDi84EwQ5Q@mail.gmail.com/

================== BEFORE SENDING UPSTREAM ======================

The paragraph about the 2 merge base requirement doesn't make any sense. I no longer know what test case I was using where I discovered that...or whether there even was a real test case. Is the requirement even real?

Similar to XDL_MERGE_FAVOR_{OURS,THEIRS} add a new mode for favoring the base version in the event of a conflict, and add a --base flag to merge-file to allow testing this out. This will be used in a subsequent commit to reduce the prevalence of nested conflict markers which traditionally arise when using the diff3 merge.conflictStyle due to having conflicts in the virtual merge base. By eschewing using conflicted regions as the resolution for the virtual merge base, and instead preferring to use the version of the text from the merge base of the merge bases, we can avoid causing nested conflicts in the outer merge. Signed-off-by: Elijah Newren <[email protected]>

This is a pretty small code change, but one that perhaps deserves a lengthy explanation... When merging, it is possible to have nested conflicts. This most frequently happens when using merge.conflictStyle=diff3 (or zdiff3) and doing so in a case where there is more than one merge base. For example: L1---L2 / \ / \ B X ? \ / \ / R1---R2 Here on branches L and R there are many commits omitted, but L1 and R1 are both valid merge bases for a merge of L2 and R2. This reason we end up with two valid merge bases is because we have both a merge from L into R and a merge from R into L (each merge occurring before or at L2 and R2, respectively). When merging L2 and R2 using the diff3 conflict style, today you might get a conflict of the form: Non-conflicting leading content <<<<<<< e11e11e1 (First line of commit message of L2) L2:conflicting region ||||||| merged common ancestors <<<<<<<<< Temporary merge branch 1 L1:conflicting region ||||||||| ba5eba11 B:conflicting region ========= R1:conflicting region ======= R2:conflicting region >>>>>>> 52525252 (First line of commit message of R2) Non-conflicting trailing content where "COMMIT: conflicting region" above stands for several (or even hundreds) of lines of content from the (relevant file of) the relevant commit. You could get another layer of nesting here, if you found that there was more than one merge base of the merge bases. In fact, the number of layers of nesting is not limited. In effect, the higher the depth of recursion needed for merging, the more the "base" version in the diff3 output expands. Reports over the years suggest the presence of nested conflicts diminish the value of having the base version available; the greater the nesting (and perhaps also the longer the length of each region of lines when there is a nested conflict), the more diminished the value is. In fact, it might be preferable for these particular conflicts to have used merge.conflictStyle=merge instead, i.e. to provide 0 context lines for the base version, while still using merge.conflictStyle=diff3 for other cases that don't have conflicts between the merge bases. However, there is an alternative way to handle the recursive merges that would approximate merge.conflictStyle=merge as the number of nesting levels increases: resolve the merge of merge bases not by using a conflicted merge of the two merge bases, but by using their base version. This alternative strategy works because we have some latitude in how the virtual merge base is selected. Using the base version of the merge bases is something we have done before in specific contexts, and in each case doing so fixed actual bugs. For more details, see: 816147e (merge-recursive: add a bunch of FIXME comments documenting known bugs, 2021-03-20) -- particularly the cases where resolution for merge bases are wrong 4ef88fc (merge-ort: add handling for different types of files at same path, 2021-01-01) c73cda7 (merge-ort: copy and adapt merge_submodule() from merge-recursive.c, 2021-01-01) 62fdec1 (merge-ort: flesh out implementation of handle_content_merge(), 2021-01-01) ec61d14 (merge-recursive: Fix modify/delete resolution in the recursive case, 2011-08-11) a129d96 (Allow specifying specialized merge-backend per path., 2007-04-16) -- particularly the "common ancestor" comment and associated code If this explanation feels like "magic" to you, there's an alternative rules-based approach by which we can evaluate the choice of how to create a virtual merge base. We want any virtual merge base to follow these rules: Rule 1) If within a certain range of lines, all merge bases match each other, then use those lines from any of them in the virtual merge base. Rule 2) If within a certain range of lines, there is at most one version of those lines that does not match the merge base of the merge bases, then use that unique version of those lines in the virtual merge base. Rule 3) In lines of the file that disagree between two or more merge bases (and which also disagree with the base of the merge bases), fill those lines in the virtual merge base with something that matches none of the merge bases. The first two rules simply let us resolve cases that are clearly unambiguous. The third rule may look funny but is necessary to avoid the virtual merge base accidentally matching one of the two sides in the outer merge. (If the virtual merge base matches one of the two sides in the outer merge, the merge machinery will think that side of the outer merge made no change and thus that there is no conflict in the outer merge, despite the fact that the two sides of the outer merge may disagree with each other.) If we are using merge.conflictStyle=merge, then these three rules are sufficient; anything else we do will be irrelevant to the end result. In that case, we could even satisfy rule 3 by ignoring the conflicting lines and replacing them with totally random lines. However, for merge.conflictStyle=diff3, we want something that looks more like a "base version" of the relevant file. That gives us a goal for the virtual merge base: Goal 4) In lines of the file falling under rule 3, try to pick something that looks like a base version. For Goal 4, both merging the conflicted portions of the merge bases and taking the base of the merge bases satisfy this goal. Both have their plusses and minuses. But both become less and less useful when there is a deeply nested recursive merge. For a deeply nested recursive merge, the conflicted contents gives a highly nested conflict showing every version of the file going back to the eventual common point in history. In contrast, the "base of the merge bases" strategy instead only gives the single version of the file from that final common point in history. Since codebases tend to grow over time, odds are that the more deeply recursive the merge has to go, the smaller the context that will be provided with the "base of the merge bases" strategy. In the limit, the original version of the lines far enough back in history may have been empty, so the "base of the merge bases" strategy effectively makes recursive merges look like merge.conflictStyle=merge for deep recursions, while still providing some "base version" context for more shallow recursions. As noted near the beginning of this commit message, having something that approaches no context in the special cases of deep recursions is exactly what we'd prefer. So: Goal 5) In lines of the file falling under Rule 3, the more deep the recursion is, the less likely relevant context can be kept; prefer small (or even empty) context regions over very complicated ones. This all sounds great, but there is one gotcha -- since we iteratively merge the merge-bases pairwise, we don't have an easy way to distinguish between Rule 2 and Rule 3 at times. For example, if we have three merge bases and they all disagree on some line, the conflicted-content solution avoids an ambiguity, but taking the "base of the merge bases" introduces one. In particular, for this case of three merge bases that disagree on some line, merging the first two merge bases yields an interim virtual merge base that matches the base, making it look like the virtual merge base has not been modified relative to its base. Then when we merge the third merge base with the interim merge base, we'd think it cleanly resolved to that line from the third merge base, against our wishes. Since all three merge bases differed on that line, we'd want to use the base of the merge bases, but the pairwise merging made that difficult. To avoid this problem, only use the "base of the merge bases" strategy when we have two merge bases. That will limit where this new virtual merge base strategy will help us, but since two merge bases is the most common case for recursive merges, it should still provide significant benefit. Signed-off-by: Elijah Newren <[email protected]>

FIXME: Try to determine either better wording for why it is bad (and maybe an example), or explain why it's fine... Signed-off-by: Elijah Newren <[email protected]>

newren added 3 commits May 10, 2025 17:25

What if I use the strategy for >= 3 merge bases?

87a8a00

FIXME: Try to determine either better wording for why it is bad (and maybe an example), or explain why it's fine... Signed-off-by: Elijah Newren <[email protected]>

newren force-pushed the favor-base branch from 35331a2 to 87a8a00 Compare May 11, 2025 05:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improving diff3 conflict quality -- reducing the prevalence of nested conflicts with recursive merges #1855

Improving diff3 conflict quality -- reducing the prevalence of nested conflicts with recursive merges #1855

Uh oh!

newren commented Jan 21, 2025 •

edited

Loading

Uh oh!

Uh oh!

Improving diff3 conflict quality -- reducing the prevalence of nested conflicts with recursive merges #1855

Are you sure you want to change the base?

Improving diff3 conflict quality -- reducing the prevalence of nested conflicts with recursive merges #1855

Uh oh!

Conversation

newren commented Jan 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

newren commented Jan 21, 2025 •

edited

Loading