
unify Combine and CombineTypes #2651


Open: wants to merge 29 commits into main


@winitzki (Collaborator) commented Mar 13, 2025

Allow the /\ operator to work on record types, with exactly the same functionality as //\\.

This PR follows up on discussions elsewhere:

dhall-lang/dhall-lang#1079 (comment)
dhall-lang/dhall-lang#1378 (comment)

This change keeps every type-correct program type-correct, and it also makes some new programs type-correct.

Examples:

{ a : Bool } /\ { b : Bool }
evaluates to
{ a : Bool, b : Bool }

{ a = { b : Natural }, a = { c : Natural } }
evaluates to
{ a = { b : Natural, c : Natural } }

{ a : { b : Bool } } /\ { a : { c : Bool } }
evaluates to
{ a : { b : Bool, c : Bool } }
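To make the intended semantics of these examples concrete, here is a minimal sketch of the recursive merge on a toy model of record types. The `Ty` type and the `combineRecordTypes` function are hypothetical and are not part of dhall-haskell; the sketch assumes the `containers` package and only models field names plus a couple of base types.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical toy model of Dhall record types, for illustration only;
-- this is not dhall-haskell's internal Expr/Val representation.
data Ty
  = TBool
  | TNatural
  | TRecord (Map String Ty)
  deriving (Eq, Show)

-- | Recursively merge two record types, the way //\\ does and the way /\
-- would on record types under this proposal. A field that appears on both
-- sides must be a record type on both sides; otherwise the result is the
-- path to the colliding field.
combineRecordTypes
  :: [String]                 -- path to the records being merged (for errors)
  -> Map String Ty
  -> Map String Ty
  -> Either [String] (Map String Ty)
combineRecordTypes path ls rs =
  sequenceA (Map.unionWithKey merge (fmap Right ls) (fmap Right rs))
  where
    -- Shared field that is a record on both sides: merge it recursively.
    merge k (Right (TRecord l)) (Right (TRecord r)) =
      TRecord <$> combineRecordTypes (path ++ [k]) l r
    -- Any other shared field is a collision.
    merge k _ _ = Left (path ++ [k])

-- Example (the first example above):
--   combineRecordTypes [] (Map.fromList [("a", TBool)]) (Map.fromList [("b", TBool)])
--     == Right (Map.fromList [("a", TBool), ("b", TBool)])
```

Note that a shared non-record field is a collision even when both sides have the same type, e.g. merging { a : { b : Bool } } with itself fails at the path a.b.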

None of these expressions type-checks with the current Dhall implementation, because /\ is only supposed to work on record values, not on record types. However, there seems to be no good reason to have two different operators for merging record values and merging record types.

Recent discussions have moved toward giving Dhall similar facilities for record values and record types. As long as there is no ambiguity between the two, we can let the same operators (dot, //, with) work on both record values and record types.

  • The Combine operation (/\) should work on record types just like the CombineTypes operation (//\\)
  • Typechecking must work
  • Beta-normalization must work
  • New regression tests (not sure how to organize that without breaking things)

Any comments are welcome; I am fairly new to this code base and to Haskell.

Open questions:

  • Currently the /\ operator is inserted automatically when a record literal has duplicate fields. For example, { a = { x = 1 }, a = { y = 2 } } is translated into { a = { x = 1 } /\ { y = 2 } } at the parsing stage and then reduced to { a = { x = 1, y = 2 } } at the evaluation stage. Do we want the same behavior for record types? At the moment, { a : { x : Natural }, a : { y : Bool } } is a type error ("duplicate field: a"). Is there any reason not to rewrite it as { a : { x : Natural } //\\ { y : Bool } }, the way it is rewritten for record values? Why is this permitted for record values, and what was the rationale? Perhaps it is related to the next question:
  • Currently the nested record definition is available for record values but not for record types. For example, { a.b = 1 } is equivalent to { a = { b = 1 }} but { a.b : Natural } is not equivalent to { a = { b : Natural } }, and in fact { a.b : Natural } is not allowed (it is a parse error). I see that there is an ambiguity: { a.b : Natural } could mean { a : { b : Natural } } or { a = { b : Natural } }. Perhaps we do not want to support the syntax { a.b : Natural } at all, because of that ambiguity?
  • Should we deprecate //\\ in the future and eventually remove it from the language, given that after this PR /\ reproduces its entire functionality?
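The desugaring described in the first open question can be sketched as follows. `Expr`, `Var`, and `desugarDuplicateFields` are hypothetical names used only for illustration (the sketch assumes the `containers` package) and do not reflect dhall-haskell's actual parser code. Passing `CombineTypes` instead of `Combine` shows what the analogous rewrite for record types would look like.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical, minimal expression type for illustration only
-- (not dhall-haskell's Expr).
data Expr
  = Var String
  | Combine Expr Expr       -- models  l /\ r
  | CombineTypes Expr Expr  -- models  l //\\ r
  deriving (Eq, Show)

-- | Collapse duplicate fields of a record into one field per key, joining
-- the duplicated values with the given operator in source order, so that
-- { a = x, a = y } becomes { a = x /\ y }.
desugarDuplicateFields
  :: (Expr -> Expr -> Expr)   -- Combine for record literals; CombineTypes
                              -- would be the analogous rewrite for record types
  -> [(String, Expr)]
  -> Map String Expr
desugarDuplicateFields combine =
  -- Map.fromListWith passes (new value, old value), so flip restores source order.
  Map.fromListWith (flip combine)
```

With three duplicates the result nests to the left: { a = x, a = y, a = z } becomes { a = (x /\ y) /\ z } in this sketch; since the merge is meant to be associative, the nesting direction should not affect the final normal form.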

@mmhat (Collaborator) left a comment

@winitzki I am sympathetic to the change, but I think we should change the standard first. Otherwise we risk this Dhall implementation diverging from it, which may result in confusing errors: for example, dhall <<< '{foo : Text} /\ {bar : Text}' would work, but the same expression would yield an error in another, yet standard-compliant, implementation.

    VRecord xRs' ->
        return xRs'
-- Make sure both are on the Left (both record values) or on the Right (both record types).
rightTypeOrRecord <- case (leftTypeOrRecord, _R', r') of
Collaborator

I think it would be better to first construct leftTypeOrRecord, then rightTypeOrRecord, and do the check that they are both Left or both Right afterwards in a separate step.

Collaborator Author

> I think it would be better to first construct leftTypeOrRecord, then rightTypeOrRecord, and do the check that they are both Left or both Right afterwards in a separate step.

I'm not sure I understand your comment. It appears to me that my code already does what you say: it first constructs leftTypeOrRecord, then rightTypeOrRecord, and then checks that they are both Left or both Right in a separate expression.

@mmhat (Collaborator), Mar 18, 2025

What I mean is that you match on leftTypeOrRecord when you construct rightTypeOrRecord.
I think something like the following results in better error messages:

let isTypeOrRecord t = do
        _T <- loop ctx t

        let t' = eval values t

        case (_T, t') of
                -- The type of t is a record type, so t is a record value.
                (VRecord xs, _) -> return (Left xs)

                -- The type of t is a constant (e.g. Type) and t evaluates to a record type.
                (VConst _T', VRecord xs) -> return (Right (_T', xs))

                -- Neither a record value nor a record type.
                _ -> do
                    let _T'' = quote names _T
                    let t'' = quote names t'

                    case mk of
                        Nothing -> die (MustCombineARecord '∧' t'' _T'')
                        Just k  -> die (InvalidDuplicateField k t _T'')

leftTypeOrRecord <- isTypeOrRecord l
rightTypeOrRecord <- isTypeOrRecord r

-- Only now compare the two classifications.
case (leftTypeOrRecord, rightTypeOrRecord) of
    (Left ..., Left ...) -> ...
    (Right ..., Right ...) -> ...
    (Left ..., Right ...) -> die (TriedToCombineLitWithType ...)
    (Right ..., Left ...) -> die (TriedToCombineTypeWithLit ...)
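A self-contained toy version of this suggested structure: each operand is classified independently, and only then are the two results compared. All types and error constructors below are hypothetical stand-ins (the real code would use dhall-haskell's `Val`, `loop`, `eval`, and `die`), and `Type` is modeled by a single `VConst`.

```haskell
-- Hypothetical toy stand-ins for the type checker's values.
data Val
  = VConst                       -- stands in for `Type`, the type of types
  | VNatural                     -- the type `Natural`
  | VNaturalLit Integer          -- a Natural literal such as 123
  | VRecord [(String, Val)]      -- a record type, e.g. { a : Natural }
  | VRecordLit [(String, Val)]   -- a record literal, e.g. { a = 1 }
  deriving (Eq, Show)

data Operand
  = RecordValue [(String, Val)]  -- a record literal (carries its field types)
  | RecordType [(String, Val)]   -- a record type (carries its field types)
  deriving (Eq, Show)

data Mode = CombineValues | CombineRecordTypes
  deriving (Eq, Show)

-- Hypothetical error names, echoing the ones proposed in the review.
data CombineError
  = MustCombineARecord Val
  | TriedToCombineLitWithType
  | TriedToCombineTypeWithLit
  deriving (Eq, Show)

-- Step 1: classify one operand on its own, given (its type, its value).
classify :: (Val, Val) -> Either CombineError Operand
classify (VRecord fieldTypes, _)      = Right (RecordValue fieldTypes) -- its type is a record type
classify (VConst, VRecord fieldTypes) = Right (RecordType fieldTypes)  -- it is itself a record type
classify (_, value)                   = Left (MustCombineARecord value)

-- Step 2: only after both operands are classified, compare the results.
checkCombine :: (Val, Val) -> (Val, Val) -> Either CombineError Mode
checkCombine l r = do
  lo <- classify l
  ro <- classify r
  case (lo, ro) of
    (RecordValue _, RecordValue _) -> Right CombineValues
    (RecordType _, RecordType _)   -> Right CombineRecordTypes
    (RecordValue _, RecordType _)  -> Left TriedToCombineLitWithType
    (RecordType _, RecordValue _)  -> Left TriedToCombineTypeWithLit
```

Separating the two steps means a "not a record" error is reported for whichever operand is wrong, while a literal/type mismatch gets its own dedicated error.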

@winitzki (Collaborator Author)

> @winitzki I am sympathetic to the change, but I think we should change the standard first. Otherwise we risk this Dhall implementation diverging from it, which may result in confusing errors: for example, dhall <<< '{foo : Text} /\ {bar : Text}' would work, but the same expression would yield an error in another, yet standard-compliant, implementation.

I will make a PR for the standard.

@winitzki (Collaborator Author)

I have started working on a PR for the language standard: dhall-lang/dhall-lang#1404. Many new tests still need to be added.

@winitzki (Collaborator Author)

I added new tests to dhall-lang/dhall-lang#1404.

@mmhat My question is: how can I run the dhall-haskell tests together with the new tests that I have added to dhall-lang in that PR?

@TristanCacqueray (Collaborator)

@winitzki you'll have to update the submodule at dhall-haskell/dhall/dhall-lang. You can try locally by pulling the PR in this folder, but I think we'll have to merge the change in dhall-lang before updating the submodule in dhall-haskell.

@winitzki (Collaborator Author)

> @winitzki you'll have to update the submodule at dhall-haskell/dhall/dhall-lang. You can try locally by pulling the PR in this folder, but I think we'll have to merge the change in dhall-lang before updating the submodule in dhall-haskell.

I see. Let me first try to run tests by manually copying dhall-lang over.

@winitzki (Collaborator Author)

@mmhat I have rewritten the code in that place. I no longer use the extra Left/Right, and the code is a lot simpler now. Could you please take a look and let me know if there are further improvements, especially concerning error messages?

@winitzki requested a review from @mmhat on March 21, 2025 at 09:38

@winitzki (Collaborator Author)

@mmhat I have run this PR against all regression tests from dhall-lang/dhall-lang#1404, and all of them passed (I made sure the new tests were actually being run). So, at the moment this looks good.

@winitzki (Collaborator Author) commented May 7, 2025

The error is due to the httpbin service being unavailable:

    Error: Server temporarily unavailable

    HTTP status code: 503

    URL: https://httpbin.org/user-agent

I previously had the idea of standing up temporary local servers during tests, so that the tests no longer depend on external services being available.

This applies both to httpbin and to the random string service.

@mmhat (Collaborator) left a comment

Do you think we can refactor the code base so that the cases for Combine and CombineTypes use the same code path? After all, the former is supposed to be an alias for the latter when it is used with record types.

@@ -845,7 +833,45 @@ infer typer = loop

return (VRecord xTs)

combineTypes [] xLs' xRs'
let combineTypesCheck xs xLs₀' xRs₀' = do
Collaborator

Maybe we can pull this out of loop and use it in both the Combine and CombineTypes cases?
Also, can we rename this to something more meaningful like checkForFieldCollisions?
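A sketch of what a standalone checkForFieldCollisions helper might look like once pulled out of loop. It is written against a hypothetical toy `Ty` type rather than dhall-haskell's `Val`, and assumes the `containers` package. The idea is that both the Combine case (applied to the field types of two record values) and the CombineTypes case (applied to two record types) could call the same function.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NonEmpty
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical toy model of record types (not dhall-haskell's Val).
data Ty = TBool | TNatural | TRecord (Map String Ty)
  deriving (Eq, Show)

-- | Shared collision check (name as suggested in the review). Succeeds when
-- every field present in both record types is itself a record type on both
-- sides, recursively; otherwise returns the full path of the first collision.
-- The path is accumulated in reverse order while recursing.
checkForFieldCollisions
  :: [String] -> Map String Ty -> Map String Ty -> Either (NonEmpty String) ()
checkForFieldCollisions path ls rs =
  mapM_ check (Map.toList (Map.intersectionWith (,) ls rs))
  where
    -- Shared field that is a record on both sides: keep checking inside it.
    check (k, (TRecord l, TRecord r)) = checkForFieldCollisions (k : path) l r
    -- Any other shared field collides.
    check (k, _)                      = Left (NonEmpty.reverse (k :| path))
```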

Collaborator Author

Will do.

-- If both sides of `Combine` are record terms, we use combineTypes to figure out the resulting type.
-- If both sides are record types, we use combineTypesCheck and then return the upper bound of the two types.
-- Otherwise there is a type error.
case (_L', l', _R', r') of
Collaborator

Previously we threw a CombineTypesRequiresRecordType error when combining types. Do you think it's feasible to retain that behaviour, i.e. throw that error when we combine types and one of the other errors when we combine values?

@winitzki (Collaborator Author), Jun 19, 2025

Let's think about what error behavior we need. The new operator /\ should report an error if it is applied to a record term and a non-record term; or to a record type and a non-record type; or to a term and a type.

  • { a = 0 } /\ 123 should give the error "Must combine records, instead got 123".
  • { a : Bool } /\ Natural should give the error "CombineTypes requires a record type, instead got Natural".
  • { a : Bool } /\ { b = 0 }: what should the error be? "CombineTypes requires a record type, but instead got a record term { b = 0 }", or "Must combine records, but instead got a record type { a : Bool }"? I think we can simply assume that the first operand is correct and the second one is wrong, and report "CombineTypes requires a record type, but instead got a record term { b = 0 }".

I will work on this tomorrow. It's definitely feasible to implement detailed error messages here.

Collaborator

For the last case, maybe something like:

The combine operator `/\` works either on two record literals or two record types. You provided expressions
  * `{a : Bool}`, which is a record type, and
  * `{b = 0}`, which is a record literal of type `{b : Natural }`.

Also, for the other cases we probably want a similar error message, including when neither argument is a record term or a record type, for example 1 /\ 2.
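The message proposed above could be produced by a small pure function. This is a hypothetical sketch: `Operand`, `Kind`, and `mismatchMessage` are not dhall-haskell names, and operands are represented by their source text and rendered type rather than by real `Expr` values. It also covers the 1 /\ 2 case, where neither operand is a record.

```haskell
-- Hypothetical description of one operand of /\, for illustration only.
data Kind = RecordLiteral | RecordType | NotARecord
  deriving (Eq, Show)

data Operand = Operand
  { operandText :: String  -- the operand as written, e.g. "{b = 0}"
  , operandKind :: Kind
  , operandType :: String  -- its inferred type, rendered as text
  }

-- Describe one operand for the bullet list in the message.
describe :: Operand -> String
describe (Operand txt RecordType _) =
  "`" ++ txt ++ "`, which is a record type"
describe (Operand txt RecordLiteral ty) =
  "`" ++ txt ++ "`, which is a record literal of type `" ++ ty ++ "`"
describe (Operand txt NotARecord ty) =
  "`" ++ txt ++ "`, which is neither a record literal nor a record type (its type is `" ++ ty ++ "`)"

-- | Nothing when both operands are record literals or both are record types;
-- otherwise the message proposed in the review, listing both operands.
mismatchMessage :: Operand -> Operand -> Maybe String
mismatchMessage l r
  | operandKind l == operandKind r && operandKind l /= NotARecord = Nothing
  | otherwise =
      Just $ unlines
        [ "The combine operator `/\\` works either on two record literals or two record types. You provided expressions"
        , "  * " ++ describe l ++ ", and"
        , "  * " ++ describe r ++ "."
        ]
```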

Collaborator Author

I think we need unit tests for all those cases. I don't see any unit tests.

Right now, an incorrect use of /\ produces a CombineTypesRequiresRecordType error whose message talks about //\\ instead. I'll work on fixing this now.

Collaborator

Yes, that would be great! Thank you for working on this!

@winitzki (Collaborator Author), Jun 20, 2025

I have changed the error messages, making them more detailed. I also added some unit tests to verify that those error messages are actually being generated. For instance, the tests verify that errors for the /\ operator describe examples involving /\, while errors for the //\\ operator describe examples with //\\.
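One way to make the messages mention the operator the user actually wrote is to carry the operator inside the error value and render its symbol from there. The sketch below is hypothetical (`Operator`, `RecordTypeError`, and `render` are illustrative names, not dhall-haskell's error types); a unit test can then compare the rendered text for each operator.

```haskell
-- Hypothetical: the two operators whose type errors are rendered here.
data Operator = CombineOp | CombineTypesOp
  deriving (Eq, Show)

-- ASCII spelling of each operator (the backslashes are escaped in Haskell strings).
symbol :: Operator -> String
symbol CombineOp      = "/\\"
symbol CombineTypesOp = "//\\\\"

-- Hypothetical error: an operand that must be a record type was not one.
data RecordTypeError = RequiresRecordType Operator String

-- Render the error so that both the wording and the example use the
-- operator the user actually wrote.
render :: RecordTypeError -> String
render (RequiresRecordType op got) =
  unlines
    [ "Error: " ++ symbol op ++ " requires a record type, instead got: " ++ got
    , "For example, this is valid: { a : Bool } " ++ symbol op ++ " { b : Natural }"
    ]
```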
