Skip to content

Commit 8e1a934

Browse files
sgrifweiznich
authored andcommitted
Replace NonAggregate with something less broken
This commit implements the proposal laid out at https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18. The changes made in this commit now correctly enforce semantics around the mixing of aggregate/non-aggregate expressions, and lays the foundations required for full `GROUP BY` support in the future. This commit does not implement `GROUP BY` as a feature in public API, though I have added some minimal support to prove the soundness of the change. Since this will likely be the largest breaking change in 2.0, I am going to include as much detail as I can about the problem, the reasoning behind the solution, and the likely impact of the change. Recap of the problem ---- `NonAggregate` was originally introduced in c66d96f. The goal was to prevent the following error at compile time: [local] sean@sean=# select count(*), name from users; ERROR: 42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function I didn't think about this nearly as much as I should have at the time, which resulted in a hilariously broken implementation that has prevented the addition of `group_by` support, and left bugs in the codebase for more than 4 years. At the time I was focused on "mixing aggregate and non-aggregate expressions in a select statement". Since select uses tuples to represent multiple values, I added a trait to represent non-aggregate values, added it as a bound for `impl Expression for AnyTuple`, and called it a day. This had a number of problems. The most obvious was that it prevented valid queries involving multiple aggregate expressions. At the time I figured we'd have a separate trait for aggregate expressions, and then `impl Expression for AnyTuple where AllFields: NonAggregate | AllFields: Aggregate`. This eventually led to [RFC #1268](rust-lang/rfcs#1268), which doesn't seem likely to be stabilized soon, and even if it were we still have the other issues with this solution. The next issue is that you cannot say whether a given expression is aggregate by looking at it on its own. The answer to "Is `users`.`name` aggregate?" depends on whether or not that column appears in the group by clause. So any trait which correctly handles this must be generic over the group by clause, or it cannot be correctly implemented for columns. However, the most egregious issue with that commit is that it does not handle *any* composite expressions besides tuples. Even today, `.select((count_star(), id))` fails, but `.select(count_star() + id)` will compile (and either error at runtime or give non-deterministic results depending on your backend). User Impact ---- This change will unfortunately have more breakage than I had anticipated. That said, the breakage only affects *extremely* generic code, and I do not believe that the majority of our users will notice or care about this change. There are three main ways in which the breakage will affect users: - The additional bound on `SelectStatement<...>: SelectDsl` and `SelectStatement<...>: Query`. - This one broke our test suite in a few places, mainly where we have *really* generic code to say "I can select T.eq(sql_string)". I do not believe this is representative of real code. - I did a GitHub-wide search of all occurances of `SelectableExpression` (which is *not* the bound on the public API, but is the bound on `SelectStatement`'s impl, and would point to broken code). I found one repo which is relying on internals that will break, and a lot of results from Wundergraph. I didnt' look at those. This probably breaks Wundergraph. Sorry, Georg. It should be easy to fix I promise. - `NonAggregate` can no longer be directly implemented - I did a GitHub-wide search for `NonAggregate`. With one exception, every repo implementing this trait directly was on pre-1.0. The only affected repo is manually implementing `Expression` instead of using `#[derive(AsExpression)]`. With that change they will be future-proofed. - `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate` - This broke `infer_schema`, since it was relying on `AssocType: NonAggregate` to know `(known_column, AssocType, known_column): Expression`. Ironically, that's no longer important, it did still break due to the first item on this list. I could not find any code in the wild that fell into this category. Originally I thought that the only code which would be affected by this is code that wrote `impl NonAggregate`, since that code would need to change to `impl ValidGrouping`. However, I missed a few things when I made that assessment. The first is that... Well the feature exists. The whole point of this was to prevent `aggregate + non_aggregate` from compiling when passed to select, which implies a new trait bound *somewhere*. While `SelectStatement` and its impls are private API, it's really easy to couple yourself ot the bounds on those impls. It doesn't help that `rustc` literally recommends that folks do that when errors occur. Any code that is written as `Foo: SelectDsl<Selection>` will be fine, but any code that's gone down the rabbit hole and has copied the bounds from `impl SelectDsl for SelectStatement<...>` will break. I didn't find many cases in the wild, but I do think it's relatively common. The second thing I missed is that "is this aggregate" is not a binary question. Originally I expected we would have two answers to the question, and compound expressions would enforce that their sub-expressions all had the same answer. The issue is that `1` doesn't care. You can have `count(*) + 1`, and you can also have `non_aggregate + 1`. `1` is always valid, regardless of aggregation. So we need three answers. The problem is that this means we can't have `T: NonAggregate` mean everything it used to. On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that `T` can be passed to everything with a `NonAggregate` bound (`filter`, `order`, etc). On nightly, it also implies `T::IsAggregate: MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In English, it implies that this is valid with no group by clause, and its output can appear with an expression which is not aggregate (but might be with a different group by clause). The outcome of this is that `T: NonAggregate` implies `(T, Other): NonAggregate`. However the missing link from both stable and unstable is `is_aggregate::No: MixedAggregates<T::IsAggregate>` being implied, which would mean `(Other, T): NonAggregate` would be implied. Ultimately this is a really long-winded way of saying that `T: NonAggregate` used to imply interactions with other types. That is no longer consistently true. It's unlikely there are many affected users, but any that are affected will need to directly have a `ValidGrouping` bound. Implementation Notes --- Because this broke diesel_infer_schema, I had to do a version bump to get that crate to rely on the released 1.4. There's a note on the unsoundness of the `NonAggregate` impl of `Subselect`. I haven't changed the bounds on that impl, but we almost certainly should address it before 2.0 is released. I opted to validate the new bound in `SelectDsl`, so folks get errors on `.select` instead of `.load`. We can't validate this on calls to both `.select` *and* `.group_by`, since a select statement valid with a group by is often invalid without one, and a group by clause usually makes the default select clause invalid. While this commit does not add proper group by support, I fleshed it out a bit to test the type constraints. Right now a column only considers itself valid if there is no group by clause, or if the group by clause is exactly that column. I had more to say but I went on a call and lost my train of thought. I need to come back and finish writing this later. Notable Limitations --- `SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be represented. Unanswered Questions --- `BoxableExpression` has been limited to only work when there's no group by clause, and only work with types which are `is_aggregate::No`. This is probably not what we want to do, but was the smallest change to start.
1 parent 4847ecf commit 8e1a934

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

49 files changed

+479
-144
lines changed

.travis.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -100,7 +100,7 @@ matrix:
100100
script:
101101
- (cd diesel_cli && cargo test --no-default-features --features "sqlite-bundled")
102102
- rust: 1.40.0
103-
name: "Minimal supported rust version == 1.40.0"
103+
name: "Minimal supported rust version == 1.40.0"
104104
script:
105105
- cargo check --all
106106

CHANGELOG.md

Lines changed: 59 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -30,10 +30,21 @@ for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/
3030
* Added support for SQLite's `UPSERT`.
3131
You can use this feature above SQLite version 3.24.0.
3232

33+
* Multiple aggregate expressions can now appear together in the same select
34+
clause. See [the upgrade notes](#2-0-0-upgrade-non-aggregate) for details.
35+
36+
* `ValidGrouping` has been added to represent whether an expression is valid for
37+
a given group by clause, and whether or not it's aggregate. It replaces the
38+
functionality of `NonAggregate`. See [the upgrade
39+
notes](#2-0-0-upgrade-non-aggregate) for details.
40+
3341
### Removed
3442

3543
* All previously deprecated items have been removed.
36-
* Support for uuid version < 0.7.0 has been removed
44+
* Support for uuid version < 0.7.0 has been removed.
45+
* `#[derive(NonAggregate)]` has been deprecated in favor of
46+
`#[derive(ValidGrouping)]`. The `NonAggregate` trait is now a trait alias, and
47+
cannot be directly implemented. Both derives generate identical code.
3748

3849
### Changed
3950

@@ -63,6 +74,20 @@ for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/
6374

6475
* Boxed queries (constructed from `.into_boxed()`) are now `Send`.
6576

77+
* The handling of mixed aggregate values is more robust. Invalid queries such as
78+
`.select(max(id) + other_column)` are now correctly rejected, and valid
79+
queries such as `.select((count_star(), max(other_column)))` are now correctly
80+
accepted. For more details, see [the upgrade notes](#2-0-0-upgrade-non-aggregate).
81+
82+
* `NonAggregate` is now a trait alias for `ValidGrouping<()>` for expressions
83+
that are not aggregate. On stable this is a normal trait with a blanket impl,
84+
but it should never be implemented directly. With the `unstable` feature, it
85+
will use trait aliases which prevent manual implementations.
86+
87+
Due to language limitations, we cannot make the new trait alias by itself
88+
represent everything it used to, so in some rare cases code changes may be
89+
required. See [the upgrade notes](#2-0-0-upgrade-non-aggregate) for details.
90+
6691
### Fixed
6792

6893
* Many types were incorrectly considered non-aggregate when they should not
@@ -93,6 +118,39 @@ for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/
93118
Please use `diesel::upsert` instead.
94119

95120

121+
### Upgrade Notes
122+
123+
#### Replacement of `NonAggregate` with `ValidGrouping`
124+
<a name="2-0-0-upgrade-non-aggregate"></a>
125+
126+
FIXME: This should probably be on the website, but I wanted to document it in
127+
the PR adding the changes.
128+
129+
Key points:
130+
131+
- Rules for aggregation are now correctly enforced. They match the semantics of
132+
PG or MySQL with `ONLY_FULL_GROUP_BY` enabled.
133+
- As before, `sql` is the escape hatch if needed.
134+
- MySQL users can use `ANY_VALUE`, PG users can use `DISTINCT ON`. Also
135+
consider using max/min/etc to get deterministic values.
136+
- Any `impl NonAggregate` must be replaced with `impl ValidGrouping`
137+
- If you were deriving before you can still derive.
138+
- For most code, `T: NonAggregate` should continue to work. Unless you're
139+
getting a compiler error, you most likely don't need to change it.
140+
- The full equivalent of what `T: NonAggregate` used to mean is:
141+
142+
where
143+
T: ValidGrouping<()>,
144+
T::IsAggregate: MixedGrouping<is_aggregate::No, Output = is_aggregate::No>,
145+
is_aggreagte::No: MixedGrouping<T::IsAggregate, Output = is_aggreagte::No>,
146+
147+
- With `feature = "unstable"`, `T: NonAggregate` implies the first two bounds,
148+
but not the third. On stable only the first bound is implied. This is a
149+
language limitation.
150+
- `T: NonAggregate` can still be passed everywhere it could before, but `T:
151+
NonAggregate` no longer implies `(OtherType, T): NonAggregate`.
152+
- With `feature = "unstable"`, `(T, OtherType): NonAggregate` is still implied.
153+
96154
[2-0-migration]: FIXME write a migration guide
97155

98156
## [1.4.4] - 2020-03-22
@@ -1718,5 +1776,3 @@ for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/
17181776
[1.4.0]: https://github.com/diesel-rs/diesel/compare/v1.3.0...v1.4.0
17191777
[1.4.1]: https://github.com/diesel-rs/diesel/compare/v1.4.0...v1.4.1
17201778
[1.4.2]: https://github.com/diesel-rs/diesel/compare/v1.4.1...v1.4.2
1721-
[1.4.3]: https://github.com/diesel-rs/diesel/compare/v1.4.2...v1.4.3
1722-
[1.4.4]: https://github.com/diesel-rs/diesel/compare/v1.4.3...v1.4.4

diesel/Cargo.toml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
[package]
22
name = "diesel"
3-
version = "1.4.4"
3+
version = "2.0.0"
44
authors = ["Sean Griffin <[email protected]>"]
55
license = "MIT OR Apache-2.0"
66
description = "A safe, extensible ORM and Query Builder for PostgreSQL, SQLite, and MySQL"
@@ -34,7 +34,7 @@ bitflags = { version = "1.0", optional = true }
3434
r2d2 = { version = ">= 0.8, < 0.9", optional = true }
3535

3636
[dependencies.diesel_derives]
37-
version = "~1.4.0"
37+
version = "~2.0.0"
3838
path = "../diesel_derives"
3939

4040
[dev-dependencies]

diesel/src/expression/array_comparison.rs

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,13 @@ use crate::query_builder::*;
55
use crate::result::QueryResult;
66
use crate::sql_types::Bool;
77

8-
#[derive(Debug, Copy, Clone, QueryId, NonAggregate)]
8+
#[derive(Debug, Copy, Clone, QueryId, ValidGrouping)]
99
pub struct In<T, U> {
1010
left: T,
1111
values: U,
1212
}
1313

14-
#[derive(Debug, Copy, Clone, QueryId, NonAggregate)]
14+
#[derive(Debug, Copy, Clone, QueryId, ValidGrouping)]
1515
pub struct NotIn<T, U> {
1616
left: T,
1717
values: U,
@@ -140,7 +140,7 @@ where
140140
}
141141
}
142142

143-
#[derive(Debug, Clone, NonAggregate)]
143+
#[derive(Debug, Clone, ValidGrouping)]
144144
pub struct Many<T>(Vec<T>);
145145

146146
impl<T: Expression> Expression for Many<T> {

diesel/src/expression/bound.rs

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,4 +47,6 @@ impl<T, U, QS> SelectableExpression<QS> for Bound<T, U> where Bound<T, U>: Appea
4747

4848
impl<T, U, QS> AppearsOnTable<QS> for Bound<T, U> where Bound<T, U>: Expression {}
4949

50-
impl<T, U> NonAggregate for Bound<T, U> {}
50+
impl<T, U, GB> ValidGrouping<GB> for Bound<T, U> {
51+
type IsAggregate = is_aggregate::Never;
52+
}

diesel/src/expression/coerce.rs

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,4 +53,8 @@ where
5353
}
5454
}
5555

56-
impl<T, ST> NonAggregate for Coerce<T, ST> where T: NonAggregate {}
56+
impl<T, ST, GB> ValidGrouping<GB> for Coerce<T, ST>
57+
where T: ValidGrouping<GB>,
58+
{
59+
type IsAggregate = T::IsAggregate;
60+
}

diesel/src/expression/count.rs

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
use super::functions::sql_function;
2-
use super::Expression;
2+
use super::{Expression, ValidGrouping, is_aggregate};
33
use crate::backend::Backend;
44
use crate::query_builder::*;
55
use crate::result::QueryResult;
@@ -62,6 +62,10 @@ impl Expression for CountStar {
6262
type SqlType = BigInt;
6363
}
6464

65+
impl<GB> ValidGrouping<GB> for CountStar {
66+
type IsAggregate = is_aggregate::Yes;
67+
}
68+
6569
impl<DB: Backend> QueryFragment<DB> for CountStar {
6670
fn walk_ast(&self, mut out: AstPass<DB>) -> QueryResult<()> {
6771
out.push_sql("COUNT(*)");

diesel/src/expression/exists.rs

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
use crate::backend::Backend;
22
use crate::expression::subselect::Subselect;
3-
use crate::expression::{AppearsOnTable, Expression, NonAggregate, SelectableExpression};
3+
use crate::expression::{AppearsOnTable, Expression, ValidGrouping, SelectableExpression};
44
use crate::query_builder::*;
55
use crate::result::QueryResult;
66
use crate::sql_types::Bool;
@@ -42,7 +42,12 @@ where
4242
type SqlType = Bool;
4343
}
4444

45-
impl<T> NonAggregate for Exists<T> where Subselect<T, ()>: NonAggregate {}
45+
impl<T, GB> ValidGrouping<GB> for Exists<T>
46+
where
47+
Subselect<T, ()>: ValidGrouping<GB>,
48+
{
49+
type IsAggregate = <Subselect<T, ()> as ValidGrouping<GB>>::IsAggregate;
50+
}
4651

4752
#[cfg(not(feature = "unstable"))]
4853
impl<T, DB> QueryFragment<DB> for Exists<T>

diesel/src/expression/functions/date_and_time.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ use crate::sql_types::*;
99
/// Represents the SQL `CURRENT_TIMESTAMP` constant. This is equivalent to the
1010
/// `NOW()` function on backends that support it.
1111
#[allow(non_camel_case_types)]
12-
#[derive(Debug, Copy, Clone, QueryId, NonAggregate)]
12+
#[derive(Debug, Copy, Clone, QueryId, ValidGrouping)]
1313
pub struct now;
1414

1515
impl Expression for now {

diesel/src/expression/functions/mod.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ macro_rules! no_arg_sql_function_body_except_to_sql {
99
($type_name:ident, $return_type:ty, $docs:expr) => {
1010
#[allow(non_camel_case_types)]
1111
#[doc=$docs]
12-
#[derive(Debug, Clone, Copy, $crate::query_builder::QueryId, $crate::expression::NonAggregate)]
12+
#[derive(Debug, Clone, Copy, QueryId, ValidGrouping)]
1313
pub struct $type_name;
1414

1515
impl $crate::expression::Expression for $type_name {

0 commit comments

Comments
 (0)