Skip to content

Add RM_FlushExecutionUnit() API call #11993

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: unstable
Choose a base branch
from

Conversation

sjpotter
Copy link
Contributor

@sjpotter sjpotter commented Apr 2, 2023

Currently, if one has a blocked client (say 'blpop x 0'). And one has an RM_Call() that should unblock it (say 'lpush x 1'). The client will not be directly unblocked by the RM_Call, and it will remain blocked until beforeSleep() is run, as that is when handleClientsBlockedOnKeys() will run.

This adds a new API RM_FlushExecutionUnit() that executes propagating pending ops and handleClientsBlockedOnKeys() in the same manner as occurs at the end of processCommand()

this came about in using RM_Call to issue blocking commands (ex blpop), commands that would unblock them (ex: lpush) and also aborting the promises returned from the blocking command.

Depending on the order of operations and when beforeSleep() occurs in the context of those operations, we saw inconsistent behavior in terms of aborts succeeding to abort blocked commands (after the commands that would unblock them have executed). This is to try to make the behavior be able to be consistent, if so desired.

In addition it extends GIL unlock, which already calls the code to propagate pending ops, but didn't unblock clients to also unblock clients

sjpotter added 2 commits April 2, 2023 18:33
Currently, if one has a blocked client (say 'blpop x 0').  And one has an RM_Call() that should unblock it (say 'lpush x 1').  The client will not be directly unblocked by the RM_Call, and it will remain blocked until beforeSleep() is run, as that is when handleClientsBlockedOnKeys() will run.

This adds a flag to RM_Call's ARGV, to enable handleClientsBlockedOnKeys() to be called at the end of an RM_Call if the ready_keys list has keys in it.

Note: if the ready_keys list has keys, from before this RM_Call execution, it will still run.  Perhaps it should check and only run if ready_keys list is empty at this time?
Note: handleClientsBlockedOnKeys() has an assert to prevent it from running when AOF/replica propegation operation are waiting to run, hence this prevents the RM_Call from running with replication enabled (perhaps, it should check if there are any pending pending propegation operations and not run if there are?)

this came about in using RM_Call to issue blocking commands (ex blpop), commands that would unblock them (ex: lpush) and also aborting the promises returned from the blocking command.

Depending on the order of operations and when beforeSleep() occurs in the context of those operations, we saw inconsistent behavior in terms of aborts succeeding to abort blocked commands (after the commands that would unblock them have executed).  This is to try to make the behavior be able to be consistent, if so desired.
@oranagra
Copy link
Member

oranagra commented Apr 2, 2023

I think RM_Call is already overloaded with flags, and this (rather low level) feature should be an API of its own.

note that the current behavior is not that it'll wait until beforeSleep, it can also be triggered when the module command that issued the RM_Call returns (unless that RM_Call is not called from within a module command).

in my eyes, one of the basic use cases of RM_Call is to implement something similar to what scripts can do (i.e. only execute a series of redis commands, without much more), and in this case, the unblocking should be done when the "script" returns.

in your case you're trying to execute a bunch of redis commands, but each of them stands for itself, and you want to it to behave as if they were executed from processCommand.

I think that in that case you should explicitly call RM_HandleUnblockedClients, and you may some day want to execute after several commands, or in a different workflow (not after RM_Call).
This also makes the fact that it can process clients that were blocked even before your RM_Call a non-issue.

@sjpotter
Copy link
Contributor Author

sjpotter commented Apr 2, 2023

I guess what worries me about it not being in RM_Call is the ability to not sanity check it well.

i.e. I was considering putting in a validator like this (re the note in the commit msg)

    if (flags & REDISMODULE_ARGV_UNBLOCK_CLIENTS) {
        if (replicate) {
            errno = ENOTSUP;
            if (error_as_call_replies) {
                sds msg = sdsnew("cannot unblock clients and use replication within the same call");
                reply = callReplyCreateError(msg, ctx);
            }
            return reply;
        }
        if (server.also_propagate.numops != 0) {
            errno = ENOTSUP;
            if (error_as_call_replies) {
                sds msg = sdsnew("cannot unblock clients when there are pending operations to propegate");
                reply = callReplyCreateError(msg, ctx);
            }
            return reply;
        }
        if (listLength(server.ready_keys) != 0) {
            errno = ENOTSUP;
            if (error_as_call_replies) {
                sds msg = sdsnew("cannot unblock clients when there are already keys pending to unblock clients");
                reply = callReplyCreateError(msg, ctx);
            }
            return reply;
        }
    }

a RM_HandleUnblockedClients() loses some context. Perhaps this context doesn't' matter (ala ready keys already being populated), but I'm a bit more nervous about the also_propagate ops. Perhaps there isn't a good reason why it asserts there and one should be able to unblock if there are ops there pending? Or perhaps it shouldn't matter and RM_HandleUnblockedClients() will just check that num_ops == 0 and fail if its not (though losing context of where/why that occurred).

I'm not anti the RM_HandleUnblockedClients() approach, and its easy enough to implement, just felt it was easier to start the conversation of it within RM_Call for those reasons.

I can reimplement it as a new RM_HandleUnblockedClients API, but would want some extra eyeballs on the also_propagate interactions.

@sjpotter
Copy link
Contributor Author

sjpotter commented Apr 2, 2023

also forgot to increment the flag bit from previous one, but probably doesn't matter if we want the new API approach.

@sjpotter sjpotter changed the title unblock clients as part of RM_Call Add RM_HandleUnblockedClients() API call Apr 2, 2023
@sjpotter
Copy link
Contributor Author

sjpotter commented Apr 2, 2023

updated PR/Comment/Title per @oranagra's suggestion.

@oranagra
Copy link
Member

oranagra commented Apr 3, 2023

i think that adding a some sort of RM_DrainPropagationQueue or alike is also needed.
i.e. in normal cases, a module command which works like a script, is executing several commands with RM_Call and then they're all propagated with a multi-exec when the module command exists.

but in your case (assuming you did use replication), you execute a set of RM_Call commands that are unrelated, as if they arrived from the network, so such case the module would want to flush the propagation queue after each RM_Call (before calling RM_HandleUnblockedClients).
that one indeed feels more like an RM_Call flag, but i still feel that maybe it should be an explicit API.
i feel that we have too many flags, and if some things can be useful regardless of RM_Call, and if they can be added as a post call action, rather than needing to modify how the call operates, i think we should consider making them an API.

@MeirShpilraien WDYT?

@sjpotter
Copy link
Contributor Author

sjpotter commented Apr 3, 2023

can an RM_DrainPropagationQueue() occur in all module contexts (where one might run an RM_Call) safely?

@oranagra
Copy link
Member

oranagra commented Apr 3, 2023

i think so, and if not we can always let it error.
@guybe7 @MeirShpilraien please comment.

@sjpotter
Copy link
Contributor Author

sjpotter commented Apr 3, 2023

@oranagra In speaking with @MeirShpilraien today, he was thinking (as I understood it) that perhaps having calls like this are too much of exposure into Redis internals for a module.

If we think of execution as an "Execution Unit" (using quotes, as not quite the same as Redis' current usage of the term, but a close analog), a processCommand() is a full execution unit, so it handles all these "side effects" (my terminology) after the command is executed. In general in Redis, one could look at taking/releasing the GIL as entering/exiting execution units, or even being called from a timer or the like. Though the first doesn't work for us (as operate on main thread) and the second doesn't quite work, as we could end up doing work that we consider to be multiple "execution units" within one timer call.

So the thought would be, why not just have an RM_EndExecutionUnit() (I'm not sure one really need a RM_BeginExecutionUnit(), as one could inherently be viewed as starting a new one with the same call). Where RM_EndExecutionUnit() would take care of all current/future side effects we expect to be exposed to the Redis DBs based on the commands executed.

@oranagra
Copy link
Member

oranagra commented Apr 3, 2023

ok, that RM_EndExecutionUnit API is similar to my RM_DrainPropagationQueue (being also combined with RM_HandleUnblockedClients).
the advantage is that it doesn't deal with internals much and just tells redis to do whatever is needed after a client command is processed, and the module doesn't need to know or worry about what these are.
the disadvantage is that if we'll ever want to let the module control more granularity, then it'll become a problem, maybe we can add an int flags argument for future use.

GIL Release should also be viewed as an end the execution unit (as is somewhat clear from it) and should therefore unblock clients as well (already propagates ops)
@sjpotter sjpotter changed the title Add RM_HandleUnblockedClients() API call Add RM_EndExecutionUnit() API call Apr 3, 2023
@sjpotter
Copy link
Contributor Author

sjpotter commented Apr 3, 2023

Updated to new name, also extended GIL unlock to also unblock clients (it already handled propagating pendings ops)

@@ -8420,6 +8438,9 @@ void moduleGILBeforeUnlock() {
* released we have to propagate here). */
exitExecutionUnit();
postExecutionUnitOperations();

if (listLength(server.ready_keys))
handleClientsBlockedOnKeys();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

@sjpotter sjpotter Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should note, I'm wondering if the unblock code should be pushed into postExecutionUnitOperations()

that function already short circuits if server.execution_nesting != 0, and moduleGILBeforeUnlock() essentially asserts if we wouldn't be 0 at this point (i.e. asserts if not 1, and exitExecutionUnit() would then make it 0).

the issue I had with that, is that this would impact all call usages (i.e. also from processCommand() and therefore would cause the unblock code to now execute in a different location).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I also thought it should be there but was afraid moving it there for the same reason you mentioned.

Copy link
Collaborator

@guybe7 guybe7 Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i understand it feels like a risky change, but unblocking clients at the end of an execution unit is the right thing to do

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should put some effort into investigating the option of adding handleClientsBlockedOnKeys to postExecutionUnitOperations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I want to look into it to see what that means, have to look at all points where it is called from.

src/module.c Outdated
@@ -6461,6 +6461,24 @@ const char *RM_CallReplyProto(RedisModuleCallReply *reply, size_t *len) {
return callReplyGetProto(reply, len);
}

int RM_EndExecutionUnit() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not really ends the execution unit right? It is just flush whatever is pending. Or maybe we can think about it as end the current execution unit and immediately start a new one? So maybe we can call it RM_ExecutionUnitDrain?

Copy link
Member

@oranagra oranagra Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not happy with the name ("drain" may be better, but i'm also not certain about the use of "execution unit").
maybe while writing the doc comment a new name will reveal itself.
also, we're missing the flags argument i suggested so that in the future we'll be able to let the module control what it does (if we'll need to).
and also probably missing a context argument for future use.

src/module.c Outdated
@@ -6461,6 +6461,24 @@ const char *RM_CallReplyProto(RedisModuleCallReply *reply, size_t *len) {
return callReplyGetProto(reply, len);
}

int RM_EndExecutionUnit() {
/* should only execute at top level of nesting, as otherwise would cause changes to impact DB before top level
* expects it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why the nesting level is relevant here? What risk can happened if nesting level is more then one that can not happened if nesting level is 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imagine we have 2 modules that use this API. Module X calls Module Y via an RM_Call (and other things). Module X expects no side effects (i.e. its a transaction until it ends/drains the execution unit). If Module Y is able to drain and cause side effects, module X no longer has that knowledge of what is going on.

as a simple example.

some client elsewhere brpop x

module X

RM_Call(lpush x 1)
RM_Call(module-y command) // nothing to do with 'x'
RM_Call(lrange x 0 -1)
RM_EndExecutionUnit()

module X might expect that since module Y's command doesn't have anything to do with key 'x', that nothing external will occur to change 'x'. but if we can nest and this occurs, we now have a problem.

With that said, one could argue that now with the GIL modification, that same thing could happen, so that expectation can no longer be justified and we can remove that check from here.

i.e. a redis module that runs on the main thread (so doesn't use the GIL) which calls a function on a module that uses threads and hence does use the GIL. taking/releasing the GIL will unblock pending clients.

on the flip side, module Y will actually assert on trying to take the GIL if called in this way, as the assert on server.execution_nesting == 0 will be false, so we can't call a GIL based module from a module that runs on the main thread.

@MeirShpilraien
Copy link

@sjpotter I believe we should add a test for the new API. And also document it?

@sjpotter
Copy link
Contributor Author

sjpotter commented Apr 3, 2023

yes, I'm not sure how to test it at this point in an non contrived manner (i.e. adding a command that does multiple hard coded RM_Call(s) itself).

test currently doesn't work in tcl due to my tcl knowledge limitations
@sjpotter
Copy link
Contributor Author

@oranagra I renamed it, but having trouble with the test I've written as don't know how to handle a ; in tcl correctly. i.e. I added 2 module commands that enable us to pass multiple commands to be issued via rm_call in sequence (separated by ;), one with a flush between them and one without. when I test by hand, it works as expected, but havin trouble getting it to work in the tcl code.

src/module.c Outdated
Comment on lines 6484 to 6494
int RM_FlushExecutionUnit() {
if (server.execution_nesting != 1)
return REDISMODULE_ERR;

propagatePendingCommands();

if (listLength(server.ready_keys))
handleClientsBlockedOnKeys();

return REDISMODULE_OK;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we add handleClientsBlockedOnKeys to postExecutionUnitOperations and just call the latter here?

it feels weird to have just a subset of the post-exec-unit operation done...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that would be better, just have to understand all the impacts that could have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you can do that since handleClientsBlockedOnKeys also calls propagatePendingCommands

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, maybe it requires moving code around, but i think we need to give it a shot

@sjpotter sjpotter changed the title Add RM_EndExecutionUnit() API call Add RM_FlushExecutionUnit() API call Apr 17, 2023
*/
int RM_FlushExecutionUnit() {
int RM_FlushExecutionUnit(RedisModuleCtx *ctx, unsigned int flags) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sadly for that to be useful we need to do a bit more.
first we need to document that flags argument (saying it's for future use).
secondly, i think we need to declare some DEFAULT constant, e.g. like we have for REDISMODULE_BLOCK_UNBLOCK_DEFAULT.
and then we need to decide and document how we handle future backwards compatibility.
i.e. are we gonna error on unsupported flags (in which case we should now error when 0 is used)?
or we can go with something similar to RM_GetModuleOptionsAll().
i think just erroring is enough and is simpler.
@MeirShpilraien @guybe7 WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added something like this, see if it was what you were thinking

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts about having an even more explicit API, such as: UnblockPendingClients, that will explicitly unblock and handle pending clients (within the same execution unit).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madolson can you elaborate? i don't understand what you mean, or maybe i have the wrong context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it started off as similar to that (see my calling it of RM_HandleUnblockedClients() above), but was pushed in a more generic direction. but whatever naming people want is fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra I suppose two things come to mind. The first is that "flush execution unit" is extremely generic, it's hard to reason about what that it should or will do as we add features to Redis. The second is I don't want to expose the concept of an execution unit to modules, since that is already a hard of enough thought to understand for us internally. Having smaller APIs that are clearer to understand seem better than one big generic one.

Sorry for not reading the entire chain, I saw the conversation about not overloading RM_Call, which I agree with.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madolson i think i'd like to make it generic (i.e. execution unit) on purpose, and not start exposing RM_HandleUnblockedClients, RM_FlushPropagationQueue, etc (which are a little bit low level).
instead, i want the module to tell redis: "i'm done" and redis should do what it needs to.
this way, when in the future we add more tasks to be performed between commands, we can add them to that API, and the module / API won't need any change.

I did add a flags input argument so that if in the future we'll wanna let modules trigger just part of the operations, and skip others, we'll be able to do that.
but the default should be to run them all..

Copy link
Contributor

@madolson madolson May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to tell Redis, "I'm done", do we want to expose the ability for modules to more clearly enter and exit execution units? Maybe with a RM_ExecuteSubContext(ctx, contextFn) which executes the contextFn in a separate execution unit. Not sure I really like this suggested API though.

Maybe I'd be okay if we renamed the API to just be something like RM_FlushContext(ctx) which retires the current context and returns a new one. It seems easier to reason about it in terms of context and not the underlying execution units (or maybe it does for me at least). Keeping the flags seems OK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think i like to retire the context (good change the user will keep using it after).
also, the context is used for many other things (not just RM_Call and RM_Replicate).
I do think there's no need for RM_BegingXxxx and RM_EndXxxx since one RM_FlushXxxx can do it, and i do think we want just one generic Flush function with flags that can be extended rather than an API for each thing we need to flush.
@yossigo maybe you have some better ideas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, the context is used for many other things (not just RM_Call and RM_Replicate).

You know the API and usages better than I do, but it seems better to keep APIs constrained to what we expose today. My main fear is we box ourselves into an API contract we have to maintain. Contexts are at least exposed today, and have semi-well defined lifetimes.

i don't think i like to retire the context (good chance the user will keep using it after).

If only we had moved semantics. There is definitely some hints we can give to users about misuse. This API maybe needs to return an allocated Context instead of a stack allocated one, so that we can fully NULL it out.

@@ -309,6 +309,9 @@ typedef uint64_t RedisModuleTimerID;
* Use RedisModule_GetModuleOptionsAll instead. */
#define _REDISMODULE_OPTIONS_FLAGS_NEXT (1<<4)

/* default set of items to flush in an execution unit */
#define REDISMODULE_FLUSH_FLAG_DEFAULT (1<<0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"FLUSH" is too generic. how about
REDISMODULE_FLUSH_EXEC_UNIT_FLAG_DEFAULT
or
REDISMODULE_EXECUTION_UNIT_FLAG_DEFAULT
i.e. maybe we'll use the later for other execution unit APIs in the future.

src/module.c Outdated
@@ -6479,14 +6479,24 @@ const char *RM_CallReplyProto(RedisModuleCallReply *reply, size_t *len) {
* be executed before the next RM_Call is made. By calling RM_FlushExecutionUnit between RM_Call operations, the
* module will get those semantics.
*
* Currently, the RedisModuleCtx is not used and only NULL should be passed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wouldn't document that it isn't used, and i'd certainly not validate that it is NULL.
the contrary, even if we're not using it, let's validate that it is not NULL.
then, i think it's clear and there's no need to document anything about it.

oranagra
oranagra previously approved these changes Apr 23, 2023
@oranagra oranagra added state:major-decision Requires core team consensus release-notes indication that this issue needs to be mentioned in the release notes approval-needed Waiting for core team approval to be merged labels Apr 23, 2023
@oranagra
Copy link
Member

@redis/core-team please approve or comment.

itamarhaber
itamarhaber previously approved these changes Apr 23, 2023
* - propagating replication/AOF operations
* - executing the commands of clients blocked on keys modified by commands
*
* In general, for modules, execution unit start and end with the taking/release of the GIL. For module code executed
Copy link
Contributor

@madolson madolson May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ended up calling the notification job AddPostNotificaitonJob. There is no concept of execution unit in modules today. I think we need to spend more effort here explaining what an execution unit is. (I have a separate comment about the naming, this comment is assuming we keep the name)

*/
int RM_FlushExecutionUnit() {
int RM_FlushExecutionUnit(RedisModuleCtx *ctx, unsigned int flags) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts about having an even more explicit API, such as: UnblockPendingClients, that will explicitly unblock and handle pending clients (within the same execution unit).

Co-authored-by: Madelyn Olson <[email protected]>
@oranagra
Copy link
Member

we discussed this in a core-team meeting again, and didn't reach a conclusion.
we want to re-consider adding an RM_Call flag to do that.
i see that i already spoke against it here in the past #11993 (comment)
my feeling is that this low level API is ok, but i agree it's ugly.
@redis/core-team please advise.

@madolson
Copy link
Contributor

Do we still need this change with #12159 ? It seems like you should be able to do everything asked for here.

@oranagra
Copy link
Member

Do we still need this change with #12159 ? It seems like you should be able to do everything asked for here.

i don't understand how they're related.
the other one let's the module re-use the per-client state tracking inside redis by switching clients. (i.e. redis will implement WATCH for the module)

@madolson
Copy link
Contributor

@oranagra Couldn't it allow modules to execute commands in batches, which was one of the alternatives mentioned here?

@oranagra
Copy link
Member

@madolson you mean that RM_Call with flush the exec-unit after each call, and if anyone wants to avoid it, they'll create a virtual client and wrap it in MULTI-EXEC? i don't think we can afford that (a breaking change for all modules that use several commands to achieve an atomic job).

or do you mean that if someone wants to flush the exec-unit after each command, they'll create a virtual client, run that one command and destroy it? i think that's a big overhead, and i'd rather add that simple flush API (or even the other alternatives we considered, like an RM_Call flag)

@madolson
Copy link
Contributor

madolson commented Jun 22, 2023

simple flush API

I think you're abusing the term simple here. Simple doesn't just mean simple to implement, but simple to reason about and understand. We could directly expose a lot of APIs directly through the module API to modules developers and pray they understand what they're doing. This sort of gets at the core of a bunch of recent PRs for modules that have been posted. They aren't coherent, they are trying to chase down a bunch of niche problems, and I'm not convinced anyone is thinking about this holistically.

The original command API was developed to allow the creation of commands that could do very complex things, such as call recursive commands. In that context, it made sense for it to execute call which was the mechanism for recursion. It's basically a lua script but written in a custom language. This evolved to threadsafe contexts, which allow basically running a "script" by synchronizing with the main thread. Generally still pretty sensible.

This ask is not running a "script" anymore, since it's no longer atomic. (In a sense this was already broken by https://github.com/redis/redis/pull/11568/files) It's trying to execute a bunch of commands as if was a client and observe the side effects interactively. In one of our earlier meetings I mentioned this seems really like it's trying to act like a client, which I still think is correct. I want us to think of Redis as an in-memory storage engine. In that world, there is no API for "expose side effects", there is an API for "Go execute transaction of commands". That API may block if there is blocking commands, and that is fine, but I don't think the module should have control over the side effects of the transaction. Maybe the better API is to "flush" the context, which closes out all of the pending state associated with it.

Let's talk about #12159.

I want to return to the framing of an in-memory storage engine. The API we expose today use a "context" in which a command is executed in. Contexts come in two flavors, inherited implicitly form the user calling the command, or anonymous contexts which are out of the scope of a clients. Clients are not really relevant to the module, since they are a resource of an external user, and modules interact more with just the storage engine. I don't like that we are adding a new domain object, "client", that is really just a persistent connection to the storage engine that has no real connection to an external client. So I would rather do one of two things:

  1. Really attach the client externally. If this client does a thing, it shows up in slow log. Instead of calling "Set client to context", we call "Create context from client".
  2. Not call it a client, and embrace that it is a long lived context that we will flush periodically. We could create a persistent context, that you can keep doing call on and just have to periodically call "flush".

Hopefully this made some amount of sense.

@oranagra
Copy link
Member

Maybe the better API is to "flush" the context, which closes out all of the pending state associated with it.

that's what we're aiming to achieve here. i thought an explicit API is better than closing and re-opening a context, but i'm also ok with forcing modules who want that to open separate contexts.

Let's talk about #12159.

  1. Not call it a client, and embrace that it is a long lived context that we will flush periodically. We could create a persistent context, that you can keep doing call on and just have to periodically call "flush".

What that PR is trying to achieve is letting the module re-use features Redis offers for clients (like tracking WATCH, and the MULTI/EXEC dirty state). We can certainly abandon that and let the module hook into SignalModifiedKey one way or another, and implement it on the module's side.
Without a long-lived context/client, doing RM_Call for SELECT, WATCH, and anything else that just affects the client struct is meaningless; we can either block them, ignore them, or allow some form of long-lived clients.
If we want to hide them inside a long-lived context, and refrain from using the term client, that's fine with me (although I'm not sure whether that simplifies things or creates confusion).
If we go with the SignalModifiedKey hook, we may need to add other similar features in the future, which could have been solved implicitly by a virtual client (e.g. think about CSC, client-side caching: this can let the module re-use Redis logic with minimal effort on either side).

@madolson
Contributor

madolson commented Jun 25, 2023

That's what we're aiming to achieve here. I thought an explicit API is better than closing and re-opening a context, but I'm also OK with forcing modules that want this to open separate contexts.

The other thing is that, in the future, we shouldn't allow partial completion of a transaction. In the past we discussed executing only certain operations, like unblocking. You shouldn't be able to observe only parts of a transaction unless we're doing something like changing the consistency semantics.

@oranagra
Copy link
Member

@madolson I'm sorry, I don't follow your last post.
It seems that you're just pointing out something you already explained earlier, so I don't want to have you repeat it, but maybe you can point me to it, or help me understand what you mean.

Contributor

@madolson madolson left a comment


@oranagra Instead of responding, I commented on the CR.

Comment on lines +6510 to +6514
/* Redis only exposes "side effects" of command execution at the end of an execution unit.
*
* Side effects currently handled are
* - propagating replication/AOF operations
* - executing the commands of clients blocked on keys modified by commands
Contributor


I would prefer we say, and mentally model, this API like:

/* Redis treats multiple invocations of RM_Call() to the same context as if they were executed in the same transaction. Effects
* of these commands are not visible to other clients or visible to the server until the transaction has been completed. This function
* allows completing the transaction without tearing down the context, allowing additional commands to be issued in a new transaction.

Member


Your text seems fine to me.
I would prefer to also list the actual actions (while mentioning that the list may change between versions); I think it makes it easier to understand. But if you're not comfortable with this I'm OK to drop them (people who care or are curious can easily check the implementation).

Contributor


Yeah, my concern is I don't want to commit to something that may change.

Member


I don't want to commit either... I just thought that listing them makes it easier to understand,
but I'm OK with not listing them.

* - `REDISMODULE_FLUSH_EXEC_UNIT_FLAG_DEFAULT`: We propagate pending command operations (AOF/Replication) and unblock
* clients
*/
int RM_FlushExecutionUnit(RedisModuleCtx *ctx, unsigned int flags) {
Contributor

@madolson madolson Jun 26, 2023


Suggested change
int RM_FlushExecutionUnit(RedisModuleCtx *ctx, unsigned int flags) {
int RM_CompleteTransaction(RedisModuleCtx *ctx, unsigned int flags) {

Member


I'm certainly OK with the term "Pending" instead of "ExecutionUnit", but I think that "commands" could be confusing:
someone could think that this generates commands, when in fact it's the side effects of commands already executed.
Maybe "FlushPendingEffects"?

Contributor

@madolson madolson Jun 28, 2023


I could live with FlushPendingEffects. It seems to have the right intention. I like your description of the fact that it's really just exposing the effects of existing commands. Would FlushContext or CompleteTransaction be too confusing? I feel like you focus on the fact that this uncovers side effects, where I would like it to focus on the fact that we have finished a block of commands.

Member


I think FlushContext is too generic; context is used for so many things.

CompleteTransaction seems OK to me (p.s. it calls propagatePendingCommands, which will wrap what you did with MULTI/EXEC).
I'm a little bit uncomfortable with using "complete", since it might suggest that we needed to explicitly "start" it too, but FlushTransaction is not much better. Maybe there's a better word, but if not, I'm OK with RM_CompleteTransaction.

Member


I'm OK with RM_CompleteTransaction.

Contributor

@madolson madolson Jul 1, 2023


Ok, I spent some time thinking, I can't come up with anything else.

Comment on lines +6530 to +6531
* - `REDISMODULE_FLUSH_EXEC_UNIT_FLAG_DEFAULT`: We propagate pending command operations (AOF/Replication) and unblock
* clients
Contributor


Suggested change
* - `REDISMODULE_FLUSH_EXEC_UNIT_FLAG_DEFAULT`: We propagate pending command operations (AOF/Replication) and unblock
* clients
* - `REDISMODULE_COMPLETE_TRANSACTION_FLAG_DEFAULT`: We propagate pending command operations (AOF/Replication) and process all side effects as if these commands were sent as a multi/exec transaction.

@oranagra
Member

This was discussed in a core-team meeting and was conceptually approved (after applying the changes suggested above)

*/
int RM_FlushExecutionUnit(RedisModuleCtx *ctx, unsigned int flags) {
if (server.execution_nesting != 1)
return REDISMODULE_ERR;
Collaborator


IIUC this check is here to allow calling this function only from the top level. Should we make this check server.execution_nesting > 1? It looks like it is possible to reach here when execution_nesting = 0: I call this function from within an aux_load() callback and it fails, as execution_nesting is zero.

Member


I don't think we want to allow that from RDB loading. The same goes for RM_Call. What's the use case?

@CLAassistant

CLAassistant commented Mar 24, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ oranagra
❌ sjpotter
You have signed the CLA already but the status is still pending? Let us recheck it.

@sjpotter sjpotter dismissed stale reviews from itamarhaber and oranagra via 87efd04 April 1, 2024 13:57
Labels
approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus
Projects
Status: Todo
9 participants