Skip to content

Native function support #211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Conversation

rnett
Copy link
Contributor

@rnett rnett commented Feb 6, 2021

Adds support for creating TF_Functions from graphs and executing them in eager and graph mode.

I'm undecided as to whether I should keep the positional GraphFunction or just make the base function use named arguments. Thoughts @karllessard?

I'm also considering making the create methods automatically find the minimal subgraph for the passed inputs and outputs (and throw if that's impossible, as the function would break anyways). This would help with gradient handling. I'd add another overload to create everything (which could be used by ConcreteFunction), and possibly ones to detect inputs and outputs from the graph.

Before merging, I need to handle gradients for graph mode (it doesn't seem to need to be specified in eager mode, I need to check how Python does it), and add overloads with control outputs.

@rnett
Copy link
Contributor Author

rnett commented Feb 6, 2021

Notes to self:

Assuming that eager gradients is done by calling SymbolicGradient, I probably want an option to do the gradient calculation when the function is created (like for graph) and use that function in eager, rather than re-calculating it each time. Depends on what gradient tape looks like.

*
* Functions can also be created from {@link ConcreteFunction}s using {@link #create(ConcreteFunction, boolean)}.
*/
public class GraphFunction implements AutoCloseable {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GraphFunction is a bit confusing as a name, it sounds like it is a function attached to a Graph which is not necessarily a case. I'd turn this over to FunctionGraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, though lets have the "merge with ConcreteFunction" discussion first. I didn't want to use FunctionGraph because Python has a FuncGraph that subclasses Graph, and I'm not sure if we'll end up needing one for further Function support (I'd prefer a subclass of Ops, which should handle most of the same stuff).

if (op != null) {
outputs.add(op.output(index));
}
Operation op = graph.operationOrError(operation);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break current behavior by throwing an error where it was silently failing before... but I think it's for the best, so let's keep it that way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't actually change things, I just copied the Session function to Graph. The null check there was extraneous. But I agree about the behavior, I'll note it in the docs.

* @return the outputs of the function
*/
@Endpoint(name = "callFunction")
public static List<Operand<?>> callFunction(Scope scope, GraphFunction function, List<Operand<?>> inputs) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply call these endpoints call? I like the simplicity of tf.call(function, x).get("y")

The function instance type is enough to differentiate the different signatures

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had issues with call(NamedGraphFunction, List<>), in that I'd like to have it return a Map using the named outputs even if named inputs aren't used, but that collides with call(GraphFunction, List<>) which returns a List. Let's delay this till after the merging w/ ConcreteFunction conversation?

TF_Output handles = new TF_Output(operands.size());
for (int i = 0; i < operands.size(); i++) {
Operand<?> input = operands.get(i);
//TODO body.checkInput(input)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain more in detail here what is left TODO?

Copy link
Contributor Author

@rnett rnett Feb 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from #207, I'd originally written this in that branch. I'll probably need to generalize the "handle input from another environment" anyways to handle closure captures in functions, at some point, and we'd want that to apply here, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm expecting it to get merged before this branch anyways, in which case I'll uncomment it.

*
* @param body the graph to use as the function body
* @param name the name of the function
* @param appendHash whether to append the has of the signature to the name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

".. the hash..."

btw, just curious, why do we let the user decide to append a hash or not to the signature name? Can we take this decision for him?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean in favor of always appending it, but I could see a case where you want to create a function with a specific name, so I left it in. It's definitely necessary most of the time to avoid name collisions for functions with different arguments (aka overloading support).

* @param function the {@code ConcreteFunction} to create the function from
* @param appendHash whether to append a hash of the signature to the name
*/
public static NamedGraphFunction create(ConcreteFunction function, boolean appendHash) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand why do we need both NamedGraphFunction and ConcreteFunction, they seem to achieve pretty much the same goal, which is to provided named inputs/outputs on an encapsulated function, right?

In the Python paradigm, a Function is a method that wraps an executable graph (or eager code) while a ConcreteFunction is the realization of this function for specific type of input arguments (since Python is non-typed, the function is realized (or "concretized") when its invoke for the first time with a given set of input types.

In Java, as everything is typed, we might not need to make this distinction. So why not just calling it simply Function and merge both classes together? The only difference I see is that one runs a main Graph (using a session) while the other runs a secondary graph (using a ops). But there might be a way to abstract this to the user.

Copy link
Contributor Author

@rnett rnett Feb 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much, yeah, but NamedGraphFunction has to manage the TF_Function handle and doesn't require a signature. ConcreteFunction also allows for Tensor execution. As for function, see #205, this is generally modeling Python's DefinedFunction/_EagerDefinedFunction/defun, which is separate from the Function class.

I could look into merging them, but it would make it a bit harder to declare the GraphFunction. Would you be in favor of getting rid of the positional GraphFunction altogether and merging NamedGraphFunction with ConcreteFunction?

I'd want to keep a few of the positional function creator methods around somewhere, for gradient functions if nothing else. Probably package private though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm for merging everything together if that can simplify the API for the user. Or we can continue to segregate the logic into multiple classes but only one should be public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When trying to make GraphFunction (now called DefinedFunction) package local, I ran into the issue where I'd like to have a Ops.call(ConcreteFunction, Map<String, Operand<?>) function, but the necessary methods (and DefinedFunction) won't be accessible there. There's a few options, as far as I can see:

  1. Only have a call method on ConcreteFunction, like ConcreteFunction.call(Ops tf, Map<String, Operand<?>).
  2. Have a @Operator Function class in the main package
  3. Have a public call method on ConcreteFunction that takes Scope instead of Ops, call that from the operator class
  4. Move ConcreteFunction, DefinedFunction, the Operator class, and any future function classes to a new package (function, presumably).

4 is my preference, thoughts?

* TODO implement
*/
TF_Function gradNativeHandle() {
return null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how to implement this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to do some tests, but for Graph I'm thinking I'd call the existing gradient op on it, and then construct a function using the op list constructor. Eager is up in the air until we get gradient tape support.

@karllessard
Copy link
Collaborator

Hi @rnett , I'm really happy that you tackle this feature that was in my backlog for too long now, let's get it done. I lean to an approach that would abstract as much as possible all the implementation details to the user, whether the function is a graph, a subgraph or code executed eagerly. We'll see where it goes in the scope of this PR

@rnett
Copy link
Contributor Author

rnett commented Feb 9, 2021

Yeah, I very much agree, I didn't intend to put that in this PR, just the native support. I've put my thoughts on it in #205.

@rnett rnett force-pushed the rn_graph_functions branch from 9810b26 to 5f37483 Compare February 10, 2021 03:24
*
* Functions can also be created from {@link ConcreteFunction}s using {@link #create(ConcreteFunction, boolean)}.
*/
public class DefinedFunction implements AutoCloseable {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced that we need this extra class as part of the public API. Can't we have everything fronted by ConcreteFunction, eventually changing it to an interface that can have multiple implementations?

Copy link
Contributor Author

@rnett rnett Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I'm running to access modifiers (and lack of internal):

When trying to make DefinedFunction package local, I ran into the issue where I'd like to have a Ops.call(ConcreteFunction, Map<String, Operand<?>) function, but the necessary methods (and DefinedFunction) won't be accessible there. There's a few options, as far as I can see:

  1. Only have a call method on ConcreteFunction, like ConcreteFunction.call(Ops tf, Map<String, Operand<?>).
  2. Have a @Operator Function class in the main package
  3. Have a public call method on ConcreteFunction that takes Scope instead of Ops, call that from the operator class
  4. Move ConcreteFunction, DefinedFunction, the Operator class, and any future function classes to a new package (function, presumably).

4 is my preference, thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So like we discussed, let's add endpoints to ConcreteFunction that accepts a Scope object and do the actual job. Anyway, most users will always go through the ops so it's ok to have these kind of endpoints in the public API of the core classes even if they are more meant for internal usage.

We also agreed that only ConcreteFunction should remain and probably be converted to an interface that has 2 different possible implementations: one with a graph (like the actual one) and one with a function graph (the new one). Maybe add these implementations in the org.tensorflow.internal package?

And last, can we check if it will be possible to build ConcreteFunction instances out of function graphs found in a saved model that has been loaded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started on this, and as far as I can tell, the only thing ConcreteFunction provides that the native functions doesn't is the ability to call with Tensors, and even that's relocatable by using an temporary EagerSession. So do we even need it? The Signature builders can be ported just fine. The only thing I could think of that would cause issues is Saving, which I haven't looked at enough to be able to tell. If I understand it right it puts everything under a default named function anyways, so we could just create a graph and add the function to it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| So do we even need it?

The goal is that we don't! Maybe I'll preserve ConcreteFunction as the name for the common function interface since it is still accurate without making breaking changes again. But if you think it is simple to do everything within a single class instead of one interface and two implementations, you can give it a try and I'll take a look at the PR but I have slightly a preference for the latter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I'll start it without saving, and figure that out after. I'm also not sure how variables are handled within TF_Function, which may cause some differences from ConcreteFunction (especially wrt saving), but I suspect that resource variables will be handled just fine, it's only the legacy ones that will have issues, so imo we don't need to worry about it.

@rnett rnett force-pushed the rn_graph_functions branch from 5f37483 to f215237 Compare February 18, 2021 00:19
Copy link
Collaborator

@karllessard karllessard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rnett , I like where this is going. I'm concerned a bit about the support in the saved model though, for which reason the resource ownership of a ConcreteFunction was added for (and removed in this PR). I can give you more context about this if you want.

Also, I have a feeling that the size of this PR has increased dramatically (did it?), if that the case, is everything in there required to add this new functionality for native functions or there is a mix of other things as well? Reducing the size of the PRs helps a lot with the review process. Thanks!

@@ -39,16 +57,87 @@
*/
public class ConcreteFunction implements AutoCloseable {

private static TF_Operation outputHandle(Operand<?> operand) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we move the private parts at the bottom of the class? It's the Google way to do it and I kind of like it now :)

private final Signature signature;
private final Ownership ownership;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the ownership part? It is quite important to prevent unexpected behavior when closing a function that comes from a saved model vs. one that has been instantiated manually.

Tensor tensor = arguments.get(argName);
if (tensor == null) {
throw new IllegalArgumentException(String.format("Missing argument [%s]", argName));
public Map<String, Operand<?>> call(Scope scope,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc?

// session = function.session();
// } else if (session != function.session()) {
// throw new UnsupportedOperationException("Saving multiple functions with different graphs/sessions is not supported yet.");
// }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't merge this PR until we support them in saved models properly

* the function, they mean the type.
*/
@Operator(name = "call")
public abstract class Function {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this class to FunctionHelper or something like that, to prevent clashing with java.util.Function?

// }
// assertThrows(IllegalStateException.class, () -> s.run(Init.DEFAULT_NAME));
// g.toGraphDef(); // check that graph is still valid
// }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix these tests before merging

@rnett rnett marked this pull request as draft February 18, 2021 22:31
@rnett
Copy link
Contributor Author

rnett commented Feb 18, 2021

Thanks @rnett , I like where this is going. I'm concerned a bit about the support in the saved model though, for which reason the resource ownership of a ConcreteFunction was added for (and removed in this PR). I can give you more context about this if you want.

Also, I have a feeling that the size of this PR has increased dramatically (did it?), if that the case, is everything in there required to add this new functionality for native functions or there is a mix of other things as well? Reducing the size of the PRs helps a lot with the review process. Thanks!

I didn't think you'd get to reviewing it yet, parts are still WIP (tests and saving). I'll mark it as draft until they are done.

The size hasn't increased much, I think it's mostly bad diffs and perhaps some rebase issues, plus auto-formatting files that weren't previously (Graph mostly, it looks like). I can pull some parts out into an earlier PR. The only large part I added recently was tree-walking functions on GraphOperation, which is necessary for building the new functions (since they may be a subset of a graph).

Context on saving would be appreciated, I'm going to take a look at how python does it. If I understand it right, both the function native instances and the resource variables are not associated with any specific session or graph (although I need to test that for variables), so I can use a graph and session for loading and then discard them. We're going to need some kind of variable lifting anyways for @tf.function style stuff, so if the variables are session local I'd like to design and use something like that here, rather than duplicating it. Worst case scenario I can add a session back to ConcreteFunction with ownership, but I don't understand how that works with attaching the function to the graph (need to test that too). I.e. if I make a function and attach it to two different graphs, do they share variables. That's also probably different for resource and non-resource variables.

How much do we need to support non-resource variables, anyways? It seems like resource variables were introduced largely to make the functional stuff easier. Of course, we don't have access to them yet, but we should soon-ish (my gradient PR is sitting in their review queue).

@rnett
Copy link
Contributor Author

rnett commented Mar 5, 2021

Replaced by #233.

@rnett rnett closed this Mar 5, 2021
@rnett rnett deleted the rn_graph_functions branch October 17, 2021 00:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants