Native function support #211

rnett · 2021-02-06T02:49:43Z

Adds support for creating TF_Functions from graphs and executing them in eager and graph mode.

I'm undecided as to whether I should keep the positional GraphFunction or just make the base function use named arguments. Thoughts @karllessard?

I'm also considering making the create methods automatically find the minimal subgraph for the passed inputs and outputs (and throw if that's impossible, as the function would break anyways). This would help with gradient handling. I'd add another overload to create everything (which could be used by ConcreteFunction), and possibly ones to detect inputs and outputs from the graph.

Before merging, I need to handle gradients for graph mode (it doesn't seem to need to be specified in eager mode, I need to check how Python does it), and add overloads with control outputs.

rnett · 2021-02-06T03:33:32Z

Notes to self:

Look into partitioned_call, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/functional_ops.py#L1126-L1126
SymbolicGradient for eager mode function gradients: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/gradients_util.py#L308-L308
See also https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/eager/function.py#L907-L907
TF_AddGradients seems to call it.
proto: https://github.com/tensorflow/tensorflow/blob/5bace7f1ca1dc43da4534df9f2e55d0f270601cf/tensorflow/core/ops/ops.pbtxt#L53586-L53586

Assuming that eager gradients is done by calling SymbolicGradient, I probably want an option to do the gradient calculation when the function is created (like for graph) and use that function in eager, rather than re-calculating it each time. Depends on what gradient tape looks like.

karllessard · 2021-02-08T14:04:50Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/GraphFunction.java

+ *
+ * Functions can also be created from {@link ConcreteFunction}s using {@link #create(ConcreteFunction, boolean)}.
+ */
+public class GraphFunction implements AutoCloseable {


GraphFunction is a bit confusing as a name, it sounds like it is a function attached to a Graph which is not necessarily a case. I'd turn this over to FunctionGraph.

I agree, though lets have the "merge with ConcreteFunction" discussion first. I didn't want to use FunctionGraph because Python has a FuncGraph that subclasses Graph, and I'm not sure if we'll end up needing one for further Function support (I'd prefer a subclass of Ops, which should handle most of the same stuff).

karllessard · 2021-02-08T14:07:05Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/Session.java

-      if (op != null) {
-        outputs.add(op.output(index));
-      }
+      Operation op = graph.operationOrError(operation);


This will break current behavior by throwing an error where it was silently failing before... but I think it's for the best, so let's keep it that way

It won't actually change things, I just copied the Session function to Graph. The null check there was extraneous. But I agree about the behavior, I'll note it in the docs.

...core/tensorflow-core-api/src/main/java/org/tensorflow/internal/c_api/presets/tensorflow.java

karllessard · 2021-02-08T14:12:39Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/op/core/Function.java

+   * @return the outputs of the function
+   */
+  @Endpoint(name = "callFunction")
+  public static List<Operand<?>> callFunction(Scope scope, GraphFunction function, List<Operand<?>> inputs) {


Why not simply call these endpoints call? I like the simplicity of tf.call(function, x).get("y")

The function instance type is enough to differentiate the different signatures

I had issues with call(NamedGraphFunction, List<>), in that I'd like to have it return a Map using the named outputs even if named inputs aren't used, but that collides with call(GraphFunction, List<>) which returns a List. Let's delay this till after the merging w/ ConcreteFunction conversation?

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/GraphFunction.java

karllessard · 2021-02-08T14:23:06Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/GraphFunction.java

+    TF_Output handles = new TF_Output(operands.size());
+    for (int i = 0; i < operands.size(); i++) {
+      Operand<?> input = operands.get(i);
+      //TODO body.checkInput(input)


can you explain more in detail here what is left TODO?

It's from #207, I'd originally written this in that branch. I'll probably need to generalize the "handle input from another environment" anyways to handle closure captures in functions, at some point, and we'd want that to apply here, too.

I'm expecting it to get merged before this branch anyways, in which case I'll uncomment it.

karllessard · 2021-02-08T14:24:59Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/GraphFunction.java

+   *
+   * @param body the graph to use as the function body
+   * @param name the name of the function
+   * @param appendHash whether to append the has of the signature to the name


".. the hash..."

btw, just curious, why do we let the user decide to append a hash or not to the signature name? Can we take this decision for him?

I'd lean in favor of always appending it, but I could see a case where you want to create a function with a specific name, so I left it in. It's definitely necessary most of the time to avoid name collisions for functions with different arguments (aka overloading support).

karllessard · 2021-02-08T14:30:20Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/GraphFunction.java

+   * @param function the {@code ConcreteFunction} to create the function from
+   * @param appendHash whether to append a hash of the signature to the name
+   */
+  public static NamedGraphFunction create(ConcreteFunction function, boolean appendHash) {


I'm not sure to understand why do we need both NamedGraphFunction and ConcreteFunction, they seem to achieve pretty much the same goal, which is to provided named inputs/outputs on an encapsulated function, right?

In the Python paradigm, a Function is a method that wraps an executable graph (or eager code) while a ConcreteFunction is the realization of this function for specific type of input arguments (since Python is non-typed, the function is realized (or "concretized") when its invoke for the first time with a given set of input types.

In Java, as everything is typed, we might not need to make this distinction. So why not just calling it simply Function and merge both classes together? The only difference I see is that one runs a main Graph (using a session) while the other runs a secondary graph (using a ops). But there might be a way to abstract this to the user.

Pretty much, yeah, but NamedGraphFunction has to manage the TF_Function handle and doesn't require a signature. ConcreteFunction also allows for Tensor execution. As for function, see #205, this is generally modeling Python's DefinedFunction/_EagerDefinedFunction/defun, which is separate from the Function class.

I could look into merging them, but it would make it a bit harder to declare the GraphFunction. Would you be in favor of getting rid of the positional GraphFunction altogether and merging NamedGraphFunction with ConcreteFunction?

I'd want to keep a few of the positional function creator methods around somewhere, for gradient functions if nothing else. Probably package private though.

Yes, I'm for merging everything together if that can simplify the API for the user. Or we can continue to segregate the logic into multiple classes but only one should be public.

When trying to make GraphFunction (now called DefinedFunction) package local, I ran into the issue where I'd like to have a Ops.call(ConcreteFunction, Map<String, Operand<?>) function, but the necessary methods (and DefinedFunction) won't be accessible there. There's a few options, as far as I can see:

Only have a call method on ConcreteFunction, like ConcreteFunction.call(Ops tf, Map<String, Operand<?>).

Have a @Operator Function class in the main package

Have a public call method on ConcreteFunction that takes Scope instead of Ops, call that from the operator class

Move ConcreteFunction, DefinedFunction, the Operator class, and any future function classes to a new package (function, presumably).

4 is my preference, thoughts?

karllessard · 2021-02-08T14:32:27Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/GraphFunction.java

+   * TODO implement
+   */
+  TF_Function gradNativeHandle() {
+    return null;


Any idea how to implement this?

I need to do some tests, but for Graph I'm thinking I'd call the existing gradient op on it, and then construct a function using the op list constructor. Eager is up in the air until we get gradient tape support.

karllessard · 2021-02-08T14:41:27Z

Hi @rnett , I'm really happy that you tackle this feature that was in my backlog for too long now, let's get it done. I lean to an approach that would abstract as much as possible all the implementation details to the user, whether the function is a graph, a subgraph or code executed eagerly. We'll see where it goes in the scope of this PR

rnett · 2021-02-09T00:48:12Z

Yeah, I very much agree, I didn't intend to put that in this PR, just the native support. I've put my thoughts on it in #205.

karllessard · 2021-02-11T03:43:47Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/DefinedFunction.java

+ *
+ * Functions can also be created from {@link ConcreteFunction}s using {@link #create(ConcreteFunction, boolean)}.
+ */
+public class DefinedFunction implements AutoCloseable {


I'm still not convinced that we need this extra class as part of the public API. Can't we have everything fronted by ConcreteFunction, eventually changing it to an interface that can have multiple implementations?

I agree, but I'm running to access modifiers (and lack of internal):

When trying to make DefinedFunction package local, I ran into the issue where I'd like to have a Ops.call(ConcreteFunction, Map<String, Operand<?>) function, but the necessary methods (and DefinedFunction) won't be accessible there. There's a few options, as far as I can see:

Only have a call method on ConcreteFunction, like ConcreteFunction.call(Ops tf, Map<String, Operand<?>).

Have a @Operator Function class in the main package

Have a public call method on ConcreteFunction that takes Scope instead of Ops, call that from the operator class

Move ConcreteFunction, DefinedFunction, the Operator class, and any future function classes to a new package (function, presumably).

4 is my preference, thoughts?

So like we discussed, let's add endpoints to ConcreteFunction that accepts a Scope object and do the actual job. Anyway, most users will always go through the ops so it's ok to have these kind of endpoints in the public API of the core classes even if they are more meant for internal usage.

We also agreed that only ConcreteFunction should remain and probably be converted to an interface that has 2 different possible implementations: one with a graph (like the actual one) and one with a function graph (the new one). Maybe add these implementations in the org.tensorflow.internal package?

And last, can we check if it will be possible to build ConcreteFunction instances out of function graphs found in a saved model that has been loaded?

I've started on this, and as far as I can tell, the only thing ConcreteFunction provides that the native functions doesn't is the ability to call with Tensors, and even that's relocatable by using an temporary EagerSession. So do we even need it? The Signature builders can be ported just fine. The only thing I could think of that would cause issues is Saving, which I haven't looked at enough to be able to tell. If I understand it right it puts everything under a default named function anyways, so we could just create a graph and add the function to it.

| So do we even need it?

The goal is that we don't! Maybe I'll preserve ConcreteFunction as the name for the common function interface since it is still accurate without making breaking changes again. But if you think it is simple to do everything within a single class instead of one interface and two implementations, you can give it a try and I'll take a look at the PR but I have slightly a preference for the latter.

Will do. I'll start it without saving, and figure that out after. I'm also not sure how variables are handled within TF_Function, which may cause some differences from ConcreteFunction (especially wrt saving), but I suspect that resource variables will be handled just fine, it's only the legacy ones that will have issues, so imo we don't need to worry about it.

Signed-off-by: Ryan Nett <[email protected]>

…pparently forgot) Signed-off-by: Ryan Nett <[email protected]>

Signed-off-by: Ryan Nett <[email protected]>

karllessard

Thanks @rnett , I like where this is going. I'm concerned a bit about the support in the saved model though, for which reason the resource ownership of a ConcreteFunction was added for (and removed in this PR). I can give you more context about this if you want.

Also, I have a feeling that the size of this PR has increased dramatically (did it?), if that the case, is everything in there required to add this new functionality for native functions or there is a mix of other things as well? Reducing the size of the PRs helps a lot with the review process. Thanks!

karllessard · 2021-02-18T14:04:50Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/ConcreteFunction.java

@@ -39,16 +57,87 @@
 */
 public class ConcreteFunction implements AutoCloseable {

+  private static TF_Operation outputHandle(Operand<?> operand) {


Nit: can we move the private parts at the bottom of the class? It's the Google way to do it and I kind of like it now :)

karllessard · 2021-02-18T14:17:56Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/ConcreteFunction.java

  private final Signature signature;
-  private final Ownership ownership;


Why did you remove the ownership part? It is quite important to prevent unexpected behavior when closing a function that comes from a saved model vs. one that has been instantiated manually.

karllessard · 2021-02-18T14:21:06Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/ConcreteFunction.java

-      Tensor tensor = arguments.get(argName);
-      if (tensor == null) {
-        throw new IllegalArgumentException(String.format("Missing argument [%s]", argName));
+  public Map<String, Operand<?>> call(Scope scope,


karllessard · 2021-02-18T14:30:44Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/SavedModelBundle.java

+//        session = function.session();
+//      } else if (session != function.session()) {
+//        throw new UnsupportedOperationException("Saving multiple functions with different graphs/sessions is not supported yet.");
+//      }


We can't merge this PR until we support them in saved models properly

karllessard · 2021-02-18T14:33:44Z

tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/op/core/Function.java

+ * the function, they mean the type.
+ */
+@Operator(name = "call")
+public abstract class Function {


Can we rename this class to FunctionHelper or something like that, to prevent clashing with java.util.Function?

karllessard · 2021-02-18T14:34:09Z

tensorflow-core/tensorflow-core-api/src/test/java/org/tensorflow/ConcreteFunctionTest.java

+//      }
+//      assertThrows(IllegalStateException.class, () -> s.run(Init.DEFAULT_NAME));
+//      g.toGraphDef();  // check that graph is still valid
+//    }


Need to fix these tests before merging

rnett · 2021-02-18T22:42:12Z

Thanks @rnett , I like where this is going. I'm concerned a bit about the support in the saved model though, for which reason the resource ownership of a ConcreteFunction was added for (and removed in this PR). I can give you more context about this if you want.

Also, I have a feeling that the size of this PR has increased dramatically (did it?), if that the case, is everything in there required to add this new functionality for native functions or there is a mix of other things as well? Reducing the size of the PRs helps a lot with the review process. Thanks!

I didn't think you'd get to reviewing it yet, parts are still WIP (tests and saving). I'll mark it as draft until they are done.

The size hasn't increased much, I think it's mostly bad diffs and perhaps some rebase issues, plus auto-formatting files that weren't previously (Graph mostly, it looks like). I can pull some parts out into an earlier PR. The only large part I added recently was tree-walking functions on GraphOperation, which is necessary for building the new functions (since they may be a subset of a graph).

Context on saving would be appreciated, I'm going to take a look at how python does it. If I understand it right, both the function native instances and the resource variables are not associated with any specific session or graph (although I need to test that for variables), so I can use a graph and session for loading and then discard them. We're going to need some kind of variable lifting anyways for @tf.function style stuff, so if the variables are session local I'd like to design and use something like that here, rather than duplicating it. Worst case scenario I can add a session back to ConcreteFunction with ownership, but I don't understand how that works with attaching the function to the graph (need to test that too). I.e. if I make a function and attach it to two different graphs, do they share variables. That's also probably different for resource and non-resource variables.

How much do we need to support non-resource variables, anyways? It seems like resource variables were introduced largely to make the functional stuff easier. Of course, we don't have access to them yet, but we should soon-ish (my gradient PR is sitting in their review queue).

rnett · 2021-03-05T01:05:53Z

Replaced by #233.

karllessard requested changes Feb 8, 2021

View reviewed changes

rnett force-pushed the rn_graph_functions branch from 9810b26 to 5f37483 Compare February 10, 2021 03:24

karllessard reviewed Feb 11, 2021

View reviewed changes

rnett added 5 commits February 16, 2021 18:08

Initial native function support

7e8c9c7

Signed-off-by: Ryan Nett <[email protected]>

Formatting, comments, and output bounds checking (and the codegen I a…

106b407

…pparently forgot) Signed-off-by: Ryan Nett <[email protected]>

Fold name handling into ConcreteFunction, fix tests

195096d

Signed-off-by: Ryan Nett <[email protected]>

Not sure why this wasn't caught

f8f5087

Signed-off-by: Ryan Nett <[email protected]>

Start of function refactor

f215237

Signed-off-by: Ryan Nett <[email protected]>

rnett force-pushed the rn_graph_functions branch from 5f37483 to f215237 Compare February 18, 2021 00:19

karllessard requested changes Feb 18, 2021

View reviewed changes

rnett marked this pull request as draft February 18, 2021 22:31

This was referenced Mar 4, 2021

Add graph walking functions to Graph and GraphOperation #232

Merged

Native functions v2 #233

Merged

rnett closed this Mar 5, 2021

rnett deleted the rn_graph_functions branch October 17, 2021 00:44

		private final Signature signature;
		private final Ownership ownership;

Native function support #211

Native function support #211

Uh oh!

Conversation

rnett commented Feb 6, 2021

Uh oh!

rnett commented Feb 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rnett Feb 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rnett Feb 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

karllessard commented Feb 8, 2021

Uh oh!

rnett commented Feb 9, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rnett Feb 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

karllessard left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rnett commented Feb 6, 2021 •

edited

Loading

rnett Feb 9, 2021 •

edited

Loading

rnett Feb 9, 2021 •

edited

Loading

rnett Feb 11, 2021 •

edited

Loading