Skip to content

Attach protobuf generated classes to TF Core API #38

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 11, 2020

Conversation

karllessard
Copy link
Collaborator

@karllessard karllessard commented Apr 7, 2020

The main purpose of this PR is to generate Java classes from TensorFlow core protocol buffer descriptions and to distribute them directly from the tensorflow-core-api artifact. This way, the API can use concrete typed classes in the signature of some methods accepting TF protocol buffers instead of raw byte arrays.

The classes are generated from the Bazel build, using the same protobuf compiler used to build TensorFlow native library, which matches the version of the protobuf Maven dependency. The generated classes are then unarchived in the /src/gen/java folder of the tensorflow-core-api for building the Java library.

A few other changes were also applied as side-effects of this feature:

  • Upgrade version of TensorFlow runtime library to 2.2.0-rc2
    • requires bazel = 2.0.0
    • this increase the size of this PR considerably but was required to catch up with new protos
  • Patch protobuf definitions that were missing the java_package option
  • Add a utility java_api_import to help select the right package/group for new operations
    • ok this is really just a little helper that gives hints on where maybe that operations has landed in the Python client, no rocket science here... but it is still very useful and for a moment I thought I lost it so I prefer to put it now in a safe place :) Don't spend too much time reviewing it.
  • Classify new ops and move some debugging ops to their own group/package debugging
  • Removed GRPC patch for Ubuntu (TF uses a new version of GRPC that shouldn't have this issue)
    • @dhruvrajan can you please validate this still compiles for you?

To see the impact of including the protobuf in the Java client, look at changes applied to the src/main/java/org/tensorflow classes (e.g. Graph, Session, EagerSession, SavedModelBundle, Server...)

Ref: #21

<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>${protobuf.version}</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the original protobuf artifact as is will cause issues with other libraries that need a different version of protobuf, so this should all be shaded. If the plan is to wait until users having this problem show up, is it going to be OK to break backward compatibility at that time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So embedding it using Maven shade plugin? I can give it a try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, we need to use a patched version of protobuf that can generate code that depends on shaded classes, like this one: https://github.com/os72/protoc-jar-maven-plugin

But like I said it's not important to do that right away if we plan on breaking compatibility at some point in the future anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that might require more changes then because right now, I let Bazel generating the Java protobuf classes and just match the version used in Maven, as I did not find how to export only the .proto files out of TensorFlow.

I also found out this, for what it is worth. Looks like there is already some org.tensorflow artifacts that shades protobuf in a different way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we can let users do it themselves when required...

@@ -0,0 +1,317 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be translated to Java? And if not, why not?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I mentioned in the description of this PR, this is just an old utility I've written awhile ago when classifying the 900+ initial operation wrappers and I just put it there so it is not lost, as I still find it quite useful.

Now could it be rewritten in Java... I guess so, but with a lot of work as it depends on TF internal classes. I'm not planning to spend time on this anytime soon so if we think it does not fit in our repo, then I'll simply remove it from this PR and continue to keep it in my fork.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this code actually gets used during the build? If so, I suppose we can leave it here...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I only use it when I discover that new operations have been added after a TF upgrade, so I can classify them properly by configuring the API definitions properly before rebuilding. It just gives me some hints in which package they should be located based on Python endpoints that has a similar name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, so let's add the step to compile it to the build.sh script or something while we're at it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already there, see here and here

@Craigacp
Copy link
Collaborator

Craigacp commented Apr 7, 2020

Do we want the protobufs in the core artifact? Might be nice if they were packaged separately as they are conceptually different from the core jar, plus they wander all over our package namespace. That said, packaging them separately would cause module system issues later.

@karllessard
Copy link
Collaborator Author

karllessard commented Apr 7, 2020

Do we want the protobufs in the core artifact? Might be nice if they were packaged separately as they are conceptually different from the core jar, plus they wander all over our package namespace. That said, packaging them separately would cause module system issues later.

The "problem" we have in our actual repository setup is that the TensorFlow runtime is built and is only accessible from the tensorflow-core-api artifact. So anything depending on TensorFlow bazel build must be compiled in this artifact.

To generate protos as a separate module, we would need to move the JavaCPP build somewhere else (maybe in the tensorflow-core parent?) and have its childs cherry-picking what they need out of that build. I see other pieces of code that would make sense to be outside the core API too.

I don't know if it is possible to do this though and that might imply a lot of work. If we really want this, I suggest to open the discussion as a separate issue. cc: @saudet

@saudet
Copy link
Contributor

saudet commented Apr 8, 2020

Well, let's think of it that way. Can anyone anywhere use those protobufs with something else than the C API? Or, from the perspective of an end user, they are essentially just part of the C API? If it's the latter, I don't see any reason why they should be placed in another artifact. Anyone using them will also want the C API. If it's the former, what are those use cases exactly?

@Craigacp
Copy link
Collaborator

Craigacp commented Apr 8, 2020

It's more to lessen the load of anyone looking at the core API. That's where we expect people to look, it's what we'll document, and it's what the IDEs will show. If there's a bunch of protobufs in the namespace and we can't control what package names they have then it's going to mess with the structure and add a bunch of extra cognitive load when people look at it.

Though thinking about it, why can't we just change the package names in the Java protobufs? If the SIG is in charge of the Java API surely we should get to decide where they live.

@karllessard
Copy link
Collaborator Author

It's more to lessen the load of anyone looking at the core API. That's where we expect people to look, it's what we'll document, and it's what the IDEs will show.

Having them in a separate module will still pollute the package namespace and the IDE will show them the same way. For the source code, at least they reside in /src/gen and not in /src/main, so it is clear that they are not part of the maintained code.

Though thinking about it, why can't we just change the package names in the Java protobufs? If the SIG is in charge of the Java API surely we should get to decide where they live.

The protobuf compiler protoc will extract the package name from the .proto files themselves. If we want to change these packages, then we need to create a PR for the TensorFlow main repo to update all of them, which will break backward compatibility with their actual 1.x proto artifact (though I'm not sure if they are ever planning to build it again)... or we can extend this patch to override all of them on our side, which is quite a job and a bit ugly but still possible.

@Craigacp
Copy link
Collaborator

Craigacp commented Apr 8, 2020

I understand it involves changing files in the main TF repo, but that doesn't seem unreasonable if the change only affects Java.

And if I understand correctly it only breaks compatibility for people trying to use the 1.x artifacts with our new jars right? It'll still read the protos themselves just fine, but they'll come up with different names depending on what jar is loaded.

@karllessard
Copy link
Collaborator Author

karllessard commented Apr 8, 2020

@Craigacp : if changes made to the protos existing in 1.15 are backward compatible, then yes, it should only be matter of finding them in a different location if we rename those packages.

So you would suggest to have them in something like this?

org.tensorflow.proto.distruntime
org.tensorflow.proto.util
org.tensorflow.proto.framework
org.tensorflow.proto.example
org.tensorflow.proto.profiler

I understand the point and I also thought of it before. At the same time, I'm not sure how useful it is for a user to know that a file has been generated out of a protobuf file or not. For example, in this method, what would it provide to know that the MetaGraphDef is located in a proto package? BTW I'm not saying that we shouldn't do it but just challenging the idea.

@saudet
Copy link
Contributor

saudet commented Apr 8, 2020

If the goal is to avoid confusion, then just a different package name is fine in my opinion. FWIW, I don't find these classes to be any more user friendly than the C API itself, that is I still see all that as part of the "core API" anyway.

BTW, if we can't change the package name with protoc on the command line, sed is a more efficient option than modifying and maintaining manually a bunch of files. See for example how I'm translating JNI written by @Craigacp from C to C++ to make it work well with JavaCPP here:
https://github.com/bytedeco/javacpp-presets/blob/master/onnxruntime/cppbuild.sh#L57

@karllessard
Copy link
Collaborator Author

If we decide to rename the packages, I also think it make sense to apply the changes in the core repo since that is where the proto files reside but that won't probably be merged before TF2.3 and we can't wait for this release before having our protos available.

So the temporary solution would be to apply this PR internally as a Bazel patch.

@karllessard
Copy link
Collaborator Author

Now that being said, I kind of assume that Google will continue to add this java_package option in their new protos and classify them properly but that does not seemed to be enforced, as some protos were missing it.

@Craigacp
Copy link
Collaborator

Craigacp commented Apr 8, 2020

My reasoning for separating out the protos is because they aren't real code. We can't do anything about them, or change them in any way beyond repackaging, and having them sprinkled throughout the namespace disrupts what the packages mean. If a user knows that MetaGraphDef lives in a org.tensorflow.proto.<something> package then they immediately know it's part of the serialisation mechanism, and that it's fixed across Tensorflow languages. Otherwise they have to go and look at the class file.

@hthu
Copy link
Contributor

hthu commented Apr 8, 2020

My reasoning for separating out the protos is because they aren't real code. We can't do anything about them, or change them in any way beyond repackaging, and having them sprinkled throughout the namespace disrupts what the packages mean. If a user knows that MetaGraphDef lives in a org.tensorflow.proto.<something> package then they immediately know it's part of the serialisation mechanism, and that it's fixed across Tensorflow languages. Otherwise they have to go and look at the class file.

+1. I think having org.tensorflow.*.proto as the default prefix is going to make things clearer.
This can be done in two ways:

  1. we can just put .proto suffix for existing packages.
  2. Use org.tensorflow.proto as unique prefix for all generated protos.

@karllessard
Copy link
Collaborator Author

I guess we'll go with org.tensorflow.proto.*, it would look like the list I gave in my previous post. Ok, I'll check what is required to apply this change.

@hthu
Copy link
Contributor

hthu commented Apr 8, 2020

I guess we'll go with org.tensorflow.proto.*, it would look like the list I gave in my previous post. Ok, I'll check what is required to apply this change.

FYI, you can probably copy paste this under tensorflow/core/framework dir.
grep -rl 'java_package = ' . | xargs sed -i 's#^\(.*\) java_package = \"org.tensorflow.\(.*\)\"#\1 java_package = \"org.tensorflow.proto.\2\"#g'

@karllessard
Copy link
Collaborator Author

@hthu , in your command, you are kind of assuming that the only protos we want are those under /core/framework. In this PR, I've included all of those that are part of the protos_all rule.

I think it is ok (e.g. I'm pretty sure we want the protos under /core/example as well) but I just wanted to double check with you if there is any reason we shouldn't do this, thanks.

@karllessard
Copy link
Collaborator Author

Ok, all protos are under the org.tensorflow.proto package now

@hthu
Copy link
Contributor

hthu commented Apr 9, 2020

@hthu , in your command, you are kind of assuming that the only protos we want are those under /core/framework. In this PR, I've included all of those that are part of the protos_all rule.

I think it is ok (e.g. I'm pretty sure we want the protos under /core/example as well) but I just wanted to double check with you if there is any reason we shouldn't do this, thanks.

You're right :)
They should be all replaced.

BTW, some proto files don't have java_package option.

@karllessard
Copy link
Collaborator Author

BTW, some proto files don't have java_package option.

@hthu : for those I have a patch that adds the missing options before the command that fixes the .proto package is executed (hence why this package is not present in the patch). I guessed in which subpackage the protos should be located.

Now I probably should create a PR for the main TF repo that applies all these changes but at least, we are not blocked for doing it on our side. (BTW, I see that you are part of Google Brain team, will that PR ends up in your hands?)

@hthu
Copy link
Contributor

hthu commented Apr 9, 2020

BTW, some proto files don't have java_package option.

@hthu : for those I have a patch that adds the missing options before the command that fixes the .proto package is executed (hence why this package is not present in the patch). I guessed in which subpackage the protos should be located.

Now I probably should create a PR for the main TF repo that applies all these changes but at least, we are not blocked for doing it on our side. (BTW, I see that you are part of Google Brain team, will that PR ends up in your hands?)

I don't know.
I'm following this as I'm interested :)

@karllessard
Copy link
Collaborator Author

Ok, I think that’s ready to be merged, thanks everyone

@karllessard karllessard merged commit 9fe9d87 into tensorflow:master Apr 11, 2020
@karllessard karllessard deleted the proto branch April 11, 2020 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants