Attach protobuf generated classes to TF Core API #38
Conversation
```xml
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>${protobuf.version}</version>
</dependency>
```
Using the original protobuf artifact as-is will cause issues with other libraries that need a different version of protobuf, so this should all be shaded. If the plan is to wait until users who hit this problem show up, is it going to be OK to break backward compatibility at that time?
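For reference, relocating the protobuf runtime with the Maven shade plugin might look roughly like this (the `shadedPattern` value is just an example, not a name used anywhere in this project):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Move protobuf classes out of the public namespace so they
               cannot clash with another protobuf version on the classpath -->
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.tensorflow.shaded.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

As noted below, plain shading is not enough on its own: the generated classes still reference the original `com.google.protobuf` package names, which is why a patched generator comes up next.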
So embedding it using Maven shade plugin? I can give it a try.
Not exactly, we need to use a patched version of protobuf that can generate code that depends on shaded classes, like this one: https://github.com/os72/protoc-jar-maven-plugin
But like I said it's not important to do that right away if we plan on breaking compatibility at some point in the future anyway.
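A sketch of what using that plugin could look like (option names from memory of the protoc-jar-maven-plugin README; treat the exact configuration keys as assumptions and check the plugin docs):

```xml
<plugin>
  <groupId>com.github.os72</groupId>
  <artifactId>protoc-jar-maven-plugin</artifactId>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals><goal>run</goal></goals>
      <configuration>
        <protocVersion>${protobuf.version}</protocVersion>
        <!-- "java-shaded" makes the generated code depend on the plugin's
             relocated protobuf runtime instead of com.google.protobuf -->
        <type>java-shaded</type>
        <inputDirectories>
          <include>src/main/protobuf</include>
        </inputDirectories>
      </configuration>
    </execution>
  </executions>
</plugin>
```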
Oh, that might require more changes then, because right now I let Bazel generate the Java protobuf classes and just match the version used in Maven, as I did not find a way to export only the `.proto` files out of TensorFlow.
I also found out this, for what it is worth. Looks like there are already some `org.tensorflow` artifacts that shade protobuf in a different way.
Right, we can let users do it themselves when required...
```diff
@@ -0,0 +1,317 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
```
Could this be translated to Java? And if not, why not?
Like I mentioned in the description of this PR, this is just an old utility I wrote a while ago when classifying the 900+ initial operation wrappers, and I just put it there so it is not lost, as I still find it quite useful.
Now, could it be rewritten in Java? I guess so, but with a lot of work, as it depends on TF internal classes. I'm not planning to spend time on this anytime soon, so if we think it does not fit in our repo, then I'll simply remove it from this PR and continue to keep it in my fork.
Does this code actually get used during the build? If so, I suppose we can leave it here...
No, I only use it when I discover that new operations have been added after a TF upgrade, so I can classify them properly by configuring the API definitions before rebuilding. It just gives me some hints about which package they should be located in, based on Python endpoints that have a similar name.
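As a rough illustration of that hint-giving idea (everything below is hypothetical, not the actual utility described in this PR): map an op name to its Python endpoint, then derive a candidate Java package from the endpoint's module path.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: suggest a Java package for a generated op wrapper
// from the name of a similar Python endpoint. The real utility reads the
// endpoints from TF's api_def files; here we hardcode two examples.
public class OpClassifier {
  static final Map<String, String> PYTHON_ENDPOINTS = Map.of(
      "Add", "tf.math.add",
      "DecodeJpeg", "tf.io.decode_jpeg");

  static Optional<String> suggestPackage(String opName) {
    String endpoint = PYTHON_ENDPOINTS.get(opName);
    if (endpoint == null) {
      return Optional.empty();  // no hint available for this op
    }
    int first = endpoint.indexOf('.');
    int last = endpoint.lastIndexOf('.');
    if (first == last) {
      // Top-level endpoint like "tf.add": no module path to map
      return Optional.of("org.tensorflow.op.core");
    }
    // "tf.math.add" -> module path "math" -> "org.tensorflow.op.math"
    return Optional.of("org.tensorflow.op." + endpoint.substring(first + 1, last));
  }

  public static void main(String[] args) {
    System.out.println(suggestPackage("Add").orElse("?"));  // prints "org.tensorflow.op.math"
  }
}
```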
Ah, ok, so let's add the step to compile it to the build.sh script or something while we're at it?
Do we want the protobufs in the core artifact? Might be nice if they were packaged separately as they are conceptually different from the core jar, plus they wander all over our package namespace. That said, packaging them separately would cause module system issues later.
The "problem" we have in our actual repository setup is that the TensorFlow runtime is built in, and is only accessible from, the `tensorflow-core-api` module. To generate protos as a separate module, we would need to move the JavaCPP build somewhere else.

I don't know if it is possible to do this though, and that might imply a lot of work. If we really want this, I suggest opening the discussion as a separate issue.

cc: @saudet
Well, let's think of it that way. Can anyone anywhere use those protobufs with something other than the C API? Or, from the perspective of an end user, are they essentially just part of the C API? If it's the latter, I don't see any reason why they should be placed in another artifact; anyone using them will also want the C API. If it's the former, what are those use cases exactly?
It's more to lessen the load of anyone looking at the core API. That's where we expect people to look, it's what we'll document, and it's what the IDEs will show. If there's a bunch of protobufs in the namespace and we can't control what package names they have, then it's going to mess with the structure and add a bunch of extra cognitive load when people look at it. Though thinking about it, why can't we just change the package names in the Java protobufs? If the SIG is in charge of the Java API, surely we should get to decide where they live.
Having them in a separate module will still pollute the package namespace and the IDE will show them the same way. For the source code, at least they reside in ...
The protobuf compiler ...
I understand it involves changing files in the main TF repo, but that doesn't seem unreasonable if the change only affects Java. And if I understand correctly, it only breaks compatibility for people trying to use the 1.x artifacts with our new jars, right? It'll still read the protos themselves just fine, but they'll come up with different names depending on what jar is loaded.
@Craigacp: if changes made to the protos existing in 1.15 are backward compatible, then yes, it should only be a matter of finding them in a different location if we rename those packages. So you would suggest to have them in something like this?
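For illustration, the kind of rename under discussion could be expressed through the `java_package` option in the `.proto` files themselves (the file name and target package below are hypothetical examples, not what was actually decided in this thread):

```protobuf
// e.g. tensorflow/core/protobuf/config.proto
syntax = "proto3";
package tensorflow;

// Redirects the generated Java classes into a namespace the SIG controls,
// without affecting the wire format or other language bindings.
option java_package = "org.tensorflow.proto.framework";
option java_multiple_files = true;
```

Since `java_package` only changes Java code generation, the serialized protos remain byte-compatible with the 1.x artifacts.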
I understand the point and I also thought of it before. At the same time, I'm not sure how useful it is for a user to know whether a file has been generated out of a protobuf file or not. For example, in this method, what would it provide to know that the ...
If the goal is to avoid confusion, then just a different package name is fine in my opinion. FWIW, I don't find these classes to be any more user-friendly than the C API itself; that is, I still see all that as part of the "core API" anyway. BTW, if we can't change the package name with ...
If we decide to rename the packages, I also think it makes sense to apply the changes in the core repo since that is where the proto files reside, but that probably won't be merged before TF 2.3 and we can't wait for this release before having our protos available. So the temporary solution would be to apply this PR internally as a Bazel patch.
Now that being said, I kind of assume that Google will continue to add this ...
My reasoning for separating out the protos is that they aren't real code. We can't do anything about them, or change them in any way beyond repackaging, and having them sprinkled throughout the namespace disrupts what the packages mean. If a user knows that ...
+1. I think having ...
I guess we'll go with ...
FYI, you can probably copy paste this under ...
@hthu, in your command you are kind of assuming that the only protos we want are those under ... I think it is ok (e.g. I'm pretty sure we want the protos under ...)
Ok, all protos are under the ...
You're right :) BTW, some proto files don't have the `java_package` option.
@hthu: for those, I have a patch that adds the missing options before the command that fixes the ... Now I probably should create a PR for the main TF repo that applies all these changes, but at least we are not blocked from doing it on our side. (BTW, I see that you are part of the Google Brain team, will that PR end up in your hands?)
I don't know.
Ok, I think that's ready to be merged, thanks everyone
The main purpose of this PR is to generate Java classes from TensorFlow core protocol buffer descriptions and to distribute them directly from the `tensorflow-core-api` artifact. This way, the API can use concrete typed classes in the signature of some methods accepting TF protocol buffers, instead of raw byte arrays.

The classes are generated from the Bazel build, using the same protobuf compiler used to build the TensorFlow native library, which matches the version of the protobuf Maven dependency. The generated classes are then unarchived in the `/src/gen/java` folder of `tensorflow-core-api` for building the Java library.

A few other changes were also applied as side-effects of this feature:

* ... `java_package` option
* ... `java_api_import` to help select the right package/group for new operations
* ... `debugging` ...

To see the impact of including the protobuf in the Java client, look at changes applied to the `src/main/java/org/tensorflow` classes (e.g. `Graph`, `Session`, `EagerSession`, `SavedModelBundle`, `Server`, ...)

Ref: #21
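To make the "typed classes instead of raw byte arrays" point concrete, here is a signature-level sketch. It is not compilable standalone (`ConfigProto` is one of the generated proto classes, and the exact `Session` constructor shown is a hypothetical example, not necessarily the signature this PR introduces):

```java
// Before: session options had to be passed as a pre-serialized byte[],
// e.g. new Session(graph, configProtoBytes), with no compile-time checking.

// After: with the generated protos on the classpath, the caller can build
// a typed message and pass it directly.
ConfigProto config = ConfigProto.newBuilder()
    .setAllowSoftPlacement(true)   // standard protobuf builder setter
    .build();
try (Session session = new Session(graph, config)) {
  // run the graph with the typed configuration applied
}
```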