Convert file system implementations to use C file system API in tensorflow #1111

Open
2 of 3 tasks
yongtang opened this issue Sep 10, 2020 · 7 comments

Comments

@yongtang
Member

yongtang commented Sep 10, 2020

Now that the C file system API in tensorflow is available (see tensorflow/community#101), we should start migrating the file system implementations in tensorflow-io to use it.

This issue tracks progress on the following:

@yongtang
Member Author

yongtang commented Sep 21, 2020

Before we can migrate to the modular file system, additional work on the tensorflow side is needed. The following is the progress of the related PRs:

@yongtang
Member Author

yongtang commented Sep 25, 2020

Additional fixes that need to happen on tensorflow/tensorflow:

@yongtang
Member Author

yongtang commented Sep 27, 2020

One more fix (removing the duplicate Env::Default() instance) is needed in tensorflow/tensorflow:

@terrytangyuan
Member

cc @zouxu09 Would anyone in your team be interested in helping out with OSS storage support?

@yongtang
Member Author

yongtang commented Nov 8, 2020

@terrytangyuan One remaining issue in AZFS is that we temporarily commented out logging: https://github.com/tensorflow/io/pull/1143/files#r517351641

This issue can be addressed by using the C logging API in TensorFlow. A PR in the tensorflow repo has been created to expose the header files in the pip package:

@yongtang
Member Author

With the AZFS and HTTP file systems converted to the modular file system C API, we can start thinking about maintaining forward compatibility. One thing we can do is lazy-load libtensorflow_io.so (non-forward-compatible) and eagerly load only libtensorflow_io_plugins.so (forward-compatible). This way, users will be able to use the file systems with future TF versions as long as they only touch the following:

import tensorflow_io as tfio

I have created a PR #1208 for that purpose.

The only outstanding issue is that, if we lazy-load libtensorflow_io.so, the OSS file system (still inside libtensorflow_io.so) will not be immediately available after import tensorflow_io as tfio. There are other options, e.g., exposing import tensorflow_io.experimental.oss as oss to load libtensorflow_io.so for OSS, though we may also want to think about the timeline.
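
To make the lazy-loading idea concrete, here is a minimal sketch. It is not the code in PR #1208: the LazyLibrary class and the fake_loader function are hypothetical names made up for illustration, and fake_loader stands in for whatever real loader would be used (presumably something like tf.load_op_library). The plugin library is loaded right away, while the core library is only loaded the first time one of its attributes is touched.

```python
import threading
import types


class LazyLibrary:
    """Hypothetical helper: defers loading a shared library until first use."""

    def __init__(self, path, loader):
        self._path = path
        self._loader = loader  # callable that actually loads `path`
        self._handle = None
        self._lock = threading.Lock()

    def load(self):
        # Double-checked locking so concurrent first uses load only once.
        if self._handle is None:
            with self._lock:
                if self._handle is None:
                    self._handle = self._loader(self._path)
        return self._handle

    @property
    def loaded(self):
        return self._handle is not None

    def __getattr__(self, name):
        # Only reached for attributes not found on the wrapper itself.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.load(), name)


loaded_paths = []


def fake_loader(path):
    # Stand-in for a real loader (assumption); just records what was loaded.
    loaded_paths.append(path)
    return types.SimpleNamespace(path=path)


# Forward-compatible plugins: loaded eagerly, as `import tensorflow_io` would.
plugins = LazyLibrary("libtensorflow_io_plugins.so", fake_loader)
plugins.load()

# Non-forward-compatible core library: nothing is loaded until first access.
core = LazyLibrary("libtensorflow_io.so", fake_loader)
```

Under this scheme, an explicit entry point such as the tensorflow_io.experimental.oss import mentioned above would simply call core.load() so that the OSS file system gets registered on demand.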

/cc @terrytangyuan do you know if there is any timeline for OSS moving to the modular file system C API?

@terrytangyuan
Member

@zouxu09 is working on the OSS part and it's tracked in #1180. I'll defer this to @zouxu09 to comment on the timeline.
