Commit da8db18

Tony Kao authored and facebook-github-bot committed
torchx - fix race condition in local_scheduler where LogIterator reads too early
Summary: torchx/cli/test:cmd_run_test - test_run_with_log (https://www.internalfb.com/intern/test/281475186013299?ref_report_id=0) regularly fails because an assertion on the local_scheduler output finds expected content missing. This creates noise for the oncall, because the failed release test blocks the torchx release: https://fburl.com/conveyor/a5u31rby

The issue appears to be that LogIterator aborts early if the content has not been written yet: https://www.internalfb.com/code/fbsource/[922fd5827417][history]/fbcode/torchx/schedulers/local_scheduler.py?lines=1185-1189

The proposed fix is to add a small delay before the log file handle (self._log_fp) is set up.

Differential Revision: D80716088
1 parent 29472a9 commit da8db18

1 file changed: +1 -0 lines changed

torchx/schedulers/local_scheduler.py

Lines changed: 1 addition & 0 deletions
@@ -1159,6 +1159,7 @@ def __iter__(self) -> "LogIterator":
         self._check_finished()  # check to see if app has finished running

         if os.path.isfile(self._log_file):
+            time.sleep(0.1)  # fix timing issue
             self._log_fp = open(
                 self._log_file,
                 mode="rt",
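
The change above uses a fixed 0.1 s delay, which narrows the race window but does not close it if the writer takes longer than that. As an illustration only, one alternative is a bounded poll that waits until the log file has content. The sketch below is not part of this commit or of torchx: wait_for_content, its timeout and interval parameters, and the default values are all hypothetical.

```python
import os
import time


def wait_for_content(path: str, timeout: float = 1.0, interval: float = 0.01) -> bool:
    """Poll until `path` exists and is non-empty, or until `timeout` seconds pass.

    Hypothetical helper for illustration; not a torchx API.
    Returns True if content was observed, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            pass  # the file has not been created yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller could invoke this before opening the log file and proceed either way once it returns. The trade-off is that an app which legitimately writes nothing would pay the full timeout, so the bound would need to stay small.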
