Description
System information
Linux.
test-run: e843552.
tarantool: 2.7.0-111-g28f3b2f1e.
python2: 2.7.17.
gevent: 1.5_alpha2 and 20.6.2.
greenlet: 0.4.15-r1 and 0.4.16.
How to observe the problem
- Pull and build a recent tarantool (2.7.0-111-g28f3b2f1e in my case).
- Copy `test/app-tap/test-timeout.test.lua` from the comment in #244 ("Add test_timeout to limit test run time"). Don't forget to set the executable bit: `chmod a+x test/app-tap/test-timeout.test.lua`.
- Mangle `test/app-tap/debug/server.lua` so that it fails after 1 second. Place this code at the end of the file:

  ```lua
  local fiber = require('fiber')
  fiber.create(function()
      fiber.sleep(1)
      os.exit(1)
  end)
  ```

- Run the test: `./test/test-run.py -j1 app-tap/test-timeout.test.lua`.
Expected: the failure of the non-default server is detected, and testing fails with an appropriate message after ~1 second.
Got: testing fails after 120 seconds (the default `--no-output-timeout` value), with no report about the failure of the non-default server.
Investigation
I observed that `self.process.returncode` in `TarantoolServer.crash_detect()` is 0, while the process actually exits with 1.
After applying either of the following two patches, the exit code becomes correct.
Variant 1:

```diff
diff --git a/lib/tarantool_server.py b/lib/tarantool_server.py
index 481b08f..6624ccf 100644
--- a/lib/tarantool_server.py
+++ b/lib/tarantool_server.py
@@ -8,7 +8,7 @@ import re
 import shlex
 import shutil
 import signal
-import subprocess
+from gevent import subprocess
 import sys
 import time
 import yaml
```
Variant 2:

```diff
diff --git a/lib/tarantool_server.py b/lib/tarantool_server.py
index 481b08f..a26e7c6 100644
--- a/lib/tarantool_server.py
+++ b/lib/tarantool_server.py
@@ -928,7 +928,7 @@ class TarantoolServer(Server):
         while self.process.returncode is None:
             self.process.poll()
             if self.process.returncode is None:
-                gevent.sleep(0.1)
+                time.sleep(0.1)
 
         if self.process.returncode in [0, -signal.SIGKILL, -signal.SIGTERM]:
             return
```
With either patch applied, test-run then fails in `app_server.py` on `if retval['returncode'] != 0`, but never mind, that will be fixed soon. The tarantool instance that executes `app-tap/test-timeout.test.lua` hangs in `while true do end`, but that will be fixed within the scope of #65 and #157.
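For reference, here is a minimal standalone sketch of the polling loop from the Variant 2 hunk, using only the stdlib `subprocess` module and a blocking `time.sleep()`. `detect_crash()` and the child command are hypothetical stand-ins (the child plays the role of the mangled `server.lua`); this is not test-run's actual `crash_detect()`, which does more than this.

```python
import signal
import subprocess
import sys
import time

# Exit codes treated as a normal stop, as in the Variant 2 hunk:
# clean exit, killed by SIGKILL, terminated by SIGTERM.
EXPECTED_CODES = (0, -signal.SIGKILL, -signal.SIGTERM)


def detect_crash(proc, interval=0.1):
    """Poll proc until it exits; return its exit code if it looks like a crash.

    Hypothetical helper for illustration, not test-run's crash_detect().
    """
    while proc.returncode is None:
        proc.poll()
        if proc.returncode is None:
            # Blocking sleep, as in Variant 2: no event loop runs meanwhile.
            time.sleep(interval)
    if proc.returncode in EXPECTED_CODES:
        return None
    return proc.returncode


if __name__ == '__main__':
    # Stand-in for the mangled server.lua: exit with 1 after a short delay.
    child = subprocess.Popen(
        [sys.executable, '-c', 'import sys, time; time.sleep(0.2); sys.exit(1)'])
    print(detect_crash(child))  # expected: 1
```

With no event loop involved, `poll()` is the only caller of `waitpid()` for the child, so the non-zero exit code is observed as expected.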
It seems there is some problem in the interaction between Python's `subprocess` module and the `gevent` module, but I failed to create a small reproducer.
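One mechanism that would produce exactly this symptom, offered as an assumption rather than a diagnosis: if some other code in the process (for example a SIGCHLD handler installed by an event loop; not verified here) reaps the child before `Popen.poll()` does, `waitpid()` fails with `ECHILD`, and the stdlib `subprocess` module then sets `returncode` to 0 because the real status is no longer available. The sketch below reproduces only that stdlib behaviour, without gevent, by reaping the child manually with `os.waitpid()`; the helper name is hypothetical.

```python
import os
import subprocess
import sys


def returncode_after_foreign_reap():
    """Return (real exit code, code reported by Popen) when the child is reaped elsewhere.

    Hypothetical helper: the manual os.waitpid() below stands in for any other
    component (assumed, e.g. an event loop's SIGCHLD handler) that reaps first.
    """
    proc = subprocess.Popen([sys.executable, '-c', 'import sys; sys.exit(1)'])
    # "Foreign" reaper: collect the child's status behind Popen's back.
    _, status = os.waitpid(proc.pid, 0)
    real_code = os.WEXITSTATUS(status)
    # Popen's own waitpid() now fails with ECHILD; the stdlib assumes the
    # child is dead with an unknown status and sets returncode to 0.
    reported = proc.poll()
    return real_code, reported


if __name__ == '__main__':
    print(returncode_after_foreign_reap())  # expected: (1, 0)
```

If something like this happens inside test-run, it would also be consistent with both workarounds helping, though that is not confirmed here.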