Description
How to reproduce
Mangle a supplementary (non-default) instance like so:
diff --git a/test/app-tap/debug/server.lua b/test/app-tap/debug/server.lua
index 3362d46cc..8dbd74b66 100644
--- a/test/app-tap/debug/server.lua
+++ b/test/app-tap/debug/server.lua
@@ -9,3 +9,5 @@ box.schema.func.create('debug.sourcefile')
box.schema.func.create('debug.sourcedir')
box.schema.user.grant('guest','execute','function','debug.sourcefile')
box.schema.user.grant('guest','execute','function','debug.sourcedir')
+
+while true do end
And run a test (remove --test-timeout 5
for test-run prior PR #244):
./test/test-run.py --test-timeout 5 -j1 app-tap/debug.test.lua
Expected behaviour: the test will fail at reaching the test timeout and all relevant instances will be terminated / killed.
Observed behaviour: the test timeout works, but the instance is not stopped. If the test is marked as fragile and we make attempt to rerun it, we'll get the 'The daemon is already running' error.
Investigation
It seems, the reason is the following check in the TarantoolServer.stop()
method:
test-run/lib/tarantool_server.py
Lines 845 to 849 in 1f6d7ba
Given server is not considered as 'started', so test-run does not stop it. However the process exists and should be killed.
The attempt to just remove the check fails:
The entire output (without this check)
$ ./test/test-run.py --test-timeout 5 -j1 --force app-tap/debug.test.lua
Started ./test/test-run.py --test-timeout 5 -j1 --force app-tap/debug.test.lua
Running in parallel with 1 workers
Collecting tests in 'app' (Found 0 tests): application server tests.
Collecting tests in 'app-tap' (Found 1 tests): application server tests (TAP).
Collecting tests in 'box' (Found 0 tests): Database tests.
Collecting tests in 'box-py' (Found 0 tests): legacy python tests.
Collecting tests in 'box-tap' (Found 0 tests): Database tests with #! using TAP.
Collecting tests in 'engine' (Found 0 tests): tarantool multiengine tests.
Collecting tests in 'engine_long' (Found 0 tests): tarantool engine stress tests.
Collecting tests in 'long_run-py' (Found 0 tests): long running tests.
Collecting tests in 'luajit-tap' (Found 0 tests): Luajit tests.
Collecting tests in 'replication' (Found 0 tests): tarantool/box, replication.
Collecting tests in 'replication-py' (Found 0 tests): tarantool/box, replication.
Collecting tests in 'small' (Found 0 tests): libsmall unit tests.
Collecting tests in 'sql' (Found 0 tests): sql tests.
Collecting tests in 'sql-tap' (Found 0 tests): Database tests with #! using TAP.
Collecting tests in 'swim' (Found 0 tests): SWIM tests.
Collecting tests in 'unit' (Found 0 tests): unit tests.
Collecting tests in 'vinyl' (Found 0 tests): vinyl integration tests.
Collecting tests in 'wal_off' (Found 0 tests): tarantool/box, wal_mode = none.
Collecting tests in 'xlog' (Found 0 tests): tarantool write ahead log tests.
Collecting tests in 'xlog-py' (Found 0 tests): legacy python tests.
Tarantool server information
| Found executable at /home/alex/projects/tarantool-meta/r/t-3/src/tarantool
| Found tarantoolctl at /home/alex/projects/tarantool-meta/r/t-3/extra/dist/tarantoolctl
| Tarantool 2.7.0-98-g54667f6fa
| Target: Linux-x86_64-Debug
| Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
| Compiler: /usr/bin/cc /usr/bin/c++
| C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
| CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror
======================================================================================
WORKR TEST PARAMS RESULT
---------------------------------------------------------------------------------
[001] app-tap/debug.test.lua Test timeout of 5 secs reached
[001] [Instance "server"] Tarantool server failed to start
[001]
[001] Last 15 lines of Tarantool Log file [Instance "server"][/home/alex/projects/tarantool-meta/r/t-3/test/var/001_app-tap/server.log]:
[001] 2020-12-15 22:37:46.033 [15737] main/103/server I> mapping 134217728 bytes for vinyl tuple arena...
[001] 2020-12-15 22:37:46.038 [15737] main/103/server I> instance uuid 69e0dc32-5dee-4d07-ae65-e7753c86689d
[001] 2020-12-15 22:37:46.038 [15737] iproto/101/main I> binary: bound to unix/:(socket)
[001] 2020-12-15 22:37:46.039 [15737] main/103/server I> initializing an empty data directory
[001] 2020-12-15 22:37:46.067 [15737] main/103/server I> assigned id 1 to replica 69e0dc32-5dee-4d07-ae65-e7753c86689d
[001] 2020-12-15 22:37:46.067 [15737] main/103/server I> cluster uuid d4a453ad-6d80-49bd-8350-c7ec495db6ef
[001] 2020-12-15 22:37:46.068 [15737] snapshot/101/main I> saving snapshot `/home/alex/projects/tarantool-meta/r/t-3/test/var/001_app-tap/server/00000000000000000000.snap.inprogress'
[001] 2020-12-15 22:37:46.071 [15737] snapshot/101/main I> done
2 l/tarantool_server.py 2 l/app_server.py l/server.py l/test.py X
[001] 2020-12-15 22:37:46.072 [15737] main/103/server I> ready to accept requests
[001] 2020-12-15 22:37:46.072 [15737] main/103/server I> set 'log_level' configuration option to 5
[001] 2020-12-15 22:37:46.072 [15737] main/106/checkpoint_daemon I> scheduled next checkpoint for Wed Dec 16 00:35:02 2020
[001] 2020-12-15 22:37:46.072 [15737] main/103/server I> set 'listen' configuration option to "\/home\/alex\/projects\/tarantool-meta\/r\/t-3\/test\/var\/001_app-tap\/server.socket-iproto"
[001] 2020-12-15 22:37:46.072 [15737] main/103/server I> set 'log_format' configuration option to "plain"
[001] 2020-12-15 22:37:46.073 [15737] main/103/server I> set 'replication_sync_timeout' configuration option to 100
[001] 2020-12-15 22:37:46.073 [15737] main/117/console/::1:8670 I> started
[001]
[001] Worker "001_app-tap" received the following error; stopping...
[001] Traceback (most recent call last):
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/worker.py", line 316, in run_task
[001] task, self.server, self.inspector)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/test_suite.py", line 261, in run_test
[001] short_status, result_checksum = test.run(server)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/test.py", line 182, in run
[001] self.execute(server)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/app_server.py", line 77, in execute
[001] self.finish(server, ts)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/app_server.py", line 87, in finish
[001] ts.stop_nondefault(signal=signal.SIGKILL)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/preprocessor.py", line 442, in stop_nondefault
[001] v.stop(silent=True, signal=signal)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/tarantool_server.py", line 1059, in stop
[001] save_join(self.crash_detector)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/tarantool_server.py", line 52, in save_join
[001] green_obj.get(timeout=timeout)
[001] File "src/gevent/greenlet.py", line 790, in gevent._gevent_cgreenlet.Greenlet.get
[001] File "src/gevent/greenlet.py", line 778, in gevent._gevent_cgreenlet.Greenlet.get
[001] File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
[001] File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
[001] File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
[001] File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
[001] File "src/gevent/greenlet.py", line 854, in gevent._gevent_cgreenlet.Greenlet.run
[001] File "/usr/lib64/python2.7/site-packages/gevent/baseserver.py", line 34, in _handle_and_close_when_done
[001] return handle(*args_tuple)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/inspector.py", line 101, in handle
[001] result = self.parser.parse_preprocessor(line)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/preprocessor.py", line 122, in parse_preprocessor
[001] return self.server(stype, sname, options)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/preprocessor.py", line 348, in server
[001] return getattr(self, attr)(ctype, sname, opts)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/preprocessor.py", line 209, in server_start
[001] wait_load=wait_load, args=args)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/tarantool_server.py", line 881, in start
[001] self.wait_until_started(wait_load)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/tarantool_server.py", line 1106, in wait_until_started
[001] self.logfile_pos.seek_wait(msg, p, self.name)
[001] File "/home/alex/projects/tarantool-meta/r/t-3/test-run/lib/tarantool_server.py", line 455, in seek_wait
[001] raise TarantoolStartError(name)
[001] TarantoolStartError
[001]
[001] Exception:
---------------------------------------------------------------------------------
[Internal test-run error] The following tasks were dispatched to some worker task queue, but were not reported as done (does not matters success or fail):
- [app-tap/debug.test.lua, null]
Background information
I doubt that it is possible to get this situation in a real testing, so let's consider this issue with lowest priority. Take care to it if you'll perform some refactoring around stopping / killing tarantool instances.