test: vinyl/select_consistency test flaky hangs #16

Closed
@avtikhon

Description

Tarantool version:
master

OS version:
OSX (very often)

Bug description:
Use a debug build and run the test with the '--long' flag.

Failed at:
https://gitlab.com/tarantool/tarantool/-/jobs/259711765#L3466

Fail message:

Test hung! Result content mismatch:
--- vinyl/select_consistency.result	Fri Jul 26 04:26:49 2019
+++ var/060_vinyl/select_consistency.result	Fri Jul 26 04:28:50 2019
@@ -137,16 +137,3 @@
 for i = 1, ch:size() do
     ch:get()
 end;
----
-...
-test_run:cmd("setopt delimiter ''");
----
-- true
-...
-#failed == 0 or failed
----
-- true
-...
-s:drop()
----
-...

The test actually hangs during snapshot creation:

[084] Last 15 lines of Tarantool Log file [Instance "vinyl"][/Users/tntmac03.tarantool.i/tnt/test/var/084_vinyl/vinyl.log]:
[084] 2020-12-02 22:25:51.792 [88742] vinyl.dump.0/155/task I> writing `/Users/tntmac03.tarantool.i/tnt/test/var/084_vinyl/vinyl/512/0/00000000000000000554.index'
[084] 2020-12-02 22:25:51.792 [88742] vinyl.compaction.0/137/task I> writing `/Users/tntmac03.tarantool.i/tnt/test/var/084_vinyl/vinyl/512/2/00000000000000000555.index'
[084] 2020-12-02 22:25:51.792 [88742] vinyl.compaction.1/129/task I> writing `/Users/tntmac03.tarantool.i/tnt/test/var/084_vinyl/vinyl/512/0/00000000000000000556.index'
[084] 2020-12-02 22:25:51.793 [88742] main/105/gc I> removed /Users/tntmac03.tarantool.i/tnt/test/var/084_vinyl/vinyl/512/1/00000000000000000246.run
[084] 2020-12-02 22:25:51.793 [88742] main/108/vinyl.scheduler I> 512/0: dump completed
[084] 2020-12-02 22:25:51.793 [88742] main/108/vinyl.scheduler I> dumped 0 bytes in 0.0 s, rate 0.0 MB/s
[084] 2020-12-02 22:25:51.793 [88742] main/749/lua I> vinyl checkpoint completed
[084] 2020-12-02 22:25:51.794 [88742] main/108/vinyl.scheduler I> 512/2: completed compacting range ([8, 9, 89]..inf)
[084] 2020-12-02 22:25:51.794 [88742] main/105/gc I> removed /Users/tntmac03.tarantool.i/tnt/test/var/084_vinyl/vinyl/512/1/00000000000000000008.index
[084] 2020-12-02 22:25:51.795 [88742] main/750/lua [string "function snap_loop()     while not stop do   ..."]:1 E> error: box.snapshot failed with error Snapshot is already in progress
[084] 2020-12-02 22:25:51.795 [88742] main/105/gc I> removed /Users/tntmac03.tarantool.i/tnt/test/var/084_vinyl/vinyl/512/1/00000000000000000008.run
[084] 2020-12-02 22:25:51.796 [88742] main/105/gc I> removed /Users/tntmac03.tarantool.i/tnt/test/var/084_vinyl/vinyl/512/1/00000000000000000214.index
[084] 2020-12-02 22:25:51.796 [88742] main/105/gc I> removed /Users/tntmac03.tarantool.i/tnt/test/var/084_vinyl/vinyl/512/1/00000000000000000214.run
[084] 2020-12-02 22:25:51.797 [88742] main/749/lua F> can't rename .snap.inprogress
[084] 2020-12-02 22:25:51.797 [88742] main/749/lua F> can't rename .snap.inprogress
[084] [ fail ]
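
To pull the error and fatal entries out of a log like the one above, here is a minimal Python sketch. It assumes the line layout seen in the excerpt (timestamp, pid in brackets, fiber name, a one-letter level followed by `>`, then the message). The name `severe_lines` is hypothetical and is not part of test-run or Tarantool.

```python
import re

# Assumed layout, taken from the log excerpt in this report:
#   2020-12-02 22:25:51.797 [88742] main/749/lua F> can't rename .snap.inprogress
# A test-run prefix such as "[084] " may precede the timestamp, so we search
# rather than match from the start of the line.
LOG_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[(?P<pid>\d+)\] "
    r"(?P<fiber>.+?) "
    r"(?P<level>[A-Za-z])> "
    r"(?P<msg>.*)"
)


def severe_lines(lines, levels=("E", "F")):
    """Return (timestamp, fiber, level, message) for lines at the given levels."""
    found = []
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("level") in levels:
            found.append(
                (m.group("ts"), m.group("fiber"), m.group("level"), m.group("msg"))
            )
    return found
```

Applied to the excerpt, this should keep only the `E>` line reporting "Snapshot is already in progress" and the two `F>` lines reporting "can't rename .snap.inprogress", which come from fibers `main/750/lua` and `main/749/lua` respectively.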

Steps to reproduce:

  1. Use a macOS host, where the issue reproduces very often.
  2. Enable the fragile list in parallel mode in the test-run submodule:
diff --git a/lib/worker.py b/lib/worker.py
index a8643fe..698e9d2 100644
--- a/lib/worker.py
+++ b/lib/worker.py
@@ -89,7 +89,7 @@ def get_task_groups():
             res[key + '_fragile'] = {
                 'gen_worker': gen_worker,
                 'task_ids': fragile_task_ids,
-                'is_parallel': False,
+                'is_parallel': suite.is_parallel(),
                 'show_reproduce_content': suite.show_reproduce_content(),
             }
     return res
  3. Increase reproducibility by making the sleep between snapshots as short as possible:
diff --git a/test/vinyl/select_consistency.result b/test/vinyl/select_consistency.result
index f6d96473d..4047c3c3d 100644
--- a/test/vinyl/select_consistency.result
+++ b/test/vinyl/select_consistency.result
@@ -91,7 +91,7 @@ function snap_loop()
             failed = true
             break
         end
-        fiber.sleep(0.5)
+        fiber.sleep(0.01)
     end
     ch:put(true)
 end;
diff --git a/test/vinyl/select_consistency.test.lua b/test/vinyl/select_consistency.test.lua
index 644f68f5f..34b093d2a 100644
--- a/test/vinyl/select_consistency.test.lua
+++ b/test/vinyl/select_consistency.test.lua
@@ -69,7 +69,7 @@ function snap_loop()
             failed = true
             break
         end
-        fiber.sleep(0.5)
+        fiber.sleep(0.01)
     end
     ch:put(true)
 end;
  4. Run the flaky test in parallel many times:
( c=0 ; while ./test-run.py --builddir ~/tnt -j 100 `for r in {1..200} ; do echo vinyl/select_consistency.test.lua ; done` --long --force 2>&1 ; do date ; c=$((c+1)) ; echo RUN $c ; done ; echo FAILED on RUN $c ) | tee a.log
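
The shell loop above reruns the whole batch until `test-run.py` exits with a non-zero status. The same control flow can be sketched in Python. The helper name `run_until_failure` and its `max_runs` and `runner` parameters are hypothetical, added only for illustration. The sketch returns the 1-based number of the run that failed, whereas the shell loop prints the count of successful runs before the failure.

```python
import subprocess


def run_until_failure(cmd, max_runs=None, runner=subprocess.call):
    """Rerun cmd until it exits non-zero.

    Returns the 1-based number of the failing run, or None if max_runs
    runs all succeeded. `runner` takes the command and returns its exit
    status; it defaults to subprocess.call and can be swapped for a stub.
    """
    run = 0
    while max_runs is None or run < max_runs:
        run += 1
        if runner(cmd) != 0:
            return run
        print("RUN", run)
    return None
```

To mirror the command from this report, `cmd` could be built as `["./test-run.py", "--builddir", os.path.expanduser("~/tnt"), "-j", "100"] + ["vinyl/select_consistency.test.lua"] * 200 + ["--long", "--force"]`. The path is expanded explicitly because `subprocess.call` does not expand `~` when given a list.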

Optional (but very desirable):

  • coredump
  • backtrace
  • netstat
