luajit: box/bitset.test.lua flaky fails on Mac OS #235

Closed
tarantool/tarantool
#7328
@Totktonada

Tarantool version: 2.2.0-97-g96cdc5ba8.
OS version: Mac OS 10.12.6 (16G29).

How to reproduce:

TEST_RUN_TESTS="$(yes "box/bitset box/net_msg_max" | head -n 100)" make test

Sometimes bitset.test.lua hits an assertion failure:

[005] Assertion failed: (tuple == tuple_end), function mp_tuple_assert, file /Users/a.turenko/tarantool/src/box/tuple.h, line 798.

But sometimes the failing assertion is the one on line 794.

The function looks like this:

 786 /**
 787  * Assert that buffer is valid MessagePack array
 788  * @param tuple buffer
 789  * @param the end of the buffer
 790  */
 791 static inline void
 792 mp_tuple_assert(const char *tuple, const char *tuple_end)
 793 {
 794     assert(mp_typeof(*tuple) == MP_ARRAY);
 795 #ifndef NDEBUG
 796     mp_next(&tuple);
 797 #endif
 798     assert(tuple == tuple_end);
 799     (void) tuple;
 800     (void) tuple_end;
 801 }

Backtrace and more info:

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fffc3e04d42 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fffc3ef2457 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fffc3d6a420 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fffc3d31893 libsystem_c.dylib`__assert_rtn + 320
    frame #4: 0x0000000104a1fa59 tarantool`mp_tuple_assert(tuple="", tuple_end="") at tuple.h:798
    frame #5: 0x0000000104a203ce tarantool`::box_index_iterator(space_id=512, index_id=1, type=7, key="\x91", key_end="") at index.cc:358
    frame #6: 0x0000000104b9cb3d tarantool`lj_vm_ffi_call + 132
    frame #7: 0x0000000104c4fe80 tarantool`lj_ccall_func(L=0x00000001073262f0, cd=0x000000010e04c040) at lj_ccall.c:1150
    frame #8: 0x0000000104c80e50 tarantool`lj_cf_ffi_meta___call(L=0x00000001073262f0) at lib_ffi.c:230
    frame #9: 0x0000000104b9a70d tarantool`lj_BC_FUNCC + 68
    frame #10: 0x0000000104bc8d00 tarantool`lua_pcall(L=0x00000001073262f0, nargs=3, nresults=-1, errfunc=0) at lj_api.c:1139
    frame #11: 0x0000000104b52eb3 tarantool`luaT_call(L=0x00000001073262f0, nargs=3, nreturns=-1) at utils.c:975
    frame #12: 0x0000000104b4bcbc tarantool`lua_fiber_run_f(ap=0x0000000107400378) at fiber.c:366
    frame #13: 0x0000000104a111b1 tarantool`fiber_cxx_invoke(f=(tarantool`lua_fiber_run_f at fiber.c:360), ap=0x0000000107400378)(__va_list_tag*), __va_list_tag*) at fiber.h:666
    frame #14: 0x0000000104b6b91b tarantool`fiber_loop(data=0x0000000000000000) at fiber.c:694
    frame #15: 0x0000000104d98c77 tarantool`coro_init at coro.c:110
(lldb) f 5
frame #5: 0x0000000104a203ce tarantool`::box_index_iterator(space_id=512, index_id=1, type=7, key="\x91", key_end="") at index.cc:358
   355 	                   const char *key, const char *key_end)
   356 	{
   357 		assert(key != NULL && key_end != NULL);
-> 358 		mp_tuple_assert(key, key_end);
   359 		if (type < 0 || type >= iterator_type_MAX) {
   360 			diag_set(ClientError, ER_ILLEGAL_PARAMS,
   361 				 "Invalid iterator type");
(lldb) p key
(const char *) $2 = 0x00000001be2a4530 "\xffffff91"
(lldb) p key_end
(const char *) $3 = 0x00000001be2a453c <no value available>
(lldb) p key
(const char *) $4 = 0x00000001be2a4530 "\xffffff91"
(lldb) x key
0x1be2a4530: 91 00 00 00 00 00 00 00 00 00 00 05 00 00 00 00  ................
0x1be2a4540: 23 00 00 00 00 00 00 00 f8 44 2a be 01 00 00 00  #........D*.....
(lldb) f 4
frame #4: 0x0000000104a1fa59 tarantool`mp_tuple_assert(tuple="", tuple_end="") at tuple.h:798
   795 	#ifndef NDEBUG
   796 		mp_next(&tuple);
   797 	#endif
-> 798 		assert(tuple == tuple_end);
   799 		(void) tuple;
   800 		(void) tuple_end;
   801 	}
(lldb) p tuple
(const char *) $0 = 0x00000001be2a4532 <no value available>
(lldb) p tuple_end
(const char *) $1 = 0x00000001be2a453c <no value available>

It seems the key that is passed from Lua via FFI is broken. After adding require('jit').off() at the start of bitset.test.lua, the test passes reliably! (It also passes reliably on Linux even with JIT enabled.) So the problem appears to be rooted in the JIT. Anyway, I'll dump the other related info that I gathered on Mac OS with JIT enabled.

Backtrace and more info (another run):

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fffc3e04d42 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fffc3ef2457 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fffc3d6a420 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fffc3d31893 libsystem_c.dylib`__assert_rtn + 320
    frame #4: 0x000000010bfa5a12 tarantool`mp_tuple_assert(tuple="", tuple_end="\x80") at tuple.h:794
    frame #5: 0x000000010bfa614e tarantool`::box_index_count(space_id=522, index_id=1, type=9, key="", key_end="\x80") at index.cc:322
    frame #6: 0x000000010c122b3d tarantool`lj_vm_ffi_call + 132
    frame #7: 0x000000010c1d5e80 tarantool`lj_ccall_func(L=0x000000010cf101e0, cd=0x000000010cfbf3b0) at lj_ccall.c:1150
    frame #8: 0x000000010c206e50 tarantool`lj_cf_ffi_meta___call(L=0x000000010cf101e0) at lib_ffi.c:230
    frame #9: 0x000000010c12070d tarantool`lj_BC_FUNCC + 68
    frame #10: 0x000000010c14ed00 tarantool`lua_pcall(L=0x000000010cf101e0, nargs=3, nresults=-1, errfunc=0) at lj_api.c:1139
    frame #11: 0x000000010c0d8eb3 tarantool`luaT_call(L=0x000000010cf101e0, nargs=3, nreturns=-1) at utils.c:975
    frame #12: 0x000000010c0d1cbc tarantool`lua_fiber_run_f(ap=0x000000010f000378) at fiber.c:366
    frame #13: 0x000000010bf971b1 tarantool`fiber_cxx_invoke(f=(tarantool`lua_fiber_run_f at fiber.c:360), ap=0x000000010f000378)(__va_list_tag*), __va_list_tag*) at fiber.h:666
    frame #14: 0x000000010c0f191b tarantool`fiber_loop(data=0x0000000000000000) at fiber.c:694
    frame #15: 0x000000010c31ec77 tarantool`coro_init at coro.c:110
frame #5: 0x000000010bfa614e tarantool`::box_index_count(space_id=522, index_id=1, type=9, key="", key_end="\x80") at index.cc:322
   319 			const char *key, const char *key_end)
   320 	{
   321 		assert(key != NULL && key_end != NULL);
-> 322 		mp_tuple_assert(key, key_end);
   323 		if (type < 0 || type >= iterator_type_MAX) {
   324 			diag_set(ClientError, ER_ILLEGAL_PARAMS,
   325 				 "Invalid iterator type");
(lldb) f 4
frame #4: 0x000000010bfa5a12 tarantool`mp_tuple_assert(tuple="", tuple_end="\x80") at tuple.h:794
   791 	static inline void
   792 	mp_tuple_assert(const char *tuple, const char *tuple_end)
   793 	{
-> 794 		assert(mp_typeof(*tuple) == MP_ARRAY);
   795 	#ifndef NDEBUG
   796 		mp_next(&tuple);
   797 	#endif

Sometimes bitset.test.lua fails with a result miscompare:

[012] --- box/bitset.result	Mon Jul 23 15:01:38 2018
[012] +++ box/bitset.reject	Wed Apr  3 11:26:16 2019
[012] @@ -1249,37 +1249,37 @@
[012]  ------------------------------------------------------------------------------
[012]  dump(box.index.BITS_ALL_NOT_SET, 3)
[012]  ---
[012] -- - $       4$
[012] -  - $       8$
[012] -  - $      12$
[012] -  - $      16$
<...>
[012] +- - $       4$       4$       4$
[012] +  - $       8$       8$       8$
[012] +  - $      12$      12$      12$
[012] +  - $      16$      16$      16$
<...>

Or like so:

[001] --- box/bitset.result	Mon Jul 23 15:01:38 2018
[001] +++ box/bitset.reject	Wed Apr  3 11:26:17 2019
[001] @@ -700,70 +700,7 @@
[001]  ...
[001]  dump(box.index.BITS_ALL_NOT_SET, 2)
[001]  ---
[001] -- - $       1$
[001] -  - $       4$
[001] -  - $       5$
[001] -  - $       8$
[001] -  - $       9$
[001] -  - $      12$
[001] -  - $      13$
[001] -  - $      16$
<...>
[001] +- error: './utils.lua:29: bad argument #2 to ''get_field'' (number expected, got nil)'
[001]  ...
[001]  box.space.tweedledum.index.bitset:count(2, { iterator = box.index.BITS_ALL_NOT_SET})
[001]  ---
<...>

Or like so:

[015] --- box/bitset.result	Mon Jul 23 15:01:38 2018
[015] +++ box/bitset.reject	Wed Apr  3 11:35:34 2019
[015] @@ -9,7 +9,7 @@
[015]  ------------------------------------------------------------------------------
[015]  test_insert_delete(128)
[015]  ---
[015] -- - $       1$
[015] +- - $
[015]  ...
<...>

Or like so:

[001] --- box/bitset.result	Mon Jul 23 15:01:38 2018
[001] +++ box/bitset.reject	Wed Apr  3 11:40:31 2019
[001] @@ -1961,6 +1961,7 @@
[001]  ...
[001]  for j=1,100 do check(math.random(9) - 1) end
[001]  ---
[001] +- error: 'Supplied key type of part 0 does not match index part type: expected unsigned'
[001]  ...
[001]  for j=1,100 do check(math.random(9) - 1, {iterator = box.index.BITS_ANY_SET}) end
[001]  ---

Or like so:

[008] --- box/bitset.result	Mon Jul 23 15:01:38 2018
[008] +++ box/bitset.reject	Wed Apr  3 11:48:12 2019
[008] @@ -3,13 +3,14 @@
[008]  ...
[008]  create_space()
[008]  ---
[008] +- error: Duplicate key exists in unique index 'primary' in space '_index'

Or like so:

[016] --- box/bitset.result	Mon Jul 23 15:01:38 2018
[016] +++ box/bitset.reject	Wed Apr  3 12:14:25 2019
[016] @@ -1834,6 +1834,7 @@
[016]  ...
[016]  drop_space()
[016]  ---
[016] +- error: 'Can''t drop space ''tweedledum'': the space has indexes'

At least some of the miscompares appear even with pretest_clean = True set in suite.ini.

I'm not sure whether the assertion failure and the miscompares are caused by the same problem.
