Description
Tarantool 2.8.0-114-g9ccd4eab6
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror
OS: Linux
Check issues at #93
Reproduced on dev1:
Test runs began to hang as memory ran out:
Total Memory | Swap | # of test runs | RSS Memory (bytes) | Time |
---|---|---|---|---|
2 GB | | 10 | | 4 secs |
2 GB | | 12 | | 13 secs |
2 GB | | 14 | | timeout |
4 GB | | 24 | | 4 secs |
4 GB | | 26 | | timeout |
8 GB | 4 GB | 48 | 7652044800 | 2 m 36 secs |
8 GB | 4 GB | 50 | 7691522048 | 2 m 42 secs |
8 GB | 4 GB | 56 | 8315879424 | 3 m 05 secs |
8 GB | 4 GB | 58 | 8336347136 | OOM; container and host hung |
Other tests from the box/ suite (8 GB memory, 4 GB swap):
Test | # of test runs | RSS Memory (bytes) | Time | OOM on # runs | Memory per test |
---|---|---|---|---|---|
access | 58 | 6183579648 | 1 m 25 secs | | 107 MB |
blackhole | 72 | 8306712576 | 1 m 14 secs | 80 | 115 MB |
func_reload | 90 | 7675752448 | 0 m 32 secs | 100 | 85 MB |
gh-5135-invalid-upsert | 110 | 8098770944 | 0 m 29 secs | 120 | 74 MB |
gh-5422-broken_snapshot | 56 | 8315879424 | 3 m 05 secs | 58 | 150 MB |
iterator | 110 | 8132214784 | 0 m 38 secs | 120 | 74 MB |
misc | 110 | 8134795264 | 1 m 00 secs | 120 | 74 MB |
net_msg_max | 3 | 7678414848 | 0 m 03 secs | 4 | 2.5 GB |
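The "Memory per test" column appears to be the total RSS divided by the number of test runs, expressed in units of 10^6 bytes. That reading is inferred from the figures, not stated in the report. A minimal sketch of the arithmetic, using a hypothetical helper named `mem_per_test_mb`:

```shell
# Hypothetical helper (not part of test-run): estimate memory per test run.
# $1 = total RSS in bytes, $2 = number of test runs.
# Prints the result in units of 10^6 bytes, rounded to the nearest integer.
mem_per_test_mb() {
    echo $(( ($1 / $2 + 500000) / 1000000 ))
}
```

For the `access` row, `mem_per_test_mb 6183579648 58` prints `107`, which matches the table. For `gh-5422-broken_snapshot` the same calculation gives `148`, so the listed value of 150 seems to be a rounded figure.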
atop could not show the real RSS growth because it hung as well (see RGROW):
THR | SYSCPU | USRCPU | VGROW | RGROW | RDDSK | WRDSK | ST | S | CPU | CMD |
---|---|---|---|---|---|---|---|---|---|---|
4 | 0.02s | 0.12s | 713.4M | 71644K | 0K | 8K | N- | S | 13% | tarantool |
1 | 0.05s | 0.05s | 167.7M | 49160K | 0K | 140K | N- | S | 0% | python2 |
Try memory overload in a container started with
--cpus=2 --memory=8G --memory-swap=12G --memory-reservation=8G:
- run in the container:
rm -rf rss_persec.log ; ( ( while date && sleep 1 ; do cat /sys/fs/cgroup/memory/memory.stat ; done ) >>rss_persec.log & echo $! >rss.pid & ) ; ( export PATH=$PATH:/tnt/src ; export REPLICATION_SYNC_TIMEOUT=2500 ; export TEST_TIMEOUT=2510 ; export NO_OUTPUT_TIMEOUT=2520 ; date ; time ./test-run.py -j 1200 --builddir /tnt --vardir var_hdd_vinyl `for r in {1..64} ; do echo box/gh-5422-broken_snapshot. ; done` --force 2>&1 ; sleep 1 ; kill -USR2 `cat rss.pid` ; date ) > test.log &
- run on the host that runs the container:
docker events
2021-03-16T08:21:57.833804548+03:00 container oom 7129eaff03192cfda5896e17ae3506935cc59c2c6310015b47ed28ffc7c41cc0 (image=registry.gitlab.com/tarantool/tarantool/testing/debian-stretch, name=goofy_mcnulty)
2021-03-16T08:22:31.626244711+03:00 container oom 7129eaff03192cfda5896e17ae3506935cc59c2c6310015b47ed28ffc7c41cc0 (image=registry.gitlab.com/tarantool/tarantool/testing/debian-stretch, name=goofy_mcnulty)
2021-03-16T08:22:36.535503324+03:00 container oom 7129eaff03192cfda5896e17ae3506935cc59c2c6310015b47ed28ffc7c41cc0 (image=registry.gitlab.com/tarantool/tarantool/testing/debian-stretch, name=goofy_mcnulty)
- check the RSS maximums in the container with:
grep total_rss\ rss_persec.log | sort
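A plain `sort` compares lines as text, so samples with different digit counts may not end up in numeric order. The sketch below sorts numerically on the value field and prints only the peak. It assumes the cgroup v1 `memory.stat` layout of `name value` pairs that the sampling loop above records; the function name `peak_total_rss` is hypothetical.

```shell
# Hypothetical helper: print the largest total_rss sample (in bytes) from a log
# made of repeated /sys/fs/cgroup/memory/memory.stat dumps.
# $1 = path to the log file, e.g. rss_persec.log
peak_total_rss() {
    # The trailing space in the pattern excludes total_rss_huge lines.
    grep '^total_rss ' "$1" | sort -k2,2n | tail -n 1 | cut -d' ' -f2
}
```

Usage: `peak_total_rss rss_persec.log`.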
Try disk overload:
# start a Docker container with a memory limit and swap enabled
docker run --network=host -v /export/avtikhon/src:/source -ti --cpus=40 --memory=2G --memory-swap=-1 --memory-reservation=1G registry.gitlab.com/tarantool/tarantool/testing/debian-stretch
# check the memory limit available to the container with
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# run tests
( export PATH=$PATH:/tnt/src; export REPLICATION_SYNC_TIMEOUT=500; export TEST_TIMEOUT=510; export NO_OUTPUT_TIMEOUT=520; date; time ./test-run.py -j 1200 --builddir /tnt --vardir var_hdd_vinyl `for r in {1..12} ; do echo box/gh-5422-broken_snapshot ; done` --force 2>&1; sleep 1; kill -USR2 `cat atop.pid`; date ) > test_atop.log
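Both reproduction commands build the test-run.py argument list by echoing the same test name N times inside backticks. A small sketch of that step as a reusable function, which may make it easier to vary the number of runs; the name `repeat_test` is hypothetical and not part of test-run:

```shell
# Hypothetical helper: print a test name N times, one per line, so the output
# can be expanded into a test-run.py argument list.
# $1 = test name, $2 = number of copies.
# The body runs in a subshell, so the loop counter does not leak to the caller.
repeat_test() (
    i=0
    while [ "$i" -lt "$2" ]; do
        printf '%s\n' "$1"
        i=$((i + 1))
    done
)
```

With this in place, the backtick loop in the command above could be written as `$(repeat_test box/gh-5422-broken_snapshot 12)`.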
Disk usage log from atop:
LVM | dm-3 | busy 718% | | read 60409 | write 501 | KiB/r 22 | KiB/w 4 | | MBr/s 1343.0 | MBw/s 2.0 | avq 12.84 | | avio 0.13 ms |
LVM | dm-2 | busy 10% | | read 612 | write 694 | KiB/r 18 | KiB/w 5 | | MBr/s 10.9 | MBw/s 3.7 | avq 7.85 | | avio 0.08 ms |
MDD | md1 | busy 0% | | read 60361 | write 519 | KiB/r 22 | KiB/w 3 | | MBr/s 1342.3 | MBw/s 1.9 | avq 0.00 | | avio 0.00 ms |
DSK | sda | busy 711% | | read 30011 | write 258 | KiB/r 22 | KiB/w 7 | | MBr/s 662.9 | MBw/s 2.0 | avq 6.67 | | avio 0.25 ms |
DSK | sdb | busy 697% | | read 29744 | write 258 | KiB/r 23 | KiB/w 7 | | MBr/s 679.0 | MBw/s 2.0 | avq 6.33 | | avio 0.25 ms |
GitHub Actions uses the following hosts:
OSX:
Hardware:
Hardware Overview:
Model Name: Mac
Model Identifier: VMware7,1
Processor Name: Unknown
Processor Speed: 3.33 GHz
Number of Processors: 1
Total Number of Cores: 3
L2 Cache (per Core): 256 KB
L3 Cache: 12 MB
Memory: 14 GB
System Firmware Version: VMW71.00V.13989454.B64.1906190538
Apple ROM Info: [MS_VM_CERT/SHA1/27d66596a61c48dd3dc7216fd715126e33f59ae7]Welcome to the Virtual Machine
SMC Version (system): 2.8f0
Serial Number (system): VMXWGNGFhEKt
Hardware UUID: 4203018E-580F-C1B5-9525-B745CECA79EB
Provisioning UDID: 4203018E-580F-C1B5-9525-B745CECA79EB
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk1s5s1 380Gi 14Gi 210Gi 7% 568975 3981971425 0% /
/dev/disk1s4 380Gi 1.0Mi 210Gi 1% 1 3982540399 0% /System/Volumes/VM
/dev/disk1s2 380Gi 279Mi 210Gi 1% 685 3982539715 0% /System/Volumes/Preboot
/dev/disk1s6 380Gi 244Ki 210Gi 1% 14 3982540386 0% /System/Volumes/Update
/dev/disk1s1 380Gi 154Gi 210Gi 43% 3970663 3978569737 0% /System/Volumes/Data
Linux:
sudo cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
stepping : 1
microcode : 0xffffffff
cpu MHz : 2294.688
cache size : 51200 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips : 4589.37
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
stepping : 1
microcode : 0xffffffff
cpu MHz : 2294.688
cache size : 51200 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips : 4589.37
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
free
total used free shared buff/cache available
Mem: 7121288 467568 5682140 29240 971580 6319500
Swap: 4194300 0 4194300
Filesystem Size Used Avail Use% Mounted on
udev 3.4G 0 3.4G 0% /dev
tmpfs 696M 680K 695M 1% /run
/dev/sda1 84G 61G 23G 73% /
tmpfs 3.4G 8.0K 3.4G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.4G 0 3.4G 0% /sys/fs/cgroup
/dev/sda15 105M 3.7M 101M 4% /boot/efi
/dev/sdb1 14G 4.1G 9.0G 32% /mnt
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 86G 0 disk
├─sda1 8:1 0 85.9G 0 part /
├─sda14 8:14 0 4M 0 part
└─sda15 8:15 0 106M 0 part /boot/efi
sdb 8:16 0 14G 0 disk
└─sdb1 8:17 0 14G 0 part /mnt
sudo lsblk -o NAME,MOUNTPOINT,MODEL,ROTA
NAME MOUNTPOINT MODEL ROTA
sda Virtual Disk 1
├─sda1 / 1
├─sda14 1
└─sda15 /boot/efi 1
sdb Virtual Disk 1
└─sdb1 /mnt 1
Steps to resolve the issue: