
resources: flaky issues due to lack of memory #98

Open
@avtikhon


Tarantool 2.8.0-114-g9ccd4eab6
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror

OS: Linux

Related issues are tracked in #93.

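Reproduced on dev1 by running box/gh-5422-broken_snapshot in parallel. As the number of parallel runs grows, memory is exhausted and the system starts to hang: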

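| Total memory | Swap | Test runs | RSS (bytes) | Time |
|---|---|---|---|---|
| 2 GB | | 10 | | 4 s |
| 2 GB | | 12 | | 13 s |
| 2 GB | | 14 | | timeout |
| 4 GB | | 24 | | 4 s |
| 4 GB | | 26 | | timeout |
| 8 GB | 4 GB | 48 | 7652044800 | 2 min 36 s |
| 8 GB | 4 GB | 50 | 7691522048 | 2 min 42 s |
| 8 GB | 4 GB | 56 | 8315879424 | 3 min 05 s |
| 8 GB | 4 GB | 58 | 8336347136 | OOM, container and host hung |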

Other tests from the box/ suite (8 GB RAM | 4 GB swap):

| Test | Test runs | RSS (bytes) | Time | OOM at runs | Memory per test |
|---|---|---|---|---|---|
| access | 58 | 6183579648 | 1 min 25 s | | 107 MB |
| blackhole | 72 | 8306712576 | 1 min 14 s | 80 | 115 MB |
| func_reload | 90 | 7675752448 | 0 min 32 s | 100 | 85 MB |
| gh-5135-invalid-upsert | 110 | 8098770944 | 0 min 29 s | 120 | 74 MB |
| gh-5422-broken_snapshot | 56 | 8315879424 | 3 min 05 s | 58 | 150 MB |
| iterator | 110 | 8132214784 | 0 min 38 s | 120 | 74 MB |
| misc | 110 | 8134795264 | 1 min 00 s | 120 | 74 MB |
| net_msg_max | 3 | 7678414848 | 0 min 03 s | 4 | 2.5 GB |
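The "Memory per test" column appears to be the RSS divided by the number of runs, in decimal megabytes. A minimal shell sketch of that arithmetic; `per_test_mb` is a hypothetical helper, not part of test-run:

```bash
#!/usr/bin/env bash
# Average memory per parallel run: RSS (bytes) / number of runs,
# printed in decimal megabytes (1 MB = 1000000 bytes), rounded.
per_test_mb() {
    local rss_bytes=$1 runs=$2
    awk -v rss="$rss_bytes" -v n="$runs" 'BEGIN { printf "%.0f\n", rss / n / 1000000 }'
}

# A few rows from the table: test name, runs, RSS in bytes.
while read -r name runs rss; do
    printf '%-26s %s MB\n' "$name" "$(per_test_mb "$rss" "$runs")"
done <<'EOF'
access 58 6183579648
blackhole 72 8306712576
gh-5422-broken_snapshot 56 8315879424
EOF
```

For gh-5422-broken_snapshot this gives 148 MB, which the table rounds to 150.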

The atop tool could not show the real RSS growth because it hung itself (see the RGROW column):

THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST S CPU CMD
4 0.02s 0.12s 713.4M 71644K 0K 8K N- S 13% tarantool
1 0.05s 0.05s 167.7M 49160K 0K 140K N- S 0% python2

Try a memory overload in a container started with (8 GB RAM plus 4 GB swap):
--cpus=2 --memory=8G --memory-swap=12G --memory-reservation=8G

  1. Run inside the container:
rm -rf rss_persec.log ; ( ( while date && sleep 1 ; do cat /sys/fs/cgroup/memory/memory.stat ; done ) >>rss_persec.log & echo $! >rss.pid & ) ; ( export PATH=$PATH:/tnt/src ; export REPLICATION_SYNC_TIMEOUT=2500 ; export TEST_TIMEOUT=2510 ; export NO_OUTPUT_TIMEOUT=2520 ; date ; time ./test-run.py -j 1200 --builddir /tnt --vardir var_hdd_vinyl `for r in {1..64} ; do echo box/gh-5422-broken_snapshot. ; done` --force 2>&1 ; sleep 1 ; kill -USR2 `cat rss.pid` ; date ) > test.log &
  2. On the host that runs the container, watch for OOM events:
docker events
2021-03-16T08:21:57.833804548+03:00 container oom 7129eaff03192cfda5896e17ae3506935cc59c2c6310015b47ed28ffc7c41cc0 (image=registry.gitlab.com/tarantool/tarantool/testing/debian-stretch, name=goofy_mcnulty)
2021-03-16T08:22:31.626244711+03:00 container oom 7129eaff03192cfda5896e17ae3506935cc59c2c6310015b47ed28ffc7c41cc0 (image=registry.gitlab.com/tarantool/tarantool/testing/debian-stretch, name=goofy_mcnulty)
2021-03-16T08:22:36.535503324+03:00 container oom 7129eaff03192cfda5896e17ae3506935cc59c2c6310015b47ed28ffc7c41cc0 (image=registry.gitlab.com/tarantool/tarantool/testing/debian-stretch, name=goofy_mcnulty)
  3. Inside the container, find the peak RSS (numeric sort, so values with different digit counts are ordered correctly):
grep '^total_rss ' rss_persec.log | sort -n -k2,2 | tail -n 1
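A plain `sort` compares the numbers as strings, so values with different digit counts can be misordered. An alternative sketch that takes the numeric maximum of `total_rss` directly with awk; `peak_total_rss` is a hypothetical helper and the sample log values are illustrative only:

```bash
#!/usr/bin/env bash
# Print the largest total_rss value (bytes) in a log built from repeated
# dumps of the cgroup v1 memory.stat file (such as rss_persec.log).
# Comparing the whole first field skips keys like total_rss_huge.
peak_total_rss() {
    local log=$1
    # printf "%.0f" avoids exponent notation for large values in some awks.
    awk '$1 == "total_rss" && $2 > max { max = $2 } END { printf "%.0f\n", max }' "$log"
}

# Example with a small hand-made log.
log=$(mktemp)
cat >"$log" <<'EOF'
total_rss 999999999
total_rss_huge 0
total_rss 1000000000
total_rss 7652044800
EOF
peak_total_rss "$log"   # prints 7652044800
rm -f "$log"
```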

Try a disk overload:

# start a docker container with a 2 GB memory limit and unlimited swap
docker run --network=host -v /export/avtikhon/src:/source -ti --cpus=40 --memory=2G --memory-swap=-1 --memory-reservation=1G registry.gitlab.com/tarantool/tarantool/testing/debian-stretch

# check the container memory limit with
cat /sys/fs/cgroup/memory/memory.limit_in_bytes

# run tests
( export PATH=$PATH:/tnt/src; export REPLICATION_SYNC_TIMEOUT=500; export TEST_TIMEOUT=510; export NO_OUTPUT_TIMEOUT=520; date; time ./test-run.py -j 1200 --builddir /tnt --vardir var_hdd_vinyl `for r in {1..12} ; do echo box/gh-5422-broken_snapshot ; done` --force 2>&1; sleep 1; kill -USR2 `cat atop.pid`; date ) > test_atop.log
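A sketch that prints the container memory limit in MiB and also covers hosts with cgroup v2, where the limit is kept in `/sys/fs/cgroup/memory.max` and the word `max` means no limit. On cgroup v1 a missing limit shows up as a very large number instead. `cgroup_mem_limit_mib` is a hypothetical helper, and the paths assume the usual cgroup mount at `/sys/fs/cgroup`:

```bash
#!/usr/bin/env bash
# Print the memory limit of the current cgroup in MiB, or "unlimited".
# cgroup v1: memory/memory.limit_in_bytes; cgroup v2: memory.max.
# An optional argument overrides the file to read.
cgroup_mem_limit_mib() {
    local f=${1:-} v
    if [ -z "$f" ]; then
        if [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
            f=/sys/fs/cgroup/memory/memory.limit_in_bytes
        elif [ -r /sys/fs/cgroup/memory.max ]; then
            f=/sys/fs/cgroup/memory.max
        else
            echo "no cgroup memory limit file found" >&2
            return 1
        fi
    fi
    v=$(cat "$f")
    if [ "$v" = "max" ]; then
        echo unlimited
    else
        # 1 MiB = 1048576 bytes.
        awk -v b="$v" 'BEGIN { printf "%.0f\n", b / 1048576 }'
    fi
}
```

Inside the container started with `--memory=2G` above, it is expected to print 2048.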

Disk usage log from atop:

LVM | dm-3 | busy 718% | | read 60409 | write 501 | KiB/r 22 | KiB/w 4 | | MBr/s 1343.0 | MBw/s 2.0 | avq 12.84 | | avio 0.13 ms |
LVM | dm-2 | busy 10% | | read 612 | write 694 | KiB/r 18 | KiB/w 5 | | MBr/s 10.9 | MBw/s 3.7 | avq 7.85 | | avio 0.08 ms |
MDD | md1 | busy 0% | | read 60361 | write 519 | KiB/r 22 | KiB/w 3 | | MBr/s 1342.3 | MBw/s 1.9 | avq 0.00 | | avio 0.00 ms |
DSK | sda | busy 711% | | read 30011 | write 258 | KiB/r 22 | KiB/w 7 | | MBr/s 662.9 | MBw/s 2.0 | avq 6.67 | | avio 0.25 ms |
DSK | sdb | busy 697% | | read 29744 | write 258 | KiB/r 23 | KiB/w 7 | | MBr/s 679.0 | MBw/s 2.0 | avq 6.33 | | avio 0.25 ms |
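To compare the per-disk numbers, a small awk sketch that reads atop `DSK` lines from stdin and sums their read throughput. `dsk_read_summary` is hypothetical and assumes the `|`-separated layout shown above. For sda and sdb above it reports 1341.9 MB/s in total, close to the 1342.3 MB/s shown for md1.

```bash
#!/usr/bin/env bash
# Read atop output on stdin; for each physical disk line (DSK) print its
# busy% and read rate (MBr/s), then the summed read rate.
dsk_read_summary() {
    awk -F'|' '
        $1 ~ /^DSK/ {
            disk = $2; gsub(/ /, "", disk)
            busy = ""; mbr = ""
            for (i = 3; i <= NF; i++) {
                # Keep only digits and dots, e.g. " busy 711% " -> "711".
                if ($i ~ /busy/)   { busy = $i; gsub(/[^0-9.]/, "", busy) }
                if ($i ~ /MBr\/s/) { mbr = $i;  gsub(/[^0-9.]/, "", mbr) }
            }
            printf "%s busy=%s%% read=%s MB/s\n", disk, busy, mbr
            total += mbr
        }
        END { printf "total read=%.1f MB/s\n", total }'
}
```

Usage, with a hypothetical file holding the atop lines: `dsk_read_summary < atop_disk.log`.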

GitHub Actions uses these hosts:
OSX:

Hardware:

    Hardware Overview:

      Model Name: Mac
      Model Identifier: VMware7,1
      Processor Name: Unknown
      Processor Speed: 3.33 GHz
      Number of Processors: 1
      Total Number of Cores: 3
      L2 Cache (per Core): 256 KB
      L3 Cache: 12 MB
      Memory: 14 GB
      System Firmware Version: VMW71.00V.13989454.B64.1906190538
      Apple ROM Info: [MS_VM_CERT/SHA1/27d66596a61c48dd3dc7216fd715126e33f59ae7]Welcome to the Virtual Machine
      SMC Version (system): 2.8f0
      Serial Number (system): VMXWGNGFhEKt
      Hardware UUID: 4203018E-580F-C1B5-9525-B745CECA79EB
      Provisioning UDID: 4203018E-580F-C1B5-9525-B745CECA79EB

Filesystem       Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1s5s1  380Gi   14Gi  210Gi     7%  568975 3981971425    0%   /
/dev/disk1s4    380Gi  1.0Mi  210Gi     1%       1 3982540399    0%   /System/Volumes/VM
/dev/disk1s2    380Gi  279Mi  210Gi     1%     685 3982539715    0%   /System/Volumes/Preboot
/dev/disk1s6    380Gi  244Ki  210Gi     1%      14 3982540386    0%   /System/Volumes/Update
/dev/disk1s1    380Gi  154Gi  210Gi    43% 3970663 3978569737    0%   /System/Volumes/Data

Linux:

sudo cat /etc/os-release

NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

cat /proc/cpuinfo

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
stepping	: 1
microcode	: 0xffffffff
cpu MHz		: 2294.688
cache size	: 51200 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4589.37
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
stepping	: 1
microcode	: 0xffffffff
cpu MHz		: 2294.688
cache size	: 51200 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4589.37
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

free

              total        used        free      shared  buff/cache   available
Mem:        7121288      467568     5682140       29240      971580     6319500
Swap:       4194300           0     4194300
Filesystem      Size  Used Avail Use% Mounted on
udev            3.4G     0  3.4G   0% /dev
tmpfs           696M  680K  695M   1% /run
/dev/sda1        84G   61G   23G  73% /
tmpfs           3.4G  8.0K  3.4G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.4G     0  3.4G   0% /sys/fs/cgroup
/dev/sda15      105M  3.7M  101M   4% /boot/efi
/dev/sdb1        14G  4.1G  9.0G  32% /mnt
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda       8:0    0   86G  0 disk 
├─sda1    8:1    0 85.9G  0 part /
├─sda14   8:14   0    4M  0 part 
└─sda15   8:15   0  106M  0 part /boot/efi
sdb       8:16   0   14G  0 disk 
└─sdb1    8:17   0   14G  0 part /mnt

sudo lsblk -o NAME,MOUNTPOINT,MODEL,ROTA

NAME    MOUNTPOINT MODEL            ROTA
sda                Virtual Disk        1
├─sda1  /                              1
├─sda14                                1
└─sda15 /boot/efi                      1
sdb                Virtual Disk        1
└─sdb1  /mnt                           1
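Combining the host memory with the per-test figures from the box/ table gives a rough upper bound on parallel runs. A sketch, assuming memory is the only limit and ignoring swap, page cache and test-run's own overhead; both function names are hypothetical:

```bash
#!/usr/bin/env bash
# Total physical memory in bytes: MemTotal from /proc/meminfo (in kB) on
# Linux, hw.memsize on macOS.
total_mem_bytes() {
    case "$(uname -s)" in
        Linux)  awk '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 }' /proc/meminfo ;;
        Darwin) sysctl -n hw.memsize ;;
        *)      echo "unsupported OS: $(uname -s)" >&2; return 1 ;;
    esac
}

# Rough upper bound: total bytes / (per-test MB * 10^6), truncated.
max_parallel_runs() {
    local total_bytes=$1 per_test_mb=$2
    awk -v t="$total_bytes" -v m="$per_test_mb" 'BEGIN { printf "%d\n", t / (m * 1000000) }'
}
```

For the Linux host above (7121288 KiB total in `free`, i.e. 7292198912 bytes), `max_parallel_runs 7292198912 150` prints 48, and with 74 MB per test it prints 98.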

Steps to resolve the issue:

  1. Add memory usage profiling for running tests (test-run#277).
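Until such profiling exists, the peak memory of a single command can be sampled from `/proc`. A minimal Linux-only sketch; `run_with_peak_rss` is hypothetical and not necessarily what test-run#277 will do. It only sees the direct child, so for test-run.py, which starts separate tarantool processes, the cgroup `memory.stat` sampling above remains the better whole-tree measure.

```bash
#!/usr/bin/env bash
# Run a command in the background and print its peak VmRSS (kB) on stdout,
# sampling /proc/<pid>/status every 0.1 s. Linux only.
# Limitations: processes started by the command are not counted, and short
# spikes between two samples can be missed.
run_with_peak_rss() {
    "$@" &
    local pid=$! peak=0 rss status=0
    # kill -0 only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        rss=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status" 2>/dev/null || true)
        if [ -n "$rss" ] && [ "$rss" -gt "$peak" ]; then
            peak=$rss
        fi
        sleep 0.1
    done
    # Collect the exit status so the caller can still check it.
    wait "$pid" || status=$?
    echo "$peak"
    return "$status"
}
```

Usage: `peak_kb=$(run_with_peak_rss some_command args)`, where `some_command` is a placeholder.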
