Description
After PR #6787, exec has become more stable in the concurrent and container-shutdown paths. However, it still suffers from some race-condition style failures. These have mainly been seen in the CI environment; the same tests have passed locally (run consecutively) against a high-resource, complex VSAN deployment. Below is a short catalog of the observed intermittent failures:
Simple Concurrent Exec:
- Failed with the following CommitHandler error:
Feb 26 2018 16:27:06.142Z ERROR op=301.310: CommitHandler error on handle(c13c27abff7908e6146498325fa944c3) for 46bec94ca67b332d0f924df4f4d7c84168693b876a8aa15658050b2e3b2bf46e: The operation is not allowed in the current state.
Exec During Poweroff Of A Container Performing A Long Running Task:
- Failed because all execs actually succeeded (the test expectations might need to be improved).
- Container reported being in an invalid state.
Exec During Poweroff Of A Container Performing A Short Running Task:
- The long running exec returned an rc of 1 even though its output appeared to be present (looks successful in the portlayer of Exec-Failure-1); see the reproduction sketch below.
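The poweroff failures above come from racing an exec against a stopping container; a minimal local reproduction sketch is below (the busybox image, sleep duration, and single exec are illustrative assumptions, not the actual CI test):
# Start a container running a long task, then race an exec against its poweroff.
id=$(docker run -d busybox /bin/sleep 600)
docker exec $id /bin/echo exec-during-poweroff &
docker stop $id
wait
# Check what state the container ended up reporting.
docker inspect --format '{{.State.Status}}' $id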
REFERENCE LOGS:
Exec-CI-Failure-1-16518.zip
Exec-CI-Failure-2-16515.zip
Exec-Failure-3-FROM-FULL-CI.html.zip
Exec-Failure-3-FROM-FULL-CI-16510.zip
Currently using the following for basic concurrency testing:
c=1; date
id=`docker ps -q | awk '{print $1}'`
for i in `seq 1 $c`; do docker exec $id /bin/echo /tmp/$i & done
for i in `seq 1 $c`; do wait %$i 2>/dev/null; done
date
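This starts $c execs in the background against the first container returned by docker ps and then waits on each job; raising c increases the concurrency level.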
TODO:
- Investigate why vCenter reconfigure slows down over time with the number of execs. It could be the number of extraconfig keys in the VM config (we should not be sending the entire set to VC, but VC may be relaying the entire set to ESX - compare/contrast with ESX directly). It could also be that reconfigures result in an accrual of state, irrespective of the keys.
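As a first data point for the reconfigure question, per-exec latency can be sampled over a long run; a rough sketch is below (the iteration count and use of /bin/true are assumptions for illustration). The extraConfig key count on the containerVM could also be compared before and after the run (e.g. via govc vm.info -e, if available) to see whether keys accrue.
# Print the iteration number and the wall-clock duration of each exec
# so any growth in reconfigure time over the run is visible.
id=$(docker ps -q | head -1)
for i in $(seq 1 500); do
  start=$(date +%s.%N)
  docker exec $id /bin/true
  end=$(date +%s.%N)
  echo "$i $(echo "$end - $start" | bc)"
done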