Description
We often see this failure when running the native unit tests. Copilot suggests the fixes below. Let's understand them more deeply and do follow-up fixes.
Root Cause:
The job fails because "zombie tasks" in Velox execution are not cleaned up properly (see TaskManager.cpp). A zombie task is a task still referenced by more than one owner, which blocks its cleanup. Task.cpp also frequently reports "Aborted for external error" (Source: RUNTIME, ErrorCode: INVALID_STATE). Some test cases report memory cap exceeded errors (ErrorCode: MEM_CAP_EXCEEDED).
How to Fix:
- Zombie Task Cleanup
  - In TaskManager.cpp, update the logic that handles extra references to zombie tasks so that such tasks are reliably cleaned up, possibly by ensuring references are released on error/abort. Consider adding debug logging to trace task ownership.
  - Example: Locate the code in TaskManager.cpp responsible for cleanup (line ~313, see TaskManager.cpp#313) and review all increments/decrements of references.
  - Code Suggestion:
    ```cpp
    // After abort/failure, explicitly release references for tasks if safe.
    // extraRefs() and releaseExtraRefs() are suggested helpers that may not exist yet.
    if (task->state() == TaskState::kAborted && task->extraRefs() > 0) {
      task->releaseExtraRefs(); // You may need to implement this safely
    }
    ```
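To make the ownership question concrete, here is a minimal, self-contained sketch of how a registry could tell a cleanly finished task from a zombie by looking at the `std::shared_ptr` reference count. It is not the actual TaskManager.cpp logic: `DemoTask`, `DemoTaskState`, and `DemoTaskRegistry` are hypothetical stand-ins invented for illustration, and the sketch is single-threaded (`use_count()` is only an approximation when other threads hold references).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the Velox or Presto classes.
enum class DemoTaskState { kRunning, kFinished, kAborted };

struct DemoTask {
  std::string id;
  DemoTaskState state{DemoTaskState::kRunning};
};

// Hypothetical registry that holds exactly one reference to every task.
class DemoTaskRegistry {
 public:
  std::shared_ptr<DemoTask> add(const std::string& id) {
    assert(tasks_.count(id) == 0); // task ids are expected to be unique
    auto task = std::make_shared<DemoTask>();
    task->id = id;
    tasks_[id] = task;
    return task;
  }

  // Erases terminated tasks that only the registry still references.
  // Returns the ids of terminated tasks that another owner still holds;
  // these are the "zombies" worth logging.
  std::vector<std::string> cleanTerminated() {
    std::vector<std::string> zombies;
    for (auto it = tasks_.begin(); it != tasks_.end();) {
      const bool terminated = it->second->state != DemoTaskState::kRunning;
      if (terminated && it->second.use_count() == 1) {
        it = tasks_.erase(it);
        continue;
      }
      if (terminated) {
        zombies.push_back(it->first);
      }
      ++it;
    }
    return zombies;
  }

  std::size_t size() const { return tasks_.size(); }

 private:
  std::map<std::string, std::shared_ptr<DemoTask>> tasks_;
};
```

In this sketch a terminated task stays in the registry for as long as any caller keeps its `shared_ptr`, so logging the returned ids on each cleanup pass would show which tasks are being held and for how long.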
- Memory Capping Issues
  - The unit tests TaskManagerTest.outOfQueryUserMemory and others hit the memory cap quickly. Consider increasing test memory caps or modifying memory tracking mechanisms to avoid premature aborts for test scenarios.
  - Code Suggestion: If using test-specific config, raise memory limits for these tests.
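As a rough illustration of why a tight cap aborts a test early and a raised cap does not, the sketch below models a capped tracker. It does not use the Velox memory APIs: `DemoMemoryTracker` and the byte values are hypothetical, and the real fix would go through whatever configuration the tests already use.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical capped tracker for illustration; not a Velox class.
class DemoMemoryTracker {
 public:
  explicit DemoMemoryTracker(int64_t capBytes) : capBytes_(capBytes) {}

  // Throws if the reservation would push usage past the cap.
  // Usage is left unchanged when the reservation is rejected.
  void reserve(int64_t bytes) {
    if (usedBytes_ + bytes > capBytes_) {
      throw std::runtime_error("MEM_CAP_EXCEEDED");
    }
    usedBytes_ += bytes;
  }

  void release(int64_t bytes) {
    assert(bytes <= usedBytes_); // cannot release more than was reserved
    usedBytes_ -= bytes;
  }

  int64_t usedBytes() const { return usedBytes_; }

 private:
  const int64_t capBytes_;
  int64_t usedBytes_{0};
};
```

A test that needs 2 MiB fails against a 1 MiB tracker and passes against a 4 MiB one, which is the effect of raising a test-specific limit. Tests that are meant to exercise the out-of-memory path should keep a cap they are expected to exceed.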
- INVALID_STATE Handling
  - Many errors stem from invalid states in Task.cpp around terminate (see Task.cpp#2443). Ensure that state transitions are atomic, that error paths do not leave tasks in dangling/invalid states, and that cleanup is always attempted on errors.
  - Example change:
    ```cpp
    // Wrap error-prone sections in proper state management.
    // Names here are illustrative; check them against Task.cpp.
    if (isAbortedByExternalError) {
      setState(TaskState::kAborted);
      cleanup(); // Ensure cleanup is always called
    }
    ```
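One common way to make a terminal transition atomic is a compare-and-swap on the state, so that only the caller that wins the transition runs cleanup. The sketch below shows that pattern in isolation. `DemoState` and `DemoStatefulTask` are hypothetical names, and this is not how Task.cpp is currently written; it only illustrates the idea of "transition once, clean up once".

```cpp
#include <atomic>
#include <cassert>

// Hypothetical states for illustration; not the Velox TaskState enum.
enum class DemoState { kRunning, kFinished, kAborted };

class DemoStatefulTask {
 public:
  // Moves kRunning -> target at most once. Returns true only for the
  // caller that performed the transition; that caller also runs cleanup.
  bool terminate(DemoState target) {
    assert(target != DemoState::kRunning); // target must be terminal
    DemoState expected = DemoState::kRunning;
    if (!state_.compare_exchange_strong(expected, target)) {
      return false; // already terminal, nothing to do
    }
    cleanup();
    return true;
  }

  DemoState state() const { return state_.load(); }
  int cleanupCount() const { return cleanupCount_.load(); }

 private:
  void cleanup() { ++cleanupCount_; }

  std::atomic<DemoState> state_{DemoState::kRunning};
  std::atomic<int> cleanupCount_{0};
};
```

With this shape, a second `terminate` call (for example an external abort racing with normal completion) becomes a no-op instead of hitting an invalid state, and cleanup cannot be skipped or run twice.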
References:
- TaskManager.cpp#313
- Task.cpp#2443
- Job logs show repeated failures at zombie task cleanup and external abort error paths.
Summary:
- Refactor zombie task cleanup so dangling references do not block deletion.
- Improve memory tracking and/or raise test memory caps for unit tests.
- Ensure all aborted/failed tasks are transitioned safely to final states and cleaned up, not left "referenced."
- Review/strengthen error handling in test cases for robust resource cleanup.
If you want code for the memory cap config, details on the task ownership logic, or need to see specific tests, let me know.