Description
Ask your question here:
Hi,
We have a C++ gRPC service running, and we use Knative Serving to autoscale pods based on the number of incoming requests. Scale-up and scale-down work nicely with Knative, but some requests fail when they are routed to pods in the Terminating state: the client gets a stream error when the pod is killed at the end of the termination grace period while the request is still in progress.
This doesn't happen every time. We have observed that even when pods stay in the Terminating state for about 5 minutes, new requests arriving during this period usually go to other pods (or new pods get created), but at times new requests are routed into terminating pods, which causes the error.
We tried handling SIGTERM to do a server shutdown, but it didn't help much; we still see new requests going to terminating pods, and the error happening more frequently.
I want to understand how to make Knative stop sending new requests to a pod once it enters the Terminating state.
Your suggestions are highly appreciated.
Here is my service code:
std::unique_ptr<grpc::Server> server;
// Thread function that performs the graceful shutdown.
void doShutdown()
{
    cout << "Entering doShutdown" << endl;
    //getchar(); // press a key to shut down the thread
    auto deadline = std::chrono::system_clock::now() +
                    std::chrono::milliseconds(300);
    server->Shutdown(deadline);
    //server->Shutdown();
    std::cout << "Server is shutting down." << std::endl;
}
void signal_handler(int signal_num)
{
    //std::lock_guard<std::recursive_mutex> lock(server_mutex);
    cout << "The interrupt signal is (" << signal_num << "). \n";
    LOG_INFO(LogLayer::Application) << "The interrupt signal is " << signal_num;
    switch (signal_num)
    {
    case SIGINT:
        std::puts("It was SIGINT");
        LOG_INFO(LogLayer::Application) << "It was SIGINT called";
        break;
    case SIGTERM:
        std::puts("It was SIGTERM");
        LOG_INFO(LogLayer::Application) << "It was SIGTERM called";
        break;
    default:
        break;
    }
    // It terminates the program
    LOG_INFO(LogLayer::Application) << "Calling Server Shutdown ";
    cout << "Calling Server Shutdown" << endl;
    std::thread t = std::thread(doShutdown);
    LOG_INFO(LogLayer::Application) << "Call exit() ";
    cout << "Calling exit()" << endl;
    t.join();
    //exit(0);
}
int appMain(const variables_map &values)
{
    const auto port = boost::any_cast<std::string>(values[Services_Common_Options::PORT].value());
    MyServiceImpl my_service;

    grpc::EnableDefaultHealthCheckService(true);
    grpc::reflection::InitProtoReflectionServerBuilderPlugin();

    grpc::ServerBuilder builder;
    builder.AddChannelArgument(GRPC_ARG_KEEPALIVE_TIME_MS, 1000 * 60 * 1);
    builder.AddChannelArgument(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 1000 * 10);
    builder.AddChannelArgument(GRPC_ARG_HTTP2_MIN_SENT_PING_INTERVAL_WITHOUT_DATA_MS, 1000 * 10);
    builder.AddChannelArgument(GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA, 0);
    builder.AddChannelArgument(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);

    //TODO: use a secure SSL connection
    builder.AddListeningPort(port, grpc::InsecureServerCredentials());
    // Register "service" as the instance through which we'll communicate with
    // clients. In this case it corresponds to a synchronous service.
    builder.RegisterService(&my_service);
    // Finally assemble the server.
    server = builder.BuildAndStart();
    LOG_INFO(LogLayer::Application) << SERVICE_NAME << " listening on " << port;

    /*std::signal(SIGTERM, signal_handler);
    std::signal(SIGSEGV, signal_handler);
    std::signal(SIGINT, signal_handler);
    std::signal(SIGABRT, signal_handler);*/

    // Wait for the server to shutdown. Note that some other thread must be
    // responsible for shutting down the server for this call to ever return.
    cout << "Server waiting" << endl;
    server->Wait();
    LOG_INFO(LogLayer::Application) << "Server Shutdown";
    cout << "Server exited" << endl;
    return 0;
}
And here is my Knative service YAML:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: MyKnativeService
spec:
  template:
    metadata:
      name: MyKnativeService-rev1
      annotations:
        # Target 10 in-flight-requests per pod.
        #autoscaling.knative.dev/target: "1"
        container-concurrency-target-percentage: "80"
        autoscaling.knative.dev/targetUtilizationPercentage: "100"
        #autoscaling.knative.dev/metric: "concurrency"
        autoscaling.knative.dev/initialScale: "0"
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "100"
        autoscaling.knative.dev/scaleDownDelay: "3m"
    spec:
      containerConcurrency: 1
      containers:
        - name: MyKnativeService_container
          image: ppfaservice:latest
          imagePullPolicy: Always
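One revision-level knob worth checking in this context (a config sketch, not a verified fix) is Knative's `timeoutSeconds`, which caps how long a single request may run. Long-running streams that outlive the pod's termination grace period can still be cut off when the pod is finally killed, so the request timeout and the grace period need to be sized together:

spec:
  template:
    spec:
      # Maximum duration of a single request for this revision
      # (Knative revision field; the default is 300 seconds).
      timeoutSeconds: 300
      containerConcurrency: 1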