Prediction values differ between a direct TensorFlow call and TF-serving (both REST API and gRPC return the same value, which differs from the direct call; we assume the direct TF prediction is correct and the TF-serving one is wrong)

Hello everyone! I have a TensorFlow Serving related question. I have also posted it to StackOverflow, but as no one is answering there I thought I'd repost it here; maybe some of you have encountered such problems and can give me some advice.

PROBLEM: prediction values vary between a direct call and a call to TF-serving (we assume the direct TF prediction is correct and the TF-serving one is incorrect)

SETUP: a Keras model (TensorFlow backend) saved as 2 files: model.h5 (weights) and architecture.json.

GOAL: predict using TF-serving


  1. In order to use the model with TF-serving I export it to the SavedModel format:

    model = model_from_json(json_string)
    model.load_weights(model_path)
    export_path = './gModel_Volume/3'
    tf.keras.models.save_model(model, export_path)

We tried several approaches to save it; all produce the same SavedModel.
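Before involving TF-serving at all, one way to narrow the problem down is to reload the exported SavedModel and compare its predictions against the in-memory model. This is only a sketch: the tiny Sequential model below is a hypothetical stand-in for the real model_from_json model, and it assumes a TF 2.x SavedModel export as in the snippet above.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the real model built via
# model_from_json(...) + load_weights(...)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(260, 260, 3), name='input_1'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, name='dense_1'),
])

export_path = './gModel_Volume/3'
tf.keras.models.save_model(model, export_path)

# If the round trip is lossless, any serving discrepancy lies in the
# request path or signature mapping, not in the export itself.
reloaded = tf.keras.models.load_model(export_path)
X = np.random.rand(1, 260, 260, 3).astype(np.float32)
assert np.allclose(model.predict(X), reloaded.predict(X), atol=1e-5)
```

If this assertion already fails, the export step (not TF-serving) is the thing to debug.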

  2. Loading the data from the image is identical in both setups:

    X = []
    X.append(preprocess_input(img_to_array(load_img(img_url_str, target_size=(260, 260)))))
    X = np.asarray(X)

3a. For prediction with plain TF we use:

    # load:
    json_string = open(architecture_path).read()
    gModel = model_from_json(json_string)
    gModel.load_weights(model_path)
    # predict:
    predicted = gModel.predict_on_batch(X)
    # predicted = 0.2...

3b. For prediction via the TF-serving REST API we use:

    import requests

    dataToSend = X.tolist()
    response =
        'http://tf-serving:8501/v1/models/gModel_Volume:predict',
        json={"signature_name": "serving_default", "instances": dataToSend})
    predicted = response.json()['predictions'][0]
    # predicted = -1.17...

3c. For prediction via TF-serving gRPC we use:

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    GRPC_MAX_RECEIVE_MESSAGE_LENGTH = 4096 * 4096 * 3
    channel = grpc.insecure_channel(
        'tf-serving:8500',
        options=[('grpc.max_receive_message_length', GRPC_MAX_RECEIVE_MESSAGE_LENGTH)])
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    grpc_request = predict_pb2.PredictRequest() = 'gModel_Volume'
    grpc_request.model_spec.signature_name = 'serving_default'
    grpc_request.inputs['input_1'].CopyFrom(tf.make_tensor_proto(X))

    result = stub.Predict(grpc_request, 10)  # 10-second timeout
    predicted = result.outputs['dense_1'].float_val[0]
    # predicted = -1.17...

As one can see, the results (0.2 vs. -1.17) are completely different, even though the same SavedModel was used in all cases.
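A testing step that isolates the export from the server: load the SavedModel with tf.saved_model.load and call the serving_default signature directly. This exercises exactly the graph TF-serving runs, with no HTTP or gRPC in between, and also prints the exact input/output keys the server expects (the gRPC snippet hardcodes 'input_1' and 'dense_1', which may not match). This is a sketch using a hypothetical stand-in model and the './gModel_Volume/3' path from above.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in model and export (the real one comes from
# model_from_json + load_weights)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(260, 260, 3), name='input_1'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, name='dense_1'),
])
export_path = './gModel_Volume/3'
tf.keras.models.save_model(model, export_path)

loaded = tf.saved_model.load(export_path)
infer = loaded.signatures['serving_default']

# Worth checking: these are the names the server will actually use
print(infer.structured_input_signature)
print(infer.structured_outputs)

X = np.random.rand(1, 260, 260, 3).astype(np.float32)
served = list(infer(input_1=tf.constant(X)).values())[0].numpy()
local = model.predict(X)
assert np.allclose(served, local, atol=1e-5)
```

If this matches but the served result still differs, the discrepancy must come from the request payload or name mapping rather than from the graph.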

Note: TF-serving runs under docker-compose, and the "tf-serving" service uses the official tensorflow/serving image.

I would appreciate any hints and testing suggestions.

If there are better solutions than TF-serving for running multiple TF models in production (excluding cloud-based solutions like SageMaker), please share some knowledge with me.


  • Batching is disabled by default, but I disabled it explicitly as well
  • There are 4 different models producing different kinds of output (a single value, a set of values, bitmap data). All 4 give different results under TF-serving vs. plain TF. The dimensions of the output tensors are fine; only the values are "wrong".
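One more thing worth ruling out is the JSON round trip of the REST payload itself: X.tolist() converts float32 values to Python floats, and precision loss there is an easy suspect. A quick self-contained check (random data standing in for the preprocessed image batch):

```python
import json
import numpy as np

# Dummy batch standing in for the preprocessed image
X = np.random.rand(1, 260, 260, 3).astype(np.float32)

# Serialize exactly as the REST call does, then decode it back
payload = json.dumps({"signature_name": "serving_default",
                      "instances": X.tolist()})
X_back = np.asarray(json.loads(payload)["instances"], dtype=np.float32)

# float32 -> float64 -> decimal repr -> float64 -> float32 is exact,
# so the serialization alone cannot explain a 0.2 vs -1.17 difference
assert np.array_equal(X, X_back)
```

If this holds (it should), the payload encoding is not the culprit, which points back at signature/input-name mapping or preprocessing differences on the serving side.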

submitted by /u/SG_Able
