Merge Sort Explained: A Data Scientist’s Algorithm Guide

The article includes a step-by-step explanation of the merge sort algorithm, along with code snippets illustrating its implementation.

Data scientists deal with algorithms daily. However, the data science role as a whole has evolved such that it rarely involves implementing sophisticated algorithms from scratch. Nonetheless, practitioners can still benefit from building an understanding and repertoire of algorithms.

In this article, the sorting algorithm merge sort is introduced, explained, evaluated, and implemented. The aim of this post is to provide you with robust background information on the merge sort algorithm, which acts as foundational knowledge for more complicated algorithms.

Although merge sort is not considered complex, understanding this algorithm will help you recognize what factors to consider when choosing the most efficient algorithm for data-related tasks. John von Neumann developed the merge sort algorithm in 1945 using the divide-and-conquer approach.

Divide and conquer

To understand the merge sort algorithm, you must be familiar with the divide-and-conquer paradigm, alongside the programming concept of recursion. Recursion, within the computer science domain, occurs when a method defined to solve a problem invokes itself within its own implementation body.

In other words, the function calls itself repeatedly.

Visual illustration of recursion.
Figure 1. Visual illustration of recursion – Image by author.
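A minimal, self-contained illustration of recursion (separate from merge sort itself) is the classic factorial function, which keeps calling itself on a smaller input until it hits a base case:

```python
def factorial(n: int) -> int:
    """Compute n! recursively: the function invokes itself on n - 1."""
    if n <= 1:                       # base case stops the recursion
        return 1
    return n * factorial(n - 1)      # recursive case

print(factorial(5))  # -> 120
```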

Divide-and-conquer algorithms (of which merge sort is one) employ recursion in their approach to solve specific problems. They decompose a complex problem into smaller subparts and apply a defined solution recursively to each subpart. Each subpart is then solved separately, and the solutions are recombined to solve the original problem.

The divide-and-conquer approach to algorithm design combines three primary elements:

  • Decomposition of the larger problem into smaller subproblems. (Divide)
  • Recursive utilization of functions to solve each of the smaller subproblems. (Conquer)
  • The final solution is a composition of the solution to the smaller subproblems of the larger problem. (Combine)

Other algorithms use the divide-and-conquer paradigm, such as Quicksort, Binary Search, and Strassen’s algorithm.
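Binary search, mentioned above, is a compact example of the paradigm: the divide step halves the search range, the conquer step recurses into one half, and the combine step is trivial. A sketch:

```python
def binary_search(items, target, lo=0, hi=None):
    """Return the index of target in a sorted list, or -1 if absent.
    Divide: compute the midpoint. Conquer: recurse into one half."""
    if hi is None:
        hi = len(items) - 1
    if lo > hi:                      # empty range: target is absent
        return -1
    mid = (lo + hi) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return binary_search(items, target, mid + 1, hi)
    return binary_search(items, target, lo, mid - 1)

print(binary_search([1, 3, 5, 7, 9], 7))  # -> 3
```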

Merge sort

In the context of sorting elements in a list and in ascending order, the merge sort method divides the list into halves, then iterates through the new halves, continually dividing them down further to their smaller parts.

Subsequently, the smaller halves are compared, and the results are combined to form the final sorted list.

Steps and implementation

Implementation of the merge sort algorithm is a three-step procedure: divide, conquer, and combine.

The divide component of the divide-and-conquer approach is the first step. This initial step separates the overall list into two smaller halves. The lists are then broken down further until they can no longer be divided, leaving only one element in each halved list.

Merge sort’s second phase uses a recursive loop to sort the list’s elements into a particular order. In this scenario, the initial array is sorted in ascending order.

In the following illustration, you can see the division, comparison, and combination steps involved in the merge sort algorithm.

Image showing the divide component of the merge sort algorithm.
Figure 2. Divide component illustration of the Merge sort algorithm—Image by Author.
Image showing the Conquer and combine component of the merge sort algorithm.
Figure 3. Conquer and combine components—Image by author.

To implement this yourself:

  • Create a function called merge_sort that accepts a list of integers as its argument. All following instructions presented are within this function.
  • Start by dividing the list into halves. Record the initial length of the list.
  • Check whether the recorded length is equal to 1. If the condition evaluates to true, return the list: a list with just one element is already sorted, so there is no requirement to divide it.
  • Obtain the midpoint for a list with more than one element. In Python, the // operator performs floor division: it divides and rounds the result down to the nearest whole number (for example, 7 // 2 evaluates to 3, and 1 // 2 evaluates to 0).
  • Using the midpoint as a reference point, split the list into two halves. This is the divide aspect of the divide-and-conquer algorithm paradigm.
  • Recursion is leveraged at this step to facilitate the division of lists into halved components. The variables ‘left_half’ and ‘right_half’ are assigned to the invocation of the ‘merge_sort’ function, accepting the two halves of the initial list as parameters.
  • The ‘merge_sort’ function returns the invocation of a function that merges two lists to return one combined, sorted list.
def merge_sort(values):
    # Base case: a list with zero or one element is already sorted.
    list_length = len(values)
    if list_length <= 1:
        return values
    # Divide: split the list at the midpoint and sort each half recursively.
    mid_point = list_length // 2
    left_half = merge_sort(values[:mid_point])
    right_half = merge_sort(values[mid_point:])
    # Combine: merge the two sorted halves into one sorted list.
    return merge(left_half, right_half)
  • Create a ‘merge’ function that accepts two lists of integers as its arguments. This function contains the conquer and combine aspects of the divide-and-conquer algorithm paradigm. All following steps are executed within the body of this function.
  • Assign an empty list to the variable ‘output’ that holds the sorted integers.
  • The pointers ‘i’ and ‘j’ are used to index the left and right lists, respectively.
  • Within the while loop, the elements of the left and right lists are compared. After each comparison, the output list is populated with the smaller of the two compared elements, and the pointer of the list that contributed the appended element is incremented.
  • The remaining elements to be added to the sorted list are elements obtained from the current pointer value to the end of the respective list.
def merge(left, right):
    output = []
    i = j = 0
    # Compare the front elements of each list; append the smaller one.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps the sort stable
            output.append(left[i])
            i += 1
        else:
            output.append(right[j])
            j += 1
    # One list is exhausted; append the remainder of the other.
    output.extend(left[i:])
    output.extend(right[j:])
    return output

Performance and complexity

Big O notation is a standard for defining and organizing the performance of algorithms in terms of their space requirement and execution time.

Merge sort algorithm time complexity is the same for its best, worst, and average scenarios. For a list of size n, the expected number of steps, minimum number of steps, and maximum number of steps for the merge sort algorithm to complete, are all the same.

As noted earlier in this article, the merge sort algorithm is a three-step process: divide, conquer, and combine. The ‘divide’ step computes the midpoint of the list, which takes a single operational step regardless of the list size; therefore, this operation is denoted as O(1).

The ‘conquer’ step recursively halves the list, which produces a recursion depth of log n levels. The ‘combine’ step merges the results into a final list; its execution time depends on the list size and is denoted as O(n).

In total, merge sort performs O(n) merge work at each of the log n levels of recursion, for n * log n operations overall in its average, best, and worst cases alike. In Big O notation, low-order terms and constants are negligible, meaning the final notation for the merge sort algorithm is O(n log n). For a detailed analysis of the merge sort algorithm, refer to this article.
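The O(n log n) claim can be checked empirically. The snippet below is assumed instrumentation (not the article’s implementation): it counts the element comparisons performed while merge sorting a shuffled list and confirms the total stays below n · ⌈log₂ n⌉:

```python
import math
import random

def merge_sort_counted(values, counter):
    """Merge sort instrumented to tally element comparisons in counter[0]."""
    if len(values) <= 1:
        return values
    mid = len(values) // 2
    left = merge_sort_counted(values[:mid], counter)
    right = merge_sort_counted(values[mid:], counter)
    output, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one element comparison per loop pass
        if left[i] <= right[j]:
            output.append(left[i])
            i += 1
        else:
            output.append(right[j])
            j += 1
    output.extend(left[i:])
    output.extend(right[j:])
    return output

random.seed(0)
n = 1024
data = random.sample(range(10_000), n)
counter = [0]
assert merge_sort_counted(data, counter) == sorted(data)
bound = n * math.ceil(math.log2(n))  # n * ceil(log2 n) = 10,240 for n = 1024
print(counter[0], "comparisons, within the bound of", bound)
```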


Merge sort performs well when sorting large lists, but its operation time is slower than other sorting solutions when used on smaller lists. Another disadvantage of merge sort is that it will execute the operational steps even if the initial list is already sorted. In the use case of sorting linked lists, merge sort is one of the fastest sorting algorithms to use. Merge sort can be used in file sorting within external storage systems, such as hard drives.
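The external-sorting use case relies on the same merge step: sort chunks that fit in memory, write them out as sorted runs, then stream a k-way merge over the runs. Python’s standard-library heapq.merge performs that k-way merge lazily; the in-memory lists below are a simplified stand-in for real run files:

```python
import heapq

# Pretend each list is a sorted run previously written to disk.
run_a = [2, 9, 14, 31]
run_b = [1, 5, 27]
run_c = [3, 8, 40]

# heapq.merge streams the runs in order without concatenating them first,
# which is what lets external sorting handle data larger than memory.
merged = list(heapq.merge(run_a, run_b, run_c))
print(merged)  # -> [1, 2, 3, 5, 8, 9, 14, 27, 31, 40]
```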

Key takeaways

This article describes the merge sort technique by breaking it down in terms of its constituent operations and step-by-step processes.

The merge sort algorithm is commonly used, and the intuition and implementation behind it are rather straightforward in comparison to other sorting algorithms. This article includes the Python implementation of the merge sort algorithm.

You should also know that the time complexity of the merge sort method’s execution remains the same across the best, worst, and average scenarios. It is recommended that the merge sort algorithm be applied in the following scenarios:

  • When dealing with larger sets of data, use the merge sort algorithm. Merge sort performs poorly on small arrays when compared to other sorting algorithms.
  • Elements within a linked list hold a reference to the next element in the list. This means that, during the merge sort operation, the pointers can be modified in place, making the comparison and insertion of elements possible in constant time and space.
  • Have some form of certainty that the array is unsorted. Merge sort executes its full set of operations even on sorted arrays, which wastes computing resources.
  • Use merge sort when the stability of data is a consideration. A stable sort maintains the relative order of identical values within an array: identical values appear in the sorted output in the same relative order they had in the unsorted input.
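Stability can be seen with a concrete input. Python’s built-in sorted is stable (its Timsort algorithm is a merge sort hybrid), so records that compare equal keep their original relative order:

```python
# (name, score) records; two records share the score 90.
records = [("ada", 90), ("bob", 85), ("cy", 90), ("dee", 70)]

# Sorting by score alone: "ada" still precedes "cy" because both
# scored 90 and a stable sort preserves their input order.
by_score = sorted(records, key=lambda r: r[1])
print(by_score)  # -> [('dee', 70), ('bob', 85), ('ada', 90), ('cy', 90)]
```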

Inversion of a multivariable function

Hello everyone,

I’m pretty new to ML and tensorflow. What I’m trying to do now is essentially to invert a function using tensorflow.

So I have a function h = f(c1, …, cn, T). It is a smooth function of all the variables. I want to train a model that would give me T given known values of c1…cn and h.

For now I’m using a keras.Sequential model with 2 or 3 dense layers.

For the loss I use ‘mean_absolute_error’; for the optimizer, Adam().

To train the model I generate a dataset using my h(c1…cn, T) function by varying its arguments and using values of T as train_labels.

The accuracy of the resulting model is not very good: I’m getting errors of about 10%, which seems poor to my mind, given that the training dataset is ideally smooth.

My questions are:

  1. Am I doing something particularly wrong?

  2. How many units should I provide for each layer? I mean in tutorials they are using either Dense(64) or Dense(1). What difference does it make in my particular case? Should it be proportional to the number of parameters of the model?

  3. Maybe I should use some other types of layers/optimizers/losses?

Thank you in advance for your replies!

submitted by /u/_padla_


Is it possible to serve models that have a distributed architecture, with multiple shards using tfx serving?

I’m planning to build a model that uses ParameterServerStrategy to distribute its parameters across multiple VMs with an assumption that it cannot fit into any one of the VMs. Is it possible to use TensorFlow serving to distribute the model across multiple VMs?

I was reading through this and discovered that you need multiple servables for composite models that have multiple parts to them. But I couldn’t find any documentation that talks specifically about creating a cluster with TF Serving that uses multiple servables, with each servable in a single VM.

Is it possible to use TensorFlow serving for very large models, where the models have its parameters spread across multiple VMs? If yes, can you please tell me how it can be done?

Thanks in advance!

submitted by /u/deathconqueror


An A-peel-ing GFN Thursday Sprouts 20+ New Games Coming to GeForce NOW in April

In addition to GFN Thursday, it’s National Tater Day. Hooray! To honor the spud-tacular holiday, we’re closing out March with seven new games streaming this week. And a loaded 20+ titles are coming to the GeForce NOW library in April to play — even on a potato PC, thanks to GeForce NOW.

The post An A-peel-ing GFN Thursday Sprouts 20+ New Games Coming to GeForce NOW in April appeared first on NVIDIA Blog.


Data Science

Hi, I’m very fresh to ML in general, and I’m overwhelmed by the choice of loss functions and accuracy measurements. I’m doing my masters project and my supervisor wants me to teach myself tf and throw the data through a model.

I’m feeding in an array of particle data from a decay, and asking for it to predict a 1 or -1 at the end representing parity of the original particle.

What loss function and accuracy metric are best to use for something like this?

Other parameters: about 2 million events, 6 particles in each decay/event, and the 4-momenta of each particle. That makes 24 input values.

The first layer is a tf.flatten.
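For a two-class target like parity, one common setup (an assumption, not something from the post) is to map the ±1 labels to {0, 1}, end the network with a single sigmoid unit, and pair binary cross-entropy loss with plain accuracy as the metric. The two metrics are simple enough to sketch in plain Python:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over a batch.
    y_true: labels in {0, 1} (map parity -1 -> 0, +1 -> 1 first).
    y_pred: sigmoid outputs in (0, 1)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    correct = sum(1 for t, p in zip(y_true, y_pred)
                  if (p >= threshold) == (t == 1))
    return correct / len(y_true)

labels = [1, 0, 1, 0]         # parities +1, -1, +1, -1 mapped to {0, 1}
probs = [0.9, 0.2, 0.6, 0.4]  # hypothetical sigmoid outputs
print(accuracy(labels, probs))  # all four on the correct side -> 1.0
```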

submitted by /u/DriftingRumour


Generating negative outputs with only nonnegative weights and biases.

I am trying to have a layer with only positive weights generate a partially negative output. What would be the least hacky way to achieve this? My initial idea was using a modified activation function that is shifted by 1 along the x-axis; however, this feels a bit hacky to me, and I was wondering if there is a better way to achieve it.
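One variant of the shifted-activation idea (shifting the output rather than the input, which is an assumption on my part) can be checked with a toy neuron in plain Python, no TensorFlow required: with weights and bias constrained to be nonnegative, a ReLU translated down by 1 still lets the unit emit values in [-1, ∞).

```python
def shifted_relu(x, shift=1.0):
    """ReLU translated down so the output range becomes [-shift, inf)."""
    return max(0.0, x) - shift

def neuron(inputs, weights, bias):
    """A single unit with nonnegative weights/bias and a shifted activation."""
    assert all(w >= 0 for w in weights) and bias >= 0
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return shifted_relu(pre_activation)

print(neuron([0.0, 0.0], [0.5, 0.5], 0.0))  # -> -1.0, negative despite nonnegative parameters
print(neuron([2.0, 2.0], [0.5, 0.5], 0.0))  # -> 1.0
```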

submitted by /u/salted_kinase


Storing a large Tensor in multiple GPUs

Hi Guys,

Is there a way to store a tensor across multiple GPUs? I have a tensor so large that it requires >16 GB of GPU RAM to store. I was wondering how that can be achieved. Thanks!

submitted by /u/dr_meme_69


Correct way to get output of hidden layers after replacing some hidden layers?


I am working on a project that replaces layers with new layers to see whether the changes affect the model positively or negatively. I then want to get the output feature map and input feature map after the replacement. The issue I am having is that after a couple of changes, I get multiple connections, and a new column called ‘connected to’ appears in the summary. Here are the summaries and the code I am using for replacing layers. I sometimes get this warning after replacing a convolutional layer with the code provided.

pastebin to warning and model summary

I have tried to create an Input layer and then use the same functional approach, with my first layer being the input layer and the second the conv2d_0 layer. However, I get a ValueError (Disconnected from Graph) for the input layer after two layer changes.


inputs = self.model.layers[0].input
x = self.model.layers[0](inputs)
for layer in self.model.layers[1:]:
    if == layer_name:
        new_layer = ...  # creation of custom layer that generates output of same shape as replaced layer
        x = new_layer(x)
    else:
        layer.trainable = False
        x = layer(x)
self.model = tf.keras.Model(inputs, x)

submitted by /u/ElvishChampion


Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans

Four words: smart, sustainable, Super Bowl. Polestar’s commercial during the big game made it clear no-compromise electric vehicles are now mainstream. Polestar Chief Operating Officer Dennis Nobelius sees driving enjoyment and autonomous-driving capabilities complementing one another in sustainable vehicles that keep driving — and the driver — front and center. NVIDIA’s Katie Washabaugh spoke with Nobelius.

The post Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans appeared first on NVIDIA Blog.


TTS mobile help

I’m trying to implement fastspeech_quant.tflite in a Flutter app. I’m using the tflite_flutter package. I’ve loaded the model like this: Interpreter _interpreter = await Interpreter.fromAsset(‘fastspeech_quant.tflite’);

Next I wanted to run an inference on some text so I would use _interpreter.runForMultipleInputs(input, output)

I just don’t understand how to format the input and output for the model. So I ran _interpreter.getInputTensors() and I get

[ Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16ef80, name: input_ids, type: TfLiteType.int32, shape: [1, 1], data: 4, Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16eff0, name: speaker_ids, type: TfLiteType.int32, shape: [1], data: 4, Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16f060, name: speed_ratios, type: TfLiteType.float32, shape: [1], data: 4, Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16f0d0, name: f0_ratios, type: TfLiteType.float32, shape: [1], data: 4, Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16f140, name: energy_ratios, type: TfLiteType.float32, shape: [1], data: 4 ]

_interpreter.getOutputTensors() gives me

[Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae108e0, name: Identity, type: TfLiteType.float32, shape: [1, 1, 80], data: 320, Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae118a0, name: Identity_1, type: TfLiteType.float32, shape: [1, 1, 80], data: 320, Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae05820, name: Identity_2, type: TfLiteType.int32, shape: [1, 1], data: 4, Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae04e10, name: Identity_3, type: TfLiteType.float32, shape: [1, 1], data: 4, Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae03750, name: Identity_4, type: TfLiteType.float32, shape: [1, 1], data: 4]

I need an example of how I would go about it. I’ve combed through examples but it’s just not clicking for me.

submitted by /u/kai_zen_kid