Wednesday, July 6, 2022

Profiling Python Code

Profiling is a technique for figuring out how time is spent in a program. With these statistics, we can find the “hot spot” of a program and think about ways of improvement. Sometimes, a hot spot in an unexpected location may hint at a bug in the program as well.

In this tutorial, we will see how we can use the profiling facility in Python. Specifically, you will see:

  • How we can compare small code fragments using the timeit module
  • How we can profile the entire program using the cProfile module
  • How we can invoke a profiler inside an existing program
  • What the profiler can’t do

Let’s get started.

Profiling Python Code. Photo by Prashant Saini. Some rights reserved.

Tutorial Overview

This tutorial is in four parts; they are:

  • Profiling small fragments
  • The profile module
  • Using profiler inside code
  • Caveats

Profiling Small Fragments

When you are asked about the different ways of doing the same thing in Python, one perspective is to check which one is more efficient. In Python’s standard library, we have the timeit module that allows us to do some simple profiling.

For example, to concatenate many short strings, we can use the join() function from strings or the + operator. So, how do we know which is faster? Consider the following Python code:
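A minimal sketch of the loop-based version, assuming 1,000 numbers are concatenated (the count is an assumption made for illustration):

```python
# Build a long string by repeated concatenation with the + operator.
longstr = ""
for x in range(1000):  # 1000 is an assumed count for illustration
    longstr += str(x)
```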

This will produce a long string 012345.... in the variable longstr. An alternative way to write this is:
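A sketch of the join()-based version, under the same assumed count of 1,000 numbers:

```python
# Build the same string with str.join() over a list of short strings.
longstr = "".join([str(x) for x in range(1000)])  # 1000 is an assumed count
```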

To compare the two, we can do the following at the command line:

These two commands will produce the following output:

The above commands load the timeit module and pass on a single line of code for measurement. In the first case, we have two lines of statements, and they are passed on to the timeit module as two separate arguments. By the same rationale, the first command can also be presented as three lines of statements (by breaking the for-loop into two lines), but the indentation of each line needs to be quoted correctly:
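To make the argument passing concrete, the sketch below launches `python -m timeit` from Python so that each statement is visibly a separate argument. The helper name `run_timeit` is hypothetical, and the `-n` and `-r` options are added here only to keep the run short; they are not part of the commands discussed in the text.

```python
import subprocess
import sys


def run_timeit(*statements, loops=100, repeats=1):
    """Run `python -m timeit`, passing each statement as its own argument."""
    cmd = [sys.executable, "-m", "timeit",
           "-n", str(loops), "-r", str(repeats), *statements]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Two statements, two arguments. At a shell prompt this corresponds to:
#   python -m timeit 'longstr=""' 'for x in range(1000): longstr += str(x)'
two_args = run_timeit('longstr=""', 'for x in range(1000): longstr += str(x)')

# The same code as three arguments; the loop body keeps its indentation
# inside the quotes.
three_args = run_timeit('longstr=""', 'for x in range(1000):', '  longstr += str(x)')

# A single statement needs only one argument.
join_version = run_timeit('"".join([str(x) for x in range(1000)])')

print(two_args)
print(three_args)
print(join_version)
```

timeit joins the statement arguments with newlines before compiling them, which is why the indentation of the third argument matters.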

timeit reports the best performance among several runs (5 by default). Each run executes the provided statements a number of times (which is dynamically determined). The time is reported as the average to execute the statements once in the best run.

While it is true that the join() function is faster than the + operator for string concatenation, the timing above is not a fair comparison. It is because we use str(x) to make short strings on the fly during the loop. The better way to do this is the following:
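The reporting logic described above can be approximated in Python code. This is a sketch of the idea rather than the exact command-line implementation; the measured statement is an assumption carried over from the earlier example.

```python
import timeit

timer = timeit.Timer('"".join([str(x) for x in range(1000)])')

# Let timeit choose a loop count large enough for one run to take
# at least about 0.2 seconds.
loops, _ = timer.autorange()

# Repeat the measurement 5 times; each entry is the total time of
# `loops` executions of the statement.
totals = timer.repeat(repeat=5, number=loops)

# Report the per-loop average of the fastest run.
best_per_loop = min(totals) / loops
print(f"{loops} loops, best of 5: {best_per_loop * 1e6:.1f} usec per loop")
```

Taking the minimum rather than the mean is deliberate: slower runs are usually caused by other processes interfering, not by the code being measured.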
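A minimal sketch of such a fairer comparison in Python code, using the `setup` argument of `timeit.timeit()` so that building the list of short strings is not timed. The loop count of 1,000 and the variable names are assumptions chosen for illustration.

```python
import timeit

# Setup code runs once per measurement and is excluded from the timing.
setup = "strings = [str(x) for x in range(1000)]"

plus_stmt = 'longstr = ""\nfor s in strings:\n    longstr += s'
join_stmt = '"".join(strings)'

loops = 1000  # assumed loop count, chosen to keep the run short
plus_time = timeit.timeit(plus_stmt, setup=setup, number=loops)
join_time = timeit.timeit(join_stmt, setup=setup, number=loops)

print(f"+ operator: {plus_time / loops * 1e6:.2f} usec per loop")
print(f"join():     {join_time / loops * 1e6:.2f} usec per loop")
```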

which produces:

The -s option allows us to provide the “setup” code, which is executed before the profiling and not timed. In the above, we create the list of short strings before starting the loop. Hence the time to create those strings is not measured in the “per loop” timing. The above shows that the join() function is two orders of magnitude faster than the + operator. The more common use of the -s option is to import the libraries. For example, we can compare the square root function from Python’s math module, the one from NumPy, and the exponential operator ** as follows:

The above produces the following measurement, in which we see that math.sqrt() is the fastest while numpy.sqrt() is the slowest in this particular example:

If you wonder why NumPy is the slowest, it is because NumPy is optimized for arrays. You will see its exceptional speed in the following alternative:

where the result is:

If you prefer, you can also run timeit in Python code. For example, the following will be similar to the above but give you the total raw timing for each run:

In the above, each run executes the statement 10,000 times; the result is as follows. You can see the result of roughly 98 usec per loop in the best run:
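As a rough illustration of putting an import into the setup code, the sketch below compares math.sqrt() with the ** operator through `timeit.timeit()`. NumPy is left out so that the example needs only the standard library, and the loop count is an assumption.

```python
import timeit

loops = 1000  # assumed loop count, chosen to keep the run short

# The import lives in the setup string, so it is executed but not timed.
math_time = timeit.timeit("[math.sqrt(x) for x in range(1000)]",
                          setup="import math", number=loops)

# The ** operator needs no import, hence no setup code.
power_time = timeit.timeit("[x**0.5 for x in range(1000)]", number=loops)

print(f"math.sqrt(): {math_time / loops * 1e6:.1f} usec per loop")
print(f"x**0.5:      {power_time / loops * 1e6:.1f} usec per loop")
```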
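A minimal sketch with `timeit.repeat()`, which returns one total per run instead of a single summary line. The statement uses a shorter list than the measurements quoted in this tutorial, purely to keep the run brief, so the numbers it prints will differ.

```python
import timeit

# Each of the 5 runs executes the statement 10,000 times and records
# the total time of that run in seconds.
measurements = timeit.repeat("[x**0.5 for x in range(100)]",
                             repeat=5, number=10000)
print(measurements)

# Dividing the fastest run by the loop count gives the per-loop figure.
print(f"best: {min(measurements) / 10000 * 1e6:.2f} usec per loop")
```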

The Profile Module

Focusing on a statement or two for performance is from a microscopic perspective. Chances are, we have a long program and want to see what is causing it to run slow. That happens before we can consider alternative statements or algorithms.

A program running slow can generally be due to two reasons: a part is running slow, or a part is running too many times, adding up and taking too much time. We call these “performance hogs” the hot spot. Let’s look at an example. Consider the following program that uses a hill-climbing algorithm to find hyperparameters for a perceptron model:

Assuming we saved this program in a file, we can run the profiler at the command line as follows:

and the output will be the following:

The normal output of the program will be printed first, and then the profiler’s statistics will be printed. From the first row, we see that the function objective() in our program has run 101 times and took 4.89 seconds. But these 4.89 seconds are mostly spent on the functions it called; the total time spent in that function itself is merely 0.001 seconds. The functions from dependent modules are also profiled. Hence you see a lot of NumPy functions above too.

The above output is long and may not be useful to you, as it can be difficult to tell which function is the hot spot. Indeed, we can sort the above output. For example, to see which function is called the greatest number of times, we can sort by ncalls:

Its output is as follows. It says the get() function from a Python dict is the most used function (but it only consumed 0.03 seconds in total out of the 5.6 seconds to finish the program):

The other sort options are as follows:
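To show the shape of the command without the hill-climbing program, the sketch below writes a small stand-in script to a temporary file and runs `python -m cProfile` on it. The script contents and the file name `toy_program.py` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# A hypothetical stand-in program, not the hill-climbing script.
program = '''
def square(x):
    return x * x

def total(n):
    return sum(square(i) for i in range(n))

print(total(10000))
'''

with tempfile.TemporaryDirectory() as tmpdir:
    script = os.path.join(tmpdir, "toy_program.py")  # hypothetical file name
    with open(script, "w") as f:
        f.write(program)
    # Corresponds to typing: python -m cProfile toy_program.py
    result = subprocess.run([sys.executable, "-m", "cProfile", script],
                            capture_output=True, text=True, check=True)

# The program's own output comes first, followed by the profiler's table.
report = result.stdout
print(report)
```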

Sort string    Meaning
calls          Call count
cumulative     Cumulative time
cumtime        Cumulative time
file           File name
filename       File name
module         File name
ncalls         Call count
pcalls         Primitive call count
line           Line number
name           Function name
nfl            Name/file/line
stdname        Standard name
time           Internal time
tottime        Internal time
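The same sort strings are accepted by `sort_stats()` in the pstats module. The sketch below profiles two hypothetical toy functions (`cheap` and `busy`) and prints the table in two different orders.

```python
import cProfile
import io
import pstats


def cheap(x):
    return x + 1


def busy(n):
    return sum(cheap(i) for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy(50000)
profiler.disable()


def report(sort_key, rows=5):
    """Return the profile table sorted by one of the sort strings above."""
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.strip_dirs().sort_stats(sort_key).print_stats(rows)
    return stream.getvalue()


print(report("ncalls"))      # most frequently called functions first
print(report("cumulative"))  # functions with the largest cumulative time first
```

The header of each table names the ordering in use, which is a quick way to confirm that a sort string was understood.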

If the program takes some time to finish, it is not reasonable to run the program many times just to find the profiling result in a different sort order. Indeed, we can save the profiler’s statistics for further processing as follows:

Similar to the above, it will run the program. But this will not print the statistics to the screen but save them into a file. Afterward, we can use the pstats module like the following to open up the statistics file and provide us a prompt to manipulate the data:

For example, we can use the sort command to change the sort order and use stats to print what we saw above:

You will notice that the stats command above allows us to provide an extra argument. The argument can be a regular expression to search for the functions such that only those matched will be printed. Hence it is a way to provide a search string to filter.

This pstats browser allows us to see more than just the table above. The callers and callees commands show us which function calls which function, how many times it is called, and how much time is spent. Hence we can consider that as a breakdown of the function-level statistics. It is useful if you have a lot of functions that call each other and want to know how the time is spent in different scenarios. For example, this shows that the objective() function is called only by the hillclimbing() function, but the hillclimbing() function calls several other functions:
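The save, sort, and filter steps can also be done programmatically with the pstats module instead of the interactive prompt. In this sketch the profiled functions and the file name `toy.stats` are hypothetical stand-ins.

```python
import cProfile
import io
import os
import pstats
import tempfile


def objective(x):  # hypothetical stand-in function
    return x * x


def search(n):  # hypothetical stand-in function
    return min(objective(i) for i in range(n))


with tempfile.TemporaryDirectory() as tmpdir:
    stats_file = os.path.join(tmpdir, "toy.stats")  # hypothetical file name

    # Profile one call and save the raw statistics to a file.
    profiler = cProfile.Profile()
    profiler.runcall(search, 20000)
    profiler.dump_stats(stats_file)

    # Later, possibly in another session: load the saved statistics,
    # change the sort order, and print only the matching rows.
    stream = io.StringIO()
    stats = pstats.Stats(stats_file, stream=stream)
    stats.strip_dirs().sort_stats("ncalls")
    stats.print_stats("objective")  # a regular expression filters the rows

filtered = stream.getvalue()
print(filtered)
```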
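The caller and callee breakdown is also available from Python code through `print_callers()` and `print_callees()`. The three functions below are hypothetical stand-ins, not the hill-climbing program from this tutorial.

```python
import cProfile
import io
import pstats


def objective(x):
    return x * x


def step(x):
    return x + 1


def hillclimb(n):
    best = 0
    for i in range(n):
        best = max(best, objective(step(i)))
    return best


profiler = cProfile.Profile()
profiler.runcall(hillclimb, 1000)

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).strip_dirs()
stats.print_callers("objective")  # which functions call objective()
stats.print_callees("hillclimb")  # which functions hillclimb() calls
breakdown = stream.getvalue()
print(breakdown)
```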

Using Profiler Inside Code

The above example assumes you have the complete program saved in a file and profile the entire program. Sometimes, we focus on only a part of the entire program. For example, if we load a large module, it takes time to bootstrap, and we want to remove this from the profiler. In this case, we can invoke the profiler only for certain lines. An example is as follows, which is modified from the program above:

It will output the following:
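A minimal sketch of the pattern, using hypothetical stand-in functions rather than the hill-climbing program: the profiler is enabled only around the call of interest, so the slow preparation step does not appear in the statistics.

```python
import cProfile
import io
import pstats


def expensive_setup():
    """Stand-in for slow bootstrap work that we do not want in the profile."""
    return [i * i for i in range(200000)]


def search(data):
    """Stand-in for the part of the program we want to measure."""
    return max(data)


data = expensive_setup()  # runs before the profiler is enabled

profiler = cProfile.Profile()
profiler.enable()
best = search(data)       # only this call is profiled
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).strip_dirs().sort_stats("cumtime")
stats.print_stats(10)     # print at most the top 10 rows
section_report = stream.getvalue()
print(section_report)
```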


Caveats

Using profiler with TensorFlow models may not produce what you would expect, especially if you have written your own custom layer or custom function for the model. If you did it correctly, TensorFlow is supposed to build the computation graph before your model is executed, and hence the logic will be changed. The profiler output will therefore not show your custom classes.

It is the same for some advanced modules that involve binary code. The profiler can see you called some functions and marks them as “built-in” methods, but it cannot go any further into the compiled code.

Below is a short code of the LeNet5 model for the MNIST classification problem. If you try to profile it and print the top 15 rows, you will see that a wrapper is occupying the majority of the time, and nothing can be shown beyond that:

In the result below, the TFE_Py_Execute is marked as a “built-in” method, and it consumes 30.1 sec out of the total run time of 39.6 sec. Note that the tottime is the same as the cumtime, meaning from the profiler’s perspective, it seems all time is spent at this function, and it doesn’t call any other functions. This illustrates the limitation of Python’s profiler.

Finally, Python’s profiler gives you only the statistics on time but not memory usage. You may need to look for another library or tools for this purpose.
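The same effect can be seen on a small scale with Python’s own built-in functions, which are implemented in C. In this sketch (the function `total` is a hypothetical example), the work done inside sum() shows up as a single “built-in method” row with no further breakdown.

```python
import cProfile
import io
import pstats


def total(n):
    # sum() is implemented in C, so the profiler cannot look inside it.
    return sum(range(n))


profiler = cProfile.Profile()
profiler.runcall(total, 1_000_000)

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).strip_dirs().sort_stats("tottime").print_stats()
opaque_report = stream.getvalue()
print(opaque_report)
```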

Additional Readings

The standard library modules timeit, cProfile, and pstats are covered in Python’s official documentation.

The standard library’s profiler is very powerful but not the only one. If you want something more visual, you can try out the Python Call Graph module. It can produce a picture of how functions call each other using the GraphViz tool.

The limitation of not being able to dig into the compiled code can be solved by not using Python’s profiler but instead using one for compiled programs. My favorite is Valgrind.

But to use it, you may need to recompile your Python interpreter to turn on debugging support.


In this tutorial, we learned what a profiler is and what it can do. Specifically,

  • We know how to compare small code with the timeit module
  • We see Python’s cProfile module can provide us with detailed statistics on how time is spent
  • We learned to use the pstats module against the output of cProfile to sort or filter

