Ligtweight profiling for your Coq hacks

3 minute read

Published: August 11, 2011

Let’s suppose we want to hack a few functions in the Coq compiler. The classic situation is that we have the base implementation of our function as it stands, and an optimized version for which we want to check it really improves things.

We all know Donald Knuth’s line on “premature optimization” …

Crucially, we want execution time information, not just call count information: there is no reason our “optimized” subroutine, deep in the entrails of the compiler, to be called a different number of times.

The easiest way to profile Coq is to compile it as native code, passing the -profile option at the execution of the configure script:

huitseeker@fuwaad:/opt/coq/trunk$ ./configure --help| grep -A1 -e '-profile'
-profile
    Add profiling information in the Coq executables

This will in fact modify the native compilation using ocamlopt, so that it generates profiling information by adding calls to the _mcount hook, at each ML definition. The details are in the OCaml manual

Once you have called the configure script with the appropriate options, and recompiled Coq, you can execute your toplevel as usual:

coqtop.opt  -q  -dont-load-proofs -I .    -compile mytheoryfile

This will generate a gmon.out file that you can then parse with gprof.

gprof /opt/coq/trunk/bin/coqtop.opt gmon.out &gt; ~/gmonoutput.log

Here is a quick guide on how to interpret the output.

However, the problem then becomes that on a number of proof files, Coq spends most of its time within reduction primitives … as can be expected, I guess. This often dwarfs the mesurements for our specific function in reports from huge reduction loops that don’t concern it. Thankfully, a more lightweight option exists in the source of the compiler, hidden in library/profile.ml. It’s documented in comments in library/profile.mli:

> To trace a function f you first need to get a key for it by using :

> let fkey = declare_profile “f”;;

> (the string is used to print the profile infomation). Warning: this function does a side effect. Choose the ident you want instead “fkey”. Then if the function has ONE argument add the following just after the definition of f or just after the declare_profile if this one follows the definition of f.

> let f = profile1 fkey f;;

> If f has two arguments do the same with profile2, idem with 3, … For more arguments than provided in this module, make a new copy of function profile and adapt for the needed arity.

This is really the 5-minute way to profile your Coq hack (modulo recompilation time, of course) : there is an example of profiling of a 4-argument function in the comments of kernel/reductionops.ml, and Coq even has the niceness of already calling Profile.print_profile() for you at the end of the toplevel execution. You get a result such as :

Function name              Own time Total time  Own alloc Tot. alloc     Calls
f                            0.28      0.47        116        116      5      4
h                            0.19      0.19          0          0      4      0
g                            0.00      0.00          0          0      0      0
others                       0.00      0.47        392        508             9
Est. overhead/total          0.00      0.47       2752       3260

Ocamlviz is another option, but we don’t care much about real-time profiling information. Running the profiler as server corresponds to another usecase : optimizing a proof script rather than the compiler itself. Besides, it requires linking with the libocamlviz library, which is somehow difficult to integrate to the monolithic (and somewhat labyrinthic) build process of Coq (especially when aiming to analyze just a few functions).
You may also want to have a look at this issue of the ocaml newsletter, it details a few of the alternatives for profiling ocaml code, including oprofile, which I haven’t tested.

Share on

Twitter Facebook LinkedIn

François Garillot

Ligtweight profiling for your Coq hacks

Share on

You May Also Enjoy

Drilling down on Rust Performance Bottlenecks with tokio-tracing and texray

Byzantine-Consistent Broadcast- A Promising Yet Challenging Frontier in Digital Asset Transfers

PSA: maven central and sonatype are slow

A June 2016 roundup of distributed Deep Learning projects on Apache Spark