Performance Counter Peripheral
A performance-counter unit is just a block of big counters for timing sections in your software.
This block lets you accurately measure the execution time taken by blocks of C code. Simple, efficient, minimally intrusive macros allow you to mark the start and end of blocks of interest in your program. Each block of interest is called a section.
This peripheral has a measurement start/stop feature that lets you measure each section as a fraction of some larger program (or enclosing task).
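This block can be configured to keep track of as many sections as you like (the default is 3). You choose the number of sections you want to measure in the GUI. The peripheral will be generated with one additional counter-pair, the “Global Counter” (see "Global Counters" below).
This peripheral will contain two counters for every section: a 64-bit time counter, which accumulates the clock cycles spent in the section, and a 32-bit event counter, which counts how many times the section was started.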
Example Usage:
If you had some function, and you wanted to know how much of your execution time it was taking, you would modify your C source-file like this:
#include "altera_avalon_performance_counter.h"
#include "system.h"
// Use section-counter #1 to measure this function:
// NOTE: -Never- use counter #0. Your counter-numbers
// start at 1.
//
#define INTERESTING_FUNCTION_SECTION 1
int my_interesting_function (int a, int b)
{
    int result;
    PERF_BEGIN (PERF_UNIT_BASE, INTERESTING_FUNCTION_SECTION);
    /* ...body of subroutine (computes result)... */
    PERF_END (PERF_UNIT_BASE, INTERESTING_FUNCTION_SECTION);
    return result;
}
Then you would need to turn measurement on & off only for the "interesting" part of your program, like this:
#include "altera_avalon_performance_counter.h"
#include "system.h"
int main()
{
    // Reset the counters before every run
    PERF_RESET (PERF_UNIT_BASE);
    // First, do things that we don't want to measure:
    //
    get_user_input_that_might_take_arbitrarily_long();
    // Now our program starts in earnest. Begin measuring:
    //
    PERF_START_MEASURING (PERF_UNIT_BASE);
    while (things_to_do()) {
        my_interesting_function (a, b);
        my_boring_function (c, d);
        my_time_consuming_function (e, f);
    }
    PERF_STOP_MEASURING (PERF_UNIT_BASE);
    clean_up();
    // Here is where we print out the results so we
    // can see what happened. The report goes to
    // STDOUT.
    //
    perf_print_formatted_report (PERF_UNIT_BASE,
        ALT_CPU_FREQ,       // defined in "system.h"
        1,                  // How many sections to print
        "interesting_fn");  // Display-name of section(s).
    return 0;
}
You can simultaneously measure as many sections as you like (up to <module_name>_HOW_MANY_COUNTERS, defined in "system.h").
Macros
To measure a section in your code, surround it with the macros PERF_BEGIN and PERF_END. These macros are very efficient, typically requiring only two or three machine instructions. All of the macros needed to control the performance counter peripheral are defined in the file “altera_avalon_performance_counter.h”.
Viewing the Results
Most commonly, the results from the performance-counters are printed to the STDOUT device. To read the results and print a nicely-formatted summary-table, call this function:
int perf_print_formatted_report (void* perf_base,
alt_u32 clock_freq_hertz,
int num_sections, ...);
An example call to this function might look like this:
perf_print_formatted_report (PERF_UNIT_BASE, ALT_CPU_FREQ, 3,
    "DAC Wait",
    "MLA loop",
    "synth_frame" );
The table, sent to STDOUT, will look like this:
--Performance Counter Report--
Total Time: 205.991 seconds (20599054185 clock-cycles)
+---------------+-----+-----------+---------------+-----------+
| Section | % | Time (sec)| Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|DAC Wait | 35.9| 73.93756| 7393755930| 1728|
+---------------+-----+-----------+---------------+-----------+
|MLA loop | 33.5| 69.10533| 6910532857| 566784|
+---------------+-----+-----------+---------------+-----------+
|synth_frame | 39.9| 82.15605| 8215605100| 7872|
+---------------+-----+-----------+---------------+-----------+
Alternatively, your program can read the performance-counters directly by using these functions:
unsigned long long perf_get_section_time
(void* hw_base_address, int which_section);
unsigned long long perf_get_num_starts
(void* hw_base_address, int which_section);
Using straightforward arithmetic, you can compute the total "hard" runtime, in real seconds, taken by each section (using the constant macro ALT_CPU_FREQ, defined in "system.h").
When calling these retrieval-functions, Section #0 is the special "Global" section which keeps track of the total elapsed time, and the total number of measurement start/stops, for the entire run.
You can also compute the actual percent-time-taken by each section by dividing the section-time by the overall measurement time (the overall measurement time is returned by perf_get_section_time for section #0).
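The arithmetic described above can be sketched as two small helpers. This is only a minimal sketch, not part of the driver: the names perf_clocks_to_seconds and perf_section_percent are hypothetical, and the raw counts are assumed to have been read already with perf_get_section_time (section #0 for the global count). The clock frequency would normally be ALT_CPU_FREQ.

```c
/* Hypothetical helper: convert a raw clock-cycle count to seconds.
 * clock_freq_hz would normally be ALT_CPU_FREQ from "system.h". */
static double perf_clocks_to_seconds(unsigned long long clocks,
                                     unsigned long clock_freq_hz)
{
    if (clock_freq_hz == 0UL) {
        return 0.0;  /* avoid dividing by zero on a bad frequency */
    }
    return (double)clocks / (double)clock_freq_hz;
}

/* Hypothetical helper: percentage of the measured run spent in a section.
 * global_clocks is the section #0 time (total measurement time). */
static double perf_section_percent(unsigned long long section_clocks,
                                   unsigned long long global_clocks)
{
    if (global_clocks == 0ULL) {
        return 0.0;  /* nothing was measured */
    }
    return 100.0 * (double)section_clocks / (double)global_clocks;
}
```

With the "DAC Wait" figures from the report above (7393755930 clocks out of 20599054185, on what appears to be a 100 MHz clock), these helpers give roughly 73.94 seconds and 35.9%. Note that they use floating point, so the "Small C Library" restriction described under Limitations applies if you print the results with printf().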
Register Map
Offsets (right-hand column) are hexadecimal byte offsets from the peripheral's base address.
         31            7     6     5     4     3     2     1     0
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Tlo_0   |                Global Time Counter [31: 0]                |  0
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Thi_0   |                Global Time Counter [63:32]                |  4
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Ev_0    |             Global Measurement-Start Counter              |  8
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
-       |                       --reserved--                        |  C
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Tlo_1   |              Section 1 Time Counter [31: 0]               |  10
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Thi_1   |              Section 1 Time Counter [63:32]               |  14
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Ev_1    |                  Section 1 Start Counter                  |  18
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
-       |                       --reserved--                        |  1C
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Tlo_2   |              Section 2 Time Counter [31: 0]               |  20
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Thi_2   |              Section 2 Time Counter [63:32]               |  24
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Ev_2    |                  Section 2 Start Counter                  |  28
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
-       |                       --reserved--                        |  2C
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
        |                                                           |
        ~                            ...                            ~
        ~                            ...                            ~
        |                                                           |
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Tlo_N   |              Section N Time Counter [31: 0]               |  n0
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Thi_N   |              Section N Time Counter [63:32]               |  n4
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
Ev_N    |                  Section N Start Counter                  |  n8
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
-       |                       --reserved--                        |  nC
        +---+-/../--+-----+-----+-----+-----+-----+-----+-----+-----+
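The layout in the register map is regular: each section occupies 0x10 bytes, holding the low time word, the high time word, the start counter, and one reserved word. As a rough sketch of that arithmetic (the function names here are hypothetical and are not part of "altera_avalon_performance_counter.h"), the byte offsets and the 64-bit time value could be computed like this:

```c
#include <stdint.h>

/* Hypothetical helpers derived from the register map above.
 * Each section (0 = global) occupies 0x10 bytes. */
static uint32_t perf_time_lo_offset(unsigned section)
{
    return section * 0x10u + 0x0u;  /* Time Counter [31: 0] */
}

static uint32_t perf_time_hi_offset(unsigned section)
{
    return section * 0x10u + 0x4u;  /* Time Counter [63:32] */
}

static uint32_t perf_event_offset(unsigned section)
{
    return section * 0x10u + 0x8u;  /* Start Counter */
}

/* Combine the two 32-bit halves into one 64-bit clock count. */
static uint64_t perf_combine_time(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

In normal use you would not need these, since perf_get_section_time and perf_get_num_starts already return the values; the sketch only makes the map's addressing pattern explicit.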
Global Counters
This unit uses section #0 as a special "global" section, which counts the total time during which measurements are being taken. None of the other section-counters are allowed to run at all (not even the other event counters) when the global time-counter is stopped.
Special macros (PERF_START_MEASURING and PERF_STOP_MEASURING) are defined to control the global counters. Users should not manipulate the global counters directly through PERF_BEGIN and PERF_END.
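The gating rule above can be illustrated with a toy software model. This is not the hardware implementation, and every name in it is hypothetical. It also makes one assumption the text does not settle: a section "begin" issued while the global counter is stopped is simply ignored.

```c
#include <stdint.h>

#define MODEL_SECTIONS 4  /* section 0 (global) plus 3 user sections */

/* Toy model of the counters; index 0 is the global section. */
typedef struct {
    uint64_t time[MODEL_SECTIONS];    /* clock cycles accumulated */
    uint32_t starts[MODEL_SECTIONS];  /* number of starts counted */
    int      running[MODEL_SECTIONS]; /* nonzero while counting   */
} perf_model;

static void model_start_measuring(perf_model *m)
{
    if (!m->running[0]) {
        m->running[0] = 1;
        m->starts[0]++;
    }
}

static void model_stop_measuring(perf_model *m)
{
    m->running[0] = 0;
}

static void model_begin(perf_model *m, int s)
{
    if (s < 1 || s >= MODEL_SECTIONS) return;  /* never section 0 */
    if (!m->running[0]) return;  /* ASSUMPTION: ignored while stopped */
    if (!m->running[s]) {
        m->running[s] = 1;
        m->starts[s]++;
    }
}

static void model_end(perf_model *m, int s)
{
    if (s < 1 || s >= MODEL_SECTIONS) return;
    m->running[s] = 0;
}

/* Advance the model by one clock cycle. */
static void model_tick(perf_model *m)
{
    int s;
    if (!m->running[0]) return;  /* global stopped: nothing counts */
    m->time[0]++;
    for (s = 1; s < MODEL_SECTIONS; s++) {
        if (m->running[s]) m->time[s]++;
    }
}
```

In this model, ticks that occur before model_start_measuring or after model_stop_measuring change no counter, which mirrors why the section #0 time can serve as the denominator when computing each section's share of the run.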
Limitations
The performance counter reporting function, perf_print_formatted_report(), uses floating point numbers to display time in seconds and percentages. If "Small C Library" is selected in the Nios II IDE System Library Properties dialog for your project, these floating point values will not be displayed correctly at runtime. When the "Small C Library" is enabled, printf() does not support floating point numbers, which causes the incorrect display.
Clock-cycle and occurrence displays will continue to be printed correctly.
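If you need a time in seconds without floating point, one possible workaround is to do the conversion in integer arithmetic and print the whole and fractional parts separately. The following is a minimal sketch under stated assumptions: the names perf_fixed_seconds, perf_clocks_to_fixed_seconds, and perf_print_seconds are hypothetical, the whole-second part is assumed to fit in an unsigned long, and it is assumed (not verified here) that your C library's printf() accepts the %lu format.

```c
#include <stdio.h>

/* Hypothetical fixed-point result: seconds plus milliseconds. */
typedef struct {
    unsigned long whole;   /* whole seconds                 */
    unsigned long millis;  /* 0..999, truncated not rounded */
} perf_fixed_seconds;

/* Convert a clock-cycle count to seconds using only integer math. */
static perf_fixed_seconds perf_clocks_to_fixed_seconds(
    unsigned long long clocks, unsigned long clock_freq_hz)
{
    perf_fixed_seconds t = { 0UL, 0UL };
    unsigned long long rem;

    if (clock_freq_hz == 0UL) {
        return t;  /* avoid dividing by zero */
    }
    t.whole = (unsigned long)(clocks / clock_freq_hz);
    rem = clocks % clock_freq_hz;  /* leftover cycles, less than 1 second */
    t.millis = (unsigned long)((rem * 1000ULL) / clock_freq_hz);
    return t;
}

/* Print the time as "seconds.milliseconds" with no floating point. */
static void perf_print_seconds(unsigned long long clocks,
                               unsigned long clock_freq_hz)
{
    perf_fixed_seconds t = perf_clocks_to_fixed_seconds(clocks, clock_freq_hz);
    printf("%lu.%03lu", t.whole, t.millis);
}
```

Because the fraction is truncated rather than rounded, the last digit can differ by one from the value perf_print_formatted_report() would show.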