Benchmark Your Model
| Field | Value |
|---|---|
| Difficulty | Beginner |
| Estimated Read Time | 5-10 minutes |
| Labels | benchmark, synthetic, latency, throughput, power |
Chapters 001 and 002 showed how to run a model once and then how to drive it asynchronously. This chapter answers the next practical question: "How fast does this model run on the device?" The benchmark API is intentionally small. You load the model, choose how many samples to measure, call benchmark(...), and read the returned BenchmarkReport.
The benchmark uses the model's input_specs() to create deterministic synthetic inputs. That makes it useful as a quick smoke benchmark for a model and for comparing compiled model variants, but it is not an end-to-end camera benchmark. It does not include camera decode, real preprocessing variability, dynamic input sizes, or data-dependent postprocessing behavior.
Walkthrough
Load the model
Start with the same compiled .tar.gz archive used by the earlier model tutorials. No image is needed because the benchmark creates synthetic tensors from the model's declared input specs.
Construct simaai::neat::Model from the archive path.
simaai::neat::Model model(model_path);
Run the benchmark
Call benchmark(samples). The API warms up the async model runner, measures an async push/pull window, prints a summary to stdout, and returns the same headline values in a BenchmarkReport.
The sample count is the number of measured synthetic inputs. Use a larger number for steadier throughput and power numbers; use a smaller number when you only want a quick smoke check.
simaai::neat::BenchmarkReport report = model.benchmark(samples);
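The warm-up-then-measure idea described above can be illustrated with a generic timing loop. This is a minimal sketch, not the NEAT implementation: it is synchronous rather than an async push/pull window, it times an arbitrary callable with std::chrono, and TimingResult and time_workload are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical result type for this sketch; not simaai::neat::BenchmarkReport.
struct TimingResult {
  double latency_ms;  // mean wall-clock time per measured iteration
  double fps;         // measured iterations per second
};

// Runs `work` `warmup` times without timing, then times `samples` runs.
TimingResult time_workload(const std::function<void()>& work, int warmup, int samples) {
  assert(samples > 0);
  for (int i = 0; i < warmup; ++i) {
    work();  // unmeasured: lets caches, queues, and clocks settle
  }

  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();
  for (int i = 0; i < samples; ++i) {
    work();
  }
  const std::chrono::duration<double> elapsed = Clock::now() - start;

  const double seconds = elapsed.count();
  const double fps = seconds > 0.0 ? samples / seconds : 0.0;
  return TimingResult{seconds * 1000.0 / samples, fps};
}
```

In this synchronous loop, latency and throughput are simple reciprocals of each other. An async runner with several samples in flight does not have to satisfy that relationship, which is one reason to rely on the values reported by benchmark() rather than on a hand-rolled loop like this one.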
Read the report
The returned report keeps only the headline fields most users need: average end-to-end latency in milliseconds, throughput in frames per second, average board power in watts when available, and measured energy in joules when available.
Power telemetry depends on board support. If the runtime cannot sample power rails on the current target, the benchmark still reports latency and throughput and leaves the power fields at zero.
std::cout << "report_latency_ms=" << report.latency_ms << "\n";
std::cout << "report_fps=" << report.fps << "\n";
std::cout << "report_avg_power_watts=" << report.avg_power_watts << "\n";
std::cout << "report_energy_joules=" << report.energy_joules << "\n";
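Because the power fields stay at zero when telemetry is unavailable, a caller may want to avoid printing a misleading 0 W. The sketch below shows one way to do that. ReportFields is a stand-in that reuses the four field names from this chapter with assumed double types, and summarize is a hypothetical helper, not part of the NEAT API.

```cpp
#include <sstream>
#include <string>

// Stand-in for the four report fields used in this chapter.
// Assumption: the real BenchmarkReport fields are floating-point values.
struct ReportFields {
  double latency_ms;
  double fps;
  double avg_power_watts;
  double energy_joules;
};

// Formats a one-line summary and prints "n/a" when power telemetry is zero.
std::string summarize(const ReportFields& r) {
  std::ostringstream out;
  out << "latency_ms=" << r.latency_ms << " fps=" << r.fps;
  const bool has_power = r.avg_power_watts > 0.0;
  if (has_power) {
    out << " avg_power_watts=" << r.avg_power_watts
        << " energy_joules=" << r.energy_joules;
  } else {
    out << " avg_power_watts=n/a energy_joules=n/a";
  }
  return out.str();
}
```

With a real report, the same check can be applied directly to report.avg_power_watts before logging or storing the power numbers.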
Run
Run the tutorial and you should see the summary printed by benchmark(), followed by the same values printed from the returned report. Run the Python and C++ (prebuilt) commands from the Neat install root (the directory that contains share/ and lib/); run the build-from-source commands from the repo root.
C++ (prebuilt):
./lib/sima-neat/tutorials/tutorial_003_benchmark_your_model \
--model /tmp/resnet_50.tar.gz --samples 100
C++ (build from source):
./build.sh --target tutorial_003_benchmark_your_model
./build/tutorials-standalone/tutorial_003_benchmark_your_model \
--model /tmp/resnet_50.tar.gz --samples 100
Expected output (exact numbers depend on the model, board, and current load; the C++ build also prints the trailing [OK] line):
NEAT Benchmark
Input: synthetic
Samples: 100
Latency: 12.4 ms
FPS: 80.6
Power avg: 2.3 W
Energy: 2.8 J
report_latency_ms=12.4
report_fps=80.6
report_avg_power_watts=2.3
report_energy_joules=2.8
[OK] 003_benchmark_your_model
To integrate this chapter's C++ source into your own project with a custom CMakeLists.txt (no extras folder required), see How to Run Tutorials on the landing page.
In Practice
Use this benchmark when you want a quick answer for a compiled model archive: does it run, what is the measured async throughput, and what are the headline board power numbers on this target?
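When comparing two compiled variants, a couple of derived numbers can make the reports easier to read side by side: energy per measured sample and the throughput ratio between runs. The following is a minimal sketch under stated assumptions. It assumes energy_joules covers the measured window of samples, ReportFields is a stand-in for the report's four fields with assumed double types, and joules_per_sample and fps_speedup are hypothetical helpers.

```cpp
#include <cassert>

// Stand-in for the four report fields used in this chapter (types assumed).
struct ReportFields {
  double latency_ms;
  double fps;
  double avg_power_watts;
  double energy_joules;
};

// Energy per measured sample, or 0 when no energy reading is available.
double joules_per_sample(const ReportFields& r, int samples) {
  assert(samples > 0);
  return r.energy_joules > 0.0 ? r.energy_joules / samples : 0.0;
}

// Throughput of `candidate` relative to `baseline`; above 1 means faster.
// Returns 0 when the baseline has no usable throughput value.
double fps_speedup(const ReportFields& baseline, const ReportFields& candidate) {
  return baseline.fps > 0.0 ? candidate.fps / baseline.fps : 0.0;
}
```

Such comparisons are only meaningful when both runs use the same board, the same sample count, and a similar system load.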
For application performance, benchmark the real pipeline too. Synthetic model input is deliberately stable, so it does not represent camera jitter, codec cost, real preprocessing, host scheduling under load, or downstream application logic. For queue-depth and backpressure tuning with a hand-built async run, see Tune Throughput and Queue Depth.
Model::benchmark() requires concrete input_specs() dimensions. If an input shape is dynamic or non-concrete, the benchmark fails clearly instead of guessing a shape.
Full source
// Benchmark a compiled model with deterministic synthetic inputs.
//
// Usage:
//   tutorial_003_benchmark_your_model --model /path/to/model.tar.gz [--samples 100]
#include "neat.h"

#include <iostream>
#include <stdexcept>
#include <string>

namespace {

bool get_arg(int argc, char** argv, const std::string& key, std::string& out) {
  for (int i = 1; i + 1 < argc; ++i) {
    if (key == argv[i]) {
      out = argv[i + 1];
      return true;
    }
  }
  return false;
}

int parse_int_arg(int argc, char** argv, const std::string& key, int def) {
  std::string value;
  if (!get_arg(argc, argv, key, value))
    return def;
  return std::stoi(value);
}

} // namespace

int main(int argc, char** argv) {
  try {
    std::string model_path;
    if (!get_arg(argc, argv, "--model", model_path)) {
      std::cerr << "Usage: tutorial_003_benchmark_your_model --model <path> [--samples <n>]\n";
      return 1;
    }
    const int samples = parse_int_arg(argc, argv, "--samples", 100);

    // CORE LOGIC
    simaai::neat::Model model(model_path);
    simaai::neat::BenchmarkReport report = model.benchmark(samples);
    std::cout << "report_latency_ms=" << report.latency_ms << "\n";
    std::cout << "report_fps=" << report.fps << "\n";
    std::cout << "report_avg_power_watts=" << report.avg_power_watts << "\n";
    std::cout << "report_energy_joules=" << report.energy_joules << "\n";

    std::cout << "[OK] 003_benchmark_your_model\n";
    return 0;
  } catch (const std::exception& e) {
    std::cerr << "[FAIL] " << e.what() << "\n";
    return 1;
  }
}
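The tutorial's parse_int_arg relies on std::stoi, which accepts trailing characters (for example "10abc" parses as 10) and negative values. If you adapt the program, a stricter check on the sample count may be worthwhile. The sketch below is one possible approach. parse_sample_count is a hypothetical helper that follows the bool-plus-out-parameter style of get_arg, and the rule that a sample count must be strictly positive is an assumption made here, not a documented requirement of benchmark().

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of the tutorial source or the NEAT API.
// Parses `text` as a strictly positive int. Returns false and leaves `out`
// untouched if the text is empty, non-numeric, out of range, has trailing
// characters, or is zero or negative.
bool parse_sample_count(const std::string& text, int& out) {
  try {
    std::size_t consumed = 0;
    const int value = std::stoi(text, &consumed);
    if (consumed != text.size()) {
      return false;  // trailing characters such as "10abc"
    }
    if (value <= 0) {
      return false;  // assumed: a benchmark needs at least one sample
    }
    out = value;
    return true;
  } catch (const std::exception&) {
    return false;  // std::invalid_argument or std::out_of_range from std::stoi
  }
}
```

A caller could print the usage message and return a nonzero exit code when this function returns false, instead of passing an unexpected value on to the benchmark.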