Benchmark Your Model

Field	Value
Difficulty	Beginner
Estimated Read Time	5-10 minutes
Labels	`benchmark`, `synthetic`, `latency`, `throughput`, `power`

Chapters 001 and 002 showed how to run a model once and then how to drive it asynchronously. This chapter answers the next practical question: "How fast does this model run on the device?" The benchmark API is intentionally small. You load the model, choose how many samples to measure, call benchmark(...), and read the returned BenchmarkReport.

The benchmark uses the model's input_specs() to create deterministic synthetic inputs. That makes it useful for a quick smoke test of a model and for comparing compiled model variants, but it is not a camera benchmark: it does not include camera decode, real preprocessing variability, dynamic input sizes, or data-dependent postprocessing behavior.
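
To make "deterministic synthetic inputs" concrete, here is a minimal sketch that sizes a buffer from a declared shape and fills it with a fixed, repeatable pattern. The helper name make_synthetic_input, the byte element type, and the fill pattern are all assumptions for illustration; this chapter does not document how the Neat runtime actually generates its tensors.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds one synthetic input for a fully concrete shape.
// The fill pattern is an illustrative choice, not what the runtime uses.
std::vector<std::uint8_t> make_synthetic_input(const std::vector<std::size_t>& shape) {
  std::size_t count = 1;
  for (std::size_t dim : shape)
    count *= dim;  // element count is the product of all dimensions

  std::vector<std::uint8_t> data(count);
  for (std::size_t i = 0; i < data.size(); ++i)
    data[i] = static_cast<std::uint8_t>(i % 251);  // fixed pattern, no RNG state
  return data;
}
```

Calling make_synthetic_input({1, 224, 224, 3}) twice yields identical buffers, which is the property that makes run-to-run comparisons between model variants meaningful.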

Walkthrough

Load the model

Start with the same compiled .tar.gz archive used by the earlier model tutorials. No image is needed because the benchmark creates synthetic tensors from the model's declared input specs.

Construct simaai::neat::Model from the archive path.

tutorials/003_benchmark_your_model/benchmark_your_model.cpp
simaai::neat::Model model(model_path);

Run the benchmark

Call benchmark(samples). The API warms up the async model runner, measures a window of asynchronous push/pull calls, prints a summary to stdout, and returns the same headline values in a BenchmarkReport.

The sample count is the number of measured synthetic inputs. Use a larger number for steadier throughput and power numbers; use a smaller number when you only want a quick smoke check.
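
The relationship between sample count, latency, and throughput can be sketched as follows. This is not the runtime's implementation: WindowSummary and summarize_window are hypothetical names, and the sketch assumes average latency is the mean of per-sample latencies while throughput is the sample count divided by the wall-clock length of the measured window.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical summary of one measured window.
struct WindowSummary {
  double latency_ms;  // mean per-sample latency
  double fps;         // samples completed per second of wall-clock time
};

// Assumed definitions: latency is averaged per sample, throughput is
// computed over the whole window.
WindowSummary summarize_window(const std::vector<double>& latencies_ms,
                               double window_seconds) {
  if (latencies_ms.empty() || window_seconds <= 0.0)
    throw std::invalid_argument("need at least one sample and a positive window");
  const double total_ms =
      std::accumulate(latencies_ms.begin(), latencies_ms.end(), 0.0);
  const double n = static_cast<double>(latencies_ms.size());
  return {total_ms / n, n / window_seconds};
}
```

Under these definitions, each additional sample contributes less to the averages, which is why a longer run gives steadier numbers. When several samples are in flight at once, fps can also exceed 1000 / latency_ms.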

tutorials/003_benchmark_your_model/benchmark_your_model.cpp
simaai::neat::BenchmarkReport report = model.benchmark(samples);
if (report.latency_ms <= 0.0 || report.fps <= 0.0)
  throw std::runtime_error("benchmark produced no measured latency/fps");

Read the report

The returned report keeps only the headline fields most users need: average end-to-end latency in milliseconds, throughput in frames per second, average board power in watts when available, and measured energy in joules when available.

Power telemetry depends on board support. If the runtime cannot sample power rails on the current target, the benchmark still reports latency and throughput and leaves the power fields at zero.
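
Because unsupported targets leave the power fields at zero, code that derives further metrics should check for that case first. The sketch below assumes C++17 and uses a local ReportFields struct as a hypothetical mirror of the four headline fields; the real type is simaai::neat::BenchmarkReport. It also assumes energy_joules covers exactly the measured samples.

```cpp
#include <optional>

// Hypothetical mirror of the four headline fields named in this chapter.
struct ReportFields {
  double latency_ms = 0.0;
  double fps = 0.0;
  double avg_power_watts = 0.0;
  double energy_joules = 0.0;
};

// Returns joules per measured sample, or nothing when power telemetry was
// unavailable (fields left at zero) or the sample count is invalid.
std::optional<double> energy_per_sample(const ReportFields& r, int samples) {
  if (samples <= 0 || r.avg_power_watts <= 0.0 || r.energy_joules <= 0.0)
    return std::nullopt;
  return r.energy_joules / samples;
}
```

Returning an empty optional keeps a zero reading from being mistaken for a genuinely free inference when results are logged or compared.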

tutorials/003_benchmark_your_model/benchmark_your_model.cpp
std::cout << "report_latency_ms=" << report.latency_ms << "\n";
std::cout << "report_fps=" << report.fps << "\n";
std::cout << "report_avg_power_watts=" << report.avg_power_watts << "\n";
std::cout << "report_energy_joules=" << report.energy_joules << "\n";

Run

Running the tutorial prints the benchmark summary emitted by benchmark(), followed by the same values read from the returned report. Run the Python and C++ (prebuilt) commands from the Neat install root (the directory that contains share/ and lib/); run the build-from-source commands from the repo root.

C++ (prebuilt):

./lib/sima-neat/tutorials/tutorial_003_benchmark_your_model \
  --model /tmp/resnet_50.tar.gz --samples 100

C++ (build from source):

./build.sh --target tutorial_003_benchmark_your_model
./build/tutorials-standalone/tutorial_003_benchmark_your_model \
  --model /tmp/resnet_50.tar.gz --samples 100

Expected output (exact numbers depend on the model, board, and current load; the C++ build also prints the trailing [OK] line):

NEAT Benchmark
Input: synthetic
Samples: 100
Latency:      12.4 ms
FPS:          80.6
Power avg:    2.3 W
Energy:       2.8 J
report_latency_ms=12.4
report_fps=80.6
report_avg_power_watts=2.3
report_energy_joules=2.8
[OK] 003_benchmark_your_model
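
If you capture this output to compare model variants, the report_* lines are the easiest part to parse because they follow a plain key=value form. The following is a minimal sketch; parse_report_lines is a hypothetical helper, not part of Neat, and it relies only on the line format shown above.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Collects lines of the form report_<name>=<number> from captured output.
// Lines without that prefix, without '=', or with a non-numeric value are
// skipped.
std::map<std::string, double> parse_report_lines(const std::string& output) {
  std::map<std::string, double> values;
  std::istringstream stream(output);
  std::string line;
  while (std::getline(stream, line)) {
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos || line.rfind("report_", 0) != 0)
      continue;
    try {
      // Parse before inserting so a failed parse leaves no entry behind.
      const double value = std::stod(line.substr(eq + 1));
      values[line.substr(0, eq)] = value;
    } catch (const std::exception&) {
      // Not a number; ignore the line.
    }
  }
  return values;
}
```

Feeding it the sample output above would yield four entries keyed report_latency_ms, report_fps, report_avg_power_watts, and report_energy_joules.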

To integrate this chapter's C++ source into your own project with a custom CMakeLists.txt (no extras folder required), see How to Run Tutorials on the landing page.

In Practice

Use this benchmark when you want a quick answer for a compiled model archive: does it run, what is the measured async throughput, and what are the headline board power numbers on this target?

For application performance, benchmark the real pipeline too. Synthetic model input is deliberately stable, so it does not represent camera jitter, codec cost, real preprocessing, host scheduling under load, or downstream application logic. For queue-depth and backpressure tuning with a hand-built async run, see Tune Throughput and Queue Depth.

Model::benchmark() requires concrete input_specs() dimensions. If an input shape is dynamic or non-concrete, the benchmark fails clearly instead of guessing a shape.
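
If you want to check a shape before calling the benchmark, a pre-flight test along these lines may help. It is only a sketch: is_concrete_shape is a hypothetical helper, and it assumes that dynamic dimensions are encoded as zero or negative values (for example -1), which this chapter does not confirm for input_specs().

```cpp
#include <cstdint>
#include <vector>

// Returns true when every dimension is a known positive size.
// Assumption: dynamic dimensions are encoded as values <= 0 (for example -1).
bool is_concrete_shape(const std::vector<std::int64_t>& shape) {
  if (shape.empty())
    return false;  // treat a missing shape as non-concrete
  for (std::int64_t dim : shape) {
    if (dim <= 0)
      return false;
  }
  return true;
}
```

For example, is_concrete_shape({1, 224, 224, 3}) is true, while is_concrete_shape({-1, 224, 224, 3}) is false.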

Full source

tutorials/003_benchmark_your_model/benchmark_your_model.cpp
// Benchmark a compiled model with deterministic synthetic inputs.
//
// Usage:
//   tutorial_003_benchmark_your_model --model /path/to/model.tar.gz [--samples 100]

#include "neat.h"

#include <iostream>
#include <stdexcept>
#include <string>

namespace {

bool get_arg(int argc, char** argv, const std::string& key, std::string& out) {
  for (int i = 1; i + 1 < argc; ++i) {
    if (key == argv[i]) {
      out = argv[i + 1];
      return true;
    }
  }
  return false;
}

int parse_int_arg(int argc, char** argv, const std::string& key, int def) {
  std::string value;
  if (!get_arg(argc, argv, key, value))
    return def;
  return std::stoi(value);
}

} // namespace

int main(int argc, char** argv) {
  try {
    std::string model_path;
    if (!get_arg(argc, argv, "--model", model_path)) {
      std::cerr << "Usage: tutorial_003_benchmark_your_model --model <path> [--samples <n>]\n";
      return 1;
    }
    const int samples = parse_int_arg(argc, argv, "--samples", 100);

    // CORE LOGIC
    simaai::neat::Model model(model_path);

    simaai::neat::BenchmarkReport report = model.benchmark(samples);
    if (report.latency_ms <= 0.0 || report.fps <= 0.0)
      throw std::runtime_error("benchmark produced no measured latency/fps");

    std::cout << "report_latency_ms=" << report.latency_ms << "\n";
    std::cout << "report_fps=" << report.fps << "\n";
    std::cout << "report_avg_power_watts=" << report.avg_power_watts << "\n";
    std::cout << "report_energy_joules=" << report.energy_joules << "\n";

    std::cout << "[OK] 003_benchmark_your_model\n";
    return 0;
  } catch (const std::exception& e) {
    std::cerr << "[FAIL] " << e.what() << "\n";
    return 1;
  }
}
