
Compose GenAI into a Graph

Field                Value
Difficulty           Advanced
Estimated Read Time  20-25 minutes
Labels               genai, graph, composition, streaming, advanced

Most GenAI applications should start with direct model APIs. Graph composition becomes useful when GenAI needs to sit beside other Neat stages, named inputs, named outputs, routing, or application-level orchestration.

Walkthrough

Create a GenAI graph fragment

Create a task-specific model handle, configure graph-fragment options, and build a public Graph fragment.

The vision-language fragment exposes prompt, image, and use_cached_image inputs plus tokens, done, encoded, and error outputs. The speech transcriber fragment exposes audio and audio_path inputs plus tokens, done, and error outputs.
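
The endpoint names listed above can be written down as data, which makes it easy to catch a mistyped port name before calling push or pull. The following is a minimal sketch: FragmentEndpoints and the two factory functions are hypothetical helpers for illustration and are not part of the Neat API. Only the endpoint names themselves come from the description above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical descriptor (not a Neat type): the public endpoint names
// of one graph fragment, split into inputs and outputs.
struct FragmentEndpoints {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;

  bool has_input(const std::string& name) const {
    return std::find(inputs.begin(), inputs.end(), name) != inputs.end();
  }

  bool has_output(const std::string& name) const {
    return std::find(outputs.begin(), outputs.end(), name) != outputs.end();
  }
};

// Endpoint names of the vision-language fragment, as listed in the text.
FragmentEndpoints vision_language_endpoints() {
  return {{"prompt", "image", "use_cached_image"},
          {"tokens", "done", "encoded", "error"}};
}

// Endpoint names of the speech transcriber fragment, as listed in the text.
FragmentEndpoints speech_transcriber_endpoints() {
  return {{"audio", "audio_path"}, {"tokens", "done", "error"}};
}
```

For example, vision_language_endpoints().has_input("prompt") is true, while speech_transcriber_endpoints().has_output("encoded") is false, because only the vision-language fragment exposes an encoded output.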

tutorials/022_compose_genai_into_graph/compose_genai_into_graph.cpp
auto model = std::make_shared<genai::VisionLanguageModel>(args.model);

genai::VisionLanguageOptions options;
options.system_prompt = "You are concise.";
options.max_new_tokens = 96;
options.streaming = true;

simaai::neat::Graph genai_fragment =
    genai::graphs::VisionLanguage(model, options, "genai_stage");

Add the fragment to an app graph

Add the fragment to a larger application graph. The fragment keeps its public endpoint names, so application code can push and pull by name.

tutorials/022_compose_genai_into_graph/compose_genai_into_graph.cpp
simaai::neat::Graph app("genai_app");
app.add(genai_fragment);
std::cout << app.describe() << "\n";

Build and push a prompt

Build the graph into a Run, push a text sample to the prompt input, and let the GenAI stage produce tokens.

tutorials/022_compose_genai_into_graph/compose_genai_into_graph.cpp
simaai::neat::Run run = app.build();
if (!run.push("prompt", make_text_sample("prompt", "Explain what an API gateway does."))) {
  throw std::runtime_error("push(prompt) failed: " + run.last_error());
}

Pull tokens and completion metadata

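Pull from tokens until a done sample arrives. The loop checks done and error only when no token arrives within the pull timeout, and it is capped at 256 iterations so a stalled stage cannot block the program indefinitely. The done sample is a bundle with fields such as generated token count and finish reason; this tutorial receives it and discards it.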

tutorials/022_compose_genai_into_graph/compose_genai_into_graph.cpp
std::cout << "assistant: ";
for (int i = 0; i < 256; ++i) {
  if (auto token = run.pull("tokens", 250)) {
    std::cout << sample_text(*token) << std::flush;
    continue;
  }
  if (auto done = run.pull("done", 10)) {
    (void)done;
    break;
  }
  if (auto error = run.pull("error", 10)) {
    throw std::runtime_error(sample_text(*error));
  }
}
std::cout << "\n";
run.close();
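
The control flow of this loop can be exercised without a device or a model. The following is a minimal sketch under stated assumptions: MockRun, DrainResult, and drain are hypothetical names introduced here, not Neat types. MockRun stands in for the built Run with one in-memory queue per named output, returns immediately instead of waiting for the timeout, and carries the done payload as a plain string rather than a bundle.

```cpp
#include <deque>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a built Run: one FIFO queue per named output.
// Unlike a real pull, this never blocks; the timeout argument is ignored.
struct MockRun {
  std::map<std::string, std::deque<std::string>> outputs;

  std::optional<std::string> pull(const std::string& port, int /*timeout_ms*/) {
    auto it = outputs.find(port);
    if (it == outputs.end() || it->second.empty()) {
      return std::nullopt;
    }
    std::string value = std::move(it->second.front());
    it->second.pop_front();
    return value;
  }
};

struct DrainResult {
  std::string text;       // concatenated token text
  bool finished = false;  // true only if a done sample was pulled
  std::string done_info;  // payload of the done sample (simplified to text)
};

// Same priority order as the tutorial loop: tokens first, then done,
// then error. Gives up after max_iterations without reporting done.
DrainResult drain(MockRun& run, int max_iterations) {
  DrainResult result;
  for (int i = 0; i < max_iterations; ++i) {
    if (auto token = run.pull("tokens", 250)) {
      result.text += *token;
      continue;
    }
    if (auto done = run.pull("done", 10)) {
      result.finished = true;
      result.done_info = *done;
      break;
    }
    if (auto error = run.pull("error", 10)) {
      throw std::runtime_error(*error);
    }
  }
  return result;
}
```

Returning a finished flag lets the caller tell a completed answer from a loop that ran out of iterations, a case the tutorial loop exits silently. Assuming the second argument to pull is a timeout in milliseconds, an iteration that finds nothing waits up to 250 + 10 + 10 = 270 ms, so 256 idle iterations bound the tutorial loop at roughly 69 seconds.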

Run

First, download an LLM such as Qwen3 4B from Hugging Face using the LLiMa CLI:

llima pull Qwen3-4B-Instruct-2507-GPTQ-a16w4

Run the tutorial on Modalix with the deployed model directory:

C++ (prebuilt):

./lib/sima-neat/tutorials/tutorial_022_compose_genai_into_graph \
  --model /media/nvme/llima/models/Qwen3-4B-Instruct-2507-GPTQ-a16w4

C++ (build from source):

./build.sh --target tutorial_022_compose_genai_into_graph
./build/tutorials-standalone/tutorial_022_compose_genai_into_graph \
  --model /media/nvme/llima/models/Qwen3-4B-Instruct-2507-GPTQ-a16w4

The expected output is the graph description followed by a streamed answer pulled from the tokens output.

In Practice

Use this pattern when GenAI is part of a larger application graph. Keep direct GenAIModel, VisionLanguageModel, and ASRModel calls for simple request/response application code.

Full source

tutorials/022_compose_genai_into_graph/compose_genai_into_graph.cpp
#include "neat.h"

#include <filesystem>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

namespace genai = simaai::neat::genai;

struct Args {
  std::filesystem::path model;
};

Args parse_args(int argc, char** argv) {
  Args args;
  for (int i = 1; i < argc; ++i) {
    const std::string arg = argv[i];
    if (arg == "--model" && i + 1 < argc) {
      args.model = argv[++i];
    } else {
      throw std::runtime_error("usage: compose_genai_into_graph --model <llima_model_dir>");
    }
  }
  if (args.model.empty()) {
    throw std::runtime_error("missing required --model <llima_model_dir>");
  }
  return args;
}

simaai::neat::Sample make_text_sample(const std::string& port, const std::string& text) {
  return simaai::neat::make_tensor_sample(port, simaai::neat::Tensor::from_text(text));
}

std::string sample_text(const simaai::neat::Sample& sample) {
  if (sample.kind == simaai::neat::SampleKind::Tensor && sample.tensor.has_value()) {
    return sample.tensor->to_text();
  }
  if (sample.kind == simaai::neat::SampleKind::TensorSet && sample.tensors.size() == 1U) {
    return sample.tensors.front().to_text();
  }
  return {};
}

int main(int argc, char** argv) {
  try {
    const Args args = parse_args(argc, argv);

    auto model = std::make_shared<genai::VisionLanguageModel>(args.model);

    genai::VisionLanguageOptions options;
    options.system_prompt = "You are concise.";
    options.max_new_tokens = 96;
    options.streaming = true;

    simaai::neat::Graph genai_fragment =
        genai::graphs::VisionLanguage(model, options, "genai_stage");

    simaai::neat::Graph app("genai_app");
    app.add(genai_fragment);
    std::cout << app.describe() << "\n";

    simaai::neat::Run run = app.build();
    if (!run.push("prompt", make_text_sample("prompt", "Explain what an API gateway does."))) {
      throw std::runtime_error("push(prompt) failed: " + run.last_error());
    }

    std::cout << "assistant: ";
    for (int i = 0; i < 256; ++i) {
      if (auto token = run.pull("tokens", 250)) {
        std::cout << sample_text(*token) << std::flush;
        continue;
      }
      if (auto done = run.pull("done", 10)) {
        (void)done;
        break;
      }
      if (auto error = run.pull("error", 10)) {
        throw std::runtime_error(sample_text(*error));
      }
    }
    std::cout << "\n";
    run.close();

    return 0;
  } catch (const std::exception& e) {
    std::cerr << "error: " << e.what() << "\n";
    return 1;
  }
}
