Compose GenAI into a Graph
| Field | Value |
|---|---|
| Difficulty | Advanced |
| Estimated Read Time | 20-25 minutes |
| Labels | genai, graph, composition, streaming, advanced |
Most GenAI applications should start with direct model APIs. Graph composition becomes useful when GenAI needs to sit beside other Neat stages, named inputs, named outputs, routing, or application-level orchestration.
Walkthrough
Create a GenAI graph fragment
Create a task-specific model handle, configure graph-fragment options, and build a public Graph fragment.
The vision-language fragment exposes prompt, image, and use_cached_image inputs plus tokens, done, encoded, and error outputs. The speech transcriber fragment exposes audio and audio_path inputs plus tokens, done, and error outputs.
auto model = std::make_shared<genai::VisionLanguageModel>(args.model);
genai::VisionLanguageOptions options;
options.system_prompt = "You are concise.";
options.max_new_tokens = 96;
options.streaming = true;
simaai::neat::Graph genai_fragment =
    genai::graphs::VisionLanguage(model, options, "genai_stage");
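Because the fragment is addressed by endpoint name, a mistyped name is an easy mistake to make. The sketch below records the endpoint names listed above as constants and checks a name against them before use. It is standard C++ only; `has_port` and the `k...` tables are hypothetical helpers for illustration, not part of the Neat API.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Endpoint names as listed in this tutorial for the vision-language fragment.
inline constexpr std::array<std::string_view, 3> kVlmInputs = {
    "prompt", "image", "use_cached_image"};
inline constexpr std::array<std::string_view, 4> kVlmOutputs = {
    "tokens", "done", "encoded", "error"};

// Endpoint names as listed for the speech transcriber fragment.
inline constexpr std::array<std::string_view, 2> kAsrInputs = {
    "audio", "audio_path"};
inline constexpr std::array<std::string_view, 3> kAsrOutputs = {
    "tokens", "done", "error"};

// Hypothetical helper (not a Neat function): true when `name` is in `ports`.
template <std::size_t N>
bool has_port(const std::array<std::string_view, N>& ports,
              std::string_view name) {
  return std::find(ports.begin(), ports.end(), name) != ports.end();
}
```

A call such as `has_port(kVlmInputs, "prompt")` could guard a push or pull so that a typo fails with a clear message instead of a generic runtime error.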
Add the fragment to an app graph
Add the fragment to a larger application graph. The fragment keeps its public endpoint names, so application code can push and pull by name.
simaai::neat::Graph app("genai_app");
app.add(genai_fragment);
std::cout << app.describe() << "\n";
Build and push a prompt
Build the graph into a Run, push a text sample to the prompt input, and let the GenAI stage produce tokens.
simaai::neat::Run run = app.build();
if (!run.push("prompt", make_text_sample("prompt", "Explain what an API gateway does."))) {
  throw std::runtime_error("push(prompt) failed: " + run.last_error());
}
Pull tokens and completion metadata
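Pull from tokens until a done sample arrives. The done sample is a bundle with fields such as the generated token count and the finish reason; this example discards it. The loop also polls the error output and throws if an error sample arrives.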
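std::cout << "assistant: ";
// Bounded poll: stop after 256 passes even if no done sample arrives.
for (int i = 0; i < 256; ++i) {
  // Prefer tokens so the stream is drained before done is checked.
  if (auto token = run.pull("tokens", 250)) {
    std::cout << sample_text(*token) << std::flush;
    continue;
  }
  // No token this pass: a done sample ends the loop.
  if (auto done = run.pull("done", 10)) {
    (void)done;
    break;
  }
  // Surface a reported error as an exception.
  if (auto error = run.pull("error", 10)) {
    throw std::runtime_error(sample_text(*error));
  }
}
std::cout << "\n";
run.close();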
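The polling order in the loop above (tokens first, then done, then error) can be tried without a device. The sketch below reproduces it against a small in-memory stand-in. `MockRun` and `collect_answer` are hypothetical names, not Neat types, and the mock returns immediately instead of waiting for a timeout. Unlike the tutorial loop, which simply stops when its iteration budget runs out, this sketch throws in that case so a missing done sample is visible.

```cpp
#include <deque>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a built run: one FIFO queue per named output.
// It is NOT the Neat Run class; it only mimics the pull-by-name shape.
struct MockRun {
  std::map<std::string, std::deque<std::string>> outputs;

  // Returns the next queued sample on `port`, or nullopt when none is queued.
  // The timeout argument is ignored; the mock never blocks.
  std::optional<std::string> pull(const std::string& port, int /*timeout*/) {
    auto it = outputs.find(port);
    if (it == outputs.end() || it->second.empty()) return std::nullopt;
    std::string sample = std::move(it->second.front());
    it->second.pop_front();
    return sample;
  }
};

// Concatenates token samples until a done sample arrives.
// Throws on an error sample or when max_polls passes see no done sample.
template <typename RunLike>
std::string collect_answer(RunLike& run, int max_polls) {
  std::string answer;
  for (int i = 0; i < max_polls; ++i) {
    if (auto token = run.pull("tokens", 250)) {
      answer += *token;
      continue;  // keep draining tokens before looking at done
    }
    if (run.pull("done", 10)) return answer;
    if (auto error = run.pull("error", 10)) throw std::runtime_error(*error);
  }
  throw std::runtime_error("no done sample within the poll budget");
}
```

Checking tokens before done means any tokens still queued when the done sample is posted are read first, so the collected answer is not truncated.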
Run
First, download an LLM such as Qwen3 4B from Hugging Face using the LLiMa CLI:
llima pull Qwen3-4B-Instruct-2507-GPTQ-a16w4
Run the tutorial on Modalix with the deployed model directory:
C++ (prebuilt):
./lib/sima-neat/tutorials/tutorial_022_compose_genai_into_graph \
  --model /media/nvme/llima/models/Qwen3-4B-Instruct-2507-GPTQ-a16w4
C++ (build from source):
./build.sh --target tutorial_022_compose_genai_into_graph
./build/tutorials-standalone/tutorial_022_compose_genai_into_graph \
  --model /media/nvme/llima/models/Qwen3-4B-Instruct-2507-GPTQ-a16w4
The expected output is the graph description followed by a streamed answer pulled from the tokens output.
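A mistyped --model path is a likely cause of a failed run with the commands above. A pre-flight check could report it before the model is constructed. The sketch below uses only std::filesystem and assumes, as the usage text suggests, that --model names a directory; `require_model_dir` is a hypothetical helper, not part of the tutorial source.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical helper: returns `dir` unchanged when it is an existing
// directory, otherwise throws with a message that names the bad path.
inline std::filesystem::path require_model_dir(const std::filesystem::path& dir) {
  std::error_code ec;  // non-throwing overload: filesystem errors count as "not a directory"
  if (!std::filesystem::is_directory(dir, ec)) {
    throw std::runtime_error("model directory not found: " + dir.string());
  }
  return dir;
}
```

In the tutorial's main function, the parsed path could be passed through this check before it reaches the model constructor, so the existing catch block prints the offending path.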
In Practice
Use this pattern when GenAI is part of a larger application graph. Keep direct GenAIModel, VisionLanguageModel, and ASRModel calls for simple request/response application code.
Full source
#include "neat.h"

#include <filesystem>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

namespace genai = simaai::neat::genai;

struct Args {
  std::filesystem::path model;
};

Args parse_args(int argc, char** argv) {
  Args args;
  for (int i = 1; i < argc; ++i) {
    const std::string arg = argv[i];
    if (arg == "--model" && i + 1 < argc) {
      args.model = argv[++i];
    } else {
      throw std::runtime_error("usage: compose_genai_into_graph --model <llima_model_dir>");
    }
  }
  if (args.model.empty()) {
    throw std::runtime_error("missing required --model <llima_model_dir>");
  }
  return args;
}

simaai::neat::Sample make_text_sample(const std::string& port, const std::string& text) {
  return simaai::neat::make_tensor_sample(port, simaai::neat::Tensor::from_text(text));
}

std::string sample_text(const simaai::neat::Sample& sample) {
  if (sample.kind == simaai::neat::SampleKind::Tensor && sample.tensor.has_value()) {
    return sample.tensor->to_text();
  }
  if (sample.kind == simaai::neat::SampleKind::TensorSet && sample.tensors.size() == 1U) {
    return sample.tensors.front().to_text();
  }
  return {};
}

int main(int argc, char** argv) {
  try {
    const Args args = parse_args(argc, argv);

    auto model = std::make_shared<genai::VisionLanguageModel>(args.model);
    genai::VisionLanguageOptions options;
    options.system_prompt = "You are concise.";
    options.max_new_tokens = 96;
    options.streaming = true;
    simaai::neat::Graph genai_fragment =
        genai::graphs::VisionLanguage(model, options, "genai_stage");

    simaai::neat::Graph app("genai_app");
    app.add(genai_fragment);
    std::cout << app.describe() << "\n";

    simaai::neat::Run run = app.build();
    if (!run.push("prompt", make_text_sample("prompt", "Explain what an API gateway does."))) {
      throw std::runtime_error("push(prompt) failed: " + run.last_error());
    }

    std::cout << "assistant: ";
    for (int i = 0; i < 256; ++i) {
      if (auto token = run.pull("tokens", 250)) {
        std::cout << sample_text(*token) << std::flush;
        continue;
      }
      if (auto done = run.pull("done", 10)) {
        (void)done;
        break;
      }
      if (auto error = run.pull("error", 10)) {
        throw std::runtime_error(sample_text(*error));
      }
    }
    std::cout << "\n";
    run.close();
    return 0;
  } catch (const std::exception& e) {
    std::cerr << "error: " << e.what() << "\n";
    return 1;
  }
}