Search Results

31 total results found

Secure Software Development

The following pages and documents cover Secure Software Development, including the Secure Software Development Lifecycle (SSDLC), organizational policy and compliance requirements, secure coding standards, threat modeling, security testing and validation, training and ...

On the Edge: Agentic AI for Neural Processors

A practical guide to building intelligent agents optimized for NPU hardware. Learn how to design, implement, and deploy agentic systems that leverage neural processors for edge computing, with real-world patterns, performance optimization techniques, and produ...

Tags: topic: agentic-programming, topic: npu, topic: edge-computing, difficulty: intermediate

Foundations of NPU-Optimized Agents

NPU architecture and computational constraints. Model quantization and optimization for NPU deployment. Latency profiles and throughput optimization. Hardware-aware agent design patterns.

Agent State & Decision-Making on Constrained Hardware

Managing agent context and memory within NPU limits. Efficient reasoning loops for low-latency inference. Token budget strategies and context windowing. Caching and KV optimization for repeated queries.

Tool Use & Integration Patterns

Designing lightweight tools for NPU-based agents. Async I/O and non-blocking integrations. Local vs. remote tool execution trade-offs. Building tool abstractions that respect hardware constraints.

Production Deployment & Observability

Model serving architectures (ONNX, TensorRT, TVM). Monitoring latency, throughput, and reliability. A/B testing and progressive rollout strategies. Cost optimization and resource allocation.

Real-World Case Studies & Best Practices

Building customer-facing NPU agents (chatbots, assistants). Batch vs. streaming inference strategies. Handling fallbacks and graceful degradation. Lessons learned and anti-patterns to avoid.

1.1 Understanding NPU Architecture

Before talking about agents on NPUs, we need to talk about the NPU itself — what makes it a distinct class of accelerator, and why the architectural choices ripple all the way up to how you design an agent loop. This book uses Intel Core NPU as its primary anc...

1.2 Computational Constraints & Model Optimization

The architecture from Section 1.1 sets the rules. This section is about playing inside them: what an Intel NPU will and won't accept, how to shape a model so it compiles, and how to quantize without quietly losing the quality you paid for in training. We ancho...

1.3 Latency, Throughput, and Hardware-Aware Patterns

The architecture and constraints from Sections 1.1 and 1.2 set the ceiling. This section is about measuring it: what does a real model's latency profile look like on Intel hardware, how does that latency break down, and what does that imply for the agent loop ...

2.1 Context Windows and the Memory Wall

The agent's state — what it remembers from past steps and what it uses to make the next decision — is the bridge between hardware constraints and agent behavior. This section is about the memory wall: why it exists, what it means in numbers, and how to budget ...

2.2 KV Cache Engineering: Reuse, Eviction, and Prefix Sharing

The distinction between KV cache (what you keep in memory) and KV cache bandwidth (what you stream per token) is subtle and worth being precise about, because it sets the operational window for what an agent can do in real time. This section descends into the ...

2.3 Reasoning Loops Under Constraint

Chapter 2 closes here. We have a model that fits, weights we can stream, KV state we can manage, and decode at roughly 6–20 tok/s. The question this section answers: given that decode budget, what reasoning architectures actually work? The naive answer — bolt ...

3.1 Designing Tools for NPU-Bound Agents

Chapter 2 ended with a claim: tool selection is a decision problem, not a search. This chapter goes further. The tools themselves — what they do, where they run, how they're shaped — are part of agent architecture, not separate from it. Get the tool design rig...

3.2 Local-NPU vs Cloud Tools: A Real Trade-Off Table

If the tool runs locally on the NPU, the orchestrator pays a one-time compile cost and then has predictable, private, offline-capable inference. If the tool runs in the cloud, the orchestrator pays per-call network latency and per-token API fees but gets large...

3.3 Multi-Device Orchestration on a Single SoC

A Core Ultra SoC isn't one engine — it's three. CPU cores for general-purpose work, an integrated GPU for parallel compute and graphics, and the NPU for low-power neural inference. An agent that uses only one of them is leaving capacity on the table. An agent ...

4.1 Serving NPU Models with OVMS

A development-time compile_model(...) call is not a production deployment. Once your agent is real, it needs to survive process restarts, model updates, multiple concurrent clients, health checks, and the operations team. This section is about how to actually ...

4.2 Telemetry: What Works, What Doesn't, and What's Missing

You can't operate what you can't observe. NPU agents have a harder observability story than CPU- or GPU-bound workloads — partly because the hardware is newer, partly because vendor tooling lags, partly because some of the telemetry you'd expect simply isn't e...