1.1 Understanding NPU Architecture

On the Edge: Agentic AI for Neural Proc... Foundations of NPU-Optimized Agents

Before talking about agents on NPUs, we need to talk about the NPU itself — what makes it a distinct class of accelerator, and why the architectural choices ripple all the way up to how you design an agent loop. This book uses Intel Core NPU as its primary anc...

1.2 Computational Constraints & Model Optimization

The architecture from Chapter 1.1 sets the rules. This section is about playing inside them: what an Intel NPU will and won't accept, how to shape a model so it compiles, and how to quantize without quietly losing the quality you paid for in training. We ancho...

1.3 Latency, Throughput, and Hardware-Aware Patterns

The architecture and constraints from Chapters 1.1 and 1.2 set the ceiling. This section is about measuring it: what does a real model's latency profile look like on Intel hardware, how does that latency break down, and what does that imply for the agent loop ...

2.1 Context Windows and the Memory Wall

On the Edge: Agentic AI for Neural Proc... Agent State & Decision-Making on Constr...

The agent's state — what it remembers from past steps and what it uses to make the next decision — is the bridge between hardware constraints and agent behavior. This section is about the memory wall: why it exists, what it means in numbers, and how to budget ...

2.2 KV Cache Engineering: Reuse, Eviction, and Prefix Sharing

The distinction between KV cache (what you keep in memory) and KV cache bandwidth (what you stream per token) is subtle and worth being precise about, because it sets the operational window for what an agent can do in real time. This section descends into the ...
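
A back-of-envelope sketch of that distinction follows. It assumes a hypothetical decoder with 32 layers, 8 KV heads, a head dimension of 128, and an FP16 cache; none of these figures describe a specific model from this book. The footprint is what the cache occupies for the whole context. The bandwidth is roughly that footprint re-read on every decode step, because each new token attends over every cached position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVConfig:
    """Hypothetical decoder shape; substitute your model's real config values."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    bytes_per_elem: int  # 2 for FP16/BF16, 1 for an 8-bit KV cache


def kv_bytes_per_token(cfg: KVConfig) -> int:
    # Factor of 2: one key vector and one value vector per layer per KV head.
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def kv_footprint_bytes(cfg: KVConfig, context_len: int) -> int:
    # Resident memory: what you keep for the whole context.
    return kv_bytes_per_token(cfg) * context_len


def kv_read_bandwidth(cfg: KVConfig, context_len: int, tokens_per_s: float) -> float:
    # Each decode step streams roughly the full cache once, so bytes/s is the
    # footprint times the decode rate (KV traffic only; weights excluded).
    return kv_footprint_bytes(cfg, context_len) * tokens_per_s


if __name__ == "__main__":
    cfg = KVConfig(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2)
    MiB, GiB = 1024 ** 2, 1024 ** 3
    print(kv_bytes_per_token(cfg) // 1024, "KiB per token")
    print(kv_footprint_bytes(cfg, 4096) // MiB, "MiB resident at 4096 tokens")
    print(kv_read_bandwidth(cfg, 4096, 20.0) / GiB, "GiB/s of KV reads at 20 tok/s")
```

With those assumed shapes the sketch reports 128 KiB per token, 512 MiB resident at 4,096 tokens, and about 10 GiB/s of KV reads at 20 tok/s, before counting the weights that are also streamed on each step.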

2.3 Reasoning Loops Under Constraint

Chapter 2 closes here. We have a model that fits, weights we can stream, KV state we can manage, and decode at roughly 6–20 tok/s. The question this section answers: given that decode budget, what reasoning architectures actually work? The naive answer — bolt ...
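
To put that decode budget in wall-clock terms, here is a minimal sketch that treats each reasoning step as an optional prefill cost plus generated tokens divided by the decode rate. The five-step loop and 150 tokens per step are hypothetical, and tool execution time is ignored.

```python
def step_latency_s(output_tokens: int, decode_tok_s: float, prefill_s: float = 0.0) -> float:
    """Wall-clock seconds for one step: prefill (if known) plus token-by-token decode."""
    if decode_tok_s <= 0:
        raise ValueError("decode rate must be positive")
    return prefill_s + output_tokens / decode_tok_s


def loop_latency_s(steps: int, output_tokens_per_step: int, decode_tok_s: float,
                   prefill_s_per_step: float = 0.0) -> float:
    """Total seconds for a loop of identical steps (a simplifying assumption)."""
    return steps * step_latency_s(output_tokens_per_step, decode_tok_s, prefill_s_per_step)


if __name__ == "__main__":
    # Bracket the roughly 6-20 tok/s decode range with a hypothetical 5-step loop.
    for rate in (6.0, 20.0):
        total = loop_latency_s(steps=5, output_tokens_per_step=150, decode_tok_s=rate)
        print(f"{rate:>4.0f} tok/s -> {total:.1f} s for 5 steps x 150 tokens")
```

At 6 tok/s this loop takes 125 s; at 20 tok/s, 37.5 s. Even at the fast end the user waits over half a minute, so the number and length of steps matter as much as the raw decode rate.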

3.1 Designing Tools for NPU-Bound Agents

On the Edge: Agentic AI for Neural Proc... Tool Use & Integration Patterns

Chapter 2 ended with a claim: tool selection is a decision problem, not a search. This chapter goes further. The tools themselves — what they do, where they run, how they're shaped — are part of agent architecture, not separate from it. Get the tool design rig...

3.2 Local-NPU vs Cloud Tools: A Real Trade-Off Table

If the tool runs locally on the NPU, the orchestrator pays a one-time compile cost and then has predictable, private, offline-capable inference. If the tool runs in the cloud, the orchestrator pays per-call network latency and per-token API fees but gets large...
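
The latency half of that trade-off can be sketched as a break-even call count. This is a minimal model under stated assumptions: a fixed one-time compile cost locally, a fixed per-call latency on each side, and a flat per-token price for the cloud. The example numbers are hypothetical, and the sketch ignores privacy, offline availability, and model capability, which a simple latency model cannot price.

```python
import math
from typing import Optional


def local_total_s(n_calls: int, compile_s: float, per_call_s: float) -> float:
    # One-time compile, then steady per-call inference on the NPU.
    return compile_s + n_calls * per_call_s


def cloud_total_s(n_calls: int, network_rtt_s: float, per_call_s: float) -> float:
    # No compile step, but every call pays a network round trip.
    return n_calls * (network_rtt_s + per_call_s)


def cloud_total_fee(n_calls: int, tokens_per_call: int, price_per_1k_tokens: float) -> float:
    # Flat per-token pricing is an assumption; real APIs may price input and output differently.
    return n_calls * tokens_per_call / 1000 * price_per_1k_tokens


def breakeven_calls(compile_s: float, local_per_call_s: float,
                    cloud_per_call_s: float) -> Optional[int]:
    """Smallest call count at which cumulative local time <= cumulative cloud time.

    cloud_per_call_s should include the network round trip. Returns None when the
    local path is never faster per call, so the compile cost is never amortized.
    """
    saving = cloud_per_call_s - local_per_call_s
    if saving <= 0:
        return None
    return math.ceil(compile_s / saving)


if __name__ == "__main__":
    # Hypothetical: 30 s compile, 1.0 s per local call, 0.5 s RTT + 1.0 s per cloud call.
    n = breakeven_calls(compile_s=30.0, local_per_call_s=1.0, cloud_per_call_s=1.5)
    print("break-even after", n, "calls")
    print(local_total_s(n, 30.0, 1.0), cloud_total_s(n, 0.5, 1.0))
    print(f"cloud fee at 500 tokens/call, $0.002/1k: ${cloud_total_fee(n, 500, 0.002):.2f}")
```

With these assumed numbers the two paths tie at 60 calls (90 s each). Past that point every call widens the local lead, while the cloud fee keeps growing linearly with call count.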

3.3 Multi-Device Orchestration on a Single SoC

A Core Ultra SoC isn't one engine — it's three. CPU cores for general-purpose work, an integrated GPU for parallel compute and graphics, and the NPU for low-power neural inference. An agent that uses only one of them is leaving capacity on the table. An agent ...

4.1 Serving NPU Models with OVMS

On the Edge: Agentic AI for Neural Proc... Production Deployment & Observability

A development-time compile_model(...) call is not a production deployment. Once your agent is real, it needs to survive process restarts, model updates, multiple concurrent clients, health checks, and the operations team. This section is about how to actually ...

4.2 Telemetry: What Works, What Doesn't, and What's Missing

You can't operate what you can't observe. NPU agents have a harder observability story than CPU- or GPU-bound workloads — partly because the hardware is newer, partly because vendor tooling lags, partly because some of the telemetry you'd expect simply isn't e...

4.3 A/B Testing, Canaries, and Hotswaps

Models drift. Drivers update. Quantization schemes change. The NPU you tested against in February is not the NPU your users have in November. Shipping an NPU-resident agent is not a one-time event — it's a continuous negotiation between your release process an...

5.1 What's Actually Shipping on Intel NPUs

On the Edge: Agentic AI for Neural Proc... Real-World Case Studies & Best Practices

The most useful thing a book like this can do, in its closing chapter, is be honest about what is really deployed on NPU hardware today versus what is announced, planned, or aspirational. The gap matters. If you build your roadmap on press releases, you'll dis...

5.2 A Worked Agentic Translation Assistant

This section ties the book together by walking through an end-to-end agentic translation assistant. The goal isn't a polished product — it's to show how the patterns from Chapters 1–4 combine in real code, what the latency budget looks like in practice, and wh...

5.3 Anti-Patterns and Lessons

We've covered foundations, state, tools, deployment, and case studies. This final section pulls together the failure modes that recur in real NPU deployments — the things that look like they should work but don't — and the durable lessons distilled from the pu...

Glossary

On the Edge: Agentic AI for Neural Proc... Appendices

The book uses vocabulary from three communities that don't always agree on terms: Intel NPU hardware, OpenVINO/Hugging Face software, and the agent-design literature. Definitions here are tuned to how the book uses each term, not to general usage. Entries are ...

References

These are the primary sources for the technical claims in the book. Where multiple sources existed for the same fact, the most authoritative (vendor docs first, then peer-reviewed papers, then independent measurement) was used. Sources marked † are referenced ...

Preface

This book is about a narrow, awkward, increasingly important corner of applied AI: building agents that run on the Neural Processing Unit of a consumer-grade laptop. Specifically, on Intel Core Ultra hardware, using OpenVINO, with one eye on the production dep...