Search Results
4.3 A/B Testing, Canaries, and Hotswaps
Models drift. Drivers update. Quantization schemes change. The NPU you tested against in February is not the NPU your users have in November. Shipping an NPU-resident agent is not a one-time event — it's a continuous negotiation between your release process an...
5.1 What's Actually Shipping on Intel NPUs
The most useful thing a book like this can do, in its closing chapter, is be honest about what is really deployed on NPU hardware today versus what is announced, planned, or aspirational. The gap matters. If you build your roadmap on press releases, you'll dis...
5.2 A Worked Agentic Translation Assistant
This section ties the book together by walking through an end-to-end agentic translation assistant. The goal isn't a polished product — it's to show how the patterns from Chapters 1–4 combine in real code, what the latency budget looks like in practice, and wh...
5.3 Anti-Patterns and Lessons
We've covered foundations, state, tools, deployment, and case studies. This final section pulls together the failure modes that recur in real NPU deployments — the things that look like they should work but don't — and the durable lessons distilled from the pu...
Appendices
Glossary of terms and consolidated source references for the book.
Glossary
The book uses vocabulary from three communities that don't always agree on terms: Intel NPU hardware, OpenVINO/Hugging Face software, and the agent-design literature. Definitions here are tuned to how the book uses each term, not to general usage. Entries are ...
References
These are the primary sources for the technical claims in the book. Where multiple sources existed for the same fact, the most authoritative (vendor docs first, then peer-reviewed papers, then independent measurement) was used. Sources marked † are referenced ...
Preface
This book is about a narrow, awkward, increasingly important corner of applied AI: building agents that run on the Neural Processing Unit of a consumer-grade laptop. Specifically, on Intel Core Ultra hardware, using OpenVINO, with one eye on the production dep...
1.4 The Accuracy Cost of Quantization
Chapter 1.2 laid out the quantization recipes Intel NPU supports: INT8-sym, INT4-sym group-128 or channel-wise, NF4 on Lunar Lake, FP8 on Panther Lake. The hardware story ended there. This section is the missing other half — what those recipes actually cost yo...
1.5 Speculative Decoding
Chapter 1.3 established the bandwidth ceiling as the binding constraint on LLM decode: 136.5 GB/s shared LPDDR5X, ~25 GB/s effective NPU quota, ~6–20 tok/s sustained throughput for 3B–8B INT4 models. The natural follow-up question is whether there's any way ar...
3.4 Structured Outputs and Constrained Decoding
An agent is only as reliable as the parser that reads its output. Chapter 3.1 covered designing the tools; Chapter 3.2 weighed local against cloud; Chapter 3.3 routed work across devices on the SoC. This section closes the loop on the agent-tool contract: how ...
4.4 Security and Privacy on the Edge
"It runs on the device, so it's private" is the marketing line. It's also a half-truth that has caused real production incidents. Chapters 4.1 through 4.3 covered the deployment, observability, and rollout machinery; this section is about the threat model that ...
Onzichtbare Meesters
A philosophical novella about passion, legacy, and the craft that persists beneath the surface of the software industry in 2030. Yael Verheul, lead maintainer of the open-source project Conduit, is named in the will of a retired Rij...