Recently Updated Pages

4.4 Security and Privacy on the Edge

On the Edge: Agentic AI for Neural Proc... Production Deployment & Observability

"It runs on the device, so it's private" is the marketing line. It's also a half-truth that has c...

Updated 1 month ago by Admin

3.4 Structured Outputs and Constrained Decoding

On the Edge: Agentic AI for Neural Proc... Tool Use & Integration Patterns

An agent is only as reliable as the parser that reads its output. Chapter 3.1 covered designing t...

Updated 1 month ago by Admin

1.5 Speculative Decoding

On the Edge: Agentic AI for Neural Proc... Foundations of NPU-Optimized Agents

Chapter 1.3 established the bandwidth ceiling as the binding constraint on LLM decode: 136.5 GB/s...

Updated 1 month ago by Admin

1.4 The Accuracy Cost of Quantization

On the Edge: Agentic AI for Neural Proc... Foundations of NPU-Optimized Agents

Chapter 1.2 laid out the quantization recipes Intel NPU supports: INT8-sym, INT4-sym group-128 or...

Updated 1 month ago by Admin

Preface

On the Edge: Agentic AI for Neural Proc...

This book is about a narrow, awkward, increasingly important corner of applied AI: building agent...

Updated 1 month ago by Admin

References

On the Edge: Agentic AI for Neural Proc... Appendices

These are the primary sources for the technical claims in the book. Where multiple sources existe...

Updated 1 month ago by Admin

Glossary

On the Edge: Agentic AI for Neural Proc... Appendices

The book uses vocabulary from three communities that don't always agree on terms: Intel NPU hardw...

Updated 1 month ago by Admin

2.3 Reasoning Loops Under Constraint

On the Edge: Agentic AI for Neural Proc... Agent State & Decision-Making on Constr...

Chapter 2 closes here. We have a model that fits, weights we can stream, KV state we can manage, ...

Updated 1 month ago by Admin

2.2 KV Cache Engineering: Reuse, Eviction, and Prefix Sharing

On the Edge: Agentic AI for Neural Proc... Agent State & Decision-Making on Constr...

The distinction between KV cache (what you keep in memory) and KV cache bandwidth (what you strea...

Updated 1 month ago by Admin

2.1 Context Windows and the Memory Wall

On the Edge: Agentic AI for Neural Proc... Agent State & Decision-Making on Constr...

The agent's state — what it remembers from past steps and what it uses to make the next decision ...

Updated 1 month ago by Admin

1.3 Latency, Throughput, and Hardware-Aware Patterns

On the Edge: Agentic AI for Neural Proc... Foundations of NPU-Optimized Agents

The architecture and constraints from Chapters 1.1 and 1.2 set the ceiling. This section is about...

Updated 1 month ago by Admin

1.2 Computational Constraints & Model Optimization

On the Edge: Agentic AI for Neural Proc... Foundations of NPU-Optimized Agents

The architecture from Chapter 1.1 sets the rules. This section is about playing inside them: what...

Updated 1 month ago by Admin

1.1 Understanding NPU Architecture

On the Edge: Agentic AI for Neural Proc... Foundations of NPU-Optimized Agents

Before talking about agents on NPUs, we need to talk about the NPU itself — what makes it a disti...

Updated 1 month ago by Admin

5.3 Anti-Patterns and Lessons

On the Edge: Agentic AI for Neural Proc... Real-World Case Studies & Best Practices

We've covered foundations, state, tools, deployment, and case studies. This final section pulls t...

Updated 1 month ago by Admin

5.2 A Worked Agentic Translation Assistant

On the Edge: Agentic AI for Neural Proc... Real-World Case Studies & Best Practices

This section ties the book together by walking through an end-to-end agentic translation assistan...

Updated 1 month ago by Admin

5.1 What's Actually Shipping on Intel NPUs

On the Edge: Agentic AI for Neural Proc... Real-World Case Studies & Best Practices

The most useful thing a book like this can do, in its closing chapter, is be honest about what is...

Updated 1 month ago by Admin

4.3 A/B Testing, Canaries, and Hotswaps

On the Edge: Agentic AI for Neural Proc... Production Deployment & Observability

Models drift. Drivers update. Quantization schemes change. The NPU you tested against in February...

Updated 1 month ago by Admin

4.2 Telemetry: What Works, What Doesn't, and What's Missing

On the Edge: Agentic AI for Neural Proc... Production Deployment & Observability

You can't operate what you can't observe. NPU agents have a harder observability story than CPU- ...

Updated 1 month ago by Admin

4.1 Serving NPU Models with OVMS

On the Edge: Agentic AI for Neural Proc... Production Deployment & Observability

A development-time compile_model(...) call is not a production deployment. Once your agent is rea...

Updated 1 month ago by Admin

3.3 Multi-Device Orchestration on a Single SoC

On the Edge: Agentic AI for Neural Proc... Tool Use & Integration Patterns

A Core Ultra SoC isn't one engine — it's three. CPU cores for general-purpose work, an integrated...

Updated 1 month ago by Admin