
Search Results

22 total results found

1.4 The Accuracy Cost of Quantization

On the Edge: Agentic AI for Neural Proc... Foundations of NPU-Optimized Agents

Chapter 1.2 laid out the quantization recipes Intel NPU supports: INT8-sym, INT4-sym group-128 or channel-wise, NF4 on Lunar Lake, FP8 on Panther Lake. The hardware story ended there. This section is the missing other half — what those recipes actually cost yo...
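As a rough illustration of the INT4-sym group-128 recipe named above, the sketch below quantizes a flat list of weights to signed 4-bit codes with one scale per group of 128 weights. It then dequantizes them to expose the rounding error that the accuracy cost comes from. This is a minimal pure-Python sketch under our own assumptions (max-abs scaling onto [-7, 7], round-to-nearest). The function names are hypothetical, and this is not necessarily how Intel's toolchain implements the recipe. Setting `group_size` to the full row length gives the channel-wise variant.

```python
import math


def quantize_int4_sym(weights, group_size=128):
    """Symmetric INT4 quantization with one float scale per group of weights.

    Returns (codes, scales). Codes are integers in [-7, 7]; the -8 code is left
    unused so the positive and negative ranges stay symmetric.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map [-max_abs, max_abs] onto [-7, 7]; guard against an all-zero group.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-7, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4_sym(codes, scales, group_size=128):
    """Reconstruct approximate float weights from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


# Usage: 256 synthetic weights -> 2 groups of 128.
weights = [math.sin(0.1 * i) * (1 + i % 5) for i in range(256)]
codes, scales = quantize_int4_sym(weights)
restored = dequantize_int4_sym(codes, scales)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
# Round-to-nearest keeps every error within half of its group's step size.
print(f"groups={len(scales)}, max abs error={max_err:.4f}, bound={max(scales) / 2:.4f}")
```

Because each group gets its own scale, one outlier weight only enlarges the step size, and therefore the half-step error bound, for its own 128 neighbours. That is the usual argument for group-wise over channel-wise scaling.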

1.5 Speculative Decoding

On the Edge: Agentic AI for Neural Proc... Foundations of NPU-Optimized Agents

Chapter 1.3 established the bandwidth ceiling as the binding constraint on LLM decode: 136.5 GB/s shared LPDDR5X, ~25 GB/s effective NPU quota, ~6–20 tok/s sustained throughput for 3B–8B INT4 models. The natural follow-up question is whether there's any way ar...
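The bandwidth ceiling can be checked with back-of-the-envelope arithmetic. If decode has to stream every weight from DRAM once per generated token, throughput is at most bandwidth divided by model size in bytes. The sketch below applies that simplification (ours, not necessarily the chapter's exact model) to the ~25 GB/s effective NPU quota quoted above. It ignores KV-cache traffic, activations, and the overhead of quantization scales, all of which push real numbers lower.

```python
def decode_tokens_per_second(params_billion, bits_per_weight, bandwidth_gb_s):
    """Upper-bound decode rate if all weights are read from DRAM once per token.

    Ignores KV-cache reads, activations, quantization-scale overhead, and any
    on-chip reuse of weights between tokens.
    """
    weights_gb = params_billion * bits_per_weight / 8  # GB streamed per token
    return bandwidth_gb_s / weights_gb


NPU_EFFECTIVE_GB_S = 25.0  # the ~25 GB/s effective NPU quota cited above

for params in (3, 8):
    rate = decode_tokens_per_second(params, 4, NPU_EFFECTIVE_GB_S)
    print(f"{params}B INT4: ~{rate:.2f} tok/s")
# 3B INT4: ~16.67 tok/s
# 8B INT4: ~6.25 tok/s
```

A 3B INT4 model is about 1.5 GB of weights and an 8B INT4 model about 4 GB, so the estimate lands inside the ~6–20 tok/s range cited above.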

3.4 Structured Outputs and Constrained Decoding

On the Edge: Agentic AI for Neural Proc... Tool Use & Integration Patterns

An agent is only as reliable as the parser that reads its output. Chapter 3.1 covered designing the tools; Chapter 3.2 weighed local against cloud; Chapter 3.3 routed work across devices on the SoC. This section closes the loop on the agent-tool contract: how ...
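To make the parser side of the agent-tool contract concrete, here is a minimal strict validator for a model-emitted tool call. The call shape (`{"tool": ..., "arguments": {...}}`), the `TOOL_SCHEMAS` registry, the tool names, and `ToolCallError` are all hypothetical. A real deployment would more likely use JSON Schema or grammar-constrained decoding so that malformed output cannot be generated in the first place. The sketch only illustrates the principle that anything not exactly matching the contract is rejected rather than guessed at.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOL_SCHEMAS = {
    "read_file": {"path": str},
    "search": {"query": str, "max_results": int},
}


class ToolCallError(ValueError):
    """Raised when model output does not satisfy the tool-call contract."""


def parse_tool_call(raw: str):
    """Parse and strictly validate one tool call; return (tool_name, arguments)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ToolCallError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict) or set(call) != {"tool", "arguments"}:
        raise ToolCallError("expected an object with exactly 'tool' and 'arguments'")
    tool = call["tool"]
    if not isinstance(tool, str) or tool not in TOOL_SCHEMAS:
        raise ToolCallError(f"unknown tool: {tool!r}")
    schema = TOOL_SCHEMAS[tool]
    args = call["arguments"]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ToolCallError(f"arguments must be exactly {sorted(schema)}")
    for name, expected in schema.items():
        value = args[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ToolCallError(f"argument {name!r} must be {expected.__name__}")
    return tool, args


print(parse_tool_call('{"tool": "read_file", "arguments": {"path": "notes.txt"}}'))
# ('read_file', {'path': 'notes.txt'})
```

Raising a specific error, rather than silently patching the output, lets the agent loop feed the message back to the model for a retry instead of executing a half-parsed call.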

4.4 Security and Privacy on the Edge

On the Edge: Agentic AI for Neural Proc... Production Deployment & Observability

"It runs on the device, so it's private" is the marketing line. It's also a half-truth that has caused real production incidents. Chapters 4.1 through 4.3 covered the deployment, observability, and rollout machinery; this section is about the threat model that ...