Search Results

6 total results found

Foundations of NPU-Optimized Agents

On the Edge: Agentic AI for Neural Proc...

NPU architecture and computational constraints. Model quantization and optimization for NPU deployment. Latency profiles and throughput optimization. Hardware-aware agent design patterns.

Agent State & Decision-Making on Constrained Hardware

Managing agent context and memory within NPU limits. Efficient reasoning loops for low-latency inference. Token budget strategies and context windowing. Caching and KV-cache optimization for repeated queries.

Tool Use & Integration Patterns

Designing lightweight tools for NPU-based agents. Async I/O and non-blocking integrations. Local vs. remote tool execution trade-offs. Building tool abstractions that respect hardware constraints.

Production Deployment & Observability

Model serving architectures (ONNX Runtime, TensorRT, TVM). Monitoring latency, throughput, and reliability. A/B testing and progressive rollout strategies. Cost optimization and resource allocation.

Real-World Case Studies & Best Practices

Building customer-facing NPU agents (chatbots, assistants). Batch vs. streaming inference strategies. Handling fallbacks and graceful degradation. Lessons learned and anti-patterns to avoid.

Appendices

Glossary of terms and consolidated source references for the book.