{"product_id":"9783032267481","title":"Reliable Evaluations for LLMs and AI Agents End-to-End Evaluation Frameworks for LLMs and Autonomous AI Agents","description":"\u003ch1\u003eReliable Evaluations for LLMs and AI Agents\u003c\/h1\u003e\u003ch2\u003eEnd-to-End Evaluation Frameworks for LLMs and Autonomous AI Agents\u003c\/h2\u003e\u003ch3\u003eAlexei Robsky | Liliya Lavitas | Yueqing Wang\u003c\/h3\u003e\u003cdiv\u003e\u003cb\u003eComputers \/ Artificial Intelligence \/ General\u003c\/b\u003e\u003c\/div\u003e\u003cbr\u003e\u003cdiv\u003e\n\u003cp\u003eThis book gives practitioners a concrete, systematic framework for designing evals that make AI systems safe, robust, and customer-ready before they reach production. Drawing on real-world failures, from chatbots that went off the rails to shopping assistants that hallucinated product information, it shows how seemingly small evaluation gaps can cascade into legal, financial, and reputational crises, and how to close those gaps with disciplined, systematic testing.\u003c\/p\u003e\r\n\u003cp\u003eMoving from foundational concepts to advanced practice, \u003cem\u003eReliable Evals for LLMs and AI Agents\u003c\/em\u003e introduces the four core levers of effective evals: sets, templates, metrics, and evaluators. It then extends these to the unique challenges of autonomous AI agents, where systems perceive, reason, act, and adapt in iterative loops that demand fundamentally different eval approaches. 
Along the way, it guides readers through benchmark selection, custom eval set design, statistical rigor in metrics, human and LLM-as-a-judge rating strategies, and the infrastructure needed to automate evals at scale.\u003c\/p\u003e\r\n\u003cp\u003eFor engineering leaders, applied researchers, data scientists, and product teams shipping LLM- and agent-powered experiences, this volume offers a blueprint for building eval flywheels that continuously improve AI quality. It shows how to progress from ad-hoc checks to production-grade eval systems, align model metrics with real user satisfaction, integrate offline evals with online A\/B testing, and design accessible interfaces that democratize rigorous testing across an organization.\u003c\/p\u003e\n\u003c\/div\u003e\u003cdiv\u003e\n\u003cp class=\"MsoNormal\"\u003e\u003cstrong\u003eAlexei Robsky\u003c\/strong\u003e is a technology leader with over 15 years of experience building production AI and Machine-Learning systems. As AI Leader at Microsoft, he heads the Fabric Real-Time Intelligence AI team, delivering cloud-scale AI agents and solutions that act on live data streams.\u003cbr\u003ePreviously, Alexei led Google's Gemini Core Modeling and Evals Data Science Research organization, driving evaluation research and training-data quality initiatives that shaped the performance, reliability, and safety of Gemini models.\u003cbr\u003eEarlier in his career, he was a Data Science Manager at Twitter, overseeing evaluation of Home Timeline ranking and personalization, and a Principal Data Science Manager at Microsoft, where he guided cross-functional teams that shipped production-level Azure customer-experience solutions.\u003cbr\u003eAlexei co-authored \u003cem\u003eMachine Learning Governance for Managers\u003c\/em\u003e (Springer, 2024), holds an MBA from Duke University's Fuqua School of Business, and earned a B.Sc. 
in Electrical Engineering and Computer Science from Tel Aviv University.\u003c\/p\u003e\r\n\u003cp class=\"MsoNormal\" style=\"mso-margin-top-alt: auto; mso-margin-bottom-alt: auto;\"\u003e\u003cstrong\u003eLiliya Lavitas\u003c\/strong\u003e is an accomplished Data Science leader with a deep background in statistical modeling and machine learning. Liliya currently leads a Data Science team in Google Search that incorporates AI features into the Google Search experience. Previously, under Alexei's leadership, she led a Data Science team at Google DeepMind responsible for the rigorous evaluation and training of core Gemini capabilities, ensuring their performance and reliability. Prior to her role at Google, Liliya managed Data Science teams at Netflix and Twitter. Liliya earned her Ph.D. in Statistics from Boston University in 2017, with a thesis specializing in Time Series analysis.\u003c\/p\u003e\r\n\u003cp class=\"MsoNormal\"\u003e\u003cstrong\u003eYueqing Wang\u003c\/strong\u003e is a statistician specializing in novel methodology for system evaluation. In recent years, she has developed new techniques and frameworks for Gen AI evaluation for Google Gemini and at Microsoft AI. Earlier, she worked on the paid product ecosystem at YouTube, statistical evaluations of startups at Google Ventures, and Media Mix Modeling at Google Ads, among other things. 
She earned her PhD in Statistics from the University of California, Berkeley in 2012 with Professor Bin Yu.\u003c\/p\u003e\n\u003c\/div\u003e\u003cbr\u003e\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003ePublication Date: \u003c\/td\u003e\n\u003ctd\u003e19 July 2026\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003ePublisher: \u003c\/td\u003e\n\u003ctd\u003eSpringer Nature Switzerland\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eImprint: \u003c\/td\u003e\n\u003ctd\u003eSpringer\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eISBN-13: \u003c\/td\u003e\n\u003ctd\u003e9783032267481\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eFormat: \u003c\/td\u003e\n\u003ctd\u003ePaperback \/ softback\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003ePage Count: \u003c\/td\u003e\n\u003ctd\u003e193\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/table\u003e","brand":"Springer Nature Switzerland","offers":[{"title":"Default Title","offer_id":47722477289612,"sku":"9783032267481","price":44.99,"currency_code":"USD","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0710\/9545\/1788\/files\/9783032267481.jpg?v=1781089142","url":"https:\/\/fh90cf-fv.myshopify.com\/products\/9783032267481","provider":"Late Knight Books and Services, LLC","version":"1.0","type":"link"}