Mastering Voice Interfaces
66,99 €
Available immediately; delivery time: immediate
Mastering Voice Interfaces, Apress
Creating Great Voice Apps for Real Users
By Ann Thymé-Gobbel and Charles Jankowski; available digitally from the heise Shop
Product information: "Mastering Voice Interfaces"
Build great voice apps of any complexity for any domain by learning both the hows and whys of voice development. In this book you’ll see how we live in a golden age of voice technology and how advances in automatic speech recognition (ASR), natural language processing (NLP), and related technologies allow people to talk to machines and get reasonable responses. Today, anyone with computer access can build a working voice app. That democratization of the technology is great. But while it’s fairly easy to build a voice app that runs, it's still remarkably difficult to build a great one: one that users trust, that understands their natural ways of speaking and fulfills their needs, and that makes them want to return for more.
We start with an overview of how humans and machines produce and process conversational speech, explaining how they differ from each other and from other modalities. This is the background you need to understand the consequences of each design and implementation choice as we dive into the core principles of voice interface design. We walk you through many design and development techniques, including ones that some view as advanced, but that you can implement today. We use the Google development platform and Python, but our goal is to explain the reasons behind each technique such that you can take what you learn and implement it on any platform.
Readers of Mastering Voice Interfaces will come away with a solid understanding of what makes voice interfaces special, learn the core voice design principles for building great voice apps, and how to actually implement those principles to create robust apps. We’ve learned during many years in the voice industry that the most successful solutions are created by those who understand both the human and the technology sides of speech, and that both sides affect design and development. Because we focus on developing task-oriented voice apps for real users in the real world, you’ll learn how to take your voice apps from idea through scoping, design, development, rollout, and post-deployment performance improvements, all illustrated with examples from our own voice industry experiences.
WHAT YOU WILL LEARN
* Create truly great voice apps that users will love and trust
* See how voice differs from other input and output modalities, and why that matters
* Discover best practices for designing conversational voice-first applications, and the consequences of design and implementation choices
* Implement advanced voice designs, with real-world examples you can use immediately
* Verify that your app is performing well, and learn what to change if it isn’t
Who This Book Is For
Anyone curious about the real hows and whys of voice interface design and development. In particular, it's aimed at teams of developers, designers, and product owners who need a shared understanding of how to create successful voice interfaces using today's technology. We expect readers to have had some exposure to voice apps, at least as users.
Ann Thymé-Gobbel's career has focused on how people use speech and natural language to communicate with each other and with technology. After completing her PhD in cognitive science and linguistics from UC San Diego, she's held a broad set of voice-related UI/UX design roles in both large corporations and small start-ups, working with diverse teams in product development, client project engagements, and R&D. Her past work includes design, data analysis and establishing best practices at Nuance, voice design for mobile and in-home devices at Amazon Lab 126, and creating natural language conversations for multimodal healthcare apps at 22otters. Her research has covered automatic language detection, error correction, and discourse structure. She is currently Director of UI/UX Design at Loose Cannon Systems, the team bringing to market Milo, a handsfree wearable communicator. Ann never stops doing research: she collects and analyzes data at every opportunity and enjoys sharing her findings with others, having presented and taught at conferences internationally.
Charles Jankowski has over 30 years’ experience in industry and academia developing applications and algorithms for real-world users incorporating advanced speech recognition, speaker verification, and natural language technologies. He has used state-of-the-art machine learning processes and techniques for data analysis, performance optimization, and algorithm development. Charles has highly in-depth technical experience with state-of-the-art technologies, effective management of cross-functional teams for all facets of application deployment, and outstanding relationships with clients. Currently, he is Director of NLP at Brain Technologies, creating the Natural iOS application with which you can “Say it and Get it.” Previously he was Director of NLP and Robotics at CloudMinds, Director of Speech and Natural Language at 22otters, Senior Speech Scientist at Performance Technology Partners, and Director of Professional Services at Nuance. He has also been an independent consultant. Charles holds S.B., S.M., and Ph.D. degrees from MIT, all in electrical engineering.
PART 1 – Voice System Foundations
Chapter 1: Say Hello to Voice Systems
Chapter goal: Introduce the reader to voice-first technology, its core concepts, and the typical phases of development, with background that explains the current state and challenges of voice.
No of pages - 20
Sub-topics
1. Voice-first, voice-only, and conversational everything
2. Introduction to voice technology components (Speech to text, Natural language
understanding, Dialog management, Natural language generation, Text to speech)
3. The phases of voice development success (Plan, Design, Build, Test, Deploy &
Assess, Iterate)
4. Hope is not a strategy – but to plan & execute is
Chapter 2: Keeping Voice in Mind
Chapter goal: Explain to the reader how humans and computers “talk” and “listen.”
What’s easy and hard for the human user and the technology in a dialog, and why.
No of pages - 15
Sub-topics
1. Why voice is different
2. Hands-on: A pre-coding thought experiment
3. Voice dialog and its participants
• The Human: spoken natural language understanding
• The Computer: voice system recognition and interpretation
• Human-computer voice dialog - Successful voice-first development is all about
coordinating human abilities with the technology to allow conversations between
two very different dialog participants.
Chapter 3: Running a Voice Implementation—and Noticing Issues
Chapter goal: Allow the reader to put into practice their newly learned foundation by
implementing and running a simple voice application in the Google Assistant framework,
and experiencing how quickly even a simple voice interaction needs improvement.
No of pages - 15
Sub-topics
1. Hands-on: Preparing a restaurant finder
2. Introducing voice platforms
3. Hands-on: Implementing the restaurant finder
Basic setup, Specifying a first intent, Doing something, What the user says,
What the VUI says, Connecting Dialogflow to Actions on Google, Testing the app, Saving the voice interaction
4. Google’s voice development ecosystem, and why we're using it here
5. The pros and cons of relying on tools
6. Hands-on: Making changes - testing and iterating (Adding phrases that express the same meaning, adding content, and making responses more specific)
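The intent-matching loop Chapter 3 walks through can be sketched outside any platform. This toy Python version, with invented intent names and training phrases (not the book's actual restaurant-finder code), matches a user utterance against training phrases by word overlap and returns a spoken response:

```python
# Toy sketch of the match-utterance-to-intent loop. The intent name,
# training phrases, and responses below are invented for illustration;
# a real platform like Dialogflow does this matching with ML models.

TRAINING_PHRASES = {
    "find_restaurant": [
        "find me a restaurant",
        "i want to eat out",
        "where can i get dinner",
    ],
}

RESPONSES = {
    "find_restaurant": "Sure. What kind of food are you in the mood for?",
    None: "Sorry, I didn't get that.",
}

def match_intent(utterance):
    """Return the intent whose training phrases best overlap the utterance."""
    words = set(utterance.lower().split())
    best_intent, best_score = None, 0
    for intent, phrases in TRAINING_PHRASES.items():
        for phrase in phrases:
            score = len(words & set(phrase.split()))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent

def respond(utterance):
    """Map the matched intent (or no match) to what the VUI says."""
    return RESPONSES[match_intent(utterance)]
```

Even this crude sketch shows why iteration matters: any phrasing not covered by the training phrases falls to the "Sorry" response, which is exactly the kind of gap the hands-on testing steps above surface.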
PART 2 – Planning Voice System Interactions
Chapter 4: Defining your Vision: Building What, How, and Why for Whom
Chapter goal: Introduce voice-focused requirement discovery, highlighting differences
from other modalities and devices, and showing how to capture the what, why, who, and how of your application as requirements.
No of pages - 25
Sub-topics
1. Functional requirements: What are you building? (General and detailed functionality)
2. Non-functional business requirements: Why are you building it? (Purpose, underlying
service and existing automation, branding and terminology, data needs, access and
availability, legal and business constraints)
3. Non-functional user requirements: Who will use it and what do they want? (User
population demographics and characteristics, engagement patterns, mental models
and domain knowledge, environment and state of mind)
4. Non-functional system requirements: How will you build it? (Available options for
recognizer, parser, and interpreter, external data sources, data storage and data access, other system concerns)
Chapter 5: From Discovery to UX and UI Design: Tools of the Voice-First Trade
Chapter goal: Show how to turn discovery findings into high-level architectural designs,
using flow diagrams, sample dialogs, and detailed dialog management specs.
No of pages - 20
Sub-topics
1. Where to find early user data on any budget (online research, crowd sourcing, dialog
participant observation, focus groups, interviews, and surveys)
2. How discovery results feed into VUI design decisions (dialog manager graphs)
3. Capturing and documenting VUI design (dialog flows, sample dialogs, detailed
design specifications, VUI design documentation approaches)
4. Prototyping and testing your assumptions (early voice UX and prototyping
approaches)
PART 3 – Building Voice System Interactions
Chapter 6: Applying Human 'Rules of Dialog' to Reach Conversation Resolution
Chapter goal: Learn that voice-first dialogs have resolutions. Learn how to design and
implement fully specified requests in the 3 core dialog types: question-answer, action
requests, and task completion requests.
No of pages - 30
Sub-topics
1. Dialog acts, games and turns – and Grice
2. Question answering
3. Action requests
4. Task completion requests
5. Fully specified request (Single slot and Multi-slot requests)
6. Determining dialog acts based on feature discovery
7. Dialog completion (Responding to 'goodbye' and 'thanks')
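The "fully specified request" idea from sub-topic 5 reduces to a simple check: a request is actionable only once every required slot is filled. A minimal sketch, with an invented intent and slot names (not the book's example):

```python
# Sketch of a fully specified multi-slot request: the hypothetical
# "book_table" intent and its slot names are assumptions for illustration.

REQUIRED_SLOTS = {
    "book_table": ["restaurant", "party_size", "time"],
}

def missing_slots(intent, filled):
    """Return required slots the user hasn't provided yet."""
    return [s for s in REQUIRED_SLOTS[intent] if s not in filled]

def is_fully_specified(intent, filled):
    """A request is fully specified when no required slot is missing."""
    return not missing_slots(intent, filled)
```

A single-slot request is just the degenerate case with a one-element slot list; the missing-slot list is what drives the follow-up questions covered in Chapter 7.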
Chapter 7: Resolving Incomplete Requests Through Disambiguation
Chapter goal: Explain how to handle incomplete and ambiguous requests, including
common disambiguation methods (yes/no, A/B sets, lists and menus) and when to apply each.
No of pages - 30
Sub-topics
1. Incomplete requests - how to reach completeness
2. Ambiguous requests
3. Disambiguation methods (Logic-based assumptions, Yes/No questions, A/B sets,
Static lists, Dynamic lists, Open sets, Menus)
4. Testing on the device to find and solve issues
5. Toward code independence: using webhooks (fulfillment, contexts, context
parameters, and follow-up)
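One way to sketch the choice among the disambiguation methods above: with two candidate matches, ask an A/B question; with more, read a short list. The thresholds and prompt wording here are assumptions, not the book's exact design:

```python
# Hypothetical disambiguation-prompt chooser. Candidate counts map to
# methods from the list above: one = no prompt, two = A/B set, more = list.

def disambiguation_prompt(slot, candidates):
    """Return the clarifying question to ask, or None if unambiguous."""
    if len(candidates) == 1:
        return None                                   # unambiguous
    if len(candidates) == 2:                          # A/B set
        return f"Did you mean {candidates[0]} or {candidates[1]}?"
    options = ", ".join(candidates[:3])               # static list, capped
    return f"I found several. Which {slot}: {options}?"
```

Capping the spoken list (here at three, an arbitrary choice) reflects the audio-is-fleeting constraint Chapter 9 returns to.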
Chapter 8: Conveying Reassurance with Confidence and Confirmation
Chapter goal: Teach the importance of conveying reassurance and how to apply different confirmation strategies. Introduce discourse markers and backchannels.
No of pages - 30
Sub-topics
1. Conveying reassurance and shared certainty - Setting expectations
2. Webhooks, Take 2 (Dialogflow system architecture, webhook request and response, Implementing the webhook)
3. Confirmation methods (Non-verbal confirmation, Generic acknowledgment, Implicit
and Explicit confirmations)
4. Confirmation placement – confirming slots versus intents
5. Disconfirmation: dealing with “no”
6. Additional reassurance techniques and pitfalls (System pronunciation, Backchannels,
Discourse markers, VUI architecture)
7. Choosing the right reassurance method
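The confirmation methods above are often tied to recognition confidence. This sketch uses invented thresholds (0.8 and 0.45 are assumptions, not the book's numbers): high confidence gets an implicit confirmation folded into the next question, middling confidence an explicit yes/no, and low confidence a reject-and-reprompt:

```python
# Hypothetical confidence-tiered confirmation chooser; the thresholds
# and prompt wording are illustrative assumptions.

def confirmation(slot_value, confidence):
    """Pick a confirmation style for a recognized slot value."""
    if confidence >= 0.8:
        return f"Okay, {slot_value}. What time?"       # implicit confirmation
    if confidence >= 0.45:
        return f"Did you say {slot_value}?"            # explicit confirmation
    return "Sorry, where did you want to eat?"         # reject and reprompt
```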
Chapter 9: Helping Users Succeed Through Consistency
Chapter goal: Explore how to navigate an audio interaction that is by nature fleeting and
sequential. Provide design and implementation that incorporates consistency through
correctly scoped global commands, landmarks, and non-verbal audio.
No of pages - 20
Sub-topics
1. Universals (Uses: clarification and additional information, allow a do-over, provide
an exit)
2. Navigation (Landmarks, Non-verbal audio, Content playback navigation, List
navigation)
3. Consistency, variation and randomization (built-in global intents, consistency across
VUIs and frameworks)
Chapter 10: Creating Robust Coverage for Speech-to-Text Resolution
Chapter goal: Teach the nuts and bolts of the computer side of "listening," starting with
the mapping of sounds to words and how to create solid synonym coverage. Topics
include different approaches to recognition, including regular expressions and statistical
models, dictionaries, domain knowledge, normalizing, and bootstrapping.
No of pages - 25
Sub-topics
1. Recognition is speech-to-text interpretation
2. Recognition engines
3. Grammar concepts (Coverage, Recognition space, Static or dynamic, End-pointing, Multiple hypotheses)
4. Types of grammars (Rule-based grammars, Statistical models, Hot words, Wake
words and invocation names)
5. Working with grammars (Writing rule-based regular expressions)
6. How to succeed with grammars (Bootstrapping, Normalizing punctuation and
spellings, Handling unusual pronunciations, Using domain knowledge, the strengths and limitations of STT)
7. A simple example (Sample phrases in Dialogflow, Regular expressions in the
webhook)
8. Limitations on grammar creation and use
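The rule-based-grammar workflow above (normalize the STT string, then apply a regular expression that captures a slot) can be sketched in a few lines. The synonym table and pattern are illustrative assumptions, not the book's grammar:

```python
import re

# Hypothetical normalize-then-match pipeline for a cuisine slot.
# Synonym table and regex are invented for illustration.

SYNONYMS = {"italian food": "italian", "pizza place": "italian"}

CUISINE_RE = re.compile(r"\b(?:find|show)\b.*?\b(italian|thai|mexican)\b")

def normalize(text):
    """Lowercase, trim punctuation, and collapse synonyms to canonical forms."""
    text = text.lower().strip().rstrip(".?!")
    for phrase, canon in SYNONYMS.items():
        text = text.replace(phrase, canon)
    return text

def extract_cuisine(utterance):
    """Return the captured cuisine slot value, or None on no match."""
    m = CUISINE_RE.search(normalize(utterance))
    return m.group(1) if m else None
```

The closed alternation `(italian|thai|mexican)` is also where such grammars break: any cuisine outside the list is unrecognized, which is the coverage problem bootstrapping and domain knowledge address.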
Chapter 11: Reaching Understanding Through Parsing and Intent Resolution
Chapter goal: Explore the second part of computer "listening": interpreting the meaning.
Topics cover intent resolution, parsing and multiple passes, the use of tagging guides and middle layers.
No of pages - 20
Sub-topics
1. From words to meaning (NLP, NLU)
2. Parsing
3. Machine learning and NLU
4. Ontologies, knowledge bases and content databases
5. Intents (Intent tagging and tagging guides, Middle layers: semantic tags versus system
endpoints)
6. Putting it all together (Matching wide or narrow, Multiple grammars, multiple passes)
7. A simple example (The Stanford Parser revisited, Determining intent, Machine
learning and using knowledge)
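The "multiple grammars, multiple passes" idea in sub-topic 6 can be sketched as a narrow pass followed by a wide one: try an exact-phrase match first (high precision), then fall back to keyword matching (high recall). All names and tables below are invented for illustration:

```python
# Hypothetical two-pass intent resolver: narrow exact match, then wide
# keyword fallback. Intents and keyword mappings are assumptions.

EXACT = {"check my order status": "order_status"}

KEYWORDS = {"order": "order_status", "refund": "refund_request"}

def resolve_intent(utterance):
    """Resolve an intent in two passes, narrow first."""
    text = utterance.lower().strip()
    if text in EXACT:                      # pass 1: narrow, high precision
        return EXACT[text]
    for word in text.split():              # pass 2: wide, high recall
        if word in KEYWORDS:
            return KEYWORDS[word]
    return None
```

Ordering narrow before wide is the point: the wide pass alone would misfire on phrases where a keyword appears in a different sense.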
Chapter 12: Applying Accuracy Strategies to Avoid Misunderstandings
Chapter goal: Explain how misunderstandings happen and how to avoid them through
techniques that minimize errors and the need to start over. Topics include design and
implementation of a wide set of robustness techniques, including powerful advanced
techniques.
No of pages - 25Sub-topics
1. Accuracy robustness underlying concepts
2. Accuracy robustness strategies (Examples, Providing help, Just-in-time information,
Hidden options and "none of those", Recognize-and-reject, One-step-correction,
Tutorials, Spelling, Narrowing recognition space)
3. Advanced techniques (Multi-tiered behavior and confidence scores, N-best and skip lists, Probabilities, Contextual latency)
Chapter 13: Choosing Strategies to Recover from Miscommunication
Chapter goal: Explore how to recover when miscommunication happens. Show how to
recover and get users back on track quickly, and when to stop trying. Topics include
design and implementation of several recovery strategies.
No of pages - 15
Sub-topics
1. Recovery from what?
2. Recovery strategies (Meaningful contextual prompts, Escalating prompts, Tapered
prompts, Rapid reprompt, Backoff strategies)
3. When to stop trying (Max error counts, Transfers)
4. Choosing recovery strategy (Recognition, intent, or fulfillment errors)
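The escalating-prompts and max-error-count strategies above combine naturally: each consecutive error gets a more detailed reprompt, and past a maximum the app stops trying. Prompt wording and the maximum of three are illustrative assumptions:

```python
# Hypothetical escalating reprompts with a max error count.
# Wording and MAX_ERRORS are assumptions for the sketch.

PROMPTS = [
    "Sorry, which restaurant?",
    "Sorry, I still didn't catch that. Say the name of a restaurant.",
    "Let's try once more. For example, say 'Luigi's downtown'.",
]

MAX_ERRORS = 3

def reprompt(error_count):
    """Return the next prompt, or None once it's time to stop trying."""
    if error_count >= MAX_ERRORS:
        return None                 # hand off, transfer, or end gracefully
    return PROMPTS[error_count]
```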
Chapter 14: Using Context and Data to Create Smarter Conversations
Chapter goal: Explain why context is king in spoken conversation. Show how to access
and update data from various sources, and how to use that data within and across dialogs to create smarter interactions. Topics focus on how to design and implement context-aware dialogs using anaphora, proactive behaviors, proximity, geo-location, domain knowledge, and other powerful methods.
No of pages - 25
Sub-topics
1. Why there’s no conversation without context
2. Reading and writing data (External accounts and services)
3. Persistence within and across conversations
4. Context-aware and context-dependent dialogs (Discourse markers and
acknowledgments, Anaphora resolution, Follow-up dialogs and linked requests, Proactive behaviors, Topic, domain and world knowledge, Geo-location-based
behavior, Proximity and relevance, Number and type of devices, Time and day, User
identity, preferences and account types, User utterance wording, System conditions)
5. Tracking context in modular and multiturn dialogs
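Anaphora resolution in its simplest form is a context store: remember the most recent entity of each type so that "book it" or "go there" can be resolved on a later turn. This is an entirely illustrative miniature, not the book's implementation:

```python
# Hypothetical per-conversation context store for anaphora resolution.
# Entity types and the "most recent wins" policy are illustrative choices.

class DialogContext:
    def __init__(self):
        self.recent = {}                      # entity type -> last value

    def remember(self, entity_type, value):
        """Record the most recently mentioned entity of a given type."""
        self.recent[entity_type] = value

    def resolve(self, entity_type):
        """Resolve 'it'/'there' to the last entity of that type, if any."""
        return self.recent.get(entity_type)
```

Persisting the store across turns gives within-conversation context; serializing it to user storage gives the cross-conversation persistence the sub-topics mention.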
Chapter 15: Creating Secure Personalized Experiences
Chapter goal: Cover personalization and customization. Topics include identification,
authentication, privacy and security concerns, system persona audio, and working with
TTS versus recorded prompts.
No of pages - 25
Sub-topics
1. The importance of knowing who’s talking
2. Individualized targeted behaviors (Concepts in personalization and customization,
Implementing individualized experiences)
3. Authorized secure access
4. Approaches to identification and authentication (Implementing secure gated access)
5. Privacy and security concerns
6. System persona (Defining and implementing a system persona, How persona affects
dialogs)
7. System voice audio (TTS or voice talent, generated or recorded, Finding and working
with voice talents, One or several voices, Prompt management)
8. Emotion and style
9. Voice for specific user groups
PART 4 – Verifying and Deploying Voice System Interactions
Chapter 16: Testing and Measuring Performance in Voice Systems
Chapter goal: Explain the do’s and don’ts of QA testing a voice system. Topics include
user testing methods that work best for voice, the code needed to support them, and how to improve system performance based on findings.
No of pages - 20
Sub-topics
1. Testing voice system performance (Recognition testing, Dialog traversal: functional
end-to-end testing, Wake word and speech detection testing, Additional system
integration testing)
2. Testing usability and task completion (Voice usability testing concepts, Wizard of Oz
studies)
3. Tracking and measuring performance (Recognition performance metrics, Task
completion metrics, User satisfaction metrics)
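Of the recognition performance metrics above, the standard one is word error rate (WER): (substitutions + insertions + deletions) divided by the reference length, computed via word-level edit distance. This is the textbook formula, not code from the book:

```python
# Word error rate via standard dynamic-programming edit distance
# over words: WER = (S + I + D) / reference length.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis inserts many extra words, which is one reason the chapters pair it with task completion and satisfaction metrics.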
Chapter 17: Tuning and Deploying Voice Systems
Chapter goal: Show how to improve, or tune, voice solutions before and after deploying a voice system. Teach what real user data says about the system performance, what to log and track, how to measure accuracy, and how to interpret the data.
No of pages - 25
Sub-topics
1. Tuning: what is it and why do you do it? (Why recognition accuracy isn’t enough, Analyzing causes of poor system performance)
2. Tuning types and approaches (Log-based versus transcription-based tuning, Coverage
tuning, Recognition accuracy tuning, Finding and using recognition accuracy data,
Task completion tuning, Dialog tuning, Prioritizing tuning efforts)
3. Mapping observations to the right remedy (Reporting and using tuning results)
4. How to maximize deployment success (Know when to tune, Understand tuning
complexities to avoid pitfalls)
We start with an overview of how humans and machines produce and process conversational speech, explaining how they differ from each other and from other modalities. This is the background you need to understand the consequences of each design and implementation choice as we dive into the core principles of voice interface design. We walk you through many design and development techniques, including ones that some view as advanced, but that you can implement today. We use the Google development platform and Python, but our goal is to explain the reasons behind each technique such that you can take what you learn and implement it on any platform.
Readers of Mastering Voice Interfaces will come away with a solid understanding of what makes voice interfaces special, learn the core voice design principles for building great voice apps, and how to actually implement those principles to create robust apps. We’ve learned during many years in the voice industry that the most successful solutions are created by those who understand both the human and the technology sides of speech, and that both sides affect design and development. Because we focus on developing task-oriented voice apps for real users in the real world, you’ll learn how to take your voice apps from idea through scoping, design, development, rollout, and post-deployment performance improvements, all illustrated with examples from our own voice industry experiences.
WHAT YOU WILL LEARN
* Create truly great voice apps that users will love and trust
* See how voice differs from other input and output modalities, and why that matters
* Discover best practices for designing conversational voice-first applications, and the consequences of design and implementation choices
* Implement advanced voice designs, with real-world examples you can use immediately.
* Verify that your app is performing well, and what to change if it doesn't
Who This Book Is For
Anyone curious about the real how’s and why’s of voice interface design and development. In particular, it's aimed at teams of developers, designers, and product owners who need a shared understanding of how to create successful voice interfaces using today's technology. We expect readers to have had some exposure to voice apps, at least as users.
Ann Thymé-Gobbel's career has focused on how people use speech and natural language to communicate with each other and with technology. After completing her PhD in cognitive science and linguistics from UC San Diego, she's held a broad set of voice-related UI/UX design roles in both large corporations and small start-ups, working with diverse teams in product development, client project engagements, and R&D. Her past work includes design, data analysis and establishing best practices at Nuance, voice design for mobile and in-home devices at Amazon Lab 126, and creating natural language conversations for multimodal healthcare apps at 22otters. Her research has covered automatic language detection, error correction, and discourse structure. She is currently Director of UI/UX Design at Loose Cannon Systems, the team bringing to market Milo, a handsfree wearable communicator. Ann never stops doing research: she collects and analyzes data at every opportunity and enjoys sharing her findings with others, having presented and taught at conferences internationally.
Charles Jankowski has over 30 years’ experience in industry and academia developing applications and algorithms for real-world users incorporating advanced speech recognition, speaker verification, and natural language technologies. He has used state-of-the-art machine learning processes and techniques for data analysis, performance optimization, and algorithm development. Charles has highly in-depth technical experience with state-of-the-art technologies, effective management of cross-functional teams for all facets of application deployment, and outstanding relationships with clients. Currently, he is Director of NLP at Brain Technologies, creating the Natural iOS application with which you can “Say it and Get it.” Previously he was Director of NLP and Robotics at CloudMinds, Director of Speech and Natural Language at 22otters, Senior Speech Scientist at Performance Technology Partners, and Director of Professional Services at Nuance. He has also been an independent consultant. Charles holds S.B., S.M., and Ph.D. degrees from MIT, all in electrical engineering.
PART 1 – Voice System Foundations
Chapter 1: Say Hello to Voice Systems
Chapter goal: Introduce the reader to voice-first technology, its core concepts, and typical phases of development through an explanatory background for the current state and challenges of voice.
No of pages - 20
Sub-topics
1. Voice-first, voice-only, and conversational everything
2. Introduction to voice technology components (Speech to text, Natural language
understanding, Dialog management, Natural language generation, Text to speech)
3. The phases of voice development success (Plan, Design, Build, Test, Deploy &
Assess, Iterate)
4. Hope is not a strategy – but to plan & execute is
Chapter 2: Keeping Voice in Mind
Chapter goal: Explain to the reader how humans and computers “talk” and “listen.”
What’s easy and hard for the human user and the technology in a dialog, and why.
No of pages - 15
Sub-topics
1. Why voice is different
2. Hands-on: A pre-coding thought experiment
3. Voice dialog and its participants
• The Human: spoken natural language understanding
• The Computer: voice system recognition and interpretation
• Human-computer voice dialog - Successful voice-first development is all about
coordinating human abilities with the technology to allow conversations between
two very different dialog participants.
Chapter 3: Running a Voice Implementation—and Noticing Issues
Chapter goal: Allow the reader to put into practice their newly learned foundation by
implementing and running a simple voice application in the Google Assistant framework,
and experiencing how quickly even a simple voice interaction needs improvement.
No of pages - 15
Sub-topics
1. Hands-on: Preparing a restaurant finder
2. Introducing voice platforms
3. Hands-on: Implementing the restaurant finder
Basic setup, Specifying a first intent, Doing something, What the user says,
What the VUI says, Connecting Dialogflow to Actions on Google, Testingthe app, Saving the voice interaction
4. Google’s voice development ecosystem, and why we're using it here
5. The pros and cons of relying on tools
6. Hands-on: Making changes - testing and iterating (Adding phrases to handle the same meaning, additional content, and more specific)
PART 2 – Planning Voice System InteractionsChapter 4: Defining your Vision: Building What, How, and Why for Whom
Chapter goal: Introduce voice-focused requirement discovery, highlighting differences
from other modalities and devices and showing
No of pages - 25
Sub-topics
1. Functional requirements: What are you building? (General and detailed functionality)
2. Non-functional business requirements: Why are you building it? (Purpose, underlying
service and existing automation, branding and terminology, data needs, access and
availability, legal and business constraints)
3. Non-functional user requirements: Who will use it and what do they want? (User
population demographics and characteristics, engagement patterns, mental models
and domain knowledge, environment and state of mind)
4. Non-functional system requirements; How will you build it? (Available options for
recognizer, parser, and interpreter, external data sources, data storage and data access, other system concerns)
Chapter 5: From Discovery to UX and UI Design: Tools of the Voice-First Trade
Chapter goal: Show how to turn discovery findings into high-level architectural designs,
using flows diagrams, sample dialogs, and detailed dialog management specs.
No of pages - 20
Sub-topics
1. Where to find early user data on any budget (online research, crowd sourcing, dialog
participant observation, focus groups, interviews, and surveys)
2. How discovery results feed into VUI design decisions (dialog manager graphs)
3. Capturing and documenting VUI design (dialog flows, sample dialogs, detailed
design specifications, VUI design documentation approaches)
4. Prototyping and testing your assumptions (early voice UX and prototyping
approaches)
PART 3 – Building Voice System Interactions
Chapter 6: Applying Human 'Rules of Dialog' to Reach Conversation ResolutionChapter goal: Learn that voice-first dialogs have resolutions. Learn how to design and
implement fully specified requests in the 3 core dialog types: question-answer, action
requests, and task completion requests.
No of pages - 30
Sub-topics
1. Dialog acts, games and turns – and Grice
2. Question answering
3. Action requests
4. Task completion requests
5. Fully specified request (Single slot and Multi-slot requests)
6. Determining dialog acts based on feature discovery
7. Dialog completion (Responding to 'goodbye' and 'thanks')
Chapter 7: Resolving Incomplete Requests Through Disambiguation
Chapter goal: Explain how to handle incomplete and ambiguous requests, including
common disambiguation methods (yes/no, A/B sets, lists and menus) and when to apply each.
No of pages - 30
Sub-topics
1. Incomplete requests - how to reach completeness
2. Ambiguous requests3. Disambiguation methods (Logic-based assumptions, Yes/No questions, A/B sets,
Static lists, Dynamic lists, Open sets, Menus)
4. Testing on the device to find and solve issues
5. Toward code independence: using webhooks (fulfillment, contexts, context
parameters, and follow-up)
Chapter 8: Conveying Reassurance with Confidence and Confirmation
Chapter goal: Teach the importance of conveying reassurance and how to apply different confirmation strategies. Introduce discourse markers and backchannels.
No of pages - 30
Sub-topics
1. Conveying reassurance and shared certainty - Setting expectations
2. Webhooks, Take 2 (Dialogflow system architecture, webhook request and response,Implementing the webhook)
3. Confirmation methods (Non-verbal confirmation, Generic acknowledgment, Implicit
and Explicit confirmations)
4. Confirmation placement – confirming slots versus intents
5. Disconfirmation: dealing with “no”
6. Additional reassurance techniques and pitfalls (System pronunciation, Backchannels,
Discourse markers, VUI architecture)
7. Choosing the right reassurance method
Chapter 9: Helping Users Succeed Through Consistency
Chapter goal: Explore how to navigate an audio interaction that is by nature fleeting and
sequential. Provide design and implementation that incorporates consistency through
correctly scoped global commands, landmarks, non-verbal audio.
No of pages - 20
Sub-topics
1. Universals (Uses: clarification and additional information, allow a do-over, provide
an exit)
2. Navigation (Landmarks, Non-verbal audio, Content playback navigation, List
navigation)
3. Consistency, variation and randomization (built-in global intents, consistency across
VUIs and frameworks
Chapter 10: Creating Robust Coverage for Speech-to-Text Resolution
Chapter goal: Teach the nuts and bolts of the computer-side of "listening," starting with
the mapping of sounds to words and how to create solid synonym coverage. Topics
include different approaches to recognition, including regular expressions and statistical
models, dictionaries, domain knowledge, normalizing, and bootstrapping.
No of pages - 25
Sub-topics
1. Recognition is speech-to-text interpretation
2. Recognition engines
3. Grammar concepts (Coverage, Recognition space, Static or dynamic, End-pointing,Multiple hypotheses)
4. Types of grammars (Rule-based grammars, Statistical models, Hot words, Wake
words and invocation names)
5. Working with grammars (Writing rule-based regular expressions)
6. How to succeed with grammars (Bootstrapping, Normalizing punctuation and
spellings, Handling unusual pronunciations, Using domain knowledge, the strengthsand limitations of STT)
7. A simple example (Sample phrases in Dialogflow, Regular expressions in the
webhook)
8. Limitations on grammar creation and use
Chapter 11: Reaching Understanding Through Parsing and Intent Resolution
Chapter goal: Explore the second part of computer "listening": interpreting the meaning.
Topics cover intent resolution, parsing and multiple passes, the use of tagging guides and middle layers.
No of pages - 20
Sub-topics
1. From words to meaning (NLP, NLU)
2. Parsing
3. Machine learning and NLU
4. Ontologies, knowledge bases and content databases
5. Intents (Intent tagging and tagging guides, Middle layers: semantic tags versus system
endpoints)
6. Putting it all together (Matching wide or narrow, Multiple grammars, multiple passes)
7. A simple example (The Stanford Parser revisited, Determining intent, Machine
learning and using knowledge)
Chapter 12: Applying Accuracy Strategies to Avoid Misunderstandings
Chapter goal: Explain how misunderstandings happen and how to avoid them through
techniques that minimize errors and the need to start over. Topics include design and
implementation of a wide set of robustness techniques, including powerful advanced
techniques.
No of pages - 25
Sub-topics
1. Accuracy robustness underlying concepts
2. Accuracy robustness strategies (Examples, Providing help, Just-in-time information,
Hidden options and "none of those", Recognize-and-reject, One-step-correction,
Tutorials, Spelling, Narrowing recognition space)
3. Advanced techniques (Multi-tiered behavior and confidence scores, N-best and skip lists, Probabilities, Contextual latency)
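Multi-tiered behavior driven by confidence scores, one of the advanced techniques listed above, can be sketched as follows. The thresholds and the N-best tuple format are illustrative assumptions, not values from the book:

```python
# Multi-tiered behavior: act, confirm, or reject depending on the
# recognizer's confidence in the top N-best hypothesis.
HIGH, LOW = 0.85, 0.45   # illustrative thresholds

def choose_action(nbest):
    """nbest: list of (text, confidence) pairs, sorted best-first."""
    text, conf = nbest[0]
    if conf >= HIGH:
        return ("accept", text)    # act without confirming
    if conf >= LOW:
        return ("confirm", text)   # "Did you say ...?"
    return ("reject", None)        # reprompt instead of guessing
```

The middle tier is what makes the app feel robust: a confirmation is cheaper for the user than acting on a wrong guess and backing out.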
Chapter 13: Choosing Strategies to Recover from Miscommunication
Chapter goal: Explore how to recover when miscommunication happens. Show how to
recover and get users back on track quickly, and when to stop trying. Topics include
design and implementation of several recovery strategies.No of pages - 15
Sub-topics
1. Recovery from what?
2. Recovery strategies (Meaningful contextual prompts, Escalating prompts, Tapered
prompts, Rapid reprompt, Backoff strategies)
3. When to stop trying (Max error counts, Transfers)
4. Choosing recovery strategy (Recognition, intent, or fulfillment errors)
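Two of the strategies above, escalating prompts and max error counts, combine naturally. A minimal sketch, with invented prompt wording and an assumed error limit:

```python
# Escalating reprompts: each retry gives more guidance; after
# max_errors the app stops trying (e.g., transfers or exits).
PROMPTS = [
    "What size would you like?",
    "Sorry, what size? You can say small or large.",
    "I still didn't catch that. Please say 'small' or 'large'.",
]

def next_prompt(error_count, max_errors=3):
    if error_count >= max_errors:
        return None  # signal: stop trying, hand off or end gracefully
    return PROMPTS[min(error_count, len(PROMPTS) - 1)]
```

Returning `None` as an explicit stop signal keeps the "when to stop trying" decision in one place rather than scattered across dialog states.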
Chapter 14: Using Context and Data to Create Smarter Conversations
Chapter goal: Explain why context is king in spoken conversation. Show how to access
and update data from various sources, and how to use that data within and across dialogs to create smarter interactions. Topics focus on how to design and implement context-aware dialogs using anaphora, proactive behaviors, proximity, geo-location, domain knowledge, and other powerful methods.
No of pages - 25
Sub-topics
1. Why there’s no conversation without context
2. Reading and writing data (External accounts and services)
3. Persistence within and across conversations
4. Context-aware and context-dependent dialogs (Discourse markers and
acknowledgments, Anaphora resolution, Follow-up dialogs and linked requests,
Proactive behaviors, Topic, domain and world knowledge, Geo-location-based
behavior, Proximity and relevance, Number and type of devices, Time and day, User
identity, preferences and account types, User utterance wording, System conditions)
5. Tracking context in modular and multiturn dialogs
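Anaphora resolution, listed above, is a good example of why dialog state must persist across turns. A deliberately minimal sketch (real resolution considers grammatical agreement, recency, and salience; the class and names are invented):

```python
# Minimal sketch of anaphora resolution via dialog context:
# pronouns like "it" resolve to the most recently mentioned entity.
class DialogContext:
    def __init__(self):
        self.last_entity = None

    def mention(self, entity):
        """Record an entity the user or system just referred to."""
        self.last_entity = entity

    def resolve(self, word):
        """Map a pronoun back to its referent, if one is known."""
        if word in ("it", "that") and self.last_entity:
            return self.last_entity
        return word

ctx = DialogContext()
ctx.mention("the kitchen light")   # turn 1: "Turn on the kitchen light"
print(ctx.resolve("it"))           # turn 2: "Turn it off"
```

Without such tracked context, the follow-up "Turn it off" is unanswerable, which is why the chapter argues there is no conversation without context.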
Chapter 15: Creating Secure Personalized Experiences
Chapter goal: Cover personalization and customization. Topics include identification,
authentication, privacy and security concerns, system persona, voice audio, and working with
TTS versus recorded prompts.
No of pages - 25
Sub-topics
1. The importance of knowing who’s talking
2. Individualized targeted behaviors (Concepts in personalization and customization,
Implementing individualized experiences)
3. Authorized secure access
4. Approaches to identification and authentication (Implementing secure gated access)
5. Privacy and security concerns
6. System persona (Defining and implementing a system persona, How persona affects
dialogs)
7. System voice audio (TTS or voice talent, generated or recorded, Finding and working
with voice talents, One or several voices, Prompt management)
8. Emotion and style
9. Voice for specific user groups
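The gated-access idea in sub-topic 4 can be sketched as a simple policy check: sensitive intents require an authenticated caller, others work for anyone. The intent names and return values are purely illustrative:

```python
# Gated access sketch: only authenticated users may reach
# sensitive intents. All names are invented for illustration.
SENSITIVE_INTENTS = {"check_balance", "transfer_funds"}

def handle(intent, user_authenticated):
    """Return the next dialog action for this intent and auth state."""
    if intent in SENSITIVE_INTENTS and not user_authenticated:
        return "authenticate_first"   # e.g., prompt for a PIN
    return "fulfill"
```

Centralizing the check keeps privacy-sensitive behavior auditable, rather than trusting each intent handler to remember it.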
PART 4 – Verifying and Deploying Voice System Interactions
Chapter 16: Testing and Measuring Performance in Voice Systems
Chapter goal: Explain the do’s and don’ts of QA testing a voice system. Topics include
user testing methods that work best for voice, the code needed to support them, and how to improve system performance based on findings.
No of pages - 20
Sub-topics
1. Testing voice system performance (Recognition testing, Dialog traversal: functional
end-to-end testing, Wake word and speech detection testing, Additional system
integration testing)
2. Testing usability and task completion (Voice usability testing concepts, Wizard of Oz
studies)
3. Tracking and measuring performance (Recognition performance metrics, Task
completion metrics, User satisfaction metrics)
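A standard recognition performance metric is word error rate (WER): substitutions plus deletions plus insertions, divided by the number of reference words. A self-contained sketch using the classic edit-distance computation:

```python
# Word error rate (WER) = (subs + dels + ins) / reference word count,
# computed via dynamic-programming edit distance over words.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("turn on the light", "turn off the light"))  # -> 0.25
```

One substituted word out of four reference words gives a WER of 25%; a chapter theme is that even a low WER does not by itself guarantee task completion.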
Chapter 17: Tuning and Deploying Voice Systems
Chapter goal: Show how to improve, or tune, voice solutions before and after deploying a voice system. Teach what real user data says about the system performance, what to log and track, how to measure accuracy, and how to interpret the data.
No of pages - 25
Sub-topics
1. Tuning: what is it and why do you do it? (Why recognition accuracy isn’t enough, Analyzing causes of poor system performance)
2. Tuning types and approaches (Log-based versus transcription-based tuning, Coverage
tuning, Recognition accuracy tuning, Finding and using recognition accuracy data,
Task completion tuning, Dialog tuning, Prioritizing tuning efforts)
3. Mapping observations to the right remedy (Reporting and using tuning results)
4. How to maximize deployment success (Know when to tune, Understand tuning
complexities to avoid pitfalls)
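Log-based tuning often starts with a simple question: which dialog states produce the most no-match events? A sketch of that prioritization step, with an invented log format:

```python
from collections import Counter

# Log-based tuning sketch: count no-match events per dialog state
# to prioritize tuning effort. The log records are illustrative.
logs = [
    {"state": "get_size", "result": "no_match"},
    {"state": "get_size", "result": "match"},
    {"state": "get_topping", "result": "no_match"},
    {"state": "get_size", "result": "no_match"},
]

no_match_counts = Counter(
    entry["state"] for entry in logs if entry["result"] == "no_match"
)
for state, count in no_match_counts.most_common():
    print(state, count)   # worst state first
```

Ranking states by failure count directs transcription and coverage-tuning effort where it pays off most, rather than spreading it evenly.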
Article details
- Publisher: Apress
- Authors: Ann Thymé-Gobbel, Charles Jankowski
- Item number: 9781484270059
- Published: May 29, 2021