GridTips: Enhancing User Experience With The Web Speech API

By Ruth John

December 5th, 2014

It’s an exciting time for web APIs, and one to watch out for is the Web Speech API. It enables websites and web apps not only to speak to you, but to listen, too. It’s still early days, but this functionality is set to open a whole array of use cases. I’d say that’s pretty awesome.

In this article, we’ll look at the technology and its proposed usage, as well as some good examples of how it can be used to enhance the user experience.

Disclaimer: This technology is pretty cutting-edge, and the specification is currently with the W3C as an “unofficial editor’s draft” (as of 6 June 2014). The likelihood that usage will differ slightly from the code snippets in this article is high. Checking the specification³ and testing thoroughly before releasing code are always wise.

Speech Synthesis

The API comes in two parts. To start, let’s look at the speech synthesis part, the bit that speaks to you. If your website has some textual content (body copy, form inputs, alt attributes, etc.), you could run some simple functions and the device would speak the words to the user.

Let’s look at some of the code needed to make this happen. First, we would create a new instance of the SpeechSynthesisUtterance interface. Then, we would specify the text to be spoken. Then, we would add this instance to a queue, which tells the browser what to speak and when.

Below I have wrapped all of this in a function for us to call, named speak, with the text we want spoken as a parameter.

function speak(textToSpeak) {
 // Create a new instance of SpeechSynthesisUtterance
 var newUtterance = new SpeechSynthesisUtterance();

 // Set the text
 newUtterance.text = textToSpeak;

 // Add this text to the utterance queue
 window.speechSynthesis.speak(newUtterance);
}

All we need to do now is call this function and pass in some words to be spoken:

speak('Welcome to Smashing Magazine');

More functionality is included in the API. You can pause, resume and cancel the queue through window.speechSynthesis, as well as set the language, rate and voice for each utterance on SpeechSynthesisUtterance. Starting, pausing, resuming or finishing an utterance fires an event that you can hook into, and a voiceschanged event fires when the list of available voices changes. Plenty to play around with!
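As a rough sketch of those settings, here is one way the speak function could be extended. The name speakWithOptions, its options object and the injected synth and Utterance parameters are my own assumptions for illustration; only lang, rate, voice, onstart, onend and the speak(), pause(), resume() and cancel() methods come from the API itself.

```javascript
// Sketch: speak some text with a few optional settings.
// `synth` and `Utterance` are passed in to keep the function testable;
// in a browser they are window.speechSynthesis and SpeechSynthesisUtterance.
function speakWithOptions(synth, Utterance, textToSpeak, options) {
  options = options || {};
  var utterance = new Utterance();
  utterance.text = textToSpeak;

  // lang is a language tag such as 'en-GB'; a rate of 1 is normal speed
  if (options.lang) { utterance.lang = options.lang; }
  if (options.rate) { utterance.rate = options.rate; }
  if (options.voice) { utterance.voice = options.voice; }

  // Hook into events fired as the utterance is spoken
  utterance.onstart = options.onStart || null;
  utterance.onend = options.onEnd || null;

  // Add the utterance to the queue
  synth.speak(utterance);
  return utterance;
}

// Browser usage, guarded so the snippet does nothing where the API is missing
if (typeof window !== 'undefined' && window.speechSynthesis) {
  speakWithOptions(window.speechSynthesis, window.SpeechSynthesisUtterance,
    'Welcome to Smashing Magazine', { lang: 'en-GB', rate: 1.2 });
  // The queue itself is controlled on speechSynthesis:
  // window.speechSynthesis.pause(), .resume() and .cancel()
}
```

Passing the synthesizer in as a parameter is purely a convenience for trying the function outside a browser; calling window.speechSynthesis directly works just as well.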

At the moment, speech synthesis is supported only in Chrome and Safari (both on desktop and mobile devices). Also, the voices available to you through the API largely depend on the operating system. Google has its own set of default voices for Chrome, available on Mac OS X, Windows and Ubuntu. However, Mac OS X’s voices are also available and, thus, are the same as in Safari on OS X. You can easily see which voices are available in the Developer Tools console:

window.speechSynthesis.getVoices();

Tip: If you’re on OS X, check out the voice “Zarvox.”
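One thing to watch for: in Chrome the voice list loads asynchronously, so getVoices() can return an empty array the first time it is called. The sketch below lists the voices again when the voiceschanged event fires. The helper findVoiceByName is a hypothetical name of my own, not part of the API.

```javascript
// Return the first voice whose name matches, or null if none does.
function findVoiceByName(voices, name) {
  for (var i = 0; i < voices.length; i++) {
    if (voices[i].name === name) { return voices[i]; }
  }
  return null;
}

// Browser usage, guarded so the snippet does nothing where the API is missing
if (typeof window !== 'undefined' && window.speechSynthesis) {
  var synth = window.speechSynthesis;
  var logVoices = function () {
    var voices = synth.getVoices();
    voices.forEach(function (voice) {
      console.log(voice.name + ' (' + voice.lang + ')');
    });
    // Logs the voice object, or null if it is not installed
    console.log(findVoiceByName(voices, 'Zarvox'));
  };

  // Try straight away, then again once the list has loaded
  logVoices();
  synth.onvoiceschanged = logVoices;
}
```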

Speech Recognition

The other part of the Web Speech API is speech recognition, which enables a user to speak into the device’s microphone and have their speech recognized by a website or web app.

Let’s run through some code. This time, we’ll create a new instance of the SpeechRecognition interface. Because this part is supported only in Chrome, we’ll have to include the webkit prefix.

var newRecognition = new webkitSpeechRecognition();
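Because support is limited, it is worth checking that the constructor exists before using it. The following is a minimal sketch; getSpeechRecognition is a hypothetical helper name, and it assumes that browsers dropping the prefix will expose the interface as SpeechRecognition, as the specification names it.

```javascript
// Return the recognition constructor if the environment has one, else null.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null;
}

// Browser usage, guarded so the snippet does nothing outside a browser
if (typeof window !== 'undefined') {
  var Recognition = getSpeechRecognition(window);
  if (Recognition) {
    var detectedRecognition = new Recognition();
    console.log('Speech recognition is available', detectedRecognition);
  } else {
    console.log('Speech recognition is not supported in this browser');
  }
}
```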

SpeechRecognition comes with quite a few attributes. One that you are likely to change is continuous, whose default state of false means that the browser will stop listening after a break in speech. If you want your website or web app to keep listening, then set the attribute to true:

newRecognition.continuous = true;

To start and stop speech recognition, call the start() and stop() methods:

// start recognition
newRecognition.start();

// stop recognition
newRecognition.stop();

Again, we can hook into plenty of events, such as soundstart, speechstart, result and error. I have prepared a demo⁴ that shows how to access the words detected, from the result event. The code goes on to match the words spoken against some simple navigation, activating the appropriate link if detected.
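To give a feel for the result event, here is a minimal sketch of reading the detected words and matching them against navigation labels. It is not the code from the demo: the helper names transcriptFromResults and matchNavigation and the labels Home, About and Contact are assumptions for illustration. The results list, resultIndex and transcript properties are part of the API.

```javascript
// Collect the top transcript of each result from `startIndex` onward.
function transcriptFromResults(results, startIndex) {
  var words = [];
  for (var i = startIndex; i < results.length; i++) {
    // results[i][0] is the most likely alternative for that result
    words.push(results[i][0].transcript);
  }
  return words.join(' ').trim().toLowerCase();
}

// Return the first navigation label mentioned in the transcript, or null.
function matchNavigation(transcript, labels) {
  for (var i = 0; i < labels.length; i++) {
    if (transcript.indexOf(labels[i].toLowerCase()) !== -1) {
      return labels[i];
    }
  }
  return null;
}

// Browser wiring, guarded so the snippet does nothing where the API is missing
if (typeof window !== 'undefined' && window.webkitSpeechRecognition) {
  var navRecognition = new window.webkitSpeechRecognition();
  navRecognition.onresult = function (event) {
    var heard = transcriptFromResults(event.results, event.resultIndex);
    var label = matchNavigation(heard, ['Home', 'About', 'Contact']);
    if (label) {
      console.log('Would follow the link labelled ' + label);
    }
  };
  // Call navRecognition.start() from a click handler to begin listening
}
```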

Uses

Dictation

At the moment, the most common use of the Speech API is as a dictation or reading mechanism. That is, the user speaks into the mic and the device translates the speech into text (as demoed by Chrome’s development team⁵), or the user passes in text to be read out by the device.

Having a device speak out some information really has its advantages. Imagine your mirror telling you what the weather will be like first thing in the morning.

Plenty of car manufacturers have installed text-to-speech capabilities over the last couple of years. Imagine, in the not-too-distant future, your browser’s reading list being read out to you as you drive.

Voice Control

Dictation could easily be turned into voice control, as we saw with the recognition demo above, which could be modified to allow for navigation around a website. Add it to web-enabled TVs and we might just be living in the 2015 of Back to the Future 2.

I’m fortunate to work with some really talented colleagues, one of whom created a tennis scoring app. I was delighted to find that he could control the app with his voice, speaking the score out loud as he was playing the game.

Translation

Translation would look very different when done in real time. Someone could converse in one language, and another person’s device would speak out what is being said in their own language. Hook that up to a Bluetooth ear piece and eat your heart out, Arthur Dent⁶. We’re getting a little closer to each person having their own Babel fish⁷.

Limitations

Offline capability needs more consideration. As it stands, Chrome sends the recorded audio to its servers and pings back the result. Thus, an Internet connection is needed for it to work, which is not ideal.
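Until that changes, the error event at least lets us tell the user what went wrong. Here is a small sketch; the function name describeRecognitionError and the message wording are my own assumptions, while the error codes themselves (network, no-speech, audio-capture, not-allowed) are defined in the specification.

```javascript
// Map a recognition error code to a message we could show the user.
function describeRecognitionError(code) {
  var messages = {
    'network': 'Speech recognition needs an Internet connection.',
    'no-speech': 'No speech was detected. Please try again.',
    'audio-capture': 'No microphone was found.',
    'not-allowed': 'Permission to use the microphone was denied.'
  };
  if (Object.prototype.hasOwnProperty.call(messages, code)) {
    return messages[code];
  }
  return 'Speech recognition failed (' + code + ').';
}

// Browser wiring, guarded so the snippet does nothing where the API is missing
if (typeof window !== 'undefined' && window.webkitSpeechRecognition) {
  var guardedRecognition = new window.webkitSpeechRecognition();
  guardedRecognition.onerror = function (event) {
    // event.error holds the error code as a string
    console.log(describeRecognitionError(event.error));
  };
}
```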

Conclusion

Nevertheless, it is still exciting, and the technology is opening up. I look forward to the day when looking for the remote is a thing of the past, and we can just tell the TV to stream the latest Sin City movie.

Would we actually use the web for this? Why not? It’s already universal. You can take the web and the speech wherever you go.

I have met some resistance when talking about this API. People either can’t see the need for it with the web, or they would feel uncomfortable talking to their device; both are valid views. However, I hope I have inspired you to at least give it a go and think about it the next time you are building something. Start welcoming speech: It might be just what you’re listening for.

(ml, al, il)

Footnotes

1 http://slides.com/schold/web-speech-api#/

2 http://slides.com/schold/web-speech-api#/

3 https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html

4 http://codepen.io/Rumyra/pen/bCphe

5 https://www.google.com/intl/en/chrome/demos/speech.html

6 http://en.wikipedia.org/wiki/Arthur_Dent

7 http://en.wikipedia.org/wiki/List_of_races_and_species_in_The_Hitchhiker%27s_Guide_to_the_Galaxy#Babel_fish
