@speech action

A custom Datastar action plugin that leverages the Web Speech API in order to generate text-to-speech output.

```html
<button data-on:click="@speech('hello world')">
  Listen
</button>
```

#Getting started

The plugin expects you to provide an import map that specifies the location of the datastar module; you can then load the plugin with a <script type="module"> element. For example:

```html
<script type="importmap">
  {
    "imports": {
      "datastar": "https://cdn.jsdelivr.net/gh/starfederation/datastar@1.0.0-RC.6/bundles/datastar.js"
    }
  }
</script>
<script type="module" src="https://cdn.jsdelivr.net/gh/regaez/datastar-speech@main/datastar-speech.min.js"></script>
```

You can view the source code on GitHub.

#Why?

The Web Speech API is generally quite clunky to use. Given its imperative nature, asynchronous callback functions, poor queue-management support, and user-agent inconsistencies, it is not particularly well suited for use within HTML attributes and Datastar expressions, where one ideally wants to write concise, functional code.

The plugin attempts to smooth over some of the rough edges and expose the capabilities of the Web Speech API via a simpler, declarative interface, enabling you to harness text-to-speech functionality more easily with Datastar.

#Examples

For each of these, you can inspect the button element to see how @speech is used.

  1. It accepts string inputs, e.g. hello world.
  2. It accepts number inputs, e.g. 1,928,374.65.
  3. It accepts boolean expressions and values, e.g. true.
  4. It accepts HTML elements and will read their text content.
  5. It can also accept a signal created by the data-ref attribute.
  6. By default, @speech will enqueue the text, so that any existing queued text finishes speaking before the next begins. You may override this behaviour by setting the queue option. Available values are immediate, next, replace, and append (default). Note: replace will remove all other text in the queue.

     Try clicking this button with different queue options selected. It helps to queue up some of the other speech examples first to notice the difference.
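As an illustrative sketch (the button text and input string are hypothetical), overriding the default queue behaviour looks like this:

```html
<!-- 'replace' cancels current playback, clears the queue, and speaks immediately -->
<button data-on:click="@speech('Attention please', { queue: 'replace' })">
  Interrupt everything
</button>
```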

  7. You can also customise the rate, pitch, volume, and voice options. These values can be controlled via the @speechCtrl('configure', { rate, pitch, volume, voice }) action, with signals bound to inputs, if desired. Inspect the <fieldset> element below to see how the data-on-signal-patch attribute is used to keep the properties in sync with signals.

     Due to limitations of the Web Speech API, changing any options during playback will cause the current utterance to restart, as SpeechSynthesisUtterance does not allow these settings to be adjusted on-the-fly.

     If you manually specify a voice option when using @speech, it will be preferred over one set by the @speechCtrl action. This allows you to assign specific voices to utterances that are retained in the queue history.

     For example: click 'Listen' above, change the voice selection, click 'Listen' again, then use the Queue from the previous example to play back both entries. Notice how they retain the voice that was assigned to them when originally queued to play? Compare that with the earlier examples; those should use whichever voice is currently selected (because they did not specify a voice parameter when queued).
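A minimal sketch of adjusting playback options, assuming a $rate signal bound to a range input (the names are illustrative):

```html
<input type="range" min="0.1" max="10" step="0.1" data-bind:rate />
<!-- applies the new rate to the current and any future utterances -->
<button data-on:click="@speechCtrl('configure', { rate: $rate })">
  Apply rate
</button>
```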
  8. You can also specify a lang option. This is useful when the text's language does not match that of the main page, as it indicates to the user agent which type of voice to prefer.

     If this option is not specified, it will try to use the <html lang="..."> attribute value, then fall back to the user-agent's default language preference, similar to the behaviour of SpeechSynthesisUtterance.

     An important note: if you have previously selected a custom voice, that voice will be preferred regardless of lang. In such cases, to let the user agent automatically pick a language-appropriate voice, you must also specify voice: 'prefer-lang' as an option.

     The following, for example, should use a German voice, if your user-agent supports it:

     Ich glaube, dass mein Schwein pfeift.

     You can try changing the language and see how it affects the voice used, and thus the pronunciation, when clicking the button above.
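For instance, the German example above might be wired up like this (a sketch; voice availability depends on your user-agent):

```html
<!-- voice: 'prefer-lang' lets the user agent pick a language-appropriate voice -->
<button data-on:click="@speech('Ich glaube, dass mein Schwein pfeift.', { lang: 'de-DE', voice: 'prefer-lang' })">
  Auf Deutsch
</button>
```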

  9. You can also create custom playback controls using the @speechCtrl action, which accepts the following commands: play, pause, reset, next, previous, and remove.

     You can also listen for the datastar-speech-status event, which is fired during initialisation and after every playback status change. From the event details, you can extract properties such as isPlaying, canPlay, canPause, canReset, hasNext, hasPrevious, etc., and store them as signals. Inspect the HTML of the following controls to see how they're used.
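A sketch of custom transport controls, assuming signals such as $_canPlay, $_canPause, and $_hasNext have been populated from the datastar-speech-status event (the signal names and disabled-attribute binding are illustrative):

```html
<button data-on:click="@speechCtrl('play')" data-attr:disabled="!$_canPlay">Play</button>
<button data-on:click="@speechCtrl('pause')" data-attr:disabled="!$_canPause">Pause</button>
<button data-on:click="@speechCtrl('next')" data-attr:disabled="!$_hasNext">Next</button>
<button data-on:click="@speechCtrl('reset')">Reset</button>
```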

#API

The plugin adds two new actions that you can use within Datastar expressions:

• @speech
• @speechCtrl

#Action @speech

This action is the primary means to start speech, and the only way to enqueue text to be spoken by the Web Speech API. By default, playback will start immediately if the queue is empty or all existing items in the queue have already finished playing.

@speech(input: string|number|boolean|HTMLElement, opts?: SpeechOptions)

input

The input must be one of type string, number, boolean, or HTMLElement. The input text length cannot exceed 32,767 characters (a limitation of the Web Speech API).

opts: SpeechOptions

This argument is not required, and all SpeechOptions properties are optional. If not set, they fall back to the SpeechSynthesisUtterance default values.

Properties

queue
'append' | 'immediate' | 'next' | 'replace'; when replace, it cancels current playback, clears the existing speech queue, and plays the given text input immediately.
lang
string; a string representing a BCP 47 language tag. This should match the language of the input text, so that an appropriate SpeechSynthesisVoice is used. If not set, it will use the value of the HTML lang attribute, or the user-agent's language.
voice
string; the name of an available SpeechSynthesisVoice. Note: the list of available voices is different for every user agent, and many sound terrible, so it is usually best not to specify a voice.

In order to use this option reliably, you should almost always refer to the list of available voices, returned by window.speechSynthesis.getVoices() or the datastar-speech-voices-loaded custom event, rather than hardcoding a value. An exception: when you have manually specified the lang option, you should also specify voice: 'prefer-lang'; this custom value will override voice selections configured via the @speechCtrl action and let the user-agent pick a language-appropriate voice automatically.
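Putting the options together, a sketch (the element, text, and option values are illustrative):

```html
<!-- queue after the current utterance, hinting at a British English voice -->
<button data-on:click="@speech('The quick brown fox', { queue: 'next', lang: 'en-GB' })">
  Queue next
</button>
```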

#Action @speechCtrl

This action enables you to control playback, and is the primary means to adjust the playback options of the SpeechSynthesisUtterance.

When invoked, it will automatically apply the new configuration to the existing utterance and any future utterances. If an utterance is currently being played, it will restart immediately with the new settings, as the Web Speech API does not expose the capability to change these properties on-the-fly.

@speechCtrl(command: string, opts?: SpeechCtrlOptions)

command

The command must be one of the following string values:

play
This will start playback, or resume (if paused). You may also, optionally, pass an index property within the SpeechCtrlOptions argument to play a specific utterance stored in the queue. If the index is invalid for the current queue, playback will resume as if no index had been provided (i.e. it will restart the current item, or play the next item in the queue).
pause
This will pause playback. If the user-agent supports it, playback will resume from the exact moment it was paused; otherwise, the utterance will be restarted. If nothing is playing, this command has no effect.
reset
This will stop current playback immediately and clear the playback queue. The queue state cannot be restored once reset.
next
This will play the next item in the queue immediately. If there is no next item, this command has no effect.
previous
This will play the previous item in the queue immediately. If there is no previous item, this command has no effect.
remove
Removes an item from the queue. It requires an index property to be provided within the SpeechCtrlOptions argument. If the index is invalid for the current queue, this command has no effect.
configure
Adjusts the playback settings of the SpeechSynthesisUtterance. You may, optionally, pass the following properties within the SpeechCtrlOptions argument: rate, pitch, volume, and voice. If not specified for any given invocation of the @speechCtrl action, these properties retain their previously configured value, or fall back to their defaults.
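For example, the index-based commands might be used like this (a sketch; the indices and button labels are illustrative):

```html
<!-- replay the first utterance in the queue, or remove the third -->
<button data-on:click="@speechCtrl('play', { index: 0 })">Replay first</button>
<button data-on:click="@speechCtrl('remove', { index: 2 })">Remove third</button>
```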

opts: SpeechCtrlOptions

This argument is generally not required, unless the command being invoked expects additional parameters, i.e. play, remove, or configure. All SpeechCtrlOptions properties are optional; if not set, invoking the action has no effect on the respective properties.

Properties

index
number; zero-based, targets a specific item in the queue. Only applicable to the play and remove commands. It will otherwise be ignored.
rate
number; adjusts the speed of the utterance. Valid values range from 0.1 to 10. Defaults to 1. Only applicable to the configure command.
pitch
number; adjusts the pitch of the utterance. Valid values range from 0 to 2. Defaults to 1. Only applicable to the configure command.
volume
number; adjusts the volume of the utterance. Valid values range from 0 to 1. Defaults to 1. Only applicable to the configure command.
voice
string; the name of an available SpeechSynthesisVoice. Only applicable to the configure command. Note: the list of available voices is different for every user agent, and many sound terrible, so it is usually best not to specify a voice.

In order to use this option reliably, you should almost always refer to the list of available voices, returned by window.speechSynthesis.getVoices() or the datastar-speech-voices-loaded custom event, rather than hardcoding a value.

#Custom Events

The plugin emits the following events to notify you of internal state changes within the plugin and the Web Speech API:

• datastar-speech-status
• datastar-speech-voices-loaded

You can hook onto these using the standard data-on attribute and, if you wish, store any relevant details in your own signals so your UI can react to them accordingly.

#Event datastar-speech-status

This custom event is dispatched on the window once the plugin has initialised, and each time the playback state of any queued speech changes.

Properties

The following fields can be found within the evt.detail object:

isPlaying
boolean; indicates whether an utterance is currently being played.
canPlay
boolean; indicates whether it is possible to start/resume playback, i.e. an utterance is currently paused, or there is at least one utterance remaining in the queue.
canPause
boolean; indicates whether it is possible to pause an utterance, i.e. an utterance is currently playing.
canReset
boolean; indicates whether it is possible to stop playback and clear the utterance queue.
hasNext
boolean; indicates whether there is an item in the queue after the utterance currently being played.
hasPrevious
boolean; indicates whether there is an item in the queue before the utterance currently being played.
queue
string[]; a list containing the text of all utterances that have been queued to play. This also includes previously-ended utterances, unless the queue has since been replaced or reset.
index
number; the position within the playback queue, i.e. which utterance is currently being played, or was last played.

Example

Here is an example using object destructuring, with property assignment, to store each event property in its own local signal, updating them each time a status event is emitted:

```html
<div data-on:datastar-speech-status__window="({
  isPlaying: $_isPlaying,
  canPlay: $_canPlay,
  canPause: $_canPause,
  canReset: $_canReset,
  hasNext: $_hasNext,
  hasPrevious: $_hasPrevious,
  queue: $_queue,
  index: $_index
} = evt.detail)"></div>
```
    
Pay particular attention to the surrounding parentheses, i.e. ({...} = obj), as these are necessary for the destructuring assignment to evaluate as a valid expression.

It is worth noting that you do not have to destructure all properties from evt.detail, only those you actually want to use; you have complete flexibility to use as many, or as few, signals as you desire.

It is also important to understand that the signal binding is one-way; changing the values of any of these signals yourself will not affect the equivalent internal state of the datastar-speech plugin.
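For instance, the signals captured in the destructuring example could drive a single play/pause toggle (a sketch; the data-text and data-attr attribute bindings are assumed to be available in your Datastar build):

```html
<!-- label and action flip depending on whether an utterance is playing -->
<button
  data-attr:disabled="!$_canPlay && !$_canPause"
  data-text="$_isPlaying ? 'Pause' : 'Play'"
  data-on:click="$_isPlaying ? @speechCtrl('pause') : @speechCtrl('play')"
></button>
```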

#Event datastar-speech-voices-loaded

This custom event is dispatched on the window when the plugin has detected that the Web Speech API has finished loading the SpeechSynthesisVoice list.

Properties

detail
SpeechSynthesisVoice[]; the list of voices that are available for this user agent. Refer to the SpeechSynthesisVoice documentation for more information.

Example

You could use this event with data-on, data-bind, and a custom function to populate a select element's options and assign the value to a signal:

```html
<select
  data-bind:voice
  data-on:datastar-speech-voices-loaded__window="appendVoiceOptions(el, evt.detail);"
></select>
```
    
To see this page's implementation of the appendVoiceOptions() function, open up the developer tools console and enter: exampleJS.
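The appendVoiceOptions() helper is specific to this page; a hypothetical sketch of such a function (the page's actual implementation may differ) could look like:

```html
<script type="module">
  // Hypothetical helper: fills a <select> with one <option> per available voice.
  window.appendVoiceOptions = (el, voices) => {
    for (const voice of voices) {
      const option = document.createElement("option");
      option.value = voice.name;
      option.textContent = `${voice.name} (${voice.lang})`;
      el.append(option);
    }
  };
</script>
```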