`data-highlight` attribute

A custom Datastar attribute plugin that uses the CSS Custom Highlight API to add syntax highlighting to elements.

<pre data-highlight="json">
  {
    "foo": "bar",
    "baz": false
  }
</pre>

The example above is, in fact, using the plugin to style the HTML markup in the code snippet!

#Getting started

The plugin expects you to provide an import map that specifies the location of the datastar module, as well as any languages you want to support. To include languages, you must create an import mapping for each, using the module name format: data-highlight:<lang_code>.

Then, it's a simple matter of including a <script type="module"> element for the plugin and a <link rel="stylesheet"> with the relevant CSS. For example, the following configuration will enable highlighting for css, json, and html:

<script type="importmap">
  {
    "imports": {
      "datastar": "https://cdn.jsdelivr.net/gh/starfederation/datastar@1.0.0-RC.6/bundles/datastar.js",
      "data-highlight:css": "https://cdn.jsdelivr.net/gh/regaez/data-highlight@0.1.0/dist/tokenizers/css.js",
      "data-highlight:json": "https://cdn.jsdelivr.net/gh/regaez/data-highlight@0.1.0/dist/tokenizers/json.js",
      "data-highlight:html": "https://cdn.jsdelivr.net/gh/regaez/data-highlight@0.1.0/dist/tokenizers/html.js"
    }
  }
</script>
<link href="https://cdn.jsdelivr.net/gh/regaez/data-highlight@0.1.0/data-highlight.css" rel="stylesheet">
<script type="module" src="https://cdn.jsdelivr.net/gh/regaez/data-highlight@0.1.0/dist/data-highlight.js"></script>

You can view the source code on GitHub. Check out the tokenizers directory to view all currently available language tokenizers.

#Examples

You can inspect the HTML to see how each of the examples works.

You can style text simply by adding the data-highlight attribute to any element that only contains text nodes. It is recommended to use pre elements with pre-formatted text.

For example, here are various supported syntaxes:

HTML

<!doctype html>
<html lang="en">
  <head>
    <title>Test</title>
    <meta charset="utf-8" />
    <style>
      body {
        background: red;
      }
    </style>
    <script type="importmap">
      {
        "imports": {
          "foo": "/foo.js"
        }
      }
    </script>
  </head>
  <body>
    <!-- this is a comment -->
    <main id="main" class='example'>
      <layout-center intrinsic>
        <h1>Page title</h1>
        <hr/>
        <input type="text" name="text" />
        <h2 id=title>Sub heading</h2>
        <p>My cat is <strong>very</strong> grumpy.</p>
        <a href=https://www.example.com title="Isn't this fun?">Link</a>
        <INPUT type=text/>
      </layout-center>
    </main>
    <script>
      let foo = "bar";
    </script>
  </body>
</html>

CSS

code {
  font-family: var(--font-monospace);
  font-size: 0.8em;
  font-weight: 400;
  color: var(--color-green);
  border: 1px dotted oklch(from var(--color-green) l c h / 0.35);
  padding: 0.15em 0.25em;
  border-radius: 4px;
}

JSON

{
  "foo": "bar",
  "arr": [1, 2, -3],
  "obj": {
    "baz": true,
  },
  "err": null
}

The highlighting updates dynamically whenever the text node changes, for example when a block of JSON is re-rendered every second.

The plugin includes some custom first-party tokenizers, but also provides a compatibility layer to support PrismJS, if you wish to use that instead. This enables you to take advantage of the many language tokenizers available for Prism.

You can toggle between tokenizer implementations for the following example, if you wish:
```
{
  "foo": "bar",
  "baz": [1, 2, true],
  "qux": {
    "zulu": false,
    // a comment
    "alpha": null, // another comment
    "beta": 1.5
  }
}
```
The highlighting changes because the tokenizers assign different type values to certain ranges of text, so those ranges are mapped to different CSS rules. None of the CSS itself changed.

Please refer to the documentation for more information on how to integrate with PrismJS.

While input and textarea elements are not supported by the CSS Custom Highlight API, a possible workaround is to use data-highlight on a pre element that is overlaid on top of a (mostly) invisible textarea. With a little bit of CSS, and a couple of Datastar signals, you can (fairly quickly) get pretty close to mimicking a textarea with syntax highlighting.

Unfortunately, whenever data-text replaces the text node on each keypress, it results in an unavoidable, sometimes imperceptible, flash of unstyled content (FOUC), as the highlighted range was assigned to the previous text node. The highlighting returns as soon as the new text node has been processed by the plugin.

In order to mitigate this, the plugin also includes a new data-highlight-text attribute. This acts similarly to data-text but contains some extra logic to split text on newlines and replace/append text data within existing text nodes, rather than replace the text node entirely. This helps retain any existing highlighting on text that did not change, thus reducing the resultant FOUC to new text. You likely won't even notice it.

However, for most use cases, you should probably continue using data-text where possible, as it is more performant (by virtue of simply doing less). The data-highlight-text attribute is only really intended to be used alongside data-highlight when there is frequently updating text content.

You can also use the contenteditable="plaintext-only" attribute on elements and benefit from real-time highlighting.

For example, an editable element could contain the following CSS:
```
.foo {
  color: red;
}
```

A caveat to this is that cross-browser behaviour is somewhat inconsistent. If you use contenteditable="true", the browser will let you use shortcuts such as Ctrl+B to bold text. When this happens, new elements are inserted as child nodes alongside the text nodes. This breaks highlighting, as the plugin only operates on elements that contain nothing but text nodes.

#Documentation

The plugin adds two new attributes that you can use on elements:

data-highlight
data-highlight-text

#Why use this?

The plugin is a flexible implementation of syntax highlighting that can adapt to dynamic content changes in the DOM. Do you use the data-json-signals attribute often? If so, you might find this plugin synergises nicely with it.

The plugin utilises the CSS Custom Highlight API so you only need to provide the raw text; no need to wrap your code snippets in dozens of additional span elements. This makes the plugin ideal when you are serving a static HTML page and/or do not have a backend set up to parse code snippets and render them with syntax highlighting.

In order to stay lightweight and take advantage of resource caching, the plugin is designed to be extensible: it loads tokenizers dynamically, on demand, and serves each tokenizer as its own module. This means that the client will only download tokenizers for languages that you have actually used on the page, keeping the footprint as small as possible, while also enabling you to include custom tokenizers for any language you wish to support.
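
The on-demand loading described above could be sketched roughly as follows. This is an illustration only, not the plugin's actual source: createTokenizerLoader, Tokenizer, TokenizerModule and the injected importer parameter are hypothetical names. The importer is passed in so the sketch can run outside a browser; on a real page it would be a dynamic import(), which resolves the data-highlight:<lang_code> specifier through the import map.

```typescript
type Token = { type: string; start: number; end: number; value: string };
type Tokenizer = (input: string, language: string) => Token[];
type TokenizerModule = { default: Tokenizer };

// Hypothetical helper: returns a function that requests each language's
// tokenizer module at most once and caches the pending promise.
function createTokenizerLoader(
  importer: (specifier: string) => Promise<TokenizerModule>
): (language: string) => Promise<Tokenizer> {
  const cache = new Map<string, Promise<Tokenizer>>();

  return (language) => {
    let pending = cache.get(language);
    if (pending === undefined) {
      // Module names follow the data-highlight:<lang_code> format.
      pending = importer(`data-highlight:${language}`).then((m) => m.default);
      cache.set(language, pending);
    }
    return pending;
  };
}

// In a browser, the import map resolves the bare specifier:
// const loadTokenizer = createTokenizerLoader((specifier) => import(specifier));
```

Caching the promise rather than the resolved function means that two elements requesting the same language at the same time share a single download.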

Another powerful advantage of this method of highlighting is that it is easily customisable with CSS. The plugin provides some sample CSS to get you started, but you remain in full control. It uses the ::highlight() pseudo-element, meaning you can support multiple themes, change them on-the-fly, and easily add additional rules for any token types your custom tokenizers may return. No need to worry about trying to override baked-in inline styles on span elements.

#Drawbacks

There are, unfortunately, some drawbacks to the plugin's implementation with the CSS Custom Highlight API. Some are unavoidable, others could perhaps be improved upon.

Requires a modern browser: the CSS Custom Highlight API is supported by current versions of all major browsers, but if you need to support older browsers you will want to consider a different approach.
Flashes of unstyled content: there will be no styling before the plugin has processed the text on first page load, and FOUCs will occur any time the text node updates. This is unfortunately unavoidable.
Increased client-side processing: if your code snippet is static and never changes, then processing the syntax highlighting once server-side and sending static HTML will be more performant, compared to re-calculating each time on the client.
Limited set of stylable CSS properties: this is a limitation of the ::highlight() pseudo-element. Only a small subset of CSS properties can be used. If you need to apply other styles while highlighting, this plugin will not be suitable.
Cannot style inputs or textareas: this is a current limitation of the CSS Custom Highlight API. Perhaps in the future such functionality will be included in the specification. In the meantime, you could consider using the overlay approach, as demonstrated above.
Only operates on text nodes: this decision was made in order to keep the plugin simple, reliable, and performant. Ranges can start and end on different nodes, so the plugin will attempt to highlight text that may spread across multiple text nodes.
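
To illustrate the last point: a tokenizer works on the element's combined text, so a token offset has to be translated into a text node plus a local offset before a Range can be built. The following is a minimal sketch under that assumption, not the plugin's code. locateOffset and NodePosition are hypothetical names, and the text node lengths are passed as plain numbers so the sketch can run outside a browser.

```typescript
type NodePosition = { nodeIndex: number; offset: number };

// Hypothetical helper: maps an offset in the concatenated text onto the index
// of the text node that contains it, plus the offset inside that node.
function locateOffset(nodeLengths: number[], globalOffset: number): NodePosition | null {
  if (globalOffset < 0) return null;
  let consumed = 0;
  let nodeIndex = 0;
  for (const length of nodeLengths) {
    // An offset equal to the node's length is accepted, because a Range
    // boundary may sit directly after a node's last character.
    if (globalOffset <= consumed + length) {
      return { nodeIndex, offset: globalOffset - consumed };
    }
    consumed += length;
    nodeIndex++;
  }
  return null; // the offset lies beyond the available text
}

// In a browser, the two positions of a token would then feed a Range:
// range.setStart(textNodes[start.nodeIndex], start.offset);
// range.setEnd(textNodes[end.nodeIndex], end.offset);
```

With text nodes of length 5, 3 and 4, a token spanning offsets 6 to 10 starts in the second node and ends in the third, which a single Range can express.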

#Attribute `data-highlight`

To use the data-highlight attribute, simply pass the language as the attribute's key or value and ensure the element only contains text nodes (i.e. no nested elements) with the code snippet you want highlighted:

<!-- using the attribute key -->
<pre data-highlight:json>{ "foo": "bar" }</pre>

<!-- or, using the value -->
<pre data-highlight="json">{ "foo": "bar" }</pre>

If the element with the data-highlight attribute contains nested elements, the plugin will return early and not apply any highlights to that element.

The value of the attribute must be a language string; it is not a Datastar expression. However, you are still able to dynamically change the target language with a signal, if you wish. You can compose it with the standard data-attr attribute to set the value whenever the signal changes, for example:

<!-- assuming the $language signal is defined elsewhere -->
<pre data-attr:data-highlight="$language"></pre>

Modifiers

The attribute supports the following modifiers:

__debug

When included, this will trigger the plugin to log the input, language code, and array of tokens to the browser developer console, each time the tokenizer is invoked for that local use-case.

This can be particularly useful when you are customising your CSS theme and are trying to identify which token types were assigned to certain text ranges.

Examples: data-highlight:html__debug or data-highlight__debug="html"

#Attribute `data-highlight-text`

The data-highlight-text attribute functions similarly to data-text, except it is specifically written to synergise better with data-highlight by retaining existing highlights, where possible, when updating text. This helps to avoid flashes of unstyled content after an update.

It accomplishes this by splitting text on every new line, creating distinct text nodes for each line, and then later re-using those same nodes when possible, such that previous highlight ranges can be repurposed, rather than orphaned and cleaned up.

For example:

<!-- Prints all signals, with json syntax highlighting: -->
<pre
  data-highlight:json
  data-highlight-text="JSON.stringify($, null, 2)">
</pre>
<!-- which is equivalent to: -->
<pre data-highlight:json data-json-signals></pre>

This attribute is ideal for content that updates with a high frequency, for example signals that change on short intervals, or when handling text input, where new highlighting may be applied with every keystroke.

For static or infrequently changing content, it is likely not necessary; in these instances, writing the text directly into the HTML, or using the data-text attribute, should suffice.
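
The line-wise node re-use described in this section could be sketched as a small planning step. This is a simplified illustration, not the plugin's implementation: planLineUpdates and LineUpdate are hypothetical names, and how the real attribute stores the newline characters is not covered here.

```typescript
type LineUpdate =
  | { kind: "keep"; index: number }
  | { kind: "update"; index: number; text: string }
  | { kind: "append"; index: number; text: string }
  | { kind: "remove"; index: number };

// Hypothetical helper: compares the lines currently held in per-line text
// nodes with the new text and decides, per line, whether the existing node
// can stay untouched, needs new data, must be added, or is surplus.
function planLineUpdates(previousLines: string[], nextText: string): LineUpdate[] {
  const nextLines = nextText.split("\n");
  const updates: LineUpdate[] = [];
  const total = Math.max(previousLines.length, nextLines.length);

  for (let index = 0; index < total; index++) {
    const before = previousLines[index];
    const after = nextLines[index];
    if (after === undefined) {
      updates.push({ kind: "remove", index });
    } else if (before === undefined) {
      updates.push({ kind: "append", index, text: after });
    } else if (before === after) {
      updates.push({ kind: "keep", index }); // node and its highlights survive
    } else {
      updates.push({ kind: "update", index, text: after });
    }
  }
  return updates;
}
```

Lines marked "keep" never have their text node touched, which is why highlights on unchanged lines do not flash.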

#Using PrismJS

You may very well wish to use PrismJS, as it supports a large assortment of languages, far more than the data-highlight plugin will ever have first-party tokenizers for. Thus, the plugin provides a compatibility layer to easily leverage the tokenizers from PrismJS.

In order to use PrismJS, you must include the following script tag somewhere on the page, pointing to the relevant JS file:

<script src="/path/to/prism.js" data-manual></script>

Note that you won't be able to use their autoloader script, but should rather download your own custom bundle with the necessary languages you wish to support from the PrismJS download page.

It is also important that you include the data-manual attribute on the script tag, to indicate to Prism that it should not parse the page and try to apply any highlights itself. The data-highlight plugin manages that instead and simply uses the language tokenizers from PrismJS.

Finally, you must add the Prism compatibility script in your import map, with a module entry for each language that you wish to use. The module names should follow the same format of data-highlight:<lang_code> for each supported language, but all should point to the same script. The language codes must match those defined by Prism, per its supported languages section, in order for the plugin to be able to identify the correct tokenizer.

For example, the following enables support for JavaScript and Rust via Prism:

{
  "imports": {
    // ... your other imports
    "data-highlight:js": "https://cdn.jsdelivr.net/gh/regaez/data-highlight@0.1.0/dist/tokenizers/prism.js",
    "data-highlight:rust": "https://cdn.jsdelivr.net/gh/regaez/data-highlight@0.1.0/dist/tokenizers/prism.js"
  }
}

Then it's as simple as using the language code with the data-highlight attribute, like any other:

<pre>
  <code data-highlight:js>
    let greeting = "hello world";
    alert(greeting);
  </code>
</pre>

Which would output:

let greeting = "hello world";
alert(greeting);

There is no need to include the Prism CSS file, as it will not be used.
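
The compatibility layer itself is not reproduced here. As a rough, hypothetical sketch of the kind of conversion such a layer has to perform, the code below flattens a Prism-style token stream (plain strings mixed with typed, possibly nested tokens, as returned by Prism.tokenize) into flat tokens with start and end offsets. PrismLike, textOf and flattenPrismTokens are illustrative names, and emitting both a parent token and its children is an assumption, not necessarily what the plugin does.

```typescript
type Token = { type: string; start: number; end: number; value: string };

// Structural stand-in for the entries of a Prism token stream: plain strings
// for untyped text, objects for typed (and possibly nested) tokens.
type PrismLike = string | { type: string; content: PrismLike | PrismLike[] };

// Recovers the source text covered by a stream entry.
function textOf(node: PrismLike | PrismLike[]): string {
  if (typeof node === "string") return node;
  if (Array.isArray(node)) return node.map((child) => textOf(child)).join("");
  return textOf(node.content);
}

// Walks the stream while tracking the running character position.
function flattenPrismTokens(stream: PrismLike[], offset = 0, out: Token[] = []): Token[] {
  let position = offset;
  for (const entry of stream) {
    const value = textOf(entry);
    if (typeof entry !== "string") {
      out.push({ type: entry.type, start: position, end: position + value.length, value });
      // Nested tokens share the parent's starting position.
      const children = entry.content;
      if (Array.isArray(children)) {
        flattenPrismTokens(children, position, out);
      } else if (typeof children !== "string") {
        flattenPrismTokens([children], position, out);
      }
    }
    position += value.length;
  }
  return out;
}
```

Untyped strings produce no token of their own but still advance the position, so the offsets of later tokens stay aligned with the original text.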

#Custom tokenizers

In order to create your own custom tokenizer, you must create an ES module which exports a default function that satisfies the following signature:

type Token = {
  type: string; // The type should map to a CSS ::highlight() pseudo-element name
  start: number; // The index of the first character of the token
  end: number; // The index of the first character AFTER the token has ended
  value: string; // The text slice of the input between the start and end indices
};

export default function(
  input: string, // The element's textContent value
  language: string // The language as specified by the data-highlight attribute key/value
): Token[]

As a tokenizer can return any string value for the type field, you may also need to extend your CSS to handle styling any custom tokens that aren't covered by the plugin's provided styles, or your existing stylesheet. See the custom themes section for more information.
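
As an illustration of the signature above, here is a minimal sketch of a tokenizer for a made-up "env" format (KEY=value lines and # comments). The language code env, the function name tokenizeEnv, and the token type names comment, property, operator and string are all assumptions chosen for the example; whether the provided stylesheet has rules for those type names is not confirmed here.

```typescript
type Token = { type: string; start: number; end: number; value: string };

// Hypothetical tokenizer for KEY=value lines with # comments.
export default function tokenizeEnv(input: string, _language: string): Token[] {
  const tokens: Token[] = [];
  // Records a token unless the range is empty.
  const push = (type: string, start: number, end: number): void => {
    if (end > start) tokens.push({ type, start, end, value: input.slice(start, end) });
  };

  let lineStart = 0; // index of the current line's first character in input
  for (const line of input.split("\n")) {
    const firstChar = line.search(/\S/); // -1 for blank lines
    if (firstChar !== -1) {
      if (line.charAt(firstChar) === "#") {
        push("comment", lineStart + firstChar, lineStart + line.length);
      } else {
        const eq = line.indexOf("=");
        if (eq !== -1) {
          push("property", lineStart + firstChar, lineStart + eq);
          push("operator", lineStart + eq, lineStart + eq + 1);
          push("string", lineStart + eq + 1, lineStart + line.length);
        }
      }
    }
    lineStart += line.length + 1; // +1 for the newline removed by split()
  }
  return tokens;
}
```

To try a module like this, you would map it in the import map under data-highlight:env (pointing at wherever you host the file), use data-highlight:env on an element, and add ::highlight() rules for any type names your stylesheet does not already cover.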

#Custom themes

The plugin provides an example stylesheet with some pre-defined styles assigned to the most common token types. You can either import this on your page via a <link> tag (see "Getting started"), or better yet: simply copy the contents into your own stylesheet and tweak it to suit your needs/preferences.

When working with third-party tokenizers, or when building your own custom ones, you may need to extend the CSS rules to include tokens that are not supported out-of-the-box. You can accomplish this by simply adding new ::highlight() pseudo-element selectors with the appropriate token name, which apply any necessary CSS properties.

For example, to style the token type foo to have blue text:

::highlight(foo) {
  color: blue;
}

Note that only a small subset of CSS properties can be applied to the ::highlight pseudo-element.
