絢字 「あや・じ」: colorful letters
A remark plugin that adds "syntax highlighting" for Japanese grammar as a code fence language. It uses kuromoji to tokenize Japanese text into its component parts of speech.
Note
This library is distributed only as an ESM module.
npm install @saeris/remark-ayaji
or
yarn add @saeris/remark-ayaji
How you use this library depends on your particular application or framework. Below is a bare-bones example for testing it in Node.
Node:
import { remark } from "remark";
import remarkParse from "remark-parse";
import remarkAyaji from "@saeris/remark-ayaji";
import remarkRehype from "remark-rehype";
import rehypeStringify from "rehype-stringify";
// `markdown` is a string containing your Markdown source
const result = await remark()
  .use(remarkParse)
  .use(remarkAyaji, { dict: `path/to/dictionaries/directory/` })
  .use(remarkRehype)
  .use(rehypeStringify)
  .process(markdown);
console.log(result.toString());
CSS:
This library also distributes an optional CSS file to get you started with syntax highlighting. How you consume it depends largely on your application's conventions. In frameworks such as Next.js or Astro, importing the file in your root layout should be all you need.
import "@saeris/remark-ayaji/theme.css";
Highlighting works much like syntax highlighting for code: use a code fence with a jp language annotation:
```jp
日本語が分かります。
```
This will compile to the following HTML:
<p><span class="noun">日本語</span><span class="particle">が</span><span class="verb">分かり</span><span class="auxiliary-verb">ます</span>。</p>
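To make the mapping from tokens to markup concrete, here is a minimal sketch of how tagged tokens could be rendered into the HTML shown above. The `HighlightToken` shape and the `renderTokens` and `escapeHtml` helpers are hypothetical names used for illustration only; they are not part of this plugin's API, and the real implementation may differ.

```typescript
// Hypothetical token shape: `pos` is the class name to apply,
// or undefined for text that should stay unhighlighted (e.g. punctuation).
interface HighlightToken {
  text: string;
  pos?: string;
}

// Escape the characters that are significant in HTML text content.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap each tagged token in a <span> whose class is its part of speech.
function renderTokens(tokens: HighlightToken[]): string {
  const body = tokens
    .map((token) =>
      token.pos
        ? `<span class="${token.pos}">${escapeHtml(token.text)}</span>`
        : escapeHtml(token.text)
    )
    .join("");
  return `<p>${body}</p>`;
}

const html = renderTokens([
  { text: "日本語", pos: "noun" },
  { text: "が", pos: "particle" },
  { text: "分かり", pos: "verb" },
  { text: "ます", pos: "auxiliary-verb" },
  { text: "。" },
]);
console.log(html);
```

Each class name corresponds to one of the parts of speech the plugin recognizes, which is what a theme stylesheet can target.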
type PoS =
  | "prefix"
  | "pronoun"
  | "adnominal"
  | "noun"
  | "adjectival-noun"
  | "adjective"
  | "particle"
  | "conjunction"
  | "interjection"
  | "adverb"
  | "verb"
  | "auxiliary-verb";
interface Options {
  dict: string;
  furigana?: boolean;
  include?: PoS[];
  exclude?: PoS[];
}
This plugin can be configured either globally, via an options object passed where the plugin is registered with .use(), or locally, via the code fence meta that follows the language annotation, written in a comma-separated, JSON-like syntax. Examples can be found below.
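As an illustration of what "comma-separated, JSON-like" meta can look like in practice, the sketch below turns a meta string such as `furigana: true, include: ["noun"]` into an options object. The `parseFenceMeta` function and `LocalOptions` interface are hypothetical and are not taken from the plugin's source; the plugin's actual parser may accept a different grammar.

```typescript
// Hypothetical per-fence options; `dict` is omitted on the assumption
// that it only makes sense globally.
interface LocalOptions {
  furigana?: boolean;
  include?: string[];
  exclude?: string[];
}

// Parse meta like `furigana: true, include: ["noun"]` by quoting the bare
// keys and wrapping the result in braces so it becomes valid JSON.
// Limitation of this sketch: a string value containing `word:` would be
// mangled by the key-quoting step.
function parseFenceMeta(meta: string | null | undefined): LocalOptions {
  if (!meta || meta.trim() === "") return {};
  const quoted = meta.replace(/([A-Za-z_][A-Za-z0-9_]*)\s*:/g, '"$1":');
  try {
    return JSON.parse("{" + quoted + "}") as LocalOptions;
  } catch {
    // Fall back to no local options when the meta is malformed.
    return {};
  }
}

const local = parseFenceMeta('furigana: true, include: ["noun"]');
console.log(local);
```

Falling back to an empty object on malformed meta is a design choice of this sketch; it keeps a typo in one code fence from breaking the whole build.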
This plugin relies on several dictionary files for the tokenizer to work. Although kuromoji includes these files, they cannot be loaded automatically, so you must point the plugin at them yourself. Depending on your environment, you may need to copy the dict directory of @saeris/kuromoji to somewhere in your project. For example, if it exists in your project root, you can configure the plugin like this:
// ...
.use(remarkAyaji, { dict: path.join(process.cwd(), `./dict`) })
// ...
These dictionary files come from the MeCab project; please see NOTICE for license details.
Adds furigana to words containing kanji, using the Denden Furigana markdown syntax: {日本語|にほんご}. Combining this plugin with remark-denden-ruby produces <ruby> text annotations that further aid readability.
global config:
// ...
.use(remarkAyaji, { dict, furigana: true })
.use(remarkRuby) // plugin order is important!
// ...
local config:
```jp furigana: true
日本語
```
result:
<p>
<span class="noun"
><ruby>日本語<rp>(</rp><rt>にほんご</rt><rp>)</rp></ruby></span
>
</p>
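The intermediate Denden syntax can be sketched as follows. This is a minimal illustration, assuming the tokenizer supplies each word's reading in katakana (as kuromoji.js does in its `reading` field); `toDendenRuby` and `katakanaToHiragana` are hypothetical helper names, not part of this plugin's API.

```typescript
// Convert katakana to hiragana. The two blocks are laid out in parallel in
// Unicode, so subtracting 0x60 from a katakana code point in U+30A1..U+30F6
// yields the matching hiragana character.
function katakanaToHiragana(text: string): string {
  return text.replace(/[\u30A1-\u30F6]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x60)
  );
}

// Emit `{surface|reading}` only for words that contain kanji and have a
// known reading; everything else is returned unchanged.
function toDendenRuby(surface: string, reading?: string): string {
  const hasKanji = /[\u4E00-\u9FFF]/.test(surface);
  if (!hasKanji || !reading) return surface;
  return `{${surface}|${katakanaToHiragana(reading)}}`;
}

console.log(toDendenRuby("日本語", "ニホンゴ")); // {日本語|にほんご}
console.log(toDendenRuby("が", "ガ")); // が
```

The kanji check here only covers the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is a simplification made for brevity.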
Warning
Kuromoji tokenizes text through morphological analysis, so the reading chosen for a kanji, or for a word containing kanji, is the one matching its most common usage. As a result, some words, most often proper nouns such as names, may be assigned an incorrect reading.
An array of parts of speech to include in highlighting.
global config:
.use(remarkAyaji, { dict, include: ["noun"] })
.use(remarkRuby) // plugin order is important!
local config:
```jp include: ["noun"]
日本語が分かります。
```
result:
<p><span class="noun">日本語</span>が分かります。</p>
An array of parts of speech to exclude from highlighting. This takes precedence over any parts of speech specified in include, regardless of whether it is set globally or locally.
global config:
.use(remarkAyaji, { dict, exclude: ["noun", "auxiliary-verb"] })
.use(remarkRuby) // plugin order is important!
local config:
```jp exclude: ["noun", "auxiliary-verb"]
日本語が分かります。
```
result:
<p>日本語<span class="particle">が</span><span class="verb">分かり</span>ます。</p>
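The precedence rule between include and exclude can be expressed as a small predicate. This is a sketch rather than the plugin's actual code: `shouldHighlight`, `PartOfSpeech`, and `FilterOptions` are hypothetical names, and it assumes that omitting include means every part of speech is highlighted.

```typescript
type PartOfSpeech =
  | "prefix"
  | "pronoun"
  | "adnominal"
  | "noun"
  | "adjectival-noun"
  | "adjective"
  | "particle"
  | "conjunction"
  | "interjection"
  | "adverb"
  | "verb"
  | "auxiliary-verb";

interface FilterOptions {
  include?: PartOfSpeech[];
  exclude?: PartOfSpeech[];
}

// Decide whether a token with the given part of speech gets a <span>.
function shouldHighlight(pos: PartOfSpeech, options: FilterOptions = {}): boolean {
  const { include, exclude } = options;
  // exclude always wins, even when the same value also appears in include.
  if (exclude && exclude.indexOf(pos) !== -1) return false;
  // With an include list, only the listed parts of speech are highlighted.
  if (include) return include.indexOf(pos) !== -1;
  // Assumption: with no include list, everything is highlighted.
  return true;
}

console.log(shouldHighlight("noun", { include: ["noun"], exclude: ["noun"] })); // false
console.log(shouldHighlight("verb", { exclude: ["noun", "auxiliary-verb"] })); // true
```

Checking exclude first keeps the rule easy to reason about: no include setting can re-enable a part of speech that has been excluded.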
Released under the MIT license © Drake Costa