Intra-word Emphasis in Org mode using ZERO WIDTH SPACEs; East Asian Language Users please take note


Motivation: Intra-word emphasis in real-world documents

Very often there is a need to export documents with intra-word emphasis, … and if you use East Asian Languages for your work, you have no option but to settle for intra-word emphasis.[1]

A sample document that illustrates various uses of intra-word emphasis

intraword-emphasis-5-sample-document

Intra-word emphasis in Org mode using ZERO WIDTH SPACE, and problems thereof

If you try creating a document like the one you see above using Org mode markup, you will realize that there is no way to create intra-word emphasis. This is because Org mode insists that the text you are emphasising be delimited on either side by a SPACE character.[1]

Think about it: If you want intra-word emphasis, but introduce a SPACE character within a word to force an emphasis, you no longer have a single word, do you?

The solution to this problem is to use the ZERO WIDTH SPACE character (U+200B) instead of the SPACE character.[1]
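
Typing the character is the awkward part. Stock Emacs can insert it with C-x 8 RET ZERO WIDTH SPACE RET. A small command saves keystrokes; the sketch below is one way to do it. The command name my/insert-zero-width-space and the key C-c z are illustrative choices, not something Org mode or any library provides.

```elisp
(defun my/insert-zero-width-space ()
  "Insert a ZERO WIDTH SPACE (U+200B) at point."
  (interactive)
  (insert-char #x200B))

;; C-c z is an arbitrary pick from the user-reserved `C-c LETTER' range.
(with-eval-after-load 'org
  (define-key org-mode-map (kbd "C-c z") #'my/insert-zero-width-space))
```

With this in place, typing c, C-c z, *u*, C-c z, p in an Org buffer produces a word whose middle letter is delimited by ZERO WIDTH SPACE on both sides.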

This solution works admirably, but only in a visual sense.

If you export the Org mode document to HTML, ODT or other formats, you will notice that the exported documents carry the ZERO WIDTH SPACE characters through unscathed.

Do you really want your exported documents to contain stray ZERO WIDTH SPACE-s?

Getting rid of ZERO WIDTH SPACE: Use org-extra-emphasis-intraword-emphasis-mode that comes with org-extra-emphasis library

The question now is …

Is there a way to strip a ZERO WIDTH SPACE used for intra-word emphasis from the final HTML, LaTeX, and ODT documents?

The answer is:

Use the org-extra-emphasis-intraword-emphasis-mode that comes with the org-extra-emphasis library.

The rest of this article outlines how you can accomplish intra-word emphasis in Org mode documents.

A note about the org-extra-emphasis library

org-extra-emphasis enhances Org mode by introducing a set of 16 new emphasis markers.[1] It also supports producing documents with intra-word emphasis using org-extra-emphasis-intraword-emphasis-mode.

When you put your Org mode buffer in org-extra-emphasis-intraword-emphasis-mode,

  1. ZERO WIDTH SPACE characters pop out at you; you no longer have to worry about ZERO WIDTH SPACE characters sneaking into your final manuscript.
  2. ZERO WIDTH SPACE characters are stripped from the HTML, LaTeX, ODT and other documents that you produce from your Org file.
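
To make the second point concrete, here is a rough, hand-rolled approximation of the stripping step. It is not how org-extra-emphasis is implemented (this article does not show that); it only relies on Org's stock export filter hook, org-export-filter-final-output-functions. The function name my/org-strip-zws is made up for illustration. Note that this sketch removes every U+200B from the output, including any you placed there deliberately, which may differ from what the mode does.

```elisp
(defun my/org-strip-zws (text _backend _info)
  "Return TEXT with every ZERO WIDTH SPACE (U+200B) removed.
Intended as an Org export filter; BACKEND and INFO are ignored."
  (replace-regexp-in-string "\u200B" "" text))

;; Register the filter once the export framework (ox.el) is loaded.
(with-eval-after-load 'ox
  (add-to-list 'org-export-filter-final-output-functions
               #'my/org-strip-zws))
```

A filter on the final output runs after the document has been fully converted, so it applies to every export backend rather than to one format at a time.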

Demonstration of intra-word emphasis using org-extra-emphasis-intraword-emphasis-mode

STEP 1: Create a sample Org document with intra-word emphasis

Copy the Org snippet below to a file, say intraword-emphasis.org.

Satisfy yourself that the snippet uses ZERO WIDTH SPACE-s to achieve intra-word emphasis.

Sample Org snippet that uses ZERO WIDTH SPACE to achieve intra-word emphasis

1. Intraword markups are a norm, rather than an exception, in East Asian languages.

    *北京*​和​*上海*​是直辖市。

2. Intraword markup is sometimes used to emphasise the initial few letters that make up an /acronym/.

    - NATO :: ​*N*​orth ​*A*​tlantic ​*T*​reaty ​*O*​rganization
    - Scuba :: ​*S*​elf ​*C*​ontained ​*U*​nderwater ​*B*​reathing ​*A*​pparatus
    - Laser :: ​*L*​ight ​*A*​mplification by ​*S*​timulated ​*E*​mission of ​*R*​adiation
    - GIF :: ​*G*​raphics ​*I*​nterchange ​*F*​ormat

3. Intraword markup can be used to break down words into /syllables/.

    - !!fe!!​!@male!@
    - !!bi!!​!@cy!@​!%cle!%
    - !!in!!​!@ter!@​!%est!%​!&ing!&

4. Intraword markup is used as /pronunciation guides/ in dictionaries

    | IPA  | Examples        |
    |------+-----------------|
    | =ʌ=  | c​*u*​p, l​*u*​ck   |
    | =ɑ:= | ​*a*​rm, f​*a*​ther |
    | =æ=  | c​*a*​t, bl​*a*​ck  |
    | /    | <               |

STEP 2: Export the Org file to HTML, and examine its contents; you will see ZERO WIDTH SPACE characters

Export the above Org file to a temporary HTML buffer with C-c C-e C-b h H.

Examine the contents of the HTML buffer with M-x glyphless-display-mode turned on.

You will see that the HTML file has ZERO WIDTH SPACE characters.
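
If you would rather have a number than inspect the buffer by eye, a small command can count the characters for you. This is only a convenience sketch; the name my/count-zero-width-spaces is an assumption and is not part of Org mode or org-extra-emphasis.

```elisp
(defun my/count-zero-width-spaces ()
  "Report the number of ZERO WIDTH SPACE (U+200B) characters in the buffer."
  (interactive)
  (message "ZERO WIDTH SPACE count: %d"
           (how-many "\u200B" (point-min) (point-max))))
```

Run it with M-x my/count-zero-width-spaces in the exported HTML buffer. The same command is handy later in this walkthrough for confirming that a fresh export contains none.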

Intra-word emphasis with ZERO WIDTH SPACE leaves ZERO WIDTH SPACE in the output

intraword-emphasis-1-problems

STEP 3: Download org-extra-emphasis

Download org-extra-emphasis.el and put it somewhere in your load-path.

STEP 4: Configure your Emacs to use org-extra-emphasis

Add the following Emacs Lisp snippet to your .emacs, and restart Emacs.

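;; Load the library; it must already be on your load-path.
(require 'org-extra-emphasis)

;; Store a ZERO WIDTH SPACE (U+200B) in the register named SPC,
;; so that C-x r i SPC inserts one at point.
(set-register ?\s "\u200B")

;; Enable org-extra-emphasis and its intra-word emphasis mode.
(custom-set-variables
 '(org-extra-emphasis t)
 '(org-extra-emphasis-intraword-emphasis-mode t))

(cond
 (nil ; flip this to t, if you want to hide emphasis markers and ZERO WIDTH SPACE
  (custom-set-variables
   '(org-hide-emphasis-markers t)
   '(org-extra-emphasis-zws-display-char nil)))
 (t
  ;; Default: keep emphasis markers visible, and display each
  ;; ZERO WIDTH SPACE as an underscore.
  (custom-set-variables
   '(org-hide-emphasis-markers nil)
   '(org-extra-emphasis-zws-display-char ?\N{SPACING UNDERSCORE}))))

;; Background colours for the first four extra emphasis faces.
(custom-set-faces
 '(org-extra-emphasis ((t nil)))
 '(org-extra-emphasis-01 ((t (:background "#ffff00"))))
 '(org-extra-emphasis-02 ((t (:background "#efb8be"))))
 '(org-extra-emphasis-03 ((t (:background "#f1bc81"))))
 '(org-extra-emphasis-04 ((t (:background "#99d76b")))))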

STEP 5: Export to HTML again, but with org-extra-emphasis-intraword-emphasis-mode enabled; you should see no ZERO WIDTH SPACE characters in the HTML file

Open the file intraword-emphasis.org, and ensure that it is in org-extra-emphasis-intraword-emphasis-mode.

Notice that the ZERO WIDTH SPACE-s are rendered as UNDERSCORE-s, and appear in red. The red color ensures that you don’t mistake these UNDERSCORE-s for the underline emphasis marker.

Org buffer in org-extra-emphasis-intraword-emphasis-mode, with Emphasis markers and ZERO WIDTH SPACE visible

intraword-emphasis-3-rendering-1

Export the file intraword-emphasis.org to an HTML buffer with C-c C-e C-b h H, or an HTML file with C-c C-e h o.

Put the resulting buffer or file in M-x glyphless-display-mode.

Satisfy yourself that there are no ZERO WIDTH SPACE characters in the final output.

Intra-word emphasis with ZERO WIDTH SPACE, but in org-extra-emphasis-intraword-emphasis-mode leaves no ZERO WIDTH SPACE in output

intraword-emphasis-2-solution

STEP 6: org-extra-emphasis allows you to turn off display of ZERO WIDTH SPACE

I prefer to edit my Org files in visible-mode.

If you prefer to edit your Org files with emphasis markers hidden away, you may also want to hide away the ZERO WIDTH SPACE markers.

org-extra-emphasis allows you to turn off display of ZERO WIDTH SPACE. See the cond form in the configuration snippet of STEP 4 for how to achieve this.

Org buffer with Emphasis markers and ZERO WIDTH SPACE hidden

intraword-emphasis-4-rendering-2

A note on the syntax of emphasised text in Org mode buffers: org-emph-re

The statement that an emphasised text must be surrounded by SPACE-s is not entirely accurate. But it is a good first-order approximation to what Org mode expects.

To get an idea of what characters may lead or follow an emphasised text, one needs to examine the org-emph-re variable.

Down below you see the typical value of org-emph-re (in rx format), when you request multi-line emphasis.[1]

Even if you aren’t comfortable with reading Emacs Lisp, you can make a good guess of what set of characters are permitted to lead and follow an emphasised text.

Note the presence of space in the leading and trailing sets. Here space refers not just to what we conventionally understand as the SPACE character, but stands for any character with whitespace syntax. Since the ZERO WIDTH SPACE character has whitespace syntax, one can use ZERO WIDTH SPACE in lieu of a SPACE character for separating runs of text that have different emphasis.

(seq
 (group
  (or
   (any "\"'({-" space)
   bol))
 (group
  (group
   (any "*+/_"))
  (group
   (or
    (not space)
    (seq
     (not space)
     (*? nonl)
     (repeat 0 1 "\n"
             (*? nonl))
     (not space))))
  (backref 3))
 (group
  (or
   (any "!\"'),.:;?[\\}-" space)
   eol)))
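
You can probe the regexp yourself rather than take the reading above on trust. The form below is a small experiment to evaluate with M-: or in the scratch buffer; the test strings are made up for illustration. It returns a list of three values: whether U+200B has whitespace syntax in an Org buffer, whether org-emph-re matches a word with the markers glued to the neighbouring letters, and whether it matches the same word with ZERO WIDTH SPACE around the emphasised letter. If the description above holds, the first and third values should be non-nil and the second nil.

```elisp
(require 'org)

(with-temp-buffer
  ;; The [:space:] class is resolved through the buffer's syntax table,
  ;; so run the checks inside an Org mode buffer.
  (org-mode)
  (list
   ;; Does ZERO WIDTH SPACE belong to the whitespace syntax class?
   (eq (char-syntax #x200B) ?\s)
   ;; Emphasis markers glued directly to the neighbouring letters.
   (and (string-match-p org-emph-re "c*u*p") t)
   ;; The same word, with U+200B on either side of the emphasised letter.
   (and (string-match-p org-emph-re "c\u200B*u*\u200Bp") t)))
```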

Conclusion

I don’t use intraword emphasis much.

But I am glad that there is an off-the-shelf solution to achieve intraword emphasis, whenever I need it.
