xr: A new weapon for your regex arsenal


There is a new tool for grokking Emacs regular expressions. It is called xr[1], and it converts a string regexp to rx[1] notation.

xr is very new (even as of late 2020). Unless you have been tracking GNU ELPA over the last two years, you are unlikely to be familiar with it. So don’t dismiss this article; read on …

What is xr?

This is how the README[1] introduces xr:

xr – Emacs regexp parser and analyser

XR converts Emacs regular expressions to the structured rx form, thus being an inverse of rx.

It can be useful for:

  • Migrating existing code to rx form
  • Understanding what a regexp string really means
  • Finding errors in regexp strings
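A minimal sketch of the first and third uses, assuming xr is already installed from GNU ELPA. The helper my/xr-round-trip is a hypothetical name made up for this illustration; only xr, xr-lint and rx-to-string come from the packages themselves.

```elisp
;; Assumes the xr package is installed from GNU ELPA:
;;   M-x package-install RET xr RET
(require 'xr)
(require 'rx)

(defun my/xr-round-trip (re)
  "Convert RE to rx form with `xr', then back to a string regexp.
Hypothetical helper for illustration; not part of xr."
  (rx-to-string (xr re)))

(let* ((re "\\(foo\\|bar\\)+")
       (back (my/xr-round-trip re)))
  ;; The two strings may differ in bracketing, but both should
  ;; match "foobar" starting at position 0.
  (list (string-match-p re "foobar")
        (string-match-p back "foobar")))

;; xr-lint looks for suspicious constructs, such as the doubled
;; repetition here.  It returns nil when it has no complaints; the
;; shape of each diagnostic differs between xr versions.
(xr-lint "a**")
```

The round trip is not guaranteed to reproduce the original string character for character; what matters is that both regexps match the same text.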

Test driving xr with org-emph-re

To test drive xr, I have chosen org-emph-re as a sample input. Org mode uses the value of org-emph-re to match text that is emboldened, italicized, underlined etc. I picked this regular expression because it is familiar to most Org mode users, and yet intimidating enough to deserve a powerful tool like xr.

How does org-emph-re look?

This is what org-emph-re looks like.

"\\([-[:space:]('\"{]\\|^\\)\\(\\([*/_+]\\)\\([^[:space:]]\\|[^[:space:]].*?\\(?:\n.*?\\)\\{0,1\\}[^[:space:]]\\)\\3\\)\\([-[:space:].,:!?;'\")}\\[]\\|$\\)"

Needless to say, it is quite intimidating.

How does org-emph-re look when piped through xr?

To make the above regular expression a bit more comprehensible, run it through xr as follows:

M-x pp-eval-expression RET (xr org-emph-re 'verbose) RET

This is what you will see

(seq
 (group
  (or
   (any "\"'({-" space)
   line-start))
 (group
  (group
   (any "*+/_"))
  (group
   (or
    (not space)
    (seq
     (not space)
     (*\? not-newline)
     (repeat 0 1 "\n"
             (*\? not-newline))
     (not space))))
  (backref 3))
 (group
  (or
   (any "!\"'),.:;?[\\}-" space)
   line-end)))
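Outside an interactive session, a rough equivalent of the command above might look like the sketch below. It assumes both Org and xr are available, and the printed form may differ from the one shown here if your Org version builds org-emph-re differently.

```elisp
(require 'org)  ; loading Org sets org-emph-re
(require 'xr)   ; assumes xr is installed from GNU ELPA

;; xr returns the rx form as an ordinary Lisp list;
;; pp pretty-prints it.
(pp (xr org-emph-re 'verbose))
```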

“Thanks, but no thanks”. The rx form above still makes my head spin!

The rx form of the regular expression is still intimidating, but it is more approachable than the original string version.

To make better sense of the rx form, you need to juxtapose it with the docstring of org-emph-re.

;; `org-emph-re' - Regular expression for matching emphasis.

;; After a match, the match groups contain these elements:

;; ↓ 0 The match of the full regular expression, including the
;;   characters before and after the proper match
(seq

 ;; ↓ 1 The character before the proper match, or empty at beginning of
 ;; line
 (group
  (or
   (any "\"'({-" space)
   line-start))

 ;; ↓ 2 The proper match, including the leading and trailing markers
 (group

  ;; ↓ 3 The leading marker like * or /, indicating the type of
  ;; highlighting
  (group
   (any "*+/_"))

  ;; ↓ 4 The text between the emphasis markers, not including the
  ;; markers
  (group
   (or
    (not space)
    (seq
     (not space)
     (*\? not-newline)
     (repeat 0 1 "\n"
             (*\? not-newline))
     (not space))))

  ;; ↓ 3' The trailing marker like * or /, indicating the type of
  ;; highlighting
  (backref 3))

 ;; ↓ 5 The character after the match, empty at the end of a line
 (group
  (or
   (any "!\"'),.:;?[\\}-" space)
   line-end)))
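To check the group numbering against a concrete string, here is a small sketch. The names my/emph-re and my/emph-groups are hypothetical and exist only for this illustration; the regexp is a verbatim copy of the org-emph-re value shown earlier, so the example does not depend on which Org version is installed.

```elisp
(defconst my/emph-re
  "\\([-[:space:]('\"{]\\|^\\)\\(\\([*/_+]\\)\\([^[:space:]]\\|[^[:space:]].*?\\(?:\n.*?\\)\\{0,1\\}[^[:space:]]\\)\\3\\)\\([-[:space:].,:!?;'\")}\\[]\\|$\\)"
  "Copy of the `org-emph-re' value quoted in this article.")

(defun my/emph-groups (text)
  "Return groups 2, 3 and 4 of the first emphasis match in TEXT, or nil."
  (when (string-match my/emph-re text)
    (list (match-string 2 text)     ; proper match, with markers
          (match-string 3 text)     ; leading marker
          (match-string 4 text))))  ; text between the markers

(my/emph-groups "some *bold* text")
;; → ("*bold*" "*" "bold")
```

Groups 1 and 5 both hold a single space in this example: the character before and the character after the proper match.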

Conclusion

The juxtaposition of the rx form with a human-friendly description in the previous section is deliberate. Over the years, I have realized that to comprehend something, you cannot ignore its context.

xr cannot solve all your Emacs regular expression woes. It is merely a tool. But when deployed in conjunction with the context in which a regular expression appears, it can cure most headaches.
