Emacs for your modern document needs: A quick guide to working with PDF, LibreOffice and Microsoft Office files


You can view DVI, PostScript, PDF, OpenDocument, Microsoft Office, and DjVu files in Emacs. But first, you must install a set of applications that convert the pages of these documents into images. These document converters are essential because Emacs doesn’t support these formats natively.

Overview of Document Viewing in Emacs

The figure below gives a quick overview of document converters that you need, and the Debian packages in which they are distributed.

doc-view

Step 1: Install the Document Converters

Depending on your needs and preferences, install one or more of the following applications.

Applications          Debian Packages
gs, ps2pdf, dvipdf    ghostscript
mutool                mupdf-tools
pdftotext             poppler-utils
dvipdfm               texlive-binaries
soffice               libreoffice-common
unoconv               unoconv
ddjvu                 djvulibre-bin

Consult the earlier diagram to narrow down the applications and packages you actually need.
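
To see which converters are already present before installing anything, a small shell check can help. The sketch below is only an illustration: package_for and have_tool are hypothetical helper names, and the tool-to-package mapping is copied from the table in this step, so it assumes a Debian-like system.

```sh
#!/bin/sh
# package_for TOOL: print the Debian package that ships TOOL, following
# the table in this step. Hypothetical helper, not part of Emacs or Debian.
package_for() {
    case "$1" in
        gs|ps2pdf|dvipdf) echo ghostscript ;;
        mutool)           echo mupdf-tools ;;
        pdftotext)        echo poppler-utils ;;
        dvipdfm)          echo texlive-binaries ;;
        soffice)          echo libreoffice-common ;;
        unoconv)          echo unoconv ;;
        ddjvu)            echo djvulibre-bin ;;
        *)                return 1 ;;
    esac
}

# have_tool TOOL: succeed if TOOL can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each converter and, when it is missing, the package to install.
for tool in gs ps2pdf dvipdf mutool pdftotext dvipdfm soffice unoconv ddjvu; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool (Debian package: $(package_for "$tool"))"
    fi
done
```

Any package reported as missing can then be installed with apt, for example sudo apt install poppler-utils. You do not need every tool on the list, only the ones for the formats you care about.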

Step 2: Install an enhanced menu for DocView

Copy the snippet below to your .emacs, and restart Emacs.

(with-eval-after-load "doc-view"
  (easy-menu-define my-doc-view-menu doc-view-mode-map "Menu for Doc-View Mode."
    '("DocView"
      ["Switch to a different mode" doc-view-toggle-display :help "Switch to a different mode"]
      ["Open Text" doc-view-open-text :help "Display the current doc's contents as text"]
      "--"
      ("Navigate Doc"
       ["Goto Page ..." doc-view-goto-page :help "View the page given by PAGE"]
       "--"
       ["Scroll Down" doc-view-scroll-down-or-previous-page :help "Scroll page down ARG lines if possible, else goto previous page"]
       ["Scroll Up" doc-view-scroll-up-or-next-page :help "Scroll page up ARG lines if possible, else goto next page"]
       "--"
       ["Next Line" doc-view-next-line-or-next-page :help "Scroll upward by ARG lines if possible, else goto next page"]
       ["Previous Line" doc-view-previous-line-or-previous-page :help "Scroll downward by ARG lines if possible, else goto previous page"]
       ("Customize"
        ["Continuous Off"
         (setq doc-view-continuous nil)
         :help "Stay put in the current page, when moving past first/last line" :style radio :selected
         (eq doc-view-continuous nil)]
        ["Continuous On"
         (setq doc-view-continuous t)
         :help "Go to the previous/next page when moving past the first/last line" :style radio :selected
         (eq doc-view-continuous t)]
        "---"
        ["Save as Default"
         (customize-save-variable 'doc-view-continuous doc-view-continuous)
         t])
       "--"
       ["Next Page" doc-view-next-page :help "Browse ARG pages forward"]
       ["Previous Page" doc-view-previous-page :help "Browse ARG pages backward"]
       "--"
       ["First Page" doc-view-first-page :help "View the first page"]
       ["Last Page" doc-view-last-page :help "View the last page"])
      "--"
      ("Adjust Display"
       ["Enlarge" doc-view-enlarge :help "Enlarge the document by FACTOR"]
       ["Shrink" doc-view-shrink :help "Shrink the document"]
       "--"
       ["Fit Width To Window" doc-view-fit-width-to-window :help "Fit the image width to the window width"]
       ["Fit Height To Window" doc-view-fit-height-to-window :help "Fit the image height to the window height"]
       "--"
       ["Fit Page To Window" doc-view-fit-page-to-window :help "Fit the image to the window"]
       "--"
       ["Set Slice From Bounding Box" doc-view-set-slice-from-bounding-box :help "Set the slice from the document's BoundingBox information"]
       ["Set Slice Using Mouse" doc-view-set-slice-using-mouse :help "Set the slice of the images that should be displayed"]
       ["Set Slice" doc-view-set-slice :help "Set the slice of the images that should be displayed"]
       ["Reset Slice" doc-view-reset-slice :help "Reset the current slice"])
      ("Search"
       ["New Search ..."
        (doc-view-search t)
        :help "Jump to the next match or initiate a new search if NEW-QUERY is given"]
       "--"
       ["Search" doc-view-search :help "Jump to the next match or initiate a new search if NEW-QUERY is given"]
       ["Backward" doc-view-search-backward :help "Call `doc-view-search' for backward search"]
       "--"
       ["Show Tooltip" doc-view-show-tooltip :help nil])
      ("Maintain"
       ["Reconvert Doc" doc-view-reconvert-doc :help "Reconvert the current document"]
       "--"
       ["Clear Cache" doc-view-clear-cache :help "Delete the whole cache (`doc-view-cache-directory')"]
       ["Dired Cache" doc-view-dired-cache :help "Open `dired' in `doc-view-cache-directory'"]
       "--"
       ["Revert Buffer" doc-view-revert-buffer :help "Like `revert-buffer', but preserves the buffer's current modes"]
       "--"
       ["Kill Proc" doc-view-kill-proc :help "Kill the current converter process(es)"]
       ["Kill Proc And Buffer" doc-view-kill-proc-and-buffer :help "Kill the current buffer"])
      "--"
      ["Customize"
       (customize-group 'doc-view)]))
  (easy-menu-define my-doc-view-minor-mode-menu doc-view-minor-mode-map "Menu for Doc-View Minor Mode."
    '("DocView*"
      ["Display in DocView Mode" doc-view-toggle-display :help "View"]
      ["Exit DocView Mode" doc-view-minor-mode])))

Step 3: Open a PDF file, say the GNU Emacs Manual

Now open a PDF file, say the GNU Emacs Manual. As soon as the file is opened, the PDF-to-PNG conversion is triggered. The manual is a good 653 pages long, so it may take a while for the whole document to load. Fret not. Even while the conversion is in progress, you can scroll through the pages that have already been converted.

Screenshot from 2018-08-09 12-56-02

Step 4: Ensure that you have properly installed the enhanced menu

The enhanced menu looks like this:

Screenshot from 2018-08-09 13-04-42

Instead of seeing the menu above, are you seeing a menu like the one below?

Screenshot from 2018-08-09 16-56-56

If yes, you are still using the stock menu. Please install the enhanced menu as described in Step 2.

Step 5:  Explore the Navigate Doc menu to browse the document

The Navigate Doc sub-menu contains all the different ways in which you can browse through the document. You can move line by line, scroll a screenful at a time, move page by page, or jump to any given page.

For the sake of this discussion, open the 100th page using Goto Page.

Screenshot from 2018-08-09 13-16-25

Screenshot from 2018-08-09 13-17-42

Screenshot from 2018-08-09 13-18-39

Step 6: Explore the Adjust Display menu: Set a slice and choose a fitment

The Adjust Display sub-menu contains all the different ways in which you can display the document. You can zoom in or out, or fit the width, the height, or even the entire page to the window. There is one more interesting option, concerning slices. Slicing a document allows you to strip the page margins, which leaves more screen real estate for the document content.

For the sake of this article, you will slice the document to its bounding box and fit it to the window width. This is my preferred way to read documents in Emacs.

Screenshot from 2018-08-09 13-19-01

Screenshot from 2018-08-09 13-19-46

Screenshot from 2018-08-09 13-20-59

In the previous screenshot, note that there is little to no margin. Compare it with the screenshot at the end of the previous step.

Step 7: Search through the document:  Remember the quirk

You can search for a given text in the document and jump to the pages where there is a hit. Before you go ahead with searching, you need to be aware of a little quirk: you search in two steps, as detailed below.

Step 7.1: Initiate a new search

In the first step, you initiate the search by providing a new search string, say, doc-view-mode. If this is the first time you are searching through the document, you need to wait a while for the search prompt to appear. This is because Emacs is busy extracting the underlying text content via pdftotext. Once the search is complete, you will see a message that reports the number of hits.

That is all. Nothing else will happen.

Emacs does not take you to the first hit, as you would have come to expect from your prior experience with other document viewers.

To actually visit the hits, you have to proceed to the second step.

Screenshot from 2018-08-09 14-07-49

Screenshot from 2018-08-09 14-09-04

Step 7.2: Cycle through the hits

In the second step, you browse through the hits. Remember, Emacs will not take you to the exact line of a hit. It will only take you to the page where there are one or more hits. This means that any page you land on may contain two or more lines with hits, and it is up to you to find them.

Screenshot from 2018-08-09 14-10-01

Screenshot from 2018-08-09 14-10-35

In the screenshot above, you are actually on a page that has search hits, and the only hint of this fact is the tooltip. To verify that there are indeed hits on that page, you can fit the whole page to the window and manually search for the hits. Ha Ha Ha!

Screenshot from 2018-08-09 14-21-45

While searching through the document, you will frequently get a bit disoriented and lose your bearings. In that case, you can bring up the tooltip and regain your senses.

Screenshot from 2018-08-09 14-11-35
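
The page-level nature of the search can be reproduced outside Emacs, which sometimes makes it easier to see what is going on. pdftotext separates pages with a form feed character, so a few lines of awk can report how many lines match on each page. This is a rough command-line analogue under stated assumptions, not how DocView itself implements search: hits_per_page is a hypothetical helper name, the match is a plain fixed-string match, and emacs.pdf in the usage comment is a placeholder file name.

```sh
#!/bin/sh
# hits_per_page PATTERN: read pdftotext output on stdin and print
# "page N: M matching line(s)" for every page that has a match.
# Setting RS to a form feed makes each awk record one page.
# PATTERN is matched as a fixed string, not as a regular expression.
hits_per_page() {
    awk -v pat="$1" 'BEGIN { RS = "\f" }
    {
        n = 0
        count = split($0, line, "\n")
        for (i = 1; i <= count; i++)
            if (index(line[i], pat) > 0) n++
        if (n > 0) printf "page %d: %d matching line(s)\n", NR, n
    }'
}

# Usage (emacs.pdf is a placeholder file name):
#   pdftotext emacs.pdf - | hits_per_page doc-view-mode
```

The page numbers printed this way can be fed to Goto Page from the Navigate Doc menu, which is often quicker than cycling through the hits one page at a time.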

Step 8: Look at the text underlying your document

In the previous step, you learned that when you search through a PDF document, you are actually searching through a bare-bones text underlying the document.  You can view the text-only content of your document, devoid of any formatting or embedded media files, as shown below.

Screenshot from 2018-08-09 17-47-59

In the screenshot below, you see the bare-bones ASCII text that lies beneath the PDF document. Try switching back to DocView mode.

Screenshot from 2018-08-09 17-49-08

Now, you see the text that constitutes the PDF markup.

Screenshot from 2018-08-09 17-51-07

Switch to DocView mode once again, as in the previous step, and you will be back to viewing the PDF file in its original glory.

Step 9: Display your document in  a different mode: Handy for one-off editing of OpenDocument files

In the previous step, you learned how to view the PDF markup.  You or I cannot edit the PDF markup.  So that view is pretty useless for non-specialists, which is most people.  However, this feature is pretty handy for one-off editing of OpenDocument files.

For the sake of this article, download a sample ODF file, say the OASIS Spec, and try switching to a different mode.

Screenshot from 2018-08-09 14-43-35

You will see that the document opens in archive-mode and you see all the XML and media files that make up the document. If you are comfortable editing OpenDocument XML, you can use this mode for quick one-off editing of your ODT files, presumably those less-satisfactory ones created by org-mode.

Screenshot from 2018-08-09 14-44-08

Once you have edited  the OpenDocument file to your  heart’s content, you can view the original ODT file with your specific modifications.  You may have to re-open the file in doc-view-mode though.

Screenshot from 2018-08-09 14-45-31

Step 10: Customize the Doc-View Mode: Increase the resolution, maybe

Sometimes, when you open a document, you will find that it is very blurry. For example, when I open Romeo and Juliet (The Illustrated Shakespeare, 1847), I realize that the result hurts my eyes. In such cases, you can increase the default resolution.

Screenshot from 2018-08-09 15-03-27

Screenshot from 2018-08-09 15-04-51

Once you have increased the resolution, you will note that the changes don’t take effect immediately. This is so even if you restart Emacs with the new configuration and re-open the file. In this case, delete the image cache.

Screenshot from 2018-08-09 15-10-54

If you have lots of documents that are already in your cache, and you are happy with their resolution, you may want to delete the cache of just this document. You can locate the image cache, as shown below.

Screenshot from 2018-08-09 15-06-15

Screenshot from 2018-08-09 15-06-55

Screenshot from 2018-08-09 15-27-01

Step 11: Customize the Doc-View Mode: Turn on continuous mode, maybe

When you are reviewing a document for accuracy, as opposed to just reading it, you are most likely navigating the document line by line. In that case, when you move past the last line (or the first line) of a page, the display stays put on the current page. If you are like me, you will be bothered by this behaviour. In that case, you may want to turn on the continuous mode and save it as the default. Once this is done, when you move past the last line (or the first line) of a page, the display moves to the next (or previous) page.

Screenshot from 2018-08-09 20-39-54

Is it mudraw or mutool draw?

If you are among the select few who prefer mupdf-tools to ghostscript, you may have to account for the fact that mudraw no longer exists and should be replaced with mutool draw.

Screenshot from 2018-08-09 20-56-46

So, you may want to create a shim like the one below.

$ which mudraw
/usr/local/bin/mudraw
$ cat `which mudraw`
#!/bin/sh
mutool draw "$@"
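
If you would rather script the shim than create it by hand, a minimal sketch follows. The function name install_mudraw_shim is hypothetical, and the target directory is an assumption: it must be on the PATH that Emacs sees, otherwise the shim will not be found.

```sh
#!/bin/sh
# install_mudraw_shim DIR: write a mudraw wrapper into DIR that forwards
# all of its arguments to "mutool draw", like the shim shown above.
# Hypothetical helper; DIR is assumed to be on the PATH seen by Emacs.
install_mudraw_shim() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/mudraw" <<'EOF'
#!/bin/sh
mutool draw "$@"
EOF
    chmod +x "$dir/mudraw"
}

# Usage (the directory is only an example):
#   install_mudraw_shim "$HOME/bin"
```

Installing into a system directory such as /usr/local/bin, as in the listing above, needs root privileges, so in that case it is simpler to write the two-line file directly with your editor.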

Concluding Words

In the foreword to this article, you learned that Emacs has no native support for many modern document formats.  This lack of support implies the following:

  1. Rendering is slow: The viewing of these files, for the very first time, is terribly slow when compared to the viewing in native applications.  However, for second and subsequent viewing, the document is rendered very fast because of caching.
  2. Search for text is less than perfect: Text search in PDF documents cannot pinpoint the exact line where there is a hit. That is, if you search for a specific text in a PDF file, you are taken to the page where there is a hit. It is up to you to figure out where exactly the matching line is on the page. This process is nothing short of searching for a needle in a haystack.
  3. No outlines, hyperlinks, or custom annotation: You miss a lot of sophisticated functionality. For example, when you are working with a PDF file, you cannot see its outline, jump to its links, or add custom annotations.
  4. Editing the document is too crude: You cannot edit these files in an intelligent manner with Emacs.  For example, if you were to edit OpenDocument files with Emacs, you wouldn’t get the same sort of ease and sophistication that comes with an application like LibreOffice.  Instead, you would be expected to understand and edit the underlying XML files.

If you have meticulously followed the steps outlined in this article, you will have experienced the above limitations first hand. Despite the limitations, you can get a lot done with the current functionality. However, if you don’t mind venturing into packages that are not part of the official Emacs distribution, you may want to try out pdf-tools. It overcomes many of the above limitations and provides an experience that is an order of magnitude better than that of the official Emacs.
