Update on Microsoft Word Support, Part 1

It’s been a long time since I posted about this so I wanted to give a brief update on where things are at regarding file format support and the upcoming 1.1.0 release of UX Write. I posted back in October that I had decided to support both .docx (Microsoft Word) and .odt (OpenOffice) formats, in addition to the current native HTML format.

As it turns out, the amount of work involved in getting .docx support operational has been much greater than I expected. I’ve been focusing primarily on that over the last couple of months (as it’s the more popular of the two), and have decided that I will release 1.1.0 with .docx support only, and add ODF at a later point in time. While I could delay the release further until I have both file formats working, I don’t think there’s any benefit in doing so. There are a lot of people waiting on .docx support, so I want to make this available both for the benefit of existing users and because it’s important for the commercial success of the app.

What Does Microsoft Word Support Mean?

It’s tempting to pose the question “Is this app compatible with Microsoft Word?”. It would be nice to give a straight “yes” or “no” answer, but unfortunately the situation is more complicated than that. Microsoft Word has literally hundreds of features, and no other app supports, or ever will support, all of them. All other word processors which work with one or more of Word’s file formats support only a subset of these features. This is typically OK, however, as most people only use a small, common set of features rather than every single one.

I’m focusing on supporting only the most important and widely-used features, from the viewpoint of professional, academic, and technical writing — for the most part matching the set of document structure features common in LaTeX (e.g. sections, figures, tables, references), as well as almost all the formatting properties that can be expressed in HTML and CSS. Since all the other word processors for the iPad address the former either very poorly or not at all, UX Write 1.1.0 will likely have the most feature-rich support for .docx on the platform upon its release.

Note that there is a big difference between .docx and .doc files – in fact they are two entirely separate file formats. UX Write will only support .docx, which is a modern, well-documented, XML-based file format that is (relatively) easy to read and write. The older .doc format is a much more complicated binary file format that would be a great deal more difficult to support. I’m aware that some people still use it, and while I could support it with perhaps another six months of work, I feel that time is better spent on improving other aspects of the app. You can easily convert from .doc to .docx (and back again) using any recent version of Word.
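To make the "XML-based" point concrete, here is a minimal sketch in Python (not UX Write's own code). It relies on two facts about the format: a .docx file is a ZIP package, and its main text lives in a part named word/document.xml. The helper names make_minimal_docx and read_paragraphs are made up for this illustration, and the package built here contains only that one part, so it is enough for the sketch but is not a file Word itself would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by the main document part of a .docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_minimal_docx(paragraphs):
    """Hypothetical helper: build an in-memory ZIP holding only word/document.xml.

    A real .docx also needs [Content_Types].xml and relationship parts.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    return buffer.getvalue()


def read_paragraphs(docx_bytes):
    """Hypothetical helper: return the plain text of each paragraph (w:p)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more w:t elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


sample = make_minimal_docx(["Introduction", "Fish & chips"])
extracted = read_paragraphs(sample)
```

Nothing comparable is possible for .doc with general-purpose tools like these, because its contents are not stored as ZIP-packaged XML.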

Maintaining Document Integrity

The normal way for a word processor to support third-party file formats is to provide import and export facilities. Import converts the document from the third-party format to the program’s native format, and export converts back the other way.

The problem is that this is a lossy process – it is rare for the native format to support a superset of the third-party format’s features, and converting between the two inevitably means you’re going to lose some structural or formatting information along the way. If you look around the support forums for Pages and similar apps you will find many instances of people complaining they’ve lost formatting or other information like footnotes when going back and forth between their iPad and Mac or PC.

UX Write instead uses a technique called bidirectional transformation to interface with third-party file formats such as .docx. Although its native format is HTML, which doesn’t support all the features of Word (such as embedded spreadsheets), and you won’t be able to see or edit these unsupported features, they won’t be lost when you save your document. This is because instead of the saving process completely replacing the original document with a new one converted from HTML, it takes the original document, works out what has changed in the HTML version, and makes the corresponding changes to the original document. This has the following implication:

Any formatting or other features of .docx that UX Write doesn’t support will be kept intact on save. You will be able to safely move back and forth between UX Write on your iPad and Microsoft Word on your PC or Mac without losing data.

To understand why this is, let’s have a look at how the traditional import/export process works:

Import - Export

Figure 1: Import/export

Figure 1 illustrates an example of a document containing headings, text, footnotes, and page numbers. When the document is imported, the headings and text (which HTML supports) are maintained, but the footnotes and page numbers (which HTML doesn’t support) are lost. The user makes some changes to the document and then saves it. During save, the export process deletes the original document and replaces it with a new version re-created entirely from the HTML version. Because the latter did not contain the footnotes or page numbers, these are also absent from the saved version.
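The round trip in Figure 1 can be sketched in a few lines of Python. This is a toy model only: a document is a list of items, each with a "kind", and HTML_SUPPORTED, import_to_html, and export_from_html are names invented for the illustration rather than anything from a real word processor.

```python
# Kinds of content the HTML editing format can represent (an assumption
# made for this sketch, matching the example in Figure 1).
HTML_SUPPORTED = {"heading", "text"}


def import_to_html(docx_items):
    """Import: keep only what the native format understands; drop the rest."""
    return [dict(item) for item in docx_items if item["kind"] in HTML_SUPPORTED]


def export_from_html(html_items):
    """Export: re-create the whole document from the HTML version alone."""
    return [dict(item) for item in html_items]


original = [
    {"kind": "heading", "content": "Introduction"},
    {"kind": "text", "content": "First draft."},
    {"kind": "footnote", "content": "See chapter 2."},
    {"kind": "page_number", "content": "1"},
]

html = import_to_html(original)
html[1]["content"] = "Second draft."  # the user's edit
saved = export_from_html(html)        # replaces the original on save

saved_kinds = {item["kind"] for item in saved}
lost = [item["kind"] for item in original if item["kind"] not in saved_kinds]
```

The edit survives, but `lost` ends up holding the footnote and the page number: the export step never saw them, so it cannot write them back.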

Bidirectional Transformation

Figure 2: Bidirectional transformation

Figure 2 illustrates the bidirectional transformation process used in UX Write. When the document is first opened, it is converted from .docx to HTML. This is the same process as before, and maintains the headings and text, while losing the footnotes and page numbers (HTML has no notion of separate pages in a document). After the user has edited the file, they save it.

Now here’s where the difference comes in: Instead of completely re-creating the document, bidirectional transformation takes both the original .docx document and the modified HTML document as input, and updates the original based on the changes that have been made to the HTML version. Any changes, additions, or deletions to the headings or text will be applied, but the footnotes and page numbers will be left completely untouched. The user syncs their document with their Mac or PC and opens it in Microsoft Word, and is happy to see that everything is still in place.
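As a rough sketch of the idea, in the same spirit as the simplified explanation above, the save step can be modelled in Python as a function that receives both the original document and the edited HTML view. All the names here (to_html, put_back, the "id" field) are assumptions for the illustration. Matching items by a stable id and appending new items at the end are simplifications too, and say nothing about how UX Write actually tracks changes.

```python
# Kinds of content the HTML view can represent (an assumption for this sketch).
HTML_SUPPORTED = {"heading", "text"}


def to_html(docx_items):
    """Open: produce the editable HTML view, dropping unsupported items."""
    return [dict(item) for item in docx_items if item["kind"] in HTML_SUPPORTED]


def put_back(original, edited_html):
    """Save: update the original using the edited view instead of replacing it."""
    edited_by_id = {item["id"]: item for item in edited_html}
    original_ids = {item["id"] for item in original}
    result = []
    for item in original:
        if item["kind"] not in HTML_SUPPORTED:
            result.append(dict(item))  # never shown in HTML: left untouched
        elif item["id"] in edited_by_id:
            result.append(dict(edited_by_id[item["id"]]))  # kept, possibly changed
        # otherwise the user deleted it in the HTML view, so it is dropped
    # Items created in the HTML view; appended at the end for simplicity.
    result.extend(dict(i) for i in edited_html if i["id"] not in original_ids)
    return result


original = [
    {"id": "h1", "kind": "heading", "content": "Introduction"},
    {"id": "p1", "kind": "text", "content": "First draft."},
    {"id": "fn1", "kind": "footnote", "content": "See chapter 2."},
    {"id": "pg1", "kind": "page_number", "content": "1"},
]

html = to_html(original)
html[1]["content"] = "Second draft."  # a change
html.append({"id": "p2", "kind": "text", "content": "A new paragraph."})  # an addition
saved = put_back(original, html)
saved_kinds = [item["kind"] for item in saved]
```

The key design difference from import/export is the signature: put_back takes two inputs, so content the HTML view could not represent still has somewhere to come from.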

Incremental Feature Support

Now that you understand how bidirectional transformation works (trust me, this is a very much simplified explanation of events!), you can see how it is possible for UX Write to support only some features of Microsoft Word, without losing all the other information in your document. In particular, it permits features to be incrementally supported. For example, the first release in the 1.1.x series won’t support footnotes, but this is something I’ll be working on for future versions. In the meantime, while you won’t be able to view or edit footnotes, they’ll at least still be there when you take your document back to Microsoft Word on your PC or Mac.

I’ll be putting up another post soon which details the features that are currently supported in the development version and will be available when 1.1.0 is released, as well as others that will be added in the future.

8 thoughts on “Update on Microsoft Word Support, Part 1”

  1. Creating and editing footnotes is a very important and necessary function in UX Write.

    • Yep, this is high on my priority list.

      The main technical challenge relating to this (and why they’re not already supported) is that HTML itself doesn’t actually support footnotes (and even when working with a format like .docx, UX Write converts the document to HTML and back). So what needs to be done is to “fake” them by some mechanism such as having a bulleted list at the bottom of the document, with links referencing those items from the text. This is something I’ll be able to do with a user interface similar to what you have in Word (via some custom JavaScript code), and when you’re working with a .docx file (and later, .tex and .odt), the footnotes will be stored in the correct manner for that file format.

      The second challenge is printing. WebKit isn’t particularly good at doing print output (it’s designed for web only), and doesn’t have any (built-in) way of doing footnotes, headers, footers, automatic page numbering etc. What I’m going to try and do in this regard is to integrate a copy of LaTeX into UX Write, so you’ll be able to export to PDF using that. LaTeX is specifically designed for (and well known for) producing very high-quality print output, and handles all these features very well. If I’m able to get this working (and I suspect I likely will, since it’s been done before: http://texpadapp.com/2012/09/19/latex-ported-to-ipad/), then you’ll be able to produce publication-ready print output directly from UX Write.

  2. My question is in regard to the following:
    “Note that there is a big difference between .docx and .doc files – in fact they are two entirely separate file formats. UX Write will only support .docx, which is a modern, well-documented, XML-based file format that is (relatively) easy to read and write. The older .doc format is a much more complicated binary file format that would be a great deal more difficult to support. I’m aware that some people still use it, and while I could support it with perhaps another six months of work, I feel that time is better spent on improving other aspects of the app. You can easily convert from .doc to .docx (and back again) using any recent version of Word.”

    I use Microsoft Office for Mac 2011 (recently upgraded from ’04 and ’08, and I had a habit of saving as .doc rather than .docx), so 99% of my files are saved as .doc. I’m looking for an app to import Word files to my Retina iPad so I can travel without my MacBook, edit existing documents, and create new ones, but you say that you will “only support .docx”. Question: will your app work for my existing files?
    Thank you,
    Len

    • Unfortunately the answer is no, it won’t work with .doc files.

      Basically I’m in the difficult position of having to decide how to allocate a limited amount of resources (i.e. my time) in a way that produces the most useful product. I could add support for .doc as well, but given the format’s complexity (relative to .docx) this would involve a huge amount of time and effort. Thus, support for .doc would come at the cost of other features like find & replace, spell checking, ePub export, change tracking, footnotes, headers/footers etc… (or at least having them delayed by six months or more). I’d happily support it if it was a straightforward task, but my main focus is on providing as many useful features as I can.

      The reason Microsoft created .docx was that there was demand from the market for a file format that was easy for software other than Microsoft Word to support. In the past, MS had a terrible reputation for interoperability and not making their specifications public. By creating .docx they largely mended their ways and did the right thing by the industry, so we now have a default file format for Word which is much more practical for other applications to work with.

      I understand that many people still use .doc out of habit and because they already have a large collection of documents in that format. However, there isn’t really much practical justification for using the format any more. The only exception is if you are exchanging files with people who still use old versions of Word (Office 2004 and earlier on the Mac, Office 2003 and earlier on Windows). I recommend moving to .docx: the experience when using Word will be exactly the same, and you’ll have documents that you can seamlessly exchange with other apps like UX Write, and that are more “future proof” for other word processors that come along later.

      If you absolutely require .doc support, I recommend Pages, QuickOffice, or Documents to Go. They all support this format, but lack many of the features of UX Write. If you’re not tied to .doc, you can easily convert individual documents using Save As in Word. Doing so will maintain all of the text and formatting information, and when using Word you won’t notice any difference.
