In the past months I have been working hard on writing a book on Design Patterns and refactoring.
Publishing Word and using DocumentShare
To make the book as usable as possible I decided to go for three publication forms: HTML, ePub and PDF. Each has its benefits and drawbacks, and each therefore has a specific reason to be used.
In order to publish the base content properly I decided to grab one of my older concepts: DocumentShare, and give it a new life, publishing both HTML and ePub.
Below you will find some highlights of the results from the new DocumentShare tool.
For the published result, go here. As I did not make an effort to find good sets of alternatives for the fonts I like, you might see slightly different results than shown in the screenshots below.
Code formatting is something I added only recently. As I was pasting some code into my document, I decided it would be nice to format it properly for the web.
The tags are needed to tell DocumentShare that a piece of text needs specific formatting, in this case as code.
The code that makes this formatting happen is relatively simple.
It knows some basic keywords used in code and additionally takes all text marked as Italics (“MyClass” in this screenshot) to be special keywords to be presented as such.
Remarks are recognized by the “//” start and the </p> ending. I did not build in recognition for “/*” and “* ” type of remarks, but that might follow when I build a specific parser for “Code to HTML”.
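To make the idea concrete, here is a minimal Python sketch of this kind of formatting. It is not the DocumentShare code itself: the keyword list, the CSS class names and the function name `format_code_line` are assumptions made for illustration, and it assumes italic text arrives as `<i>` tags in the cleaned HTML.

```python
import re

# Hypothetical keyword list; the post does not say which keywords the tool knows.
KEYWORDS = ("class", "public", "private", "return", "new", "void")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")
ITALIC_RE = re.compile(r"<i>(.*?)</i>")


def format_code_line(line: str) -> str:
    """Mark up one line of code taken from a single paragraph.

    The line is assumed to be the content of one <p> element, so a "//"
    remark runs to the end of the line (the closing </p>).
    """
    # Split off the remark first so keywords inside it are left alone.
    code, sep, remark = line.partition("//")
    # Basic keywords get their own class.
    code = KEYWORD_RE.sub(r'<span class="keyword">\1</span>', code)
    # Text the author marked as italics is treated as a special keyword.
    code = ITALIC_RE.sub(r'<span class="special">\1</span>', code)
    if sep:
        code += '<span class="remark">//' + remark + "</span>"
    return code


print(format_code_line("public class <i>MyClass</i> // adapter"))
```

A line such as a URL inside a string literal would be split at its "//" by this sketch, which is one reason a dedicated parser would be the sturdier choice.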
Links and chapter names
Chapters are recognized by the “<h>” (header) tag. So all I do is take the HTML published from Word, clean out all the Word-specific tags and references to produce crisp, almost clean HTML, and then break that HTML up into blocks, with the “<h” tag as the breaking point.
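Breaking the HTML up at each header tag could look roughly like the Python sketch below. The function name `split_into_chapters` is hypothetical, and the sketch assumes the headers are `<h1>` to `<h6>` tags in already cleaned HTML.

```python
import re

# Matches the start of <h1> ... <h6>, but not <html>, <head> or <hr>.
HEADER_START = re.compile(r"<h[1-6]\b", re.IGNORECASE)


def split_into_chapters(html_text: str) -> list[str]:
    """Cut the HTML into blocks, each starting at a header tag."""
    starts = [m.start() for m in HEADER_START.finditer(html_text)]
    if not starts:
        return [html_text]
    blocks = []
    # Keep any content that precedes the first header as its own block.
    if html_text[: starts[0]].strip():
        blocks.append(html_text[: starts[0]])
    bounds = starts + [len(html_text)]
    for begin, end in zip(bounds, bounds[1:]):
        blocks.append(html_text[begin:end])
    return blocks


for block in split_into_chapters("<h1>A</h1><p>x</p><h2>B</h2><p>y</p>"):
    print(block)
```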
To create a stable header-tag/header ID to publish the files I add an extra tag into the document. This tag is shown in red below.
The tag has two main parts: “::Tag” and the tag name: “PAT-ADP”. Additionally, you can add the alternate name of the chapter, to be shown when you create a link to that chapter. In this case: “Discussing the Adapter Pattern”.
DocumentShare will take these meta-tags and translate them to something usable, as shown below.
When you create a link to this chapter, in Word you do the following (in red):
The link as it will show up in HTML (and ePub)
Style for tags
As we want to publish the document to PDF as well, we need a way to hide the tags we use for chapters, links, code and indents (not shown in this post). After all, it looks quite weird to have red tags that seemingly have no meaning in your PDF.
And so we use a style for the tags, which we can give a white font color when we create the PDF. While the text and tags are still there, they are hidden from the eye.
One of the first things I focused on was the navigation. I wanted this to be in sight all the time and also to cater to your interest. So I created three “layers”: the chapter / page you are reading now, the chapter it is part of and the entire document.
It took some fine-tuning to decide how many levels each layer would show in the document structure. In the end I settled for “less is more”.
Deriving the navigation
The navigation is derived from the concrete chapters themselves. As I rip apart the HTML as created by Word, I create an in-memory “database” of each chapter, containing the text of that chapter and the header/title of that chapter.
As I can derive the chapter level (<h1>, <h2>, <h3> and so on), I can deduce where this chapter fits in the total structure. I assume that if a <h2> or <h3> follows a <h1> it is part of that chapter, add it to the “parent” and build the tree structure that way.
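A common way to turn a flat run of header levels into such a tree is to keep a stack of "open" chapters. The Python sketch below shows that technique; the `Chapter` class and `build_tree` function are illustrative names, not taken from DocumentShare, and the chapter text is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Chapter:
    level: int  # 1 for <h1>, 2 for <h2>, and so on
    title: str
    children: list["Chapter"] = field(default_factory=list)


def build_tree(headings: list[tuple[int, str]]) -> Chapter:
    """Build a chapter tree from (level, title) pairs in document order."""
    root = Chapter(0, "document")  # artificial root above every <h1>
    stack = [root]
    for level, title in headings:
        node = Chapter(level, title)
        # Close every open chapter that is not a parent of this one.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


tree = build_tree([
    (1, "Patterns"), (2, "Adapter"), (3, "Example"),
    (2, "Bridge"), (1, "Refactoring"),
])
print([c.title for c in tree.children])
print([c.title for c in tree.children[0].children])
```

Because the root has level 0 and real headers start at level 1, the stack can never run empty, and a header that skips a level (an <h3> directly under an <h1>) simply becomes a child of the nearest higher-level chapter.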
Building the pages and including sub-chapters
This structure and approach allows for a lot of freedom in producing navigational lists and subnavigations. It also allows me to use “include subchapters” (see the image under “Links and chapter names” above) when I want a chapter to contain all the subchapters instead of publishing these as separate pages.
By default, DocumentShare assumes I want to include all chapters from level 3 (<h3>). In some cases, however, I already want to do this at level 2, for instance in chapter 1 and part 9, where level 2 is where I start each Design Pattern.
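One possible reading of this rule, sketched in Python: a chapter becomes a page that absorbs everything below it when it sits at the default level (3) or deeper, or when it carries an explicit "include subchapters" flag. The function name `assign_pages` and the tuple layout are assumptions for illustration; how DocumentShare stores the flag is not described here.

```python
def assign_pages(headings, include_level=3):
    """Group chapters into pages.

    headings: (level, title, include_subchapters) tuples in document order.
    Returns a list of pages; each page is a list of titles, the first
    being the chapter that owns the page.
    """
    pages = []
    absorbing_level = None  # level of the chapter currently absorbing subchapters
    for level, title, include_sub in headings:
        if absorbing_level is not None and level > absorbing_level:
            # Deeper than the absorbing chapter: stays on that page.
            pages[-1].append(title)
            continue
        # Otherwise the chapter gets a page of its own.
        pages.append([title])
        if include_sub or level >= include_level:
            absorbing_level = level
        else:
            absorbing_level = None
    return pages


pages = assign_pages([
    (1, "Part 9", False),
    (2, "Adapter", True),   # flagged: the pattern keeps its subchapters
    (3, "Intent", False),
    (3, "Example", False),
    (2, "Notes", False),    # not flagged: level 3 below gets its own page
    (3, "Details", False),
    (4, "Fine print", False),
])
for page in pages:
    print(page)
```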
The HTML that comes out of the process
The HTML produced by Word is really dirty and ugly. And it really does not follow any rules of logic. Bullet points and numbered lists, for instance, are not published by Word as organized lists, but as paragraphs with styling. That is also why I started to include meta-tags like
What you will find in each <p> tag are endless style-references and <span> tags taking care of specific formatting you might have selected (or not).
Most of this styling is the result of Word not cleaning up the formatting as you go and type, so when you study the HTML from Word even deeper you will find several “dead” tags that enclose nothing at all, yet still give that nothing some kind of font and color.
Instead, I take that HTML and “nuke” it with regular expressions, taking almost all tags (<img> and <a href> excluded) and stripping them of anything and everything they contain. This leads to relatively clean and almost spartan HTML like that shown below:
Tags like <span> and <div> are completely removed, as they usually make no sense at all if you want your HTML to be clean and simple.
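A regex-based cleanup along these lines might look like the Python sketch below. It is an approximation under stated assumptions, not the actual DocumentShare expressions: the function name `clean_word_html` is made up, namespaced tags such as Word's `<o:p>` are simply dropped, and attribute values are assumed not to contain a ">" character.

```python
import re


def clean_word_html(html_text: str) -> str:
    """Strip Word's styling noise while keeping <img> and <a> intact."""
    # Drop namespaced Word tags such as <o:p> and </o:p>.
    text = re.sub(r"</?\w+:\w+[^>]*>", "", html_text)
    # Remove <span> and <div> tags completely, keeping the text inside.
    text = re.sub(r"</?(?:span|div)\b[^>]*>", "", text, flags=re.IGNORECASE)

    def strip_attributes(match: re.Match) -> str:
        name = match.group(1)
        if name.lower() in ("img", "a"):
            return match.group(0)  # keep src and href untouched
        return "<" + name + ">"

    # Reduce every other opening tag to its bare name.
    text = re.sub(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>", strip_attributes, text)
    # Remove "dead" tags that now enclose nothing at all.
    text = re.sub(r"<([a-zA-Z][a-zA-Z0-9]*)>\s*</\1>", "", text)
    return text


dirty = ('<p class=MsoNormal style="margin:0"><span style="color:red">Hello</span> '
         '<b style="x"></b><a href="page.html">link</a><o:p></o:p></p>')
print(clean_word_html(dirty))
```

Regular expressions are a blunt tool for HTML in general; they work here mainly because the input always comes from the same generator and follows the same patterns.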
To finish the process, I add CSS to style the paragraphs and make cute headers so that everything makes sense and looks like something you might like to read.