While historically, it’s been difficult at best to create print-quality PDF books from markup alone, CSS3 now brings the Paged Media Module, which targets print book formatting. “Paged” media exists as finite pages, like books and magazines, rather than as long scrolling stretches of text, like most websites. CSS3 allows us to style text, divide it into book pages, and set the page structure as a whole. You can dictate the size of the book, header and footer content, how to display cross references and tables of contents, whether to add guides and bleeds for commercial printing companies, and more. With a single CSS stylesheet, publishers can take XHTML source content and turn it into a laid-out, print-ready PDF. You can take your XHTML source, bypass desktop page layout software like Adobe InDesign, and package it as an ePub file. It’s a lightweight and adaptable workflow, which gets you beautiful books faster.
XML, XSL, XHTML, and PDF processors
As the publishing industry moves toward digital-centric workflows, there’s a need for scalability—repeatable processes and workflows that work at small and large scales. Creating a well-formatted printed book is no longer enough; publishers often need to release several different formats for every book: print, ePub for the iPad, Nook, etc., .mobi for the Kindle, and so on. The hardest jump is the one from print to ePub—you need to put plain text, often from multiple documents or text flows, and non-linear elements such as images, into a cleanly-tagged linear flow, and package it with a table of contents and instructions on how to tie together the various files that make up the book (see the ePub Wikipedia page to learn more about the extra special sauce that’s part of the ePub format). Even InDesign’s ePub output still needs a lot of extra cleanup after the fact, because of the non-linear nature of print page design.
For many years, XML has been one way to achieve a scalable multi-destination publishing model. XML offers a structured and standardized way to tag book content. It converts easily to XHTML, which is the foundation for digital books. It also comes with XSL-FO, which is markup that converts to print-quality PDF layout. XSL-FO has been both the gateway to XML-source publishing, and a major roadblock: it’s a powerful language for structuring pages and formatting XML files, but it’s also intricate, unapproachable, and not very well known. However, by using XSL-FO and a PDF processor like Antenna House or Prince to read the markup, publishers can use a single XML source to flow neatly into XHTML and ePub and also to produce fully laid-out, print-quality PDFs.
With the combination of major PDF processors and paged media features in CSS3, XML- and XHTML-based publishing can move away from XSL-FO to tap the vast and talented web design community. PDF processors Antenna House 6.0 and Prince 8.0 come with built-in CSS support, along with a slew of their own CSS extensions. These processors read tagged files and convert them to PDF using user-supplied stylesheets. The paginated PDF you get uses the same extensive CSS available for web design, in addition to the specialized CSS3 features added just for paged media, like text strings, cross references, and printer marks. [1]
Cost is a factor in adopting this kind of workflow. The PDF processor is the biggest upfront cost besides the initial stylesheet development. As of this writing, Antenna House costs $1,700 for a single user license, or $7,000 for a server license. Prince’s licenses are substantially less: $495 for a single user, or $3,800 for a server license. But compared to the ongoing cost of desktop page layout, a single upfront payment to install a PDF processor becomes a viable option. (Prince offers a demo version that watermarks the first page of each PDF but is otherwise fully functional. It’s a good way to experiment and evaluate the workflow.)
The open source command-line tool xhtml2pdf is built on Python and can convert HTML to PDF for free; however, its CSS support is much less robust than that of the for-pay tools, especially for CSS3 paged media features. Download the source code from GitHub. Here are some notes I whipped up after playing with xhtml2pdf for an hour.
Building a book
The new CSS3 features come from the Paged Media Module and the Generated Content for Paged Media Module (GCPM). I used the latest working draft and the latest editor’s draft of the Paged Media Module to develop my stylesheets. The spec is fairly stable and has entered the last call period (meaning the working group feels pretty good about it and is looking for final review before they recommend advancement). They’re still editing and it’s likely that they’ll release another Last Call Working Draft to finalize changes made during this review period.
The first step when working with print documents is to set up your page structure using the @page rule. This is akin to master pages in print layout, through which you can set the trim size (i.e., page dimensions), borders, running headers and footers, fonts, and so on: basically anything that you want to appear on every page. And of course, you can still use cascades. For example:
@page {
size: 5.5in 8.5in;
}
This code sets the trim size of every page in the book, which you can then build on to style different sections of your book. The following page definitions add margins and padding for left and right hand pages only in parts of the file that use the “chapters” page rule:
@page chapters:left { /* left page setup */
margin: 0.75in 0.75in 1.125in 0.62in;
padding-left: 0.5in;
}
@page chapters:right { /* right page setup */
margin: 0.75in 0.62in 1.125in 0.75in;
padding-right: 0.5in;
}
The page names are yours to create, and each named page can be further broken up into :first (which styles the first page within an element that uses that page rule), :left, and :right. Invoke page rules like this:
section.chapter {
page: chapters;
}
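As a sketch of how the :first pseudo-class might combine with a named page, the rule below gives the opening page of each chapter a deeper top margin. The 2in value is an assumption chosen for illustration, not a figure from the stylesheets described here.

```css
/* Hypothetical: extra top margin on the first page of every chapter */
@page chapters:first {
  margin-top: 2in; /* assumed "sink" for the chapter opener */
}
```

Because this rule only sets margin-top, the remaining margins and padding would still come from the chapters:left and chapters:right rules above.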
The Paged Media spec also has 17 predefined page areas that you can customize within your page rules. There’s the main page area, and then 16 other areas run along the edges, as follows:
top-left-corner | top-left | top-center | top-right | top-right-corner
left-top | main page area | right-top
left-middle | (main page area, continued) | right-middle
left-bottom | (main page area, continued) | right-bottom
bottom-left-corner | bottom-left | bottom-center | bottom-right | bottom-right-corner
You can style each of these page areas individually if, for example, you want to add navigation tabs or running headers or footers (see below for more on those). The Paged Media Editor’s Draft has a great description of sizing and positioning of margin boxes. All but the corner margin boxes have variable widths (for boxes on the horizontal edges) or heights (for boxes along the vertical edges), and will stretch the full width or height available until they run into an obstacle (for example, neighboring content defined in one of the adjacent margin boxes). The example below adds a gray bleed to the outside edge of all index pages by adding a background color to just three margin boxes on the right-hand edge: the two corner boxes and the right-top box. Because there’s no other content defined in the remaining boxes, the bleed will fill the full height of the page. You might accomplish a similar effect with a fixed position background image or by using page borders, but this method is simple, clean, and gives true bleeds (see Printer marks and bleed below).
@page indexmaster:right {
@top-right-corner {
background-color: #777777;
background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
padding-left: .8in;
margin: -6pt -6pt -6pt 0;
}
@right-top {
background-color: #777777;
background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
padding-left: .8in;
margin: -6pt -6pt -6pt 0;
}
@bottom-right-corner {
background-color: #777777;
background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
padding-left: .8in;
margin: -6pt -6pt -6pt 0;
}
}
To keep the bleed on the outside edge, the left and right pages need to be defined separately. The margins, padding, and margin boxes will all need slight adjustments for the corresponding left page bleed. (You may also have noticed that there are two color definitions in the above code; see CMYK Colors below for more about that.)
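A possible left-hand counterpart is sketched below. It simply mirrors the right-hand rule (left-side margin boxes, padding on the right, and the zero margin moved to the inner edge); the values are untested assumptions and would likely need the slight adjustments mentioned above.

```css
/* Sketch only: mirror of indexmaster:right for left-hand pages */
@page indexmaster:left {
  @top-left-corner {
    background-color: #777777;
    background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
    padding-right: .8in;
    margin: -6pt 0 -6pt -6pt; /* bleed off the top, bottom, and left edges */
  }
  @left-top {
    background-color: #777777;
    background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
    padding-right: .8in;
    margin: -6pt 0 -6pt -6pt;
  }
  @bottom-left-corner {
    background-color: #777777;
    background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
    padding-right: .8in;
    margin: -6pt 0 -6pt -6pt;
  }
}
```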
Counters
Counters aren’t new, but really come in handy for paged media, allowing you to add automatic numbering to chapters, figures, examples, and so on with just a few lines of CSS, like this:
section.chapter > div.titlepage > div > div > h2.title:before {
counter-increment: ChapterNo;
content: "Chapter " counter(ChapterNo);
}
div.figure-title:before {
counter-increment: FigureNo;
content: "Figure " counter(ChapterNo)"-" counter(FigureNo)": ";
}
section.chapter {
counter-reset: ChapterNo FigureNo;
}
In the above code, we created two counters, one for chapter numbering and one for figure numbering, and then reset them both starting at every new chapter. (Bear in mind counter-reset cascades, which means that if you want to reset a few counters on the same element but you define them separately, only the last definition will be honored. To get all the counters to reset, you need to run them together, as shown above.) Additionally, we used the ChapterNo counter within the figure title, to do things like this: “Figure 5-11:.” In this case, the ChapterNo counter is actually applied to an ancestor of the figure title, section.chapter. The PDF processor will look progressively further and further up until it finds an instance of the specified counter that applies to the element in question.
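One thing worth checking in your own processor: under standard CSS counter scoping, a counter that is reset on every section.chapter starts again from zero in each chapter, so the chapter number itself may not advance. A possible variant, assuming every chapter sits inside the div.book wrapper used in the Strings examples, resets ChapterNo once for the whole book and only FigureNo per chapter. Treat it as a sketch to test, not as the stylesheet used for this article.

```css
/* Assumption: div.book wraps all section.chapter elements */
div.book {
  counter-reset: ChapterNo; /* chapter numbering starts once per book */
}
section.chapter {
  counter-reset: FigureNo; /* figure numbering restarts in each chapter */
}
```

The counter-increment and content declarations from the example above would stay as they are.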
Strings
You can turn almost any element into a string that you can then invoke in your CSS to appear in other places throughout your document. Headers and footers, where you have the page number and some text appear on each page, make good use of strings—for example, the book title on the left-hand page, and the chapter title on the right (CSS3 also includes some built-in handling for running elements; see below for why I chose to use strings instead).
Use string-set on any element to make the contents of the element reusable. Make up a name for it, and name the content you want to include:
div.book > div.titlepage > div > div > h1.title {
string-set: Booktitle self;
}
section.chapter > div.titlepage > div > div > h2.title {
string-set: Chapter self before;
}
In the top example, the name of the string is “Booktitle,” and I use the very simple “self” to say that I want the string to include whatever the content of that element is. In the second block, I tell the string to include both the content of the element and any content I added using the :before selector (as I did to add the chapter numbers in the Counters section, above).
To invoke the strings, reference them in the content property:
@page :left { /* left page setup */
@bottom-left { /* verso running footer */
content: counter(page)" "string(Booktitle);
}
}
@page :right { /* right page setup */
@bottom-right { /* recto running footer */
content: string(Chapter)" "counter(page);
}
}
Strings can be quite powerful and can include various types of content within the string-set property, including counters (I use the page counter in the above examples to display the current page number on each page as well), before/after text, and static text. You can also define multiple strings within one string-set property.
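As a rough illustration of several strings in one declaration, the sketch below sets two named strings from the chapter title, separated by a comma. ChapterShort is a made-up name, and the self and before keywords follow the usage shown earlier; keyword support varies by processor and draft, so check your processor’s reference.

```css
/* Sketch: two named strings set from the same element */
section.chapter > div.titlepage > div > div > h2.title {
  string-set: Chapter self before,                       /* full title with "Chapter N" prefix */
              ChapterShort "Ch. " counter(ChapterNo);   /* hypothetical short form */
}
```

Either string could then be pulled into a margin box with string(Chapter) or string(ChapterShort).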
CSS3-GCPM actually includes special handling just for running elements: the running() and element() functions. Used together, they convert any element into a running header or footer. However, the danger here is that when you convert an element to a running element in this way, it no longer appears in its original place: running() acts more like a float that also repeats on subsequent pages. Since I want my headers to appear both in their places inline and as running elements, I used strings instead.
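For comparison, here is a minimal sketch of the running-element approach as the GCPM draft describes it. The name chaptertitle is invented for the example, and the choice of the top-center margin box is an assumption.

```css
/* Sketch: remove the chapter title from the flow and repeat it as a running header */
section.chapter > div.titlepage > div > div > h2.title {
  position: running(chaptertitle); /* "chaptertitle" is a made-up name */
}
@page chapters {
  @top-center {
    content: element(chaptertitle);
  }
}
```

With these rules the heading would no longer print at the start of the chapter, which is the trade-off described above.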
Cross references
Most long documents (like books) include cross references, which usually look something like this: “See page 127.” Within an XML or HTML workflow, cross references can be set up as live links that jump to another section. Although live cross reference links are a basic feature for all digital books, including web-optimized PDFs, they naturally won’t be useful for the print book. However, since the source content is unpaginated, it’s hard to know what location the text should refer to. You won’t know the print page number until you send the text through the PDF processor, and in any case that page number is inaccurate when it comes to reflowable eBooks. The answer is to use generated text, which relies on target-counter() and target-text().
For example, say you have this cross reference in your HTML:
<p>See <a class="xref" href="#section25" title="Working with Cross
References">Chapter 5, <em>Working with Cross References</em></a>.</p>
By adding this style to your CSS:
a.xref:after {
content: " (page " target-counter(attr(href, url), page) ")";
}
You’ll end up with:
See Chapter 5, Working with Cross References (page 127)
There are a few things going on in that CSS. First, we supplied a static text string that will add a space, an opening parenthesis, the word “page”, and another space before any generated content. The next bit, target-counter, tells the renderer to pull in a specific counter related to the element. Then, within the parentheses, we tell the renderer that we need the “page” counter that applies to the href attribute of the element in question; i.e., the renderer should follow the link to its target (#section25), figure out what page it’s on, and display that page number. To wrap it up, we have one last text string to add a closing parenthesis. If the pagination changes the next time we run the document through the PDF processor, the page number will update automatically.
The target-text() function takes things a step further by pulling in all the text from another element somewhere else in the document. For a simple example, let’s say we need to do something about a hard-coded cross reference to a print page number, like this one:
<p>See <a class="xref" href="#section25" title="Working with Cross
References">page 110</a></p>
…
<h2 class="title" id="section25">Working with Cross References</h2>
Again, we want to make sure that the cross reference always displays an accurate page number, but we also want to include the name of the section being referenced to match the formatting of our previous example. And so, the following:
a.xref {
content: target-text(attr(href, url), content()) " (page "
target-counter(attr(href, url), page) ")";
}
will give us this:
See Working with Cross References (page 127)
The target-text function works much like target-counter: it follows the url to its target, and when we supply it with a value of content(), it pulls in the content of the element we’re linking to. The last piece of our cross reference is to add the referenced chapter number within the cross reference text. If we’ve already set up automatic chapter numbering using counters, as we did above in Counters, then we can pull that in as well:
a.xref {
content: "Chapter " target-counter(attr(href, url), ChapterNo) ", "
target-text(attr(href, url), content())
" (page " target-counter(attr(href, url), page) ")";
}
For our desired end result:
See Chapter 5, Working with Cross References (page 127)
And now for an important warning: Antenna House won’t break generated text across lines. If the imported text is too long to fit in the page area, it’ll just stretch off past the page edge. Antenna House will, however, break static text strings that you include in the content property. For example, in the above, it will break anywhere in “Chapter ”, “ (page ”, and “)”, but it won’t break within the actual chapter title, or in the page or chapter numbers (though those latter two are so small that it probably wouldn’t break inside them anyway). This makes generated text somewhat risky and only appropriate for short lines; more about this in the Footnotes section below.
Table of contents
A table of contents can be set up in the XHTML as a series of nested unordered lists, with each list item linked to the section in question. This works great for ebooks, but print books need to display the page number for the section as well. Just like cross references, you can use target-counter to set that up:
div.toc ul li.preface a:after {
content: leader(dotted) " " target-counter(attr(href, url), page);
}
The leader(dotted) function adds a leader tab between the text of the table of contents entry and the generated page number, like so:
Working with Cross References…………………………………….. 127
There are three predefined leader styles: dotted, solid, and space. You can also create your own string. For example, leader("~") will create a line of tildes.
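Put together with the page number lookup, a custom leader might look like the sketch below. The li.chapter class is a hypothetical stand-in for whatever classes your table of contents uses.

```css
/* Sketch: chapter-level entries with a tilde leader before the page number */
div.toc ul li.chapter a:after {
  content: leader("~") " " target-counter(attr(href, url), page);
}
```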
Multi-column layouts
Multi-column layouts are another new feature of CSS3. They allow you to split any div into multiple columns using column-count. For example, to set only the index of a book in two columns, while leaving the majority of the text in a single column, add column-count: 2 to the index div:
div.titlepage+div.indexnote+div.index {
column-count: 2;
column-gap: 12pt;
}
The column-count property sets the number of columns, and the column-gap property sets the space between the columns. You can also add column-width to specify the width of two (or more) columns. The columns will span the entire available page area by default.
Breaks
If you’ve done digital book production, then you’re most likely familiar with CSS’ page break properties: page-break-before, page-break-after, and page-break-inside.
As defined in CSS2.1, page-break-before and -after accept the following values: auto, always, avoid, left, right, and inherit. You can use them to force breaks around elements, or use page-break-inside to prevent breaks from occurring inside elements. (This is useful for keeping all paragraphs of a sidebar on the same page, for example.) Assigning a value of left or right will force page breaks until the content lands on a left or right page, respectively, leaving a blank page in between if necessary. This is useful for book chapters, where you want every chapter to start on a right-hand page. You’d define the chapter element as follows:
section.chapter {
page-break-before: right;
}
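The other two properties work the same way. As a small sketch, assuming a hypothetical div.sidebar class for sidebars, the rules below keep a sidebar on one page and discourage a break immediately after a heading:

```css
/* Hypothetical class name; keep every paragraph of a sidebar together */
div.sidebar {
  page-break-inside: avoid;
}
/* Avoid stranding a heading at the bottom of a page */
h2.title {
  page-break-after: avoid;
}
```

Note that avoid is a request rather than a guarantee: a sidebar taller than the page still has to break somewhere.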
CSS3 adds a few extra properties for multi-column layouts: break-before, break-after, and break-inside. These function the same as the page-break rules, but at the column level, and add a few extra possible values: page (force a page break), column (force a column break), avoid-page (avoid a page break), and avoid-column (avoid a column break).
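Applied to the two-column index from the previous section, these properties might be used as in the sketch below. The div.indexentry and h3.indexletter class names are assumptions made up for the example, and processor support for these properties should be checked before relying on them.

```css
/* Sketch: keep each index entry within a single column */
div.index div.indexentry {
  break-inside: avoid-column;
}
/* Sketch: start each alphabetical group at the top of a new column */
div.index h3.indexletter {
  break-before: column;
}
```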
Footnotes
CSS3-GCPM adds special handling just for footnotes. First you’ve got the @footnote rule, which defines the part of the page reserved just for footnotes (if you’ve got any). We also have a new kind of float, float: footnote, which is where the real magic happens. When you apply a float of footnote to an element, the entire contents of the element get floated down to the bottom of the page, into the @footnote page area. They lose the normal inherited formatting, and instead get styled with any formatting you’ve defined for the @footnote area. Additionally, at the point of reference, a marker is added (in superscript by default) that corresponds to the number (or symbol) next to your newly floated content. You can style the in-text marker, called the footnote-call, and the floated footnote number, called the footnote-marker, with two new pseudo-elements: ::footnote-call and ::footnote-marker.
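To make that concrete, here is a minimal sketch of styling the footnote area and the two markers, following the GCPM draft syntax in which @footnote nests inside @page. The specific values are assumptions, and processors may differ in what they support or how they spell the rule, so check your processor’s CSS reference.

```css
/* Sketch: a thin rule and some space above the footnote area */
@page {
  @footnote {
    border-top: 0.5pt solid black;
    margin-top: 12pt;
    padding-top: 6pt;
  }
}
/* The in-text reference mark */
::footnote-call {
  font-size: 70%;
}
/* The number printed next to the footnote text */
::footnote-marker {
  font-weight: bold;
}
```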
Now here’s the disconnect: my XHTML source files included all footnotes as endnotes, where the footnote text sat at the end of each section. My print design called for the footnotes to appear on the page on which they were referenced. In spite of this, I almost got footnotes working without any XHTML changes by just using generated text and the CSS3 footnote tools. Ultimately this plan failed because, as noted above, generated text in some PDF processors doesn’t like to break across lines but will instead just run off the margin if it gets too long. For books with footnotes just a couple of words long, there’s no problem, but that’s rarely the case. [2]
I ended up editing the XHTML to move the footnotes to the exact position where they’re referenced and wrap them in a span with class="footnote". I chose spans mainly because that would leave the footnotes inline, without adding an extra paragraph break (as a div or p would).
Here’s the new HTML:
<p>As you saw in the earlier section,<span class="footnote"><p>If you
were paying attention, of course.</p></span> generated text doesn't break
across lines.</p>
Yep, you’re seeing that right: we’ve got a p within a span within a p. It’s not exactly perfectly formed XHTML, but it does the trick. And with this simple CSS:
span.footnote {
float: footnote;
}
We get this:
As you saw in the earlier section,1 generated text doesn’t break across lines.
1 If you were paying attention, of course.
Another part of the CSS3 footnote arsenal is a predefined counter, footnote, that applies to all elements with float: footnote. The footnote counter resets the same as any other counter (see Counters above), allowing you to restart footnote numbering as needed (for example, you might set the numbering to restart at the beginning of each new chapter).
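A per-chapter restart could be sketched as follows. Because only the last counter-reset declaration on an element is honored, the footnote counter is listed alongside whichever counters you already reset there; FigureNo is carried over from the Counters section as an example.

```css
/* Sketch: restart footnote numbering with each chapter.
   List every counter reset on this element in one declaration. */
section.chapter {
  counter-reset: FigureNo footnote;
}
```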
You can customize the way footnotes are marked, using numbers, letters, symbols, or any other value supported in list-style-type. There’s also a predefined “footnotes” style that rotates through and then multiplies four different glyphs: asterisk, double asterisk, cross, and double cross. Footnotes will be numbered with decimal numbers like 1, 2, 3, etc., by default. To change to lowercase letters, you’d do the following:
::footnote-call {
content: counter(footnote, lower-alpha);
}
::footnote-marker {
content: counter(footnote, lower-alpha);
}
Make sure to set the list type for both the footnote call and footnote marker, unless you want to seriously confuse your readers.
PDF bookmarks
Bookmarking is irrelevant when you’re dealing with print media, but is a handy (and I would argue, essential) component for web-optimized PDFs. Bookmarking adds a linked table of contents to the navigation panel of a PDF reader, allowing users to jump to specific sections. You can create bookmarks to almost any element, and you can tell the PDF how to nest and display the bookmarks all in your CSS.
Here we’ve got two levels of bookmarks, nesting level-one headings inside chapter titles. Instead of having all the levels expanded and displayed when the PDF is opened, we’ve set them to a state of “closed.” Users will only see the chapter titles, and can click to expand the tree and see the nested section headings if they wish:
section.chapter > div.titlepage > div > div > h2.title {
bookmark-level: 1;
bookmark-state: closed;
}
div.sect1 > div.titlepage > div > div > h2.title {
bookmark-level: 2;
bookmark-state: closed;
}
Bookmarks will automatically include the entirety of an element’s content, including any text you added with :before and :after selectors. However, you can restrict the bookmark to display only a subset of the element’s information by using the bookmark-label property. For example:
section.chapter > div.titlepage > div > div > h2.title {
bookmark-level: 1;
bookmark-state: closed;
bookmark-label: content();
}
The example above will display only the actual text of the element, and ignore any before/after content. Note that all text is imported without any formatting, and you can’t specify combinations of content values within a single bookmark-label declaration.
You can also choose to display a specific text string that will overwrite the contents of the HTML element. For example, if you want to add a bookmark to the copyright page, but the words “Copyright Page” don’t actually appear anywhere in the text:
div.copyrightpage {
bookmark-level: 1;
bookmark-state: closed;
bookmark-label: "Copyright Page";
}
Fonts
When it comes to adding custom fonts to your CSS, you may be relieved to know that it’s the same old CSS you’re used to: use @font-face to declare the font, and use font-family to invoke it. Remember to include fallback fonts, especially for body text where you may need to use symbols that aren’t included in your main body font set. Again, this is the same CSS that people have been using for ages:
font-family: "Gotham", "Arial Unicode", sans-serif;
Arial Unicode includes a huge number of glyphs, and so is usually a pretty safe sans-serif fallback.
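For completeness, a minimal sketch of the declaration side is shown below. The file path is a placeholder, and whether a given font format is accepted depends on your PDF processor.

```css
/* Sketch: declare the font once, then invoke it with fallbacks */
@font-face {
  font-family: "Gotham";
  src: url("fonts/Gotham-Book.otf"); /* hypothetical path to a licensed font file */
}
body {
  font-family: "Gotham", "Arial Unicode", sans-serif;
}
```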
Most commercial printers require fonts to be embedded in every PDF file. The methods for this vary depending on the PDF processor, so you’ll need to read the documentation carefully if you want to build embedded fonts into your workflow. You could also embed fonts after conversion with PitStop or another PDF post-processing tool.
There are a lot of nice features for fonts coming with CSS3, but they’re still unstable and neither Antenna House nor Prince has added support yet (though Antenna House, and Prince to a more limited extent, has some nice extensions for working with fonts). Check out the Fonts module to get a sense of what’s coming. Development that improves text formatting on a larger scale, including specifying word- and line-breaks, spacing, and so on, is underway. Prince and Antenna House have implemented some of the features to varying degrees, as they had been defined at the time of release. You can check out the spec, though I encourage you to check with your PDF processor’s CSS reference before you experiment, as there may be variations.
Final touches for printing
There are a few final steps to take if you’re planning to print your document commercially.
Image resolution
Image resolution is crucial for printed media. A general guideline is to set each image’s resolution somewhere in the 200 to 300dpi range (specific requirements depend on each book’s needs). Most PDF processors will impose their own default resolutions on images during conversion, but you can choose to preserve the resolution of the source files instead:
img {
image-resolution: from-image;
}
You can also set the value to normal, to let the processor choose the resolution, or you can provide a specific numerical dpi value. (Messing around with image resolution is tricky, though, so do your homework first!)
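A fixed value might look like the sketch below, which assumes a hypothetical img.screenshot class for images that should be laid out at 300 image pixels per inch regardless of what the files themselves declare.

```css
/* Sketch: treat these images as 300dpi when computing their printed size */
img.screenshot {
  image-resolution: 300dpi;
}
```

A 900-pixel-wide image styled this way would come out 3 inches wide unless another rule sets its width.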
CMYK colors
You should be thinking about CMYK colors throughout building your stylesheet. You specify CMYK colors similarly to how you specify RGB colors:
hr {
color: device-cmyk(0.0, 0.0, 0.0, 0.3);
}
Each value should be a number between 0 and 1 (percentage values actually also work, though only the decimal values are endorsed by the W3C spec right now). Specify the percentage of Cyan, Magenta, Yellow, and Black ink to be used, in that order. You can also build that in with fallbacks by stacking color definitions, for cases where you need to repurpose your stylesheets for multiple presentations (web, print, etc):
hr {
color: #B3B3B3;
color: device-cmyk(0.0, 0.0, 0.0, 0.3);
}
If the device reading the code doesn’t understand CMYK, it’ll use the web-friendly version.
Printer marks and bleed
During commercial printing, books are actually printed on a larger page size than the final version, and are cut down. The cutting is usually pretty exact, but can vary up to a few sixteenths of an inch. So, to ensure that any images or colors at the edges of the page will actually lie on the edge of the page, without strips of white being left during the cropping process, you need to set them to run off the page edge a bit, just in case. Then you’ll need to tell the processor to render that little bit of extra stuff beyond the edge, and to add crop marks to guide the printer:
@page {
bleed: 6pt;
marks: crop;
}
It’s that easy. Of course, you’ll need to be creative with bleeding elements, using negative margins and positioning to get them to actually bleed—the processor won’t automatically add extra color or content beyond the limits of the element, it’ll only show content that already exists.
Final notes and further reading
You can read through the full list of CSS that Antenna House supports, but I warn you that the documentation is limited at best and not always clearly worded. Prince’s documentation is slightly better.
Both Antenna House and Prince have their own extensions built on top of the standard CSS3, which are worth checking out. Here are Antenna House’s extensions. Prince’s extensions are listed inline with regular CSS support, and are less robust. Additionally, if the CSS documentation isn’t helping, it may be useful to read the documentation for the related XSL-FO property. They've been in use longer and are more fleshed out, and the functionality is usually the same or very similar. I wasn’t able to find documentation on Prince for this, but here is Antenna House’s documentation.
Remember that CSS3 is still a developing spec; CSS3.info keeps a fairly up-to-date list of the status of the various CSS3 modules. Don’t let that stop you from dipping a toe/foot/leg/neck in the water, though! Here, I limited myself to some book-building basics—page dimensions and margins, cross references, strings, headers and footers, and printer-friendly colors, images, and bleeds—but CSS3 has a lot more to offer when it comes to paged media, and I encourage you to see how much you can do (and remember, CSS2.1 still works, too).
References
[1] If you’re starting with XML source files, you’ll find it much easier to convert to XHTML first before styling with CSS. Luckily Bob Stayton already built the XSL to help you do that: http://sourceforge.net/projects/docbook/files/epub3/.
[2] Because where’s the fun in footnotes if you can’t wax poetic a little bit?