A minimally conforming EPUB bundle has several required files. The specification can be quite strict about the format, contents, and location of those files within the EPUB archive. This section explains what you must know when you work with the EPUB standard.
Anatomy of an EPUB bundle
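A minimal EPUB file consists of a mimetype file, a META-INF directory that contains container.xml, and a content directory (OEBPS by convention) that holds content.opf, toc.ncx, and the book's XHTML, CSS, and image files. When ready for distribution, this directory structure is bundled together into a ZIP-format file.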
We recommend that you create your own as you follow the tutorial.
To start building your EPUB book, create a directory for the EPUB project. Open a text editor or an IDE such as Eclipse. I recommend using an editor that has an XML mode, in particular one that can validate against the RELAX NG schemas.
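As a rough sketch, the following Python snippet creates the directory skeleton for the files that the rest of this tutorial describes. The project name my-book and the helper create_skeleton are illustrative assumptions; the file names are the ones used in the later sections.

```python
from pathlib import Path
import tempfile

# Files this tutorial describes, relative to the project root.
LAYOUT = [
    "mimetype",
    "META-INF/container.xml",
    "OEBPS/content.opf",
    "OEBPS/toc.ncx",
    "OEBPS/title.html",
    "OEBPS/content.html",
    "OEBPS/stylesheet.css",
    "OEBPS/images/cover.png",
]


def create_skeleton(root):
    """Create every directory needed by LAYOUT and return them, sorted."""
    root = Path(root)
    for rel in LAYOUT:
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
    return sorted(p for p in root.rglob("*") if p.is_dir())


if __name__ == "__main__":
    project = Path(tempfile.mkdtemp()) / "my-book"  # assumed project name
    for directory in create_skeleton(project):
        print(directory.relative_to(project).as_posix())
```

Only the directories are created here; each file is filled in by the steps that follow.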
The mimetype file
This one's pretty easy: The mimetype file is required and must be named mimetype. The contents of the file are always the string application/epub+zip.
Note that the mimetype file cannot contain any newlines or carriage returns.
Additionally, the mimetype file must be the first file in the ZIP archive and must not itself be compressed. You'll see how to include it using common ZIP arguments in Bundling the EPUB into a valid epub+zip file. For now, just create this file and save it, making sure that it's at the root level of your EPUB project.
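Text editors often append a trailing newline on save, so one way to be sure of the exact contents is to write the bytes directly. This is a minimal Python sketch; the helper name write_mimetype and the temporary project directory are assumptions for illustration.

```python
from pathlib import Path
import tempfile

# Exact file contents: no newline or carriage return at the end.
MIMETYPE = b"application/epub+zip"


def write_mimetype(root):
    """Write the mimetype file at the root of the EPUB project."""
    path = Path(root) / "mimetype"
    path.write_bytes(MIMETYPE)  # bytes mode avoids any newline translation
    return path


if __name__ == "__main__":
    project = Path(tempfile.mkdtemp())  # stand-in for your project directory
    data = write_mimetype(project).read_bytes()
    print(len(data), data)
```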
At the root level of the EPUB, there must be a META-INF directory, and it must contain a file named container.xml. EPUB reading systems will look for this file first, as it points to the location of the metadata for the digital book.
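A minimal container.xml might look like the string in the Python sketch below, which also shows how a reading system could pull the OPF location out of it. The path OEBPS/content.opf follows the conventions used in this tutorial, and the helper name opf_path is an assumption.

```python
import xml.etree.ElementTree as ET

OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

# Contents of META-INF/container.xml for a book whose OPF file is
# OEBPS/content.opf (the conventional location assumed in this tutorial).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def opf_path(container_xml):
    """Return the full-path of the first rootfile, relative to the EPUB root."""
    root = ET.fromstring(container_xml.encode("utf-8"))
    rootfile = root.find(f"{{{OCF_NS}}}rootfiles/{{{OCF_NS}}}rootfile")
    return rootfile.get("full-path")


if __name__ == "__main__":
    print(opf_path(CONTAINER_XML))
```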
The value of full-path is the only part of this file that will ever vary. The directory path must be relative to the root of the EPUB file itself, not relative to the META-INF directory.
The mimetype and container files are the only two whose locations in the EPUB archive are strictly controlled. As recommended (although not required), store the remaining files in the EPUB in a subdirectory. (By convention, this is usually called OEBPS, for Open eBook Publication Structure, but it can be whatever you like.)
The META-INF directory can contain a few optional files, as well. These files allow EPUB to support digital signatures, encryption, and digital rights management (DRM). These topics are not covered in this tutorial. See the OCF specification for more information.
Open Packaging Format metadata file
Although this file can be named anything, the OPF file is conventionally called content.opf. It specifies the location of all the content of the book, from its text to other media such as images. It also points to another metadata file, the Navigation Control file for XML (NCX) table of contents.
The OPF file is the most complex metadata file in the EPUB specification. Its content falls into four parts: metadata, manifest, spine, and guide.
The two required Dublin Core terms are title and identifier. According to the EPUB specification, the identifier must be a unique value, although it's up to the digital book creator to define that unique value. For book publishers, this field will typically contain an ISBN or Library of Congress number. For other EPUB creators, consider using a URL or a large, randomly generated universally unique identifier (UUID). Note that the value of the unique-identifier attribute on the package element must match the id attribute of the dc:identifier element.
Other metadata to consider adding, if it's relevant to your content, include:
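- Language (as dc:language). The OPF 2.0 schema also expects at least one dc:language element, so it is safest to treat this one as mandatory.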
- Publication date (as dc:date).
- Publisher (as dc:publisher). (This can be your company or individual name.)
- Copyright information (as dc:rights). (If releasing the work under a Creative Commons license, put the URL for the license here.)
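To make the identifier rule concrete, here is a hedged Python sketch that renders only the metadata part of an OPF file and checks that unique-identifier points at a dc:identifier element. The id value BookID, the urn:uuid: form of the identifier, and both helper names are assumptions for illustration; the manifest, spine, and guide are deliberately left out.

```python
import uuid
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

DC_NS = "http://purl.org/dc/elements/1.1/"

# Metadata-only fragment of content.opf; "BookID" is an arbitrary id value.
OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="BookID">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:identifier id="BookID">{identifier}</dc:identifier>
    <dc:language>{language}</dc:language>
  </metadata>
</package>
"""


def render_metadata(title, identifier=None, language="en"):
    """Fill the template, generating a random UUID if no identifier is given."""
    if identifier is None:
        identifier = "urn:uuid:" + str(uuid.uuid4())
    return OPF_TEMPLATE.format(
        title=escape(title),
        identifier=escape(identifier),
        language=escape(language),
    )


def identifier_matches(opf_xml):
    """True if unique-identifier names the id of some dc:identifier element."""
    package = ET.fromstring(opf_xml.encode("utf-8"))
    wanted = package.get("unique-identifier")
    ids = [el.get("id") for el in package.iter(f"{{{DC_NS}}}identifier")]
    return wanted is not None and wanted in ids


if __name__ == "__main__":
    opf = render_metadata("Hello World: My First EPUB")
    print(opf)
    print(identifier_matches(opf))
```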
In the manifest, you must include the first item, toc.ncx. Note that all items have an appropriate media-type value and that the media type for the XHTML content is application/xhtml+xml. This exact value is required and cannot be text/html or some other type.
EPUB supports four image file formats as core types: JPEG, PNG, GIF, and SVG. You can include unsupported file types if you provide a fallback to a core type. See the OPF specification for more information on fallback items.
The value of each href attribute should be a Uniform Resource Identifier (URI) that is relative to the OPF file. (This is easy to confuse with the reference to the OPF file in the container.xml file, where it must be relative to the EPUB as a whole.) In this case, the OPF file is in the same OEBPS directory as your content, so no path information is required here.
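The manifest rules above can be sketched in Python as a list of items plus two small helpers. The item ids (ncx, cover, content, cover-image, css) and the helper names are assumptions; the file names and media types are the ones this tutorial uses, and the checks cover only the two rules just described.

```python
from xml.sax.saxutils import quoteattr

# (id, href relative to the OPF file, media-type); ids are illustrative.
MANIFEST_ITEMS = [
    ("ncx", "toc.ncx", "application/x-dtbncx+xml"),
    ("cover", "title.html", "application/xhtml+xml"),
    ("content", "content.html", "application/xhtml+xml"),
    ("cover-image", "images/cover.png", "image/png"),
    ("css", "stylesheet.css", "text/css"),
]


def render_manifest(items):
    """Render the manifest element of an OPF file as a string."""
    lines = ["<manifest>"]
    for item_id, href, media_type in items:
        lines.append(
            "  <item id=%s href=%s media-type=%s/>"
            % (quoteattr(item_id), quoteattr(href), quoteattr(media_type))
        )
    lines.append("</manifest>")
    return "\n".join(lines)


def check_manifest(items):
    """Return a list of problems for the two rules described in the text."""
    problems = []
    for item_id, href, media_type in items:
        if href.startswith("/") or "://" in href:
            problems.append(f"{item_id}: href should be relative to the OPF file")
        if href.endswith((".html", ".xhtml")) and media_type != "application/xhtml+xml":
            problems.append(f"{item_id}: XHTML content needs application/xhtml+xml")
    return problems


if __name__ == "__main__":
    print(render_manifest(MANIFEST_ITEMS))
    print(check_manifest(MANIFEST_ITEMS))
```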
Each itemref element in the spine has a required attribute idref, which must match one of the IDs in the manifest. The toc attribute on the spine element is also required. It references the ID of the manifest item that points to the NCX table of contents.
The linear attribute in the spine indicates whether the item is considered part of the linear reading order versus being extraneous front- or end-matter. I recommend that you define any cover page as linear=no. Conforming EPUB reading systems will open the book to the first item in the spine that's not set as linear=no.
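The relationship between the spine and the manifest can be checked mechanically. The Python sketch below parses a small package fragment, verifies that toc and every idref resolve to manifest IDs, and finds the item a reading system would open first. The fragment, its ids, and the helper names are assumptions for illustration, and the metadata and guide are omitted.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

# Manifest and spine only; metadata and guide are left out of this fragment.
PACKAGE_FRAGMENT = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="cover" href="title.html" media-type="application/xhtml+xml"/>
    <item id="content" href="content.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover" linear="no"/>
    <itemref idref="content"/>
  </spine>
</package>
"""


def check_spine(package_xml):
    """Return problems where the spine refers to IDs missing from the manifest."""
    package = ET.fromstring(package_xml)
    manifest_ids = {i.get("id") for i in package.findall("opf:manifest/opf:item", NS)}
    spine = package.find("opf:spine", NS)
    problems = []
    if spine.get("toc") not in manifest_ids:
        problems.append("spine toc does not reference a manifest item")
    for itemref in spine.findall("opf:itemref", NS):
        if itemref.get("idref") not in manifest_ids:
            problems.append(f"unknown idref: {itemref.get('idref')}")
    return problems


def first_linear_idref(package_xml):
    """Return the idref of the first spine item not marked linear="no"."""
    package = ET.fromstring(package_xml)
    for itemref in package.findall("opf:spine/opf:itemref", NS):
        if itemref.get("linear", "yes") != "no":
            return itemref.get("idref")
    return None


if __name__ == "__main__":
    print(check_spine(PACKAGE_FRAGMENT))
    print(first_linear_idref(PACKAGE_FRAGMENT))
```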
The guide is a way of providing semantic information to an EPUB reading system. While the manifest defines the physical resources in the EPUB and the spine provides information about their order, the guide explains what the sections mean. Here's a partial list of the values that are allowed in the OPF guide:
- cover: The book cover
- title-page: A page with author and publisher information
- toc: The table of contents
NCX table of contents
Although the OPF file is defined as part of EPUB itself, the last major metadata file is borrowed from a different digital book standard. DAISY is a consortium that develops data formats for readers who are unable to use traditional books, often because of visual impairments or the inability to manipulate printed works. EPUB has borrowed DAISY's NCX DTD. The NCX defines the table of contents of the digital book. In complex books, it is typically hierarchical, containing nested parts, chapters, and sections.
Simple NCX file
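A flat, two-entry NCX for this book could be generated along the lines of the Python sketch below. The navPoint ids, the labels, the placeholder identifier, and the helper names are assumptions; the DOCTYPE declaration that references the NCX DTD is left out to keep the sketch short. The dtb:uid value is meant to carry the same identifier as dc:identifier in content.opf.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def render_ncx(uid, title, entries):
    """Render a single-level NCX; entries is a list of (label, src) pairs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<ncx xmlns="{NCX_NS}" version="2005-1">',
        "  <head>",
        # Should carry the same value as dc:identifier in content.opf.
        f"    <meta name=\"dtb:uid\" content={quoteattr(uid)}/>",
        '    <meta name="dtb:depth" content="1"/>',
        '    <meta name="dtb:totalPageCount" content="0"/>',
        '    <meta name="dtb:maxPageNumber" content="0"/>',
        "  </head>",
        f"  <docTitle><text>{escape(title)}</text></docTitle>",
        "  <navMap>",
    ]
    for order, (label, src) in enumerate(entries, start=1):
        lines += [
            f'    <navPoint id="navpoint-{order}" playOrder="{order}">',
            f"      <navLabel><text>{escape(label)}</text></navLabel>",
            f"      <content src={quoteattr(src)}/>",
            "    </navPoint>",
        ]
    lines += ["  </navMap>", "</ncx>"]
    return "\n".join(lines) + "\n"


def nav_targets(ncx_xml):
    """Return the src of every content element, in document order."""
    root = ET.fromstring(ncx_xml.encode("utf-8"))
    return [c.get("src") for c in root.iter(f"{{{NCX_NS}}}content")]


if __name__ == "__main__":
    ncx = render_ncx(
        "urn:uuid:00000000-0000-0000-0000-000000000000",  # placeholder identifier
        "Hello World: My First EPUB",
        [("Book cover", "title.html"), ("Contents", "content.html")],
    )
    print(ncx)
    print(nav_targets(ncx))
```

A hierarchical table of contents would nest navPoint elements inside one another and raise dtb:depth accordingly; this sketch does not attempt that.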
Adding the final content
With the metadata in place, it's time to put in the actual book content.
Create these files and folder:
- title.html: This file will be the title page for the book. Create this file and include an img element that references a cover image, with the value of the src attribute as images/cover.png.
- images: Create this folder inside OEBPS, then copy the sample image (or create your own), naming it cover.png.
- content.html: This will be the actual text of the book.
- stylesheet.css: Place this file in the same OEBPS directory as the XHTML files.
Sample title page
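A title page along these lines would satisfy the description above. The Python sketch renders it from a template; the alt text, the wrapping div, and the helper name render_title_page are assumptions, while the title, the stylesheet name, and the image path come from this tutorial.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"

# XHTML 1.1 title page; the body layout is illustrative.
TITLE_PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>{title}</title>
    <link rel="stylesheet" type="text/css" href="stylesheet.css"/>
  </head>
  <body>
    <h1>{title}</h1>
    <div><img src={cover} alt="Cover image"/></div>
  </body>
</html>
"""


def render_title_page(title, cover="images/cover.png"):
    """Return the contents of title.html for the given title and cover path."""
    return TITLE_PAGE.format(title=escape(title), cover=quoteattr(cover))


if __name__ == "__main__":
    page = render_title_page("Hello World: My First EPUB")
    print(page)
    # Parsing confirms that the page is at least well-formed XML.
    ET.fromstring(page.encode("utf-8"))
```

Well-formedness is a weaker check than validating against the XHTML 1.1 DTD, which this sketch does not do.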
XHTML content in EPUB follows a few rules that might be unfamiliar to you from general Web development:
- The content must validate as XHTML 1.1: The only significant difference between XHTML 1.0 Strict and XHTML 1.1 is that the name attribute has been removed. (Use IDs to refer to anchors within content.)
- img elements can only reference images that are local to the eBook: The elements cannot reference images on the Web.
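The local-image rule lends itself to a quick automated check. The Python sketch below lists img sources that carry a URL scheme or host, which is one plausible way to spot Web references; the sample markup, the example.com URL, and the helper name remote_image_sources are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical content with one local and one remote image.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <p><img src="images/cover.png" alt="Cover"/></p>
    <p><img src="http://example.com/remote.png" alt="Remote"/></p>
  </body>
</html>
"""


def remote_image_sources(xhtml):
    """Return the src of every img that points outside the EPUB."""
    root = ET.fromstring(xhtml.encode("utf-8"))
    remote = []
    for img in root.iter(f"{{{XHTML_NS}}}img"):
        src = img.get("src", "")
        parts = urlsplit(src)
        # A scheme (http:) or a host (//cdn.example/...) means a Web reference.
        if parts.scheme or parts.netloc:
            remote.append(src)
    return remote


if __name__ == "__main__":
    print(remote_image_sources(SAMPLE))
```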
Bundling the EPUB into a valid epub+zip file
The OEBPS Container Format portion of the EPUB specification has several things to say about EPUB and ZIP, but the most important are:
- The first file in the archive must be the mimetype file. The mimetype file must not be compressed. This allows non-ZIP utilities to uncover the mimetype by reading the raw bytes starting from position 30 in the EPUB bundle.
- The ZIP archive cannot be encrypted. EPUB supports encryption but not at the level of the ZIP file.
$ zip -0Xq my-book.epub mimetype
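$ zip -Xr9Dq my-book.epub * -x mimetype
In the first command, you create the new ZIP archive and add the mimetype file with no compression (-0). In the second, you add the remaining items at maximum compression (-9); -x mimetype keeps the already-stored mimetype entry from being added a second time. The flags -X and -D minimize extraneous information in the .zip file, -r recursively includes the contents of the META-INF and OEBPS directories, and -q suppresses progress output.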
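If the zip command is not available, the same bundling can be approximated with Python's standard zipfile module. This is a sketch under stated assumptions rather than a replacement for a validator: the helper names bundle_epub and has_epub_signature are illustrative, and the signature check relies on the 30-byte ZIP local file header being followed directly by the name mimetype and then the file contents.

```python
import tempfile
import zipfile
from pathlib import Path

MIMETYPE = b"application/epub+zip"


def bundle_epub(project_dir, epub_path):
    """Zip project_dir into epub_path with mimetype first and uncompressed."""
    project_dir = Path(project_dir)
    epub_path = Path(epub_path)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # Like `zip -0`: store the mimetype entry without compression.
        archive.write(project_dir / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        # Like `zip -r9D`: add the remaining files, but no directory entries.
        for path in sorted(project_dir.rglob("*")):
            rel = path.relative_to(project_dir).as_posix()
            if not path.is_file() or rel == "mimetype":
                continue
            if path.resolve() == epub_path.resolve():
                continue  # never add the archive to itself
            archive.write(path, rel, compress_type=zipfile.ZIP_DEFLATED,
                          compresslevel=9)
    return epub_path


def has_epub_signature(epub_path):
    """Check the raw bytes that non-ZIP utilities would inspect."""
    header = Path(epub_path).read_bytes()[:58]
    return header[30:38] == b"mimetype" and header[38:58] == MIMETYPE


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp())
    project = work / "my-book"
    (project / "META-INF").mkdir(parents=True)
    (project / "mimetype").write_bytes(MIMETYPE)
    (project / "META-INF" / "container.xml").write_text("<container/>")  # stub
    epub = bundle_epub(project, work / "my-book.epub")
    print(zipfile.ZipFile(epub).namelist())
    print(has_epub_signature(epub))
```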