WordPerfect Document File Format
Comparison Between WP 6.x and WP 7.0 Files
WordPerfect 6.x (WP 6.x) files are divided into two sections: the document prefix and the document area. WordPerfect 7.0 (WP 7.0) files add support for the OLE Compound File format specification. This Compound document information wraps around a WordPerfect file and divides the document into four sections: the OLE Compound document information, the document prefix, the document area, and the ending Compound document information.
The OLE Compound File format is the native file format for OLE 2 servers and makes full implementation of the Windows 95 Shell integration features possible. A Compound file is a file system within a file. This support was added so that users of OLE 2 server applications can browse, modify, and share embedded WordPerfect 7.0 documents without starting WordPerfect. In addition, the Windows 95 Shell integration lets users move documents to other machines without losing their links.
The 16-byte file header for WordPerfect 7.0 (WP 7.0) has the same format as in WP 6.x, except for the minor version number of the document, which has changed from 1 to 2. The values of the Prefix Packet types have not changed for WP 7.0; however, some packet types have been added. These additions include:
The values of all single-byte function codes have not changed for WP 7.0; however, some have been added. These additions include:
The structure of the variable-length multi-byte function codes has not changed. However, there have been some additions. These additions include:
WordPerfect 7.0 File Format
This section gives an overview of the file format for WordPerfect 7.0 documents. Detailed information about the packet and function formats follows the overview.
File Structure
A WordPerfect 7.0 document file is a binary file containing three distinct areas: the Compound Document Format, the file Prefix, and the Document Area. The Prefix consists of the File Header, the Extended Header, the Index Area, and the Packet Data Area.
OLE Compound Document (not documented)
Prefix:
  File Header (included with every WordPerfect document)
  Extended Header (not documented)
  Index Area (indexes point to packet information in the packet data area)
  Packet Data Area (data pointed to by indexes in the index area)
Document Area (includes text; single-byte, variable-length multi-byte, and fixed-length multi-byte functions)
The following table shows the components of a document file in more detail.
Generic File Prefix
A generic WordPerfect 7.0 prefix is 526 bytes long: the first 16 bytes are the standard header, the next 496 bytes are the extended header, and the last 14 bytes are the index header. There is no packet data area.
The pointer to the document area is therefore 526 (16-byte header + 496-byte extended header + 14-byte index header). The number of indexes is 1, because the index header is the only index.
Note: When creating WordPerfect 7.0 documents you do not need to include the OLE Compound Document wrapper. WordPerfect will read in WP 7.0 documents without it.
Example of Generic Header (hex dump):
FF 57 50 43 0E 02 00 00 01 0A 02 02 00 00 00 02 .WPC............
05 00 00 00 0E 02 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
02 00 01 00 00 00 00 00 00 00 00 00 00 00 ..............
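The first line of the dump can be decoded field by field. The following Python sketch follows the field descriptions under File Header Format later in this section: the file ID, a long pointer to the document area, four single-byte fields, then the encryption word and the pointer to the index area. Reading that last pointer as a short is an assumption that matches this dump (00 02 = 512). The names `WPHeader` and `parse_header` are hypothetical.

```python
import struct
from typing import NamedTuple

# Hypothetical container for the 16-byte standard header fields.
class WPHeader(NamedTuple):
    file_id: bytes
    doc_area_ptr: int
    product_type: int
    file_type: int
    major_version: int
    minor_version: int
    encryption: int
    index_area_ptr: int

# "<" = little-endian, no padding: 4-byte ID, long, four bytes, two shorts.
HEADER_FMT = "<4sIBBBBHH"

def parse_header(data: bytes) -> WPHeader:
    """Decode the first 16 bytes of a WordPerfect file."""
    if len(data) < struct.calcsize(HEADER_FMT):
        raise ValueError("need at least 16 bytes")
    return WPHeader(*struct.unpack_from(HEADER_FMT, data, 0))

# First line of the generic header dump.
GENERIC = bytes.fromhex("FF 57 50 43 0E 02 00 00 01 0A 02 02 00 00 00 02")
hdr = parse_header(GENERIC)
print(hdr.doc_area_ptr, hdr.index_area_ptr)  # 526 512
```

The index area pointer (512) is where the 14-byte index header in the last line of the dump begins, right after the 16-byte header and the 496-byte extended header.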
OLE Compound Document Format
This is a Microsoft proprietary format and is not documented. WordPerfect can read WP 7.0 documents with or without the OLE Compound Document Format. To create a file with the OLE Compound Document wrapper, or to read an existing one, refer to the wopen function call, documented in the File System section of the PerfectFit SDK.
File Header Format
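The file header is 512 (0x200) bytes long: the 16-byte standard header followed by the 496-byte extended header. The format of the file header is:
<255 (0xFF)> <87 (0x57)> <80 (0x50)> <67 (0x43)> (file ID)
{pointer to document area}
<product type>
<file type>
<major version>
<minor version>
[encryption]
[pointer to index area]
<reserved> x 4
{file size}
<undocumented extended header data> x 488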
Note: If a developer changes the file external to WP, the developer must update the {file size} long value to reflect the new file size.
File ID Field
The File ID field occupies the first 4 bytes of the file header and has the same value for all files produced by Corel WordPerfect products (excluding pre-WP 5.0 files). It is displayed as -1,"WPC", or FF 57 50 43 in hexadecimal. If you open a Corel WordPerfect 5.0 or later file in a binary editor, you can see this ID in the first 4 bytes. If you do not see this ID, the file is not a WP 5.0 or later file.
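A minimal Python sketch of this check; the function names are hypothetical. Because the OLE Compound Document information wraps around the WordPerfect data, a wrapped WP 7.0 file will not begin with this ID, so the check as written applies to unwrapped files.

```python
WP_FILE_ID = b"\xffWPC"  # -1, "WPC" = FF 57 50 43

def has_wp_file_id(data: bytes) -> bool:
    """True if data begins with the WP 5.0-or-later file ID."""
    return data[:4] == WP_FILE_ID

def sniff(path: str) -> bool:
    """Read only the first 4 bytes of a file and check the ID."""
    with open(path, "rb") as f:
        return has_wp_file_id(f.read(4))
```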
Pointer to Document Area
The Pointer to Document Area field is a long value that begins at offset 4 in the 16-byte file header. It points to the beginning of the document area. Short integers (two bytes) and long integers (four bytes) are stored in byte-reversed order, that is, least-significant byte first.
If any codes such as margins, tabs, font changes, or a specific page size are inserted at the beginning of a document, those codes appear in the document area before the actual text.
Product Type Field
The Product Type field is 1 byte in length and is the 9th byte from the beginning of the file header. It contains a value that identifies the Corel software product used to create the file. The product type for WordPerfect (the software) is 1.
File Type Field
The File Type field is 1 byte in length and is the 10th byte from the beginning of the file header. The value depends on the Product Type (Corel software product) associated with the file. The first ten values (0-9) are reserved for general purpose files that have application across all Corel products. Values 10 and above are available for product-specific file types. See Corel File Types below.
General-Purpose File Types
These general-purpose file types are documented for information only. These file types are reserved for Corel software.
Corel File Types
(SOFT (Sequential Object Format) graphics file for the Macintosh WP.)
Major Version and Minor Version Fields
These version numbers are the major and minor version numbers of the file format itself, not of the program creating the file. For WP 7.0 document files the major version byte is 2, and the minor version byte is 2.
Encryption field
If this word value is non-zero, the file is encrypted and nothing beyond the file header will be intelligible to an application program.
Pointer to Index Area
This is the offset from the beginning of the file to the index header.
Reserved
Four bytes at the beginning of the extended file header are reserved.
File Size
This 32-bit integer field contains the total length of the WordPerfect file. This does not include any additional data that may be added by an OLE Compound Document wrapper. If an application changes the file external to WP, it must update this field to reflect the new file size.
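A tool that edits an unwrapped file directly can refresh this field after writing. The sketch below assumes the field sits at offset 20 (the 16-byte header plus the 4 reserved bytes that open the extended header), which matches the generic header dump, where bytes 20 to 23 read 0E 02 00 00 (526). The helper names are hypothetical.

```python
import struct

FILE_SIZE_OFFSET = 20  # assumed: 16-byte header + 4 reserved bytes

def update_file_size(buf: bytearray) -> None:
    """Store len(buf) in the {file size} long (unwrapped files only)."""
    if len(buf) < FILE_SIZE_OFFSET + 4:
        raise ValueError("buffer too short to hold the file size field")
    struct.pack_into("<I", buf, FILE_SIZE_OFFSET, len(buf))

def read_file_size(buf: bytes) -> int:
    """Return the {file size} long stored in the header."""
    return struct.unpack_from("<I", buf, FILE_SIZE_OFFSET)[0]
```

Typical use: read the whole file into a `bytearray`, edit it, call `update_file_size`, then write the buffer back. If the file is inside an OLE wrapper, the wrapper's bytes are not counted.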
Extended Header
The remainder of the extended header is used by WordPerfect and is not documented.
Index and Packet Data Areas
The prefix index area comes immediately after the file header. The index area contains indexes that point to data in the packet data area. The packet data area follows the index area in the file and is made up of packets of structured data. See Packet Data Formats later in this section for examples. The packet data contains information, such as font descriptions, that may be used many times in the document but is not part of the actual content of the document. Packet data is referenced from other packets and from the document area through the indexes. The WP 7.0 index area contains all the indexes needed for the document.
Index Header
In a WP 7.0 index area, the first index is the index header. The index header tells how many indexes are in the index area. The format of the index header is:
<flags = 2>
<reserved = 0>
[number of indexes in index block]
<reserved = 0> x 10
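A Python sketch of reading the index header, assuming the field sizes shown above (one flags byte, one reserved byte, a short count, and 10 reserved bytes, 14 bytes in all). The sample bytes are the last line of the generic header dump; the function name is hypothetical.

```python
import struct

INDEX_HEADER_SIZE = 14

def read_index_count(data: bytes, offset: int) -> int:
    """Return the number of indexes declared by the index header at offset."""
    if len(data) < offset + INDEX_HEADER_SIZE:
        raise ValueError("truncated index header")
    flags, _reserved, count = struct.unpack_from("<BBH", data, offset)
    if flags != 2:
        raise ValueError(f"unexpected index header flags: {flags}")
    return count

sample = bytes.fromhex("02 00 01 00" + " 00" * 10)
print(read_index_count(sample, 0))  # 1
```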
Indexes
Indexes which follow the index header have the following format:
If bit 0 of the index flags byte is set, the data packet has child IDs. In that case, the data for the packet begins with the following structure:
[number of child IDs]
[child ID] x ?
If bit 1 of the index flags byte is set, then the following data structure will be the next thing in the packet. If bit 0 (the child bit) is not set, then the following data structure is the first thing in the data packet.
[number of text blocks]
{relative offset of text block within packet}
{size of first text block}
{size of second text block}
. . .
{size of last text block}
If bit 0 and bit 1 of the index flags byte are not set, neither of the previous two data structures appear in the packet data.
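A Python sketch of reading these optional structures from the start of a packet's data, given the packet's index flags byte. It assumes the child ID structure is a short count followed by that many short child IDs, and that the number of text blocks is a short, as the example that follows indicates. Other flag bits (bit 3 is also set in the example) are not interpreted. The names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional

class PacketPrefix(NamedTuple):
    child_ids: List[int]
    text_offset: Optional[int]  # relative offset of the text within the packet
    text_sizes: List[int]       # one size per text block
    data_start: int             # first byte after these structures

def parse_packet_prefix(packet: bytes, flags: int) -> PacketPrefix:
    """Read the optional child-ID and text-block structures at the start of a packet."""
    pos = 0
    child_ids: List[int] = []
    if flags & 0x01:  # bit 0: packet has child IDs
        (count,) = struct.unpack_from("<H", packet, pos)
        pos += 2
        child_ids = list(struct.unpack_from(f"<{count}H", packet, pos))
        pos += 2 * count
    text_offset: Optional[int] = None
    text_sizes: List[int] = []
    if flags & 0x02:  # bit 1: packet has text blocks
        (nblocks,) = struct.unpack_from("<H", packet, pos)
        pos += 2
        (text_offset,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        text_sizes = list(struct.unpack_from(f"<{nblocks}I", packet, pos))
        pos += 4 * nblocks
    return PacketPrefix(child_ids, text_offset, text_sizes, pos)

# First 26 bytes of the type 48 packet data in the example that follows.
sample = bytes.fromhex(
    "01 00 07 00 04 00 2C 00 00 00 4A 00 00 00 00 00 "
    "00 00 00 00 00 00 4A 00 00 00"
)
print(parse_packet_prefix(sample, 0x0B))
```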
Example:
File Header...
02 00 00 00 28 00 00 00 53 1A 00 00 00 09 00 00
01 00 06 00 00 00 7B 1A 00 00 08 02 01 00 00 00
10 00 00 00 81 1A 00 00 08 23 01 00 00 00 09 01
00 00 91 1A 00 00 00 55 08 00 00 00 17 00 00 00
9A 1B 00 00 0B 30 03 00 00 00 C0 00 00 00 B0 02
00 00 0B 30 05 00 00 00 78 00 00 00 70 03 00 00
Packet data (begins with the prefix ID reference, followed by the text block pointers):
01 00 07 00 04 00 2C 00 00 00 4A 00 00 00 00 00
00 00 00 00 00 00 4A 00 00 00 03 0C 77 1E 2A 00
72 00 75 00 73 00 73 00 00 00 00 00 D4 1A 1D 00
80 01 05 00 08 00 58 02 EC 38 00 00 58 02 00 00
58 02 05 00 50 50 1D 00 D4 D4 1B 1D 00 80 01 05
00 08 00 58 02 EC 38 00 00 58 02 00 00 58 02 05
00 58 02 1D 00 D4 D4 18 10 00 00 03 00 00 00 00
00 00 00 10 00 D4 D4 1A 1D 00 80 01 05 00 08 00
58 02 EC 38 00 00 58 02 00 00 58 02 05 00 50 50
1D 00 D4 D4 1B 1D 00 80 01 05 00 08 00 58 02 EC
38 00 00 58 02 00 00 58 02 05 00 58 02 1D 00 D4
D4 18 10 00 00 03 00 00 00 00 00 00 00 10 00 D4
The index that starts with 0B 30 03 00 in the fifth line of the index dump is an index to packet type 48 (0x30). Its flags byte is 11 (0x0B), or 1011 binary, so bit 0 and bit 1 are both set. Bit 0 specifies that there are child references; bit 1 specifies that text blocks exist.
The packet data that follows the index dump is the data this index points to. The first word, the count of child index references, is 1, so only one child index is referenced. The second word (7) is the index ID of the child index. The third word (4) is the number of text blocks in this packet. Following the third word are five long values: the first is the offset where the text data begins (44, or 0x2C), and the rest are the sizes of the respective text blocks (74, 0, 0, and 74). See the format of Packet Type 48 (0x30).
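The index dump above can be decoded with the Python sketch below. It assumes each index is 14 bytes: <flags> <packet type>, two shorts (taken here to be a use count and a hidden count, as in the WP 6.x format; those names are assumptions), {size of packet data}, and {pointer to packet data}. That layout is consistent with the dump, where each data pointer equals the previous pointer plus the previous size. The sample bytes are the two type 48 indexes from the fifth and sixth lines of the dump; the other names are hypothetical.

```python
import struct
from typing import List, NamedTuple

INDEX_FMT = "<BBHHII"  # flags, type, two shorts, size, pointer = 14 bytes
INDEX_SIZE = struct.calcsize(INDEX_FMT)

class PrefixIndex(NamedTuple):
    flags: int
    packet_type: int
    use_count: int     # assumed meaning of the first short
    hidden_count: int  # assumed meaning of the second short
    data_size: int
    data_pointer: int

def parse_indexes(data: bytes, offset: int, count: int) -> List[PrefixIndex]:
    """Decode count consecutive 14-byte indexes starting at offset."""
    return [
        PrefixIndex(*struct.unpack_from(INDEX_FMT, data, offset + i * INDEX_SIZE))
        for i in range(count)
    ]

dump = bytes.fromhex(
    "0B 30 03 00 00 00 C0 00 00 00 B0 02 00 00 "
    "0B 30 05 00 00 00 78 00 00 00 70 03 00 00"
)
for idx in parse_indexes(dump, 0, 2):
    # packet type, bits 0-1 of the flags, data size, data pointer
    print(idx.packet_type, idx.flags & 0b11, idx.data_size, idx.data_pointer)
```

The first index reports 192 bytes of data, which is the length of the packet data listed after the index dump.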
When a reference is made to a prefix packet ID, remember that a prefix ID is not the same as a packet type. A prefix ID identifies a packet by the sequence position of its index in the index block; a packet type identifies the purpose and structure of the packet's data. Prefix IDs are unique to each packet in the prefix area, but several packets can have the same type value.
Document Area
The document area contains the actual text of the document along with all of the formatting function codes required to create and format the desired document. These function codes may be a single byte or may be many bytes in length. The multi-byte functions may be fixed or variable in length depending on the particular function. These functions may reference data in the packet data area by specifying a prefix index which in turn points to the packet data. This prefix index reference is called a prefix ID or PID.
Single-Byte Functions
Single-byte functions range from 128 (0x80) through 207 (0xCF). In the following example, a Soft Hyphen at End-of-Line function byte is inserted in the word "comment."
Example: com<131 (0x83)>ment
Variable-Length Multi-Byte Functions
The codes for variable-length multi-byte functions, 208 (0xD0) through 239 (0xEF), appear twice each time the function is invoked. The first occurrence is the begin gate (the beginning of the function code), and the second occurrence is the end gate (the end of the function code).
The orientation of a function specifies what this function applies to. For example, if this function is specific to the page format, the orientation is page type; if it is specific to the line format, the orientation is line type.
Each begin gate is followed by a subgroup byte, a size value (short, 16 bits), and a function flags byte. An example of font code structure follows:
Any of the variable-length function codes may reference one or more prefix IDs. For example, the font change function code (see Font Face Change in the Variable-Length Multi-Byte Functions later in this section) references the prefix ID that contains the desired font data. Variable-length functions that do not currently reference a prefix ID may reference one or more prefix IDs in the future.
Document parsing programs must allow for prefix ID references in every variable-length function code.
When the flags byte has the high bit set, there is prefix data associated with the function. The byte following the flags byte (the number of prefix IDs byte) shows how many prefix IDs are referenced in the function.
Following the number of prefix IDs byte is a short value for each prefix ID that exists. Refer to Variable-Length Multi-Byte Functions later in this section for more information.
Next is a short value showing the size of the non-deletable data. The data in variable-length multi-byte functions is divided into two main parts: a non-deletable portion and a deletable portion. The non-deletable portion of a function code is the documented part of the function and should not change from one interim release of WP 7.0 to another. If data is added to the non-deletable part of a function code, it is added to the end of the non-deletable data.
Deletable data directly follows the non-deletable data. The size of the deletable data can be variable for each function. Deletable data is undocumented, since it is specific to the formatter of WordPerfect 7.0. It can be platform specific, language specific, version specific, and so forth. It is subject to change at any time. Deletable data may or may not be present in a variable-length function code within files created in WordPerfect. Document files created outside of WordPerfect should contain only non-deletable data.
Pertinent information for application developers is documented in the non-deletable portion of the function code. To skip over the deletable data, use the size field to move from the beginning of the current function to the beginning of the next function.
Each end gate is preceded by a size value (short), which should always be the same value as the size encountered at the beginning of the function. The size of the function is the total size of the function including begin and end gates.
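A Python sketch of reading one variable-length function and jumping to the next by its size field. It assumes the non-deletable size counts only the data that follows it; the sample is consistent with that reading. The sample is two consecutive functions (D4 1A ... 1D 00 D4 and D4 1B ... 1D 00 D4) copied from the packet data in the example dump earlier in this section. The names are hypothetical.

```python
import struct
from typing import Iterator, List, NamedTuple, Tuple

class VarFunction(NamedTuple):
    code: int            # begin/end gate, 0xD0..0xEF
    subgroup: int
    size: int            # total size including both gates
    flags: int
    prefix_ids: List[int]
    nondeletable: bytes  # documented data; deletable data is skipped

def parse_var_function(doc: bytes, pos: int) -> VarFunction:
    code = doc[pos]
    if not 0xD0 <= code <= 0xEF:
        raise ValueError(f"0x{code:02X} is not a variable-length function")
    subgroup, size, flags = struct.unpack_from("<BHB", doc, pos + 1)
    if size < 10 or pos + size > len(doc):
        raise ValueError("bad function size")
    p = pos + 5
    pids: List[int] = []
    if flags & 0x80:  # high bit: prefix IDs follow
        n = doc[p]
        p += 1
        pids = list(struct.unpack_from(f"<{n}H", doc, p))
        p += 2 * n
    (nd_size,) = struct.unpack_from("<H", doc, p)  # assumed to exclude itself
    p += 2
    nondeletable = doc[p:p + nd_size]
    # The end gate is preceded by a copy of the size; both must match.
    tail_size, end_gate = struct.unpack_from("<HB", doc, pos + size - 3)
    if tail_size != size or end_gate != code:
        raise ValueError("end gate does not match begin gate")
    return VarFunction(code, subgroup, size, flags, pids, nondeletable)

def iter_var_functions(doc: bytes, pos: int = 0) -> Iterator[Tuple[int, VarFunction]]:
    """Yield (offset, function) for a buffer holding only variable-length functions."""
    while pos < len(doc):
        func = parse_var_function(doc, pos)
        yield pos, func
        pos += func.size  # jumps over any deletable data

SAMPLE = bytes.fromhex(
    "D4 1A 1D 00 80 01 05 00 08 00 58 02 EC 38 00 00 58 02 00 00 58 02 05 00 50 50 1D 00 D4 "
    "D4 1B 1D 00 80 01 05 00 08 00 58 02 EC 38 00 00 58 02 00 00 58 02 05 00 58 02 1D 00 D4"
)
for offset, func in iter_var_functions(SAMPLE):
    print(offset, hex(func.subgroup), func.prefix_ids, func.nondeletable.hex(" "))
```

Both functions reference prefix ID 5, and the eight bytes after each non-deletable block are never interpreted: the size field alone carries the parser past them.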
Fixed-Length Multi-Byte Functions
The codes for fixed-length multi-byte functions, 240 (0xF0) through 255 (0xFF), always appear twice: the first occurrence is the begin gate, and the second is the end gate. The length of each function is fixed and listed after the function code.
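Example: in the word can't, the apostrophe is stored as an Extended Character function holding character 28 (0x1C) of character set 4:
can<240 (0xF0)><28 (0x1C)><4 (0x04)><240 (0xF0)>t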
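A Python sketch of building and reading this 4-byte function. It relies on the WordPerfect word layout described under WordPerfect Word Strings (character number in the low byte, character set in the high byte, stored least-significant byte first). The function names are hypothetical.

```python
import struct
from typing import Tuple

EXT_CHAR = 0xF0  # Extended Character function code (begin and end gate)

def encode_ext_char(charset: int, char: int) -> bytes:
    """Return <0xF0> <char> <charset> <0xF0> for one WP character."""
    word = (charset << 8) | char
    return struct.pack("<BHB", EXT_CHAR, word, EXT_CHAR)

def decode_ext_char(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (charset, char) from the 4-byte function at pos."""
    gate1, word, gate2 = struct.unpack_from("<BHB", data, pos)
    if gate1 != EXT_CHAR or gate2 != EXT_CHAR:
        raise ValueError("not an Extended Character function")
    return word >> 8, word & 0xFF

text = b"can" + encode_ext_char(4, 28) + b"t"
print(text.hex(" "))  # 63 61 6e f0 1c 04 f0 74
```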
Document formatting is accomplished by embedding function codes in the text of a document. A function is any byte greater than 127 (0x7F).
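The byte ranges given in this section can be collected into a small classifier. This Python sketch only labels a byte; because it does not know the length of each fixed-length function, it cannot by itself walk a whole document area. The function name is hypothetical.

```python
def classify_byte(b: int) -> str:
    """Label a document-area byte using the ranges given in this section."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b == 0x00:
        return "null"
    if b <= 0x7F:
        return "character"
    if b <= 0xCF:
        return "single-byte function"
    if b <= 0xEF:
        return "variable-length function"
    return "fixed-length function"

print(classify_byte(b"com\x83ment"[3]))  # single-byte function
```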
Glossary of Terms
Computers perform operations and handle data in binary form, which can be readily represented with hexadecimal numbers. In this document, values are generally shown as decimal numbers followed by the hexadecimal equivalent in parentheses. A hexadecimal value is written as a number that begins with "0x" followed by the actual value, such as (0xFF). In most cases values are unsigned; exceptions are noted.
Text Characters
The character <0 (0x00)> has special meaning as the null character and is always deleted by WordPerfect. All values from <1 (0x01)> to <127 (0x7F)> are characters and are mapped to WP extended characters.
Size Definitions
Sizes are referred to as bytes, short integers (sometimes abbreviated as short), or long integers (sometimes abbreviated as long). Because these terms can mean different things depending on the environment and operating system, each field is shown with its name enclosed in brackets, and the bracket type indicates the size of the field. Use the table below to match each bracket type with the size and term it represents.
<byte field>     byte (8 bits)
[short field]    short integer, or word (16 bits)
{long field}     long integer (32 bits)
The byte sequence of all multi-byte data types that are larger than a byte follows the Intel convention of placing the least-significant byte first.
Fields with Bit Flags
Some fields have bit flags. Individual bits are specified by a bit number. The range of bits for a byte value is from 0 to 7, with bit 0 as the rightmost or least significant bit, and bit 7 as the leftmost or most significant bit.
Function Code Documentation Conventions
The brackets shown in the table under Size Definitions above are used to describe the size of the individual fields within a packet or a function code. Unless otherwise specified, byte values, 16-bit short values, and 32-bit long values are unsigned. If a field is variable in length, it is represented with " x ?" following the field.
Examples: If a field contains 5 bytes, it is represented as: <byte field description> x 5
If a field contains an indefinite number of short values such as a null terminated word string, it is represented as: [short field description] x ?
If a field contains 2 long values, it is shown as:
{long field description} x 2
Indentation
Indentation is used in this document to distinguish levels of detail and to signal something unique about the information. Most flag fields require a definition of the meaning of each bit used in the flag. The definition will be indented under the flag field to give a visual indication that it contains additional information about the previous field. Some data fields may or may not exist in a particular instance of a function. Generally a flag bit is used to indicate whether or not these fields are present. These field definitions will be indented to show that they may not exist in a specific instance of the function. An example follows:
<function> <sub-function> [size] <flags = 0 or PRFXID>
    If the prefix ID bit (PRFXID) is set, the following information exists:
    [number of PIDs] [first PID] . . . [last PID]
[data field 1]
<flag field>
    bit 0: 1 = more data follows
        If bit 0 is set, the following data exists:
        [data field 2]
        [data field 3]
    bit 1: 1 = meaning of bit 1
{data field 4}
[size] <function>
The above example illustrates how indentation is used to give visual clues to the data content. If the function flags byte is 0, the PID information is omitted. Bits 0 and 1 are defined for the flag field and in this case bits 2-7 are not used. Bit 0 indicates whether or not data fields 2 and 3 are present.
WordPerfect Word Strings
Some fields in packets and in functions hold text that is marked as WP words, or word strings. Here, word is the Intel assembly language term for an unsigned short integer. In this format each character of a string takes up one short integer: the high byte is the number of the WordPerfect character set, and the low byte is the offset of the character within that set.
WordPerfect word strings require that all characters in the string have 16-bit values including any null terminator. However, byte strings of 8-bit characters can have 16-bit characters embedded within the string. This is accomplished with a function code that shows the beginning of a text block using 16-bit characters, and the same function code is repeated to show the end of the block. For the format of this code, see the Extended Character function 240 (0xF0) under Fixed-Length Multi-Byte Functions.
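A Python sketch of decoding a null-terminated WordPerfect word string into (character set, character) pairs, assuming least-significant-byte-first storage as described under Size Definitions. The function name is hypothetical, and the sample string is made up for illustration.

```python
import struct
from typing import List, Tuple

def decode_word_string(data: bytes, pos: int = 0) -> List[Tuple[int, int]]:
    """Decode a null-terminated WP word string into (charset, char) pairs."""
    chars: List[Tuple[int, int]] = []
    while True:
        (word,) = struct.unpack_from("<H", data, pos)  # raises if unterminated
        pos += 2
        if word == 0:
            return chars
        chars.append((word >> 8, word & 0xFF))

# Made-up sample: two characters from set 0, then character 28 of set 4.
sample = bytes.fromhex("48 00 69 00 1C 04 00 00")
print(decode_word_string(sample))  # [(0, 72), (0, 105), (4, 28)]
```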
Units of Measure
WPU stands for WordPerfect Unit, which is one 1200th of an inch. Dimensions are usually given in WordPerfect Units.
WPFP stands for WordPerfect Fixed Point Value. This is an unsigned 16-bit number which represents a fraction between 0 and 1. 0x8000 is equal to 0.5 and 0xFFFF is treated as 1.0. It is used to specify a percentage value or a fraction. It is also used as the fractional part of WPSP.
WPSP is used to specify spacing values. WPSP denotes a 32-bit value composed of a 16-bit fraction (WPFP) and a 16-bit integer in that order. The fractional value is always positive. The associated integer value is signed which allows the values to be added as though they were one 32-bit value. For example, to code the number -3.75 the integer would be -4 and the fraction would be +0.25 (0x4000). When the integer and fraction are added, the result is -3.75.
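The WPSP example can be checked with a short Python sketch. It treats a WPFP as value / 65536, which matches 0x8000 = 0.5, and applies the 0xFFFF = 1.0 rule only to standalone WPFP values; whether that rule also applies to the fractional half of a WPSP is not stated, so the sketch does not assume it. Because the fraction is stored first and bytes are least-significant first, a WPSP can also be read as one signed 32-bit value divided by 65536. The function names are hypothetical.

```python
import math
import struct

def wpfp_to_float(v: int) -> float:
    """Standalone WPFP: 0x8000 -> 0.5, and 0xFFFF is treated as 1.0."""
    return 1.0 if v == 0xFFFF else v / 0x10000

def encode_wpsp(x: float) -> bytes:
    """Fraction (unsigned short) first, then signed integer part."""
    integer = math.floor(x)
    frac = round((x - integer) * 0x10000)
    if frac == 0x10000:  # rounding carried into the integer part
        integer, frac = integer + 1, 0
    return struct.pack("<Hh", frac, integer)

def decode_wpsp(data: bytes) -> float:
    frac, integer = struct.unpack("<Hh", data)
    return integer + frac / 0x10000

raw = encode_wpsp(-3.75)
print(raw.hex(" "))  # 00 40 fc ff
```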
PSU stands for Printer Scalable Unit, which is in 10,000ths of the point size of the font.
Font point sizes are given in 3600ths of an inch and are denoted as 3600ths.
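A few Python conversion helpers for these units. The point-size conversion assumes 72 points per inch, which this document does not state; the function names are hypothetical.

```python
WPU_PER_INCH = 1200   # one WPU is 1/1200 inch
POINTS_PER_INCH = 72  # assumption: standard typographic point

def wpu_to_inches(wpu: int) -> float:
    return wpu / WPU_PER_INCH

def inches_to_wpu(inches: float) -> int:
    return round(inches * WPU_PER_INCH)

def points_to_3600ths(points: float) -> float:
    """Font point size expressed in 3600ths of an inch."""
    return points * 3600 / POINTS_PER_INCH

def psu_to_points(psu: int, font_points: float) -> float:
    """PSU are 10,000ths of the font's point size."""
    return psu * font_points / 10000
```

Under these assumptions, a 12-point font is stored as 600 (3600ths), and a one-inch margin is 1200 WPU.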
RGB, RGBS, RGBT
RGB is used to mean the percent of red, percent of green, and percent of blue in specifying colors. Each color takes one byte with a range from 0 to 255 (0xFF) where 255 is 100%. The numeric value of 127 (0x7F) is calculated to be 50%. This 3-byte field in a function definition will be represented as:
<color (RGB)> x 3
RGBS includes the three bytes of color information above and adds one byte for percent of shading. Shading also has a range of 0 to 255 (0xFF) where 255 is 100%. This 4-byte field in a function definition will be represented as:
<color (RGBS)> x 4
RGBT includes the three bytes of color information above and adds one byte for percent of transparency. Transparency also has a range of 0 to 255 (0xFF) where 255 is 100%. This 4-byte field in a function definition will be represented as:
<color (RGBT)> x 4
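A Python sketch that converts the bytes of an RGB, RGBS, or RGBT field to percentages, where 255 is 100%. It shows why 127 comes out at about 50% (49.8%). The function name is hypothetical.

```python
from typing import List

def color_percentages(field: bytes) -> List[float]:
    """Convert each byte of an RGB/RGBS/RGBT field to a percentage."""
    if len(field) not in (3, 4):
        raise ValueError("expected a 3-byte RGB or 4-byte RGBS/RGBT field")
    return [round(b * 100 / 255, 1) for b in field]

print(color_percentages(bytes([0xFF, 0x7F, 0x00, 0xFF])))  # [100.0, 49.8, 0.0, 100.0]
```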
Last Updated: September 9, 1996