In this article we cover enough of the basics of the PDF format to understand how PDF files work internally and how that affects the way they are displayed on different devices.
If you open a PDF file in a text or hex editor (Sublime Text and Notepad++ both work), you will find that parts of the file are text based while other parts are binary. The binary parts are typically either raster images or compressed portions of the PDF structure. You can decompress them with the “qpdf” tool, available from http://qpdf.sourceforge.net/. To do so, type the following on the command line:
qpdf --stream-data=uncompress input_file.pdf output_file.pdf
This converts the compressed internal structures of the PDF file into uncompressed form, so you can read the structure of the file in the editor.
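If you need to run the same step from a script, the command can be wrapped as in the minimal sketch below. It assumes that `qpdf` is installed and on the `PATH`. The function names `build_qpdf_command` and `decompress_pdf` are hypothetical helpers introduced here for illustration and are not part of qpdf.

```python
import subprocess
from pathlib import Path


def build_qpdf_command(input_pdf, output_pdf):
    """Return the argument list for the qpdf call shown above."""
    return [
        "qpdf",
        "--stream-data=uncompress",
        str(input_pdf),
        str(output_pdf),
    ]


def decompress_pdf(input_pdf, output_pdf):
    """Run qpdf and return the output path.

    Assumes the qpdf executable is on PATH. With check=True, any
    non-zero exit status raises subprocess.CalledProcessError.
    """
    subprocess.run(build_qpdf_command(input_pdf, output_pdf), check=True)
    return Path(output_pdf)
```

Calling `decompress_pdf("input_file.pdf", "output_file.pdf")` runs the same command as typing it by hand.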
Each page in a PDF is rendered using a two-dimensional, device-independent coordinate system. Each component on a page has coordinates that define its position on the page. Components have the following properties:
Text in a PDF file is rendered by specifying its position and font. Fonts can either be embedded in the PDF file or selected from the 14 standard fonts that every PDF reader following the specification must provide.
BT
/F13 12 Tf
288 720 Td
(ABC) Tj
ET
In the example above, /F13 12 Tf means that the font registered under the name F13 in the page's resources (Helvetica in this example) should be used at size 12 to display text. 288 720 Td specifies the coordinates where the text should be displayed. (ABC) Tj specifies that ABC should be displayed with the properties defined above.
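To make the operator-by-operator reading above concrete, here is a minimal sketch of an interpreter for this kind of text object. It handles only the three operators used in the example (`Tf`, `Td`, `Tj`) and plain literal strings, so it is an illustration and not a general content stream parser. The names `TOKEN`, `interpret_text_object` and `points_to_inches` are hypothetical. The inch conversion assumes the default coordinate system, where one unit is 1/72 inch and no extra transformation is in effect.

```python
import re

# One token is a literal string, a name, a number, or an operator.
TOKEN = re.compile(
    r"\((?:\\.|[^\\()])*\)"   # literal string such as (ABC)
    r"|/[^\s/()<>\[\]]+"      # name such as /F13
    r"|[-+]?\d*\.?\d+"        # integer or real number
    r"|[A-Za-z*']+"           # operator such as BT, Tf, Td, Tj, ET
)


def points_to_inches(value):
    """Convert default user space units (1/72 inch) to inches."""
    return value / 72.0


def interpret_text_object(stream):
    """Walk a simple BT ... ET block and record what each Tj draws."""
    operands = []              # operands collected before an operator
    font, size = None, None
    x, y = 0.0, 0.0
    shown = []
    for token in TOKEN.findall(stream):
        is_operand = token[0] in "(/+-." or token[0].isdigit()
        if is_operand:
            operands.append(token)
            continue
        if token == "Tf":      # select font resource name and size
            font, size = operands[0].lstrip("/"), float(operands[1])
        elif token == "Td":    # move relative to the start of the line
            x += float(operands[0])
            y += float(operands[1])
        elif token == "Tj":    # show a string at the current position
            shown.append(
                {"text": operands[0][1:-1], "font": font,
                 "size": size, "x": x, "y": y}
            )
        operands = []          # every operator consumes its operands
    return shown


example = "BT\n/F13 12 Tf\n288 720 Td\n(ABC) Tj\nET"
result = interpret_text_object(example)
```

For the example stream, `result` holds a single entry for the text ABC in font F13 at size 12, positioned at (288, 720). Under the stated assumption that corresponds to 4 inches from the left edge and 10 inches from the bottom of the page.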
A PDF file can also include interactive elements, which are specified using either AcroForms or the Adobe XML Forms Architecture (XFA) format. The PDF file format specification, ISO 32000-1:2008, is available at https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf.
PDF Quick Info | |
---|---|
Format | Portable Document Format |
Identifying characters | Hex: 25 50 44 46 2D 31 2E, ASCII: %PDF-1. |
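The identifying characters listed above can be used for a quick check of whether a file looks like a PDF. The sketch below compares the first bytes against that signature only. The helper names `looks_like_pdf` and `file_looks_like_pdf` are hypothetical, and a match on the header does not guarantee that the rest of the file is valid.

```python
# Signature taken from the quick-info table: the bytes of "%PDF-1."
PDF_SIGNATURE = bytes.fromhex("25 50 44 46 2D 31 2E")


def looks_like_pdf(data):
    """Return True if the given bytes start with the listed signature."""
    return data.startswith(PDF_SIGNATURE)


def file_looks_like_pdf(path):
    """Read only as many leading bytes as the signature needs."""
    with open(path, "rb") as handle:
        return looks_like_pdf(handle.read(len(PDF_SIGNATURE)))
```

A file whose header uses a different major version number would not match this exact signature, so treat the check as a heuristic.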