— Written by Triangles on September 16, 2017 • updated on September 16, 2017 • ID 57 —
How to reduce the size of a PDF that originated from a scanned document.
I have just scanned a bunch of physical pages into a PDF, and the result is a pretty big file. Without any advanced OCR processing, the scanned pages are stored as plain images rather than text, which increases the overall size of the output.
While browsing the web I came across the following Ghostscript command, which compresses and optimizes the original file into a grayscale version of it. The result is a printer-friendly PDF file: the resolution is set to 300 dpi, but you can change it if needed.
gs \
    -sDEVICE=pdfwrite \
    -dCompatibilityLevel=1.4 \
    -dPDFSETTINGS=/printer \
    -dBATCH \
    -dNOPAUSE \
    -dQUIET \
    -sProcessColorModel=DeviceGray \
    -sColorConversionStrategy=Gray \
    -dOverrideICC \
    -sOutputFile=output.pdf \
    input.pdf
-sDEVICE=pdfwrite selects which output device Ghostscript should use. I want to print to a PDF file, so I'm using pdfwrite.
-dCompatibilityLevel=1.4 produces a file conforming to PDF version 1.4. You may want to change this according to your needs. Here's a list of all PDF versions.
-dPDFSETTINGS=/printer sets the output quality for printers (i.e. images at 300 dpi). Choose /screen if you want to scale images down to 72 dpi: you will obtain additional compression, but the file will look ugly if printed on paper.
-dBATCH -dNOPAUSE: Ghostscript will process the input file(s) without interaction. It will quit on completion.
-dQUIET mutes routine information comments on standard output.
-sProcessColorModel=DeviceGray sets the color model used during conversion.
-sColorConversionStrategy=Gray instructs Ghostscript to produce grayscale output.
-dOverrideICC: since the colors have changed, this flag makes Ghostscript ignore the ICC color profiles embedded in the input file and use its default profiles instead.
-sOutputFile=output.pdf: where to save the output file.
input.pdf: the original file to process.
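If you compress scans often, it may be handy to wrap the command in a small shell function. The sketch below assumes a POSIX-compatible shell (sh, bash, zsh); the function name pdf2gray, its argument order and the DRY_RUN switch are my own inventions, not part of Ghostscript. The optional third argument is passed straight to -dPDFSETTINGS, so you can pick /printer or /screen as described above.

```shell
# Hypothetical helper: convert a scanned PDF to a compressed grayscale PDF.
# Usage: pdf2gray INPUT OUTPUT [PRESET]
#   PRESET is passed to -dPDFSETTINGS (defaults to /printer, i.e. 300 dpi).
# Set DRY_RUN=1 to print the Ghostscript command instead of running it.
pdf2gray() {
    if [ "$#" -lt 2 ]; then
        echo "usage: pdf2gray INPUT OUTPUT [PRESET]" >&2
        return 1
    fi
    input=$1
    output=$2
    preset=${3:-/printer}

    # Rebuild the function's positional parameters as the gs argument list,
    # so file names containing spaces stay intact.
    set -- -sDEVICE=pdfwrite \
        -dCompatibilityLevel=1.4 \
        -dPDFSETTINGS="$preset" \
        -dBATCH -dNOPAUSE -dQUIET \
        -sProcessColorModel=DeviceGray \
        -sColorConversionStrategy=Gray \
        -dOverrideICC \
        -sOutputFile="$output" \
        "$input"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo gs "$@"
    else
        gs "$@"
    fi
}
```

Once the function is sourced (for example from your shell's startup file), running pdf2gray scan.pdf scan-small.pdf /screen would trade print quality for a smaller file, while omitting the third argument keeps the 300 dpi /printer preset.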
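The command above should also work on Windows and OS X, as long as Ghostscript is installed. On Windows the console executable is usually named gswin64c (or gswin32c) instead of gs, and cmd.exe does not understand the trailing backslashes, so put the whole command on one line there.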
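When the same script has to run on several machines, a small helper could pick whichever Ghostscript executable is on the PATH. This is a sketch assuming a POSIX-style shell (on Windows, for example, Git Bash or MSYS2); the function name find_gs is hypothetical, and the candidate names are only the usual defaults of the official builds.

```shell
# Hypothetical helper: print the name of the first Ghostscript executable
# found on the PATH. "gs" is the usual name on Linux and macOS; the Windows
# console builds are usually called gswin64c (64-bit) or gswin32c (32-bit).
find_gs() {
    for candidate in gs gswin64c gswin32c; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "Ghostscript not found on PATH" >&2
    return 1
}
```

A script could then call GS=$(find_gs) and run "$GS" with the same flags as before, stopping early if the lookup fails.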
PDF version 1.5 seems to offer better image compression. I should look into that more closely.
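One way to check the PDF 1.5 hunch would be to run the same conversion twice, changing only -dCompatibilityLevel, and compare the resulting file sizes. The sketch below assumes a POSIX shell; the function name compare_levels, the GS override variable and the output names out-1.4.pdf and out-1.5.pdf are hypothetical. No results are claimed here: the numbers will depend on your scans.

```shell
# Hypothetical helper: convert INPUT once per PDF compatibility level,
# keeping every other flag identical, and print each output's size in bytes.
# Set GS to override the Ghostscript executable name (defaults to gs).
compare_levels() {
    input=$1
    for level in 1.4 1.5; do
        output="out-$level.pdf"
        "${GS:-gs}" -sDEVICE=pdfwrite \
            -dCompatibilityLevel="$level" \
            -dPDFSETTINGS=/printer \
            -dBATCH -dNOPAUSE -dQUIET \
            -sProcessColorModel=DeviceGray \
            -sColorConversionStrategy=Gray \
            -dOverrideICC \
            -sOutputFile="$output" \
            "$input" || return 1
        # wc -c counts bytes; tr strips the padding some wc versions add.
        printf '%s: %s bytes\n' "$output" "$(wc -c < "$output" | tr -d ' ')"
    done
}
```

Running compare_levels input.pdf in the folder that contains the scan would write both outputs to the current directory and print one size line per version.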
Sources
Ghostscript - How to use Ghostscript
GitHub gist - Compress PDF files with ghostscript
Stack Overflow - How to convert a PDF to grayscale from command line avoiding to be rasterized?