How to read pdf files using C# .NET

including iText, PDFBox, PDF-Excel, etc.

A summary of some resources available online for programming in C# to produce software that will read data from files stored in Adobe portable document format (pdf).

Step-by-step instructions and sample C# code are at the bottom of the page.

Firstly, what pdf is

C# Resources for reading PDF files

Extracting images from PDF files using C#

Writing data to a pdf file:

Chris Hornberger wrote on Jul 2 2003, 6:59 am: "Create a Crystal report with the information you want on it, then simply export it to PDF. The fact that you're using C#, I assume you're also using VS Studio.NET and hence, have Crystal too. This will allow you to create your PDF file. Another choice is to spring for Adobe Pagemill and print to the PDF file format."

The brief article "Microsoft Visual Studio.NET: Crystal Reports" by Mujtaba Khambatti explains the benefits of Crystal Reports, how to design a report, and how to use Crystal Reports in your own projects.

Comprehensive documentation on .NET Crystal Reports is available in the Microsoft Developer Network under MSDN > MSDN Library > Development Tools and Languages > Visual Studio .NET > Developing with Visual Studio .NET > Designing Distributed Applications > Crystal Reports.

Other Assorted PDF Utilities:

Some free utilities are available for download; before writing your own software, check this section, as it may save you the trouble of reinventing the wheel.

Other useful links:

Some background reading...


A worked example – C# code to read the properties of a pdf document:

Here are step-by-step instructions for using C# and Visual Studio to read the properties of a pdf file using iTextSharp:

(1) Download the most recent version of iTextSharp from http://sourceforge.net/projects/itextsharp/ and unzip the file.

(2) Create a new project in Visual Studio (I used VS2005 and created a console application). Next, add a reference to the iTextSharp dll:

In Solution Explorer, right-click the project name and select Add Reference…

(3) Click the Browse tab.
(4) Navigate to the folder into which you unzipped the iTextSharp.dll file, select it, then click OK.
(5) itextsharp is now listed in Solution Explorer under References.

Now you can use any of the techniques illustrated in the over 200 tutorial files that were also in the download. Here's a simple program that reads the properties of a pdf file on the web (chosen at random) at:
http://www.chinehamchat.com/Chineham_Chat_Advertisements.pdf

(note: the sample pdf file may have changed since I ran my program)

using System;
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace PdfProperties
{
    class Program
    {
        static void Main(string[] args)
        {
            // create a reader (constructor overloaded for path to local file or URL)
            PdfReader reader
                = new PdfReader("http://www.chinehamchat.com/Chineham_Chat_Advertisements.pdf");
            try
            {
                // total number of pages
                int n = reader.NumberOfPages;
                // size of the first page
                Rectangle psize = reader.GetPageSize(1);
                float width = psize.Width;
                float height = psize.Height;
                Console.WriteLine("Size of page 1 of {0} => {1} × {2}", n, width, height);
                // document information dictionary (Title, Author, CreationDate, ...)
                Dictionary<string, string> infodict = reader.Info;
                foreach (KeyValuePair<string, string> kvp in infodict)
                    Console.WriteLine(kvp.Key + " => " + kvp.Value);
            }
            finally
            {
                // release the underlying file or network stream
                reader.Close();
            }
        }
    }
}

from which the output (after a delay while the pdf downloads) is:

Size of page 1 of 24 => 421 × 595
ModDate => D:20120122082532Z
CreationDate => D:20101117141712Z
Title => Chineham Chat Advertisement Supplement
Creator => PScript5.dll Version 5.2.2
Author => Chineham Chat Magazine
Keywords => Chineham Chat, Magazine, Basingstoke, Advertisements
Subject => Adverts from the Chineham Chat magazine, distributed free to all households in Chineham, Basingstoke, Hampshire, UK
Producer => Acrobat Distiller 4.05 for Windows
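The raw values above need a little interpretation. Page sizes come back in pdf user-space units, which by default are points of 1/72 inch, and the dates use the pdf date format D:YYYYMMDDHHmmSS followed by an optional UTC offset (Z, or something like +01'00'). The sketch below converts both using only the .NET base class library. The class and method names (PdfOutputHelpers, PointsToMillimetres, TryParsePdfDate) are hypothetical and not part of iTextSharp, and the parser assumes the full 14-digit timestamp is present; the format also allows shorter, truncated dates, which this sketch simply rejects.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers (not part of iTextSharp) for interpreting the values
// printed by the worked example.
public static class PdfOutputHelpers
{
    // Default pdf user-space unit is 1/72 inch; 1 inch = 25.4 mm.
    public static double PointsToMillimetres(float points)
    {
        return points * 25.4 / 72.0;
    }

    // Parses a pdf date such as "D:20120122082532Z" or "D:20101117141712+01'00'".
    // Assumption: the full 14-digit timestamp is present; truncated forms are rejected.
    // A missing offset, or anything other than a leading '+' or '-', is treated as UTC.
    public static bool TryParsePdfDate(string value, out DateTimeOffset result)
    {
        result = default(DateTimeOffset);
        if (string.IsNullOrEmpty(value))
            return false;

        // Strip the leading "D:" if present.
        string s = value.StartsWith("D:", StringComparison.Ordinal) ? value.Substring(2) : value;
        if (s.Length < 14)
            return false;

        DateTime timestamp;
        if (!DateTime.TryParseExact(s.Substring(0, 14), "yyyyMMddHHmmss",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out timestamp))
            return false;

        TimeSpan offset = TimeSpan.Zero;
        string tz = s.Substring(14);
        if (tz.Length > 0 && (tz[0] == '+' || tz[0] == '-'))
        {
            // "+HH'mm'" becomes "HHmm" once the apostrophes are removed.
            string digits = tz.Substring(1).Replace("'", "");
            int hours;
            int minutes = 0;
            if (digits.Length < 2 ||
                !int.TryParse(digits.Substring(0, 2), NumberStyles.None, CultureInfo.InvariantCulture, out hours))
                return false;
            if (digits.Length >= 4 &&
                !int.TryParse(digits.Substring(2, 2), NumberStyles.None, CultureInfo.InvariantCulture, out minutes))
                return false;
            // DateTimeOffset only accepts offsets up to +/-14:00.
            if (hours > 14 || minutes > 59 || (hours == 14 && minutes > 0))
                return false;

            offset = new TimeSpan(hours, minutes, 0);
            if (tz[0] == '-')
                offset = offset.Negate();
        }

        result = new DateTimeOffset(timestamp, offset);
        return true;
    }
}
```

Applied to the sample output, 421 × 595 points comes to about 148.5 × 209.9 mm, consistent with A5 paper (148 × 210 mm), and TryParsePdfDate("D:20120122082532Z", out d) gives 22 January 2012, 08:25:32 UTC.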

As you type in the code, you'll notice that when you type reader followed by a dot, IntelliSense offers a long list of methods and properties, evidence of the breadth of functionality in this library.
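Because the example reads straight from a URL, and the file behind that URL may change or disappear, it can be worth confirming that you really received a pdf before doing anything else with it. A pdf file starts with a header such as %PDF-1.4, so checking the first few bytes is a cheap test. This is a minimal sketch using only the .NET base class library; the names PdfSniffer and TryGetPdfVersion are hypothetical, and it insists on the header at the very first byte, whereas some pdf readers are more lenient about where the header appears.

```csharp
using System.Text;

// Hypothetical helper (not part of iTextSharp): checks whether a buffer begins
// with a pdf header such as "%PDF-1.4" and extracts the version string.
public static class PdfSniffer
{
    private static readonly byte[] Magic = Encoding.ASCII.GetBytes("%PDF-");

    public static bool TryGetPdfVersion(byte[] data, out string version)
    {
        version = null;
        if (data == null || data.Length < Magic.Length)
            return false;

        // Strict check: the header must start at the very first byte.
        for (int i = 0; i < Magic.Length; i++)
        {
            if (data[i] != Magic[i])
                return false;
        }

        // Collect the digits and dots that follow "%PDF-", e.g. "1.4" or "2.0".
        int start = Magic.Length;
        int end = start;
        while (end < data.Length &&
               ((data[end] >= (byte)'0' && data[end] <= (byte)'9') || data[end] == (byte)'.'))
            end++;

        if (end == start)
            return false;

        version = Encoding.ASCII.GetString(data, start, end - start);
        return true;
    }
}
```

For instance, passing the bytes of "%PDF-1.4\n" returns true with version set to "1.4", while the bytes of an HTML error page starting with "<html>" return false. With a local file you could pass the result of File.ReadAllBytes(path).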


I cannot guarantee any of the quoted or linked information; it was taken on trust from the linked websites in September 2008.

The example program code at the bottom is my own work, and I can vouch for that. It was produced in 2010.

Noticed an error, dead link or omission? Please email me (send to webmaster at the domain name jadn.co.uk)