Copy each paragraph of a PDF as plain text with WPViewPDF

  • I have had a WPTools license for more than 10 years, but not one for WPViewPDF (yet). I use Delphi 2007 and XE2. WPTools has been great!

    Question: This is what I would like to do programmatically:
    1. Open a standard PDF document (a PDF created by others; I will have no control over this).
    2. Copy each paragraph in the PDF document to TStringList items as Plain Text.
    3. Do whatever with the text in TStringList items.

    So, if XYZ.pdf has 10 paragraphs, I will have 10 items in the StringList. Each item will have plain text (i.e., all formatting removed) of the respective paragraph.

    I would think this would be trivial for WPViewPDF but want to make sure that it is easily accomplished BEFORE ordering the VCL component.

    Can WPViewPDF do what I am asking above?

    Thanks

    JayM

    • Official post

    WPViewPDF has a function to extract the text of a PDF page as text or RTF text.

    But there is no guarantee that the result always looks good. Spaces can be missing, and the characters can be in (almost) any order, although WPViewPDF sorts them by XY position.
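
Once the page text has been extracted, it still has to be split into one item per paragraph. Below is a minimal sketch of that step only, written in Python for brevity; the same logic should port to a Delphi TStringList. It assumes the extracted text separates paragraphs with blank lines, which a PDF extractor does not guarantee. `split_paragraphs` is a hypothetical helper, not part of WPViewPDF, and the extraction call itself is not shown.

```python
import re
from typing import List


def split_paragraphs(page_text: str) -> List[str]:
    """Split extracted page text into paragraphs at blank lines.

    Heuristic only: assumes a blank line marks a paragraph boundary.
    """
    # Normalise Windows and old Mac line endings to "\n".
    normalised = page_text.replace("\r\n", "\n").replace("\r", "\n")

    paragraphs = []
    # One or more blank (or whitespace-only) lines end a paragraph.
    for block in re.split(r"\n\s*\n", normalised):
        # Join the wrapped lines of one paragraph and collapse runs of whitespace.
        flat = " ".join(block.split())
        if flat:
            paragraphs.append(flat)
    return paragraphs


# Stand-in for text returned by a PDF text extractor (made-up sample).
sample = "First paragraph,\r\nwrapped over two lines.\r\n\r\nSecond paragraph.\r\n"
items = split_paragraphs(sample)
```

If the extracted text has no blank lines between paragraphs, a different boundary rule would be needed (for example, indentation or vertical gaps), so it is worth checking the output against a few of the real PDFs first.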