Read line by line from PDF File using GcPdfDocument

Posted by: Swapnil.Walwadkar on 5 October 2021, 7:53 am EST

  • Posted 5 October 2021, 7:53 am EST - Updated 4 October 2022, 8:38 am EST

    Hi

    I am extracting text from PDF documents (which are created using C1PdfDocument) by using GcPdfDocument.

    The text in these PDF documents contains ‘\t\r’ characters. However, the text I get back from GcPdfDocument does not preserve that formatting.

    Please let me know how I can read the text line by line from the PDF document.

    I am attaching two screenshots for reference: one of the PDF document (A.jpg) and one of the data extracted from it (B.jpg).

  • Posted 5 October 2021, 8:10 am EST

    Hi Swapnil,

    Thank you for sharing the snapshot.

    Could you please confirm whether you are using C1PdfDocument or GcPdfDocument?

    Regards,

    Nitin

  • Posted 5 October 2021, 11:26 pm EST

    Hi Nitin,

    I am extracting text from PDF documents by using GcPdfDocument.

    The following is the code:

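    // Requires: using System.IO; using System.Text; using GrapeCity.Documents.Pdf;
    GcPdfDocument doc = new GcPdfDocument();
    string text;
    // Keep the stream open for as long as the loaded document is in use.
    using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        doc.Load(fs);

        // Concatenate the extracted text of every page.
        StringBuilder sb = new StringBuilder();
        for (int page = 0; page < doc.Pages.Count; page++)
        {
            sb.Append(doc.Pages[page].GetText());
        }
        text = sb.ToString();
    }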
    

    Additional information, in case it is relevant: those PDF documents were created with C1PdfDocument.

  • Posted 6 October 2021, 1:51 am EST

    Hi Swapnil,

    We loaded a PDF generated by C1Pdf (https://demos.componentone.com/aspnet/ControlExplorer/C1PDF/Overview.aspx) and were able to extract the correct text.

    Could you please share the PDF from which you are unable to extract the text?

    Also, please share the code used to generate the PDF so that we can investigate further.

    Regards,

    Manish Gupta

    Refer to the sample: GcPdf_c1pdf.zip

  • Posted 6 October 2021, 2:23 am EST

    Hi

    Thank you for your reply.

    I am sharing the PDF file. (SampleLog.pdf)

    The following is the code to generate PDF:

    C1.C1Pdf.C1PdfDocument pdf = new C1.C1Pdf.C1PdfDocument();
    System.Drawing.Font font = new System.Drawing.Font("MS Gothic", 9, System.Drawing.FontStyle.Regular);
    pdf.PaperKind = System.Drawing.Printing.PaperKind.A4;

    // Text area: the page rectangle shrunk by 48 points on every side.
    RectangleF rect = pdf.PageRectangle;
    rect.Inflate(-48, -48);

    pdf.Security.AllowPrint = false;
    pdf.Security.AllowCopyContent = false;
    pdf.Security.AllowEditContent = false;

    string data = logStringBuilder.ToString();

    // Draw the text page by page until all of it has been rendered.
    while (true)
    {
        // DrawString returns the index of the first character that did not fit in rect.
        int nextChar = pdf.DrawString(data, font, Brushes.Black, rect);
        if (nextChar >= data.Length)
        {
            break;
        }
        data = data.Substring(nextChar);
        pdf.NewPage();
    }

    pdf.Save(fileNameWithpathPDF);
    
  • Posted 13 October 2021, 7:55 am EST

    Hi,

    It would be helpful if you could let me know when I can expect a solution.
