PDF: Copy the text that follows a found search term

Posted by: andreas.kren on 28 July 2018, 5:23 pm EST


    Posted 28 July 2018, 5:23 pm EST

    I found the sample that shows basic usage of the C1TextSearchManager class. I want to find a certain text within a given PDF and then copy, for example, the next 10 characters (such as an “order number”) to use for other tasks.

    Is this possible?
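
    Once the text is available as a plain .NET string, the operation described above (find a label, then copy a fixed number of characters after it) can be sketched as follows. This is a minimal sketch, assuming the text is already in a string; FixedLengthExtractor and TextAfter are hypothetical names, not part of the ComponentOne API.

```csharp
using System;

// Hypothetical helper, not part of the ComponentOne API.
public static class FixedLengthExtractor
{
    // Returns up to `count` characters that follow the first case-insensitive
    // occurrence of `label` in `text`, or null if the label is not found.
    public static string TextAfter(string text, string label, int count)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(label)) throw new ArgumentException("Label must not be empty.", nameof(label));
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));

        int index = text.IndexOf(label, StringComparison.OrdinalIgnoreCase);
        if (index == -1)
        {
            return null;
        }

        // Start right after the label; clamp the length so we never read past the end
        int start = index + label.Length;
        int length = Math.Min(count, text.Length - start);
        return text.Substring(start, length);
    }
}
```

    A fixed length only works if every value has the same length; if the string ends early, the helper returns whatever is left.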

  • Posted 30 July 2018, 7:27 am EST

    Hi Andreas,

    Each entry in C1TextSearchManager.FoundPositions has a NearText property which, as its name says, contains the text in the neighborhood of the match. You can trim NearText to remove all characters up to the end of the search text, and then store the remainder in a variable for further use.

    ```csharp
    string newText = "";
    int index = -1;
    if (_textSearchManager.FoundPositions.Count > 0)
    {
        // Locate the search text inside the text surrounding the first match
        index = _textSearchManager.FoundPositions[0].NearText.IndexOf(searchText, StringComparison.OrdinalIgnoreCase);
    }
    if (index != -1)
    {
        // Drop everything up to and including the search text; keep what follows
        newText = _textSearchManager.FoundPositions[0].NearText.Remove(0, index + searchText.Length);
    }
    else
    {
        // Search text was not found in NearText
    }
    ```
    

    Hopefully this helps you meet your requirement.

    Thanks,

    Ruchir

  • Posted 1 August 2018, 5:09 pm EST

    Thank you very much. I will try it.

  • Posted 2 August 2018, 7:00 am EST - Updated 4 October 2022, 2:22 am EST

    And how can I control the length of NearText?

    For example, when I search for “Auftragsnummer” (German for “order number”), I only get “Auftragsnummer: WRW” (see screenshot), but I need the whole order number (WRW/21043 … ).

  • Posted 3 August 2018, 3:03 am EST

    Hi,

    Thank you for the screenshots.

    In the current version, it is not possible to specify the length of NearText. However, I have forwarded this request to the relevant team for consideration [ID:336892], and it might be added in a future build if they find it feasible.

    Thanks & regards,

    Ruchir

  • Posted 3 August 2018, 5:33 am EST

    Is there another way to treat the PDF as a plain-text string, as it appears in Acrobat Reader, so that I can search for a certain text and do the necessary string operations myself? In Acrobat Reader I can save the PDF as a text file, and with that text file I could do what I want (search for the string “Auftragsnummer” and copy the text that follows it).
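
    With the whole PDF as a plain string, the “search and copy the following text” step does not need a fixed length. A minimal sketch, assuming the order number is the first run of non-whitespace characters after the label (an optional colon and whitespace are skipped); LabelValueExtractor and ValueAfterLabel are hypothetical names, not part of the ComponentOne API. If an order number could contain spaces, this pattern would cut it off at the first space.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the ComponentOne API.
public static class LabelValueExtractor
{
    // Returns the first run of non-whitespace characters after `label`
    // (skipping an optional colon and any whitespace, including line breaks),
    // or null if the label does not occur or has no value after it.
    public static string ValueAfterLabel(string text, string label)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(label)) throw new ArgumentException("Label must not be empty.", nameof(label));

        // Regex.Escape makes characters such as '.' or '(' in the label match literally.
        // [^\s:] keeps a bare colon from being returned as the value.
        string pattern = Regex.Escape(label) + @"\s*:?\s*([^\s:]\S*)";
        Match match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```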

    Meanwhile, I found this site, which might let me do this: http://kenbenoit.net/how-to-batch-convert-pdf-files-to-text/

    But is there a way to do this within the C1 controls?

  • Posted 6 August 2018, 6:56 am EST

    We are sorry, but PDFs cannot be exported to *.txt format using ComponentOne controls.

    ~Ruchir

  • Posted 9 August 2018, 5:43 am EST

    A property that gives me the whole PDF as a string would be enough.

  • Posted 10 August 2018, 5:46 am EST

    Hi,

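    To get the PDF content as a string, you can use the GetWholeDocumentRange method of C1PdfDocumentSource and then call GetText on the returned range:

    ```csharp
    // Load the PDF into the document source
    c1PdfDocumentSource1.LoadFromFile(openFileDialog1.FileName);
    using (var mc = new C1DXTextMeasurementContext())
    {
        // GetWholeDocumentRange takes a text measurement context
        var dr = c1PdfDocumentSource1.GetWholeDocumentRange(mc);
        // Extract the plain text of the whole document as a single string
        var s = dr.GetText();
        textBox1.Text = s;
    }
    ```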

    Hopefully this meets your requirement. If you need any further help, do let me know.
    
    Thanks,

    Ruchir

    Attachment: PdfToText.zip (https://gccontent.blob.core.windows.net/forum-uploads/file-a34a10b3-d554-49c9-8fd5-fa5ab1a14d69.zip)

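
    If the string returned by GetText keeps the PDF's line breaks (an assumption; the layout of extracted text depends on the document), another option is to take the rest of each line that contains the label. This keeps values that contain spaces and also collects every occurrence, for example if a document could list several orders. A minimal sketch; LineValueExtractor and ValuesOnLinesWithLabel are hypothetical names, not part of the ComponentOne API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the ComponentOne API.
// Assumption: each value runs from the label to the end of its line.
public static class LineValueExtractor
{
    public static List<string> ValuesOnLinesWithLabel(string text, string label)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(label)) throw new ArgumentException("Label must not be empty.", nameof(label));

        var values = new List<string>();
        // Handle both Windows (\r\n) and Unix (\n) line endings
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        foreach (string line in lines)
        {
            int index = line.IndexOf(label, StringComparison.OrdinalIgnoreCase);
            if (index == -1)
            {
                continue;
            }
            // Take the rest of the line, then strip the colon separator and surrounding whitespace
            string rest = line.Substring(index + label.Length).TrimStart(':', ' ', '\t').TrimEnd();
            if (rest.Length > 0)
            {
                values.Add(rest);
            }
        }
        return values;
    }
}
```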
  • Posted 10 August 2018, 8:06 am EST

    Thank you very much!

  • Posted 10 August 2018, 8:17 am EST

    This is what I needed! Thank you!!
