PDF: Copy the next text after I found a certain text

Posted by: andreas.kren on 28 July 2018, 5:23 pm EST


andreas.kren
Posted 28 July 2018, 5:23 pm EST

I found the sample that shows, in a rudimentary way, how to use the C1TextSearchManager class. I want to find a certain text within a given PDF and then copy, for example, the next 10 bytes (such as an “order number”) to use for other tasks.

Is this possible?
Ruchir.Agarwal
Posted 30 July 2018, 7:27 am EST
Hi Andreas,

Each entry in the FoundPositions collection of C1TextSearchManager has a property named NearText which, as its name suggests, contains the text in the neighborhood of the searched text. We can trim NearText to remove all characters up to the end of the searched text and then store the remainder in a variable for further use.

string newText = "";
int index = -1;
if (_textSearchManager.FoundPositions.Count > 0)
{
    // Locate the search text inside the NearText of the first match.
    index = _textSearchManager.FoundPositions[0].NearText.IndexOf(searchText, StringComparison.OrdinalIgnoreCase);
}
if (index != -1)
{
    // Keep only what follows the search text.
    newText = _textSearchManager.FoundPositions[0].NearText.Remove(0, index + searchText.Length);
}
else
{
    // Search text not found in NearText.
}

Hopefully this helps you meet your requirement.

Thanks,

Ruchir
andreas.kren
Posted 1 August 2018, 5:09 pm EST

Thank you very much. Will try it.
andreas.kren
Posted 2 August 2018, 7:00 am EST - Updated 4 October 2022, 2:22 am EST

And how can I control how long NearText is?

See:

Search for “Auftragsnummer”

I get “Auftragsnummer: WRW” (see screenshot), but I need the whole order number (WRW/21043 … ).

Ruchir.Agarwal
Posted 3 August 2018, 3:03 am EST

Hi,

Thank you for the screenshots.

With the current version, it is not possible to specify the length of NearText. However, I have forwarded this request to the relevant team for consideration [ID:336892], and it might be added in a future build if they find it feasible.

Thanks & regards,

Ruchir
andreas.kren
Posted 3 August 2018, 5:33 am EST

Is there another way to treat the PDF as a “text only” string, as it can be seen in Acrobat Reader, so that I can search for a certain text and do the necessary string operations on my own? In Acrobat Reader I can save the PDF as a text file. With this text file I could do what I want (search for the string “Auftragsnummer” and copy the text that follows).

Meanwhile I found this site: http://kenbenoit.net/how-to-batch-convert-pdf-files-to-text/

which would possibly enable me to do this.

But is there a way to do this within the C1 controls?
Ruchir.Agarwal
Posted 6 August 2018, 6:56 am EST

We are sorry, but PDFs cannot be exported to *.txt format using ComponentOne controls.

~Ruchir
andreas.kren
Posted 9 August 2018, 5:43 am EST

A property that gives me the whole PDF as a string would be enough.
Ruchir.Agarwal
Posted 10 August 2018, 5:46 am EST
Hi,

To get the PDF content as a string, you can use the GetWholeDocumentRange method of PdfDocumentSource and then call GetText on the returned range, as follows:

```
// Load the PDF selected in the file dialog.
c1PdfDocumentSource1.LoadFromFile(openFileDialog1.FileName);
using (var mc = new C1DXTextMeasurementContext())
{
    // Get a range spanning the whole document and read its text.
    var dr = c1PdfDocumentSource1.GetWholeDocumentRange(mc);
    var s = dr.GetText();
    textBox1.Text = s;
}
```

Hopefully this meets your requirement. If you need any further help, do let me know.

Thanks,

Ruchir

Attachment: PdfToText.zip (https://gccontent.blob.core.windows.net/forum-uploads/file-a34a10b3-d554-49c9-8fd5-fa5ab1a14d69.zip)
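Once the document text is available as a string, the original goal (copying the text that follows a label such as “Auftragsnummer”) comes down to plain string handling. The following is only a minimal sketch under some assumptions: the class name PdfTextHelper and the method TextAfter are hypothetical and not part of ComponentOne, and it is assumed that the label is followed by an optional colon and whitespace before the value.

```csharp
using System;

public static class PdfTextHelper
{
    // Hypothetical helper, not a ComponentOne API.
    // Returns up to maxLength characters that follow the first occurrence
    // of label in text, or null when the label is not found.
    public static string TextAfter(string text, string label, int maxLength)
    {
        if (text == null || string.IsNullOrEmpty(label) || maxLength <= 0)
        {
            return null;
        }

        int index = text.IndexOf(label, StringComparison.OrdinalIgnoreCase);
        if (index < 0)
        {
            return null;
        }

        // Start right after the label and skip a colon and any whitespace
        // (assumed separator between label and value).
        int start = index + label.Length;
        while (start < text.Length && (text[start] == ':' || char.IsWhiteSpace(text[start])))
        {
            start++;
        }

        // Never read past the end of the text.
        int length = Math.Min(maxLength, text.Length - start);
        return text.Substring(start, length);
    }
}
```

With the string s from GetText, a call such as PdfTextHelper.TextAfter(s, "Auftragsnummer", 10) would return at most the 10 characters following the label. The fixed length is a design simplification taken from the "next 10 bytes" in the question; if the order number has a variable length, cutting at the next whitespace or line break may fit better.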
andreas.kren
Posted 10 August 2018, 8:06 am EST

Thank you very much!
andreas.kren
Posted 10 August 2018, 8:17 am EST

This is what I needed! Thank you!!
