Hello Jason,
Thanks for the feedback.
Perhaps you could explain the main ways you see people using it (ie key use cases)?
This is a little bit like I've just invented the wheel and I'm waiting for someone to invent the motor car ;-)
I suppose the main motivation and use case is as an alternative to regex: the syntax for creating complex multiple searches is straightforward XML.
One use case as I see it is processing multiple documents, as in, too many to process manually.
Let's say you have documents that require localization, or legal documents, where you know there will be predefined patterns that need some
mechanism for identifying segments, but you don't know the exact phrase in advance.
As in the example I gave, to find content within brackets, "(Hello World)" and "(Goodbye All)"
OR
"21/06/72"
You only need to know part of the match, not all of it.
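For comparison, the two searches above would look roughly like this as regular expressions in Python. This is only a sketch of the regex equivalent, not the XML syntax of the tool; the names `BRACKETED`, `SHORT_DATE`, `find_bracketed` and `find_dates` are made up for illustration, and the date pattern assumes a two-digit dd/mm/yy layout.

```python
import re

# Anything between a pair of round brackets (no nested brackets).
BRACKETED = re.compile(r"\(([^()]*)\)")

# Two digits, slash, two digits, slash, two digits, e.g. 21/06/72.
# Assumption: only the layout is checked, not whether the date is valid.
SHORT_DATE = re.compile(r"\b\d{2}/\d{2}/\d{2}\b")


def find_bracketed(text: str) -> list[str]:
    """Return the text found inside each pair of round brackets."""
    return BRACKETED.findall(text)


def find_dates(text: str) -> list[str]:
    """Return every dd/mm/yy style sequence in the text."""
    return SHORT_DATE.findall(text)


if __name__ == "__main__":
    sample = "Say (Hello World) then (Goodbye All) on 21/06/72."
    print(find_bracketed(sample))
    print(find_dates(sample))
```

In both cases only the fixed part of the pattern (the brackets, the slashes) is known in advance; the content in between is whatever the document happens to contain.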
For example, you could tag certain sequences that DON'T require localization: people's names, place names, scientific or mathematical formulae.
Likewise, you could tag certain sequences that DO require localization: date and time formats.
From a quick look at the code, you're conducting the search on the main document part marshalled to an XML string.
But somewhere I guess you're discarding the OpenXML tags?
So the user can search for just document text, or OpenXML tags (eg w:p), or some hybrid (eg "p>A continent") - not sure at what point the tags are getting discarded.
When the search is complete, we have text (not OpenXML), plus your tags?
At present I am only searching the plain text; I'm not looking at the OpenXML. This is something that I intend to address if the interest is there.
At the end of the current process, in addition to the marked-up plain text, you have access to a structure containing the indexes of all the found sequences.
So with the indexes you can manipulate the document yourself.
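As a rough idea of what "manipulate the document yourself" could mean, here is a small Python sketch. The `Match` record and `rewrite_matches` function are hypothetical and are not the tool's actual structure; the sketch assumes each found sequence is reported as a start index, an end index (exclusive) and a label, and that matches do not overlap.

```python
from typing import Callable, Iterable, NamedTuple


class Match(NamedTuple):
    """Hypothetical record for one found sequence."""
    start: int   # index of the first character of the sequence
    end: int     # index one past the last character
    label: str   # name of the search that produced the hit


def rewrite_matches(
    text: str,
    matches: Iterable[Match],
    transform: Callable[[Match, str], str],
) -> str:
    """Replace each matched span with transform(match, matched_text).

    Spans are processed from the end of the text backwards, so earlier
    indexes stay valid even when a replacement changes the length.
    """
    result = text
    for m in sorted(matches, key=lambda m: m.start, reverse=True):
        result = result[:m.start] + transform(m, result[m.start:m.end]) + result[m.end:]
    return result


if __name__ == "__main__":
    text = "Meet (Hello World) on 21/06/72"
    found = [Match(5, 18, "bracket"), Match(22, 30, "date")]
    print(rewrite_matches(text, found, lambda m, s: f"<{m.label}>{s}</{m.label}>"))
```

Working backwards through the indexes is the main design point: it avoids having to recompute offsets after each replacement.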
Looking forward to your response.
Derek.