Shyli's Enterprise Search Platform Forum

Document Processing Stage - Extract Year from Date
Guest
#1 Posted : Monday, February 22, 2010 3:01:36 PM

Rank: Guest

Posts: 142
Points: 473
I have a simple question; I hope someone can give me some ideas.

In my document processing pipeline I need a stage that extracts the year (YYYY) from docdatetime (yyyy-mm-dd) or any other date field in the pipeline. I tried using a matcher and creating an extractor with a regular expression, but FAST doesn't seem to like it.

Does anyone have any ideas on how this can be done?
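For reference, the regular expression itself is straightforward; the hard part in this thread is wiring it into an ESP stage. Below is a minimal plain-Python sketch of the extraction logic only (not a FAST ESP stage or API), assuming the field value starts with a yyyy-mm-dd date that may be followed by a time part. The function name `extract_year` and the sample values are hypothetical.

```python
import re
from typing import Optional

# Four digits at the start of a yyyy-mm-dd date, captured as group 1.
# Assumption: the value begins with the date; any trailing time part is ignored.
YEAR_RE = re.compile(r"^(\d{4})-\d{2}-\d{2}")


def extract_year(value: str) -> Optional[str]:
    """Return the YYYY part of a yyyy-mm-dd value, or None if it does not match."""
    match = YEAR_RE.match(value.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical sample values, including one with a time part.
    for value in ["2010-02-22", "2010-02-22T15:01:36Z", "not a date"]:
        print(value, "->", extract_year(value))
```

Anchoring the pattern with `^` keeps it from picking up a four-digit number elsewhere in a longer string.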

Thanks,
Guest
cominvent
#2 Posted : Monday, February 22, 2010 4:44:44 PM

Rank: Advanced Member

Posts: 30
Points: 90
Location: Oslo, Norway
A regex matcher should work well; put it as early as possible in the pipeline. What exactly went wrong when you tried this approach?
Cominvent AS - www.cominvent.com
High-end FAST ESP consulting and training
Jan Høydahl - Founder & senior architect
Guest
#3 Posted : Monday, February 22, 2010 8:07:53 PM

Rank: Guest

Posts: 142
Points: 473
Hi cominvent, I started off with the ISBN extractor as an example, but I am not sure about the matcher XML file: what do attributes like master and slave mean, and where do we specify the regex? Thank you.
cominvent
#4 Posted : Tuesday, February 23, 2010 8:12:30 AM

Rank: Advanced Member

Posts: 30
Points: 90
Location: Oslo, Norway
Have you checked the Configuration Guide chapter called "Matchers"? There you will find this example:

Code:

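<configuration>
  <matcher type="pcre" debug="yes">
    <pcre global="yes" verify="yes">
      <!-- The regular expression goes in <re>. -->
      <re optimize="no">([Aa]+)([Bb]+)</re>
      <!-- $1, $2, ... in <format> refer to the capture groups of <re>. -->
      <output command="lowercase">
        <format>[$2, $1]</format>
      </output>
    </pcre>
  </matcher>
</configuration>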


Quote:
Assume that we are applying the regular expression (a+)(b+) to the string xxabbxx. If the formatting specification is foo $2 $1 bar, then the t component of the match will be foo bb a bar. If the formatting specification is $0, then the t component of the match will be abb.
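To make the quoted $0/$1/$2 semantics concrete, here is a small plain-Python emulation of a formatting specification. It is an illustration under our own assumptions, not FAST ESP's matcher implementation: `$0` is taken to be the whole match and `$N` the N-th capture group, as in the quote, and every match in the input is rendered. The function name `format_match` is hypothetical.

```python
import re
from typing import List


def format_match(pattern: str, text: str, spec: str) -> List[str]:
    """Apply `pattern` to `text` and render each match through `spec`.

    In `spec`, $0 stands for the whole match and $1, $2, ... for capture
    groups. A group that did not participate in the match renders as "".
    """
    results = []
    for match in re.finditer(pattern, text):
        # Substitute every $N in the spec with the text of group N.
        rendered = re.sub(
            r"\$(\d+)",
            lambda ref: match.group(int(ref.group(1))) or "",
            spec,
        )
        results.append(rendered)
    return results


print(format_match(r"(a+)(b+)", "xxabbxx", "foo $2 $1 bar"))  # ['foo bb a bar']
print(format_match(r"(a+)(b+)", "xxabbxx", "$0"))             # ['abb']
```

Applied to the question in this thread, a pattern such as `(\d{4})-(\d{2})-(\d{2})` with the spec `$1` yields just the year, e.g. `['2010']` for `2010-02-22`.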


If you post the XML that does not work, it is easier to help.
Guest
#5 Posted : Thursday, February 25, 2010 7:20:39 AM

Rank: Guest

Posts: 142
Points: 473
Hi Cominvent,

I have tried this, but for some reason the procserver throws an error saying that the pipeline cannot be registered. So instead I used a stringreplacer stage and specified a regular expression in its config file that replaces the unwanted part of the string with nothing (an empty string), leaving only the year. Dirty, but it works.
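The stringreplacer workaround works by deletion rather than extraction: match everything after the year and replace it with an empty string. Below is a minimal plain-Python sketch of that idea using `re.sub`. It does not reproduce the stringreplacer stage's configuration format, which the thread does not show; the pattern assumes yyyy-mm-dd values optionally followed by a time, and the name `keep_year` is hypothetical.

```python
import re

# Everything after the four-digit year: "-mm-dd" plus any trailing time part.
# Assumption: values look like yyyy-mm-dd, optionally followed by a time.
TRAILING_DATE_PARTS = re.compile(r"-\d{2}-\d{2}.*$")


def keep_year(value: str) -> str:
    """Delete the month/day (and any time) from a yyyy-mm-dd value."""
    return TRAILING_DATE_PARTS.sub("", value)


print(keep_year("2010-02-25"))           # 2010
print(keep_year("2010-02-25T07:20:39"))  # 2010
print(keep_year("no date here"))         # no date here
```

One reason this counts as "dirty": a value that does not match the pattern passes through unchanged instead of being rejected, whereas an extraction approach can return nothing for non-dates.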

Thanks for the advice,
Jon


