Author tsearch2 for alphabetic character strings & codes
Ron Mayer

2005-09-23, 8:23 pm


I'm looking for a way to search for substrings within
documents in a way very similar to tsearch2, but my strings
are not alphabetical codes, so I'm having a tough time
trying to use the current tsearch2 configurations with them.

For example, using tsearch to search for codes like
'31.03(e)(2)(A)'
in a set of documents is tricky because tsearch seems
to treat most of the punctuation as word separators.

fli=# select
fli-# to_tsvector('default','31.03(e)(2)(A)'),
fli-# to_tsvector('simple','31.03(e)(2)(A)');

      to_tsvector      |         to_tsvector
-----------------------+-----------------------------
 '2':3 'e':2 '31.03':1 | '2':3 'a':4 'e':2 '31.03':1
(1 row)


I see that tsearch2 allows different "configurations"
that apparently differ in how they parse strings.

I guess what I'm looking for is a "configuration"
that's even simpler-than-simple, and only breaks
up strings on whitespace and doesn't use any natural
language dictionaries. I was hoping I could download
or define such a configuration; but didn't see any
obvious documentation on how to set up my own
configuration.

Does this sound like a good approach (and if so, could
someone please point me in the right direction), or
are there other things I should be looking at?

Ron

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Oleg Bartunov

2005-09-24, 3:23 am

Ron,

you probably need to write a custom parser. tsearch2 supports
different parsers.

Oleg

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Andrew J. Kopciuch

2005-09-24, 7:23 am

On Saturday 24 September 2005 00:09, Oleg Bartunov wrote:
> Ron,
>
> you probably need to write a custom parser. tsearch2 supports
> different parsers.
>


To expand somewhat on what Oleg mentioned, you can find a howto on writing a
custom parser here:

http://www.sai.msu.su/~megera/postg...r-tsearch2.html

This example might be exactly what you are looking for. I did not look into it
too closely myself, but it appears to just split on whitespace.
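
For illustration only, here is a minimal sketch of the tokenization a
whitespace-only parser would be expected to produce. It is plain Python, not
the tsearch2 parser API, and the function name whitespace_tsvector is
hypothetical. Lowercasing the tokens is an assumption, based on the 'simple'
output earlier in the thread where 'A' came back as 'a'.

```python
from collections import defaultdict
from typing import Dict, List


def whitespace_tsvector(text: str) -> Dict[str, List[int]]:
    """Map each whitespace-delimited token to its 1-based positions.

    Hypothetical helper: it imitates the shape of a tsvector, not the
    behaviour of any real tsearch2 parser or dictionary.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    # str.split() with no argument splits on runs of whitespace only,
    # so punctuation such as '.', '(' and ')' stays inside the token.
    for pos, token in enumerate(text.split(), start=1):
        # Assumption: fold case, as the 'simple' configuration appears to do.
        positions[token.lower()].append(pos)
    return dict(positions)


if __name__ == "__main__":
    print(whitespace_tsvector("See 31.03(e)(2)(A) and 31.03(e)(2)(B)"))
```

With this kind of splitting, a code such as '31.03(e)(2)(A)' is kept as a
single token instead of being broken into '31.03', 'e', '2' and 'a'.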

There is lots of documentation, examples, help, and other goodies for tsearch2
here:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/


HTH,


Andy

