|
Home > Archive > PostgreSQL Discussion > April 2006 > Queries with Regular Expressions
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Queries with Regular Expressions
|
|
| Silas Justiniano 2006-04-06, 8:25 pm |
| Hello!
My queries work fine with Regular Expressions, as:
SELECT field FROM table WHERE field ~ 'something';
SELECT field FROM table WHERE field ~* 'something';
SELECT field FROM table WHERE field ~* 'som[i,e]thing';
And it works fine for special characters:
SELECT field FROM table WHERE field ~* 'chão';
But I just can't make it work correctly using brackets:
SELECT field FROM table WHERE field ~* 'ch[aã]o';
It just returns tuples that have 'chao', but not 'chão'.
My queries are utf-8 an the database is SQL_ASCII.
Any idea? Thank you!
--
Silas Justiniano - Brazil
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
| |
| John D. Burger 2006-04-06, 8:25 pm |
| > But I just can't make it work correctly using brackets:
> SELECT field FROM table WHERE field ~* 'ch[aã]o';
>
> It just returns tuples that have 'chao', but not 'chão'.
>
> My queries are utf-8 an the database is SQL_ASCII.
I suspect the bracketed expression is turning into [aXY], where XY is
the two-byte sequence corresponding to ã in UTF8. So the regular
expression is only going to match strings of the form chao, chXo and
chYo. To make sure that this is what's happening, try this:
select length('ã');
I bet you get back 2, not 1. I don't know if a UTF8 database will
handle this correctly or not. The safest thing to do may be to use
queries like this:
SELECT field FROM table WHERE field ~* 'ch(a|ã)o';
- John D. Burger
MITRE
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
| |
| Tom Lane 2006-04-06, 8:25 pm |
| "John D. Burger" <john@mitre.org> writes:
[color=darkred]
> I suspect the bracketed expression is turning into [aXY], where XY is
> the two-byte sequence corresponding to ã in UTF8.
That's what it looks like to me. You can hardly blame the database for
this, when you haven't clued it in that you're using UTF8.
> I don't know if a UTF8 database will handle this correctly or not.
I believe that it will work in recent PG releases (7.4 and up).
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
| |
| Silas Justiniano 2006-04-06, 8:25 pm |
| John, it worked completely fine! Thank you! I don't understand exactly
the difference between [] and () for REs, but I'm starting to study
them deeply.
Thank you very much! Pg is great!
On 4/6/06, John D. Burger <john@mitre.org> wrote:
>
> I suspect the bracketed expression is turning into [aXY], where XY is
> the two-byte sequence corresponding to ã in UTF8. So the regular
> expression is only going to match strings of the form chao, chXo and
> chYo. To make sure that this is what's happening, try this:
>
> select length('ã');
>
> I bet you get back 2, not 1. I don't know if a UTF8 database will
> handle this correctly or not. The safest thing to do may be to use
> queries like this:
>
> SELECT field FROM table WHERE field ~* 'ch(a|ã)o';
>
> - John D. Burger
> MITRE
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
--
Silas Justiniano - Brazil
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
|
|
|
|
|