Home > Archive > SQL Server Full-Text Search > November 2005 > FT Index on column with multiple language doesn't work - any ideas









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author FT Index on column with multiple language doesn't work - any ideas
Andry

2005-11-21, 8:24 pm

Hi all,

I have a table on SQL Server 2005 with a column (collation
SQL_Latin1_General_C
P1_CI_AS) that can contain multiple languages. Some of
the possible languages include English, German, French, Kazakh, Arabic,
Chinese (Simplified and Traditional), Japanese, Korean, Thai, etc.

I tried querying for documents that contains 台灣 (meaning "Taiwan" in
Simplified Chinese), but it is not working as I expected.

I ran the following query:

select * from tblDocuments
where CONTAINS(DocumentTit
le, '台灣', LANGUAGE 'Traditional Chinese')

With the English language breaker, I get 0 results.
With the Traditional Chinese language breaker, I get 1 result and it is the
document I was looking for.

I would've expected the LANGUAGE keyword to ignore the default language
breaker and use whatever is specified, but apparently I'm missing
something.... maybe table setup or whatnot.

Any ideas would be greatly appreaciated.

Cheers,
Andry
Hilary Cotter

2005-11-22, 3:24 am

This is correct behavior. When the document is indexed and you are
specifying the English word breaker it interprets the contents of the row to
be English words, unless you are using the xml language tags and specify
that the XML doc is in Chinese and that you are using the XML data type, or
you are using Word and marked that passage containing the character as
Chinese, or the language settings for your Word document are Chinese, or you
are using html and have set the ms.locale to Chinese and set the correct
code page. With Word Docs and HTML docs you must save them in the image or
varbinary data type columns and use a document type column with the
extension the document would have if it was stored in the file system.

If you don't follow the above the Chinese character will be interpreted as a
Unicode sequence which will not be interpreted correctly in your queries.

--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

"Andry" <Andry@discussions.microsoft.com> wrote in message
news:569E9782-47E3-4657-AC68- 7787214D3D97@microso
ft.com...
> Hi all,
>
> I have a table on SQL Server 2005 with a column (collation
> SQL_Latin1_General_C
P1_CI_AS) that can contain multiple languages. Some of
> the possible languages include English, German, French, Kazakh, Arabic,
> Chinese (Simplified and Traditional), Japanese, Korean, Thai, etc.
>
> I tried querying for documents that contains ?? (meaning "Taiwan" in
> Simplified Chinese), but it is not working as I expected.
>
> I ran the following query:
>
> select * from tblDocuments
> where CONTAINS(DocumentTit
le, '??', LANGUAGE 'Traditional Chinese')
>
> With the English language breaker, I get 0 results.
> With the Traditional Chinese language breaker, I get 1 result and it is
> the
> document I was looking for.
>
> I would've expected the LANGUAGE keyword to ignore the default language
> breaker and use whatever is specified, but apparently I'm missing
> something.... maybe table setup or whatnot.
>
> Any ideas would be greatly appreaciated.
>
> Cheers,
> Andry



Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com