Home > Archive > MySQL ODBC Connector > February 2006 > Different result with subquery









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Different result with subquery
neroux

2006-02-21, 1:23 pm


Hello,

I am having a table with an integer column which I use for probability
calculations (the higher a value the more often it is chosen).

Now I am having the following query, which should actually incorporate
these probabilities, however it seems to prefer values from the middle
range

SELECT * FROM table WHERE field>=(SELECT RAND()*MAX(field) FROM
table) ORDER BY field LIMIT 1

If I split it up it still seems to prefer mid-range values over values
closer to the higher end, however it doesnt "completely" ignore the
highest value anymore.

random_value = SELECT RAND()*MAX(field) FROM table;
SELECT * FROM table WHERE field>=random_value ORDER BY field LIMIT 1

Does anyone have an explanation for that or a better solution? Thanks

____________________
____________________
__________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw

Peter Brawley

2006-02-25, 9:43 am

Neroux,

RAND() gives a roughly rectangular distribution, not a normal
distribution, so samples of fewer than 100 or so values from it are
likely to show large but statistically _insignificant_ differences. In
larger samples, I don't see any tendency of RAND() to produce more
values near its mean (.5) than near its limits (0,1). You can check this
yourself by populating a table with RAND() values then running a simple
crosstab query eg

SELECT
SUM(IF(r<0.1,1,0)) AS P1,
SUM(IF(r>=0.1 AND r<0.2,1,0)) AS P2,
SUM(IF(r>=0.2 AND r<0.3,1,0)) AS P3,
SUM(IF(r>=0.3 AND r<0.4,1,0)) AS P4,
SUM(IF(r>=0.4 AND r<0.5,1,0)) AS P5,
SUM(IF(r>=0.5 AND r<0.6,1,0)) AS P6,
SUM(IF(r>=0.6 AND r<0.7,1,0)) AS P7,
SUM(IF(r>=0.7 AND r<0.8,1,0)) AS P8,
SUM(IF(r>=0.8 AND r<0.9,1,0)) AS P9,
SUM(IF(r>=0.9 AND r<1.0,1,0)) AS P10
FROM randresults;

PB

-----

neroux wrote:
> Hello,
>
> I am having a table with an integer column which I use for probability
> calculations (the higher a value the more often it is chosen).
>
> Now I am having the following query, which should actually incorporate
> these probabilities, however it seems to prefer values from the middle
> range
>
> SELECT * FROM table WHERE field>=(SELECT RAND()*MAX(field) FROM
> table) ORDER BY field LIMIT 1
>
> If I split it up it still seems to prefer mid-range values over values
> closer to the higher end, however it doesnt "completely" ignore the
> highest value anymore.
>
> random_value = SELECT RAND()*MAX(field) FROM table;
> SELECT * FROM table WHERE field>=random_value ORDER BY field LIMIT 1
>
> Does anyone have an explanation for that or a better solution? Thanks
>
> ____________________
____________________
__________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
>



--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.375 / Virus Database: 267.15.11/264 - Release Date: 2/17/2006


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? unsub...sie.nctu.edu.tw

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2009 droptable.com