solución aproximada pero rápida sería la de la muestra 4 puntos representativos de los datos y resolver la ecuación polinomial para estos puntos.
En cuanto a la muestreo, se puede dividir los datos en sectores iguales y calcular la media de X e Y para cada sector - la división se puede hacer usando los cuartiles de los valores de x, promedios de X- valores, min(x)+(max(x)-min(x))/4
o lo que sea que crea que es el más apropiado.
Para ilustrar la toma de muestras por cuartiles (es decir, por números de fila):
En cuanto a la solución de, utilicé numberempire.com para resolver estas ecuaciones * para las variables k,a,b,c
:
k + a*X1 + b*X1^2 + c*X1^3 - Y1 = 0,
k + a*X2 + b*X2^2 + c*X2^3 - Y2 = 0,
k + a*X3 + b*X3^2 + c*X3^3 - Y3 = 0,
k + a*X4 + b*X4^2 + c*X4^3 - Y4 = 0
* Dado que Y(X) = 0 + ax bx^2 + cx^3 + ϵ
incluye implícitamente el punto [0, 0] como uno de los puntos de muestra, crearía malas aproximaciones para los conjuntos de datos que no incluyen [0, 0]. Me tomé la libertad de resolver Y(X) = k + ax bx^2 + cx^3 + ϵ
en su lugar.
El SQL real sería algo así:
select
-- returns 1 row with columns labeled K, A, B and C = coefficients in 3rd order polynomial equation for the 4 sample points
-(X1*(X2p2*(X3p3*Y4-X4p3*Y3)+X2p3*(X4p2*Y3-X3p2*Y4)+(X3p2*X4p3-X3p3*X4p2)*Y2)+X1p2*(X2*(X4p3*Y3-X3p3*Y4)+X2p3*(X3*Y4-X4*Y3)+(X3p3*X4-X3*X4p3)*Y2)+X1p3*(X2*(X3p2*Y4-X4p2*Y3)+X2p2*(X4*Y3-X3*Y4)+(X3*X4p2-X3p2*X4)*Y2)+(X2*(X3p3*X4p2-X3p2*X4p3)+X2p2*(X3*X4p3-X3p3*X4)+X2p3*(X3p2*X4-X3*X4p2))*Y1)/(X1*(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))+X2*(X3p2*X4p3-X3p3*X4p2)+X1p2*(X3*X4p3+X2*(X3p3-X4p3)+X2p3*(X4-X3)-X3p3*X4)+X2p2*(X3p3*X4-X3*X4p3)+X1p3*(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))+X2p3*(X3*X4p2-X3p2*X4)) as k,
(X1p2*(X2p3*(Y4-Y3)-X3p3*Y4+X4p3*Y3+(X3p3-X4p3)*Y2)+X2p2*(X3p3*Y4-X4p3*Y3)+X1p3*(X3p2*Y4+X2p2*(Y3-Y4)-X4p2*Y3+(X4p2-X3p2)*Y2)+X2p3*(X4p2*Y3-X3p2*Y4)+(X3p2*X4p3-X3p3*X4p2)*Y2+(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))*Y1)/(X1*(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))+X2*(X3p2*X4p3-X3p3*X4p2)+X1p2*(X3*X4p3+X2*(X3p3-X4p3)+X2p3*(X4-X3)-X3p3*X4)+X2p2*(X3p3*X4-X3*X4p3)+X1p3*(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))+X2p3*(X3*X4p2-X3p2*X4)) as a,
-(X1*(X2p3*(Y4-Y3)-X3p3*Y4+X4p3*Y3+(X3p3-X4p3)*Y2)+X2*(X3p3*Y4-X4p3*Y3)+X1p3*(X3*Y4+X2*(Y3-Y4)-X4*Y3+(X4-X3)*Y2)+X2p3*(X4*Y3-X3*Y4)+(X3*X4p3-X3p3*X4)*Y2+(X2*(X4p3-X3p3)-X3*X4p3+X3p3*X4+X2p3*(X3-X4))*Y1)/(X1*(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))+X2*(X3p2*X4p3-X3p3*X4p2)+X1p2*(X3*X4p3+X2*(X3p3-X4p3)+X2p3*(X4-X3)-X3p3*X4)+X2p2*(X3p3*X4-X3*X4p3)+X1p3*(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))+X2p3*(X3*X4p2-X3p2*X4)) as b,
(X1*(X2p2*(Y4-Y3)-X3p2*Y4+X4p2*Y3+(X3p2-X4p2)*Y2)+X2*(X3p2*Y4-X4p2*Y3)+X1p2*(X3*Y4+X2*(Y3-Y4)-X4*Y3+(X4-X3)*Y2)+X2p2*(X4*Y3-X3*Y4)+(X3*X4p2-X3p2*X4)*Y2+(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))*Y1)/(X1*(X2p2*(X4p3-X3p3)-X3p2*X4p3+X3p3*X4p2+X2p3*(X3p2-X4p2))+X2*(X3p2*X4p3-X3p3*X4p2)+X1p2*(X3*X4p3+X2*(X3p3-X4p3)+X2p3*(X4-X3)-X3p3*X4)+X2p2*(X3p3*X4-X3*X4p3)+X1p3*(X2*(X4p2-X3p2)-X3*X4p2+X3p2*X4+X2p2*(X3-X4))+X2p3*(X3*X4p2-X3p2*X4)) as c
from (select
samples.*,
-- precomputing the powers should give better performance (at least i hope it would)
power(X1,2) X1p2, power(X2,2) X2p2, power(X3,2) X3p2, power(X4,2) X4p2,
power(Y1,3) Y1p3, power(Y2,3) Y2p3, power(Y3,3) Y3p3, power(Y4,3) Y4p3
from (select
avg(case when sector = 1 then x end) X1,
avg(case when sector = 2 then x end) X2,
avg(case when sector = 3 then x end) X3,
avg(case when sector = 4 then x end) X4,
avg(case when sector = 1 then y end) Y1,
avg(case when sector = 2 then y end) Y2,
avg(case when sector = 3 then y end) Y3,
avg(case when sector = 4 then y end) Y4
from (select x, y,
-- splitting to sectors 1 - 4 by row number (SQL Server version)
ceiling(row_number() OVER (ORDER BY x asc)/count(*) * 4) sector
from original_data
)
) samples
)
Según developer.mimer.com, estas características opcionales deben estar activados en SQL Server:
T611, "Elementary OLAP operations"
F591, "Derived tables"
Sería útil, si pudiera pegar la fórmula para la estimación de a, byc, porque, no todo el mundo aquí es bueno con las matemáticas ... –
Si es tarea, etiquétela. –
agregó la etiqueta [math] – zgpmax