Necesito un gurú SQL que me ayude a acelerar mi consulta.La consulta MySQL con subconsulta dependiente lleva demasiado tiempo
Tengo 2 tablas, cantidades y precios. cantidades registra un valor de cantidad entre 2 marcas de tiempo, con 15 minutos de diferencia. los precios registran un precio para una marca de tiempo dada, para un tipo de precio dado y hay un registro de precio 5 por cada 5 minutos.
Necesito 2 calcular el precio total de cada período, p. hora o día, entre dos marcas de tiempo. Esto se calcula por la suma de (cantidad multiplicada por el promedio de los 3 precios en la ventana de cantidad de 15 minutos) en cada período.
Por ejemplo, supongamos que quiero ver el precio total cada hora durante 1 día. El valor del precio total en cada fila en el conjunto de resultados es la suma de los precios totales para cada uno de los cuatro períodos de 15 minutos en esa hora. Y el precio total para cada período de 15 minutos se calcula multiplicando el valor de la cantidad en ese período por el promedio de los 3 precios (uno por cada 5 minutos) en el período de esa cantidad.
Esto es la consulta que estoy usando, y los resultados:
SELECT
MIN(`quantities`.`start_timestamp`) AS `start`,
MAX(`quantities`.`end_timestamp`) AS `end`,
SUM(`quantities`.`quantity` * (
SELECT AVG(`prices`.`price`)
FROM `prices`
WHERE `prices`.`timestamp` >= `quantities`.`start_timestamp`
AND `prices`.`timestamp` < `quantities`.`end_timestamp`
AND `prices`.`type_id` = 1
)) AS total
FROM `quantities`
WHERE `quantities`.`start_timestamp` >= '2010-07-01 00:00:00'
AND `quantities`.`start_timestamp` < '2010-07-02 00:00:00'
GROUP BY HOUR( `quantities`.`start_timestamp`);
+---------------------+---------------------+----------+
| start | end | total |
+---------------------+---------------------+----------+
| 2010-07-01 00:00:00 | 2010-07-01 01:00:00 | 0.677733 |
| 2010-07-01 01:00:00 | 2010-07-01 02:00:00 | 0.749133 |
| 2010-07-01 02:00:00 | 2010-07-01 03:00:00 | 0.835467 |
| 2010-07-01 03:00:00 | 2010-07-01 04:00:00 | 0.692233 |
| 2010-07-01 04:00:00 | 2010-07-01 05:00:00 | 0.389533 |
| 2010-07-01 05:00:00 | 2010-07-01 06:00:00 | 0.335300 |
| 2010-07-01 06:00:00 | 2010-07-01 07:00:00 | 1.231467 |
| 2010-07-01 07:00:00 | 2010-07-01 08:00:00 | 0.352800 |
| 2010-07-01 08:00:00 | 2010-07-01 09:00:00 | 1.447200 |
| 2010-07-01 09:00:00 | 2010-07-01 10:00:00 | 0.756733 |
| 2010-07-01 10:00:00 | 2010-07-01 11:00:00 | 0.599467 |
| 2010-07-01 11:00:00 | 2010-07-01 12:00:00 | 1.056467 |
| 2010-07-01 12:00:00 | 2010-07-01 13:00:00 | 1.252600 |
| 2010-07-01 13:00:00 | 2010-07-01 14:00:00 | 1.285567 |
| 2010-07-01 14:00:00 | 2010-07-01 15:00:00 | 0.442933 |
| 2010-07-01 15:00:00 | 2010-07-01 16:00:00 | 0.692567 |
| 2010-07-01 16:00:00 | 2010-07-01 17:00:00 | 1.281067 |
| 2010-07-01 17:00:00 | 2010-07-01 18:00:00 | 0.652033 |
| 2010-07-01 18:00:00 | 2010-07-01 19:00:00 | 1.721900 |
| 2010-07-01 19:00:00 | 2010-07-01 20:00:00 | 1.362400 |
| 2010-07-01 20:00:00 | 2010-07-01 21:00:00 | 1.099300 |
| 2010-07-01 21:00:00 | 2010-07-01 22:00:00 | 0.646267 |
| 2010-07-01 22:00:00 | 2010-07-01 23:00:00 | 0.873100 |
| 2010-07-01 23:00:00 | 2010-07-02 00:00:00 | 0.546533 |
+---------------------+---------------------+----------+
24 rows in set (5.16 sec)
necesito la consulta que correr mucho más rápido que esto, y no tendría sin embargo sería posible. He aquí los resultados de EXPLAIN EXTENDIDO ...
+----+--------------------+------------+-------+-------------------+-----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+-------+-------------------+-----------------+---------+-------+-------+----------------------------------------------+
| 1 | PRIMARY | quantities | range | start_timestamp | start_timestamp | 8 | NULL | 89 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | prices | ref | timestamp,type_id | type_id | 4 | const | 22930 | Using where |
+----+--------------------+------------+-------+-------------------+-----------------+---------+-------+-------+----------------------------------------------+
2 rows in set, 3 warnings (0.00 sec)
me di cuenta de la sub consulta dependiente no está utilizando el campo de marca de tiempo en la clave y la consulta está escaneando un montón de filas.
¿Alguien puede ayudarme a hacer que esto funcione mucho más rápido?
Estas son las sentencias SQL necesarias para crear el esquema y llenarlo con una gran cantidad de datos (2) meses de valor
# Create prices table
CREATE TABLE `prices` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` datetime NOT NULL,
`type_id` int(11) NOT NULL,
`price` float(8,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `timestamp` (`timestamp`),
KEY `type_id` (`type_id`)
) ENGINE=MyISAM;
# Create quantities table
CREATE TABLE `quantities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`start_timestamp` datetime NOT NULL,
`end_timestamp` datetime NOT NULL,
`quantity` float(7,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `start_timestamp` (`start_timestamp`),
KEY `end_timestamp` (`end_timestamp`)
) ENGINE=MyISAM;
# Insert first 2 rows into prices, one for each of 2 types, starting 64 days ago
INSERT INTO `prices` (`id`, `timestamp`, `type_id`, `price`) VALUES
(NULL, DATE_SUB(CURDATE(), INTERVAL 64 DAY), '1', RAND()),
(NULL, DATE_SUB(CURDATE(), INTERVAL 64 DAY), '2', RAND());
# Fill the prices table with a record for each type, for every 5 minutes, for the next 64 days
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 32 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 16 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 8 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 4 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 2 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 1 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 12 HOUR), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 6 HOUR), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 3 HOUR), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 90 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 45 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 20 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 10 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 5 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_SUB(`timestamp`, INTERVAL 5 MINUTE), `type_id`, RAND() FROM prices WHERE MOD((TIME_TO_SEC(`timestamp`) - TIME_TO_SEC(CONCAT(DATE_SUB(CURDATE(), INTERVAL 64 DAY), ' 00:00:00'))), 45 *60) = 0 AND `timestamp` > CONCAT(DATE_SUB(CURDATE(), INTERVAL 64 DAY), ' 00:00:00');
# Insert first row into quantities, start timestamp is 64 days ago, end timestamp is start timestamp plus 15 minutes
INSERT INTO `quantities` (`id`, `start_timestamp`, `end_timestamp`, `quantity`) VALUES (NULL, DATE_SUB(CURDATE(), INTERVAL 64 DAY), DATE_SUB(CURDATE(), INTERVAL '63 23:45' DAY_MINUTE), RAND());
# Fill the quantities table with a record for each 15 minute period for the next 64 days
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 32 DAY), DATE_ADD(`end_timestamp`, INTERVAL 32 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 16 DAY), DATE_ADD(`end_timestamp`, INTERVAL 16 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 8 DAY), DATE_ADD(`end_timestamp`, INTERVAL 8 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 4 DAY), DATE_ADD(`end_timestamp`, INTERVAL 4 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 2 DAY), DATE_ADD(`end_timestamp`, INTERVAL 2 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 1 DAY), DATE_ADD(`end_timestamp`, INTERVAL 1 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 12 HOUR), DATE_ADD(`end_timestamp`, INTERVAL 12 HOUR), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 6 HOUR), DATE_ADD(`end_timestamp`, INTERVAL 6 HOUR), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 3 HOUR), DATE_ADD(`end_timestamp`, INTERVAL 3 HOUR), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 90 MINUTE), DATE_ADD(`end_timestamp`, INTERVAL 90 MINUTE), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 45 MINUTE), DATE_ADD(`end_timestamp`, INTERVAL 45 MINUTE), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 15 MINUTE), DATE_ADD(`end_timestamp`, INTERVAL 15 MINUTE), RAND() FROM quantities;
INSERT INTO quantities (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_SUB(`start_timestamp`, INTERVAL 15 MINUTE), DATE_SUB(`end_timestamp`, INTERVAL 15 MINUTE), RAND() FROM quantities WHERE MOD((TIME_TO_SEC(`start_timestamp`) - TIME_TO_SEC(CONCAT(DATE_SUB(CURDATE(), INTERVAL 64 DAY), ' 00:00:00'))), 45 * 60) = 0 AND `start_timestamp` > CONCAT(DATE_SUB(CURDATE(), INTERVAL 64 DAY), ' 00:00:00');
+1 para DDL y datos de muestra – Unreason
La única forma de mejorar el rendimiento es volver a escribir la subconsulta como JOIN. – Naktibalda
Creo que eso es lo que JochenJung también recomendó a continuación, pero estamos teniendo problemas para que genere los resultados correctos y todavía nos está tomando el mismo tiempo. ¿Tienes alguna idea Naktibalda? – neilcrookes