2012-07-16

I have a query generated by the SQLAlchemy ORM. It is supposed to retrieve the stream_items for a specific course, along with all their parts (resources, content text blocks, etc.) and the users who posted them. However, this query seems to be extremely slow. It takes minutes on our production database, which has about 20,000 users, about 25 stream_items for the course, and a couple of content text blocks per stream_item. Note that there are very few records other than users in the database, because we imported a batch of users but very little content. How can I optimize this query produced by SQLAlchemy?

Edit: Note that every object id is a foreign key to the franklin_object table.

I have tried looking at the query and identified several worrying bits (from the EXPLAIN output):

  1. One of the lookups shows 'Using temporary; Using filesort'.
  2. The user table is hit twice with no index.
  3. The content text block table is hit twice with no index.

However, I don't really know what to do about them, especially the last two issues.

This is the query:

SELECT stream_item.id        AS stream_item_id, 
     franklin_object.id       AS franklin_object_id, 
     franklin_object.type       AS franklin_object_type, 
     franklin_object.uuid       AS franklin_object_uuid, 
     stream_item.parent_id      AS stream_item_parent_id, 
     stream_item.shown_at       AS stream_item_shown_at, 
     stream_item.author_id      AS stream_item_author_id, 
     stream_item.stream_sort_at     AS stream_item_stream_sort_at, 
     stream_item.PUBLIC       AS stream_item_public, 
     stream_item.created_at      AS stream_item_created_at, 
     stream_item.updated_at      AS stream_item_updated_at, 
     anon_1.content_text_block_text    AS anon_1_content_text_block_text, 
     anon_2.resource_id       AS anon_2_resource_id, 
     anon_2.franklin_object_id     AS anon_2_franklin_object_id, 
     anon_2.franklin_object_type     AS anon_2_franklin_object_type, 
     anon_2.franklin_object_uuid     AS anon_2_franklin_object_uuid, 
     anon_2.resource_top_parent_resource   AS anon_2_resource_top_parent_resource, 
     anon_2.resource_top_parent_id    AS anon_2_resource_top_parent_id, 
     anon_2.resource_title      AS anon_2_resource_title, 
     anon_2.resource_url       AS anon_2_resource_url, 
     anon_2.resource_image      AS anon_2_resource_image, 
     anon_2.resource_created_at     AS anon_2_resource_created_at, 
     anon_2.resource_updated_at     AS anon_2_resource_updated_at, 
     franklin_object_1.id       AS franklin_object_1_id, 
     franklin_object_1.type      AS franklin_object_1_type, 
     franklin_object_1.uuid      AS franklin_object_1_uuid, 
     anon_1.content_text_block_id     AS anon_1_content_text_block_id, 
     anon_1.franklin_object_id     AS anon_1_franklin_object_id, 
     anon_1.franklin_object_type     AS anon_1_franklin_object_type, 
     anon_1.franklin_object_uuid     AS anon_1_franklin_object_uuid, 
     anon_1.content_text_block_position   AS anon_1_content_text_block_position, 
     anon_1.content_text_block_franklin_object_id AS anon_1_content_text_block_franklin_object_id, 
     anon_1.content_text_block_created_at   AS anon_1_content_text_block_created_at, 
     anon_1.content_text_block_updated_at   AS anon_1_content_text_block_updated_at, 
     anon_3.user_password       AS anon_3_user_password, 
     anon_3.user_auth_token      AS anon_3_user_auth_token, 
     anon_3.user_id        AS anon_3_user_id, 
     anon_3.franklin_object_id     AS anon_3_franklin_object_id, 
     anon_3.franklin_object_type     AS anon_3_franklin_object_type, 
     anon_3.franklin_object_uuid     AS anon_3_franklin_object_uuid, 
     anon_3.user_email       AS anon_3_user_email, 
     anon_3.user_auth_token_expiration   AS anon_3_user_auth_token_expiration, 
     anon_3.user_active       AS anon_3_user_active, 
     anon_3.user_activation_token     AS anon_3_user_activation_token, 
     anon_3.user_first_name      AS anon_3_user_first_name, 
     anon_3.user_last_name      AS anon_3_user_last_name, 
     anon_3.user_image       AS anon_3_user_image, 
     anon_3.user_bio        AS anon_3_user_bio, 
     anon_3.user_aspirations      AS anon_3_user_aspirations, 
     anon_3.user_website       AS anon_3_user_website, 
     anon_3.user_resume       AS anon_3_user_resume, 
     anon_3.user_resume_name      AS anon_3_user_resume_name, 
     anon_3.user_primary_role      AS anon_3_user_primary_role, 
     anon_3.user_institution_id     AS anon_3_user_institution_id, 
     anon_3.user_birth_date      AS anon_3_user_birth_date, 
     anon_3.user_gender       AS anon_3_user_gender, 
     anon_3.user_graduation_year     AS anon_3_user_graduation_year, 
     anon_3.user_complete       AS anon_3_user_complete, 
     anon_3.user_masthead_y_position    AS anon_3_user_masthead_y_position, 
     anon_3.user_masthead       AS anon_3_user_masthead, 
     anon_3.user_fb_access_token     AS anon_3_user_fb_access_token, 
     anon_3.user_fb_user_id      AS anon_3_user_fb_user_id, 
     anon_3.user_location       AS anon_3_user_location, 
     anon_3.user_created_at      AS anon_3_user_created_at, 
     anon_3.user_updated_at      AS anon_3_user_updated_at, 
     anon_4.content_text_block_text    AS anon_4_content_text_block_text, 
     anon_4.content_text_block_id     AS anon_4_content_text_block_id, 
     anon_4.franklin_object_id     AS anon_4_franklin_object_id, 
     anon_4.franklin_object_type     AS anon_4_franklin_object_type, 
     anon_4.franklin_object_uuid     AS anon_4_franklin_object_uuid, 
     anon_4.content_text_block_position   AS anon_4_content_text_block_position, 
     anon_4.content_text_block_franklin_object_id AS anon_4_content_text_block_franklin_object_id, 
     anon_4.content_text_block_created_at   AS anon_4_content_text_block_created_at, 
     anon_4.content_text_block_updated_at   AS anon_4_content_text_block_updated_at, 
     anon_5.user_password       AS anon_5_user_password, 
     anon_5.user_auth_token      AS anon_5_user_auth_token, 
     anon_5.user_id        AS anon_5_user_id, 
     anon_5.franklin_object_id     AS anon_5_franklin_object_id, 
     anon_5.franklin_object_type     AS anon_5_franklin_object_type, 
     anon_5.franklin_object_uuid     AS anon_5_franklin_object_uuid, 
     anon_5.user_email       AS anon_5_user_email, 
     anon_5.user_auth_token_expiration   AS anon_5_user_auth_token_expiration, 
     anon_5.user_active       AS anon_5_user_active, 
     anon_5.user_activation_token     AS anon_5_user_activation_token, 
     anon_5.user_first_name      AS anon_5_user_first_name, 
     anon_5.user_last_name      AS anon_5_user_last_name, 
     anon_5.user_image       AS anon_5_user_image, 
     anon_5.user_bio        AS anon_5_user_bio, 
     anon_5.user_aspirations      AS anon_5_user_aspirations, 
     anon_5.user_website       AS anon_5_user_website, 
     anon_5.user_resume       AS anon_5_user_resume, 
     anon_5.user_resume_name      AS anon_5_user_resume_name, 
     anon_5.user_primary_role      AS anon_5_user_primary_role, 
     anon_5.user_institution_id     AS anon_5_user_institution_id, 
     anon_5.user_birth_date      AS anon_5_user_birth_date, 
     anon_5.user_gender       AS anon_5_user_gender, 
     anon_5.user_graduation_year     AS anon_5_user_graduation_year, 
     anon_5.user_complete       AS anon_5_user_complete, 
     anon_5.user_masthead_y_position    AS anon_5_user_masthead_y_position, 
     anon_5.user_masthead       AS anon_5_user_masthead, 
     anon_5.user_fb_access_token     AS anon_5_user_fb_access_token, 
     anon_5.user_fb_user_id      AS anon_5_user_fb_user_id, 
     anon_5.user_location       AS anon_5_user_location, 
     anon_5.user_created_at      AS anon_5_user_created_at, 
     anon_5.user_updated_at      AS anon_5_user_updated_at, 
     anon_6.stream_item_id      AS anon_6_stream_item_id, 
     anon_6.franklin_object_id     AS anon_6_franklin_object_id, 
     anon_6.franklin_object_type     AS anon_6_franklin_object_type, 
     anon_6.franklin_object_uuid     AS anon_6_franklin_object_uuid, 
     anon_6.stream_item_parent_id     AS anon_6_stream_item_parent_id, 
     anon_6.stream_item_shown_at     AS anon_6_stream_item_shown_at, 
     anon_6.stream_item_author_id     AS anon_6_stream_item_author_id, 
     anon_6.stream_item_stream_sort_at   AS anon_6_stream_item_stream_sort_at, 
     anon_6.stream_item_public     AS anon_6_stream_item_public, 
     anon_6.stream_item_created_at    AS anon_6_stream_item_created_at, 
     anon_6.stream_item_updated_at    AS anon_6_stream_item_updated_at 
FROM franklin_object 
     INNER JOIN stream_item 
       ON franklin_object.id = stream_item.id 
     INNER JOIN (SELECT franklin_object.id     AS franklin_object_id, 
          franklin_object.type     AS franklin_object_type, 
          franklin_object.uuid     AS franklin_object_uuid, 
          content_text_block.id     AS content_text_block_id, 
          content_text_block.text    AS content_text_block_text, 
          content_text_block.position   AS content_text_block_position, 
          content_text_block.franklin_object_id AS content_text_block_franklin_object_id, 
          content_text_block.created_at   AS content_text_block_created_at, 
          content_text_block.updated_at   AS content_text_block_updated_at 
        FROM franklin_object 
          INNER JOIN content_text_block 
            ON franklin_object.id = content_text_block.id) AS anon_1 
       ON stream_item.id = anon_1.content_text_block_franklin_object_id 
     LEFT OUTER JOIN contents_resources AS contents_resources_1 
        ON anon_1.content_text_block_id = contents_resources_1.content_id 
     LEFT OUTER JOIN (SELECT franklin_object.id   AS franklin_object_id, 
           franklin_object.type   AS franklin_object_type, 
           franklin_object.uuid   AS franklin_object_uuid, 
           resource.id     AS resource_id, 
           resource.top_parent_resource AS resource_top_parent_resource, 
           resource.top_parent_id  AS resource_top_parent_id, 
           resource.title    AS resource_title, 
           resource.url     AS resource_url, 
           resource.image    AS resource_image, 
           resource.created_at   AS resource_created_at, 
           resource.updated_at   AS resource_updated_at 
         FROM franklin_object 
           INNER JOIN resource 
             ON franklin_object.id = resource.id) AS anon_2 
        ON anon_2.resource_id = contents_resources_1.resource_id 
     LEFT OUTER JOIN contents_franklin_objects AS contents_franklin_objects_1 
        ON anon_1.content_text_block_id = contents_franklin_objects_1.content_id 
     LEFT OUTER JOIN franklin_object AS franklin_object_1 
        ON franklin_object_1.id = contents_franklin_objects_1.franklin_object_id 
     LEFT OUTER JOIN likers AS likers_1 
        ON stream_item.id = likers_1.post_id 
     LEFT OUTER JOIN (SELECT franklin_object.id   AS franklin_object_id, 
           franklin_object.type  AS franklin_object_type, 
           franklin_object.uuid  AS franklin_object_uuid, 
           USER.id     AS user_id, 
           USER.email     AS user_email, 
           USER.password    AS user_password, 
           USER.auth_token   AS user_auth_token, 
           USER.auth_token_expiration AS user_auth_token_expiration, 
           USER.active    AS user_active, 
           USER.activation_token  AS user_activation_token, 
           USER.first_name   AS user_first_name, 
           USER.last_name    AS user_last_name, 
           USER.image     AS user_image, 
           USER.bio     AS user_bio, 
           USER.aspirations   AS user_aspirations, 
           USER.website    AS user_website, 
           USER.resume    AS user_resume, 
           USER.resume_name   AS user_resume_name, 
           USER.primary_role   AS user_primary_role, 
           USER.institution_id  AS user_institution_id, 
           USER.birth_date   AS user_birth_date, 
           USER.gender    AS user_gender, 
           USER.graduation_year  AS user_graduation_year, 
           USER.complete    AS user_complete, 
           USER.masthead_y_position AS user_masthead_y_position, 
           USER.masthead    AS user_masthead, 
           USER.fb_access_token  AS user_fb_access_token, 
           USER.fb_user_id   AS user_fb_user_id, 
           USER.location    AS user_location, 
           USER.created_at   AS user_created_at, 
           USER.updated_at   AS user_updated_at 
         FROM franklin_object 
           INNER JOIN USER 
             ON franklin_object.id = USER.id) AS anon_3 
        ON anon_3.user_id = likers_1.user_id 
     LEFT OUTER JOIN contents_franklin_objects AS contents_franklin_objects_2 
        ON franklin_object.id = contents_franklin_objects_2.franklin_object_id 
     LEFT OUTER JOIN (SELECT franklin_object.id     AS franklin_object_id, 
           franklin_object.type     AS franklin_object_type, 
           franklin_object.uuid     AS franklin_object_uuid, 
           content_text_block.id     AS content_text_block_id, 
           content_text_block.text    AS content_text_block_text, 
           content_text_block.position   AS content_text_block_position, 
           content_text_block.franklin_object_id AS content_text_block_franklin_object_id, 
           content_text_block.created_at   AS content_text_block_created_at, 
           content_text_block.updated_at   AS content_text_block_updated_at 
         FROM franklin_object 
           INNER JOIN content_text_block 
             ON franklin_object.id = content_text_block.id) AS anon_4 
        ON anon_4.content_text_block_id = contents_franklin_objects_2.content_id 
     LEFT OUTER JOIN (SELECT franklin_object.id   AS franklin_object_id, 
           franklin_object.type  AS franklin_object_type, 
           franklin_object.uuid  AS franklin_object_uuid, 
           stream_item.id    AS stream_item_id, 
           stream_item.parent_id  AS stream_item_parent_id, 
           stream_item.shown_at  AS stream_item_shown_at, 
           stream_item.author_id  AS stream_item_author_id, 
           stream_item.stream_sort_at AS stream_item_stream_sort_at, 
           stream_item.PUBLIC   AS stream_item_public, 
           stream_item.created_at  AS stream_item_created_at, 
           stream_item.updated_at  AS stream_item_updated_at 
         FROM franklin_object 
           INNER JOIN stream_item 
             ON franklin_object.id = stream_item.id) AS anon_6 
        ON anon_6.stream_item_parent_id = franklin_object.id 
     LEFT OUTER JOIN likers AS likers_2 
        ON anon_6.stream_item_id = likers_2.post_id 
     LEFT OUTER JOIN (SELECT franklin_object.id   AS franklin_object_id, 
           franklin_object.type  AS franklin_object_type, 
           franklin_object.uuid  AS franklin_object_uuid, 
           USER.id     AS user_id, 
           USER.email     AS user_email, 
           USER.password    AS user_password, 
           USER.auth_token   AS user_auth_token, 
           USER.auth_token_expiration AS user_auth_token_expiration, 
           USER.active    AS user_active, 
           USER.activation_token  AS user_activation_token, 
           USER.first_name   AS user_first_name, 
           USER.last_name    AS user_last_name, 
           USER.image     AS user_image, 
           USER.bio     AS user_bio, 
           USER.aspirations   AS user_aspirations, 
           USER.website    AS user_website, 
           USER.resume    AS user_resume, 
           USER.resume_name   AS user_resume_name, 
           USER.primary_role   AS user_primary_role, 
           USER.institution_id  AS user_institution_id, 
           USER.birth_date   AS user_birth_date, 
           USER.gender    AS user_gender, 
           USER.graduation_year  AS user_graduation_year, 
           USER.complete    AS user_complete, 
           USER.masthead_y_position AS user_masthead_y_position, 
           USER.masthead    AS user_masthead, 
           USER.fb_access_token  AS user_fb_access_token, 
           USER.fb_user_id   AS user_fb_user_id, 
           USER.location    AS user_location, 
           USER.created_at   AS user_created_at, 
           USER.updated_at   AS user_updated_at 
         FROM franklin_object 
           INNER JOIN USER 
             ON franklin_object.id = USER.id) AS anon_5 
        ON anon_5.user_id = likers_2.user_id 
WHERE stream_item.parent_id = 11 
ORDER BY stream_item.stream_sort_at DESC, 
      anon_1.content_text_block_position, 
      anon_6.stream_item_stream_sort_at DESC 

And the EXPLAIN output:

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 599 | Using temporary; Using filesort |
| 1 | PRIMARY | stream_item | eq_ref | PRIMARY,parent_id | PRIMARY | 4 | anon_1.content_text_block_franklin_object_id | 1 | Using where |
| 1 | PRIMARY | contents_resources_1 | ref | content_id | content_id | 5 | anon_1.content_text_block_id | 2 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 7 | |
| 1 | PRIMARY | contents_franklin_objects_1 | ref | content_id | content_id | 5 | anon_1.content_text_block_id | 1 | |
| 1 | PRIMARY | franklin_object | eq_ref | PRIMARY | PRIMARY | 4 | franklin.stream_item.id | 1 | Using where |
| 1 | PRIMARY | franklin_object_1 | eq_ref | PRIMARY | PRIMARY | 4 | franklin.contents_franklin_objects_1.franklin_object_id | 1 | |
| 1 | PRIMARY | likers_1 | ref | post_id | post_id | 5 | franklin.stream_item.id | 1 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 136 | |
| 1 | PRIMARY | contents_franklin_objects_2 | ref | franklin_object_id | franklin_object_id | 5 | franklin.stream_item.id | 1 | |
| 1 | PRIMARY | <derived5> | ALL | NULL | NULL | NULL | NULL | 599 | |
| 1 | PRIMARY | <derived6> | ALL | NULL | NULL | NULL | NULL | 608 | |
| 1 | PRIMARY | likers_2 | ref | post_id | post_id | 5 | anon_6.stream_item_id | 1 | |
| 1 | PRIMARY | <derived7> | ALL | NULL | NULL | NULL | NULL | 136 | |
| 7 | DERIVED | user | ALL | PRIMARY | NULL | NULL | NULL | 133 | |
| 7 | DERIVED | franklin_object | eq_ref | PRIMARY | PRIMARY | 4 | franklin.user.id | 1 | |
| 6 | DERIVED | stream_item | ALL | PRIMARY | NULL | NULL | NULL | 709 | |
| 6 | DERIVED | franklin_object | eq_ref | PRIMARY | PRIMARY | 4 | franklin.stream_item.id | 1 | |
| 5 | DERIVED | content_text_block | ALL | PRIMARY | NULL | NULL | NULL | 666 | |
| 5 | DERIVED | franklin_object | eq_ref | PRIMARY | PRIMARY | 4 | franklin.content_text_block.id | 1 | |
| 4 | DERIVED | user | ALL | PRIMARY | NULL | NULL | NULL | 133 | |
| 4 | DERIVED | franklin_object | eq_ref | PRIMARY | PRIMARY | 4 | franklin.user.id | 1 | |
| 3 | DERIVED | resource | ALL | PRIMARY | NULL | NULL | NULL | 7 | |
| 3 | DERIVED | franklin_object | eq_ref | PRIMARY | PRIMARY | 4 | franklin.resource.id | 1 | |
| 2 | DERIVED | content_text_block | ALL | PRIMARY | NULL | NULL | NULL | 666 | |
| 2 | DERIVED | franklin_object | eq_ref | PRIMARY | PRIMARY | 4 | franklin.content_text_block.id | 1 | |

How can I turn all of those ALL (full table scan) lookups into something faster? In what other ways can I speed this up?

Is the way the franklin objects are set up an anti-pattern? The way it works is that the franklin_object table has two columns: id and type. Each type then has its own table, whose primary key is a foreign key to franklin_object.
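To make that setup concrete, here is a minimal sketch of the layout in SQLite (not MySQL), using Python's built-in sqlite3 module. The column lists are cut down, and `SUBTYPE_TABLES` and `load_object` are hypothetical names used only for illustration. The point it shows is that reading any object means first resolving its type in franklin_object and then joining to exactly one subtype table.

```python
import sqlite3

# Hypothetical, cut-down version of the layout described above (SQLite, not
# MySQL): franklin_object stores the shared id and a type discriminator, and
# each subtype table's primary key is also a foreign key to franklin_object.id
# (joined-table inheritance).
SCHEMA = """
CREATE TABLE franklin_object (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE user (id INTEGER PRIMARY KEY REFERENCES franklin_object(id),
                   first_name TEXT);
CREATE TABLE resource (id INTEGER PRIMARY KEY REFERENCES franklin_object(id),
                       title TEXT);
"""

# Fixed whitelist mapping each discriminator value to its subtype table.
SUBTYPE_TABLES = {"user": "user", "resource": "resource"}


def load_object(conn, object_id):
    """Look up the type first, then join franklin_object to that one subtype table."""
    row = conn.execute("SELECT type FROM franklin_object WHERE id = ?",
                       (object_id,)).fetchone()
    if row is None:
        return None
    obj_type = row[0]
    table = SUBTYPE_TABLES[obj_type]  # safe to interpolate: value is whitelisted
    detail = conn.execute(
        f"SELECT t.* FROM franklin_object AS fo "
        f"JOIN {table} AS t ON t.id = fo.id WHERE fo.id = ?",
        (object_id,)).fetchone()
    return obj_type, detail


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO franklin_object VALUES (?, ?)",
                 [(1, "user"), (2, "resource")])
conn.execute("INSERT INTO user VALUES (1, 'Ada')")
conn.execute("INSERT INTO resource VALUES (2, 'Syllabus')")

print(load_object(conn, 1))  # ('user', (1, 'Ada'))
print(load_object(conn, 2))  # ('resource', (2, 'Syllabus'))
```

Every subtype read costs an extra join against the same base table, which is why franklin_object appears in every derived subquery of the generated SQL.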

The code that generates the SQL is something along the lines of:

stream_item_query = StreamItem.query.options(db.joinedload('stream_items'),db.joinedload('contents_included_in'),db.joinedload('contents.resources'),db.joinedload('contents.objects'),db.subqueryload('likers'))

stream_items = stream_item_query.filter(StreamItem.parent_id == community_id).order_by(db.desc(StreamItem.stream_sort_at)).all()


Added the ORM code above.


Are your classes mapped to tables, or to select statements over several tables? The joins are a bit odd: 'l join (select * from a join b) r' instead of what I would expect, 'l join b join a'. – SingleNegationElimination


Every class inherits from franklin_object (joined-table inheritance).

Answer


Wow, this one hurt my brain a little. Figuring out what the query does, what all the tables are, and how they relate was tedious. If you have a similar experience, take that as the first hint that you are probably trying to do too much in this single query.

My suggestion is to rethink your whole approach.

SQLAlchemy is a very good tool, and I'm not going to bash it (or your choice of MySQL), but as with most ORM tools, you have to weigh the costs of using it. One example is this franklin_object table business. Is it an anti-pattern? Yes and no. It makes sense from a purely OO perspective: you can determine which table to query by looking up an id in this table. From a relational query perspective, it makes very little sense. You could remove every instance of franklin_object from your query and lose nothing but... the franklin_object columns. If that is a viable option, I would do it right away.
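A small sketch of that claim, assuming (as the question states) that every subtype primary key is also a foreign key to franklin_object.id. It uses SQLite through Python's sqlite3 module and a hypothetical, trimmed-down content_text_block table. Both queries return the same rows; only the first pays for the franklin_object join.

```python
import sqlite3

# Hypothetical cut-down schema (SQLite): content_text_block.id is both its
# primary key and a foreign key to franklin_object.id, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE franklin_object (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE content_text_block (
    id INTEGER PRIMARY KEY REFERENCES franklin_object(id),
    franklin_object_id INTEGER,  -- the stream_item this block belongs to
    position INTEGER,
    text TEXT
);
INSERT INTO franklin_object VALUES (10, 'stream_item'),
                                   (20, 'content_text_block'),
                                   (21, 'content_text_block');
INSERT INTO content_text_block VALUES (20, 10, 1, 'first'), (21, 10, 2, 'second');
""")

# Shape the ORM generates: the subtype table wrapped in a join with franklin_object.
with_base = conn.execute("""
    SELECT ctb.id, ctb.position, ctb.text
    FROM franklin_object AS fo
    JOIN content_text_block AS ctb ON fo.id = ctb.id
    WHERE ctb.franklin_object_id = ?
    ORDER BY ctb.position""", (10,)).fetchall()

# Same rows without franklin_object: because every subtype id has a matching
# franklin_object row (the foreign key), that inner join only adds base columns.
without_base = conn.execute("""
    SELECT id, position, text
    FROM content_text_block
    WHERE franklin_object_id = ?
    ORDER BY position""", (10,)).fetchall()

print(with_base == without_base)  # True
```

If code elsewhere needs the type or uuid columns, this trade-off no longer holds, which is the "if that is a viable option" caveat above.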

Let's look more closely at these joins with franklin_object. The subqueries all have the same shape:

(SELECT franklin_object.id   AS franklin_object_id, 
     franklin_object.type   AS franklin_object_type, 
     franklin_object.uuid   AS franklin_object_uuid, 
     linked_table.id        AS linked_table_id, 
     linked_table.col2      AS linked_table_col2 -- and more 
    FROM franklin_object 
    INNER JOIN linked_table 
     ON franklin_object.id = linked_table.id) AS anon_n 

This gives the database very little to go on when deciding how to optimize this part of the query, regardless of statistics. Perhaps restricting franklin_object by specifying type in a WHERE clause would make the query perform better. Maybe.
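One way to try the type restriction, sketched in SQLite through Python's sqlite3 module. The schema is trimmed down, and the composite index `ix_franklin_object_type_id` is a hypothetical addition. Whether this actually changes the plan on MySQL is something to check with EXPLAIN there; the sketch only shows the query shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE franklin_object (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE user (id INTEGER PRIMARY KEY REFERENCES franklin_object(id),
                   email TEXT);
CREATE TABLE likers (post_id INTEGER, user_id INTEGER);
-- Hypothetical composite index so a type filter can be answered from the index.
CREATE INDEX ix_franklin_object_type_id ON franklin_object (type, id);
INSERT INTO franklin_object VALUES (1, 'user'), (2, 'user'), (3, 'stream_item');
INSERT INTO user VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO likers VALUES (3, 2);
""")

# Same derived-table shape as anon_3 / anon_5, but with the discriminator
# restricted explicitly, so only 'user' rows of franklin_object can match.
rows = conn.execute("""
    SELECT likers.post_id, anon.user_id, anon.user_email
    FROM likers
    JOIN (SELECT fo.id AS franklin_object_id, user.id AS user_id,
                 user.email AS user_email
          FROM franklin_object AS fo
          JOIN user ON fo.id = user.id
          WHERE fo.type = 'user') AS anon
      ON anon.user_id = likers.user_id
    WHERE likers.post_id = ?""", (3,)).fetchall()

print(rows)  # [(3, 2, 'b@example.com')]
```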

This is especially problematic with the USER table, since that table has a lot of records (from what you say). Because you are selecting most of its columns and the optimizer cannot accurately determine how many rows will be retrieved, it makes sense that it does a full table scan. In your case, twice.

Another aspect is the large number of joins involved. Even if we take out every franklin_object reference, there are still 11 joins. That would not be terrible if your data model were more relational, but it isn't. The generated query gives the database very little help in working out the best way to execute it, so it performs poorly. Maybe you could mitigate this with hints and the like, but I bet it will bite you in the long run.

You are using an ORM tool, so really use it. You gain nothing by running such a big query all at once. Splitting it up a bit could help performance. Use lazy loading to avoid huge, complicated queries. I would even try making everything lazy, just to see how it goes. Performance will probably be okay; I'd say better. Not great, probably not even acceptable, but better than having time to make a coffee while the database churns.
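A rough sketch of the "several small queries instead of one giant join" idea, written as plain SQL through Python's sqlite3 module rather than SQLAlchemy. The trimmed schema and the `load_stream` helper are hypothetical. It runs one query for the stream items and one batched `IN (...)` query for their text blocks, then stitches the results together in Python. This is similar in spirit to what SQLAlchemy's `subqueryload` (or, in later SQLAlchemy versions, `selectinload`) does for each relationship.

```python
import sqlite3
from collections import defaultdict

# Hypothetical cut-down schema (SQLite, without the franklin_object base table).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stream_item (id INTEGER PRIMARY KEY, parent_id INTEGER,
                          stream_sort_at TEXT);
CREATE TABLE content_text_block (id INTEGER PRIMARY KEY,
                                 franklin_object_id INTEGER,
                                 position INTEGER, text TEXT);
CREATE INDEX ix_stream_item_parent ON stream_item (parent_id);
CREATE INDEX ix_ctb_owner ON content_text_block (franklin_object_id);
INSERT INTO stream_item VALUES (1, 11, '2012-07-01'), (2, 11, '2012-07-02'),
                               (3, 99, '2012-07-03');
INSERT INTO content_text_block VALUES (10, 1, 1, 'a'), (11, 1, 2, 'b'),
                                      (12, 2, 1, 'c');
""")


def load_stream(conn, parent_id):
    # Query 1: only the stream items themselves, newest first.
    items = conn.execute(
        "SELECT id, stream_sort_at FROM stream_item WHERE parent_id = ? "
        "ORDER BY stream_sort_at DESC", (parent_id,)).fetchall()
    ids = [item_id for item_id, _ in items]
    if not ids:
        return []
    # Query 2: all text blocks for those items in one batched IN (...) lookup.
    placeholders = ",".join("?" * len(ids))
    blocks = defaultdict(list)
    for owner_id, text in conn.execute(
            f"SELECT franklin_object_id, text FROM content_text_block "
            f"WHERE franklin_object_id IN ({placeholders}) ORDER BY position",
            ids):
        blocks[owner_id].append(text)
    # Stitch the pieces together in Python instead of in one huge join.
    return [(item_id, blocks[item_id]) for item_id in ids]


print(load_stream(conn, 11))  # [(2, ['c']), (1, ['a', 'b'])]
```

With about 25 stream items per course, each of these queries touches only a handful of rows and can use a simple index.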

Next, start putting the pieces back together in simpler chunks. Group the objects that logically belong together, such as resource and contents_resources. Another example: the chain from stream_item through likers to user is duplicated. Make that a single query and let SQLAlchemy do its job.
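A sketch of collapsing the duplicated stream_item, likers, user chain into one reusable query (SQLite via sqlite3; the schema and the `likers_for` helper are hypothetical). Passing the ids of both top-level items and their replies covers what likers_1/anon_3 and likers_2/anon_5 fetch separately in the generated SQL.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE likers (post_id INTEGER, user_id INTEGER REFERENCES user(id));
CREATE INDEX ix_likers_post ON likers (post_id);
INSERT INTO user VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO likers VALUES (100, 1), (100, 2), (200, 2);
""")


def likers_for(conn, post_ids):
    """One likers/user query for any mix of posts (top-level items and replies)."""
    if not post_ids:
        return {}
    placeholders = ",".join("?" * len(post_ids))
    result = defaultdict(list)
    rows = conn.execute(
        f"SELECT likers.post_id, user.first_name FROM likers "
        f"JOIN user ON user.id = likers.user_id "
        f"WHERE likers.post_id IN ({placeholders}) ORDER BY user.id",
        list(post_ids))
    for post_id, name in rows:
        result[post_id].append(name)
    return dict(result)


# 100 could be a top-level stream_item and 200 one of its replies; both are
# served by the same query instead of two duplicated join chains.
print(likers_for(conn, [100, 200]))  # {100: ['Ada', 'Grace'], 200: ['Grace']}
```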

As a last resort, you could implement some kind of caching mechanism, or denormalize the tables somewhere. In a system with few writes and many reads, you can have these tables feed into another structure where queries are simple and fast. That is, do the processing up front and store the result in a single table.
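A minimal sketch of such a read table, assuming a hypothetical `stream_feed` table that is refreshed whenever a stream item, its text blocks, or its likes change (SQLite via sqlite3; table, column, and function names are illustrative only). The cost is keeping it in sync on every write; the payoff is that a read becomes one indexed query with no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical read-optimized table: one row per stream item, with what the
-- page needs already flattened (author name, text, like count).
CREATE TABLE stream_feed (
    stream_item_id INTEGER PRIMARY KEY,
    parent_id      INTEGER NOT NULL,
    stream_sort_at TEXT NOT NULL,
    author_name    TEXT,
    body_text      TEXT,
    like_count     INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX ix_feed_parent_sort ON stream_feed (parent_id, stream_sort_at);
""")


def refresh_feed_row(conn, item_id, parent_id, sort_at, author, blocks, likes):
    """Called on write: do the expensive assembly once and store the result."""
    conn.execute(
        "INSERT OR REPLACE INTO stream_feed VALUES (?, ?, ?, ?, ?, ?)",
        (item_id, parent_id, sort_at, author, "\n".join(blocks), likes))


def read_feed(conn, parent_id):
    """Called on read: a single indexed query, no joins."""
    return conn.execute(
        "SELECT stream_item_id, author_name, body_text, like_count "
        "FROM stream_feed WHERE parent_id = ? ORDER BY stream_sort_at DESC",
        (parent_id,)).fetchall()


refresh_feed_row(conn, 1, 11, "2012-07-01", "Ada", ["a", "b"], 2)
refresh_feed_row(conn, 2, 11, "2012-07-02", "Grace", ["c"], 0)
print(read_feed(conn, 11))
# [(2, 'Grace', 'c', 0), (1, 'Ada', 'a\nb', 2)]
```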

Good luck
