这是我的第一篇文章....
我有一个查询所花的时间比我想要的要长(不是我们所有人!)取决于我在WHERE子句中输入的内容……它可能运行得更快。我试图理解为什么查询计划与众不同,并且我可以做些什么来加快整个查询的速度。
这是查询1:
SELECT date_observed, base_value FROM device_read_data WHERE fk_device_rw_id IN (SELECT fk_device_rw_id FROM equipment_set_rw WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid)) AND date_observed BETWEEN '2013-12-01 07:45:00+00'::timestamptz AND '2014-01-01 07:59:59+00'::timestamptz AND base_value ~ '[0-9]+(\.[0-9]+)?' ;
这是查询计划1:
"Hash Semi Join (cost=11.65..5640243.59 rows=92194 " Hash Cond: (device_read_data.fk_device_rw_id = equipment_set_rw.fk_device_rw_id)" " -> Seq Scan on device_read_data (cost=0.00..5449563.56 rows=72157042 " Filter: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone) AND ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text))" " Rows Removed by Filter: 82135660" " -> Hash (cost=11.61..11.61 rows=3 " Buckets: 1024 Batches: 1 Memory Usage: 1kB" " -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..11.61 rows=3 " Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)" " -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 " Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)" "Total runtime: 132530.290 ms"
这是查询2:
SELECT date_observed, base_value FROM device_read_data WHERE fk_device_rw_id IN (SELECT fk_device_rw_id FROM equipment_set_rw WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid)) AND date_observed BETWEEN '2014-01-01 07:45:00+00'::timestamptz AND '2014-02-01 07:59:59+00'::timestamptz AND base_value ~ '[0-9]+(\.[0-9]+)?' ;
这是查询计划2:
"Nested Loop (cost=4.27..1869543.46 rows=20391 " -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..9.73 rows=2 " Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)" " -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=2 " Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)" " -> Index Scan using idx_device_read_data_date_observed_fk_device_rw_id on device_read_data (cost=0.00..934664.91 rows=10195 " Index Cond: ((date_observed >= '2014-01-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-02-01 07:59:59+00'::timestamp with time zone) AND (fk_device_rw_id = equipment_set_rw.fk_device_rw_id))" " Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)" "Total runtime: 2055.068 ms"
我只在Where子句中更改了Date Range。您可以看到,在查询1中,表上有一个Seq扫描,而在查询2中是索引扫描。
我正在尝试确定是什么原因造成的,但似乎找不到答案。
附加信息
在(date_observed,fk_device_rw_id)上有一个复合索引
此表上永远不会有任何删除。不需要自动真空。
我还是用真空吸尘器清理了桌子....但是没有效果。
我已经在这张桌子上重建了索引
我已经分析了这张桌子
该系统是Prod的副本,目前处于闲置状态
系统信息
在Linux上运行Postgres 9.2
16GB系统内存
Shared_Buffers设置为4GB
我还能提供什么其他信息?我确定有些事情我遗漏了。
谢谢你的帮助。
编辑1我试过了:设置enable_seqscan = false
以下是解释计划结果:
"Hash Semi Join (cost=2566484.50..7008502.81 rows=92194 " Hash Cond: (device_read_data.fk_device_rw_id = equipment_set_rw.fk_device_rw_id)" " -> Bitmap Heap Scan on device_read_data (cost=2566472.85..6817822.78 rows=72157042 " Recheck Cond: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))" " Rows Removed by Index Recheck: 2102" " Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)" " Rows Removed by Filter: 12265137" " -> Bitmap Index Scan on idx_device_read_data_date_observed_fk_device_rw_id (cost=0.00..2548433.59 rows=85430682 " Index Cond: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))" " -> Hash (cost=11.61..11.61 rows=3 " Buckets: 1024 Batches: 1 Memory Usage: 1kB" " -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..11.61 rows=3 " Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)" " -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 " Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)" "Total runtime: 182244.181 ms"
如预期的那样,查询花费了更长的时间。是否也有记录可以使此操作更快?
我有什么选择?
谢谢。
编辑2我尝试了重写方法。恐怕结果与原始结果相似。这是查询计划:
"Hash Join (cost=11.65..6013386.19 rows=90835 " Hash Cond: (a.fk_device_rw_id = b.fk_device_rw_id)" " -> Seq Scan on device_read_data a (cost=0.00..5565898.74 rows=71450793 " Filter: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone) AND ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text))" " Rows Removed by Filter: 85426425" " -> Hash (cost=11.61..11.61 rows=3 " Buckets: 1024 Batches: 1 Memory Usage: 1kB" " -> Bitmap Heap Scan on equipment_set_rw b (cost=4.27..11.61 rows=3 " Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)" " -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 " Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)" "Total runtime: 127992.849 ms"
这似乎是一个简单的问题。从表中返回属于特定日期范围的记录。考虑到我现有的系统架构,在性能受到不利影响之前,表中可能存在多少条记录存在一个阈值。
除非有其他建议,否则我可能需要采用分区方法。
到目前为止,感谢您的帮助!