postgresql 13.1 insert into select并行查询的实现

本文信息基于PG13.1。

从PG9.6开始支持并行查询。PG11开始支持CREATE TABLE … AS、SELECT INTO以及CREATE MATERIALIZED VIEW的并行查询。

先说结论:

换用create table as 或者select into或者导入导出。

首先跟踪如下查询语句的执行计划:

select count(*) from test t1,test1 t2 where t1.id = t2.id ;
postgres=# explain analyze select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                  QUERY PLAN
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=683.246..715.324 rows=1 loops=1)
 -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=681.474..715.311 rows=3 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=674.689..675.285 rows=1 loops=3)
    -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=447.799..645.689 rows=333333 loops=3)
      Hash Cond: (t1.id = t2.id)
      -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.025..74.010 rows=333333 loops=3)
      -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=260.052..260.053 rows=333333 loops=3)
       Buckets: 131072 Batches: 16 Memory Usage: 3520kB
       -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..104.804 rows=333333 loops=3)
 Planning Time: 0.420 ms
 Execution Time: 715.447 ms
(13 rows)

可以看到走了两个Workers。

下边看一下insert into select:

postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                 QUERY PLAN
-------------------------------------------------------------------------------------------
Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3744.179..3744.187 rows=0 loops=1)
 -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3743.343..3743.352 rows=1 loops=1)
   -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3743.247..3743.254 rows=1 loops=1)
    -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1092.295..3511.301 rows=1000000 loops=1)
      Hash Cond: (t1.id = t2.id)
      -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..421.537 rows=1000000 loops=1)
      -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1090.078..1090.081 rows=1000000 loops=1)
       Buckets: 131072 Batches: 16 Memory Usage: 3227kB
       -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.021..422.768 rows=1000000 loops=1)
 Planning Time: 0.511 ms
 Execution Time: 3745.633 ms
(11 rows)

可以看到并没有Workers的指示,没有启用并行查询。

即使开启强制并行,也无法走并行查询。

postgres=# set force_parallel_mode =on;
SET
postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                 QUERY PLAN
-------------------------------------------------------------------------------------------
Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3825.042..3825.049 rows=0 loops=1)
 -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3824.976..3824.984 rows=1 loops=1)
   -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3824.972..3824.978 rows=1 loops=1)
    -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1073.587..3599.402 rows=1000000 loops=1)
      Hash Cond: (t1.id = t2.id)
      -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.034..414.965 rows=1000000 loops=1)
      -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1072.441..1072.443 rows=1000000 loops=1)
       Buckets: 131072 Batches: 16 Memory Usage: 3227kB
       -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..400.624 rows=1000000 loops=1)
 Planning Time: 0.577 ms
 Execution Time: 3825.923 ms
(11 rows)

原因在官方文档有写:

The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.

解决方案有如下三种:

1.select into

postgres=# explain analyze select count(*) into vaa from test t1,test1 t2 where t1.id = t2.id ;
                  QUERY PLAN
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=742.736..774.923 rows=1 loops=1)
 -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=740.223..774.907 rows=3 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=731.408..731.413 rows=1 loops=3)
    -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=489.880..700.830 rows=333333 loops=3)
      Hash Cond: (t1.id = t2.id)
      -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.033..87.479 rows=333333 loops=3)
      -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=266.839..266.840 rows=333333 loops=3)
       Buckets: 131072 Batches: 16 Memory Usage: 3520kB
       -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.058..106.874 rows=333333 loops=3)
 Planning Time: 0.319 ms
 Execution Time: 783.300 ms
(13 rows)

2.create table as

postgres=# explain analyze create table vb as select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                  QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=540.120..563.733 rows=1 loops=1)
 -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=537.982..563.720 rows=3 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=526.602..527.136 rows=1 loops=3)
    -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=334.532..502.793 rows=333333 loops=3)
      Hash Cond: (t1.id = t2.id)
      -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.018..57.819 rows=333333 loops=3)
      -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=189.502..189.503 rows=333333 loops=3)
       Buckets: 131072 Batches: 16 Memory Usage: 3520kB
       -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..77.786 rows=333333 loops=3)
 Planning Time: 0.189 ms
 Execution Time: 565.448 ms
(13 rows)

3.或者通过导入导出的方式,例如:

psql -h localhost -d postgres -U postgres -c "select count(*) from test t1,test1 t2 where t1.id = t2.id " -o result.csv -A -t -F ","
psql -h localhost -d postgres -U postgres -c "COPY va FROM 'result.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER FALSE, ENCODING 'windows-1252')"

一些场景下也会比非并行快。

补充:POSTGRESQL: 动态SQL语句中不能使用SELECT INTO?

我的数据库版本是 PostgreSQL 8.4.7 。 下面是出错的存储过程:

CREATE or Replace FUNCTION func_getnextid(
 tablename varchar(240),
 idname varchar(20) default 'id')
RETURNS integer AS $funcbody$
Declare
 sqlstring varchar(240);
 currentId integer;
Begin
 sqlstring:= 'select max("' || idname || '") into currentId from "' || tablename || '";';
 EXECUTE sqlstring;
 if currentId is NULL or currentId = 0 then
  return 1;
 else
  return currentId + 1;
 end if;
End;
$funcbody$ LANGUAGE plpgsq

执行后出现这样的错误:

SQL error:

ERROR: EXECUTE of SELECT ... INTO is not implemented

CONTEXT: PL/pgSQL function "func_getnextbigid" line 6 at EXECUTE statement

改成这样的就对了:

CREATE or Replace FUNCTION func_getnextid(
 tablename varchar(240),
 idname varchar(20) default 'id')
RETURNS integer AS $funcbody$
Declare
 sqlstring varchar(240);
 currentId integer;
Begin
 sqlstring:= 'select max("' || idname || '") from "' || tablename || '";';
 EXECUTE sqlstring into currentId;
 if currentId is NULL or currentId = 0 then
  return 1;
 else
  return currentId + 1;
 end if;
End;
$funcbody$ LANGUAGE plpgsql;

以上为个人经验,希望能给大家一个参考,也希望大家多多支持我们。如有错误或未考虑完全的地方,望不吝赐教。

(0)

相关推荐

  • php mysql insert into 结合详解及实例代码

    php mysql insert into 结合详解 ySQL INSERT INTO语句在实际应用中是经常使用到的语句,所以对其相关的内容还是多多掌握为好. 向数据库表插入数据 INSERT INTO 语句用于向数据库表添加新记录. 语法 INSERT INTO table_name VALUES (value1, value2,....) 您还可以规定希望在其中插入数据的列: INSERT INTO table_name (column1, column2,...) VALUES (valu

  • postgresql insert into select无法使用并行查询的解决

    本文信息基于PG13.1. 从PG9.6开始支持并行查询.PG11开始支持CREATE TABLE - AS.SELECT INTO以及CREATE MATERIALIZED VIEW的并行查询. 先说结论: 换用create table as 或者select into或者导入导出. 首先跟踪如下查询语句的执行计划: select count(*) from test t1,test1 t2 where t1.id = t2.id ; postgres=# explain analyze se

  • SELECT INTO 和 INSERT INTO SELECT 两种表复制语句详解(SQL数据库和Oracle数据库的区别)

    1.INSERT INTO SELECT语句 语句形式为:Insert into Table2(field1,field2,...) select value1,value2,... from Table1 或者:Insert into Table2 select * from Table1 注意:(1)要求目标表Table2必须存在,并且字段field,field2...也必须存在 (2)注意Table2的主键约束,如果Table2有主键而且不为空,则 field1, field2...中必须

  • 正确使用MySQL INSERT INTO语句

    以下的文章主要介绍的是MySQL INSERT INTO语句的实际用法以及MySQL INSERT INTO语句中的相关语句的介绍,MySQL INSERT INTO语句在实际应用中是经常使用到的语句,所以对其相关的内容还是多多掌握为好. INSERT [LOW_PRIORITY | DELAYED] [IGNORE] [INTO] tbl_name [(col_name,...)] VALUES (expression,...),(...),... MySQLINSERT INTO SELEC

  • SQL insert into语句写法讲解

    方式1. INSERT INTO t1(field1,field2) VALUE(v001,v002);  明确只插入一条Value 方式2. INSERT INTO t1(field1,field2) VALUES(v101,v102),(v201,v202),(v301,v302),(v401,v402); 在插入批量数据时 方式2 优于 方式1. [特注]当 id 为自增,即  id INT PRIMARY KEY AUTO_INCREMENT 时,执行 insert into 语句,需要

  • PHP+MySQL之Insert Into数据插入用法分析

    本文实例讲述了PHP+MySQL之Insert Into数据插入用法.分享给大家供大家参考.具体如下: INSERT INTO 语句用于向数据库表中插入新纪录. 向数据库表插入数据 INSERT INTO 语句用于向数据库表添加新纪录. 语法: INSERT INTO table_name VALUES (value1, value2,....) 您还可以规定希望在其中插入数据的列: INSERT INTO table_name (column1, column2,...) VALUES (va

  • mysql 中 replace into 与 insert into on duplicate key update 的用法和不同点实例分析

    本文实例讲述了mysql 中 replace into 与 insert into on duplicate key update 的用法和不同点.分享给大家供大家参考,具体如下: replace into和insert into on duplicate key update都是为了解决我们平时的一个问题 就是如果数据库中存在了该条记录,就更新记录中的数据,没有,则添加记录. 我们创建一个测试表test CREATE TABLE `test` ( `id` int(11) unsigned N

  • MySql中使用INSERT INTO语句更新多条数据的例子

    我们知道当插入多条数据的时候insert支持多条语句: 复制代码 代码如下: INSERT INTO t_member (id, name, email) VALUES     (1, 'nick', 'nick@126.com'),     (4, 'angel','angel@163.com'),     (7, 'brank','ba198@126.com'); 但是对于更新记录,由于update语法不支持一次更新多条记录,只能一条一条执行: 复制代码 代码如下: UPDATE t_mem

  • postgresql 13.1 insert into select并行查询的实现

    本文信息基于PG13.1. 从PG9.6开始支持并行查询.PG11开始支持CREATE TABLE - AS.SELECT INTO以及CREATE MATERIALIZED VIEW的并行查询. 先说结论: 换用create table as 或者select into或者导入导出. 首先跟踪如下查询语句的执行计划: select count(*) from test t1,test1 t2 where t1.id = t2.id ; postgres=# explain analyze se

  • 解决postgresql insert into select无法使用并行查询的问题

    本文信息基于PG13.1. 从PG9.6开始支持并行查询.PG11开始支持CREATE TABLE - AS.SELECT INTO以及CREATE MATERIALIZED VIEW的并行查询. 先说结论: 换用create table as 或者select into或者导入导出. 首先跟踪如下查询语句的执行计划: select count(*) from test t1,test1 t2 where t1.id = t2.id ; postgres=# explain analyze se

  • Oracle并行操作之并行查询实例解析

    Oracle数据库的并行操作特性,其本质上就是强行榨取除数据库服务器空闲资源(主要是CPU资源),对一些高负荷大数据量数据进行分治处理.并行操作是一种非确定性的优化策略,在选择的时候需要小心对待.目前,使用并行操作特性的主要有下面几个方面: Parallel Query:并行查询,使用多个操作系统级别的Server Process来同时完成一个SQL查询: Parallel DML:并行DML操作.类似于Parallel Query.当要对大数据量表进行DML操作,如insert.update和

  • mysql中insert与select的嵌套使用解决组合字段插入问题

    如何在mysql从多个表中组合字段然后插入到一个新表中,通过一条sql语句实现.具体情形是:有三张表a.b.c,现在需要从表b和表c中分别查几个字段的值插入到表a中对应的字段.对于这种情况,我们可以使用如下的语句来实现: INSERT INTO db1_name(field1,field2) SELECT field1,field2 FROM db2_name 当然,上面的语句比较适合两个表的数据互插,如果多个表就不适应了.对于多个表,我们可以先将需要查询的字段join起来,然后组成一个视图后再

  • 解析MySQL中INSERT INTO SELECT的使用

    1. 语法介绍有三张表a.b.c,现在需要从表b和表c中分别查几个字段的值插入到表a中对应的字段.对于这种情况,可以使用如下的语句来实现:INSERT INTO db1_name (field1,field2) SELECT field1,field2 FROM db2_name 上面的语句比较适合两个表的数据互插,如果多个表就不适应了.对于多个表,可以先将需要查询的字段JOIN起来,然后组成一个视图后再SELECT FROM就可以了: INSERT INTO a (field1,field2)

  • 数据库插入数据之select into from与insert into select区别详解

    可能第一次接触select...into...from...和insert into...select...有很多人都会误解, 从表面上看都是把相关信息查询出来,然后添加到一个表里,其实还远远没有这么简单,接下来,小猪就用最普通的表述给大家介绍一下这两者间的区别. 步骤/方法 1.首先,我们来看一下insert into select语句,其语法形式为:Insert into Table2(field1,field2,...) select value1,value2,... from Tabl

  • mysql中insert与select的嵌套使用方法

    本文讲述了mysql中insert与select的嵌套使用的方法,对于初学MySQL的朋友有一定的借鉴价值. 这里需要实现在mysql从多个表中组合字段然后插入到一个新表中,通过一条sql语句实现该功能需求.具体情形是:有三张表a.b.c,现在需要从表b和表c中分别查几个字段的值插入到表a中对应的字段.对于这种情况,我们可以使用如下的语句来实现: INSERT INTO db1_name(field1,field2) SELECT field1,field2 FROM db2_name 当然,上

  • insert into select和select into的使用和区别介绍

    insert into ... select ...:可将表1中的全部数据或者部分数据复制到表2中. eg: 复制代码 代码如下: insert into t2(id,name,pwd) select id,name,pwd from t1 注:t2必须存在.t1中查询的列名可不与t1列名相同.无 values select...into:查询t1中的数据,插入到t2中. eg: 复制代码 代码如下: select * into t2 from t1 注:t2被创建并填充数据.

随机推荐