postgresql insert into select无法使用并行查询的解决

本文信息基于PG13.1。

从PG9.6开始支持并行查询。PG11开始支持CREATE TABLE … AS、SELECT INTO以及CREATE MATERIALIZED VIEW的并行查询。

先说结论:

换用create table as 或者select into或者导入导出。

首先跟踪如下查询语句的执行计划:

select count(*) from test t1,test1 t2 where t1.id = t2.id ;
postgres=# explain analyze select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=683.246..715.324 rows=1 loops=1)
  -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=681.474..715.311 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=674.689..675.285 rows=1 loops=3)
        -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=447.799..645.689 rows=333333 loops=3)
           Hash Cond: (t1.id = t2.id)
           -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.025..74.010 rows=333333 loops=3)
           -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=260.052..260.053 rows=333333 loops=3)
              Buckets: 131072 Batches: 16 Memory Usage: 3520kB
              -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..104.804 rows=333333 loops=3)
 Planning Time: 0.420 ms
 Execution Time: 715.447 ms
(13 rows)

可以看到走了两个Workers。

下边看一下insert into select:

postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3744.179..3744.187 rows=0 loops=1)
  -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3743.343..3743.352 rows=1 loops=1)
     -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3743.247..3743.254 rows=1 loops=1)
        -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1092.295..3511.301 rows=1000000 loops=1)
           Hash Cond: (t1.id = t2.id)
           -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..421.537 rows=1000000 loops=1)
           -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1090.078..1090.081 rows=1000000 loops=1)
              Buckets: 131072 Batches: 16 Memory Usage: 3227kB
              -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.021..422.768 rows=1000000 loops=1)
 Planning Time: 0.511 ms
 Execution Time: 3745.633 ms
(11 rows)

可以看到并没有Workers的指示,没有启用并行查询。

即使开启强制并行,也无法走并行查询。

postgres=# set force_parallel_mode =on;
SET
postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3825.042..3825.049 rows=0 loops=1)
  -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3824.976..3824.984 rows=1 loops=1)
     -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3824.972..3824.978 rows=1 loops=1)
        -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1073.587..3599.402 rows=1000000 loops=1)
           Hash Cond: (t1.id = t2.id)
           -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.034..414.965 rows=1000000 loops=1)
           -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1072.441..1072.443 rows=1000000 loops=1)
              Buckets: 131072 Batches: 16 Memory Usage: 3227kB
              -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..400.624 rows=1000000 loops=1)
 Planning Time: 0.577 ms
 Execution Time: 3825.923 ms
(11 rows)

原因在官方文档有写:

The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.

解决方案有如下三种:

1.select into

postgres=# explain analyze select count(*) into vaa from test t1,test1 t2 where t1.id = t2.id ;
                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=742.736..774.923 rows=1 loops=1)
  -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=740.223..774.907 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=731.408..731.413 rows=1 loops=3)
        -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=489.880..700.830 rows=333333 loops=3)
           Hash Cond: (t1.id = t2.id)
           -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.033..87.479 rows=333333 loops=3)
           -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=266.839..266.840 rows=333333 loops=3)
              Buckets: 131072 Batches: 16 Memory Usage: 3520kB
              -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.058..106.874 rows=333333 loops=3)
 Planning Time: 0.319 ms
 Execution Time: 783.300 ms
(13 rows)

2.create table as

postgres=# explain analyze create table vb as select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=540.120..563.733 rows=1 loops=1)
  -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=537.982..563.720 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=526.602..527.136 rows=1 loops=3)
        -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=334.532..502.793 rows=333333 loops=3)
           Hash Cond: (t1.id = t2.id)
           -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.018..57.819 rows=333333 loops=3)
           -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=189.502..189.503 rows=333333 loops=3)
              Buckets: 131072 Batches: 16 Memory Usage: 3520kB
              -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..77.786 rows=333333 loops=3)
 Planning Time: 0.189 ms
 Execution Time: 565.448 ms
(13 rows)

3.或者通过导入导出的方式,例如:

psql -h localhost -d postgres -U postgres -c "select count(*) from test t1,test1 t2 where t1.id = t2.id " -o result.csv -A -t -F ","
psql -h localhost -d postgres -U postgres -c "COPY va FROM 'result.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER FALSE, ENCODING 'windows-1252')"

一些场景下也会比非并行快。

到此这篇关于postgresql insert into select无法使用并行查询的解决的文章就介绍到这了,更多相关postgresql insert into select并行查询内容请搜索我们以前的文章或继续浏览下面的相关文章希望大家以后多多支持我们!

(0)

相关推荐

  • PostgreSQL 角色与用户管理介绍

    一.角色与用户的区别 角色就相当于岗位:角色可以是经理,助理.用户就是具体的人:比如陈XX经理,朱XX助理,王XX助理.在PostgreSQL 里没有区分用户和角色的概念,"CREATE USER" 为 "CREATE ROLE" 的别名,这两个命令几乎是完全相同的,唯一的区别是"CREATE USER" 命令创建的用户默认带有LOGIN属性,而"CREATE ROLE" 命令创建的用户默认不带LOGIN属性(CREATE U

  • 解决PostgreSQL服务启动后占用100% CPU卡死的问题

    进程中有N个postgres.exe(此为正常,见官方文档),却有一个始终占满CPU(由于本机是双核,占用了50%的资源).自带的pgAdmin III连接会死掉. 此问题在网上搜索没找到答案. 查看日志发现有这样一条错误信息: %t LOG:  could not receive data from client: An operation was attempted on something that is not a socket. 根据错误提示,在HP的官网找到了答案(应该是win的问题

  • PostgreSQL 创建表分区

    创建表分区步骤如下: 1. 创建主表 CREATE TABLE users ( uid int not null primary key, name varchar(20)); 2. 创建分区表(必须继承上面的主表) CREATE TABLE users_0 ( check (uid >= 0 and uid< 100) ) INHERITS (users); CREATE TABLE users_1 ( check (uid >= 100)) INHERITS (users); 3.

  • PostgreSQL的upsert实例操作(insert on conflict do)

    建表语句: DROP TABLE IF EXISTS "goods"; CREATE TABLE "goods" ( "store_cd" int4 NOT NULL, "good_cd" varchar(50) COLLATE "pg_catalog"."default" NOT NULL, "name" varchar(255) COLLATE "pg_

  • PostgreSQL中调用存储过程并返回数据集实例

    这里用一个实例来演示PostgreSQL存储过程如何返回数据集. 1.首先准备数据表 复制代码 代码如下: //member_category create table member_category(id serial, name text, discount_rate real, base_integral integer); alter table member_category add primary key(id); alter table member_category add ch

  • PostgreSQL新手入门教程

    自从MySQL被Oracle收购以后,PostgreSQL逐渐成为开源关系型数据库的首选. 本文介绍PostgreSQL的安装和基本用法,供初次使用者上手.以下内容基于Debian操作系统,其他操作系统实在没有精力兼顾,但是大部分内容应该普遍适用. 安装 1.首先,安装PostgreSQL客户端. sudo apt-get install postgresql-client 然后,安装PostgreSQL服务器. sudo apt-get install postgresql 2.正常情况下,安

  • Postgresql ALTER语句常用操作小结

    postgresql版本:psql (9.3.4) 1.增加一列 复制代码 代码如下: ALTER TABLE table_name ADD column_name datatype; 2.删除一列 复制代码 代码如下: ALTER TABLE table_name DROP  column_name; 3.更改列的数据类型 复制代码 代码如下: ALTER TABLE table_name ALTER  column_name TYPE datatype; 4.表的重命名 复制代码 代码如下:

  • Windows PostgreSQL 安装图文教程

    它提供了多版本并行控制,支持几乎所有 SQL 构件(包括子查询,事务和用户定义类型和函数), 并且可以获得非常广阔范围的(开发)语言绑定 (包括 C,C++,Java,perl,tcl,和 python).本文介绍的是其在windows系统下的安装过程. 一般说来,一个现代的与 Unix 兼容的平台应该就能运行 PostgreSQL.而如果在windows系统下安装,你需要 Cygwin 和cygipc 包.另外,如果要制作服务器端编程语言 PL/Perl,则还需要完整的Perl安装,包括 li

  • PostgreSQL 安装和简单使用第1/2页

    据我了解国内四大国产数据库,其中三个都是基于PostgreSQL开发的.并且,因为许可证的灵活,任何人都可以以任何目的免费使用,修改,和分发 PostgreSQL,不管是私用,商用,还是学术研究使用.本文只是简单介绍一下postgresql的安装和简单的使用,语法方面涉及的比较少,以方便新手上路为目的. 1.系统环境和安装方法 : PostgreSQL的安装方法比较灵活,可以用源码包安装,也可以用您使用的发行版所带的软件包来安装,还可以采用在线安装-- 1.1 系统环境:Ubuntu Linux

  • PostgreSQL中的XML操作函数代码

    XML内容生成部分 SQL数据生成XML的函数. 1. xmlcomment:生成注释函数. xmlcomment(text ) 例: SELECT xmlcomment('hello'); xmlcomment -------------- <!--hello--> 2. xmlconcat:XML连接函数 xmlconcat(xml [, ...]) 例: SELECT xmlconcat('<abc/>', '<bar>foo</bar>'); xml

随机推荐