postgreSQL 非count方法算记录数操作

2025-06-21 03:36:02

一般方法

select count(1) from table_name;

全量扫描一遍表，记录越多，查询速度越慢

新法

PostgreSQL 还真提供了一个这样的途径，那就是系统表 pg_class，这个系统表里头，存储着每个表的统计信息，其中 reltuples 就是对应的表的统计行,统计行的数据是pg有个独立进程，定期扫描不同的表，收集这些表的统计信息，保存在系统表里头。

方法如下：

select
 reltuples::int as total
from
 pg_class
where
 relname = 'table_name'
 and relnamespace = (select oid from pg_namespace where nspname = 'schema');

新方法不是通用的，如果要求特精确还是使用select count(1),如果是类似分页的，且分页数量超过比较多的，也不是要求特别精准的，这就是一个好方法！

count(1) over 计算记录数

select count(1) over(), * from table_name;

补充

count 是最常用的聚集函数之一，看似简单，其实还是有坑的，如：

1、count(*)：返回结果集的行数，是null也统计

2、count(1)：和count(*)基本没区别，pg92之前都是扫描全表的，pg92之后增加了index only scan一般会变成扫主键索引，如果没有主键或者是表的列很多的情况下，count(1)快一些，因为不会考虑表的全部字段

3、count(field)：返回数据表中指定字段值不等于null的行数

拓展：理解 PostgreSQL 的 count 函数的行为

关于 count 函数的使用一直存在争议，尤其是在 MySQL 中，作为流行度越来越高的 PostgreSQL 是否也有类似的问题呢，我们通过实践来理解一下 PostgreSQL 中 count 函数的行为。

构建测试数据库

创建测试数据库，并创建测试表。测试表中有自增 ID、创建时间、内容三个字段，自增 ID 字段是主键。

create database performance_test;

create table test_tbl (id serial primary key, created_at timestamp, content varchar(512));

生成测试数据

使用 generate_series 函数生成自增 ID，使用 now() 函数生成 created_at 列，对于 content 列，使用了 repeat(md5(random()::text), 10) 生成 10 个 32 位长度的 md5 字符串。使用下列语句，插入 1000w 条记录用于测试。

performance_test=# insert into test_tbl select generate_series(1,10000000),now(),repeat(md5(random()::text),10); INSERT 0 10000000 Time: 212184.223 ms (03:32.184)

由 count 语句引发的思考

默认情况下 PostgreSQL 不开启 SQL 执行时间的显示，所以需要手动开启一下，方便后面的测试对比。

\timing on

count(*) 和 count(1) 的性能区别是经常被讨论的问题，分别使用 count(*) 和 count(1) 执行一次查询。

performance_test=# select count(*) from test_tbl;
 count
----------
 10000000
(1 row)

Time: 115090.380 ms (01:55.090)

performance_test=# select count(1) from test_tbl;
 count
----------
 10000000
(1 row)

Time: 738.502 ms

可以看到两次查询的速度差别非常大，count(1) 真的有这么大的性能提升？接下来再次运行查询语句。

performance_test=# select count(*) from test_tbl;
 count
----------
 10000000
(1 row)

Time: 657.831 ms

performance_test=# select count(1) from test_tbl;
 count
----------
 10000000
(1 row)

Time: 682.157 ms

可以看到第一次查询时候会非常的慢，后面三次速度非常快并且时间相近，这里就有两个问题出现了：

为什么第一次查询速度这么慢？

count(*) 和 count(1) 到底存不存在性能差别？

查询缓存

使用 explain 语句重新执行查询语句

explain (analyze,buffers,verbose) select count(*) from test_tbl;

可以看到如下输出：

Finalize Aggregate (cost=529273.69..529273.70 rows=1 width=8) (actual time=882.569..882.570 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=96 read=476095
  -> Gather (cost=529273.48..529273.69 rows=2 width=8) (actual time=882.492..884.170 rows=3 loops=1)
     Output: (PARTIAL count(*))
     Workers Planned: 2
     Workers Launched: 2
     Buffers: shared hit=96 read=476095
     -> Partial Aggregate (cost=528273.48..528273.49 rows=1 width=8) (actual time=881.014..881.014 rows=1 loops=3)
        Output: PARTIAL count(*)
        Buffers: shared hit=96 read=476095
        Worker 0: actual time=880.319..880.319 rows=1 loops=1
         Buffers: shared hit=34 read=158206
        Worker 1: actual time=880.369..880.369 rows=1 loops=1
         Buffers: shared hit=29 read=156424
        -> Parallel Seq Scan on public.test_tbl (cost=0.00..517856.98 rows=4166598 width=0) (actual time=0.029..662.165 rows=3333333 loops=3)
           Buffers: shared hit=96 read=476095
           Worker 0: actual time=0.026..661.807 rows=3323029 loops=1
            Buffers: shared hit=34 read=158206
           Worker 1: actual time=0.030..660.197 rows=3285513 loops=1
            Buffers: shared hit=29 read=156424
 Planning time: 0.043 ms
 Execution time: 884.207 ms

注意里面的 shared hit，表示命中了内存中缓存的数据，这就可以解释为什么后面的查询会比第一次快很多。接下来去掉缓存，并重启 PostgreSQL。

service postgresql stop
echo 1 > /proc/sys/vm/drop_caches
service postgresql start

重新执行 SQL 语句，速度慢了很多。

 Finalize Aggregate (cost=529273.69..529273.70 rows=1 width=8) (actual time=50604.564..50604.564 rows=1 loops=1)
  Output: count(*)
  Buffers: shared read=476191
  -> Gather (cost=529273.48..529273.69 rows=2 width=8) (actual time=50604.508..50606.141 rows=3 loops=1)
     Output: (PARTIAL count(*))
     Workers Planned: 2
     Workers Launched: 2
     Buffers: shared read=476191
     -> Partial Aggregate (cost=528273.48..528273.49 rows=1 width=8) (actual time=50591.550..50591.551 rows=1 loops=3)
        Output: PARTIAL count(*)
        Buffers: shared read=476191
        Worker 0: actual time=50585.182..50585.182 rows=1 loops=1
         Buffers: shared read=158122
        Worker 1: actual time=50585.181..50585.181 rows=1 loops=1
         Buffers: shared read=161123
        -> Parallel Seq Scan on public.test_tbl (cost=0.00..517856.98 rows=4166598 width=0) (actual time=92.491..50369.691 rows=3333333 loops=3)
           Buffers: shared read=476191
           Worker 0: actual time=122.170..50362.271 rows=3320562 loops=1
            Buffers: shared read=158122
           Worker 1: actual time=14.020..50359.733 rows=3383583 loops=1
            Buffers: shared read=161123
 Planning time: 11.537 ms
 Execution time: 50606.215 ms

shared read 表示没有命中缓存，通过这个现象可以推断出，上一小节的四次查询中，第一次查询没有命中缓存，剩下三次查询都命中了缓存。

count(1) 和 count(*) 的区别

接下来探究 count(1) 和 count(*) 的区别是什么，继续思考最开始的四次查询，第一次查询使用了 count(*)，第二次查询使用了 count(1) ，却依然命中了缓存，不正是说明 count(1) 和 count(*) 是一样的吗？

事实上，PostgreSQL 官方对于 is there a difference performance-wise between select count(1) and select count(*)? 问题的回复也证实了这一点：

Nope. In fact, the latter is converted to the former during parsing.[2]

既然 count(1) 在性能上没有比 count(*) 更好，那么使用 count(*) 就是更好的选择。

sequence scan 和 index scan

接下来测试一下，在不同数据量大小的情况下 count(*) 的速度，将查询语句写在 count.sql 文件中，使用 pgbench 进行测试。

pgbench -c 5 -t 20 performance_test -r -f count.sql

分别测试 200w - 1000w 数据量下的 count 语句耗时

数据大小	count耗时(ms)
200w	738.758
300w	1035.846
400w	1426.183
500w	1799.866
600w	2117.247
700w	2514.691
800w	2526.441
900w	2568.240
1000w	2650.434

绘制成耗时曲线

曲线的趋势在 600w - 700w 数据量之间出现了转折，200w - 600w 是线性增长，600w 之后 count 的耗时就基本相同了。使用 explain 语句分别查看 600w 和 700w 数据时的 count 语句执行。

700w：

Finalize Aggregate (cost=502185.93..502185.94 rows=1 width=8) (actual time=894.361..894.361 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=16344 read=352463
  -> Gather (cost=502185.72..502185.93 rows=2 width=8) (actual time=894.232..899.763 rows=3 loops=1)
     Output: (PARTIAL count(*))
     Workers Planned: 2
     Workers Launched: 2
     Buffers: shared hit=16344 read=352463
     -> Partial Aggregate (cost=501185.72..501185.73 rows=1 width=8) (actual time=889.371..889.371 rows=1 loops=3)
        Output: PARTIAL count(*)
        Buffers: shared hit=16344 read=352463
        Worker 0: actual time=887.112..887.112 rows=1 loops=1
         Buffers: shared hit=5459 read=118070
        Worker 1: actual time=887.120..887.120 rows=1 loops=1
         Buffers: shared hit=5601 read=117051
        -> Parallel Index Only Scan using test_tbl_pkey on public.test_tbl (cost=0.43..493863.32 rows=2928960 width=0) (actual time=0.112..736.376 rows=2333333 loops=3)
           Index Cond: (test_tbl.id < 7000000)
           Heap Fetches: 2328492
           Buffers: shared hit=16344 read=352463
           Worker 0: actual time=0.107..737.180 rows=2344479 loops=1
            Buffers: shared hit=5459 read=118070
           Worker 1: actual time=0.133..737.960 rows=2327028 loops=1
            Buffers: shared hit=5601 read=117051
 Planning time: 0.165 ms
 Execution time: 899.857 ms

600w：

Finalize Aggregate (cost=429990.94..429990.95 rows=1 width=8) (actual time=765.575..765.575 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=13999 read=302112
  -> Gather (cost=429990.72..429990.93 rows=2 width=8) (actual time=765.557..770.889 rows=3 loops=1)
     Output: (PARTIAL count(*))
     Workers Planned: 2
     Workers Launched: 2
     Buffers: shared hit=13999 read=302112
     -> Partial Aggregate (cost=428990.72..428990.73 rows=1 width=8) (actual time=763.821..763.821 rows=1 loops=3)
        Output: PARTIAL count(*)
        Buffers: shared hit=13999 read=302112
        Worker 0: actual time=762.742..762.742 rows=1 loops=1
         Buffers: shared hit=4638 read=98875
        Worker 1: actual time=763.308..763.308 rows=1 loops=1
         Buffers: shared hit=4696 read=101570
        -> Parallel Index Only Scan using test_tbl_pkey on public.test_tbl (cost=0.43..422723.16 rows=2507026 width=0) (actual time=0.053..632.199 rows=2000000 loops=3)
           Index Cond: (test_tbl.id < 6000000)
           Heap Fetches: 2018490
           Buffers: shared hit=13999 read=302112
           Worker 0: actual time=0.059..633.156 rows=1964483 loops=1
            Buffers: shared hit=4638 read=98875
           Worker 1: actual time=0.038..634.271 rows=2017026 loops=1
            Buffers: shared hit=4696 read=101570
 Planning time: 0.055 ms
 Execution time: 770.921 ms

根据以上现象推断，PostgreSQL 似乎在 count 的数据量小于数据表长度的某一比例时，才使用 index scan，通过查看官方 wiki 也可以看到相关描述：

It is important to realise that the planner is concerned with minimising the total cost of the query. With databases, the cost of I/O typically dominates. For that reason, "count(*) without any predicate" queries will only use an index-only scan if the index is significantly smaller than its table. This typically only happens when the table's row width is much wider than some indexes'.[3]

根据 Stackoverflow 上的回答，count 语句查询的数量大于表大小的 3/4 时候就会用使用全表扫描代替索引扫描[4]。

结论

不要用 count(1) 或 count(列名) 代替 count(*)

count 本身是非常耗时的

count 可能是 index scan 也可能是 sequence scan，取决于 count 数量占表大小的比例

以上为个人经验，希望能给大家一个参考，也希望大家多多支持我们。如有错误或未考虑完全的地方，望不吝赐教。

Postgresql 通过出生日期获取年龄的操作

三个基础的时间表示函数 CURRENT_DATE/CURRENT_TIME/NOW() SELECT CURRENT_DATE ; 返回当前日期以年-月-日(yyyy-MM-dd)的形式: 2019-01-10 SELECT CURRENT_TIME; 返回当日时间以时:分:秒+时区(HH:mm:ss )的形式: 17:49:11.585308+08 SELECT NOW(); 返回当前时间以年-月-日时:分:秒(yyyy-MM-dd HH:mm:ss)的形式: 2019-01-10
postgresql中时间转换和加减操作

昨天遇到一个floor向下取整的问题,现在将它记录下来. 首先floor是需要一个int或者dp.那么我们日期加减转换,timestamp 转data,是需要先将其转成char,再由char转date,再相减,得出一个符合floor函数要求的值. 代码如下: FLOOR((to_date(to_char(current_timestamp, 'yyyy-MM-dd hh:mi:ss'), 'yyyy-mm-dd') - to_date(to_char(ca.birthday, 'yyyy-MM-
postgresql 获取两个时间类型小时差值案例

我就废话不多说了,看代码吧~ select extract(day from t) * 24 + extract(hour from t) from (select (timestamp '2013-12-14 12:00:00' - timestamp '2013-12-11 4:00:00') as t) as a; 例如a表中有start,end俩字段 select extract(day from (end-start)) * 24 + extract(hour from (end-st
postgresql通过索引优化查询速度操作

当数据量比较大的时候,提升查询效率就是需要去考虑的事情了.一个百万级别的表格,如果不做任何优化的话,即使是最简单的查询语句执行起来也是慢的让人难以接受:当然"优化"本身是一个比较复杂的工程,从设计表.字段到查询语句的写法都有很多讲究,这里只考虑索引的方式,且是最普通的索引: 下面的操作中对应数据库表w008_execrise_info(8000数据量), w008_wf02_info(4000数据量) 1 任务表数据 SELECT w.* FROM w008_wf02_info w W
postgresql的now()与Oracle的sysdate区别说明

postgresql的now()为当前事务开始时间, 而Oracle的sysdate是当前时间. 区别在于事务. postgresql中的now(): postgres=# begin ; BEGIN postgres=# select now(); now ------------------------------- 2017-03-31 14:28:32.403869+08 (1 row) postgres=# select now(); now ---------------------
详解PostgreSql数据库对象信息及应用

PostgreSql数据库对象主要有数据库.表.视图.索引.schema.函数.触发器等.PostgreSql提供了information_schema schema,其中包括返回数据库对象的视图.如用户有访问权限,可以也在pg_catalog schema中查询表.视图等对象. 1. 查询数据库对象下面通过示例分别展示如何查询各种数据库对象. 1.1 表查询 PostgreSql 表信息可以从information_schema.tables 或 pg_catalog.pg_tables 视
postgreSQL 非count方法算记录数操作

一般方法 select count(1) from table_name; 全量扫描一遍表,记录越多,查询速度越慢新法 PostgreSQL 还真提供了一个这样的途径,那就是系统表 pg_class,这个系统表里头,存储着每个表的统计信息,其中 reltuples 就是对应的表的统计行,统计行的数据是pg有个独立进程,定期扫描不同的表,收集这些表的统计信息,保存在系统表里头. 方法如下: select reltuples::int as total from pg_class where re
Abp.NHibernate连接PostgreSQl数据库的方法

Abp.NHibernate动态库连接PostgreSQl数据库,供大家参考,具体内容如下初次接触Abp框架,其框架中封装的操作各类数据的方法还是很好用的,本人还在进一步的学习当中,并将利用abp.NHibernate类库操作PostgreSQL数据的相关方法做一记录,不足之处让评论指点扔砖. 话不多说,直接开干: 1.vs 新建一个项目,(窗体或者控制台程序或者测试程序) 2.NuGet 获取类库(adp.NHibernate) 还需安装一个pgSQl 对应的驱动 3.新建一个继承AbpMo
Python3开发实例之非关系型图数据库Neo4j安装方法及Python3连接操作Neo4j方法实例

非关系型图数据库Neo4j简介 Neo4j是现今最火爆的图数据.在2010年发布,产品的发展势头还算不错. 作为图数据库,Neo4j最大的特点是关系数据的存储. 图数据库除了能够像普通的数据库一样存储一行一行的数据之外,还可以很方便的看出存储数据之间的关系信息. 适合存储"修改较少,查询较多,没有超大节点"的图数据. 图数据库Neo4j应用场景社交网络根据用户与其他用户的关系为用户推荐新的朋友.例如,在QQ中给你推荐朋友的朋友 . 智能推荐引擎通过分析用户有哪些朋友.用户朋友喜好
PHP+MySQL统计该库中每个表的记录数并按递减顺序排列的方法

本文实例讲述了PHP+MySQL统计该库中每个表的记录数并按递减顺序排列的方法.分享给大家供大家参考,具体如下: 这是一段简单的代码,可实现统计该数据库中每个表的记录数,并按递减顺序排列的功能 $host = '127.0.0.1'; $port = 3306; $dbname = 'test'; $username = 'root'; $password = ''; function ee($p) { if(PHP_SAPI == 'cli') { echo "\n"; }else{
php连接与操作PostgreSQL数据库的方法

本文实例讲述了php连接与操作PostgreSQL数据库的方法.分享给大家供大家参考. 具体实现方法如下: 复制代码代码如下: $pg=@pg_connect("host=localhost user=postgres password=sa dbname=employes") or die("can't connect to database."); $query="select * from employes order by serial_no&q
PHP连接及操作PostgreSQL数据库的方法详解

本文实例讲述了PHP连接及操作PostgreSQL数据库的方法.分享给大家供大家参考,具体如下: PostgreSQL扩展在默认情况下在最新版本的PHP 5.3.x中是启用的. 可以在编译时使用--without-pgsql来禁用它.仍然可以使用yum命令来安装PHP-PostgreSQL接口: yum install php-pgsql 在开始使用PHP连接PostgreSQL接口之前,请先在PostgreSQL安装目录中找到pg_hba.conf文件,并添加以下行: # IPv4 local
SpringBoot使用AOP记录接口操作日志的方法

目录一.操作日志简介 1.1.系统日志和操作日志的区别 1.2.操作日志记录实现方式二.AOP面向切面编程 2.1.AOP简介 2.2.AOP作用 2.3.AOP相关术语 2.4.JointPoint和ProceedingJoinPoint 2.5.AOP相关注解三.AOP切面实现接口日志记录 3.1.引入AOP依赖 3.2.创建日志信息封装类WebLog 3.3.创建切面类WebLogAspect 3.4.调用接口进行测试四.AOP切面+自定义注解实现接口日志记录 4.1.自定义日志注
MySQL数据库查看数据表占用空间大小和记录数的方法

如果想知道MySQL数据库中每个表占用的空间.表记录的行数的话,可以打开MySQL的 information_schema 数据库.在该库中有一个 TABLES 表,这个表主要字段分别是: TABLE_SCHEMA : 数据库名 TABLE_NAME:表名 ENGINE:所使用的存储引擎 TABLES_ROWS:记录数 DATA_LENGTH:数据大小 INDEX_LENGTH:索引大小其他字段请参考MySQL的手册,这几个字段对我们来说最有用. 一个表占用空间的大小,相当于是数据大小 +
一文解答为什么MySQL的count()方法这么慢

目录前言 count()的原理各种count()方法的原理允许粗略估计行数的场景必须精确估计行数的场景总结前言 mysql用count方法查全表数据,在不同的存储引擎里实现不同,myisam有专门字段记录全表的行数,直接读这个字段就好了.而innodb则需要一行行去算. 比如说,你有一张短信表(sms),里面放了各种需要发送的短信信息. sms建表sql: sms表; 需要注意的是state字段,为0的时候说明这时候短信还未发送. 此时还会有一个异步线程不断的捞起未发送(state=
MySQL 如何限制一张表的记录数

目录一.触发器解决方案二.分区表解决方案三.通用表空间解决方案关于MySQL 如何限制一张表的记录数,这没有一个简化的答案,比如执行一条命令或者说简单设置一个参数都不能完美解决.接下来我给出一些可选解决方案. 对数据库来讲,一般问题的解决方案无非有两种,一种是在应用端:另外一种是在数据库端. 首先是在数据库端(假设表硬性限制为1W条记录): 一.触发器解决方案触发器的思路很简单,每次插入新记录前,检查表记录数是否到达限定数量,数量未到,继续插入:数量达到,先插入一条新记录,再删除最老的