教你怎么用python删除相似度高的图片

2025-02-23 15:06:14

1. 前言

因为输入是视频，切完帧之后都是连续图片，所以我的目录结构如下：

其中frame_output是视频切帧后的保存路径，1和2文件夹分别对应两个是视频切帧后的图片。

2. 切帧代码如下：

#encoding:utf-8
import os
import sys
import cv2

video_path = '/home/pythonfile/video/'  # 绝对路径，video下有两段视频
out_frame_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'frame_output')  #frame_output是视频切帧后的保存路径
if not os.path.exists(out_frame_path):
    os.makedirs(out_frame_path)
print('out_frame_path', out_frame_path)
files = []
list1 = os.listdir(video_path)
print('list', list1)
for i in range(len(list1)):
    item = os.path.join(video_path, list1[i])
    files.append(item)
print('files',files)
for k,file in enumerate(files):
    frame_dir = os.path.join(out_frame_path, '%d'%(k+1))
    if not os.path.exists(frame_dir):
        os.makedirs(frame_dir)
    cap = cv2.VideoCapture(file)
    j = 0
    print('start prossing NO.%d video' % (k + 1))
    while True:
        ret, frame = cap.read()
        j += 1
        if ret:
        #每三帧保存一张
            if j % 3 == 0:
                cv2.imwrite(os.path.join(frame_dir, '%d.jpg'%j), frame)
        else:
            cap.release()
            break
    print('prossed NO.%d video'%(k+1))

3. 删除相似度高的图片

# coding: utf-8
import os
import cv2
# from skimage.measure import compare_ssim
# from skimage.metrics import _structural_similarity
from skimage.metrics import structural_similarity as ssim

def delete(filename1):
    os.remove(filename1)

def list_all_files(root):
    files = []
    list = os.listdir(root)
    # os.listdir()方法：返回指定文件夹包含的文件或子文件夹名字的列表。该列表顺序以字母排序
    for i in range(len(list)):
        element = os.path.join(root, list[i])
        # 需要先使用python路径拼接os.path.join()函数，将os.listdir()返回的名称拼接成文件或目录的绝对路径再传入os.path.isdir()和os.path.isfile().
        if os.path.isdir(element):  # os.path.isdir()用于判断某一对象(需提供绝对路径)是否为目录
            # temp_dir = os.path.split(element)[-1]
            # os.path.split分割文件名与路径,分割为data_dir和此路径下的文件名，[-1]表示只取data_dir下的文件名
            files.append(list_all_files(element))

        elif os.path.isfile(element):
            files.append(element)
    # print('2',files)
    return files

def ssim_compare(img_files):
    count = 0
    for currIndex, filename in enumerate(img_files):
        if not os.path.exists(img_files[currIndex]):
            print('not exist', img_files[currIndex])
            break
        img = cv2.imread(img_files[currIndex])
        img1 = cv2.imread(img_files[currIndex + 1])
        #进行结构性相似度判断
        # ssim_value = _structural_similarity.structural_similarity(img,img1,multichannel=True)
        ssim_value = ssim(img,img1,multichannel=True)
        if ssim_value > 0.9:
            #基数
            count += 1
            imgs_n.append(img_files[currIndex + 1])
            print('big_ssim:',img_files[currIndex], img_files[currIndex + 1], ssim_value)
        # 避免数组越界
        if currIndex+1 >= len(img_files)-1:
            break
    return count

if __name__ == '__main__':
    path = '/home/dj/pythonfile/frame_output/'

    img_path = path
    imgs_n = []

    all_files = list_all_files(path) #返回包含完整路径的所有图片名的列表
    print('1',len(all_files))

    for files in all_files:
        # 根据文件名排序，x.rfind('/')是从右边寻找第一个‘/'出现的位置，也就是最后出现的位置
        # 注意sort和sorted的区别，sort作用于原列表，sorted生成新的列表，且sorted可以作用于所有可迭代对象
        files.sort(key = lambda x: int(x[x.rfind('/')+1:-4]))#路径中包含“/”
        # print(files)
        img_files = []
        for img in files:
            if img.endswith('.jpg'):
                # 将所有图片名都放入列表中
                img_files.append(img)
        count = ssim_compare(img_files)
        print(img[:img.rfind('/')],"路径下删除的图片数量为：",count)
    for image in imgs_n:
        delete(image)

4. 导入skimage.measure import compare_ssim出错的解决方法：

将

from skimage.measure import compare_ssim

改为

from skimage.metrics import _structural_similarity

5. structural_similarity.py的源码

from warnings import warn
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

from ..util.dtype import dtype_range
from ..util.arraycrop import crop
from .._shared.utils import warn, check_shape_equality

__all__ = ['structural_similarity']

def structural_similarity(im1, im2,
                          *,
                          win_size=None, gradient=False, data_range=None,
                          multichannel=False, gaussian_weights=False,
                          full=False, **kwargs):
    """
    Compute the mean structural similarity index between two images.

    Parameters
    ----------
    im1, im2 : ndarray
        Images. Any dimensionality with same shape.
    win_size : int or None, optional
        The side-length of the sliding window used in comparison. Must be an
        odd value. If `gaussian_weights` is True, this is ignored and the
        window size will depend on `sigma`.
    gradient : bool, optional
        If True, also return the gradient with respect to im2.
    data_range : float, optional
        The data range of the input image (distance between minimum and
        maximum possible values). By default, this is estimated from the image
        data-type.
    multichannel : bool, optional
        If True, treat the last dimension of the array as channels. Similarity
        calculations are done independently for each channel then averaged.
    gaussian_weights : bool, optional
        If True, each patch has its mean and variance spatially weighted by a
        normalized Gaussian kernel of width sigma=1.5.
    full : bool, optional
        If True, also return the full structural similarity image.

    Other Parameters
    ----------------
    use_sample_covariance : bool
        If True, normalize covariances by N-1 rather than, N where N is the
        number of pixels within the sliding window.
    K1 : float
        Algorithm parameter, K1 (small constant, see [1]_).
    K2 : float
        Algorithm parameter, K2 (small constant, see [1]_).
    sigma : float
        Standard deviation for the Gaussian when `gaussian_weights` is True.

    Returns
    -------
    mssim : float
        The mean structural similarity index over the image.
    grad : ndarray
        The gradient of the structural similarity between im1 and im2 [2]_.
        This is only returned if `gradient` is set to True.
    S : ndarray
        The full SSIM image.  This is only returned if `full` is set to True.

    Notes
    -----
    To match the implementation of Wang et. al. [1]_, set `gaussian_weights`
    to True, `sigma` to 1.5, and `use_sample_covariance` to False.

    .. versionchanged:: 0.16
        This function was renamed from ``skimage.measure.compare_ssim`` to
        ``skimage.metrics.structural_similarity``.

    References
    ----------
    .. [1] Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P.
       (2004). Image quality assessment: From error visibility to
       structural similarity. IEEE Transactions on Image Processing,
       13, 600-612.
       https://ece.uwaterloo.ca/~z70wang/publications/ssim.pdf,
       :DOI:`10.1109/TIP.2003.819861`

    .. [2] Avanaki, A. N. (2009). Exact global histogram specification
       optimized for structural similarity. Optical Review, 16, 613-621.
       :arxiv:`0901.0065`
       :DOI:`10.1007/s10043-009-0119-z`

    """
    check_shape_equality(im1, im2)

    if multichannel:
        # loop over channels
        args = dict(win_size=win_size,
                    gradient=gradient,
                    data_range=data_range,
                    multichannel=False,
                    gaussian_weights=gaussian_weights,
                    full=full)
        args.update(kwargs)
        nch = im1.shape[-1]
        mssim = np.empty(nch)
        if gradient:
            G = np.empty(im1.shape)
        if full:
            S = np.empty(im1.shape)
        for ch in range(nch):
            ch_result = structural_similarity(im1[..., ch],
                                              im2[..., ch], **args)
            if gradient and full:
                mssim[..., ch], G[..., ch], S[..., ch] = ch_result
            elif gradient:
                mssim[..., ch], G[..., ch] = ch_result
            elif full:
                mssim[..., ch], S[..., ch] = ch_result
            else:
                mssim[..., ch] = ch_result
        mssim = mssim.mean()
        if gradient and full:
            return mssim, G, S
        elif gradient:
            return mssim, G
        elif full:
            return mssim, S
        else:
            return mssim

    K1 = kwargs.pop('K1', 0.01)
    K2 = kwargs.pop('K2', 0.03)
    sigma = kwargs.pop('sigma', 1.5)
    if K1 < 0:
        raise ValueError("K1 must be positive")
    if K2 < 0:
        raise ValueError("K2 must be positive")
    if sigma < 0:
        raise ValueError("sigma must be positive")
    use_sample_covariance = kwargs.pop('use_sample_covariance', True)

    if gaussian_weights:
        # Set to give an 11-tap filter with the default sigma of 1.5 to match
        # Wang et. al. 2004.
        truncate = 3.5

    if win_size is None:
        if gaussian_weights:
            # set win_size used by crop to match the filter size
            r = int(truncate * sigma + 0.5)  # radius as in ndimage
            win_size = 2 * r + 1
        else:
            win_size = 7   # backwards compatibility

    if np.any((np.asarray(im1.shape) - win_size) < 0):
        raise ValueError(
            "win_size exceeds image extent.  If the input is a multichannel "
            "(color) image, set multichannel=True.")

    if not (win_size % 2 == 1):
        raise ValueError('Window size must be odd.')

    if data_range is None:
        if im1.dtype != im2.dtype:
            warn("Inputs have mismatched dtype.  Setting data_range based on "
                 "im1.dtype.", stacklevel=2)
        dmin, dmax = dtype_range[im1.dtype.type]
        data_range = dmax - dmin

    ndim = im1.ndim

    if gaussian_weights:
        filter_func = gaussian_filter
        filter_args = {'sigma': sigma, 'truncate': truncate}
    else:
        filter_func = uniform_filter
        filter_args = {'size': win_size}

    # ndimage filters need floating point data
    im1 = im1.astype(np.float64)
    im2 = im2.astype(np.float64)

    NP = win_size ** ndim

    # filter has already normalized by NP
    if use_sample_covariance:
        cov_norm = NP / (NP - 1)  # sample covariance
    else:
        cov_norm = 1.0  # population covariance to match Wang et. al. 2004

    # compute (weighted) means
    ux = filter_func(im1, **filter_args)
    uy = filter_func(im2, **filter_args)

    # compute (weighted) variances and covariances
    uxx = filter_func(im1 * im1, **filter_args)
    uyy = filter_func(im2 * im2, **filter_args)
    uxy = filter_func(im1 * im2, **filter_args)
    vx = cov_norm * (uxx - ux * ux)
    vy = cov_norm * (uyy - uy * uy)
    vxy = cov_norm * (uxy - ux * uy)

    R = data_range
    C1 = (K1 * R) ** 2
    C2 = (K2 * R) ** 2

    A1, A2, B1, B2 = ((2 * ux * uy + C1,
                       2 * vxy + C2,
                       ux ** 2 + uy ** 2 + C1,
                       vx + vy + C2))
    D = B1 * B2
    S = (A1 * A2) / D

    # to avoid edge effects will ignore filter radius strip around edges
    pad = (win_size - 1) // 2

    # compute (weighted) mean of ssim
    mssim = crop(S, pad).mean()

    if gradient:
        # The following is Eqs. 7-8 of Avanaki 2009.
        grad = filter_func(A1 / D, **filter_args) * im1
        grad += filter_func(-S / B2, **filter_args) * im2
        grad += filter_func((ux * (A2 - A1) - uy * (B2 - B1) * S) / D,
                            **filter_args)
        grad *= (2 / im1.size)

        if full:
            return mssim, grad, S
        else:
            return mssim, grad
    else:
        if full:
            return mssim, S
        else:
            return mssim

到此这篇关于教你怎么用python删除相似度高的图片的文章就介绍到这了,更多相关python删除相似度高的图片内容请搜索我们以前的文章或继续浏览下面的相关文章希望大家以后多多支持我们！

python判断图片宽度和高度后删除图片的方法

本文实例讲述了python判断图片宽度和高度后删除图片的方法.分享给大家供大家参考.具体分析如下: Image对象有open方法却没有close方法,如果打开图片,判断图片高度和宽度,判断完成后希望删除或者给图片改名,是无法操作的,这段代码可以解决这个问题,注意open函数打开图片文件要使用二进制方式,及参数使用'rb',有的文章给出的只有个'r'参数,Image是无法open的 import os import Image fileName = 'c:/py/jb51.jpg' fp = op
用python删除文件夹中的重复图片(图片去重)

第一部分:判断两张图片是否相同要查找重复的图片,必然绕不开判断两张图片是否相同.判断两张图片简单呀!图片可以看成数组,比较两个数组是否相等不就行了.但是这样做太过简单粗暴,因为两个数组的每个元素都要一一比较,效率很低.为了尽量避免两个庞大的数组比较: 先进行两张图片的大小(byte)比较,若大小不相同,则两张图片不相同: 在两张图片的大小相同的前提下,进行两张图片的尺寸(长和宽)比较,若尺寸不相同,则两张不相同: 在两张图片的尺寸相同的前提下,进行两张图片的内容(即数组元素)比较,若内容不相同
使用python如何删除同一文件夹下相似的图片

前言最近整理图片发现,好多图片都非常相似,于是写如下代码去删除,有两种方法: 注:第一种方法只对于连续图片(例一个视频里截下的图片)准确率也较高,其效率高:第二种方法准确率高,但效率低方法一:相邻两个文件比较相似度,相似就把第二个加到新列表里,然后进行新列表去重,统一删除. 例如:有文件1-10,首先1和2相比较,若相似,则把2加入到新列表里,再接着2和3相比较,若不相似,则继续进行3和4比较-一直比到最后,然后删除新列表里的图片代码如下: #!/usr/bin/env python #
使用python opencv对目录下图片进行去重的方法

版本: 平台:ubuntu 14 / I5 / 4G内存 python版本:python2.7 opencv版本:2.13.4 依赖: 如果系统没有python,则需要进行安装 sudo apt-get install python sudo apt-get install python-dev sudo apt-get install python-pip sudo pip install numpy mathplotlib sudo apt-get install libcv-dev sud
python删除文件夹下相同文件和无法打开的图片

前天不小心把硬盘格式化了,丢了好多照片,后来用Recuva这款软件成功把文件恢复过来,可是恢复的文件中有好多重复的文件和无法打开的图片,所以写了两个python的小程序用来解决这个问题删除相同文件: #coding=utf-8 import os import os.path import Image import hashlib def get_md5(filename): m = hashlib.md5() mfile = open(filename, "rb") m.updat
Python实现的删除重复文件或图片功能示例【去重】

本文实例讲述了Python实现的删除重复文件或图片功能.分享给大家供大家参考,具体如下: 通过python爬虫或其他方式保存的图片文件通常包含一些重复的图片或文件, 通过下面的python代码可以将重复的文件删除以达到去重的目的.其中,文件目录结构如下图: # /usr/bin/env python # -*- coding:utf-8 -*- # 运行的代码文件要放到删除重复的文件或图片所包含的目录中 import os import hashlib def filecount(): file
python查找重复图片并删除（图片去重）

本文实例为大家分享了python查找重复图片并删除的具体代码,供大家参考,具体内容如下和网络爬虫配套的,也可单独使用,从网上爬下来的图片重复太多,代码支持识别不同尺寸大小一致的图片,并把重复的图片删除,只保留第一份. # -*- coding: utf-8 -*- import cv2 import numpy as np import os,sys,types def cmpandremove2(path): dirs = os.listdir(path) dirs.sort() if le
教你怎么用python删除相似度高的图片

1. 前言因为输入是视频,切完帧之后都是连续图片,所以我的目录结构如下: 其中frame_output是视频切帧后的保存路径,1和2文件夹分别对应两个是视频切帧后的图片. 2. 切帧代码如下: #encoding:utf-8 import os import sys import cv2 video_path = '/home/pythonfile/video/' # 绝对路径,video下有两段视频 out_frame_path = os.path.join(os.path.dirname(
基于python利用Pyecharts使高清图片导出并在PPT中动态展示

目录 1.前言 2.导出png格式图片 3.如何在PPT中展示pyecharts图片 1.前言 pyecharts 是一个用于生成 Echarts 图表的类库.Echarts 是百度开源的一个数据可视化 JS 库.用 Echarts 生成的图可视化效果非常棒,为了与 Python 进行对接,方便在 Python 中直接使用数据生成图”.pyecharts可以展示动态图,在线报告使用比较美观,并且展示数据方便,鼠标悬停在图上,即可显示数值.标签等.pyecharts画出的图很好看,但是怎么展示是个
python实现知乎高颜值图片爬取

导入相关包 import time import pydash import base64 import requests from lxml import etree from aip import AipFace from pathlib import Path 百度云人脸检测申请信息 #唯一必须填的信息就这三行 APP_ID = "xxxxxxxx" API_KEY = "xxxxxxxxxxxxxxxx" SECRET_KEY = "xxxxx
Python利用pywin32库实现将PPT导出为高清图片

目录一.安装库二.代码原理三.使用效果四.所有代码一.安装库需要安装pywin32库 pip install pywin32 二.代码原理 WPS高清图片导出需要会员,就为了一个这个小需求开一个会员太亏了,因此就使用python对ppt进行高清图片导出. 设置format=19即可: ppt.SaveAs(imgs_path, 19) 三.使用效果输入一个文件路径 path = 'D:\\自动化\\课件.pptx' 最后的效果: 会在路径下创建一个‘’课件‘’文件夹里面是所有转换
教你学会通过python的matplotlib库绘图

一.前言 python的matplotlib库很强大可以绘制各种类型的图像. 首先要装一些基础的库,如numpy,matplotlib或是pandas. 二.基础命令首先介绍绘图时常用的基础命令: 1.plt.plot(x,y)即为绘图命令. ①基础画图: plt.plot(x, y) ②设置颜色: color属性如果没有特别要求的话可以不手动设置颜色,如果要在一张图上画不同的线时,会自动分配颜色.也可以使用ax.plot效果相同. plt.plot(x, y, color = 'red')
教你怎么用Python操作MySql数据库

一.关于Python操作数据库的概述 Python所有的数据库接口程序都在一定程度上遵守 Python DB-API 规范. DB-API定义了一系列必须的对象和数据库存取方式,以便为各种底层数据库系统和多种多样的数据库接口程序提供一致的访问接口.由于DB-API 为不同的数据库提供了一致的访问接口, 在不同的数据库之间移植代码成为一件轻松的事情. 在Python中如果要连接数据库,不管是MySQL.SQL Server.PostgreSQL亦或是SQLite,使用时都是采用游标的方式. 二.一
python删除指定类型（或非指定）的文件实例详解

本文实例分析了python删除指定类型(或非指定)的文件用法.分享给大家供大家参考.具体如下: 如下,删除目录下非源码文件 import os import string def del_files(dir,topdown=True): for root, dirs, files in os.walk(dir, topdown): for name in files: pathname = os.path.splitext(os.path.join(root, name)) if (pathna
Python删除空文件和空文件夹的方法

本文实例讲述了Python删除空文件和空文件夹的方法.分享给大家供大家参考.具体实现方法如下: #-*- coding:cp936 -*- """ os.walk() 函数声明:walk(top,topdown=True,onerror=None) 1>参数top表示需要遍历的目录树的路径 2>参数topdown的默认值是"True",表示首先返回目录树下的文件,然后在遍历目录树的子目录.Topdown的值为"False"时
python删除列表中重复记录的方法

本文实例讲述了python删除列表中重复记录的方法.分享给大家供大家参考.具体实现方法如下: def removeListDuplicates(seq): seen = set() seen_add = seen.add return [ x for x in seq if x not in seen and not seen_add(x) ] 希望本文所述对大家的Python程序设计有所帮助.
python删除特定文件的方法

本文实例讲述了python删除特定文件的方法.分享给大家供大家参考.具体如下: #!/usr/bin/python # -*- coding: utf-8 -*- import os def del_files(path): for root , dirs, files in os.walk(path): for name in files: if name.endswith(".CR2"): os.remove(os.path.join(root, name)) print (&qu

教你怎么用python删除相似度高的图片

1. 前言

2. 切帧代码如下：

3. 删除相似度高的图片

4. 导入skimage.measure import compare_ssim出错的解决方法：

5. structural_similarity.py的源码

相关推荐

随机推荐