Randomly Sampling Multiple Patches from an Image in Python

2025-06-22 22:58:45

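Many image tasks require cropping fixed-size patches from a large image for training. This typically raises a few issues:

Patch positions should be as random as possible; otherwise the data may lack diversity, which tends to cause overfitting.
If the source image is large, the I/O cost of reading it can be very high and slow down training, so it is best to crop multiple patches per read.
We usually do not want randomness to leave some regions of the image uncovered, so the coverage of the patch positions also needs attention.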

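Given these issues, the following strategy can be used to obtain multiple randomly positioned patches from an image:

Compute the top-left coordinates of all patches on a grid with a fixed stride.
Randomly jitter the top-left coordinates.
Add the patch width and height to each top-left coordinate to get the bottom-right coordinate.
Check whether each patch exceeds the image border; if it does, shift it back inside, keeping the patch size unchanged.
Support an ROI (Region Of Interest): patches need not be taken from the whole image and can instead be restricted to a specified ROI.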

The implementation and an example follow:

Note that the code below only computes the patch bounding boxes; it does not crop the patches themselves.

# -*- coding: utf-8 -*-
import cv2
import numpy as np

def get_random_patch_bboxes(image, bbox_size, stride, jitter, roi_bbox=None):
    """
    Generate random patch bounding boxes for an image around the ROI region

    Parameters
    ----------
    image: image data read by opencv, shape is [H, W, C]
    bbox_size: size of patch bbox, one digit or a list/tuple containing two
        digits, defined by (width, height)
    stride: stride between adjacent bboxes (before jitter), one digit or a
        list/tuple containing two digits, defined by (x, y)
    jitter: jitter size for evenly distributed bboxes, one digit or a
        list/tuple containing two digits, defined by (x, y)
    roi_bbox: roi region, defined by [xmin, ymin, xmax, ymax], default is whole
        image region

    Returns
    -------
    patch_bboxes: randomly distributed patch bounding boxes, n x 4 numpy array.
        Each bounding box is defined by [xmin, ymin, xmax, ymax]
    """
    height, width = image.shape[:2]
    bbox_size = _process_geometry_param(bbox_size, min_value=1)
    stride = _process_geometry_param(stride, min_value=1)
    jitter = _process_geometry_param(jitter, min_value=0)

    if bbox_size[0] > width or bbox_size[1] > height:
        raise ValueError('bbox_size must be <= image size')

    if roi_bbox is None:
        roi_bbox = [0, 0, width, height]

    # tl is for top-left, br is for bottom-right
    tl_x, tl_y = _get_top_left_points(roi_bbox, bbox_size, stride, jitter)
    br_x = tl_x + bbox_size[0]
    br_y = tl_y + bbox_size[1]

    # shrink bottom-right points to avoid exceeding image border
    br_x[br_x > width] = width
    br_y[br_y > height] = height
    # shrink top-left points to avoid exceeding image border
    tl_x = br_x - bbox_size[0]
    tl_y = br_y - bbox_size[1]
    tl_x[tl_x < 0] = 0
    tl_y[tl_y < 0] = 0
    # compute bottom-right points again
    br_x = tl_x + bbox_size[0]
    br_y = tl_y + bbox_size[1]

    patch_bboxes = np.concatenate((tl_x, tl_y, br_x, br_y), axis=1)
    return patch_bboxes

def _process_geometry_param(param, min_value):
    """
    Process and check param, which must be one digit or a list/tuple containing
    two digits, and its value must be >= min_value

    Parameters
    ----------
    param: parameter to be processed
    min_value: min value for param

    Returns
    -------
    param: param after processing
    """
    if isinstance(param, (int, float)) or \
            isinstance(param, np.ndarray) and param.size == 1:
        param = int(np.round(param))
        param = [param, param]
    else:
        if len(param) != 2:
            raise ValueError('param must be one digit or two digits')
        param = [int(np.round(param[0])), int(np.round(param[1]))]

    # check data range using min_value
    if not (param[0] >= min_value and param[1] >= min_value):
        raise ValueError('param must be >= min_value (%d)' % min_value)
    return param

def _get_top_left_points(roi_bbox, bbox_size, stride, jitter):
    """
    Generate top-left points for bounding boxes

    Parameters
    ----------
    roi_bbox: roi region, defined by [xmin, ymin, xmax, ymax]
    bbox_size: size of patch bbox, a list/tuple containing two digits, defined
        by (width, height)
    stride: stride between adjacent bboxes (before jitter), a list/tuple
        containing two digits, defined by (x, y)
    jitter: jitter size for evenly distributed bboxes, a list/tuple containing
        two digits, defined by (x, y)

    Returns
    -------
    tl_x: x coordinates of top-left points, n x 1 numpy array
    tl_y: y coordinates of top-left points, n x 1 numpy array
    """
    xmin, ymin, xmax, ymax = roi_bbox
    roi_width = xmax - xmin
    roi_height = ymax - ymin

    # get the offset between the first top-left point of patch box and the
    # top-left point of roi_bbox
    offset_x = np.arange(0, roi_width, stride[0])[-1] + bbox_size[0]
    offset_y = np.arange(0, roi_height, stride[1])[-1] + bbox_size[1]
    offset_x = (offset_x - roi_width) // 2
    offset_y = (offset_y - roi_height) // 2

    # get the coordinates of all top-left points
    tl_x = np.arange(xmin, xmax, stride[0]) - offset_x
    tl_y = np.arange(ymin, ymax, stride[1]) - offset_y
    tl_x, tl_y = np.meshgrid(tl_x, tl_y)
    tl_x = np.reshape(tl_x, [-1, 1])
    tl_y = np.reshape(tl_y, [-1, 1])

    # jitter the coordinates of all top-left points
    tl_x += np.random.randint(-jitter[0], jitter[0] + 1, size=tl_x.shape)
    tl_y += np.random.randint(-jitter[1], jitter[1] + 1, size=tl_y.shape)
    return tl_x, tl_y

if __name__ == '__main__':
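    image = cv2.imread('1.bmp')
    # cv2.imread returns None instead of raising when the file cannot be read
    if image is None:
        raise IOError('failed to read 1.bmp')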
    patch_bboxes = get_random_patch_bboxes(
        image,
        bbox_size=[64, 96],
        stride=[128, 128],
        jitter=[32, 32],
        roi_bbox=[500, 200, 1500, 800])

    colors = [
        (255, 0, 0),
        (0, 255, 0),
        (0, 0, 255),
        (255, 255, 0),
        (255, 0, 255),
        (0, 255, 255)]
    color_idx = 0

    for bbox in patch_bboxes:
        color_idx = color_idx % 6
        # cast to built-in int: some OpenCV versions reject NumPy integer scalars
        pt1 = (int(bbox[0]), int(bbox[1]))
        pt2 = (int(bbox[2]), int(bbox[3]))
        cv2.rectangle(image, pt1, pt2, color=colors[color_idx], thickness=2)
        color_idx += 1

    cv2.namedWindow('image', 0)
    cv2.imshow('image', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    cv2.imwrite('image.png', image)
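
Since the code above stops at bounding boxes, here is a minimal sketch of the cropping step. The helper name crop_patches and the synthetic demo_image are assumptions made for illustration and are not part of the original code; the only thing taken from it is the [xmin, ymin, xmax, ymax] layout of each row.

```python
import numpy as np


def crop_patches(image, patch_bboxes):
    """Crop patches given an n x 4 array of [xmin, ymin, xmax, ymax] rows.

    Hypothetical helper, not part of the original article's code.
    """
    patches = []
    for xmin, ymin, xmax, ymax in np.asarray(patch_bboxes, dtype=int):
        # Images are indexed [row, col], i.e. [y, x]; xmax and ymax are exclusive.
        # copy() detaches the patch from the (possibly large) source image.
        patches.append(image[ymin:ymax, xmin:xmax].copy())
    return patches


# Synthetic stand-in for an image read with cv2.imread (H=100, W=200, C=3)
demo_image = np.random.RandomState(0).randint(
    0, 256, size=(100, 200, 3)).astype(np.uint8)
demo_bboxes = np.array([[0, 0, 64, 96],
                        [136, 4, 200, 100]])
demo_patches = crop_patches(demo_image, demo_bboxes)
```

Because get_random_patch_bboxes keeps every box inside the image with its size unchanged, each crop should have shape (height, width, channels) of the requested bbox_size, so the patches can be stacked into one batch with np.stack if needed.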

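In practice, a few simple features can be added:

1. Filter by position. For example, discard patches that are too close to the image border. Some algorithms have fairly severe border effects, in which case we may not want border data in the training set.

2. Filter with a simple heuristic. In tasks such as super-resolution, we usually care little about large flat regions such as solid-color walls or large areas of sky; variance can be used to filter these out.

3. Cap the number of patches kept. Source images can differ greatly in size, so the number of patches obtained with this method varies accordingly. To keep the training data balanced, set a maximum number of patches to keep. To preserve coverage, shuffle the patches before selecting them, or compute the stride so that the grid yields roughly the desired count.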
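
The three extensions above could be sketched as post-processing steps on the n x 4 bounding-box array. Everything here is an assumption for illustration rather than the author's code: the function names (filter_by_margin, filter_by_variance, limit_count), the margin and variance thresholds, and the synthetic grayscale demo_image, whose left half is flat and right half is noisy.

```python
import numpy as np


def filter_by_margin(bboxes, image_size, margin):
    """Keep bboxes at least `margin` pixels away from every image border."""
    width, height = image_size
    keep = ((bboxes[:, 0] >= margin) & (bboxes[:, 1] >= margin) &
            (bboxes[:, 2] <= width - margin) &
            (bboxes[:, 3] <= height - margin))
    return bboxes[keep]


def filter_by_variance(image, bboxes, min_variance):
    """Drop bboxes whose pixel variance is below `min_variance` (flat areas)."""
    keep = [image[y0:y1, x0:x1].astype(np.float64).var() >= min_variance
            for x0, y0, x1, y1 in bboxes]
    return bboxes[np.array(keep, dtype=bool)]


def limit_count(bboxes, max_count, rng=None):
    """Shuffle first, then keep at most `max_count` bboxes."""
    rng = np.random.RandomState() if rng is None else rng
    order = rng.permutation(len(bboxes))
    return bboxes[order[:max_count]]


# Synthetic image: flat (all zeros) on the left, random noise on the right
demo_image = np.zeros((100, 200), dtype=np.uint8)
demo_image[:, 100:] = np.random.RandomState(0).randint(0, 256, size=(100, 100))
demo_bboxes = np.array([[0, 0, 32, 32],       # touches the border, flat
                        [20, 20, 52, 52],     # interior, flat
                        [120, 20, 152, 52],   # interior, textured
                        [150, 60, 182, 92]])  # interior, textured

inner = filter_by_margin(demo_bboxes, image_size=(200, 100), margin=8)
textured = filter_by_variance(demo_image, inner, min_variance=10.0)
kept = limit_count(textured, max_count=1, rng=np.random.RandomState(0))
```

Shuffling before truncating means the kept boxes are drawn from the whole grid rather than from its first rows, which is the coverage concern raised in item 3. A suitable variance threshold depends on the data and would need tuning.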

