Packing Data with LMDB
Writing:
- Prepare the environment path
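lmdb.open opens a directory, not a single file. Two .mdb files (data.mdb and lock.mdb) are created inside that directory.
    import os
    import lmdb

    dataroot_GT = "/youtu_action_data/denoise/sidd/lmdb/GT/"
    os.makedirs(dataroot_GT, exist_ok=True)  # create the parent folder if it is missing
    # map_size is the maximum database size in bytes; 1099511627776 = 1 TiB
    env = lmdb.open(os.path.join(dataroot_GT, 'medium_imgs_train'), map_size=1099511627776)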
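map_size must be large enough, otherwise writes fail with lmdb.MapFullError. It only sets the maximum size of the database; the actual size is determined by how much data is finally written.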
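One rough way to pick map_size is to estimate the raw payload and add some headroom. The sketch below is only an illustration: the helper name `estimate_map_size` and the 1.5x safety factor are assumptions, not part of lmdb, and it assumes every patch is a square uint8 array.

```python
def estimate_map_size(num_patches, patch_size, channels=3, bytes_per_value=1, safety=1.5):
    """Return a rough upper bound (in bytes) to pass as map_size.

    Hypothetical helper: multiplies the raw patch payload by `safety`
    to leave room for keys and database overhead.
    """
    payload = num_patches * patch_size * patch_size * channels * bytes_per_value
    return int(payload * safety)


# Example: 100,000 patches of 128 x 128 x 3 uint8 values.
size = estimate_map_size(100_000, 128)
print(size, "bytes, about", round(size / 1024 ** 3, 2), "GiB")
```

Since map_size is only an upper limit, overestimating is usually harmless, while underestimating makes the write fail partway through.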
Write the data, converting it to bytes first.
    ...
    with env.begin(write=True) as txn:
        ...
        # crop one patch from the ground-truth image
        pch_gt = im_gt_int8[start_H:start_H + pch_size, start_W:start_W + pch_size]
        # pch_imgs = np.concatenate((pch_noisy, pch_gt), axis=2)
        pch_gt = pch_gt.tobytes()  # LMDB values must be bytes
        key_ = path_all_gt[ii].split(".")[0] + "_" + str(inner_pch_num)
        keys.append(key_)
        txn.put(key_.encode('ascii'), pch_gt)  # keys must be bytes as well
The index keys can be recorded anywhere. Here they are saved in a pickle file.
    # dict_data is assumed to be a dict created earlier in the elided code
    dict_data["keys"] = keys
    dict_data["resolution"] = "3_%d_%d" % (pch_size, pch_size)  # "C_H_W"
    with open(os.path.join(dataroot_GT, "meta_info.pkl"), 'wb') as fo:
        pickle.dump(dict_data, fo)
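The resolution is stored as a "C_H_W" string, while the reading code further down takes a (C, H, W) tuple, so a small parsing step is needed somewhere in between. A minimal sketch, assuming the meta_info.pkl layout above; `parse_resolution` is a hypothetical helper, and the keys and the temporary directory are made up for the demonstration.

```python
import os
import pickle
import tempfile


def parse_resolution(resolution):
    """Turn a 'C_H_W' string such as '3_128_128' into a (C, H, W) tuple of ints."""
    c, h, w = (int(v) for v in resolution.split("_"))
    return c, h, w


# Round trip through a temporary directory (a stand-in for dataroot_GT).
with tempfile.TemporaryDirectory() as tmp_dir:
    meta_path = os.path.join(tmp_dir, "meta_info.pkl")
    dict_data = {"keys": ["0001_0", "0001_1"], "resolution": "3_%d_%d" % (128, 128)}
    with open(meta_path, "wb") as fo:
        pickle.dump(dict_data, fo)

    with open(meta_path, "rb") as fi:
        meta_info = pickle.load(fi)

size = parse_resolution(meta_info["resolution"])
print(meta_info["keys"], size)
```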
Reading:
Open the LMDB environment:
    env = lmdb.open("path_to_lmdb", readonly=True, lock=False, readahead=False,
                    meminit=False)
Get the keys. Here they are stored in a pickle file; typically the image path is used as the key for indexing.
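    def _get_paths_from_lmdb(dataroot):
        '''get image path list from lmdb meta info'''
        with open(os.path.join(dataroot, 'meta_info.pkl'), 'rb') as f:
            meta_info = pickle.load(f)
        paths = meta_info['keys']
        sizes = meta_info['resolution']
        if isinstance(sizes, str):
            # a single "C_H_W" string shared by all keys
            sizes = [sizes]
        if len(sizes) == 1:
            sizes = sizes * len(paths)
        return paths, sizes
    ...
    with env.begin(write=False) as txn:
        buf = txn.get(key.encode('ascii'))
    ...
Reading: the value comes back as a flat byte buffer, so it must be reshaped to its original size. The same applies to nori.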
    def _read_img_lmdb(env, key, size):
        '''read image from lmdb with key (w/ and w/o fixed size)
        size: (C, H, W) tuple'''
        with env.begin(write=False) as txn:
            buf = txn.get(key.encode('ascii'))
        img_flat = np.frombuffer(buf, dtype=np.uint8)
        C, H, W = size
        img = img_flat.reshape(H, W, C)  # patches were stored in H x W x C order
        return img
Note: in a dataset there is no need to open the LMDB environment on every access; open it once and reuse it.