Batch-running Caffe programs with a shell script

Posted 2022-12-04 小丫头い

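I wrote this post for the following reason: my teacher asked me to train LeNet, modify its network architecture (in several variants), repeat the training N times for each variant and average the results, and finally compare them with networks that use random weights. With that many networks to train and that much repetition, I was motivated to write a shell script to automate the runs.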

The main shell script

#!/usr/bin/env bash

folder="/path/"
solver="lenet_solver.prototxt"  # the solver file name stays the same for every run
N=10                            # train each network N times

for file in "$folder"*
do
    filename=$(basename "$file")
    # Glob match that picks up every network architecture file
    if [[ "$filename" == lenet_train_test*.prototxt ]]
    then
        for i in $(seq "$N")
        do
            # This Python script rewrites the net and snapshot_prefix fields in the solver
            python "${folder}modifySolver.py" "$folder" "$solver" "$filename" "$i"
            # Save the log so it can be parsed later
            ./build/tools/caffe train --solver="$folder$solver" &> "${folder}${filename%.*}_log_${i}.md"
        done
    fi
done
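
Before launching real training, it can be useful to list which log file each run would write. The sketch below mirrors the loop above in Python as a dry run. `plan_runs` is a hypothetical helper that is not part of the original scripts, and the file names in the usage lines are made up for illustration.

```python
import fnmatch
import os


def plan_runs(filenames, n):
    """Return (net_file, run_index, log_name) for every planned training run."""
    runs = []
    for name in sorted(filenames):
        # Same glob as the shell script: lenet_train_test*.prototxt
        if not fnmatch.fnmatch(name, "lenet_train_test*.prototxt"):
            continue
        stem = os.path.splitext(name)[0]  # counterpart of ${filename%.*}
        for i in range(1, n + 1):  # seq $N counts from 1 to N
            runs.append((name, i, "%s_log_%d.md" % (stem, i)))
    return runs


if __name__ == "__main__":
    # Hypothetical directory listing; only the first name matches the glob.
    listing = ["lenet_train_test_a.prototxt", "lenet_solver.prototxt"]
    for run in plan_runs(listing, 2):
        print(run)
```

Passing `os.listdir(folder)` as `filenames` would plan the runs for a real directory without starting any training.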

The Python script that modifies the solver file

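The network architecture files could in principle be generated the same way, but that would produce a lot of redundant code, so I wrote the architectures by hand, which is also less error-prone.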

#!/usr/bin/python
import sys

from caffe.proto import caffe_pb2
from google.protobuf.text_format import Merge

if __name__ == "__main__":
    if len(sys.argv) <= 4:
        print("expected four arguments: folder, solver, net, run index")
        sys.exit(1)
    folder, solver_name, net_name, run_index = sys.argv[1:5]
    solver = caffe_pb2.SolverParameter()
    # Parse the existing solver file into a SolverParameter message
    with open(folder + solver_name, 'r') as f:
        Merge(f.read(), solver)
    solver.net = folder + net_name  # point the solver at this network file
    # Snapshot prefix: network file name up to the first ".", followed by the run index
    solver.snapshot_prefix = folder + net_name[:net_name.find(".")] + run_index
    with open(folder + solver_name, 'w') as f:
        f.write(str(solver))
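
If pycaffe cannot be imported on the machine that prepares the runs, the same two fields can be rewritten with plain text processing. The following is a sketch of that alternative, not the approach used above. `set_solver_fields` is a hypothetical helper, and it assumes that `net` and `snapshot_prefix` each sit on their own line in the solver file.

```python
import re


def set_solver_fields(solver_text, net_path, snapshot_prefix):
    """Return solver text with the net and snapshot_prefix lines replaced."""
    new_values = {"net": net_path, "snapshot_prefix": snapshot_prefix}
    out_lines = []
    seen = set()
    for line in solver_text.splitlines():
        # Matches "net:" or "snapshot_prefix:" at the start of a line only,
        # so fields such as "train_net:" are left untouched.
        match = re.match(r"\s*(net|snapshot_prefix)\s*:", line)
        if match:
            key = match.group(1)
            out_lines.append('%s: "%s"' % (key, new_values[key]))
            seen.add(key)
        else:
            out_lines.append(line)
    # Append any field that the original text did not contain.
    for key in ("net", "snapshot_prefix"):
        if key not in seen:
            out_lines.append('%s: "%s"' % (key, new_values[key]))
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    # Made-up solver content, for illustration only.
    sample = 'net: "old.prototxt"\nmax_iter: 10000\nsnapshot_prefix: "old"\n'
    print(set_solver_fields(sample, "/path/lenet_train_test_a.prototxt",
                            "/path/lenet_train_test_a1"))
```

The protobuf-based version remains the safer choice when pycaffe is available, since it validates the file against the `SolverParameter` schema instead of trusting the line layout.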

Extracting the log information and averaging the experiment results

import os
import re

# Input: path to a log file. Output: list of test accuracies in the order they were logged.
def parse(filepath):
    pattern = re.compile(r".*Test net output #0: accuracy = (.*)")
    lst = []
    with open(filepath) as f:
        for line in f:
            match = pattern.match(line)
            if match:
                lst.append(float(match.group(1)))
    return lst


folder = "/home/shipan/Work/MnistRandom/"

# Collect all net names (without extension) and log file names (with extension)
netNames = []
logNames = []
for root, dirs, files in os.walk(folder):
    for file in files:
        if file.startswith('lenet_train_test') and file.endswith('.prototxt'):
            netNames.append(file[:file.index('.')])
        if file.startswith('lenet_train_test') and file.endswith('.md'):
            logNames.append(file)

# Gather the accuracy data
allResults = []  # every accuracy curve, grouped by net
average = []     # average final accuracy of each net

with open(folder + 'averageAccuracy.md', 'w') as output:
    for netName in netNames:
        allResult = []
        finalResult = []
        for logName in logNames:
            if logName.startswith(netName + '_log'):
                accuracy_log = parse(folder + logName)
                if accuracy_log:  # skip logs that contain no accuracy line
                    allResult.append(accuracy_log)
                    finalResult.append(accuracy_log[-1])  # keep only the final accuracy
        allResults.append(allResult)
        output.write("(Accuracy) final results of the repeated runs of " + str(netName) + " are: " + str(finalResult))
        output.write('\n')
        # Divide by the number of logs actually found instead of a hard-coded 10
        if finalResult:
            average.append(sum(finalResult) / len(finalResult))
    output.write("average accuracies are: " + str(average))
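
The mean alone hides how much the repeated runs differ from one another, so it may help to report the spread as well. Below is a minimal sketch, assuming log lines shaped like the pattern used in `parse` above. `final_accuracy` and `summarize` are hypothetical helpers, and the sample lines are invented for illustration rather than taken from a real training log.

```python
import re
import statistics

ACCURACY = re.compile(r".*Test net output #0: accuracy = (\S+)")


def final_accuracy(lines):
    """Return the last accuracy found in the log lines, or None if absent."""
    values = [float(m.group(1)) for m in map(ACCURACY.match, lines) if m]
    return values[-1] if values else None


def summarize(final_values):
    """Return (mean, sample standard deviation) of the final accuracies."""
    mean = statistics.mean(final_values)
    # The sample standard deviation needs at least two values.
    std = statistics.stdev(final_values) if len(final_values) > 1 else 0.0
    return mean, std


if __name__ == "__main__":
    # Invented log lines for two runs, for illustration only.
    run_a = ["... Test net output #0: accuracy = 0.5",
             "... Test net output #0: accuracy = 0.75"]
    run_b = ["... Test net output #0: accuracy = 0.25",
             "... Test net output #1: loss = 0.3"]
    finals = [final_accuracy(run_a), final_accuracy(run_b)]
    print(summarize(finals))
```

With only N=10 runs per network, the standard deviation is a rough indicator at best, but it makes it easier to judge whether a difference between two architectures exceeds the run-to-run noise.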
