How to read a Word document in C++


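First, create a Win32 console application, making sure to select the option that enables MFC support.

Next, choose View -> ClassWizard, click Add Class, and select Import from Type Library. Open MSWORD.OLB from the Office installation directory; a dialog listing the available classes appears.

Import the four classes _Application, Documents, _Document, and Range.

Then add the following code to the else branch of the main function.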

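CoInitialize(NULL); // initialize the COM library before creating any COM object

_Application wordApp; // top-level object of the object model; the other objects (here, Word's) are reached through it
Documents docs;       // the collection of all open documents
_Document doc;
Range aRange;

COleVariant vTrue((short)TRUE),
            vFalse((short)FALSE),
            vOpt((long)DISP_E_PARAMNOTFOUND, VT_ERROR); // stands in for omitted optional arguments

wordApp.CreateDispatch("Word.Application",NULL);
wordApp.SetVisible(FALSE);
docs=wordApp.GetDocuments();
doc=docs.Open(COleVariant(filename...),vFalse,vTrue,vFalse,vOpt,vOpt,vOpt,vOpt,vOpt,vOpt,vOpt,vOpt,vOpt,vOpt,vOpt,vOpt);
aRange=doc.Range(vOpt,vOpt); // with both bounds omitted, the range covers the whole document

string str((LPCTSTR)aRange.GetText()); // assumes an MBCS (non-Unicode) build, where CString converts to const char*
cout<<str<<endl;
doc.Close(vOpt,vOpt,vOpt);
wordApp.Quit(vOpt,vOpt,vOpt);

// release the wrapper objects before shutting COM down
aRange.ReleaseDispatch();
doc.ReleaseDispatch();
docs.ReleaseDispatch();
wordApp.ReleaseDispatch();
CoUninitialize(); // close the COM library and free its resources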

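Reference answer A

Basic steps
(1) Create an MFC project.

 Note: to automate Word from VC, tick the Automation option in MFC AppWizard - Step 2 of 4.

(2) Press Ctrl+W to open ClassWizard (these steps follow VC6; the sample program was written and tested under VC6).

(3) Choose Add Class... \ From a type Library... and locate the type library you want in the Office directory. (I use Office2003, whose Word type library file is E:\Program Files\Microsoft Office\Office12\MSWORD.OLB.)

(4) After selecting the type library file, choose the classes to add in the dialog that pops up. Which classes you need depends on the features you intend to call. If you would rather not decide up front, select them all with the mouse and the Shift key.

(5) Initialize COM. Option one: find the App's InitInstance() function and add a call to AfxOleInit() there. Option two: call CoInitialize(NULL) where you need COM, and CoUninitialize() once you are done.

(6) In the cpp file that calls the Office functions, add:

#include <atlbase.h>   // CComVariant makes VARIANT values easier to handle
#include "filename.h"  // the actual header name is derived from the type library's file name, e.g. MSWORD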

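Sample program:

  // the Word application
  _Application app;
  // connect to Word
  app.CreateDispatch("Word.Application");
  Documents doc;
  // strWord holds the path of the file to read; _T() only works on string literals, so it is not applied here
  CComVariant a((LPCTSTR)strWord),b(false),c(0),d(true),aa(0),bb(1);
  _Document doc1;
  doc.AttachDispatch(app.GetDocuments());
  // Add() creates a new document that uses the file as its template
  doc1.AttachDispatch(doc.Add(&a,&b,&c,&d));
  Range range;
  // take the range that spans the whole document
  range=doc1.GetContent(); // fetch the document content
  str=range.GetText();
  m_richedit.SetWindowText(str);
  // quit Word
  app.Quit(&b,&c,&c);
  // release the connection
  app.ReleaseDispatch();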

Reference answer B

freopen("filename.ext", "w", stdout);

How to read MNIST data in C++?

Posted: 2011-11-27 15:35:12

Question:

I am having trouble reading the MNIST database of handwritten digits in C++.

It is in a binary format, which I know how to read, but I do not know the exact format of MNIST.

So I would like to ask those who have read MNIST data: what is the format of the data, and do you have any suggestions on how to read it in C++?
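For orientation, the IDX layout documented at http://yann.lecun.com/exdb/mnist/ starts with a 4-byte magic number (two zero bytes, a data-type byte where 0x08 means unsigned byte, and a dimension count), followed by one big-endian 32-bit size per dimension and then the raw data. The sketch below decodes such a header from bytes already held in memory. The names IdxHeader, read_be32 and parse_idx_header are made up for this illustration and come from no library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decoded IDX header (illustrative type, not part of any library).
struct IdxHeader {
    std::uint8_t data_type = 0;        // 0x08 means unsigned byte
    std::vector<std::uint32_t> dims;   // e.g. {count, rows, cols} for an image file
    std::size_t payload_offset = 0;    // index of the first data byte
};

// Combine four bytes stored most-significant-first into one 32-bit value.
inline std::uint32_t read_be32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Parse the magic number and the dimension sizes; throws on malformed input.
inline IdxHeader parse_idx_header(const std::vector<unsigned char>& bytes) {
    if (bytes.size() < 4 || bytes[0] != 0 || bytes[1] != 0)
        throw std::runtime_error("not an IDX file");
    IdxHeader h;
    h.data_type = bytes[2];
    const std::size_t ndims = bytes[3];
    h.payload_offset = 4 + 4 * ndims;
    if (bytes.size() < h.payload_offset)
        throw std::runtime_error("truncated IDX header");
    for (std::size_t i = 0; i < ndims; ++i)
        h.dims.push_back(read_be32(bytes.data() + 4 + 4 * i));
    return h;
}
```

For an image file the magic number is 0x00000803 (2051) and the header is 16 bytes long; for a label file it is 0x00000801 (2049) and the header is 8 bytes long.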

Comments:

I think you can find everything about MNIST and C++ in Neural Network for Recognition of Handwritten Digits by Mike O'Neill.

You can find a decoded version of the MNIST dataset here: mnist-decoded.000webhostapp.com

Answer 1:
#include <fstream>
using namespace std;

// MNIST stores its integers big-endian; swap the byte order for little-endian hosts.
int reverseInt (int i)
{
    unsigned char c1, c2, c3, c4;

    c1 = i & 255;
    c2 = (i >> 8) & 255;
    c3 = (i >> 16) & 255;
    c4 = (i >> 24) & 255;

    return ((int)c1 << 24) + ((int)c2 << 16) + ((int)c3 << 8) + c4;
}

void read_mnist(/*string full_path*/)
{
    // ios::binary added, and the name now points at the decompressed file
    // (the answer as posted opened "t10k-images-idx3-ubyte.gz" in text mode)
    ifstream file (/*full_path*/"t10k-images-idx3-ubyte", ios::binary);
    if (file.is_open())
    {
        int magic_number=0;
        int number_of_images=0;
        int n_rows=0;
        int n_cols=0;
        file.read((char*)&magic_number,sizeof(magic_number));
        magic_number= reverseInt(magic_number);
        file.read((char*)&number_of_images,sizeof(number_of_images));
        number_of_images= reverseInt(number_of_images);
        file.read((char*)&n_rows,sizeof(n_rows));
        n_rows= reverseInt(n_rows);
        file.read((char*)&n_cols,sizeof(n_cols));
        n_cols= reverseInt(n_cols);
        for(int i=0;i<number_of_images;++i)
        {
            for(int r=0;r<n_rows;++r)
            {
                for(int c=0;c<n_cols;++c)
                {
                    unsigned char temp=0;
                    file.read((char*)&temp,sizeof(temp));
                    // temp now holds the pixel at row r, column c of image i
                }
            }
        }
    }
}

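Comments:

A note so that others do not repeat my silly mistake: although the file name in the answer as posted has a ".gz" extension, the file must actually be decompressed before it is used. (A remark on the original website could be read as saying otherwise.) If the first four bytes of the file are 0x1f8b0808 rather than 0x00000801 or 0x00000803, the file has not been decompressed.

The ifstream file() call in the answer as posted is missing ios::binary as its second argument. Without it, the code did not work for me.

Answer 2: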

I recently did some work with the MNIST data. Here is some code I wrote in Java; it should be easy to port:

import net.vivin.digit.DigitImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Created by IntelliJ IDEA.
 * User: vivin
 * Date: 11/11/11
 * Time: 10:07 AM
 */
public class DigitImageLoadingService {

    private String labelFileName;
    private String imageFileName;

    /** the following constants are defined as per the values described at http://yann.lecun.com/exdb/mnist/ **/

    private static final int MAGIC_OFFSET = 0;
    private static final int OFFSET_SIZE = 4; //in bytes

    private static final int LABEL_MAGIC = 2049;
    private static final int IMAGE_MAGIC = 2051;

    private static final int NUMBER_ITEMS_OFFSET = 4;
    private static final int ITEMS_SIZE = 4;

    private static final int NUMBER_OF_ROWS_OFFSET = 8;
    private static final int ROWS_SIZE = 4;
    public static final int ROWS = 28;

    private static final int NUMBER_OF_COLUMNS_OFFSET = 12;
    private static final int COLUMNS_SIZE = 4;
    public static final int COLUMNS = 28;

    private static final int IMAGE_OFFSET = 16;
    private static final int IMAGE_SIZE = ROWS * COLUMNS;


    public DigitImageLoadingService(String labelFileName, String imageFileName) {
        this.labelFileName = labelFileName;
        this.imageFileName = imageFileName;
    }

    public List<DigitImage> loadDigitImages() throws IOException {
        List<DigitImage> images = new ArrayList<DigitImage>();

        ByteArrayOutputStream labelBuffer = new ByteArrayOutputStream();
        ByteArrayOutputStream imageBuffer = new ByteArrayOutputStream();

        InputStream labelInputStream = this.getClass().getResourceAsStream(labelFileName);
        InputStream imageInputStream = this.getClass().getResourceAsStream(imageFileName);

        int read;
        byte[] buffer = new byte[16384];

        // slurp both files into memory
        while((read = labelInputStream.read(buffer, 0, buffer.length)) != -1) {
           labelBuffer.write(buffer, 0, read);
        }

        labelBuffer.flush();

        while((read = imageInputStream.read(buffer, 0, buffer.length)) != -1) {
            imageBuffer.write(buffer, 0, read);
        }

        imageBuffer.flush();

        byte[] labelBytes = labelBuffer.toByteArray();
        byte[] imageBytes = imageBuffer.toByteArray();

        byte[] labelMagic = Arrays.copyOfRange(labelBytes, 0, OFFSET_SIZE);
        byte[] imageMagic = Arrays.copyOfRange(imageBytes, 0, OFFSET_SIZE);

        // ByteBuffer is big-endian by default, which matches the file format
        if(ByteBuffer.wrap(labelMagic).getInt() != LABEL_MAGIC) {
            throw new IOException("Bad magic number in label file!");
        }

        if(ByteBuffer.wrap(imageMagic).getInt() != IMAGE_MAGIC) {
            throw new IOException("Bad magic number in image file!");
        }

        int numberOfLabels = ByteBuffer.wrap(Arrays.copyOfRange(labelBytes, NUMBER_ITEMS_OFFSET, NUMBER_ITEMS_OFFSET + ITEMS_SIZE)).getInt();
        int numberOfImages = ByteBuffer.wrap(Arrays.copyOfRange(imageBytes, NUMBER_ITEMS_OFFSET, NUMBER_ITEMS_OFFSET + ITEMS_SIZE)).getInt();

        if(numberOfImages != numberOfLabels) {
            throw new IOException("The number of labels and images do not match!");
        }

        int numRows = ByteBuffer.wrap(Arrays.copyOfRange(imageBytes, NUMBER_OF_ROWS_OFFSET, NUMBER_OF_ROWS_OFFSET + ROWS_SIZE)).getInt();
        int numCols = ByteBuffer.wrap(Arrays.copyOfRange(imageBytes, NUMBER_OF_COLUMNS_OFFSET, NUMBER_OF_COLUMNS_OFFSET + COLUMNS_SIZE)).getInt();

        // was: numRows != ROWS && numRows != COLUMNS, which never looked at numCols
        if(numRows != ROWS || numCols != COLUMNS) {
            throw new IOException("Bad image. Rows and columns do not equal " + ROWS + "x" + COLUMNS);
        }

        for(int i = 0; i < numberOfLabels; i++) {
            int label = labelBytes[OFFSET_SIZE + ITEMS_SIZE + i];
            byte[] imageData = Arrays.copyOfRange(imageBytes, (i * IMAGE_SIZE) + IMAGE_OFFSET, (i * IMAGE_SIZE) + IMAGE_OFFSET + IMAGE_SIZE);

            images.add(new DigitImage(label, imageData));
        }

        return images;
    }
}

Comments:

Where is DigitImage? Also, numRows != COLUMNS does not look right...

Answer 3:

For what it's worth, I have adapted @mrgloom's code:

To read the image dataset:

#include <fstream>
#include <stdexcept>
#include <string>
using namespace std;

typedef unsigned char uchar; // declared before the function because the signature uses it

// Returns number_of_images buffers of image_size bytes each; the caller must delete[] them.
uchar** read_mnist_images(string full_path, int& number_of_images, int& image_size) {
    auto reverseInt = [](int i) {
        unsigned char c1, c2, c3, c4;
        c1 = i & 255, c2 = (i >> 8) & 255, c3 = (i >> 16) & 255, c4 = (i >> 24) & 255;
        return ((int)c1 << 24) + ((int)c2 << 16) + ((int)c3 << 8) + c4;
    };

    ifstream file(full_path, ios::binary);

    if(file.is_open()) {
        int magic_number = 0, n_rows = 0, n_cols = 0;

        file.read((char *)&magic_number, sizeof(magic_number));
        magic_number = reverseInt(magic_number);

        if(magic_number != 2051) throw runtime_error("Invalid MNIST image file!");

        file.read((char *)&number_of_images, sizeof(number_of_images)), number_of_images = reverseInt(number_of_images);
        file.read((char *)&n_rows, sizeof(n_rows)), n_rows = reverseInt(n_rows);
        file.read((char *)&n_cols, sizeof(n_cols)), n_cols = reverseInt(n_cols);

        image_size = n_rows * n_cols;

        uchar** _dataset = new uchar*[number_of_images];
        for(int i = 0; i < number_of_images; i++) {
            _dataset[i] = new uchar[image_size];
            file.read((char *)_dataset[i], image_size);
        }
        return _dataset;
    } else {
        throw runtime_error("Cannot open file `" + full_path + "`!");
    }
}

To read the label dataset:

#include <fstream>
#include <stdexcept>
#include <string>
using namespace std;

typedef unsigned char uchar; // declared before the function because the signature uses it

// Returns a buffer of number_of_labels bytes; the caller must delete[] it.
uchar* read_mnist_labels(string full_path, int& number_of_labels) {
    auto reverseInt = [](int i) {
        unsigned char c1, c2, c3, c4;
        c1 = i & 255, c2 = (i >> 8) & 255, c3 = (i >> 16) & 255, c4 = (i >> 24) & 255;
        return ((int)c1 << 24) + ((int)c2 << 16) + ((int)c3 << 8) + c4;
    };

    ifstream file(full_path, ios::binary);

    if(file.is_open()) {
        int magic_number = 0;
        file.read((char *)&magic_number, sizeof(magic_number));
        magic_number = reverseInt(magic_number);

        if(magic_number != 2049) throw runtime_error("Invalid MNIST label file!");

        file.read((char *)&number_of_labels, sizeof(number_of_labels)), number_of_labels = reverseInt(number_of_labels);

        uchar* _dataset = new uchar[number_of_labels];
        for(int i = 0; i < number_of_labels; i++) {
            file.read((char*)&_dataset[i], 1);
        }
        return _dataset;
    } else {
        throw runtime_error("Unable to open file `" + full_path + "`!");
    }
}


Edit: Thanks to @Jürgen Brauer for reminding me to correct my answer. I had already fixed this in my own code at the time but forgot to update the answer.

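Comments:

I printed the values of one 28x28 digit and they were all close to 0. I found that it is important to write ifstream file(full_path, ios::binary); rather than ifstream file(full_path); in the code above. I hope this saves others some time.

Answer 4: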

The following code comes from caffe; I made some changes and convert each image to a cv::Mat:

#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>
using namespace std;

// Reverse the byte order of a 32-bit value (big-endian file, little-endian host).
uint32_t swap_endian(uint32_t val) {
    val = ((val << 8) & 0xFF00FF00) | ((val >> 8) & 0xFF00FF);
    return (val << 16) | (val >> 16);
}

void read_mnist_cv(const char* image_filename, const char* label_filename){
    // Open files
    std::ifstream image_file(image_filename, std::ios::in | std::ios::binary);
    std::ifstream label_file(label_filename, std::ios::in | std::ios::binary);

    // Read the magic and the meta data
    uint32_t magic;
    uint32_t num_items;
    uint32_t num_labels;
    uint32_t rows;
    uint32_t cols;

    image_file.read(reinterpret_cast<char*>(&magic), 4);
    magic = swap_endian(magic);
    if(magic != 2051){
        cout<<"Incorrect image file magic: "<<magic<<endl;
        return;
    }

    label_file.read(reinterpret_cast<char*>(&magic), 4);
    magic = swap_endian(magic);
    if(magic != 2049){
        cout<<"Incorrect label file magic: "<<magic<<endl;
        return;
    }

    image_file.read(reinterpret_cast<char*>(&num_items), 4);
    num_items = swap_endian(num_items);
    label_file.read(reinterpret_cast<char*>(&num_labels), 4);
    num_labels = swap_endian(num_labels);
    if(num_items != num_labels){
        cout<<"image file nums should equal to label num"<<endl;
        return;
    }

    image_file.read(reinterpret_cast<char*>(&rows), 4);
    rows = swap_endian(rows);
    image_file.read(reinterpret_cast<char*>(&cols), 4);
    cols = swap_endian(cols);

    cout<<"image and label num is: "<<num_items<<endl;
    cout<<"image rows: "<<rows<<", cols: "<<cols<<endl;

    char label;
    char* pixels = new char[rows * cols];

    for (uint32_t item_id = 0; item_id < num_items; ++item_id) {
        // read image pixel
        image_file.read(pixels, rows * cols);
        // read label
        label_file.read(&label, 1);

        string sLabel = std::to_string(int(label));
        cout<<"label is: "<<sLabel<<endl;
        // convert it to cv Mat, and show it
        cv::Mat image_tmp(rows,cols,CV_8UC1,pixels);
        // resize bigger for showing
        cv::resize(image_tmp, image_tmp, cv::Size(100, 100));
        cv::imshow(sLabel, image_tmp);
        cv::waitKey(0);
    }

    delete[] pixels;
}

Usage (I simplified the code; headers and namespaces are omitted here):

string base_dir = "/home/xy/caffe-master/data/mnist/";
string img_path = base_dir + "train-images-idx3-ubyte";
string label_path = base_dir + "train-labels-idx1-ubyte";

read_mnist_cv(img_path.c_str(), label_path.c_str());

Output: [image omitted]
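Where OpenCV is not available, a digit can still be inspected in a terminal. The following is only a sketch under that assumption: it maps each 8-bit pixel to one of five characters by intensity. The function name to_ascii and the character ramp are arbitrary choices made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Render a row-major grayscale image as text, one line per image row.
// Returns an empty string if the buffer is smaller than rows * cols.
inline std::string to_ascii(const std::vector<std::uint8_t>& pixels,
                            std::size_t rows, std::size_t cols) {
    static const char ramp[] = " .:*#";  // 5 intensity levels, light to dark
    std::string out;
    if (pixels.size() < rows * cols) return out;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const int level = pixels[r * cols + c] * 5 / 256;  // 0..4
            out += ramp[level];
        }
        out += '\n';
    }
    return out;
}
```

Printing to_ascii(image, 28, 28) with std::cout for one MNIST image should show the stroke as the denser characters, since the dataset stores the background as 0 and the ink as values up to 255.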

Answer 5:

With in() you can read a big-endian value of any byte size (as long as it fits in an unsigned int).

#include <fstream>
using namespace std;

const int MAXN = 6e4 + 7;
unsigned int image[MAXN][30][30];
unsigned int num, magic, rows, cols;
unsigned int label[MAXN];

// Read `size` bytes and combine them into one big-endian unsigned integer.
unsigned int in(ifstream& icin, unsigned int size) {
    unsigned int ans = 0;
    for (unsigned int i = 0; i < size; i++) {
        unsigned char x;
        icin.read((char*)&x, 1);
        unsigned int temp = x;
        ans <<= 8;
        ans += temp;
    }
    return ans;
}

void input() {
    ifstream icin;
    icin.open("train-images.idx3-ubyte", ios::binary);
    magic = in(icin, 4), num = in(icin, 4), rows = in(icin, 4), cols = in(icin, 4);
    for (unsigned int i = 0; i < num; i++) {
        for (unsigned int x = 0; x < rows; x++) {
            for (unsigned int y = 0; y < cols; y++) {
                image[i][x][y] = in(icin, 1);
            }
        }
    }
    icin.close();
    icin.open("train-labels.idx1-ubyte", ios::binary);
    magic = in(icin, 4), num = in(icin, 4);
    for (unsigned int i = 0; i < num; i++) {
        label[i] = in(icin, 1);
    }
}
