Parsing words from text file using multithreading using C

Posted: 2021-12-30 11:57:40

Question:

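I'm currently trying to parse words from all the text files in a directory (in this case it is safe to assume the directory contains only text files). It seems I can open the file inside the thread function, but I can't get at the text in it. No error message is shown, but the printf in splitInput never prints to the terminal.

Please excuse my semantics in the code; I'm new to C! Also, there may be unused code in main, since this will be part of a larger project. Thanks in advance for your help!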

#include <stdlib.h>
#include <dirent.h>
#include <pthread.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>
#include "queue.h"

void* splitInput(void *filename) {

  printf("Thread %s Created\n", (char*)filename);

  FILE *file;

  int i = 0;
  char *cp;
  char *bp;
  char line[255];
  char *array[5000];

  file = fopen((char*)filename, "r");

  if(file == NULL) {
    perror("Error opening file");
  }

  printf("Opened File %s\n", (char*)filename);

  while(fgets(line, sizeof(line), file) != NULL) {

    bp = line;

    while(1) {

      cp = strtok(bp, ",.!? \n");
      bp = NULL;

      if(cp == NULL) {
        break;
      }

      array[i++] = cp;

      printf("Check print - word %i:%s:\n", i-1, cp);

    }
  }

  fclose(file);

  return 0;

}



int main(int argc, char *argv[]) {

  DIR* d;

  struct dirent* e;

  // grab our queueSize and threadCount
  int queueSize = atoi(argv[2]);
  int threadCount = atoi(argv[3]);

  // var for creating a thread each file
  int i = 0;

  // open the dir
  d = opendir(argv[1]);

  printf("Queue Size: %d\n", queueSize);

  printf("Thread Count: %d\n", threadCount);

  // set our thread count now that we know how many files are in dir
  pthread_t threads[threadCount];

  // read through our directory
  while((e = readdir(d)) != NULL) {

    // make sure we aren't reading . and ..
    if(strcmp(e->d_name, ".") == 0) {
      continue;
    }

    if(strcmp(e->d_name, "..") == 0) {
      continue;
    }

    printf("entered file %s\n", e->d_name);

    char *filename = strdup(e->d_name);

    if(i < threadCount) {

      // create our threads
      pthread_create(&threads[i], NULL, splitInput, filename);
    }

    // increment i
    i++;
  }

  // join our existing threads
  for(int j = 0; j < i; j++) {
    pthread_join(threads[j], NULL);
  }

  return 0;
}

Current output

device@user:~/os/testdir$ ./output ~/os/testdir/test 10 10 output
Queue Size: 10
Thread Count: 10
entered file test
Thread test Created
Opened File test

Comments:

strtok is not thread-safe. I haven't looked at your code closely, but in general "multithreading" and "strtok" don't go together.

Answer 1:

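I found the answer: I was trying to open a file outside my working directory, which cannot be done without the full path. readdir returns only the bare file name in d_name, so fopen resolves it against the current working directory rather than the directory passed on the command line. Changing the working directory to the given argument solved the problem. In this case, chdir(argv[1]) can be used.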
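As an alternative to changing the working directory (chdir affects the whole process, including every thread), the directory argument and each d_name can be joined into a full path before the name is handed to the thread. The following is a minimal sketch of that idea; build_path is a hypothetical helper, not part of the original code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "dir/name" into out.
 * Returns 0 on success, -1 if the joined path does not fit in out_size bytes. */
int build_path(char *out, size_t out_size, const char *dir, const char *name)
{
    int n = snprintf(out, out_size, "%s/%s", dir, name);

    /* snprintf returns the length it wanted to write; a value >= out_size
     * means the result was truncated. */
    if (n < 0 || (size_t)n >= out_size) {
        return -1;
    }
    return 0;
}
```

In main this could be called as build_path(path, sizeof path, argv[1], e->d_name) with a local buffer such as char path[4096] (the size is an assumption), and the result passed through strdup instead of the bare e->d_name.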

The revised code is below.

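#define _POSIX_C_SOURCE 200809L  // for strtok_r and strdup

#include <stdlib.h>
#include <dirent.h>
#include <pthread.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include "queue.h"  // project-local header, not shown in the post

#define MAX_WORDS 5000

void* splitInput(void *filename) {

  printf("Thread %s Created\n", (char*)filename);

  FILE *file;

  int i = 0;
  char *cp;
  char *bp;
  char *save;  // per-thread tokenizer state for strtok_r
  char line[255];
  char *array[MAX_WORDS];

  file = fopen((char*)filename, "r");

  if(file == NULL) {
    perror("Error opening file");
    free(filename);
    return NULL;  // do not read from a NULL stream
  }

  printf("Opened File %s\n", (char*)filename);

  while(fgets(line, sizeof(line), file) != NULL) {

    bp = line;

    while(1) {

      // strtok_r instead of strtok: strtok keeps hidden shared state
      cp = strtok_r(bp, ",.!? \n", &save);
      bp = NULL;

      if(cp == NULL) {
        break;
      }

      // tokens point into line, which the next fgets overwrites;
      // copy them if they must outlive the current line
      if(i < MAX_WORDS) {
        array[i] = cp;
      }

      printf("Check print - word %i:%s:\n", i, cp);
      i++;
    }
  }

  fclose(file);
  free(filename);  // allocated with strdup in main

  return NULL;
}

int main(int argc, char *argv[]) {

  DIR* d;

  struct dirent* e;

  if(argc < 4 || atoi(argv[3]) < 1) {
    fprintf(stderr, "usage: %s <dir> <queueSize> <threadCount>\n", argv[0]);
    return 1;
  }

  // grab our queueSize and threadCount
  int queueSize = atoi(argv[2]);
  int threadCount = atoi(argv[3]);

  // number of threads actually started
  int i = 0;

  // change into the directory so the bare names from readdir resolve,
  // then open it as "." (argv[1] may be relative to the old directory)
  if(chdir(argv[1]) != 0) {
    perror("chdir");
    return 1;
  }

  d = opendir(".");

  if(d == NULL) {
    perror("opendir");
    return 1;
  }

  printf("Queue Size: %d\n", queueSize);

  printf("Thread Count: %d\n", threadCount);

  // one slot per thread, sized from the command line
  pthread_t threads[threadCount];

  // read through our directory
  while((e = readdir(d)) != NULL) {

    // make sure we aren't reading . and ..
    if(strcmp(e->d_name, ".") == 0) {
      continue;
    }

    if(strcmp(e->d_name, "..") == 0) {
      continue;
    }

    printf("entered file %s\n", e->d_name);

    if(i < threadCount) {

      // the thread owns this copy and frees it when done
      char *filename = strdup(e->d_name);

      // create our threads; count only those that really started
      if(filename != NULL &&
         pthread_create(&threads[i], NULL, splitInput, filename) == 0) {
        i++;
      } else {
        free(filename);
      }
    }
  }

  closedir(d);

  // join only the threads that were created
  for(int j = 0; j < i; j++) {
    pthread_join(threads[j], NULL);
  }

  return 0;
}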
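
The comment on the question points out that strtok is not thread-safe: it keeps its position in hidden shared state, so two threads tokenizing at the same time can corrupt each other's results. A minimal sketch of tokenizing one line with the reentrant POSIX function strtok_r follows; split_words and its parameters are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r under strict C modes */

#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits line in place on any character in delims and
 * stores up to max_words token pointers in words. Returns the token count. */
size_t split_words(char *line, const char *delims, char **words, size_t max_words)
{
    size_t count = 0;
    char *save = NULL;  /* per-call state, so concurrent callers do not interfere */

    for (char *tok = strtok_r(line, delims, &save);
         tok != NULL && count < max_words;
         tok = strtok_r(NULL, delims, &save)) {
        words[count++] = tok;
    }
    return count;
}
```

The returned pointers refer to the caller's line buffer, so they stay valid only until that buffer is reused, for example by the next fgets call.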
