How can I use TextCat in Python?


I would like to give TextCat a try. It would be most convenient for me to run it from Python, because I want to see how it performs on a private dataset.

I gave languagedet a try, but according to

from languagedet.mixed import MixedDetector
det = MixedDetector()
print(det.available)

far fewer languages are available through languagedet than the 69 languages claimed on the TextCat website.

I also tried pylibtextcat, but I get:

Collecting pylibtextcat
  Using cached pylibtextcat-0.2.tar.bz2
Building wheels for collected packages: pylibtextcat
  Running setup.py bdist_wheel for pylibtextcat ... error
  Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-1dkslney/pylibtextcat/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpyct9pyfepip-wheel- --python-tag cp35:
  running bdist_wheel
  running build
  running build_ext
  building 'textcat' extension
  creating build
  creating build/temp.linux-x86_64-3.5
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION="0.2" -I/usr/include/python3.5m -c libtextcat.c -o build/temp.linux-x86_64-3.5/libtextcat.o -Wall -Wextra
  libtextcat.c:7:32: fatal error: libtextcat/textcat.h: No such file or directory
  compilation terminated.
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

  ----------------------------------------
  Failed building wheel for pylibtextcat
  Running setup.py clean for pylibtextcat
Failed to build pylibtextcat
Installing collected packages: pylibtextcat
  Running setup.py install for pylibtextcat ... error
    Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-1dkslney/pylibtextcat/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-lwxglu50-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'textcat' extension
    creating build
    creating build/temp.linux-x86_64-3.5
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION="0.2" -I/usr/include/python3.5m -c libtextcat.c -o build/temp.linux-x86_64-3.5/libtextcat.o -Wall -Wextra
    libtextcat.c:7:32: fatal error: libtextcat/textcat.h: No such file or directory
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-1dkslney/pylibtextcat/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-lwxglu50-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-1dkslney/pylibtextcat/

when I try to install it (I have libexttextcat-2.0-0, libexttextcat-data and libexttextcat-dev installed).

Can I use TextCat in Python?

Answer

It does not seem to be exactly the same thing, but nltk has:

from nltk.classify import textcat

text = "This is a simple example."
cls = textcat.TextCat()

distances = cls.lang_dists(text)  # a dict of 437 elements, one distance per language
cls.guess_language(text)  # a str, the code of the closest language
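
Since the goal is to see how this performs on a private dataset, a small scoring helper might be useful. The following is only a sketch: `evaluate`, `toy_guess` and the sample data are hypothetical names made up for illustration, and the three-letter codes are an assumption about what the guesser returns (check a few outputs of `guess_language` against your own labels first). The helper accepts any callable that maps a text to a language code, so it does not depend on nltk itself.

```python
from collections import Counter


def evaluate(guess, samples):
    """Score a language guesser on labelled samples.

    guess   -- any callable mapping a text to a language code
    samples -- iterable of (text, expected_code) pairs
    Returns (accuracy, confusions), where confusions counts
    (expected, predicted) pairs for the misclassified texts.
    """
    total = 0
    correct = 0
    confusions = Counter()
    for text, expected in samples:
        predicted = guess(text)
        total += 1
        if predicted == expected:
            correct += 1
        else:
            confusions[(expected, predicted)] += 1
    # Avoid a division by zero on an empty dataset.
    accuracy = correct / total if total else 0.0
    return accuracy, confusions


if __name__ == "__main__":
    # Hypothetical stand-in guesser so the sketch runs without nltk.
    def toy_guess(text):
        return "deu" if "ist" in text.split() else "eng"

    data = [
        ("This is a simple example.", "eng"),
        ("Das ist ein einfaches Beispiel.", "deu"),
        ("Ceci est un exemple simple.", "fra"),
    ]
    acc, conf = evaluate(toy_guess, data)
    print(f"accuracy: {acc:.2f}")
    print(conf.most_common())
```

To score the nltk classifier, pass `cls.guess_language` as `guess` in place of the toy function. The confusion counts show which language pairs get mixed up, which is usually more informative than the accuracy figure alone.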
