Listing R Package Dependencies Without Installing Packages
Posted: 2013-01-16 17:02:39
Question:
Is there a simple way to get a list of the R package dependencies (all recursive dependencies) of a given package, without installing the package and its dependencies? Something like a fake install in portupgrade or apt.
Comments:
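Thanks, this will save me some time :). Since the documentation doesn't spell it out, I assume the ggplot example would be dependsOnPkgs("ggplot2", installed = available.packages())
If there is a helper function somewhere (utils, tools?) that extracts all the deps non-recursively from a local DESCRIPTION file, it would be good to post that as an answer too. Otherwise, a wrapper around read.dcf that extracts the various dependency types and strips whitespace could do the job.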
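A minimal sketch of the read.dcf wrapper described in the comment above, assuming only base R. `description_deps` is a hypothetical helper name (it is not a function in utils or tools); it splits each requested field on commas, trims whitespace, and drops version constraints and the "R" entry.

```r
# Minimal sketch (assumption: base R only). `description_deps` is a
# hypothetical helper, not a function shipped with utils or tools.
description_deps <- function(path,
                             fields = c("Depends", "Imports", "LinkingTo")) {
  dcf <- read.dcf(path)                       # one-row character matrix
  fields <- intersect(fields, colnames(dcf))  # skip fields the file lacks
  deps <- lapply(fields, function(f) {
    entries <- trimws(strsplit(dcf[1, f], ",")[[1]])  # split on commas, strip whitespace
    entries <- entries[nzchar(entries)]               # drop empty entries
    entries <- sub("[[:space:](].*$", "", entries)    # drop version specs such as "(>= 3.0)"
    setdiff(entries, "R")                             # "R" itself is not a package
  })
  names(deps) <- fields
  deps
}

# Usage on a throwaway DESCRIPTION file:
desc <- tempfile()
writeLines(c(
  "Package: mypkg",
  "Version: 0.1.0",
  "Depends: R (>= 3.5.0), methods",
  "Imports: utils,",
  "    stats (>= 3.0)",
  "Suggests: testthat"
), desc)
str(description_deps(desc))
```

Suggests is left out by default; pass `fields = c("Depends", "Imports", "Suggests")` to include it.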
A pure-R, recursive solution is here: ***.com/questions/38686427/…
Answer 1:
You can use the result of the available.packages function. For example, to see what ggplot2 depends on:
pack <- available.packages()
pack["ggplot2","Depends"]
This gives:
[1] "R (>= 2.14), stats, methods"
Note that, depending on what you want to achieve, you may also need to check the Imports field.
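As a sketch of checking Imports (and LinkingTo) alongside Depends, the helper below combines those fields from one row of an available.packages()-style matrix. `direct_deps` is a hypothetical name, and the small `db` matrix is made-up data standing in for real CRAN metadata so that the example runs offline.

```r
# Minimal sketch: direct dependencies of one package from an
# available.packages()-style matrix (rows named by package).
# `direct_deps` is a hypothetical helper; the default db needs network access.
direct_deps <- function(pkg, db = available.packages(),
                        fields = c("Depends", "Imports", "LinkingTo")) {
  fields <- intersect(fields, colnames(db))
  raw <- db[pkg, fields]            # one string per field, NA when empty
  raw <- raw[!is.na(raw)]
  entries <- trimws(unlist(strsplit(raw, ",")))
  entries <- sub("[[:space:](].*$", "", entries[nzchar(entries)])
  unname(setdiff(entries, "R"))     # drop "R" and duplicates
}

# Offline usage with made-up metadata standing in for CRAN:
db <- rbind(
  c(Package = "pkgA", Depends = "R (>= 3.5), stats",
    Imports = "pkgB, pkgC (>= 1.0)", LinkingTo = NA),
  c(Package = "pkgB", Depends = NA, Imports = "methods", LinkingTo = NA)
)
rownames(db) <- db[, "Package"]
direct_deps("pkgA", db)
```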
Comments:
Cool, I always like learning about handy tools. Sadly, this won't work for those of us stuck behind a corporate firewall. We may be stuck with browseURL('http://cran.r-project.org/web/packages/package.name')
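Thanks, this helped me a lot. I did change the scope of the question, but by recursively searching the Depends and Imports lists I can build a complete list.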
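A minimal sketch of that recursive search, assuming an available.packages()-style matrix as input. `recursive_deps` is a hypothetical helper doing a breadth-first walk over the Depends and Imports fields; packages that are not rows of the matrix (base packages such as stats, in this made-up data) are listed but not expanded.

```r
# Minimal sketch: breadth-first walk over Depends and Imports in an
# available.packages()-style matrix (rows named by package).
# `recursive_deps` is a hypothetical helper; packages that are not rows of
# `db` (e.g. base packages) are reported but not expanded.
recursive_deps <- function(pkg, db, fields = c("Depends", "Imports")) {
  parse_field <- function(x) {
    if (is.na(x)) return(character())
    x <- trimws(strsplit(x, ",")[[1]])
    x <- sub("[[:space:](].*$", "", x[nzchar(x)])
    setdiff(x, "R")
  }
  fields <- intersect(fields, colnames(db))
  seen <- character()
  queue <- pkg
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!current %in% rownames(db)) next      # nothing known about it
    direct <- unname(unlist(lapply(db[current, fields], parse_field)))
    new <- setdiff(direct, c(seen, pkg))      # only packages not yet visited
    seen <- c(seen, new)
    queue <- c(queue, new)
  }
  seen
}

db <- rbind(
  c(Package = "pkgA", Depends = "R (>= 3.5)", Imports = "pkgB"),
  c(Package = "pkgB", Depends = "stats", Imports = "pkgC, pkgD"),
  c(Package = "pkgC", Depends = NA, Imports = "pkgD"),
  c(Package = "pkgD", Depends = NA, Imports = NA)
)
rownames(db) <- db[, "Package"]
recursive_deps("pkgA", db)
```

Keeping a `seen` vector means each package is expanded once, so shared dependencies (pkgD here) and dependency cycles do not cause repeated work.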
@CarlWitthoft If you're on Windows, setInternet2() might help.
@hadley, thanks, but I went through that exercise a while ago. Outgoing requests are blocked. Apparently we need a new package, r.apt-get :-)
pack["ggplot2","Depends"] no longer seems to return the package dependencies, but pack["ggplot2","Imports"] does, as @juba suggested. At least that is the case for me on R version 4.0.2 (2020-06-22).
Answer 2:
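I don't have R installed, and I needed to find the dependencies of a list of R packages that my company had been asked to approve.
I wrote a bash script that iterates over a list of R packages in a file and discovers their dependencies recursively (it follows the Depends field shown on each package's CRAN page).
The script takes a file named rinput_orig.txt as input (example below). While it runs, it creates a working file named rinput.txt.
The script creates the following files:
rdepsfound.txt - lists the dependencies found, together with the R package that depends on each one (example below).
routput.txt - lists all R packages (from the original list plus the dependencies found) with their license and CRAN URL (example below).
r404.txt - lists the R packages for which curl received a 404. This is handy if your original list contains typos.
Bash script: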
#!/bin/bash
# CLEANUP (-f: don't complain if the files don't exist yet)
rm -f routput.txt rdepsfound.txt r404.txt
# COPY ORIGINAL INPUT TO WORKING INPUT
cp rinput_orig.txt rinput.txt
IFS=","
# rinput.txt grows inside the loop, so newly found dependencies are processed as well
while read -r PACKAGE; do
  echo "Processing $PACKAGE..."
  PACKAGEURL="http://cran.r-project.org/web/packages/$PACKAGE/index.html"
  # -L follows redirects; %{http_code} prints the final HTTP status code
  if [ "$(curl -L -o /dev/null --silent --head --write-out '%{http_code}' "$PACKAGEURL")" != 404 ]; then
    # GET LICENSE INFO OF PACKAGE
    LICENSEINFO=$(curl -L "$PACKAGEURL" 2>/dev/null | grep -A1 "License:" | grep -v "License:" | gawk '{ if (match($0, /<a href=".*">(.*)<\/a>/, a)) print a[0] }' | sed "s/|/,/g" | sed "s/+/,/g")
    for x in ${LICENSEINFO[*]}
    do
      # SAVE LICENSE (ONLY THE FIRST ONE IS KEPT)
      LICENSE=$(echo "$x" | gawk '{ if (match($0, /<a href=".*">(.*)<\/a>/, a)) print a[1] }')
      break
    done
    # WRITE PACKAGE AND LICENSE TO OUTPUT FILE
    echo "$PACKAGE $LICENSE $PACKAGEURL" >> routput.txt
    # GET DEPENDENCIES OF PACKAGE (ONLY THE "Depends:" ROW IS PARSED)
    DEPS=$(curl -L "$PACKAGEURL" 2>/dev/null | grep -A1 "Depends:" | grep -v "Depends:" | gawk '{ if (match($0, /<a href=".*">(.*)<\/a>/, a)) print a[0] }')
    for x in ${DEPS[*]}
    do
      FOUNDDEP=$(echo "$x" | gawk '{ if (match($0, /<a href=".*">(.*)<\/a>/, a)) print a[1] }' | sed "s/<\/span>//g")
      if [ "$FOUNDDEP" != "" ]; then
        echo "Found dependency $FOUNDDEP for $PACKAGE..."
        # -x -F: fixed-string, whole-line match, so "Rcpp" does not match "RcppArmadillo"
        if grep -qxF "$FOUNDDEP" rinput.txt; then
          echo "$FOUNDDEP already exists in package list..."
        else
          echo "Adding $FOUNDDEP to package list..."
          # SAVE FOUND DEPENDENCY BACK TO INPUT LIST
          echo "$FOUNDDEP" >> rinput.txt
          # SAVE FOUND DEPENDENCY TO DEPENDENCY LIST FOR EASY VIEWING OF ALL FOUND DEPENDENCIES
          echo "$FOUNDDEP is a dependency of $PACKAGE" >> rdepsfound.txt
        fi
      fi
    done
  else
    echo "Skipping $PACKAGE because 404 was received..."
    echo "$PACKAGE $PACKAGEURL" >> r404.txt
  fi
done < rinput.txt
echo -e "\nRESULT:"
sort -u routput.txt
Example rinput_orig.txt:
shiny
rmarkdown
xtable
RODBC
RJDBC
XLConnect
openxlsx
xlsx
Rcpp
Example console output when running the script:
Processing shiny...
Processing rmarkdown...
Processing xtable...
Processing RODBC...
Processing RJDBC...
Found dependency DBI for RJDBC...
Adding DBI to package list...
Found dependency rJava for RJDBC...
Adding rJava to package list...
Processing XLConnect...
Found dependency XLConnectJars for XLConnect...
Adding XLConnectJars to package list...
Processing openxlsx...
Processing xlsx...
Found dependency rJava for xlsx...
rJava already exists in package list...
Found dependency xlsxjars for xlsx...
Adding xlsxjars to package list...
Processing Rcpp...
Processing DBI...
Processing rJava...
Processing XLConnectJars...
Processing xlsxjars...
Found dependency rJava for xlsxjars...
rJava already exists in package list...
Example rdepsfound.txt:
DBI is a dependency of RJDBC
rJava is a dependency of RJDBC
XLConnectJars is a dependency of XLConnect
xlsxjars is a dependency of xlsx
Example routput.txt:
shiny GPL-3 http://cran.r-project.org/web/packages/shiny/index.html
rmarkdown GPL-3 http://cran.r-project.org/web/packages/rmarkdown/index.html
xtable GPL-2 http://cran.r-project.org/web/packages/xtable/index.html
RODBC GPL-2 http://cran.r-project.org/web/packages/RODBC/index.html
RJDBC GPL-2 http://cran.r-project.org/web/packages/RJDBC/index.html
XLConnect GPL-3 http://cran.r-project.org/web/packages/XLConnect/index.html
openxlsx GPL-3 http://cran.r-project.org/web/packages/openxlsx/index.html
xlsx GPL-3 http://cran.r-project.org/web/packages/xlsx/index.html
Rcpp GPL-2 http://cran.r-project.org/web/packages/Rcpp/index.html
DBI LGPL-2 http://cran.r-project.org/web/packages/DBI/index.html
rJava GPL-2 http://cran.r-project.org/web/packages/rJava/index.html
XLConnectJars GPL-3 http://cran.r-project.org/web/packages/XLConnectJars/index.html
xlsxjars GPL-3 http://cran.r-project.org/web/packages/xlsxjars/index.html
I hope this helps someone!
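Answer 3:
Another neat solution is the internal function recursivePackageDependencies from the packrat library. However, the package must be installed in some library on your machine. The advantage is that it also works for home-made, non-CRAN packages. Example: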
packrat:::recursivePackageDependencies("ggplot2",lib.loc = .libPaths()[1])
This gives:
[1] "R6" "RColorBrewer" "Rcpp" "colorspace" "dichromat" "digest" "gtable"
[8] "labeling" "lazyeval" "magrittr" "munsell" "plyr" "reshape2" "rlang"
[15] "scales" "stringi" "stringr" "tibble" "viridisLite"
Comments:
For those looking for a quick-and-dirty solution, note that using ::: to access internal functions is typically a design mistake: stat.ethz.ch/R-manual/R-devel/library/base/html/…
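I wouldn't say that's true in general. The advice in the documentation may be right in some cases, but there is no reason at all not to use this code in a script for your own use. The statement is more about using such functions inside another package intended for public use.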
You're right. That's what I meant: you have to consider where you're going to use it. When coding, you can do almost anything ;)
Answer 4:
I'm surprised nobody has mentioned tools::package_dependencies(). It's the simplest solution, and it has a recursive argument (which the accepted solution doesn't offer).
A simple example that looks at the recursive dependencies of the first 200 packages on CRAN:
library(tidyverse)
avail_pks <- available.packages()
deps <- tools::package_dependencies(packages = avail_pks[1:200, "Package"],
recursive = TRUE)
tibble(Package=names(deps),
data=map(deps, as_tibble)) %>%
unnest(data)
#> # A tibble: 7,125 x 2
#> Package value
#> <chr> <chr>
#> 1 A3 xtable
#> 2 A3 pbapply
#> 3 A3 parallel
#> 4 A3 stats
#> 5 A3 utils
#> 6 aaSEA DT
#> 7 aaSEA networkD3
#> 8 aaSEA shiny
#> 9 aaSEA shinydashboard
#> 10 aaSEA magrittr
#> # … with 7,115 more rows
Created on 2020-12-04 by the reprex package (v0.3.0)
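If you would rather not load the tidyverse, the same long table can be sketched in base R with stack(). The three-row `db` matrix below is invented data used in place of available.packages() (tools::package_dependencies accepts it through its `db` argument), so the example runs without network access.

```r
# Minimal sketch: base-R version of the long table, using stack() instead of
# the tidyverse. The hand-made `db` matrix is invented data passed through the
# `db` argument in place of available.packages(), so no network is needed.
db <- rbind(
  c(Package = "pkgA", Depends = "R (>= 3.5)", Imports = "pkgB", LinkingTo = NA),
  c(Package = "pkgB", Depends = NA, Imports = "pkgC", LinkingTo = NA),
  c(Package = "pkgC", Depends = NA, Imports = NA, LinkingTo = NA)
)
deps <- tools::package_dependencies(packages = c("pkgA", "pkgB"),
                                    db = db, recursive = TRUE)
long <- stack(deps)                  # columns "values" and "ind"
names(long) <- c("value", "Package")
long[, c("Package", "value")]
```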
Comments:
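I'm surprised too; this should definitely be the accepted answer: it has no dependencies itself, isn't hacky at all, and fully answers the OP's question. The tidyverse transformation is a nice touch.
Answer 5:
Try this: tools::package_dependencies(recursive = TRUE)$package_name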
As an example, here are the dependencies of dplyr:
tools::package_dependencies(recursive = TRUE)$dplyr
[1] "ellipsis" "generics" "glue" "lifecycle" "magrittr" "methods"
[7] "R6" "rlang" "tibble" "tidyselect" "utils" "vctrs"
[13] "cli" "crayon" "fansi" "pillar" "pkgconfig" "purrr"
[19] "digest" "assertthat" "grDevices" "utf8" "tools"
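Called without `packages`, tools::package_dependencies() computes the dependencies of every package in the database before `$` picks one out. The sketch below asks for a single package directly and selects dependency types with `which` (the default, "strong", means Depends, Imports and LinkingTo). The `db` matrix is hypothetical offline data standing in for available.packages().

```r
# Minimal sketch: query one package instead of indexing the result for all
# packages, and choose dependency types with `which`. The hand-made `db`
# stands in for available.packages() so the example runs offline.
db <- rbind(
  c(Package = "pkgA", Depends = NA, Imports = "pkgB", LinkingTo = "Rcpp"),
  c(Package = "pkgB", Depends = "methods", Imports = NA, LinkingTo = NA),
  c(Package = "Rcpp", Depends = NA, Imports = "methods, utils", LinkingTo = NA)
)

# Default which = "strong": Depends, Imports and LinkingTo
strong <- tools::package_dependencies("pkgA", db = db,
                                      recursive = TRUE)[["pkgA"]]
# Depends and Imports only, as in the comparison in the next answer
di <- tools::package_dependencies("pkgA", db = db,
                                  which = c("Depends", "Imports"),
                                  recursive = TRUE)[["pkgA"]]
sort(strong)
sort(di)
```

Dropping LinkingTo removes Rcpp and everything reachable only through it (utils here), which is one reason different tools can report different dependency sets.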
Answer 6:
I tested my own solution (which checks locally installed packages) against packrat and tools.
You can see clear differences between the methods. tools::package_dependencies appears to return too much in older R versions (before 4.1.0, with recursive = TRUE), where it is not a valid solution.
R 4.1.0 NEWS
"Function tools::package_dependencies() (in package tools) can now use different dependency types for direct and recursive dependencies."
packrat:::recursivePackageDependencies uses available.packages, so it is based on the latest remote packages rather than the locally installed ones.
By default, my function skips base packages; set the base argument to TRUE if you want to include them as well.
Tested under R 4.1.0:
get_deps <- function(package, fields = c("Depends", "Imports", "LinkingTo"), base = FALSE, lib.loc = NULL) {
  stopifnot((length(package) == 1) && is.character(package))
  stopifnot(all(fields %in% c("Depends", "Imports", "Suggests", "LinkingTo")))
  stopifnot(is.logical(base))
  stopifnot(package %in% rownames(utils::installed.packages(lib.loc = lib.loc)))
  # accumulator shared with the recursive helper below (updated with <<-)
  paks_global <- NULL
  deps <- function(pak, fields) {
    pks <- utils::packageDescription(pak)
    # packageDescription() returns NA (with a warning) if the package is not installed
    if (!inherits(pks, "packageDescription"))
      return(NULL)
    res <- NULL
    for (f in fields) {
      ff <- pks[[f]]
      if (!is.null(ff)) {
        # "pkgA (>= 1.0),\n pkgB" -> c("pkgA", "pkgB")
        res <- c(
          res,
          vapply(
            strsplit(trimws(strsplit(ff, ",")[[1]]), "[ \n\\(]"),
            function(x) x[1],
            character(1)
          )
        )
      }
    }
    if (is.null(res))
      return(NULL)
    for (r in res) {
      # recurse only into packages that have not been visited yet
      if (r != "R" && !r %in% paks_global) {
        paks_global <<- c(r, paks_global)
        deps(r, fields)
      }
    }
  }
  deps(package, fields)
  setdiff(unique(paks_global), c(
    package,
    "R",
    if (!base)
      c("stats", "graphics", "grDevices", "utils", "datasets", "methods", "base", "tools")
    else
      NULL
  ))
}
own = get_deps("shiny", fields = c("Depends", "Imports"))
packrat = packrat:::recursivePackageDependencies("shiny", lib.loc = .libPaths(), fields = c("Depends", "Imports"))
tools = tools::package_dependencies("shiny", which = c("Depends", "Imports"), recursive = TRUE)[[1]]
setdiff(own, packrat)
#> character(0)
setdiff(packrat, own)
#> character(0)
setdiff(own, tools)
#> character(0)
setdiff(tools, own)
#> [1] "methods" "utils" "grDevices" "tools" "stats" "graphics"
setdiff(packrat, tools)
#> character(0)
setdiff(tools, packrat)
#> [1] "methods" "utils" "grDevices" "tools" "stats" "graphics"
own
#> [1] "lifecycle" "ellipsis" "cachem" "jquerylib" "rappdirs"
#> [6] "fs" "sass" "bslib" "glue" "commonmark"
#> [11] "withr" "fastmap" "crayon" "sourcetools" "base64enc"
#> [16] "htmltools" "digest" "xtable" "jsonlite" "mime"
#> [21] "magrittr" "rlang" "later" "promises" "R6"
#> [26] "Rcpp" "httpuv"
packrat
#> [1] "R6" "Rcpp" "base64enc" "bslib" "cachem"
#> [6] "commonmark" "crayon" "digest" "ellipsis" "fastmap"
#> [11] "fs" "glue" "htmltools" "httpuv" "jquerylib"
#> [16] "jsonlite" "later" "lifecycle" "magrittr" "mime"
#> [21] "promises" "rappdirs" "rlang" "sass" "sourcetools"
#> [26] "withr" "xtable"
tools
#> [1] "methods" "utils" "grDevices" "httpuv" "mime"
#> [6] "jsonlite" "xtable" "digest" "htmltools" "R6"
#> [11] "sourcetools" "later" "promises" "tools" "crayon"
#> [16] "rlang" "fastmap" "withr" "commonmark" "glue"
#> [21] "bslib" "cachem" "ellipsis" "lifecycle" "sass"
#> [26] "jquerylib" "magrittr" "base64enc" "Rcpp" "stats"
#> [31] "graphics" "fs" "rappdirs"
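The tools result above differs from the other two only by base packages (methods, utils, grDevices, tools, stats, graphics). As a sketch, instead of hard-coding that list as get_deps does, the base packages can be queried with installed.packages(priority = "base"); `drop_base` is a hypothetical helper name.

```r
# Minimal sketch: drop base packages from a dependency vector by asking R
# which installed packages have priority "base", instead of hard-coding them.
# `drop_base` is a hypothetical helper name.
drop_base <- function(pkgs) {
  base_pkgs <- rownames(utils::installed.packages(priority = "base"))
  setdiff(pkgs, base_pkgs)
}

drop_base(c("methods", "utils", "grDevices", "tools", "stats", "graphics",
            "htmltools", "jsonlite"))
```

This also covers base packages missing from the hard-coded list, such as parallel, grid or splines.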
microbenchmark::microbenchmark(get_deps("shiny", fields = c("Depends", "Imports")),
packrat:::recursivePackageDependencies("shiny", lib.loc = .libPaths(), fields = c("Depends", "Imports")),
tools = tools::package_dependencies("shiny", which = c("Depends", "Imports"), recursive = TRUE)[[1]],
times = 5
)
#> Warning in microbenchmark::microbenchmark(get_deps("shiny", fields =
#> c("Depends", : less accurate nanosecond times to avoid potential integer
#> overflows
#> Unit: milliseconds
#> expr
#> get_deps("shiny", fields = c("Depends", "Imports"))
#> packrat:::recursivePackageDependencies("shiny", lib.loc = .libPaths(), fields = c("Depends", "Imports"))
#> tools
#> min lq mean median uq max neval
#> 5.316552 5.607365 6.054568 5.674359 6.633308 7.041258 5
#> 18.767340 19.387588 21.739127 21.581457 23.526169 25.433079 5
#> 411.589734 449.179354 458.526354 465.431262 468.440211 497.991207 5
Created on 2021-06-25 by the reprex package (v0.3.0)
A demonstration that the tools solution is problematic under older R versions. Tested under R 3.6.3:
paks <- tools::package_dependencies("shiny", which = c("Depends", "Imports"), recursive = TRUE)[[1]]
"lifecycle" %in% paks
#> [1] TRUE
any(c(paks, "shiny") %in% tools::dependsOnPkgs("lifecycle"))
#> [1] FALSE
Created on 2021-06-25 by the reprex package (v0.3.0)
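tools::dependsOnPkgs() used above answers the reverse question: which installed packages depend on a given one. A comparable reverse lookup against an available.packages()-style matrix can be sketched with `reverse = TRUE` in tools::package_dependencies(). The `db` matrix is made-up data so the example runs offline.

```r
# Minimal sketch: recursive reverse dependencies, i.e. which packages in `db`
# depend on pkgC directly or indirectly. `db` is made-up offline data standing
# in for available.packages().
db <- rbind(
  c(Package = "pkgA", Depends = NA, Imports = "pkgB", LinkingTo = NA),
  c(Package = "pkgB", Depends = NA, Imports = "pkgC", LinkingTo = NA),
  c(Package = "pkgC", Depends = NA, Imports = NA, LinkingTo = NA)
)
rev_deps <- tools::package_dependencies("pkgC", db = db,
                                        reverse = TRUE,
                                        recursive = TRUE)[["pkgC"]]
rev_deps
```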
Comments:
I'm using R 4.1.1: > own = get_deps("shiny", fields = c("Depends", "Imports")) Error in utils::installed.packages(lib.loc = lib.loc) : object 'lib.loc' not found
Do I need to define some other environment variable?
Good catch, I just updated the code: lib.loc is now an additional argument that defaults to NULL.
Please consider using the pacs package: cran.r-project.org/web/packages/pacs/index.html. It has a validated function, pacs::pac_deps, with many features.
Thank you very much for this very useful function and the examples!