Exporting large output from Oracle to CSV using PowerShell

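Every week I need to export a very large CSV file from Oracle.

I have tried two approaches:

  1. Adapter.Fill(DataSet)
  2. Looping through the columns and rows and saving to the CSV file one row at a time.

The first runs out of memory (the server has only 4 GB of RAM), and the second takes about an hour because there are more than 4 million rows to export.

Here is code #1: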

#Your query. It cannot contain any double quotes otherwise it will break.
$query = "SELECT manycolumns FROM somequery"

#Oracle login credentials and other variables
$username = "username"
$password = "password"
$datasource = "database address"
$output = "\\NetworkLocation\Sales.csv"

#Creates a blank CSV file and makes sure it's in ASCII
Out-File -FilePath $output -Encoding ascii -Force

#This here will look for "Oracle.ManagedDataAccess.dll" file inside "C:\Oracle" folder. We usually have two versions of Oracle installed so the adaptor can be in different locations. Needs changing if the Oracle is installed elsewhere.
$location = Get-ChildItem -Path C:\Oracle -Filter Oracle.ManagedDataAccess.dll -Recurse -ErrorAction SilentlyContinue -Force

#Establishes connection to Oracle using the DLL file
Add-Type -Path $location.FullName
$connectionString = 'User Id=' + $username + ';Password=' + $password + ';Data Source=' + $datasource
$connection = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($connectionString)
$connection.open()
$command=$connection.CreateCommand()
$command.CommandText=$query

#Creates a table in memory and fills it with results from the query. Then, export the virtual table into CSV.
$DataSet = New-Object System.Data.DataSet
$Adapter = New-Object Oracle.ManagedDataAccess.Client.OracleDataAdapter($command)
$Adapter.Fill($DataSet)
$DataSet.Tables[0] | Export-Csv $output -NoTypeInformation

$connection.Close()

Here is code #2:

#Your query. It cannot contain any double quotes otherwise it will break.
$query = "SELECT manycolumns FROM somequery"

#Oracle login credentials and other variables
$username = "username"
$password = "password"
$datasource = "database address"
$output = "\\NetworkLocation\Sales.csv"
$tempfile = $env:TEMP + "\Temp.csv"

#Creates a blank CSV file and makes sure it's in ASCII
Out-File -FilePath $tempfile -Encoding ascii -Force

#This here will look for "Oracle.ManagedDataAccess.dll" file inside "C:\Oracle" folder. Needs changing if the Oracle is installed elsewhere.
$location = Get-ChildItem -Path C:\Oracle -Filter Oracle.ManagedDataAccess.dll -Recurse -ErrorAction SilentlyContinue -Force

#Establishes connection to Oracle using the DLL file
Add-Type -Path $location.FullName
$connectionString = 'User Id=' + $username + ';Password=' + $password + ';Data Source=' + $datasource
$connection = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($connectionString)
$connection.open()
$command=$connection.CreateCommand()
$command.CommandText=$query

#Reads results column by column. This way you don't have to specify how many columns it has.
$reader = $command.ExecuteReader()
while($reader.Read()) {
    #One hashtable per row, keyed by column name
    $props = @{}
    for($i = 0; $i -lt $reader.FieldCount; $i+=1) {
        $name = $reader.GetName($i)
        $value = $reader.item($i)
        $props.Add($name, $value)
    }
    #Exports each line to CSV file. Works best when the file is on local drive as it saves it after each line.
    New-Object PSObject -Property $props | Export-Csv $tempfile -NoTypeInformation -Append
}

Move-Item $tempfile $output -Force

$connection.Close()

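Ideally I would like to use the first script, because it is faster than the second, but without running out of memory.

Do you know of a way to "fill" the first 1 million records, append them to the CSV, clear the DataSet table, fill the next 1 million, and so on? Once the code has finished, the CSV weighs about 1.3 GB, but while it is running even 8 GB of memory is not enough (my laptop has 8 GB, but the server has only 4 GB and really struggles).

Any tips would be appreciated.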

Answer

In the *nix community we love one-liners!

In sqlplus (>= 12.2) you can set the markup to 'csv on'.

Create a query file:

cat > query.sql <<EOF
set head off
set feed off
set timing off
set trimspool on
set term off
spool output.csv
select 
  object_id, 
  owner, 
  object_name, 
  object_type, 
  status, 
  created, 
  last_ddl_time 
from dba_objects;
spool off
exit;
EOF

Spool output.csv like this:

sqlplus -s -m "CSV ON DELIM ',' QUOTE ON" user/password@\"localhost:1521/<my_service>\" @query.sql
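Since the question is about a Windows server, the same two steps can probably be driven from PowerShell as well. The following is only a sketch under assumptions: the working directory is arbitrary, the connect string is the placeholder from the timing runs further down, and sqlplus is assumed to be on PATH (the script only writes the query file if it is not).

```powershell
# Sketch only: the working directory and connect string are placeholders.
$workDir   = Join-Path ([System.IO.Path]::GetTempPath()) 'oracle-export'
$queryFile = Join-Path $workDir 'query.sql'
$connect   = 'system/oracle@localhost/XEPDB1'

New-Item -ItemType Directory -Path $workDir -Force | Out-Null

# Single-quoted here-string: PowerShell expands nothing inside it
$script = @'
set head off
set feed off
set timing off
set trimspool on
set term off
spool output.csv
select object_id, owner, object_name, object_type, status, created, last_ddl_time
from dba_objects;
spool off
exit;
'@
Set-Content -Path $queryFile -Value $script -Encoding ascii

if (Get-Command sqlplus -ErrorAction SilentlyContinue) {
    # spool writes output.csv relative to the current directory
    Push-Location $workDir
    try {
        # "@query.sql" is quoted so PowerShell does not read @query as splatting
        & sqlplus -s -m "CSV ON DELIM ',' QUOTE ON" $connect "@query.sql"
    }
    finally {
        Pop-Location
    }
}
else {
    Write-Warning 'sqlplus was not found on PATH; only the query file was written.'
}
```

The easy-connect string is written without the escaped double quotes used in the shell version, because passing embedded quotes to native programs from PowerShell is fragile.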

Another option is SQLcl (the SQL Developer CLI tool; the binary is named 'sql', which I renamed to 'sqlcl').

Create the query file (note the "set term on|off" setting!):

cat > query.sql <<EOF
set head off
set feed off
set timing off
set term off
set trimspool on
set sqlformat csv
spool output.csv
select 
  object_id, 
  owner, 
  object_name, 
  object_type, 
  status, 
  created, 
  last_ddl_time 
from dba_objects 
where rownum < 5;
spool off
exit;
EOF

Spool output.csv like this:

sqlcl -s system/oracle@\"localhost:1521/XEPDB1\" @query.sql

Voilà!

cat output.csv 
9,"SYS","I_FILE#_BLOCK#","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
38,"SYS","I_OBJ3","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
45,"SYS","I_TS1","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
51,"SYS","I_CON1","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04

And the winner is sqlplus, on 77k rows! (The "rownum < 5" filter was removed for this run.)

time sqlcl -s system/oracle@\"localhost:1521/XEPDB1\" @query.sql

real    0m23.776s
user    0m39.542s
sys     0m1.293s

time sqlplus -s -m "CSV ON DELIM ',' QUOTE ON" system/oracle@localhost/XEPDB1 @query.sql

real    0m3.066s
user    0m0.700s
sys     0m0.265s

wc -l output.csv
77480 output.csv

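In SQL Developer you can also try the format hints, putting one format name in the comment:

select /*CSV|html|JSON|TEXT|<TONSOFOTHERFORMATS>*/ * from dba_objects;

If you want to load a CSV into the database, this tool will do the job!

https://github.com/csv2db/csv2db

Good luck!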

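Another answer

Thank you all for the responses. I learned about Oracle scripts and SQL*Plus, which I did not know about. I may use them in the future, but I think I would have to update my Oracle Developer package first.

I found a way to adapt my code using the documentation here: https://docs.oracle.com/database/121/ODPNT/OracleDataAdapterClass.htm#i1002865

It is not perfect, because it pauses after every million rows, saves the output, and re-runs the query, which has to be re-evaluated each time (the query I am running takes about 1-2 minutes to evaluate).

It is basically the same as running the code x times (where x is the number of millions of rows, rounded up): first with "fetch first 1,000,000 rows only", then with "offset 1,000,000 rows fetch next 1,000,000 rows only", and so on, appending each batch to the bottom of the CSV.

Here is the code: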
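The batching arithmetic described above can be sketched without a database. Get-BatchPlan is a hypothetical helper, and $TotalRows stands in for the size of the result set, which the real loop does not know in advance; the sketch only shows which start offsets get requested and when the loop stops.

```powershell
# Hypothetical helper: lists the batches a "fill N rows starting at offset" loop would request.
function Get-BatchPlan {
    param([int]$TotalRows, [int]$BatchSize)
    $from = 0
    do {
        # Rows a fill starting at $from, capped at $BatchSize, would return
        $returned = [Math]::Max(0, [Math]::Min($BatchSize, $TotalRows - $from))
        [pscustomobject]@{ StartRecord = $from; Returned = $returned }
        $from += $BatchSize
    } while ($returned -eq $BatchSize)   # a short or empty batch means the data is exhausted
}

# 2.5 million rows in batches of 1 million: two full batches and one of 500,000
Get-BatchPlan -TotalRows 2500000 -BatchSize 1000000
```

One consequence of stopping on the first short batch: when the row count is an exact multiple of the batch size, the loop runs the query one extra time and gets an empty batch back.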

#Your query. It cannot contain any double quotes otherwise it will break.
$query = "SELECT
A lot of columns
FROM
a lot of tables joined together
WHERE
a lot of conditions
"

#Oracle login credentials and other variables (placeholders, replace with your own values)
$username = "myusername"
$password = "mypassword"
$datasource = "TNSnameofmyDatasource"
$output = "$env:USERPROFILE\desktop\Sales.csv"

#creates a blank CSV file and make sure it's in ASCII as that's what the output of my query is
Out-File $output -Force ascii

#This here will look for "Oracle.ManagedDataAccess.dll" file inside "C:\Oracle" folder. Needs changing if the Oracle is installed elsewhere.
$location = Get-ChildItem -Path C:\Oracle -Filter Oracle.ManagedDataAccess.dll -Recurse -ErrorAction SilentlyContinue -Force

#Establishes connection to Oracle using the DLL file
Add-Type -Path $location.FullName
$connectionString = 'User Id=' + $username + ';Password=' + $password + ';Data Source=' + $datasource
$connection = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($connectionString)
$connection.open()
$command=$connection.CreateCommand()
$command.CommandText=$query

#Creates a table in memory to be filled up with results from the query using ODAC
$DataSet = New-Object System.Data.DataSet
$Adapter = New-Object Oracle.ManagedDataAccess.Client.OracleDataAdapter($command)

#Declaring variables for the loop
$fromrecord = 0
$numberofrecords = 1000000
$timesrun = 0

#Loop as long as the number of rows in the virtual table is equal to the specified $numberofrecords
while(($timesrun -eq 0) -or ($DataSet.Tables[0].Rows.Count -eq $numberofrecords))
{
    #Drop the previous batch before fetching the next one
    $DataSet.Clear()
    $Adapter.Fill($DataSet,$fromrecord,$numberofrecords,'*') | Out-Null #Suppresses writing to console the number of rows filled
    Write-Progress "Saved: $fromrecord Rows"
    $DataSet.Tables[0] | Export-Csv $output -Append -NoTypeInformation
    $fromrecord=$fromrecord+$numberofrecords
    $timesrun++
}

$connection.Close()
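Because the batched Fill re-runs the query for every batch, a different technique may be worth trying: read the result once through a data reader and write each row to a single open StreamWriter, instead of calling Export-Csv -Append per row as code #2 does. The sketch below is not from the thread. Export-ReaderToCsv and ConvertTo-CsvField are hypothetical helper names, ASCII output is assumed to match the question, and an in-memory DataTable stands in for Oracle so the example runs without a database.

```powershell
# Hypothetical helper: renders one value as a CSV field
function ConvertTo-CsvField {
    param($Value)
    if ($null -eq $Value -or $Value -is [System.DBNull]) { return '' }
    $text = [string]$Value
    # Quote only when the field contains a comma, a quote, or a line break
    if ($text -match '[",\r\n]') {
        return '"' + $text.Replace('"', '""') + '"'
    }
    return $text
}

# Hypothetical helper: streams any IDataReader to a CSV file, one row in memory at a time
function Export-ReaderToCsv {
    param(
        [Parameter(Mandatory)] [System.Data.IDataReader] $Reader,
        [Parameter(Mandatory)] [string] $Path   # use a full path; .NET ignores the PowerShell location
    )
    $writer = New-Object System.IO.StreamWriter($Path, $false, [System.Text.Encoding]::ASCII)
    try {
        $count  = $Reader.FieldCount
        $fields = New-Object 'string[]' $count
        # Header row from the column names
        for ($i = 0; $i -lt $count; $i++) { $fields[$i] = (ConvertTo-CsvField ($Reader.GetName($i))) }
        $writer.WriteLine($fields -join ',')
        $rows = 0
        while ($Reader.Read()) {
            for ($i = 0; $i -lt $count; $i++) { $fields[$i] = (ConvertTo-CsvField ($Reader.GetValue($i))) }
            $writer.WriteLine($fields -join ',')
            $rows++
        }
        return $rows
    }
    finally {
        $writer.Dispose()
    }
}

# Stand-in data so the sketch runs without a database
$table = New-Object System.Data.DataTable
[void]$table.Columns.Add('ID', [int])
[void]$table.Columns.Add('NAME', [string])
[void]$table.Rows.Add(1, 'plain')
[void]$table.Rows.Add(2, 'has "quotes", and a comma')

$demoPath   = Join-Path ([System.IO.Path]::GetTempPath()) 'reader-demo.csv'
$demoReader = $table.CreateDataReader()
$written    = Export-ReaderToCsv -Reader $demoReader -Path $demoPath
$demoReader.Close()
```

With the Oracle connection from the scripts above, the reader would come from $command.ExecuteReader() and the path would be $output. Dates and numbers are formatted by the plain [string] cast here, so the output may differ from what Export-Csv produces.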
