HSSF XSSF SXSSF


1. Create the workbook (Workbook)
2. Create the sheet (Sheet)
3. Create the row (Row)
4. Create the cell (Cell)

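HSSFWorkbook handles Excel 2003 and earlier, with the .xls extension. Each Sheet is limited to 65,536 rows (row indexes 0 to 65535) and 256 columns, so exports are small enough that running out of memory (OOM) is generally not a problem.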

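XSSFWorkbook exists because of these limitations of HSSFWorkbook: it can export only a small number of rows and only targets Excel 2003 and earlier. XSSFWorkbook targets Excel 2007 and later (1,048,576 rows by 16,384 columns per Sheet) with the .xlsx extension, so each Sheet can hold about 1.04 million rows. The price is a risk of OOM: every Sheet, Row and Cell you create is kept in memory, so memory demand grows with the amount of data, and a large export is quite likely to run out of heap.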
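The row limits decide how a large export has to be split across sheets. The following is a minimal sketch in plain Java (no POI dependency) that only does the arithmetic. The class name SheetLimits and the method sheetsNeeded are made up for illustration. POI itself exposes the same limits through org.apache.poi.ss.SpreadsheetVersion.

```java
/** Illustrative helper: row/column limits per format and a sheet-count calculation. */
final class SheetLimits {
    // Limits of the binary .xls format (HSSF).
    static final int XLS_MAX_ROWS = 65_536;
    static final int XLS_MAX_COLS = 256;
    // Limits of the OOXML .xlsx format (XSSF and SXSSF).
    static final int XLSX_MAX_ROWS = 1_048_576;
    static final int XLSX_MAX_COLS = 16_384;

    /** Returns how many sheets are needed when each sheet holds at most maxRowsPerSheet rows. */
    static long sheetsNeeded(long totalRows, int maxRowsPerSheet) {
        if (totalRows < 0 || maxRowsPerSheet <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and maxRowsPerSheet > 0");
        }
        // Ceiling division without floating point.
        return (totalRows + maxRowsPerSheet - 1) / maxRowsPerSheet;
    }

    public static void main(String[] args) {
        long rows = 200_000;
        System.out.println(".xls sheets needed:  " + sheetsNeeded(rows, XLS_MAX_ROWS));
        System.out.println(".xlsx sheets needed: " + sheetsNeeded(rows, XLSX_MAX_ROWS));
    }
}
```

For 200,000 rows this reports 4 sheets for .xls and 1 sheet for .xlsx.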

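Starting with POI 3.8, there is a low-memory workbook built on XSSFWorkbook: SXSSFWorkbook.

Quoting the official description, in short:
SXSSF is a streaming extension of XSSF. It uses a sliding window to keep memory usage low, and is intended for very large spreadsheets when JVM heap space is limited.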

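It works through a sliding window.
SXSSFWorkbook.DEFAULT_WINDOW_SIZE is 100, which means at most 100 Row objects are kept in memory. When the 101st Row is written, the 1st Row is written out as XML to a temporary file (on the author's machine, under C:\\Users\\wange\\AppData\\Local\\Temp), and so on for later rows, so that at most 100 Row objects are in memory at any time.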

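By default SXSSFWorkbook uses inline strings rather than a shared strings table (SharedStringsTable). When shared strings are enabled, every unique string in the document has to be kept in memory, so memory usage is higher.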

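Compared with XSSF: only a limited number of Rows are accessible at any point in time; Sheet.clone() is no longer supported; formula evaluation is no longer supported. Apart from the sliding window, the other Excel operations still use the XSSF API.
The official documentation also advises calling wb.dispose() after exporting, to delete the temporary files that were written.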

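From reading the source, wb.write(out) works as follows:
1. flushRows() is called on every sheet of wb, moving the rows out of memory and into the temporary .xml files.
2. A temporary .xlsx file is created and the workbook's template data is written into it.
3. That temporary .xlsx file is opened as a ZipFile and every ZipEntry is checked for a matching Sheet. If there is none, the entry's stream is copied as is.
4. If there is a matching Sheet, its data is in one of the temporary .xml files, which is parsed and appended to the exported file.
(This touches on the internals of the Excel file format, so it is not explored further here.)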

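SXSSFWorkbook wb = new SXSSFWorkbook(-1)
When the workbook is initialized with -1, automatic flushing is turned off and we define the rule for writing to the temporary file ourselves, for example flushing once every 1000 rows, which can greatly reduce the number of disk I/O operations.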
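The sliding window, including the manual mode selected with -1, can be illustrated with a toy model in plain Java. This is a sketch of the idea only, not POI's implementation: the class RowWindowModel and its methods are hypothetical, and a StringBuilder stands in for the temporary file.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a sliding row window. This is NOT POI code. */
final class RowWindowModel {
    private final int windowSize;  // -1 means: never flush automatically
    private final TreeMap<Integer, String> rows = new TreeMap<>();
    private final StringBuilder spill = new StringBuilder();  // stands in for the temp file
    private int flushed = 0;

    RowWindowModel(int windowSize) {
        if (windowSize == 0 || windowSize < -1) {
            throw new IllegalArgumentException("windowSize must be -1 or positive");
        }
        this.windowSize = windowSize;
    }

    /** Adds a row; in automatic mode the oldest rows are flushed once the window is exceeded. */
    void createRow(int rowNum, String value) {
        rows.put(rowNum, value);
        if (windowSize > 0 && rows.size() > windowSize) {
            flushRows(windowSize);
        }
    }

    /** Writes the lowest-numbered rows to the spill buffer until only `remaining` stay in memory. */
    void flushRows(int remaining) {
        if (remaining < 0) {
            throw new IllegalArgumentException("remaining must be >= 0");
        }
        while (rows.size() > remaining) {
            Map.Entry<Integer, String> oldest = rows.pollFirstEntry();
            spill.append("<row r=\"").append(oldest.getKey() + 1).append("\">")
                 .append(oldest.getValue()).append("</row>\n");
            flushed++;
        }
    }

    int inMemory() { return rows.size(); }
    int flushedCount() { return flushed; }

    public static void main(String[] args) {
        // Automatic mode: window of 100 rows, 250 rows written.
        RowWindowModel auto = new RowWindowModel(100);
        for (int i = 0; i < 250; i++) auto.createRow(i, "v" + i);
        System.out.println(auto.inMemory() + " in memory, " + auto.flushedCount() + " flushed");

        // Manual mode: nothing is flushed until we ask, here once every 1000 rows.
        RowWindowModel manual = new RowWindowModel(-1);
        for (int i = 0; i < 2500; i++) {
            manual.createRow(i, "v" + i);
            if ((i + 1) % 1000 == 0) manual.flushRows(0);
        }
        System.out.println(manual.inMemory() + " in memory, " + manual.flushedCount() + " flushed");
    }
}
```

In automatic mode the model writes one row to the spill buffer for every row created past the window, while the manual mode batches the writes. That batching is the source of the reduced I/O described above.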

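Parsing Excel with the SAX model differs from the DOM model, which loads the whole file into memory at once: SAX scans the document sequentially, parsing as it goes. Applications that only need a single pass over the content therefore benefit from SAX parsing, which is a big advantage for large documents.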
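As a rough illustration of the single-pass style, the sketch below uses only the JDK's built-in SAX parser on a hand-written XML snippet that resembles sheet data with inline strings. It does not open a real .xlsx file, and the class name SheetSaxDemo and the SAMPLE constant are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Reads cell text in one pass with SAX; nothing but the current text is buffered. */
final class SheetSaxDemo {
    // Hand-written snippet shaped like sheet XML with inline strings (illustrative only).
    static final String SAMPLE =
        "<worksheet><sheetData>"
        + "<row r=\"1\"><c r=\"A1\" t=\"inlineStr\"><is><t>hello</t></is></c></row>"
        + "<row r=\"2\"><c r=\"A2\" t=\"inlineStr\"><is><t>world</t></is></c></row>"
        + "</sheetData></worksheet>";

    /** Collects the text of every <t> element, driven by parser callbacks. */
    static List<String> readInlineStrings(String sheetXml) {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text;  // non-null only while inside a <t> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("t".equals(qName)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("t".equals(qName)) {
                    values.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(readInlineStrings(SAMPLE));
    }
}
```

The handler never holds the document tree, only the text of the cell currently being read, which is why memory use stays flat as the input grows.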

A general overview of Excel support in the POI open-source library


target: learning the API for .xlsx generation, .xlsx data output, etc.

component APIs
Overview
Excel (SS = HSSF + XSSF + SXSSF)

1: definition:

HSSF is the POI Project's pure Java implementation of the Excel '97(-2007) file format (.xls)
XSSF is the POI Project's pure Java implementation of the Excel 2007 OOXML (.xlsx) file format

2: functions:

HSSF and XSSF provide ways to create, modify, read and write XLS and XLSX
spreadsheets. They provide:
low level structures for those with special needs
an eventmodel api for efficient read-only access
a full usermodel api for creating, reading and modifying XLS and XLSX files

3: Other
(1) The SS usermodel supports both HSSF and XSSF; see the SS usermodel converting guide.

(2) An alternate way of generating a spreadsheet is via the Cocoon serializer.

(3) If you are merely reading spreadsheet data, use the eventmodel api in either the org.apache.poi.hssf.eventusermodel package
or the org.apache.poi.xssf.eventusermodel package, depending on your file format.

(4) If you're modifying spreadsheet data, use the usermodel api. You can also generate spreadsheets this way.

SXSSF:
origin: Since 3.8-beta3, POI provides a low-memory footprint SXSSF API built on top of XSSF.


Quick guide:

How to use the HSSF API:
Capabilities:
HSSF
(1)
allows numeric, string, date or formula cell values to be written to or read
from an XLS file.
(2)
supports row and column sizing, and cell styling (bold, italics, etc.)
supports both built-in and user-defined data formats
(3)
provides an event-based (read-only) API for reading XLS files.

General use
User API (HSSF and XSSF)
Writing a new file
The high level API (package: org.apache.poi.ss.usermodel) is what most people should
use. Usage is very simple.
first step (create the Workbook)
Workbooks are instances of org.apache.poi.ss.usermodel.Workbook. Workbook is an interface, so you obtain one in either of two ways:
way one: create a concrete class directly (org.apache.poi.hssf.usermodel.HSSFWorkbook or org.apache.poi.xssf.usermodel.XSSFWorkbook)
way two: use the handy factory class org.apache.poi.ss.usermodel.WorkbookFactory.
second step (create the Sheet)
(1) Sheets are created by calling createSheet() on an existing instance of Workbook.
(2) The created sheet is automatically added in sequence to the workbook.
(3) You set the name associated with a sheet by calling Workbook.setSheetName(sheetindex, "SheetName", encoding).
(4) For HSSF, the name may be in 8bit format (HSSFWorkbook.ENCODING_COMPRESSED_UNICODE) or Unicode (HSSFWorkbook.ENCODING_UTF_16). Default encoding for HSSF is 8bit per char. For XSSF, the name is automatically handled as unicode. The encoding argument and these constants belong to old HSSF releases; current releases take only setSheetName(sheetindex, "SheetName").
third step (create the Row)
(1) Rows are created by calling createRow(rowNumber) on an existing instance of Sheet.
(2) To change a row's height, call setRowHeight(height) on the row object (named setHeight in the current Row interface).
The height must be given in twips, or 1/20th of a point. If you prefer, there is also a setRowHeightInPoints method (named setHeightInPoints in the current Row interface).
fourth step (create the Cells)
(1) Cells are created by calling createCell(column, type) on an existing Row.
(2) Cells should have their cell type set to either Cell.CELL_TYPE_NUMERIC or Cell.CELL_TYPE_STRING (CellType.NUMERIC and CellType.STRING in releases that use the CellType enum).
(3) Cells must also have a value set. Set the value by calling setCellValue with either a String or double as a parameter.
(4) Individual cells do not have a width; to size a column you must call setColumnWidth(colindex, width) (in units of 1/256th of a character) on the Sheet object.
fifth step (create the Styles)
Cells are styled with CellStyle objects, which in turn contain a reference to a Font object.
Create them by calling createCellStyle() and createFont() on the Workbook.
To set a font for a CellStyle, call setFont(fontobj).

Once you have generated your workbook, you can write it out by calling write(outputStream),
passing it an OutputStream (for instance, a FileOutputStream or ServletOutputStream).
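The five steps and the final write can be put together as in the sketch below. It assumes the poi-ooxml artifact (4.x or later) is on the classpath, so it uses the current method names Row.setHeightInPoints and Font.setBold. The class name PoiWriteSketch, the sheet name and the output file demo.xlsx are chosen only for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

/** Minimal write example; requires poi-ooxml on the classpath. */
public class PoiWriteSketch {
    public static void main(String[] args) throws IOException {
        // Step 1: create the workbook (use new HSSFWorkbook() for .xls instead).
        try (Workbook wb = new XSSFWorkbook();
             OutputStream out = new FileOutputStream("demo.xlsx")) {

            // Step 2: create a sheet; it is appended to the workbook automatically.
            Sheet sheet = wb.createSheet("Data");
            // Column width is given in 1/256th of a character: 20 characters wide.
            sheet.setColumnWidth(0, 20 * 256);

            // Step 3: create a row (0-based index) and set its height in points.
            Row row = sheet.createRow(0);
            row.setHeightInPoints(18f);

            // Step 5 is prepared first: a style that references a bold font.
            Font bold = wb.createFont();
            bold.setBold(true);
            CellStyle headerStyle = wb.createCellStyle();
            headerStyle.setFont(bold);

            // Step 4: create cells and set their values (String or double).
            Cell name = row.createCell(0);
            name.setCellValue("Name");
            name.setCellStyle(headerStyle);
            row.createCell(1).setCellValue(42.0);

            // Finally, write the workbook to the output stream.
            wb.write(out);
        }
    }
}
```

The code is written against the org.apache.poi.ss.usermodel interfaces, so switching between .xls and .xlsx output only changes the constructor call in step 1.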

Component Map

The Apache POI distribution consists of support for many document file formats. This support is provided in several Jar files. Not all of the Jars are needed for every format. The following table shows the relationships between POI components, Maven repository tags, and the project's Jar files.

Component | Application type | Maven artifactId | Notes
POIFS | OLE2 Filesystem | poi | Required to work with OLE2 / POIFS based files
HPSF | OLE2 Property Sets | poi |
HSSF | Excel XLS | poi | For HSSF only, if common SS is needed see below
HSLF | PowerPoint PPT | poi-scratchpad |
HWPF | Word DOC | poi-scratchpad |
HDGF | Visio VSD | poi-scratchpad |
HPBF | Publisher PUB | poi-scratchpad |
HSMF | Outlook MSG | poi-scratchpad |
DDF | Escher common drawings | poi |
HWMF | WMF drawings | poi-scratchpad |
OpenXML4J | OOXML | poi-ooxml plus either poi-ooxml-schemas or ooxml-schemas and ooxml-security | See notes below for differences between these options
XSSF | Excel XLSX | poi-ooxml |
XSLF | PowerPoint PPTX | poi-ooxml |
XWPF | Word DOCX | poi-ooxml |
XDGF | Visio VSDX | poi-ooxml |
Common SL | PowerPoint PPT and PPTX | poi-scratchpad and poi-ooxml | SL code is in the core POI jar, but implementations are in poi-scratchpad and poi-ooxml.
Common SS | Excel XLS and XLSX | poi-ooxml | WorkbookFactory and friends all require poi-ooxml, not just core poi
