How to see an HTTP header with Pentaho Kettle

Posted: 2013-08-12 12:10:45

Question:

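Is there any way to see the response headers of an HTTP call? To be more specific: I need to find out when a resource (pointed to by a URL on the web) was last modified. Knowing the last-modified date, I can decide whether to download it. I think one way to do this is to look at the headers of the HTTP call. Any suggestions?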

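Comments:

I'm not a web developer, so I don't know of any way other than using a JavaScript step and inspecting the headers in code. In any case, this is a common problem with Kettle/ETL tools, and I'd be interested to hear what solution you find.

Answer 1: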

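This can be done easily with a User Defined Java Class step. Here is an example class that expects an input row from the previous step with a field named picture (the URL of the picture). Add a User Defined Java Class step with the following code: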

import java.util.*;
import java.net.*;
import java.io.*;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException, Exception
{
  // First, get a row from the default input hop
  Object[] r = getRow();

  // If the row object is null, we are done processing.
  if (r == null) {
    setOutputDone();
    return false;
  }

  String filesSavePath = getParameter("filesSavePath") + "/tmp/pictures";
  // Remove "file://" from filesSavePath; otherwise a "file not found" I/O exception is thrown
  filesSavePath = filesSavePath.replace("file://", "");

  String picture = get(Fields.In, "picture").getString(r);

  // Use the last segment of the picture URL as the file name on disk
  String filePictureName = picture.substring(picture.lastIndexOf('/') + 1);
  String fileFullPath = filesSavePath + "/" + filePictureName;

  try {
    boolean fileExists = new File(fileFullPath).isFile();

    if (!fileExists) {
      // The picture does not exist locally yet, so save it
      saveImage(picture, fileFullPath);
      System.out.println("new picture saved = " + filePictureName);
      System.out.println("*******************************");
    } else {
      // The file exists: read the Last-Modified header and save the picture
      // again only if it was modified after yesterday
      URL url = new URL(picture);
      URLConnection conn = url.openConnection();

      // Returns 0 if the server does not send a Last-Modified header
      long lastModified = conn.getLastModified();
      Date lastModifiedDate = new Date(lastModified);

      // Yesterday's date (now minus one day)
      Calendar cal = Calendar.getInstance();
      cal.add(Calendar.DATE, -1);
      Date yesterdayDate = cal.getTime();

      boolean dateCompare = lastModifiedDate.after(yesterdayDate);

      if (dateCompare) {
        saveImage(picture, fileFullPath);
        System.out.println("new picture saved(last modified after yesterday) = " + filePictureName);
      }

      System.out.println("picture = " + picture);
      System.out.println("last modified after yesterday = " + dateCompare);
      System.out.println("last modified = " + lastModifiedDate);
      System.out.println("yesterday date = " + yesterdayDate);
      System.out.println("*******************************");
    }
  } catch (Exception e) {
    System.out.println("error: " + e);
    String fullStackTrace = org.apache.commons.lang.exception.ExceptionUtils.getFullStackTrace(e);
    System.out.println("fullStackTrace: " + fullStackTrace);
  }

  return true;
}

// Downloads the resource at imageUrl and writes it to destinationFile
private static void saveImage(String imageUrl, String destinationFile) throws IOException
{
  URL url = new URL(imageUrl);
  InputStream is = url.openStream();
  try {
    OutputStream os = new FileOutputStream(destinationFile);
    try {
      byte[] b = new byte[2048];
      int length;
      while ((length = is.read(b)) != -1) {
        os.write(b, 0, length);
      }
    } finally {
      os.close();
    }
  } finally {
    is.close();
  }
}
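
For anyone who only wants to look at the response headers before deciding to download, here is a minimal standalone sketch in plain Java, outside Kettle. It is not part of the class above. The class name HttpHeaderCheck and its methods fetchHeaders, firstValue and shouldDownload are hypothetical names made up for illustration. It assumes the server answers HEAD requests, and it assumes that a missing Last-Modified header (reported as 0, the same convention URLConnection.getLastModified() uses) should mean "keep the local copy"; adjust that policy to your case.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class HttpHeaderCheck {

    // Sends a HEAD request so that only the headers are transferred, not the body.
    static Map<String, List<String>> fetchHeaders(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("HEAD");
        try {
            // The status line is stored under the null key.
            return conn.getHeaderFields();
        } finally {
            conn.disconnect();
        }
    }

    // HTTP header names are case-insensitive, so compare them ignoring case.
    static String firstValue(Map<String, List<String>> headers, String name) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            if (e.getKey() != null && e.getKey().equalsIgnoreCase(name) && !e.getValue().isEmpty()) {
                return e.getValue().get(0);
            }
        }
        return null;
    }

    // Download when there is no local copy, or when the remote timestamp is newer.
    // remoteMillis == 0 means the server sent no Last-Modified header.
    static boolean shouldDownload(boolean localExists, long localMillis, long remoteMillis) {
        if (!localExists) {
            return true;
        }
        if (remoteMillis == 0L) {
            return false; // assumption: without a timestamp, keep the local copy
        }
        return remoteMillis > localMillis;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HttpHeaderCheck <url>");
            return;
        }
        Map<String, List<String>> headers = fetchHeaders(args[0]);
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        System.out.println("Last-Modified = " + firstValue(headers, "Last-Modified"));
    }
}
```

Comparing against the local file's own timestamp (File.lastModified()) instead of "yesterday" avoids re-downloading a picture that changed recently but has already been fetched. Inside a User Defined Java Class step the same logic would have to be written without generics or the enhanced for loop if your Kettle version's compiler does not support them.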
