Error when reading a CSV file in a Java MapReduce program

The code below is the Mapper class of a MapReduce job. What I am trying to do is read a CSV file into a HashMap and, for each row, store two of its columns (column 1 is the userId and column 6 is the book's CheckOutDateTime). I think the code in the getMapFromCSV function of the StubMapper class is wrong. Can anyone enlighten me? At the bottom I have pasted the error output. Thanks for any help and suggestions.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;



public class StubMapper extends Mapper<LongWritable, Text, Text, MinMaxCountTuple> {

    private Text outUserId = new Text();
    private MinMaxCountTuple outTuple = new MinMaxCountTuple();

    private final static SimpleDateFormat frmt = 
            new SimpleDateFormat("yyyy-MM--dd'T'HH:mm:ss.SSS");

    public static HashMap<String, String> getMapFromCSV(String filePath) throws IOException
    {

        HashMap<String, String> words = new HashMap<String, String>();

        BufferedReader in = new BufferedReader(new FileReader(filePath));
        String line;
        //= in.readLine())
        while ((line = in.readLine()) != null) {
            String columns[] = line.split("\t");
            if (!words.containsKey(columns[1])) {
                words.put(columns[1], columns[6]);
            }

        }
        //in.close();

        return words;



    }

@Override
  public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {


      HashMap<String, String> parsed = getMapFromCSV(value.toString());
      //String columns[] = value.toString().split("\t");

      String strDate = parsed.get("CheckoutDateTime");

      //String userId = columns[1];
      //String strDate = columns[6];
      String userId = parsed.get("BibNumber");

      try {
        Date creationDate = frmt.parse(strDate);

        outTuple.setMin(creationDate);
        outTuple.setMax(creationDate);

        outTuple.setCount(1);

        outUserId.set(userId);

        context.write(outUserId, outTuple);

      } catch (ParseException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }


  }
}

It produces the following error, which I cannot make sense of. I think the problem is in the getMapFromCSV function of the StubMapper class: the function's argument is supposed to carry the CSV attribute information, and what I am trying to store in the HashMap are key/value pairs, but I do not know what needs to change. Please let me know if you can see how to fix it.

java.io.FileNotFoundException: Code,Description,Code Type,Format Group,Format Subgroup,Category Group,Category Subgroup (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at java.io.FileInputStream.<init>(FileInputStream.java:79)
    at java.io.FileReader.<init>(FileReader.java:41)
    at StubMapper.getMapFromCSV(StubMapper.java:27)
    at StubMapper.map(StubMapper.java:50)
    at StubMapper.map(StubMapper.java:14)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

2 Answers

  1. # Answer 1

    The error occurs in this line:

    BufferedReader in = new BufferedReader(new FileReader(filePath));
    
    1. Check the value of filePath
    2. Check that a file actually exists at filePath
    3. Check that the contents of the file are valid
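
    A quick way to cover points 1 and 2 is to dump the path before opening it. Below is a minimal debugging sketch; debugCsvPath is a hypothetical helper name, not part of the question's code, and it assumes the file is expected on the local filesystem of the task node (an HDFS path would need the org.apache.hadoop.fs.FileSystem API instead):

    import java.io.File;

    // Hypothetical helper: print the value of filePath and whether a file
    // actually exists there, so the task log shows what FileReader will see.
    private static void debugCsvPath(String filePath) {
        File csv = new File(filePath);
        System.err.println("filePath = " + filePath);
        System.err.println("exists on local filesystem = " + csv.exists());
    }

    In this question the printed filePath would turn out to be a CSV header line rather than a path, which is exactly what the FileNotFoundException message shows; the second answer explains why.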
  2. # Answer 2

    An important MapReduce concept is missing here. The problem is in the line below:

    public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
    
    // Below is the problematic line
          HashMap<String, String> parsed = getMapFromCSV(value.toString());
    

    Perhaps you assumed that the Text value is the CSV filename and therefore tried to read the values from that file.

    It does not work that way. The Text value input to the mapper is a single line of the CSV file.

    Assuming your CSV has the following structure:

    Code,Description,Code Type,Format Group,Format Subgroup,Category Group,Category Subgroup
    111,sample description,codeType1,IN,....
    

    Your code should look something like this:

    @Override
      public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
    
      if (value.toString().startsWith("Code,Description")) {
          // Skip header line (first line) of CSV
           return;
      }
    
      String data[] = value.toString().split(",", -1);
    
      String code = data[0];
      String codeType = data[2];
    
    ....
    ....
    and so on
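
    For completeness, here is a minimal sketch of the whole mapper applied to the question's own columns. It is only a sketch under stated assumptions: the delimiter is a comma (the question's code split on "\t", so use whichever matches the real file), column 1 is the userId and column 6 is CheckoutDateTime as described in the question, the header is detected by the presence of the column name CheckoutDateTime, the date pattern is a guess based on the question's format string, and MinMaxCountTuple is the writable class from the question.

    import java.io.IOException;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class StubMapper extends Mapper<LongWritable, Text, Text, MinMaxCountTuple> {

        private final Text outUserId = new Text();
        private final MinMaxCountTuple outTuple = new MinMaxCountTuple();

        // Assumed date pattern (the question's pattern without the stray "--");
        // adjust it to whatever the data actually contains.
        private static final SimpleDateFormat FRMT =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            String line = value.toString();

            // Skip the header row; assumes the header (and only the header)
            // contains the column name "CheckoutDateTime".
            if (line.contains("CheckoutDateTime")) {
                return;
            }

            // Assumed delimiter: comma. The question's code split on "\t".
            String[] columns = line.split(",", -1);
            if (columns.length < 7) {
                return; // malformed or short row
            }

            String userId = columns[1];   // column 1: userId, per the question
            String strDate = columns[6];  // column 6: CheckoutDateTime, per the question

            try {
                Date checkoutDate = FRMT.parse(strDate);

                outTuple.setMin(checkoutDate);
                outTuple.setMax(checkoutDate);
                outTuple.setCount(1);

                outUserId.set(userId);
                context.write(outUserId, outTuple);
            } catch (ParseException e) {
                // Skip rows whose date cannot be parsed instead of failing the task.
            }
        }
    }

    The key change is that each call to map handles one CSV line directly; no FileReader or HashMap is needed inside the mapper.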