Java regular expression to split text into 6 columns
This is part of a text file. I want to split each line so that I end up with:
Column 1 = distribution
Column 2 = votes
Column 3 = rank
Column 4 = title
Column 5 = year
Column 6 = Subtitle (but only where there is a subtitle)
The regular expression I am using is:
regexp =
"([0-9\\.]+)[ \\t]+([0-9]+)[ \\t]+([0-9\\.]+)[ \\t]+(.*?[ \\t]+\\([0-9]{4}\\).*)";
But as you can tell, I can't seem to get it to work on data like this:
1000000103 50 4.5 #1 Single (2006) {THis would be a subtitle example}
2...1.2.12 8 2.7 $1,000,000 Chance of a Lifetime (1986)
11..2.2..2 8 5.0 $100 Taxi Ride (2001)
....13.311 9 7.1 $100,000 Name That Tune (1984)
3..21...22 10 4.6 $2 Bill (2002)
30010....3 18 2.7 $25 Million Dollar Hoax (2004)
2000010002 111 5.6 $40 a Day (2002)
2000000..4 26 1.6 $5 Cover (2009)
.0..2.0122 15 7.8 $9.99 (2003)
..2...1113 8 7.5 $weepstake$ (1979)
0000000125 3238 8.7 Allo Allo! (1982)
1....22.12 8 6.5 Allo Allo! (1982) {A Barrel Full of Airmen (#7.7)
The code I'm using:
// Needs imports from java.io, java.sql, and java.util.regex.
try {
    FileInputStream file_stream = new FileInputStream("/Users/angadsoni/Desktop/ratings-1.txt");
    DataInputStream data_stream = new DataInputStream(file_stream);
    BufferedReader bf = new BufferedReader(new InputStreamReader(data_stream));
    ResultSet rs;
    Statement stmt;
    Connection con = null;
    Class.forName("org.gjt.mm.mysql.Driver").newInstance();
    String url = "jdbc:mysql://localhost/mynewdatabase";
    con = DriverManager.getConnection(url,"root","");
    stmt = con.createStatement();
    try{
        stmt.executeUpdate("DROP TABLE myTable");
    }catch(Exception e){
        System.out.print(e);
        System.out.println("No existing table to delete");
    }
    //Create a table in the database named mytable
    stmt.executeUpdate("CREATE TABLE mytable(distribution char(20)," + "votes integer," + "rank float," + "title char(250)," + "year integer," + "sub char(250));");
    String rege= "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?";
    Pattern pattern = Pattern.compile(rege);
    String line;
    String data= "";
    while ((line = bf.readLine()) != null) {
        // Strip single quotes so they cannot break the concatenated SQL below
        data = line.replaceAll("'", "");
        Matcher matcher = pattern.matcher(data);
        if (matcher.find()) {
            System.out.println("hello");
            String distribution = matcher.group(1);
            String votes = matcher.group(2);
            String rank = matcher.group(3);
            String title = matcher.group(4);
            String year = matcher.group(5);
            // Group 6 is optional; start(6) is -1 when there is no subtitle
            String sub = matcher.start(6) != -1 ? matcher.group(6) : "";
            System.out.printf("%s %8s %6s%n%s (%s) %s%n%n",
                matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4), matcher.group(5),
                matcher.start(6) != -1 ? matcher.group(6) : "");
            String todo = ("INSERT into mytable " +
                "(Distribution, Votes, Rank, Title, Year, Sub) "+
                "values ('"+distribution+"', '"+votes+"', '"+rank+"', '"+title+"', '"+year+"', '"+sub+"');");
            int r = stmt.executeUpdate(todo);
        }//end if statement
    }//end while loop
} catch (Exception e) {
    e.printStackTrace();
}
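# Answer 1
I would design a regular expression for the part starting at the title, similar to the one you came up with.
Perhaps you could post more code to clarify exactly what you are doing with the regex? Also, is there a problem with the answers so far?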
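# Answer 2
This regex works for the data you provided:
If there is no subtitle, the last group (group 6) will be null.
EDIT: here is a complete example:
Partial output:
... and so on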
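The snippets from this answer are not preserved in this copy of the thread. As a rough sketch only, assuming the answer's pattern is the one that shows up in the question's code (the `rege` string), the matching step might look like the following. The class name `RatingsLineParser` and the method `parse` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RatingsLineParser {
    // Pattern copied from the question's code (assumed to be the one this answer proposed):
    // distribution, votes, rank, title, (year), and an optional "{subtitle" part.
    private static final Pattern LINE = Pattern.compile(
        "^([\\d.]+)\\s+(\\d+)\\s+([\\d.]+)\\s+(.+?)\\s+\\((\\d+)\\)(?:\\s+\\{([^{}]+))?");

    /** Returns the six columns, or null when the line does not match. */
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // group(6) is null when the optional subtitle part did not take part in the match
        String sub = m.group(6) != null ? m.group(6) : "";
        return new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), sub
        };
    }

    public static void main(String[] args) {
        String[] samples = {
            "1000000103 50 4.5 #1 Single (2006) {THis would be a subtitle example}",
            "2...1.2.12 8 2.7 $1,000,000 Chance of a Lifetime (1986)",
            ".0..2.0122 15 7.8 $9.99 (2003)"
        };
        for (String s : samples) {
            System.out.println(String.join(" | ", parse(s)));
        }
    }
}
```

Checking `group(6)` against null is equivalent to the `matcher.start(6) != -1` test used in the question; either works for an optional group.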
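# Answer 3
There may be more problems, but the first hurdle is that the backslashes are not making it through to the regex engine. You need to double them.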
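To illustrate the doubling: the Java compiler consumes one level of backslashes in a string literal before the regex engine ever sees the pattern, so `"\\."` in source is the two-character pattern `\.`. A small sketch (the class `BackslashDemo` and its members are hypothetical names, not from the question):

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // In source this is four characters between the quotes, but the resulting
    // string holds only two: a backslash and a dot. The regex engine reads
    // that as "a literal dot".
    public static final String ESCAPED_DOT = "\\.";

    // Reaches the regex engine as \((\d{4})\) and matches a year in parentheses.
    public static final String YEAR = "\\((\\d{4})\\)";

    public static boolean isYear(String s) {
        return Pattern.matches(YEAR, s);
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED_DOT.length());          // length of the runtime string
        System.out.println(Pattern.matches(".", "x"));     // unescaped dot matches any character
        System.out.println(Pattern.matches("\\.", "x"));   // escaped dot matches only "."
        System.out.println(isYear("(2006)"));
    }
}
```

A single backslash such as `"\d"` does not compile at all, because `\d` is not a valid Java string escape.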
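# Answer 4
My first thought is that it might be easier to split off the first few fields using whitespace and StringTokenizer, and then use a regexp on the remaining 3 fields. That would simplify the regexp needed.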
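One possible way to sketch this idea is shown below. Note that it uses `String.split` with a limit in place of `StringTokenizer`, because the limit keeps the rest of the line (title, year, subtitle) together in one piece; that substitution is an assumption of this sketch, not something the answer specifies. The class name `SplitThenMatch` and the pattern `REST` are also invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitThenMatch {
    // Only the tail of the line needs a regex: title, (year), optional "{subtitle".
    private static final Pattern REST = Pattern.compile(
        "^(.+?)\\s+\\((\\d{4})\\)(?:\\s+\\{([^{}]+))?");

    /** Returns the six columns, or null when the line cannot be parsed. */
    public static String[] parse(String line) {
        // Limit 4: three whitespace-separated fields plus the untouched remainder.
        String[] head = line.trim().split("\\s+", 4);
        if (head.length < 4) {
            return null;
        }
        Matcher m = REST.matcher(head[3]);
        if (!m.find()) {
            return null;
        }
        String sub = m.group(3) != null ? m.group(3) : "";
        return new String[] { head[0], head[1], head[2], m.group(1), m.group(2), sub };
    }

    public static void main(String[] args) {
        String line = "1....22.12 8 6.5 Allo Allo! (1982) {A Barrel Full of Airmen (#7.7)";
        System.out.println(String.join(" | ", parse(line)));
    }
}
```

The trade-off is that the first three fields are no longer validated by the pattern, so a malformed distribution or vote count would pass through unless it is checked separately.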