The solution cannot be separated due to a special separator

Today, when dealing with text data, we encountered this kind of text matching with space and tab regularization, which did not work. Later, I asked my colleagues and found that “\ \ uf8f5” could be used to match.

Pending text:

A	abbr.安 
A-10IInone.美空军主力近距离空中支援攻击机,无愧为“坦克杀手”。
A-12none.夭折的美海军第一种隐形舰载攻击机。
A-4  none.54年服役的单座轻型舰载攻击机,现仍被多国使用。
A-6none.双座重型全天候舰载攻击机,主要用于低空突防,可进行核打击。
A-7IInone.离开沙场的单座亚音速攻击机,曾是美海空军主力。
A-OKnone.极好, 妙极, 完美的
A-Znone.无所不包的
A-boilern.原子反应器加热用的锅炉
A-bombn.原子弹
A-certificatenone.儿童不宜n.A级
A-controln.原子能管制
A-energyn.原子能
A-framen.金字塔形建筑物
A-lovelnone.英语学校里某一课程结束时举行的高深考试, 高深级考试及格
A-oneadj.第一等的, 第一流的
A-roadnone.A级公路, 主车道
A-siden.A面
A-testn.原子爆炸试验
A-weaponn.原子武器

Separation processing:

	public static void main(String[] args) throws Exception {
		String dic = util.Directory.GetAppPath("steamData") + "dic.txt.bak";
		BufferedReader br = util.MyFileTool.GetBufferReader(dic);
		while(br.ready()) {
			String line = br.readLine();
			String[] words = line.split("\\uf8f5");
			System.out.println("size: " + words.length);
			System.out.println(words[0]);
		}
		br.close();
	}


Read More: