BeanIO can't read special characters
monsif opened this issue · 3 comments
monsif commented
I have a .txt file with ANSI (windows-1252) encoding. I am able to read it correctly on Windows but not on Unix.
Here is the XML mapping file:
<beanio>
  <stream name="empData" format="csv">
    <parser>
      <property name="delimiter" value=";"/>
      <property name="alwaysQuote" value="false"/>
      <!--<property name="quote" value='' />-->
    </parser>
    <record name="emp" class="com.MyClass">
      <field name="name" />
      <field name="job" />
      <field name="adress"/>
    </record>
  </stream>
</beanio>
Java side:
StreamFactory factory = StreamFactory.newInstance();
InputStream in = this.getClass().getClassLoader()
        .getResourceAsStream("mapping.xml");
Reader reader = new InputStreamReader(this.getClass().getClassLoader()
        .getResourceAsStream("countries.txt"));
factory.load(in);
BeanReader beanReader = factory.createReader("empData", reader);
Gson gson = new Gson();
/*Object bean =new Object();*/
Object record = null;
while ((record = beanReader.read()) != null) {
    System.out.println(beanReader.getRecordName() + ": "
            + ((MyClass) record).getCountry());
}
Result :
line : France
line : S??o Paulo (should be "São Paulo"; correct on Windows, wrong on Unix)
line : USA
line : China
Any ideas?
FYI: I already tried setting the charset to UTF-8 on the Java side:
new InputStreamReader(this.getClass().getClassLoader().getResourceAsStream("countries.txt"), Charset.forName("UTF-8"));
nicoschl commented
Hi @monsif
I have a few questions:
- If the file is encoded with Windows-1252, why are you setting the encoding to UTF-8? What happens when you set it to Windows-1252?
- On the Unix side (you don't say which distro), if you have access to a terminal, what does the command
file countries.txt
return?
- Is Windows-1252, UTF-8, or the character set reported by the file command supported by the JDK/JRE running the application? What do
Charset.isSupported("windows-1252")
and
Charset.isSupported("UTF-8")
return on Unix?
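In case it helps, the checks above can be run with a small standalone program. This is only a sketch and not code from the reporter's project: the class name CharsetCheck and the helper firstLine are hypothetical, and the sample bytes are built in memory instead of being read from countries.txt. It prints the JVM default charset (the one new InputStreamReader(in) falls back to when no charset is passed), reports whether the two charsets are supported, and decodes the same windows-1252 bytes once with windows-1252 and once with UTF-8 so the difference is visible.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Hypothetical helper: decodes the stream with the given charset
    // and returns its first line.
    static String firstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // The default charset can differ between the Windows and Unix machines.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("windows-1252 supported: "
                + Charset.isSupported("windows-1252"));
        System.out.println("UTF-8 supported: " + Charset.isSupported("UTF-8"));

        // Assumes windows-1252 is available on this JVM (see the check above).
        Charset cp1252 = Charset.forName("windows-1252");

        // Stand-in for the file content: "São Paulo" encoded as windows-1252.
        byte[] bytes = "S\u00e3o Paulo".getBytes(cp1252);

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println("as windows-1252: "
                + firstLine(new ByteArrayInputStream(bytes), cp1252));

        // Decoding the same bytes as UTF-8 does not, because a lone 0xE3 byte
        // is not a valid UTF-8 sequence.
        System.out.println("as UTF-8: "
                + firstLine(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

If the windows-1252 line prints correctly on the Unix machine, passing Charset.forName("windows-1252") to the InputStreamReader that wraps countries.txt would be the next thing to try, assuming the file really is windows-1252 as described.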
nicoschl commented
Assuming the issue is resolved; closing.