kevinseim/beanio

BeanIO can't read special characters

Closed this issue · 3 comments

I have a .txt file with ANSI (windows-1252) encoding. I am able to read it correctly on Windows but not on Unix.

Here is the XML mapping file:

<beanio>
    <stream name="empData" format="csv">
        <parser>
            <property name="delimiter" value=";"/>
            <property name="alwaysQuote" value="false"/>
            <!--<property name="quote" value='' />-->
        </parser>

        <record name="emp" class="com.MyClass">
            <field name="name" />
            <field name="job" />
            <field name="adress"/>
        </record>
    </stream>
</beanio>

Java side:

StreamFactory factory = StreamFactory.newInstance();

InputStream in = this.getClass().getClassLoader()
        .getResourceAsStream("mapping.xml");

Reader reader = new InputStreamReader(this.getClass().getClassLoader()
        .getResourceAsStream("countries.txt"));
factory.load(in);

BeanReader beanReader = factory.createReader("empData", reader);
Gson gson = new Gson();
/*Object bean =new Object();*/
Object record = null;
while ((record = beanReader.read()) != null) {
    System.out.println(beanReader.getRecordName() + ": "
            + ((MyClass) record).getCountry());
}

Result:
line : France

line : S??o Paulo (should be "São Paulo"; correct on Windows, wrong on Unix)

line : USA

line : China

Any ideas?

FYI: I already tried setting the charset to UTF-8 on the Java side:

new InputStreamReader(this.getClass().getClassLoader().getResourceAsStream("countries.txt"), Charset.forName("UTF-8"));

Hi @monsif

I have a few questions

  1. If the file is encoded with Windows-1252, why are you setting the encoding to UTF-8? What happens when you set it to Windows-1252?
  2. On the Unix side (you don't say which distro), if you have access to a terminal, what does the command file countries.txt return?
  3. Is Windows-1252, UTF-8, or the character set reported by the file command supported by the JDK/JRE running the application? What do Charset.isSupported("windows-1252") and Charset.isSupported("UTF-8") return on Unix?
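To make questions 1 and 3 concrete, here is a minimal sketch of the checks, not a confirmed fix for this issue. The class name CharsetCheck, the helper readFirstLine, and the in-memory sample bytes are made up for illustration; only the java.nio.charset and java.io calls are standard JDK API. It prints the JVM default charset (which is what InputStreamReader uses when no charset is passed), reports whether the two charsets are supported, and decodes the same windows-1252 bytes once with the matching charset and once with UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class CharsetCheck {

    // Hypothetical helper: reads the first line of the stream,
    // decoding the bytes with the named charset.
    static String readFirstLine(InputStream in, String charsetName) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Charset used by InputStreamReader when none is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("windows-1252 supported: " + Charset.isSupported("windows-1252"));
        System.out.println("UTF-8 supported: " + Charset.isSupported("UTF-8"));

        // Sample data (an assumption, not the real countries.txt):
        // "Sao Paulo" with a-tilde encoded as the single windows-1252 byte 0xE3.
        byte[] cp1252 = {'S', (byte) 0xE3, 'o', ' ', 'P', 'a', 'u', 'l', 'o'};

        // Decoded with the charset the bytes were written in.
        System.out.println(readFirstLine(new ByteArrayInputStream(cp1252), "windows-1252"));
        // Decoded as UTF-8: 0xE3 is not valid here, so the character is lost.
        System.out.println(readFirstLine(new ByteArrayInputStream(cp1252), "UTF-8"));
    }
}
```

If the windows-1252 line prints correctly on the Unix machine, the same InputStreamReader(stream, Charset.forName("windows-1252")) can be passed to factory.createReader("empData", reader) in place of the reader built without a charset.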

Hi @monsif

Did you get it working? Can I close this ticket?

Nico

Assuming the issue is resolved, closing.