Coordinate Y is not correct

Question

CJ1789 opened this issue 7 years ago · 17 comments

I only want to extract the line positions so that I can extract the text from the PDF in the desired area. But this code takes the top-left corner as (0,0), while iTextSharp, which I use to extract the text, takes the bottom-left corner as (0,0), so I am not getting the correct text.

Please help me, I am stuck.
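Both systems use the same unit (PDF points), so moving an area from a top-left origin to iTextSharp's bottom-left origin only needs the y of the page's upper edge. A point y below the top sits at pageTop - y above the bottom, and the area's top and bottom edges swap roles. The sketch below assumes the area is given as its top-left corner plus width and height. The `Box` type and `TopLeftToPdf` method are hypothetical names, and `pageTop` is assumed to come from the page box (with iTextSharp 5, for example, `reader.GetPageSize(pageNumber).Top`).

```csharp
using System;

// Hypothetical helper type: an axis-aligned rectangle in PDF points,
// stored with PDF's bottom-left-origin convention.
public readonly struct Box
{
    public double Left { get; }
    public double Bottom { get; }
    public double Right { get; }
    public double Top { get; }

    public Box(double left, double bottom, double right, double top)
    {
        Left = left; Bottom = bottom; Right = right; Top = top;
    }
}

public static class CoordinateConversion
{
    // Converts an area described in a top-left-origin system (y grows downward)
    // into PDF user space (origin bottom-left, y grows upward).
    // pageTop is the y of the page's upper edge in PDF user space
    // (assumption: read from the MediaBox/CropBox, e.g. 792 for US Letter).
    public static Box TopLeftToPdf(double x, double yFromTop, double width, double height, double pageTop)
    {
        double top = pageTop - yFromTop; // upper edge of the area, measured from the bottom
        double bottom = top - height;    // lower edge is further down, so it has a smaller y
        return new Box(x, bottom, x + width, top);
    }
}
```

For a US Letter page (792 pt tall), `TopLeftToPdf(50, 100, 200, 20, 792)` yields left 50, bottom 672, right 250, top 692. With iTextSharp 5, these values could then be passed to `new iTextSharp.text.Rectangle(llx, lly, urx, ury)` and used in a `RegionTextRenderFilter` to restrict extraction to that area.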

CJ1789 commented 7 years ago

OK, thanks.

Answer 1 · 2017-12-13T07:39:43.000Z

The code does not work for all files.
You could start by comparing the results of other software (online) and see whether they can extract the data.
Then you can tailor the code to your PDF.
Usually there is a lot of work to do to parse a PDF exactly as you need (so it is worth it if you need to extract data from a lot of PDFs).

Answer 2 · 2017-12-13T08:33:45.000Z

Can you please tell me why you transform the point's x and y like this:

    public double TransformX(double x, double y)
    {
        return a * x + c * y + e;
    }

    public double TransformY(double x, double y)
    {
        return b * x + d * y + f;
    }

and why, for rotation 0, you flip the point's y as Y = 800 - Y:

    public Point Rotate(int pageRotation)
    {
        switch (pageRotation)
        {
            case 0:
                return new Point(X, 800 - Y);
            case 90:
                return new Point(Y, X);
            case 180:
                return new Point(X, Y);
            default:
                return this;
        }
    }

Answer 3 · 2017-12-13T09:29:53.000Z

The first transformation comes from the PDF reference guide.
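For reference, the PDF specification writes a transformation as six numbers [a b c d e f] standing for a 3x3 matrix applied to row vectors [x y 1], which is exactly where `a * x + c * y + e` and `b * x + d * y + f` come from. A minimal sketch of that convention (`PdfMatrix` is a hypothetical name, not a type from the project):

```csharp
using System;

// A PDF transformation matrix [a b c d e f], i.e. the 3x3 matrix
// | a b 0 |
// | c d 0 |
// | e f 1 |
// applied to row vectors [x y 1], as in the PDF specification.
public readonly struct PdfMatrix
{
    public readonly double A, B, C, D, E, F;

    public PdfMatrix(double a, double b, double c, double d, double e, double f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }

    public static readonly PdfMatrix Identity = new PdfMatrix(1, 0, 0, 1, 0, 0);

    // Same formulas as TransformX/TransformY in the question.
    public (double X, double Y) Transform(double x, double y) =>
        (A * x + C * y + E, B * x + D * y + F);

    // Returns the matrix that applies `first`, then `second`
    // (row-vector convention: first x second).
    public static PdfMatrix Concat(PdfMatrix first, PdfMatrix second) =>
        new PdfMatrix(
            first.A * second.A + first.B * second.C,
            first.A * second.B + first.B * second.D,
            first.C * second.A + first.D * second.C,
            first.C * second.B + first.D * second.D,
            first.E * second.A + first.F * second.C + second.E,
            first.E * second.B + first.F * second.D + second.F);
}
```

Matrices compose by multiplication in application order: scaling by 2 and then translating by (10, 20) maps (3, 4) to (16, 28). A `cm` operator with operand M replaces the current transformation matrix with M × CTM, which in this sketch would be `Concat(m, ctm)`.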

About your second question: a page rotation of 0 means no rotation. I prefer to have the origin in the upper-left corner, while the PDF origin is in the lower-left corner. 800 - y flips the point vertically (800 works for me; you can use a different literal). Otherwise you would have to do this in the 180 rotation case.

Answer 4 · 2017-12-13T09:33:00.000Z

How do I do the 180 rotation?

Answer 5 · 2017-12-13T09:36:51.000Z

You could swap the two rotations:
0 => y
180 => 800 - y

But then I think you'll find several things not working (the other functions expect the origin to be in the upper-left corner).
Anyway, if for some reason everything is already flipped for you, you could try it.
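The literal 800 acts as a stand-in for the page height: a US Letter page is 792 points tall and an A4 page is 842, so a fixed value only fits pages of roughly that size. Below is a sketch that derives the flip from the page size instead. It assumes the page box starts at (0, 0) with the given width and height, and that `rotation` is the page's /Rotate value, which the PDF specification defines as clockwise degrees in multiples of 90 (in iTextSharp 5 it could be read with `reader.GetPageRotation(pageNumber)`). It returns top-left-origin coordinates of the page as displayed; `ToDisplayTopLeft` is a hypothetical name.

```csharp
using System;

public static class PageCoordinates
{
    // Maps a point from PDF user space (origin bottom-left, y up) to
    // top-left-origin coordinates (y down) of the page as displayed,
    // honoring the page's /Rotate value (clockwise, multiple of 90).
    // Assumption: the page box starts at (0, 0) and is width x height points.
    public static (double X, double Y) ToDisplayTopLeft(
        double x, double y, double width, double height, int rotation)
    {
        // Normalize negative or >= 360 values, e.g. -90 becomes 270.
        switch (((rotation % 360) + 360) % 360)
        {
            case 0:   return (x, height - y);          // plain vertical flip
            case 90:  return (y, x);
            case 180: return (width - x, y);
            case 270: return (height - y, width - x);
            default:
                throw new ArgumentException("Rotation must be a multiple of 90.", nameof(rotation));
        }
    }
}
```

The 90 case agrees with the `(Y, X)` in the `Rotate` method above; the 180 case also mirrors x, which `Rotate` does not do. On a 612 x 792 page, the point (100, 700) maps to (100, 92) with no rotation and to (512, 700) with a 180-degree rotation.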

Answer 6 · 2017-12-13T10:03:31.000Z

I am not getting the right answer in either case. Please tell me what to do. Is 800 - y the general way to flip a PDF, or did you get this value from your PDF?

Answer 7 · 2017-12-13T10:09:16.000Z

It's c - y, where c is a constant I chose for my PDF.
The only condition on c is c - y > 0. The value is used for rendering (debugging), so it can't be something like 1000000 - y.

Answer 8 · 2017-12-13T10:14:40.000Z

What is c? I mean, how can I determine it for my PDF?

Can I send you my PDF?

Answer 9 · 2017-12-13T10:17:17.000Z

c is just a literal, i.e. a constant.

Yes, send me your PDF and I'll have a look.

Answer 10 · 2017-12-13T10:18:25.000Z

Please send me your email address.

Answer 11 · 2017-12-13T10:34:10.000Z

७_१२_6.pdf
७_१२_7.pdf
७_१२_8.pdf
७_१२_9.pdf
७_१२_10.pdf
७_१२_11.pdf
७_१२_data on 2 pages.pdf

I want to determine the positions of vertical line 3 (i.e. line[2]) and vertical line 6 (i.e. line[5]).

Answer 12 · 2017-12-13T11:59:23.000Z

OK, I had a look at the first PDF.
You can do the same by updating the source code and using the BuildTablesFromPdf.Renderer app.
The table on the first page is not really a table because it is not well aligned, so the library detects more cells than there are.
There is also an issue with text positioning: I'm probably ignoring a PDF statement that places the text in the right location.
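One common way to cope with a table whose lines are not perfectly aligned (a general technique, not something the library is confirmed to do) is to snap coordinates that lie within a small tolerance of each other onto a single value before building cells. A sketch, where `LineSnapping.Cluster` is a hypothetical helper and the tolerance is left to the caller:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineSnapping
{
    // Groups coordinates (e.g. the x positions of vertical lines) whose gap to
    // the previous value in sorted order is at most `tolerance`, and returns
    // one averaged coordinate per group.
    public static List<double> Cluster(IEnumerable<double> coordinates, double tolerance)
    {
        var result = new List<double>();
        var current = new List<double>();

        foreach (double value in coordinates.OrderBy(v => v))
        {
            // A gap larger than the tolerance closes the current group.
            if (current.Count > 0 && value - current[current.Count - 1] > tolerance)
            {
                result.Add(current.Average());
                current.Clear();
            }
            current.Add(value);
        }

        if (current.Count > 0)
            result.Add(current.Average());

        return result;
    }
}
```

For example, `Cluster(new[] { 100.0, 101.5, 250.0, 249.0, 400.0 }, 2.0)` returns 100.75, 249.5 and 400. Because each value is compared only with its neighbour, a long chain of closely spaced values can merge into one group even if its ends are further apart than the tolerance.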

On the second page there is a different issue: the coordinates are wrong. I'm probably ignoring a PDF statement that I should take into account.
After this issue is solved, you will also run into the same text-positioning issue as on the first page.

I will probably try to fix it, but I'm not sure, and I don't know when.
If you fix it and share the code, it will be really appreciated.

Answer 13 · 2017-12-14T10:14:03.000Z

I got the correct Y.

Answer 14 · 2017-12-14T14:27:20.000Z

Could you send me the code?
Thanks!

Answer 15 · 2017-12-15T04:22:43.000Z

You just have to modify it by adding 150.

But now a new issue has come up. I am able to extract text from the PDF, but some characters are not recognized. Can you help me with that?

Examples:
अधिकार ("right") is extracted as अ\0धकार
महाराष्ट्र ("Maharashtra") is extracted as महारा\0\0
क्षेत्र ("area") is extracted as \0े\0

Answer 16 · 2017-12-21T10:22:57.000Z

Hello,

The code works perfectly for the first page of the PDF, but what should I do for the second page? I am not getting the correct Y. Please help me.