OWASP/phpsec

Scanner Parser

Closed this issue · 9 comments

You are using regexp to consume code, which is invalid. Use a PHP parser (like the one used in jwidget) to parse it. Also, move the tools folder out of libs.

There are two places where regex is used:
1) When we are searching for dangerous keywords such as echo.
2) When we are trimming extra whitespace such as \t and \n from the file.

Which one are you referring to?

I saw the jWidget style. You are using T_WHITESPACE to detect whitespace. Were you referring to this?

You should parse the PHP code, not treat it as a string.

What if there is a string in the code like this: "echo 'hello there';"? How can you detect that with regex? You need to parse code; regex is for data strings, not structured code.
-A


Notice: This message is digitally signed; its source and integrity are verifiable.
If your mail client does not support S/MIME verification, it will display a file (smime.p7s), which includes the X.509 certificate and the signature body. Read more at "Certified E-Mail with Comodo and Thunderbird" on AbiusX.com


In your example, the regex code will look for the word echo. Since it's
there, it will report it. I checked this example in the test file and it is
detected.

But let's talk about parsing. In j-widget, you wrote a function to parse
the variable name of a widget (called parseName()). In that function, you
take each line, search it for the keyword new, then search that line for
"=", and then search for the variable name.
If I follow the same approach, I can take each line and then either search
for the keyword "echo", or search for single or double quotes, find the
beginning of the string, and check whether echo comes before it.


Regards,
Rahul Chaudhary

I do not search for keywords.
I parse the PHP code, then step through parsed tokens.



Fortunately, PHP comes with everything that is needed. Rahul, take a look at the tokenizer extension of PHP. If you feed it a string of PHP code (like the complete text of a file), you get back an array of all the language elements PHP identified. That way you can easily find every occurrence of "echo" that is an actual statement rather than the innocent content of a string.
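
A minimal sketch of that idea, assuming the scanner only needs the line numbers of real echo statements. The helper name findEchoLines() and the sample source are hypothetical and not part of phpsec; token_get_all() and the T_ECHO constant come from PHP's bundled tokenizer.

```php
<?php
// Hypothetical helper (not from phpsec): returns the line numbers on which
// a real `echo` statement appears, ignoring "echo" inside strings or comments.
function findEchoLines(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        // Each token is either a single-character string (e.g. ";")
        // or an array of [token id, token text, line number].
        if (is_array($token) && $token[0] === T_ECHO) {
            $lines[] = $token[2];
        }
    }
    return $lines;
}

// Sample input: "echo" appears in a string, in a comment, and as a statement.
$code = <<<'PHP'
<?php
$s = "echo 'hello there';";
// echo in a comment
echo $s;
PHP;

print_r(findEchoLines($code));
```

Here the "echo" inside the double-quoted string comes back as part of a T_CONSTANT_ENCAPSED_STRING token and the one in the comment as part of a T_COMMENT token, so only the statement on line 4 is reported.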

Exactly, and it's not even an extension! It's in the core!
-A



EnDe commented

In general, I agree with AbiusX that regexes are a bad idea for parsing code; you need a parser for that.
Just an example:
echo/*new*/ "_=_variable name"/*echo new variable=name*/.'new variable=name';

It's not impossible with regex, but it is very, very awkward.
Just my 2 pence ..

There's a related post on Stack Overflow that you will enjoy reading; I encourage you to check it out:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/



done. :)