phodal/chapi

Python parsing bug: issues with codebases like dspy, textgrad, litellm, etc.

JayGhiya opened this issue · 11 comments

Example of the exception that occurs:

[SCANNER] o.a.scanner.analyser.PythonAnalyser analysis file: /Users/jayghiya/Documents/unoplat/textgrad/textgrad/./tasks/gpqa.py
enterSimple_stmt ->Import_stmtContext
enterSimple_stmt ->Import_stmtContext
enterSimple_stmt ->Import_stmtContext
enterSimple_stmt ->From_stmtContext
enterSimple_stmt ->From_stmtContext
enterSimple_stmt ->From_stmtContext
enterSimple_stmt ->From_stmtContext
enterSimple_stmt ->Return_stmtContext
enterSimple_stmt ->From_stmtContext
enterSimple_stmt ->Assert_stmtContext
enterSimple_stmt ->Return_stmtContext
enterSimple_stmt ->Return_stmtContext
enterSimple_stmt ->Return_stmtContext
enterSimple_stmt ->Return_stmtContext
enterSimple_stmt ->Return_stmtContext
enterSimple_stmt ->Return_stmtContext
enterSimple_stmt ->Return_stmtContext
Exception in thread "main" java.lang.NullPointerException
        at chapi.ast.pythonast.PythonAstBaseListener.buildTestContext(PythonAstBaseListener.kt:126)
        at chapi.ast.pythonast.PythonAstBaseListener.buildAssignPart(PythonAstBaseListener.kt:115)
        at chapi.ast.pythonast.PythonAstBaseListener.buildExprStmt(PythonAstBaseListener.kt:105)
        at chapi.ast.pythonast.PythonFullIdentListener.enterSimple_stmt(PythonFullIdentListener.kt:116)
        at chapi.ast.antlr.PythonParser$Simple_stmtContext.enterRule(PythonParser.java:2133)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.enterRule(ParseTreeWalker.java:50)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:33)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:36)
        at chapi.ast.pythonast.PythonAnalyser.analysis(PythonAnalyser.kt:15)
        at org.archguard.scanner.analyser.PythonAnalyser.analysisByFile(PythonAnalyser.kt:30)
        at org.archguard.scanner.analyser.PythonAnalyser.access$analysisByFile(PythonAnalyser.kt:11)
        at org.archguard.scanner.analyser.PythonAnalyser$analyse$1$2$1.invokeSuspend(PythonAnalyser.kt:21)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
        at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
        at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
        at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
        at org.archguard.scanner.analyser.PythonAnalyser.analyse(PythonAnalyser.kt:17)
        at org.archguard.scanner.core.sourcecode.LanguageSourceCodeAnalyser$DefaultImpls.analyse(LanguageSourceCodeAnalyser.kt:29)
        at org.archguard.scanner.analyser.PythonAnalyser.analyse(PythonAnalyser.kt:11)
        at org.archguard.scanner.ctl.loader.SourceCodeWorker$run$1.invokeSuspend(AnalyserDispatcher.kt:79)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
        at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
        at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
        at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
        at org.archguard.scanner.ctl.loader.SourceCodeWorker.run(AnalyserDispatcher.kt:76)
        at org.archguard.scanner.ctl.loader.AnalyserDispatcher.dispatch(AnalyserDispatcher.kt:39)
        at org.archguard.scanner.ctl.Runner.run(Runner.kt:107)
        at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:198)
        at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:18)
        at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:400)
        at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:397)
        at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:415)
        at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:440)
        at org.archguard.scanner.ctl.RunnerKt.main(Runner.kt:111)

@phodal - we are doing this work inspired by you and your work here: https://github.com/unoplat/unoplat-code-confluence

Can you paste the code that failed?

Alternatively, if you use Python for your project, I suggest using Tree-sitter to parse the code.

Thanks, sure, I will do that first thing in the morning; it's late here in India. Also, the codebases I tested with were dspy, litellm, textgrad, etc., using the ArchGuard CLI, since we use it as the base for our project, unoplat code confluence.

This is the file where it failed, and the one the shared exception comes from: https://github.com/zou-group/textgrad/blob/main/textgrad/tasks/gpqa.py . @phodal

@phodal -
Here are two more files where it failed: https://github.com/BerriAI/litellm/blob/main/litellm/tests/test_alerting.py and https://github.com/stanfordnlp/dspy/blob/main/dspy/predict/aggregation.py. I do not know if it is the same issue; I am trying to give you all the data points.
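To check whether the three files share a trigger, one option is to list the assignment-like statements in each of them. The trace above passes through `buildExprStmt` and `buildAssignPart`, so assignments are a plausible place to look, but that is an assumption and not a confirmed cause. The sketch below uses Python's standard-library `ast` module (not chapi or Tree-sitter), and the helper name `assignment_statements` is made up for illustration.

```python
import ast


def assignment_statements(source: str):
    """Return (line, kind, first line of text) for each assignment-like statement."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        # Plain, augmented, and annotated assignments are the three statement
        # kinds we guess could reach the assignment-handling code path.
        if isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
            text = ast.get_source_segment(source, node) or ""
            first_line = text.splitlines()[0] if text else ""
            found.append((node.lineno, type(node).__name__, first_line))
    # ast.walk is breadth-first, so sort to get source order.
    return sorted(found)


if __name__ == "__main__":
    sample = "x = 1\ny: int\nz += 2\ndef f():\n    w = yield\n"
    for line, kind, text in assignment_statements(sample):
        print(line, kind, text)
```

Running it over the text of each failing file and comparing the statement kinds may help narrow down a minimal reproduction to paste into the issue.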

Yes, you are right, we could use Tree-sitter for Python, but we want to stay based on chapi/ArchGuard to keep things simple and work closely with ArchGuard.

I have already published the new version 2.3.6 to Maven Central; it may still take 15~30 minutes for Maven to make it available.

So is the new version for ArchGuard, @phodal? Like most people, we just use the ArchGuard CLI to parse codebases into chapi. Thank you, @phodal! Also, if you get time, we would like to plug https://github.com/unoplat/unoplat-code-confluence into your agents, if the current work/roadmap excites you or aligns with yours.

OK, I will do it later.

@JayGhiya I have already published the new version to GitHub; Maven Central still needs 15~30 minutes before it is available.