Failed to parse Japanese style name
Closed this issue · 12 comments
I have a use case that parses and groups text by style name.
However, I couldn't parse a style name written in Japanese characters. Here's the sample output of my test program:
STYLE &{ans-list} // English characters
TEXT: &{The students will have a school trip next week. <nil>}
STYLE &{候選答案} // Chinese characters
TEXT: &{The students will have lunch at a hotel near Ueno Zoo. <nil>}
STYLE &{a} // kanji, the original text is '選択肢'
TEXT: &{The students will leave Ueno Zoo at two. <nil>}
I'm not sure whether it's related to encoding.
If you know how to fix it, I can help by submitting a PR. Thanks for the help.
Hi,
Could you provide some additional information?
Environment Details
- Godocx Version:
- Go Version:
- Operating System:
- Word Processor Used:
- Microsoft Word
- LibreOffice
- Google Docs
- Other (please specify)
- Word Processor Version:
Sample Code
Here's my local environment:
- Godocx Version: v0.1.1-beta.1
- Go Version: 1.22.4
- Operating System: macOS Sonoma 14.3
- Word Processor Used: Microsoft Word (cloud office 365)
Sample Code:
package main

import (
	"fmt"
	"log"

	"github.com/gomutex/godocx"
)

func main() {
	docx, err := godocx.OpenDocument("example.docx")
	if err != nil {
		log.Fatal(err)
	}
	// Print each paragraph's style, then the text of every run in it.
	for _, c := range docx.Document.Body.Children {
		fmt.Println("STYLE", c.Para.Property.Style)
		for _, c2 := range c.Para.Children {
			for _, c3 := range c2.Run.Children {
				fmt.Println("TEXT:", c3.Text)
			}
		}
	}
}
I created a sample docx (with python-docx) to mimic the issue and read it with the exact godocx version you mentioned.
It appears to work for me.
STYLE &{ans-list}
TEXT: &{The students will have a school trip next week. }
STYLE &{候選答案}
TEXT: &{The students will have lunch at a hotel near Ueno Zoo. }
STYLE &{選択肢}
TEXT: &{The students will leave Ueno Zoo at two. }
Would you mind testing this document?
The Python package is able to parse it correctly, while godocx cannot.
Thank you for the input. I can see the issue. python-docx resolves the style ID into a ParagraphStyle object and gets the details from styles.xml
(i.e., it maps style ID 'a' to style name '選択肢'). In godocx, the style is just a generic struct that contains only the style ID.
Do you think it's a bug, and would you fix it?
I'm willing to help if you can point me to where I should look.
I don't believe it's a bug; the current behavior is as intended. I can write a function to retrieve style details based on the style ID by parsing word/styles.xml and indexing the styles by ID. However, at the moment, I'm prioritizing basic functions and fixes in the library. I'll certainly work on this as soon as possible. Thank you for your understanding.
v0.1.3-beta.1 introduces the GetStyle method for paragraphs, which can be used to retrieve the style metadata.
What styleID do I need to pass to the GetStyle(styleID string) method?
I noticed that styleID isn't actually used in that method.
I tried passing a random string, and it panicked:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0xe0 pc=0x10412adc4]
goroutine 1 [running]:
github.com/gomutex/godocx/docx.(*RootDoc).GetStyleByID(...)
/Users/chris/go/pkg/mod/github.com/gomutex/godocx@v0.1.3-beta.1/docx/styles.go:9
github.com/gomutex/godocx/docx.(*Paragraph).GetStyle(0x10412ff4b?, {0x104018b1c?, 0x1400009eed8?})
/Users/chris/go/pkg/mod/github.com/gomutex/godocx@v0.1.3-beta.1/docx/paragraph.go:209 +0x54
Apologies. Yes, the styleID parameter is not used and should not be there. I have also fixed the nil pointer error (in the develop branch). Can you try the develop branch and check whether there are any other bugs?
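For context on the panic in the trace above, this is a rough sketch of one way a style lookup can return an error instead of dereferencing a nil pointer. It is not the actual fix in the develop branch. styleInfo, styleIndex, and lookup are hypothetical names invented for this example, and none of them are godocx types.

```go
package main

import "fmt"

// styleInfo and styleIndex are hypothetical types for this sketch only.
type styleInfo struct {
	ID   string
	Name string
}

type styleIndex struct {
	byID map[string]*styleInfo
}

// lookup can be called on a nil *styleIndex: it reports an error
// instead of panicking when the index or the style is missing.
func (idx *styleIndex) lookup(id string) (*styleInfo, error) {
	if idx == nil {
		return nil, fmt.Errorf("style index not loaded")
	}
	s, ok := idx.byID[id]
	if !ok || s == nil {
		return nil, fmt.Errorf("style %q not found", id)
	}
	return s, nil
}

func main() {
	// Simulates a document whose styles were never parsed.
	var missing *styleIndex
	if _, err := missing.lookup("a"); err != nil {
		fmt.Println("error:", err)
	}

	idx := &styleIndex{byID: map[string]*styleInfo{
		"a": {ID: "a", Name: "選択肢"},
	}}
	if s, err := idx.lookup("a"); err == nil {
		fmt.Println(s.Name)
	}
}
```

Returning an error keeps the decision with the caller, who can skip paragraphs without a resolvable style or stop processing.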
It works great:
STYLE: 解答(記号)
STYLE: ans-list
TEXT: &{The students will have a school trip next week. <nil>}
STYLE: 選択肢
TEXT: &{The students will have lunch at a hotel near Ueno Zoo. <nil>}
STYLE: 候選答案
TEXT: &{The students will leave Ueno Zoo at two. <nil>}
STYLE: ans-list
Thanks a lot for the quick response and fix!
I have merged the fix into the main branch. You can use version v0.1.3-beta.2 (or the latest).