English uses spacing and hyphenation as cues to allow for beautiful line breaks. Japanese, which has none of these, is notoriously more difficult. Breaks occur randomly, usually in the middle of a word. This is a long standing issue in Japanese typography on web, and results in degradation of readability.
Budou automatically translates Japanese sentences into organized HTML code with meaningful chunks wrapped in non-breaking markup so as to semantically control line breaks. Budou uses Cloud Natural Language API to analyze the input sentence, and it concatenates proper words in order to produce meaningful chunks utilizing PoS (part-of-speech) tagging and syntactic information.
Budou outputs HTML code by wrapping the chunks with SPAN
tag.
By specifying their display
property as inline-block
in CSS, semantic units will no longer be split at the end of a line.
Budou supports only Japanese currently, but support for other Asian languages with line break issues, such as Chinese and Thai, will be added as Cloud Natural Language API adds support.
Install the library by running pip install budou
.
Also, a credential json file is needed for authorization to Cloud Natural Language API.
import budou
# Login to Cloud Natural Language API with credentials
parser = budou.authenticate('/path/to/credentials.json')
result = parser.parse(u'今日も元気です', 'wordwrap')
print result['html_code'] # => "<span class="wordwrap">今日も</span><span class="wordwrap">元気です</span>"
print result['chunks'][0] # => "Chunk(word='今日も', pos='NOUN', label='NN', forward=True)"
print result['chunks'][1] # => "Chunk(word='元気です', pos='NOUN', label='ROOT', forward=False)]"
Semantic units in the output HTML will not be split at the end of line by conditioning each SPAN
tag with display: inline-block
in CSS.
<span class="wordwrap">今日も</span><span class="wordwrap">元気です</span>
.wordwrap {
display: inline-block;
}
- Japanese
Support for other Asian languages with line break issues, such as Chinese and Thai, will be added as Cloud Natural Language API adds support.
Shuhei Iitsuka
- Website: https://tushuhei.com
- Twitter: https://twitter.com/tushuhei
This library is authored by a Googler and copyrighted by Google, but is not an official Google product.
Copyright 2016 Google Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.