OOHanzi 0.5 released

Change Log

  • 20080628:

    • Updated for OOHanzi 0.5.
    • Added support for marking up words that are present in DDB.
    • OOHanzi is now fully installable in Ubuntu 8.04 (Hardy Heron).
  • 20080302:

    • Updated for OOHanzi 0.3: the only new functionality is the addition of an “About…” menu item.
  • 200802??:

    • Added some code to make things a bit more user friendly when a JRE is not properly installed.
    • Modified the way web browsers are launched.
    • Changed the nomenclature of menus and some functions.
    • General fixes to improve stability in Windows.

Status

This documentation deals with version 0.5 of OOHanzi. This software is very much at the Alpha stage of its life-cycle. Expect bugs. Expect nonsensical design decisions. Expect quirks.

Note: Version 0.4 was never officially released.

Imagine a paper even before it is at the draft stage, when it is still just a bunch of thoughts quickly put together. Or notes taken at a conference. At this stage, OOHanzi is very much the programmatic equivalent of that paper or those notes.

If you use OOHanzi and find it useful, please let me know. If I do not hear from users, I’m going to produce releases as I see fit and I’ll just cater to my own needs.

Acknowledgments

My thanks go to Charles Muller for prompting me to release this and for testing and reporting bugs. I had been planning to release it for a long time but never got around to it. When he asked to see what this was all about, I finally had to package it for other people.

Support

OOHanzi is hosted on Launchpad. Please use the Launchpad facilities to report bugs and ask questions.

You can keep informed of the latest development by subscribing to either of these RSS feeds:

All of the code is available on Launchpad. If you want to hack the code, either leave a comment on this blog or find my email address on Launchpad and contact me.

Prerequisites

This code has been tested with Open Office 2.4.1. I have used it with older versions of Open Office (2.0.x and higher, and specifically with 2.3.0) but the latest version of the code has not been tested on anything older than 2.1.1.

You need to be able to run Java code on your computer. The extensions have been tested with Java 6 or higher.

Linux: People running Ubuntu 7.04 or 7.10 must install either sun-java6-jre or icedtea-java7-jre. (The latter seems preferable.) People running Ubuntu 8.04 should install openjdk-6-jre. Other distributions probably have equivalent packages that can be installed.

NOTE FOR PEOPLE UPGRADING TO HARDY: It seems that IcedTea does not exist for Hardy. Unfortunately, if you have used OOHanzi in Gutsy with IcedTea as the JVM, OpenOffice recorded the JVM to be IcedTea. Once you move to Hardy, you have a problem because that JVM no longer exists. In order to get OpenOffice to forget which JVM you used, you need to execute the following in a shell:

$ rm ~/.openoffice.org2/user/config/javasettings_Linux_x86.xml

If you do not do this, the installation will fail.

Windows: There are many options for Windows. It is possible to download Open Office bundled with Java 6 from the Open Office web site. This is the option I used when I installed OO in Windows for testing OOHanzi. I know nothing about the other options.

People running other systems should refer to their system’s documentation.

Automatic Installation

If you use Ubuntu, you can add my PPA archive to your apt sources. Look at the main page of my PPA archive for the specific lines you must add. Then you can issue “apt-get install oohanzi”. It is better to close all of Open Office before installing or removing extensions. The packages will install while Open Office is running, with a warning, but the new extensions won’t be usable until you restart Open Office.

If you used the manual installation first and want to switch to the automatic system, you should first remove everything you’ve installed manually.

Eventually there will be an automatic installation system for Windows but it may take a while before that is developed.

Manual Installation

  1. If you use the Open Office Quickstarter, please pay attention to the following. If you don’t know what the Quickstarter is, please read page 6 of this document. Note that the OO Quickstarter is installed by default in Windows. Here is the important part: if you use the Quickstarter, then whenever the instructions tell you to restart Open Office, you must also go into the Quickstarter and select “Exit Quickstarter”. If you do not exit the Quickstarter, Open Office will not be unloaded from memory and thus will not restart when you open it again. It will just work from what is already loaded.

  2. You need to get five items:

    • http://lddubeau.com/downloads/java/unihan-lib-[version].jar
    • http://lddubeau.com/downloads/java/webdict-lib-[version].jar
    • http://lddubeau.com/downloads/openoffice/extensions/oounihan-[version].oxt
    • http://lddubeau.com/downloads/openoffice/extensions/oowebdict-[version].oxt
    • http://lddubeau.com/downloads/openoffice/extensions/oohanzi-[version].oxt

    The string “[version]” stands for the version number of the respective packages. Always use the latest version numbers. It is normal if all files do not have the same version number. Here are links to the directories containing the files above:

  3. Make sure that your Open Office setup is set to find the Java JRE. Go in “Tools->Options”. Get to the “Java” tab (located under “Open Office.org” in the hierarchy on the left). Once you open that tab, it will take a little bit of time but Open Office will search your disk for JREs already installed. After it has found them, it will populate the table labeled “Java runtime environments (JRE) already installed:”. Select the one you want to use (usually you want the latest version), and click “Ok”.

    On Ubuntu systems, Open Office is able to find all properly packaged JREs. (That is, all JREs provided by the Ubuntu repositories.)

    If Open Office is unable to find your JRE, then you need to click “Add…”, find where your JRE is located and add it to the list. Because I run Ubuntu, I’ve never had to do this so I do not know the ins and outs of adding a JRE manually.

    After you select your JRE, you will most likely have to restart Open Office. You will get a dialog that will tell you to do so. If you use the Quickstarter, please also exit the Quickstarter. [NOTE: you can wait until you perform step 4 to restart Open Office.]

  4. You must make unihan-lib-[version].jar and webdict-lib-[version].jar available to the Java JRE. Whichever method you use is fine so if you already have your own method to make 3rd party jars available, use your method. Otherwise, do the following.

    Go back into the same Java configuration tab that you used in step 3. This time you need to click “Class Path…” and then “Add Archive…”. Find the unihan-lib-[version].jar that you saved in step 2 and click “Open”. Do the same thing for webdict-lib-[version].jar.

    Restart Open Office so that the library will be loaded next time the JRE is run. If you use the Quickstarter, remember to exit the Quickstarter too.

  5. Start Open Office. Go in “Tools->Extension Manager…”.

  6. Click on “Add…”, find oounihan-[version].oxt and click “Open”. Click on “Add…” again, find oowebdict-[version].oxt and click “Open”. Click on “Add…” again, find oohanzi-[version].oxt and click “Open”.

  7. You’re done! You should now have a menu item called “OOHanzi” in your menubar. If you do not get that menu item, then something went wrong. Contact me.

[IMPORTANT NOTE: When you use OOHanzi, if you immediately get some horrible message that says that it cannot load a Java class, it means that your Java environment is not working properly.]

Automatic Upgrading

If you use my PPA repository, then you can use “apt-get upgrade” to upgrade to the latest version.

Manual Upgrading

When you get a newer version of an oxt file you can just double click on it and Open Office will install it over the old version. However, I have noticed that Open Office does not register menu changes until I restart it. This seems to be a quirk in how Open Office operates.

Upgrading to a new jar requires that you uninstall the older jar and install the new one. Go to the menu item “Tools->Options”. Go to the Java tab. Click “Class Path…”, remove the old archive and add the new one.

Usage

After installation, you have a new menu called “OOHanzi”.

[KNOWN BUG: None of the items in the menu “OOHanzi” are ever grayed out, even when they should not be usable. For instance, “Clear Rubies” should ideally be grayed out when no text is selected. If no text is selected, there is no point using “Clear Rubies” since it works on the current selection. Several other items in “OOHanzi” are the same. Unfortunately, the documentation for writing extensions for OO is terrible. Most of it is in a half-baked state or out of date or does not cover everything needed. There is probably a way to ensure that menu items are active only when they should be active but I do not know how to do this.]

The functions under it are:

Display Unihan Information

This searches the Unihan database for the currently selected character and displays a window with the data found in Unihan. See the Unihan documentation for meaning of each field.

Mark Words Present In…

This menu provides functionality to mark all words in text selected by the user. The typical use would be to first select a passage from the text being edited. Then select one item under this menu. OOHanzi will then scan the selected passage and mark each word it finds in the database that the user has selected. For instance, if the user selects “Mark Words Present In…/DDB”, OOHanzi will mark all the words that are present in the DDB.

There are several things to keep in mind when using this function:

  1. The algorithm will try to find the longest possible terms from the start to the end of the selection.
  2. The algorithm uses a two color scheme: blue and red, and alternates between them.
  3. The algorithm is stupid. It does not know which terms would provide a better payoff for the user. For instance, suppose the characters ABCD appear together and AB and BCD both exist in the database. The algorithm will mark AB as one term and will not mark BCD. If CD also exists, it will mark CD. So basically, it does not handle overlapping hits very well. This is a general problem when performing this kind of function, and solving it requires a) careful design and b) a proper interface for showing overlapping hits.
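
The behavior described in the three points above can be sketched in a few lines of Python. This is only an illustration of greedy longest-match marking with alternating colors, not OOHanzi's actual code; the function name `mark_words`, the use of a plain set as the dictionary, and the returned tuples are all assumptions made for the example.

```python
def mark_words(text, dictionary, colors=("blue", "red")):
    """Greedy longest-match marking, scanning the text from start to end.

    Hypothetical sketch: `dictionary` is a plain set of known terms.
    Returns a list of (start_index, term, color) tuples.
    """
    max_len = max((len(term) for term in dictionary), default=0)
    marks = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            i += 1  # No term starts here; move on by one character.
            continue
        # Alternate between the two colors from one marked term to the next.
        color = colors[len(marks) % len(colors)]
        marks.append((i, match, color))
        i += len(match)  # Skip past the match, so overlapping terms are never seen.
    return marks


# The ABCD case from point 3: AB wins at position 0, so BCD is never marked.
print(mark_words("ABCD", {"AB", "BCD", "CD"}))
```

Because the scan jumps past each match, BCD can never be considered once AB has been marked. That is exactly the overlapping-hits weakness described in point 3.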

Lookup

This menu contains menu items to perform searches in dictionaries hosted at various web sites. The selected text is looked up at the web site which corresponds to the menu item selected.

DDB is Charles Muller’s Digital Dictionary of Buddhism.

CJKV is Charles Muller’s Chinese Japanese Korean Vietnamese Dictionary.

Etymology is YellowBridge’s Etymology Dictionary.

Fill Rubies Using Unihan

This adds pronunciation to the selected text. The pronunciation is taken from the Unihan database and added as “rubies” to each character of the selected text. See this page if you don’t know the term “ruby” in this context.

The algorithm actually works in 2 steps:

  1. For a given character on which the user wants to add the pronunciation, it first tries to find the same character in the current document. If that character already has its pronunciation in its ruby, then that pronunciation is copied to the current character.
  2. If the search in step 1 fails, then Unihan is used to get the pronunciation. The pronunciation extracted from Unihan is Mandarin.
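
As a rough Python sketch of the two steps above (not the real implementation): the dictionaries `document_rubies` and `unihan_mandarin` are hypothetical stand-ins for "rubies already present in the document" and "the Mandarin readings stored in Unihan", and the reading strings in the example are placeholders rather than real data.

```python
def fill_rubies(selection, document_rubies, unihan_mandarin):
    """Return (character, ruby) pairs for each character in the selection.

    Hypothetical sketch: both lookup tables are plain dicts mapping a
    character to its ruby text. A character found in neither gets None.
    """
    result = []
    for ch in selection:
        # Step 1: reuse a pronunciation already present in the document.
        ruby = document_rubies.get(ch)
        if ruby is None:
            # Step 2: fall back to the Unihan (Mandarin) data.
            ruby = unihan_mandarin.get(ch)
        result.append((ch, ruby))
    return result


# Placeholder readings, for illustration only.
pairs = fill_rubies(
    "佛法",
    document_rubies={"佛": "reading-from-document"},
    unihan_mandarin={"佛": "unihan-1 unihan-2", "法": "unihan-3"},
)
print(pairs)
```

The point of checking the document first is that a ruby you have already corrected by hand takes precedence over the raw Unihan data.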

[FOR FUTURE DEVELOPMENT: Configure OOHanzi to be able to get various pronunciations out of Unihan rather than just Mandarin.]

Fill Rubies Using Documents

This adds pronunciation to the selected text. The pronunciation is taken from any *currently opened Text* document (i.e. Open Office Writer documents). This is like step 1 in “Fill Rubies Using Unihan” but the search is made across all currently opened Text documents. If the search fails, Unihan is *not* consulted.

Adjust Rubies

The problem with doing Unihan lookups is that Unihan records all regular pronunciations of a character. I vaguely remember doing a computation showing that there are at most 5 pronunciations for a given character (in Mandarin). So under this item, you find 5 menu items that are used to select which pronunciation to keep. Usage scenario:

  1. You select some text.

  2. You use “Fill Rubies Using Unihan”.

    After that is done, the first character has 5 pronunciations in its ruby. You want to fix it.

  3. You select that character.

  4. You execute “OOHanzi->Adjust Rubies->Keep 3rd”.

  5. Now that character only has the 3rd of the 5 pronunciations.

Note that it is always possible to arbitrarily edit the ruby of any character by going in “Format->Asian phonetic guide…”.
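
To make the “Keep 3rd” step concrete, here is a minimal Python sketch. It assumes that the several pronunciations in a ruby are separated by spaces, and that an out-of-range choice leaves the ruby untouched. Both are assumptions made for the example, and `keep_nth` is a hypothetical name.

```python
def keep_nth(ruby, n):
    """Keep only the n-th (1-based) pronunciation of a ruby string.

    Assumes pronunciations are separated by whitespace. If the ruby has
    fewer than n pronunciations, it is returned unchanged.
    """
    readings = ruby.split()
    if not 1 <= n <= len(readings):
        return ruby
    return readings[n - 1]


# Placeholder readings r1..r5 stand in for five Mandarin pronunciations.
print(keep_nth("r1 r2 r3 r4 r5", 3))
```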

Clear Rubies

This removes all the rubies from the selected text.

Preferences

This menu item brings up a dialog that allows the user to specify how “Display Unihan Information” presents information extracted from the Unihan database. It is a good idea to filter that information because Unihan contains a lot of data and some of that data is useful only to some users.

At the top of the dialog there is a checkbox labeled “Filter returned Unihan fields”. If it is unchecked, then OOHanzi does not filter the data provided by Unihan. If it is checked, OOHanzi will include only the fields listed in the “Included fields” list which appears just under the checkbox.

All fields which appear in the “Included fields” list will be displayed by “Display Unihan Information”. All fields which appear in the “Excluded fields” list will not be displayed. Use the two buttons between the two lists to move items from one list to the other.
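
The filtering rule amounts to something like the following Python sketch. The function name and the dict representation of a Unihan entry are assumptions; the field names in the example are real Unihan field names, but the values are placeholders.

```python
def filter_fields(entry, filter_enabled, included_fields):
    """Return the Unihan fields that would be displayed.

    Hypothetical sketch: `entry` maps field names to values. With the
    checkbox off, everything is shown; with it on, only the fields in
    the "Included fields" list are shown.
    """
    if not filter_enabled:
        return dict(entry)
    return {name: value for name, value in entry.items() if name in included_fields}


entry = {"kMandarin": "...", "kDefinition": "...", "kCantonese": "..."}
print(filter_fields(entry, True, ["kMandarin", "kDefinition"]))
```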

About…

This item brings up a dialog which tells you which versions of OOHanzi, OOUnihan and the Java Unihan library are in use. This is useful if you want to check what you are running or if you want to produce a bug report.

Known Bugs

As of June 26th, 2008 the “Mark Words Present In…” function does not work in Windows. I suspect the problem is with the default encoding in which the Java JRE opens HTTP streams in Windows but I have not had time to investigate.
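
If the encoding suspicion is right, the failure would look something like the Python sketch below. It assumes the web site sends UTF-8 and that the Windows default encoding is cp1252. Both are assumptions for the sake of illustration, not something I have verified for OOHanzi.

```python
def decode_response(raw_bytes, encoding):
    """Decode the bytes of an HTTP response body using the given encoding."""
    return raw_bytes.decode(encoding)


raw = "佛".encode("utf-8")                 # bytes as a UTF-8 web site would send them
correct = decode_response(raw, "utf-8")    # decoded with the right encoding
mangled = decode_response(raw, "cp1252")   # decoded with an assumed Windows default

print(correct == "佛")   # the character survives
print(mangled == "佛")   # the character is lost, so no dictionary term can match
print(len(mangled))      # one Chinese character has turned into three Latin-1-style characters
```

Text decoded this way can never match a term in the database, which would explain why nothing gets marked. The usual remedy is to name the encoding explicitly when reading the stream rather than relying on the platform default.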

Other Issues

In Ubuntu, when running Open Office 2.3.0 or later with compiz the dialog windows are sometimes not sized properly. This is not a bug with OOHanzi. I have reported the bug. This issue does not occur if compiz is turned off. Moreover, this issue does not happen in Windows.

Hacking

Eventually the entire code for everything will be made available on the web. Right now however, only the OOBasic portions are available. To get to them, go into “Tools->Macros->Organize Macros->OpenOffice.org Basic…”. All the OOBasic code for OOHanzi will appear in the hierarchy under “My Macros/OOHanzi”.

4 thoughts on “OOHanzi 0.5 released”

  1. Christian Wittern

    Hi Louis-Dominique,
    I tried to install your extension on Mac OS X 10.5.5 and OO 3.0. When I come to the step where it says “Add the extensions”, I get an error, saying something like UNO bridge error, unsupported class version error.

    Do you know what this means and how to fix this?

    Christian

  2. Louis-Dominique Post author

    Hi Christian,

    I’m sorry. I’m afraid I have never seen such an error message. An “unsupported class version” error suggests that maybe the Java Runtime Environment being used is older than would be desirable? OOHanzi requires Java 6. Unfortunately, I do not have a Mac so I cannot test OOHanzi on a Mac.

    If you check the version of your Java Runtime Environment and find that it is high enough, then I suggest that you file a bug report on Launchpad:

    https://bugs.launchpad.net/oohanzi

    You will have to register if you are not already registered. I know it is not pleasant to register on yet another site but reporting there allows me to keep some sort of structure to the project and to record problems and solutions.

    Thank you.

  3. bernardo

    Hi. I can read neither Japanese nor Chinese. But I am looking for some kind of extension to give the phonetics of Japanese characters in Chinese. It is for Taiwanese who want to learn Japanese and need to know how to pronounce the characters.

    Do you happen to know of any?

    Thanking you in advance,

    1. Louis-Dominique Post author

      bernardo,

      I was out of reach for a few days. Hence the delay in replying. I don’t know of any tool which would give Japanese phonetics in Chinese. For giving Chinese phonetics in “Chinese” there is Hanyu Pinyin, bopomofo, and so on. (It can be argued that Hanyu Pinyin and bopomofo are not Chinese in the same way that the IPA is not English.) Maybe they have been adapted for representing Japanese but I don’t know.

      Sorry I cannot be more helpful.

