For instance, the texts at asianclassics.org are encoded for the TibetanMachineWeb font. This font relies on an arcane encoding to produce the proper stacks of consonants and so on. Because of this, the texts offered by that site cannot be used as-is for any kind of sensible text processing. It is possible, however, to convert those files to Unicode or Wylie. Here’s the process. Unfortunately, it requires Microsoft software. I tried to find a procedure that works on Linux, but my efforts were thwarted. (Also, I was not inclined to test every html2rtf tool under the sun.) You probably need to have the TibetanMachineWeb fonts installed for this to work.
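As a quick sanity check before and after conversion, it can help to test whether a piece of text is actually Unicode Tibetan. The sketch below is not part of the conversion itself. It only counts how many characters fall in the Tibetan Unicode block (U+0F00 to U+0FFF), on the assumption that font-encoded text is stored as code points outside that block. The function names, the 0.5 threshold, and the Latin placeholder string are made up for illustration; the placeholder is not real TibetanMachineWeb data.

```python
def tibetan_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that lie in the
    Tibetan Unicode block (U+0F00 to U+0FFF)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    tibetan = sum(1 for c in chars if "\u0f00" <= c <= "\u0fff")
    return tibetan / len(chars)


def looks_like_unicode_tibetan(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: treat text as Unicode Tibetan when at least `threshold`
    of its non-whitespace characters are in the Tibetan block.
    The threshold is an arbitrary assumption, not a standard value."""
    return tibetan_ratio(text) >= threshold


# A short run of Unicode Tibetan, written with escapes so the example
# does not depend on the editor's font support.
unicode_sample = "\u0f56\u0f40\u0fb2\u0f0b\u0f64\u0f72\u0f66\u0f0b"

# Hypothetical stand-in for font-encoded text: plain Latin characters.
font_encoded_placeholder = "bkra shis"

print(looks_like_unicode_tibetan(unicode_sample))
print(looks_like_unicode_tibetan(font_encoded_placeholder))
```

Running this over a converted file gives a rough indication of whether the conversion produced real Tibetan code points, though it says nothing about whether the stacks came out correctly.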
How to convert web pages from TibetanMachineWeb to Unicode