Higher Unicode characters

Jean-Guy Marcil

2008-05-30 12:22:00 UTC

Sorry about all the questions, but I figured it would be easier to have all
my Unicode inquiries under one roof! This way, uninterested parties can just
skip the one post...

Can someone explain something regarding Unicode? Or, if such an explanation
would be too involved, point me toward web resources to help me
understand.

I have to work with documents written in Simplified/Traditional Chinese (by
the way, I heard that simplified is Mandarin and that Traditional is
Cantonese.... why not use Mandarin and Cantonese?), Korean, Japanese and
Hebrew.
Occasionally, when exchanging documents with co-workers, it happens
that one person cannot open a document because Word reports that it cannot
read the characters and pops up the Unicode conversion dialogue box. I
have never been able to use that dialogue box to successfully convert
characters... Whenever I see that dialogue, I pretty much know that the
document has become useless. So, why would a document saved in Word by one
person become unreadable for another person also using Word?

Also, once the characters have been written, why is it that applying a font
will not work? If there are characters in Simsun, the only way I can change
the font is to use another "Chinese-compatible" font, like PMingLiU. I mean, I
understand that Verdana may not be able to render Chinese characters, but if
I apply Verdana to Simsun, I would expect to see white squares instead of the
characters. But no, Word will not even apply the font. However, I can apply
Simsun to Roman characters previously formatted with a non-Chinese font, like
Arial.

I am not sure that these will appear, but here go 4 Chinese characters
(Simplified Chinese):

飞机票价 (air fare)

The third one is Unicode 31080 as reported by the AscW function.
Anywhere in a Word document, regardless of the font at the current insertion
point, if I do ALT+31080, I get 票 in Simsun. Why don't I get the Arial version
of Unicode 31080 (assuming Arial was the current font at the insertion point)?
Why do I get "h" in this Web interface instead of 票 when I do ALT+31080?

Finally, when using the AscW function, why is the reported Unicode number
often a negative one? The first of the four characters is Unicode 39134 but
AscW reports that it is -26402. Why can't the VBA compiler add 65536 to get
the proper answer? I have not found any reference to negative codes on
http://www.unicode.org/charts/, so why does AscW report useless
information?

Thank you for your patience and generosity!

Klaus Linke

2008-05-31 13:16:50 UTC

Hi Jean-Guy,

Can't answer everything... Just some parts.

Post by Jean-Guy Marcil
Finally, when using the AscW function, why is the reported Unicode number
often a negative one? The first of the four characters is Unicode 39134 but
AscW reports that it is -26402. Why can't the VBA compiler add 65536 to get
the proper answer? I have not found any reference to negative codes on
http://www.unicode.org/charts/, so why does AscW report useless
information?

See Jay's and my replies to Dave Rado at the end of this thread:
http://groups.google.de/group/microsoft.public.word.word97vba/browse_thread/thread/57560582e208fb1c/dee1d22cbd597dad

Basically, VBA hasn't the proper type of integers (unsigned double-byte) for
Unicode, so it fudges along using signed two-byte integers.
If you work with Unicode, you're usually better off using hex notation
anyway, and both Hex(39134) and Hex(-26402) will give you the same hex code
(98DE).
The other way round, you also have an easier time using hex notation
(iCode=&H98DE instead of iCode=39134).
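
The wraparound described above is plain 16-bit arithmetic. Here is a minimal sketch of it, written in Python rather than VBA so it can be run anywhere; the helper names to_signed16 and to_unsigned16 are made up for this illustration and are not part of any library. It shows that 39134 and -26402 are the same 16 bits read two ways, and that masking with &HFFFF (0xFFFF here) is equivalent to adding 65536 to a negative result.

```python
def to_signed16(code_unit: int) -> int:
    """Reinterpret an unsigned 16-bit value (0..65535) as a signed 16-bit integer."""
    if not 0 <= code_unit <= 0xFFFF:
        raise ValueError("expected a value in 0..65535")
    # Values with the top bit set wrap around into the negative range.
    return code_unit - 0x10000 if code_unit >= 0x8000 else code_unit


def to_unsigned16(value: int) -> int:
    """Map a signed 16-bit integer (-32768..32767) back to 0..65535."""
    return value & 0xFFFF


code = ord("飞")                # first character of the example, U+98DE
signed = to_signed16(code)      # what a signed two-byte integer would hold
print(code, signed)             # 39134 -26402
print(format(to_unsigned16(signed), "X"))  # 98DE
```

Values below 32768, such as 31080 for 票, come out unchanged, which is why only some characters show up as negative.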

Klaus

Jean-Guy Marcil

2008-06-02 12:32:00 UTC

I guess somebody was in a hurry to wrap up the VBA library, so they rounded
some corners!

Thanks for the Hex idea.

Klaus Linke

2008-05-31 13:45:45 UTC

The answers below are more or less guesswork, since I haven't seen a good
"official" documentation yet...

Post by Jean-Guy Marcil
Also, once the characters have been written, why is it that applying a font
will not work? If there are characters in Simsun, the only way I can change
the font is to use another "Chinese-compatible" font, like PMingLiU. I mean, I
understand that Verdana may not be able to render Chinese characters,
but if I apply Verdana to Simsun, I would expect to see white squares
instead of the characters. But no, Word will not even apply the font.

If you insert some character that isn't available in the current font, Word
will choose a font that does contain the character.
At least most of the time ... sometimes it'll continue to use the Latin font
and show a square.

It depends on what language support is currently installed, probably also on
which fonts are installed, probably also on whether a font for Asian text
(or BiDi... text -- see below) is already defined for that run of text, or
for that style, maybe on other things too.

If it does use another font: which font that'll be seems to depend on two
things...

First, Word looks at the code and determines whether it's Western, Asian,
BiDi (Hebrew/Arabic), Thai ...

Second, say it's an Asian character and there's already an Asian font defined
for that style, or for that run of text: Word will use that font.
You can see which font that will be by typing this in the Immediate window:
? Selection.Font.NameFarEast

If you get "Times New Roman" or some other font that doesn't contain FarEast
characters, no Asian font has been set yet.

In that case, it seems to look through the list of available Asian fonts and
pick one at random.

At least I'll often end up with a lot of different Asian fonts if I insert
exotic characters willy-nilly (MS Mincho, SimSun, Arial Unicode MS...).
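
As a rough illustration of the two-step idea described above (classify the character, then use the font registered for that class), here is a small Python sketch. It is not Word's actual algorithm, which is undocumented as far as this thread knows: the block ranges are only a handful of real Unicode blocks, the class labels follow the wording above, and the names SCRIPT_RANGES, FONTS, classify and pick_font are hypothetical.

```python
# A few real Unicode block ranges, grouped under the class names used above.
# Illustrative only; a real implementation would need many more blocks.
SCRIPT_RANGES = [
    (0x0590, 0x05FF, "BiDi"),   # Hebrew
    (0x0600, 0x06FF, "BiDi"),   # Arabic
    (0x0E00, 0x0E7F, "Thai"),   # Thai
    (0x3040, 0x30FF, "Asian"),  # Hiragana and Katakana
    (0x4E00, 0x9FFF, "Asian"),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Asian"),  # Hangul Syllables
]

# Hypothetical per-class font settings, one slot per class.
FONTS = {"Western": "Arial", "Asian": "SimSun", "BiDi": "David", "Thai": "Tahoma"}


def classify(ch: str) -> str:
    """Return the class of a single character; anything unlisted counts as Western."""
    code = ord(ch)
    for first, last, label in SCRIPT_RANGES:
        if first <= code <= last:
            return label
    return "Western"


def pick_font(ch: str, fonts: dict) -> str:
    """Pick the font registered for the character's class."""
    return fonts[classify(ch)]


for ch in "a票א":
    print(ch, classify(ch), pick_font(ch, FONTS))
```

Treating everything outside the listed blocks as Western is a simplification made to keep the sketch short.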

Post by Jean-Guy Marcil
However, I can apply Simsun to Roman characters previously formatted
with a non-Chinese font, like Arial.

The large font (SimSun... ) has all the Latin (Western) characters, so Word
does not see any problems applying it (or to continue using that font if you
type in some Western text following the Asian character).

Post by Jean-Guy Marcil
I am not sure that these will appear, but here go 4 Chinese characters
飞机票价 (air fare)
The third one is Unicode 31080 as reported by the AscW function.
Anywhere in a Word document, regardless of the font at the current insertion
point, if I do ALT+31080, I get 票 in Simsun. Why don't I get the Arial version
of Unicode 31080 (assuming Arial was the current font at the insertion point)?

Because Arial does not contain that glyph.

Post by Jean-Guy Marcil
Why do I get "h" in this Web interface instead of 票 when I do ALT+31080?

Most non-Unicode apps do a "Modulo 256" on Unicode characters.
They can't insert the Unicode character, so they'll just use the lower of
the two bytes:

? 31080 mod 256
104
? ChrW(104)
h
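
The same arithmetic as a runnable Python sketch, for anyone without the VBA Immediate window at hand. It only illustrates the "keep the low byte" explanation given above; whether a particular application really truncates this way is an assumption.

```python
code = ord("票")            # 31080, i.e. U+7968
high_byte = code >> 8       # upper of the two bytes
low_byte = code & 0xFF      # lower byte; same as code % 256 for non-negative values

print(code, hex(code))      # 31080 0x7968
print(high_byte, low_byte)  # 121 104
print(chr(low_byte))        # h
```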

Regards,
Klaus

Jean-Guy Marcil

2008-06-02 12:40:01 UTC

Post by Klaus Linke
The answers below are more or less guesswork, since I haven't seen a good
"official" documentation yet...

Well, it is very good guesswork!

I did install all complex/Far-East modules that come with Windows and
Office. So, as you state, Word has all the info it needs to make "intelligent"
guesses...

Sometimes, MSFT has messed things up when trying to make Word "Intelligent"
(Like, try applying "Do not spell check" from the language dialogue... Most
of the time, Word turns it back on right away..), but in the case of handling
Unicode, I guess they did a good job, at least as far as the UI is concerned.

Thanks for your time.

Klaus Linke

2008-06-03 15:00:36 UTC

I'd actually prefer KISS (keep it simple, stupid)...
-- Only one font.
-- Squares (or better placeholders with the code, as on the Mac) if a
character is missing.
-- A possibility to find such missing glyphs.

If I really needed different fonts for different languages, I'd rather use
different (character/paragraph) styles than have Word choose fonts
depending on character and/or language.
Probably a matter of taste, but I'm annoyed several times every day because
Word has applied some (Asian) font behind my back.

Klaus

Klaus Linke

2008-05-31 13:49:21 UTC

Post by Jean-Guy Marcil
Occasionally, when exchanging documents with co-workers,
it happens that one person cannot open a document because
Word reports that it cannot read the characters and pops up the
Unicode conversion dialogue box. I have never been able to use
that dialogue box to successfully convert characters... Whenever I
see that dialogue, I pretty much know that the document has become
useless. So, why would a document saved in Word by one
person become unreadable for another person also using Word?

Not sure, and I haven't seen that message yet.

I'd make sure "Arial Unicode MS" is installed on all machines. It contains
every Unicode character (in Unicode Version 2, which is all Office supports
AFAIK).
That way, the machine should be able to deal with any character you throw at
it.

In the Office setup, it's somewhere under "Office common features >
International support > Universal font" or something like that.

Regards,
Klaus

Jean-Guy Marcil

2008-06-02 12:51:01 UTC

Thanks for the tip...
I did find it in the Office Shared features, but I need the installation CD,
which I do not have at work... Now I have to call the IT people to get them
to install that feature... But they know me now... I keep calling for them to
install stuff on my machine... The first of which was the VBA help files
which were not installed by default where I work...

Klaus Linke

2008-06-03 15:02:30 UTC

Until a couple of years ago, one was able to download "Arial Unicode MS"
from MS, but all download links were removed AFAIK.

:-( Klaus

Suzanne S. Barnhill

2008-06-03 15:36:00 UTC

But it is supplied with all versions of Windows and/or Office, isn't it?
--
Suzanne S. Barnhill
Microsoft MVP (Word)
Words into Type
Fairhope, Alabama USA

Character

2008-06-03 16:25:06 UTC

Post by Suzanne S. Barnhill
But it is supplied with all versions of Windows and/or Office, isn't it?

It's been included with MS Office since at least Office 2000 (and I
assume the Mac equivalents) but not with Windows.

- Ch.

Character

2008-06-03 17:30:12 UTC

Post by Character

Post by Suzanne S. Barnhill
But it is supplied with all versions of Windows and/or Office, isn't it?

It's been included with MS Office since at least Office 2000 (and I
assume the Mac equivalents) but not with Windows.

According to the MS website, ".. if you are using Microsoft Windows
XP, the universal font for Unicode is automatically installed"

It also states:

"Because of its considerable size and the typographic compromises
required to make such a font, Arial Unicode MS should be used only
when you can't use multiple fonts tuned for different writing systems.
For example, if you have multilingual data from many different writing
systems in Microsoft Access, you can use Arial Unicode MS as the font
to display the data tables, because Access can't accept many different
fonts"

Klaus Linke

2008-06-03 16:25:36 UTC

Post by Suzanne S. Barnhill
But it is supplied with all versions of Windows and/or Office, isn't it?

Yes, but it's a real PITA if everybody you send some doc to has to find
his/her installation disks or call their administrator.

I think Microsoft should have the resources to buy *one* Unicode font with
all the glyphs and make it freeware.

Regards,
Klaus

Character

2008-06-03 17:17:24 UTC

Post by Klaus Linke

Post by Suzanne S. Barnhill
But it is supplied with all versions of Windows and/or Office, isn't it?

Yes, but it's a real PITA if everybody you send some doc to has to find
his/her installation disks or call their administrator.

Simply embed the font (the used subset) in your document.

Post by Klaus Linke
I think Microsoft should have the resources to buy *one* Unicode font
with all the glyphs.

NO font can have "all" the million or so Unicode characters. The sfnt
space (64K) isn't big enough. And Unicode defines CHARACTERS, not
GLYPHS. It does NOT define things like alternate glyphs for a given
character; however, OPENTYPE does - theoretically you could have an
infinitely large font!
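
To put numbers on that size argument, here is a back-of-the-envelope Python sketch. It assumes the "64K" above refers to 16-bit glyph indices and treats that as 2**16 slots; the constant names are made up for the illustration. Note that the larger figure is the size of the Unicode code space (U+0000 to U+10FFFF), most of which is not assigned to characters.

```python
MAX_GLYPHS_PER_FONT = 2 ** 16        # assumption: 16-bit glyph indices, "64K"
UNICODE_CODE_SPACE = 0x10FFFF + 1    # code points U+0000..U+10FFFF

print(MAX_GLYPHS_PER_FONT)                         # 65536
print(UNICODE_CODE_SPACE)                          # 1114112
# How many 64K-glyph fonts it would take to give every code point one glyph.
print(UNICODE_CODE_SPACE // MAX_GLYPHS_PER_FONT)   # 17
```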

Apple's built-in "LastResort" font [included with the OS] does some
interesting things -

Post by Klaus Linke
and make it freeware.

Agree - or at least include it with Windows instead of an application.

Klaus Linke

2008-06-03 19:46:43 UTC

Post by Character

Post by Klaus Linke
Yes, but it's a real PITA if everybody you send some doc to has to find
his/her installation disks or call their administrator.

Simply embed the font (the used subset) in your document.

A good idea in principle, but it sometimes does not work in practice.
Say I just sent out some text to a dozen authors so they can add translations
and phonetics in different languages.
I just don't know ahead of time which characters they're going to need/use.
The phonetic characters especially are all over the place in Unicode,
scattered over many different blocks.

I prepared styles for the translation and phonetics using Arial Unicode MS,
but not unexpectedly, quite a few of them called me because they didn't have
the font installed and couldn't follow my instructions for installing it.

Klaus

Suzanne S. Barnhill

2008-06-03 17:12:23 UTC

All the more reason to use PDFs.
--
Suzanne S. Barnhill
Microsoft MVP (Word)
Words into Type
Fairhope, Alabama USA

grammatim

2008-06-30 03:24:18 UTC

On May 30, 8:22 am, Jean-Guy Marcil

Post by Jean-Guy Marcil
Sorry about all the questions, but I figured it would be easier to have all
my Unicode inquiries under one roof! This way, uninterested parties can just
skip the one post...
Can someone explain something regarding Unicode? Or, if such an explanation
would be too involved, point me toward web resources to help me
understand.
I have to work with documents written in Simplified/Traditional Chinese (by
the way, I heard that simplified is Mandarin and that Traditional is
Cantonese.... why not use Mandarin and Cantonese?),

I'm surprised no one commented on this.

"Simplified" is PRC (Mainland China). "Traditional" is Taiwan.

If you have a font with the extra characters that have been devised
for Cantonese (I don't believe they're contained in any official
standard), it could be either Traditional or Simplified, because
Cantonese is the language both of Canton and Hong Kong (PRC), and of a
large proportion of the Chinese diaspora -- which would by and large
be opposed to the use of Mao's Simplified character set.
