On Jul 23, 8:40 pm, Alberto Ganesh Barbati <AlbertoBarb...@[EMAIL PROTECTED]> wrote:
> Le Chaud Lapin ha scritto:
> > Second, the example I used comparing French "exasperation" to English
> > "exasperation" was poor. I was probably tired. I am again tired, so
> > no good examples come to mind right now, but my gut feeling is that
> > the class should have at least what I have been calling "locale", even
>
> I guess "language" is a more appropriate term than locale here, unless
> with "locale" you mean other kind of contextual metadata, which is bound
> to be more complex.
I finally had a chance today to look more closely at your ICU link:
http://www.icu-project.org/userguide/intro.html
It seems that a locale there covers language, country, and script, and
probably other things as well. I would probably capture as much of that
information as possible.
> > though that might not be the correct term. All of you have warned
> > against putting intelligence in the string class. I wonder if this bit
> > of extra information would count as too much intelligence. While I
> > have not read enough about Unicode to know the path I will follow, I
> > will probably include this bit of information anyway.
> > <snip>
> > String<> s1 = "mein"; // German for English "mine"
> > String<> s2 = "mein"; // English stolen from Chinese for type of
> >                       // noodle.
>
> Ok, so... what about this:
>
> "Let's eat spaghetti, a bratwurst and a crème brûlée"
>
> is this English, Italian, German or French? You can't simply attach
> metadata to the *whole* string, you have to consider substrings too. You
> have two choices, either you can store metadata in a struct separate
> from the textual data, or you store them in the textual data itself by
> effectively introducing some form of tagging. XML (with xml:lang) and
> Unicode language tags (see 16.9 in
> http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf) fall into the
> latter category. Despite the obvious added complexity in parsing and
> traversing the string, the tag approach has a lot of advantages.
> However, according to Unicode terminology, attached metadata are
> responsibility of a "higher level protocol" and so, IMHO, should not be
> addressed by the basic container.
This is a good example. My answer: I don't know. :) I have to learn
more about Unicode.
If there is some super-code that covers all Latin scripts, I would use
that to represent the string in this example. After that, everything
would depend on what needs to be done. As everyone has mentioned, that
depends on the context in which the operations are to be performed.
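To make the "separate struct" option from the quoted example concrete,
here is a minimal sketch. LanguageRun and TaggedText are hypothetical
names of my own, not part of any library or of Unicode itself, and the
offsets are plain byte offsets into a std::string, which is an
assumption that ignores encoding questions entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types, for illustration only: the text stays untouched and
// the language metadata lives in a side table of half-open byte ranges.
struct LanguageRun {
    std::size_t begin;  // byte offset of the first byte in the run
    std::size_t end;    // one past the last byte in the run
    std::string lang;   // language code such as "it", "de", "fr"
};

struct TaggedText {
    std::string text;               // the raw text
    std::string default_lang;       // applies wherever no run matches
    std::vector<LanguageRun> runs;  // assumed not to overlap

    // Mark the first occurrence of `word` as belonging to `lang`.
    // Returns false if `word` does not occur in the text.
    bool tag(const std::string& word, const std::string& lang) {
        const std::size_t pos = text.find(word);
        if (pos == std::string::npos) return false;
        runs.push_back({pos, pos + word.size(), lang});
        return true;
    }

    // Language in effect at byte offset `pos`.
    std::string language_at(std::size_t pos) const {
        assert(pos < text.size());
        for (const LanguageRun& r : runs) {
            if (pos >= r.begin && pos < r.end) return r.lang;
        }
        return default_lang;
    }
};
```

With this, the sentence could carry "en" as its default and have
"spaghetti" tagged "it" and "bratwurst" tagged "de", without the text
itself changing. The cost is that every edit to the text has to keep the
offsets in the side table in sync.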
I will most likely fatten my String<> class sufficiently to allow the
programmer to insert semantic indicators into the string objects
themselves, at run-time, so that operator == will know what to do when
it is applied to two string objects.
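Roughly what I have in mind, as a sketch only: TaggedString below is a
hypothetical stand-in, not my actual String<> class, and the comparison
policy (equal only when both the text and the tag match) is just one
assumption among several that operator == could follow.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a string that carries a run-time semantic
// indicator (here, just a language code).
class TaggedString {
public:
    TaggedString(std::string text, std::string lang)
        : text_(std::move(text)), lang_(std::move(lang)) {}

    const std::string& text() const { return text_; }
    const std::string& lang() const { return lang_; }

    // The indicator can be changed after construction.
    void set_lang(std::string lang) { lang_ = std::move(lang); }

private:
    std::string text_;
    std::string lang_;  // e.g. "de" or "en"
};

// Assumed policy: two strings are equal only if both the stored text and
// the language indicator match.
inline bool operator==(const TaggedString& a, const TaggedString& b) {
    return a.lang() == b.lang() && a.text() == b.text();
}

inline bool operator!=(const TaggedString& a, const TaggedString& b) {
    return !(a == b);
}

// The two "mein" strings from the quoted example.
inline void example() {
    TaggedString s1("mein", "de");  // German for English "mine"
    TaggedString s2("mein", "en");  // the noodle
    assert(s1.text() == s2.text()); // same characters
    assert(s1 != s2);               // different under the assumed policy
}
```

Whether a mismatch in the indicator should make the strings unequal, or
should be ignored, or should be an error, is exactly the kind of
context-dependent decision everyone has been warning about.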
-Le Chaud Lapin-
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

