Ethereal-dev: Re: [Ethereal-dev] Bug with localized decimal separator

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxx>
Date: Fri, 11 Mar 2005 02:32:58 -0800
Lars Ruoff wrote:
Hi,
I want to point out again a bug that has been around for a fair amount of
time now and should be addressed. (I pointed it out once before, but it
got forgotten.)
The bug is a problem with a localization issue (at least) on Windows with
most central european locale settings,

.fr is Central European? The folks in Normandy might be surprised by that. :-)

I think Quebec might also use , as the decimal separator.

where the floating point decimal
separator character is the comma "," instead of the decimal point "." .

Apparently, on systems with corresponding settings (I have no idea which
system setting that actually is),

On UN*X, it's probably either the LANG or LC_NUMERIC environment variable (LANG affects a number of behaviors (perhaps including the spelling of "behavio{u}r" :-)), and LC_NUMERIC affects the decimal separator and thousands separator). If they're not set, the default value is used; how that's determined is probably UN*X-dependent (some UNIX+X desktop environments might have a place to set the default from the GUI; I don't know whether OS X sets the default behavior of those routines from the International item in System Preferences).

In Windows, those probably affect the decimal separator (as they're from ANSI C); I don't know whether the GUI that sets the locale (I assume there is one in Windows) changes the environment variable setting or just changes the default.

the g_XXX_printf-family of functions
produce the localized decimal separator when used with float values ("%f").

This has the following undesirable effects:
A - When used in CList columns with column sorting, the sorting on such a
column won't work because atof() does not recognize the localized decimal
separator and expects the point "." instead.

The two places atof() is used are

1) the packet list - I don't *think* there are any places where there are numeric columns (rather than time columns) that aren't integral, so perhaps atof() shouldn't be used there;

2) rtp_analysis, where the answer might be to allocate, for each row, a data structure containing the packet number and the floating-point values for the row, and make a pointer to that be the data for the row, and have the sort routine fetch those numbers and compare them rather than fetching the text and comparing them.
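As a rough illustration of the idea in 2), here is a minimal sketch in plain C. The structure, its field names, and the function names are made up for the example and are not taken from the rtp_analysis code; qsort() stands in for the CList sort routine. The point is only that the comparison works on the stored doubles, so no text is parsed and the locale's decimal separator never matters.

```c
#include <stdlib.h>

/* Hypothetical per-row record holding the raw values behind one table
 * row; the field names are illustrative only. */
typedef struct {
    unsigned int packet_num;  /* packet (frame) number for the row */
    double delta_ms;          /* example floating-point column */
    double jitter_ms;         /* example floating-point column */
} rtp_row_t;

/* Compare two rows by their numeric delta value. */
int
compare_rows_by_delta(const void *a, const void *b)
{
    const rtp_row_t *ra = a;
    const rtp_row_t *rb = b;

    if (ra->delta_ms < rb->delta_ms)
        return -1;
    if (ra->delta_ms > rb->delta_ms)
        return 1;
    /* Equal deltas: fall back to the packet number so that the
     * resulting order is deterministic. */
    if (ra->packet_num < rb->packet_num)
        return -1;
    if (ra->packet_num > rb->packet_num)
        return 1;
    return 0;
}

/* Sort an array of rows in ascending order of delta. */
void
sort_rows_by_delta(rtp_row_t *rows, size_t n)
{
    qsort(rows, n, sizeof rows[0], compare_rows_by_delta);
}
```

In the GUI, a pointer to such a record would be attached to each CList row, and the CList compare callback would fetch the two records and compare them the same way.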

(In the longer term, most tables displayed as CLists should perhaps be done by common code - perhaps Luis Ontanon's stats tree code could be used for this, or enhanced for this - in which the tap merely builds a data structure containing tables of raw data to be displayed, and the common code builds the CList in Ethereal or prints out the data as text in Tethereal; in that case, a pointer to the table row could be associated with the CList row, and the sorting done with that data.)

B - When exporting a CList to CSV format, the exported data is unusable,
since the comma "," is used as both the column- and decimal separator.

That one's harder, but ANSI C should, I think, let you save the current locale by calling

	savelocale = g_strdup(setlocale(LC_NUMERIC, NULL));

(copying the string, as the pointer setlocale() returns can be invalidated by a later setlocale() call), change the locale to the "C" locale by calling

	setlocale(LC_NUMERIC, "C");

and restore the current locale by calling

	setlocale(LC_NUMERIC, savelocale);
	g_free(savelocale);
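Put together, the save/switch/restore sequence might look like the following sketch. The function name is hypothetical, and it uses plain malloc() and snprintf() rather than the GLib routines so that it stands alone. The string returned by setlocale() is copied before the locale is changed, since a later setlocale() call may overwrite it.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format "value" with "%f" using the "C" locale's
 * decimal point, whatever the current LC_NUMERIC setting is.
 * Returns snprintf()'s result, or -1 if the locale couldn't be saved. */
int
format_double_c_locale(char *buf, size_t buflen, double value)
{
    const char *current;
    char *saved;
    int ret;

    /* Query the current LC_NUMERIC locale without changing it. */
    current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL)
        return -1;

    /* Copy the name; the returned pointer may not survive the next
     * setlocale() call. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    setlocale(LC_NUMERIC, "C");        /* "." as the decimal separator */
    ret = snprintf(buf, buflen, "%f", value);
    setlocale(LC_NUMERIC, saved);      /* restore the previous locale */

    free(saved);
    return ret;
}
```

Note that this changes the locale for the whole process while the number is being formatted, so it is only safe if no other thread is formatting or parsing numbers at the same time.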

(The "right" solution is with the glibc "_l" variants of various routines, so you could get a handle on the "C" locale using whatever new glibc routine does that and use that handle in a strtof_l() call, but those routines are somewhat new and not available in many, perhaps most, environments, so the "setlocale()" workaround is probably the right solution for now.)

That could also be used for the column sorting.

Has anybody experienced the same problem so far?

Given the addition of the "_l" routines to glibc, I presume somebody experienced that problem in C programming. Unfortunately, either nobody experienced it before the ANSI C89 standard was frozen or, if they did, they didn't bother pointing out to the ANSI C committee that there are probably programs that both read numbers from and/or display numbers to the user, in which case the locale's conventions should probably be used, and read them from and/or write them to files expected to be in a standard format, in which case the C locale's conventions should probably be used. Nor, apparently, did they point out that using "setlocale()" to push and pop the locale, whilst better than nothing, is a bit clumsy and not thread-safe (unless the locale is thread-specific, but I don't think it is).
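For comparison, here is a sketch of what the per-thread approach can look like where the newer interfaces exist. It assumes a system providing newlocale(), uselocale(), and freelocale() as later standardized in POSIX.1-2008 (glibc has them; many other environments may not, and some need a different header). The function name is hypothetical. Because uselocale() affects only the calling thread, this avoids the thread-safety problem of pushing and popping the global locale with setlocale().

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parse a number written with "." as the decimal
 * separator, regardless of the process-wide locale.
 * Returns 0 on success, -1 on failure. */
int
parse_double_c_locale(const char *text, double *result)
{
    locale_t c_locale, previous;
    char *end;
    double value;

    /* Create a locale object whose numeric conventions are "C". */
    c_locale = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_locale == (locale_t)0)
        return -1;

    /* Install it for this thread only, parse, then put back whatever
     * locale the thread was using before. */
    previous = uselocale(c_locale);
    value = strtod(text, &end);
    uselocale(previous);
    freelocale(c_locale);

    if (end == text)
        return -1;              /* no digits were consumed */
    *result = value;
    return 0;
}
```

With glibc, the strtod_l() variant would take the locale object directly and skip the uselocale() calls, but it is a non-standard extension.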