/*
 * This file is part of gtkD.
 *
 * gtkD is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License
 * as published by the Free Software Foundation; either version 3
 * of the License, or (at your option) any later version, with
 * some exceptions, please read the COPYING file.
 *
 * gtkD is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with gtkD; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110, USA
 */

// generated automatically - do not change
// find conversion definition on APILookup.txt
// implement new conversion functionalities on the wrap.utils package

/*
 * Conversion parameters:
 * inFile = glib-Character-Set-Conversion.html
 * outPack = glib
 * outFile = CharacterSet
 * strct =
 * realStrct=
 * ctorStrct=
 * clss = CharacterSet
 * interf =
 * class Code: No
 * interface Code: No
 * template for:
 * extend =
 * implements:
 * prefixes:
 * 	- g_
 * omit structs:
 * omit prefixes:
 * 	- g_convert_with_iconv
 * 	- g_iconv_open
 * 	- g_iconv
 * 	- g_iconv_close
 * omit code:
 * omit signals:
 * imports:
 * 	- glib.Str
 * 	- glib.ErrorG
 * 	- glib.GException
 * structWrap:
 * module aliases:
 * local aliases:
 * overrides:
 */

module glib.CharacterSet;

public import gtkc.glibtypes;

private import gtkc.glib;
private import glib.ConstructionException;


private import glib.Str;
private import glib.ErrorG;
private import glib.GException;




/**
 * Description
 * The g_convert() family of functions wraps the functionality of
 * iconv(). In addition to pure character set conversions, GLib has
 * functions to deal with the extra complications of encodings for
 * file names.
 * File Name Encodings
 * Historically, Unix has not had a defined encoding for file
 * names: a file name is valid as long as it does not have path
 * separators in it ("/"). However, displaying file names may
 * require conversion: from the character set in which they were
 * created, to the character set in which the application
 * operates. Consider the Spanish file name
 * "Presentación.sxi". If the
 * application which created it uses ISO-8859-1 for its encoding,
 * the actual file name on disk would look like this:
 * Character: P  r  e  s  e  n  t  a  c  i  ó  n  .  s  x  i
 * Hex code:  50 72 65 73 65 6e 74 61 63 69 f3 6e 2e 73 78 69
 * However, if the application uses UTF-8, the actual file name on
 * disk would look like this:
 * Character: P  r  e  s  e  n  t  a  c  i  ó     n  .  s  x  i
 * Hex code:  50 72 65 73 65 6e 74 61 63 69 c3 b3 6e 2e 73 78 69
 * GLib uses UTF-8 for its strings, and GUI toolkits like GTK+
 * that use GLib do the same thing. If you get a file name from
 * the file system, for example, from readdir(3) or from g_dir_read_name(),
 * and you wish to display the file name to the user, you
 * will need to convert it into UTF-8. The
 * opposite case is when the user types the name of a file he
 * wishes to save: the toolkit will give you that string in
 * UTF-8 encoding, and you will need to convert it to the
 * character set used for file names before you can create the
 * file with open(2) or fopen(3).
 * By default, GLib assumes that file names on disk are in UTF-8
 * encoding. This is a valid assumption for file systems which
 * were created relatively recently: most applications use UTF-8
 * encoding for their strings, and that is also what they use for
 * the file names they create. However, older file systems may
 * still contain file names created in "older" encodings, such as
 * ISO-8859-1.
 * In this case, for compatibility reasons, you may
 * want to instruct GLib to use that particular encoding for file
 * names rather than UTF-8. You can do this by specifying the
 * encoding for file names in the G_FILENAME_ENCODING
 * environment variable. For example, if your installation uses
 * ISO-8859-1 for file names, you can put this in your
 * ~/.profile:
 * export G_FILENAME_ENCODING=ISO-8859-1
 * GLib provides the functions g_filename_to_utf8() and
 * g_filename_from_utf8() to perform the necessary conversions. These
 * functions convert file names from the encoding specified in
 * G_FILENAME_ENCODING to UTF-8 and vice-versa.
 * Figure 2, “Conversion between File Name Encodings” illustrates how
 * these functions are used to convert between UTF-8 and the
 * encoding for file names in the file system.
 * Figure 2. Conversion between File Name Encodings
 * Checklist for Application Writers
 * This section is a practical summary of the detailed
 * description above. You can use this as a checklist of
 * things to do to make sure your applications process file
 * name encodings correctly.
 * If you get a file name from the file system from a function
 * such as readdir(3) or gtk_file_chooser_get_filename(),
 * you do not need to do any conversion to pass that
 * file name to functions like open(2), rename(2), or
 * fopen(3); those are "raw" file names which the file
 * system understands.
 * If you need to display a file name, convert it to UTF-8 first by
 * using g_filename_to_utf8(). If conversion fails, display a string like
 * "Unknown file name". Do not
 * convert this string back into the encoding used for file names if you
 * wish to pass it to the file system; use the original file name instead.
 * For example, the document window of a word processor could display
 * "Unknown file name" in its title bar but still let the user save the
 * file, as it would keep the raw file name internally. This can happen
 * if the user has not set the G_FILENAME_ENCODING
 * environment variable even though he has files whose names are not
 * encoded in UTF-8.
 * If your user interface lets the user type a file name for saving or
 * renaming, convert it to the encoding used for file names in the file
 * system by using g_filename_from_utf8(). Pass the converted file name
 * to functions like fopen(3). If conversion fails, ask the user to enter
 * a different file name. This can happen if the user types Japanese
 * characters when G_FILENAME_ENCODING is set to
 * ISO-8859-1, for example.
 */
public class CharacterSet
{
	
	/**
	 */
	
	/**
	 * Converts a string from one character set to another.
	 * Note that you should use g_iconv() for streaming
	 * conversions[2].
	 * Params:
	 * str = the string to convert
	 * len = the length of the string, or -1 if the string is
	 * nul-terminated[1].
	 * toCodeset = name of character set into which to convert str
	 * fromCodeset = character set of str.
	 * bytesRead = location to store the number of bytes in the
	 * input string that were successfully converted, or NULL.
	 * Even if the conversion was successful, this may be
	 * less than len if there were partial characters
	 * at the end of the input. If the error
	 * G_CONVERT_ERROR_ILLEGAL_SEQUENCE occurs, the value
	 * stored will be the byte offset after the last valid
	 * input sequence.
	 * bytesWritten = the number of bytes stored in the output buffer (not
	 * including the terminating nul).
	 * Returns: If the conversion was successful, a newly allocated nul-terminated string, which must be freed with g_free(). Otherwise NULL and error will be set.
	 * Throws: GException on failure.
	 */
	public static string convert(string str, gssize len, string toCodeset, string fromCodeset, out gsize bytesRead, out gsize bytesWritten)
	{
		// gchar * g_convert (const gchar *str,  gssize len,  const gchar *to_codeset,  const gchar *from_codeset,  gsize *bytes_read,  gsize *bytes_written,  GError **error);
		GError* err = null;
		
		auto p = g_convert(Str.toStringz(str), len, Str.toStringz(toCodeset), Str.toStringz(fromCodeset), &bytesRead, &bytesWritten, &err);
		
		if (err !is null)
		{
			throw new GException( new ErrorG(err) );
		}
		
		return Str.toString(p);
	}
	
	/**
	 * Converts a string from one character set to another, possibly
	 * including fallback sequences for characters not representable
	 * in the output. Note that it is not guaranteed that the specification
	 * for the fallback sequences in fallback will be honored. Some
	 * systems may do an approximate conversion from from_codeset
	 * to to_codeset in their iconv() functions,
	 * in which case GLib will simply return that approximate conversion.
	 * Note that you should use g_iconv() for streaming
	 * conversions[2].
	 * Params:
	 * str = the string to convert
	 * len = the length of the string, or -1 if the string is
	 * nul-terminated[1].
	 * toCodeset = name of character set into which to convert str
	 * fromCodeset = character set of str.
	 * fallback = UTF-8 string to use in place of character not
	 * present in the target encoding. (The string must be
	 * representable in the target encoding).
	 * If NULL, characters not in the target encoding will
	 * be represented as Unicode escapes \uxxxx or \Uxxxxyyyy.
	 * bytesRead = location to store the number of bytes in the
	 * input string that were successfully converted, or NULL.
	 * Even if the conversion was successful, this may be
	 * less than len if there were partial characters
	 * at the end of the input.
	 * bytesWritten = the number of bytes stored in the output buffer (not
	 * including the terminating nul).
	 * Returns: If the conversion was successful, a newly allocated nul-terminated string, which must be freed with g_free(). Otherwise NULL and error will be set.
	 * Throws: GException on failure.
	 */
	public static string convertWithFallback(string str, gssize len, string toCodeset, string fromCodeset, string fallback, out gsize bytesRead, out gsize bytesWritten)
	{
		// gchar * g_convert_with_fallback (const gchar *str,  gssize len,  const gchar *to_codeset,  const gchar *from_codeset,  const gchar *fallback,  gsize *bytes_read,  gsize *bytes_written,  GError **error);
		GError* err = null;
		
		auto p = g_convert_with_fallback(Str.toStringz(str), len, Str.toStringz(toCodeset), Str.toStringz(fromCodeset), Str.toStringz(fallback), &bytesRead, &bytesWritten, &err);
		
		if (err !is null)
		{
			throw new GException( new ErrorG(err) );
		}
		
		return Str.toString(p);
	}
	
	/**
	 * Converts a string which is in the encoding used for strings by
	 * the C runtime (usually the same as that used by the operating
	 * system) in the current locale into a
	 * UTF-8 string.
	 * Params:
	 * opsysstring = a string in the encoding of the current locale. On Windows
	 * this means the system codepage.
	 * len = the length of the string, or -1 if the string is
	 * nul-terminated[1].
	 * bytesRead = location to store the number of bytes in the
	 * input string that were successfully converted, or NULL.
	 * Even if the conversion was successful, this may be
	 * less than len if there were partial characters
	 * at the end of the input. If the error
	 * G_CONVERT_ERROR_ILLEGAL_SEQUENCE occurs, the value
	 * stored will be the byte offset after the last valid
	 * input sequence.
	 * bytesWritten = the number of bytes stored in the output buffer (not
	 * including the terminating nul).
	 * Returns: The converted string, or NULL on an error.
	 * Throws: GException on failure.
	 */
	public static string localeToUtf8(string opsysstring, gssize len, out gsize bytesRead, out gsize bytesWritten)
	{
		// gchar * g_locale_to_utf8 (const gchar *opsysstring,  gssize len,  gsize *bytes_read,  gsize *bytes_written,  GError **error);
		GError* err = null;
		
		auto p = g_locale_to_utf8(Str.toStringz(opsysstring), len, &bytesRead, &bytesWritten, &err);
		
		if (err !is null)
		{
			throw new GException( new ErrorG(err) );
		}
		
		return Str.toString(p);
	}
	
	/**
	 * Determines the preferred character sets used for filenames.
	 * The first character set from the charsets is the filename encoding, the
	 * subsequent character sets are used when trying to generate a displayable
	 * representation of a filename, see g_filename_display_name().
	 * On Unix, the character sets are determined by consulting the
	 * environment variables G_FILENAME_ENCODING and
	 * G_BROKEN_FILENAMES. On Windows, the character set
	 * used in the GLib API is always UTF-8 and said environment variables
	 * have no effect.
	 * G_FILENAME_ENCODING may be set to a comma-separated list
	 * of character set names. The special token "@locale" is taken to
	 * mean the character set for the current
	 * locale. If G_FILENAME_ENCODING is not set, but
	 * G_BROKEN_FILENAMES is, the character set of the current
	 * locale is taken as the filename encoding. If neither environment variable
	 * is set, UTF-8 is taken as the filename encoding, but the character
	 * set of the current locale is also put in the list of encodings.
	 * The returned charsets belong to GLib and must not be freed.
	 * Note that on Unix, regardless of the locale character set or
	 * G_FILENAME_ENCODING value, the actual file names present
	 * on a system might be in any random encoding or just gibberish.
	 * Since 2.6
	 * Params:
	 * charsets = return location for the NULL-terminated list of encoding names
	 * Returns: TRUE if the filename encoding is UTF-8.
	 */
	public static int getFilenameCharsets(out string[] charsets)
	{
		// gboolean g_get_filename_charsets (G_CONST_RETURN gchar ***charsets);
		char** outcharsets = null;
		
		auto p = g_get_filename_charsets(&outcharsets);
		
		charsets = Str.toStringArray(outcharsets);
		return p;
	}
	
	/**
	 * Converts a filename into a valid UTF-8 string. The conversion is
	 * not necessarily reversible, so you should keep the original around
	 * and use the return value of this function only for display purposes.
	 * Unlike g_filename_to_utf8(), the result is guaranteed to be non-NULL
	 * even if the filename actually isn't in the GLib file name encoding.
	 * If GLib can not make sense of the encoding of filename, as a last resort it
	 * replaces unknown characters with U+FFFD, the Unicode replacement character.
	 * You can search the result for the UTF-8 encoding of this character (which is
	 * "\357\277\275" in octal notation) to find out if filename was in an invalid
	 * encoding.
	 * If you know the whole pathname of the file you should use
	 * g_filename_display_basename(), since that allows location-based
	 * translation of filenames.
	 * Since 2.6
	 * Params:
	 * filename = a pathname hopefully in the GLib file name encoding
	 * Returns: a newly allocated string containing a rendition of the filename in valid UTF-8
	 */
	public static string filenameDisplayName(string filename)
	{
		// gchar * g_filename_display_name (const gchar *filename);
		return Str.toString(g_filename_display_name(Str.toStringz(filename)));
	}
	
	/**
	 * Returns the display basename for the particular filename, guaranteed
	 * to be valid UTF-8.
	 * The display name might not be identical to the filename,
	 * for instance there might be problems converting it to UTF-8, and some files
	 * can be translated in the display.
	 * If GLib can not make sense of the encoding of filename, as a last resort it
	 * replaces unknown characters with U+FFFD, the Unicode replacement character.
	 * You can search the result for the UTF-8 encoding of this character (which is
	 * "\357\277\275" in octal notation) to find out if filename was in an invalid
	 * encoding.
	 * You must pass the whole absolute pathname to this function so that
	 * translation of well known locations can be done.
	 * This function is preferred over g_filename_display_name() if you know the
	 * whole path, as it allows translation.
	 * Since 2.6
	 * Params:
	 * filename = an absolute pathname in the GLib file name encoding
	 * Returns: a newly allocated string containing a rendition of the basename of the filename in valid UTF-8
	 */
	public static string filenameDisplayBasename(string filename)
	{
		// gchar * g_filename_display_basename (const gchar *filename);
		return Str.toString(g_filename_display_basename(Str.toStringz(filename)));
	}
	
	/**
	 * Converts a string from UTF-8 to the encoding used for strings by
	 * the C runtime (usually the same as that used by the operating
	 * system) in the current locale. On
	 * Windows this means the system codepage.
	 * Params:
	 * utf8string = a UTF-8 encoded string
	 * len = the length of the string, or -1 if the string is
	 * nul-terminated[1].
	 * bytesRead = location to store the number of bytes in the
	 * input string that were successfully converted, or NULL.
	 * Even if the conversion was successful, this may be
	 * less than len if there were partial characters
	 * at the end of the input.
	 * If the error
	 * G_CONVERT_ERROR_ILLEGAL_SEQUENCE occurs, the value
	 * stored will be the byte offset after the last valid
	 * input sequence.
	 * bytesWritten = the number of bytes stored in the output buffer (not
	 * including the terminating nul).
	 * Returns: The converted string, or NULL on an error.
	 * Throws: GException on failure.
	 */
	public static string localeFromUtf8(string utf8string, gssize len, out gsize bytesRead, out gsize bytesWritten)
	{
		// gchar * g_locale_from_utf8 (const gchar *utf8string,  gssize len,  gsize *bytes_read,  gsize *bytes_written,  GError **error);
		GError* err = null;
		
		auto p = g_locale_from_utf8(Str.toStringz(utf8string), len, &bytesRead, &bytesWritten, &err);
		
		if (err !is null)
		{
			throw new GException( new ErrorG(err) );
		}
		
		return Str.toString(p);
	}
	
	/**
	 * Obtains the character set for the current
	 * locale; you might use this character set as an argument to
	 * g_convert(), to convert from the current locale's encoding to some
	 * other encoding. (Frequently g_locale_to_utf8() and g_locale_from_utf8()
	 * are nice shortcuts, though.)
	 * On Windows the character set returned by this function is the
	 * so-called system default ANSI code-page. That is the character set
	 * used by the "narrow" versions of C library and Win32 functions that
	 * handle file names. It might be different from the character set
	 * used by the C library's current locale.
	 * The return value is TRUE if the locale's encoding is UTF-8, in that
	 * case you can perhaps avoid calling g_convert().
	 * The string returned in charset is not allocated, and should not be
	 * freed.
	 * Params:
	 * charset = return location for character set name
	 * Returns: TRUE if the returned charset is UTF-8
	 * [1] Note that some encodings may allow nul bytes to
	 * occur inside strings. In that case, using -1 for
	 * the len parameter is unsafe.
	 * [2] Despite the fact that bytes_read can return information about
	 * partial characters, the g_convert_... functions are not generally
	 * suitable for streaming. If the underlying converter being used
	 * maintains internal state, then this won't be preserved across
	 * successive calls to g_convert(), g_convert_with_iconv() or
	 * g_convert_with_fallback(). (An example of this is the GNU C
	 * converter for CP1255 which does not emit a base character until
	 * it knows that the next character is not a mark that could combine
	 * with the base character.)
	 */
	public static int getCharset(out string charset)
	{
		// gboolean g_get_charset (G_CONST_RETURN char **charset);
		char* outcharset = null;
		
		auto p = g_get_charset(&outcharset);
		
		charset = Str.toString(outcharset);
		return p;
	}
}
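
// Illustration only; not part of the generated wrapper. A minimal sketch of the
// "Presentación.sxi" byte example from the module description, assuming the raw
// name is ISO-8859-1, where every byte value equals its Unicode code point.
// The helper name latin1ToUtf8 is hypothetical and exists only for this sketch;
// real code would normally call CharacterSet.convert() with explicit codeset
// names instead. Everything here is compiled only when building with -unittest.
version (unittest)
{
	private string latin1ToUtf8(const(ubyte)[] raw)
	{
		import std.utf : encode;

		string result;
		foreach (b; raw)
		{
			char[4] buf;
			// encode() writes the UTF-8 form of one code point into buf
			// and returns the number of bytes it used.
			immutable n = encode(buf, cast(dchar) b);
			result ~= buf[0 .. n];
		}
		return result;
	}
}

unittest
{
	// The file name as stored on disk by an ISO-8859-1 application.
	static immutable ubyte[] latin1Name = [
		0x50, 0x72, 0x65, 0x73, 0x65, 0x6e, 0x74, 0x61,
		0x63, 0x69, 0xf3, 0x6e, 0x2e, 0x73, 0x78, 0x69];
	// The same name as stored by a UTF-8 application: 0xf3 becomes 0xc3 0xb3.
	static immutable ubyte[] utf8Name = [
		0x50, 0x72, 0x65, 0x73, 0x65, 0x6e, 0x74, 0x61,
		0x63, 0x69, 0xc3, 0xb3, 0x6e, 0x2e, 0x73, 0x78, 0x69];

	assert(cast(immutable(ubyte)[]) latin1ToUtf8(latin1Name) == utf8Name);
}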