glib.CharacterSet source code

/*
 * This file is part of gtkD.
 *
 * gtkD is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License
 * as published by the Free Software Foundation; either version 3
 * of the License, or (at your option) any later version, with
 * some exceptions, please read the COPYING file.
 *
 * gtkD is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with gtkD; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110, USA
 */

// generated automatically - do not change
// find conversion definition on APILookup.txt
// implement new conversion functionalities on the wrap.utils package

/*
 * Conversion parameters:
 * inFile  = glib-Character-Set-Conversion.html
 * outPack = glib
 * outFile = CharacterSet
 * strct   =
 * realStrct=
 * ctorStrct=
 * clss    = CharacterSet
 * interf  =
 * class Code: No
 * interface Code: No
 * template for:
 * extend  =
 * implements:
 * prefixes:
 * 	- g_
 * omit structs:
 * omit prefixes:
 * 	- g_convert_with_iconv
 * 	- g_iconv_open
 * 	- g_iconv
 * 	- g_iconv_close
 * omit code:
 * omit signals:
 * imports:
 * 	- glib.Str
 * 	- glib.ErrorG
 * 	- glib.GException
 * structWrap:
 * module aliases:
 * local aliases:
 * overrides:
 */

module glib.CharacterSet;

public  import gtkc.glibtypes;

private import gtkc.glib;
private import glib.ConstructionException;


private import glib.Str;
private import glib.ErrorG;
private import glib.GException;


/**
 * Description
 *
 * The g_convert() family of functions wraps the functionality of iconv(). In
 * addition to pure character set conversions, GLib has functions to deal
 * with the extra complications of encodings for file names.
 *
 * File Name Encodings
 *
 * Historically, Unix has not had a defined encoding for file
 * names: a file name is valid as long as it does not have path
 * separators in it ("/"). However, displaying file names may
 * require conversion: from the character set in which they were
 * created, to the character set in which the application
 * operates. Consider the Spanish file name
 * "Presentación.sxi". If the
 * application which created it uses ISO-8859-1 for its encoding,
 * the actual file name on disk would look like this:
 * Character: P  r  e  s  e  n  t  a  c  i  ó  n  .  s  x  i
 * Hex code:  50 72 65 73 65 6e 74 61 63 69 f3 6e 2e 73 78 69
 * However, if the application uses UTF-8, the actual file name on
 * disk would look like this:
 * Character: P  r  e  s  e  n  t  a  c  i  ó     n  .  s  x  i
 * Hex code:  50 72 65 73 65 6e 74 61 63 69 c3 b3 6e 2e 73 78 69
 *
 * GLib uses UTF-8 for its strings, and GUI toolkits like GTK+
 * that use GLib do the same thing. If you get a file name from
 * the file system, for example, from readdir(3) or from g_dir_read_name(),
 * and you wish to display the file name to the user, you
 * will need to convert it into UTF-8. The
 * opposite case is when the user types the name of a file
 * to save: the toolkit will give you that string in
 * UTF-8 encoding, and you will need to convert it to the
 * character set used for file names before you can create the
 * file with open(2) or fopen(3).
 *
 * By default, GLib assumes that file names on disk are in UTF-8
 * encoding. This is a valid assumption for file systems which
 * were created relatively recently: most applications use UTF-8
 * encoding for their strings, and that is also what they use for
 * the file names they create. However, older file systems may
 * still contain file names created in "older" encodings, such as
 * ISO-8859-1. In this case, for compatibility reasons, you may
 * want to instruct GLib to use that particular encoding for file
 * names rather than UTF-8. You can do this by specifying the
 * encoding for file names in the G_FILENAME_ENCODING
 * environment variable. For example, if your installation uses
 * ISO-8859-1 for file names, you can put this in your
 * ~/.profile:
 * export G_FILENAME_ENCODING=ISO-8859-1
 *
 * GLib provides the functions g_filename_to_utf8() and
 * g_filename_from_utf8() to perform the necessary conversions. These
 * functions convert file names from the encoding specified in
 * G_FILENAME_ENCODING to UTF-8 and vice-versa.
 * Figure 2, “Conversion between File Name Encodings” illustrates how
 * these functions are used to convert between UTF-8 and the
 * encoding for file names in the file system.
 *
 * Figure 2. Conversion between File Name Encodings
 *
 * Checklist for Application Writers
 *
 * This section is a practical summary of the detailed
 * description above. You can use this as a checklist of
 * things to do to make sure your applications process file
 * name encodings correctly.
 *
 * - If you get a file name from the file system from a function
 *   such as readdir(3) or gtk_file_chooser_get_filename(),
 *   you do not need to do any conversion to pass that
 *   file name to functions like open(2), rename(2), or
 *   fopen(3); those are "raw" file names which the file
 *   system understands.
 * - If you need to display a file name, convert it to UTF-8 first by
 *   using g_filename_to_utf8(). If conversion fails, display a string like
 *   "Unknown file name". Do not
 *   convert this string back into the encoding used for file names if you
 *   wish to pass it to the file system; use the original file name instead.
 *   For example, the document window of a word processor could display
 *   "Unknown file name" in its title bar but still let the user save the
 *   file, as it would keep the raw file name internally. This can happen
 *   if the user has not set the G_FILENAME_ENCODING
 *   environment variable even though they have files whose names are not
 *   encoded in UTF-8.
 * - If your user interface lets the user type a file name for saving or
 *   renaming, convert it to the encoding used for file names in the file
 *   system by using g_filename_from_utf8(). Pass the converted file name
 *   to functions like fopen(3). If conversion fails, ask the user to enter
 *   a different file name. This can happen if the user types Japanese
 *   characters when G_FILENAME_ENCODING is set to
 *   ISO-8859-1, for example.
 */
public class CharacterSet
{
	
	/**
	 */
	
	/**
	 * Converts a string from one character set to another.
	 * Note that you should use g_iconv() for streaming
	 * conversions[2].
	 * Params:
	 * str = the string to convert
	 * len = the length of the string, or -1 if the string is
	 * nul-terminated[1].
	 * toCodeset = name of character set into which to convert str
	 * fromCodeset = character set of str.
	 * bytesRead = location to store the number of bytes in the
	 * input string that were successfully converted, or NULL.
	 * Even if the conversion was successful, this may be
	 * less than len if there were partial characters
	 * at the end of the input. If the error
	 * G_CONVERT_ERROR_ILLEGAL_SEQUENCE occurs, the value
	 * stored will be the byte offset after the last valid
	 * input sequence.
	 * bytesWritten = the number of bytes stored in the output buffer (not
	 * including the terminating nul).
	 * Returns: If the conversion was successful, a newly allocated nul-terminated string, which must be freed with g_free(). Otherwise NULL and error will be set.
	 * Throws: GException on failure.
	 */
	public static string convert(string str, gssize len, string toCodeset, string fromCodeset, out gsize bytesRead, out gsize bytesWritten)
	{
		// gchar * g_convert (const gchar *str,  gssize len,  const gchar *to_codeset,  const gchar *from_codeset,  gsize *bytes_read,  gsize *bytes_written,  GError **error);
		GError* err = null;
		
		auto p = g_convert(Str.toStringz(str), len, Str.toStringz(toCodeset), Str.toStringz(fromCodeset), &bytesRead, &bytesWritten, &err);
		
		if (err !is null)
		{
			throw new GException( new ErrorG(err) );
		}
		
		return Str.toString(p);
	}
	
	/**
	 * Converts a string from one character set to another, possibly
	 * including fallback sequences for characters not representable
	 * in the output. Note that it is not guaranteed that the specification
	 * for the fallback sequences in fallback will be honored. Some
	 * systems may do an approximate conversion from from_codeset
	 * to to_codeset in their iconv() functions,
	 * in which case GLib will simply return that approximate conversion.
	 * Note that you should use g_iconv() for streaming
	 * conversions[2].
	 * Params:
	 * str = the string to convert
	 * len = the length of the string, or -1 if the string is
	 * nul-terminated[1].
	 * toCodeset = name of character set into which to convert str
	 * fromCodeset = character set of str.
	 * fallback = UTF-8 string to use in place of characters not
	 * present in the target encoding. (The string must be
	 * representable in the target encoding).
	 * If NULL, characters not in the target encoding will
	 * be represented as Unicode escapes \uxxxx or \Uxxxxyyyy.
	 * bytesRead = location to store the number of bytes in the
	 * input string that were successfully converted, or NULL.
	 * Even if the conversion was successful, this may be
	 * less than len if there were partial characters
	 * at the end of the input.
	 * bytesWritten = the number of bytes stored in the output buffer (not
	 * including the terminating nul).
	 * Returns: If the conversion was successful, a newly allocated nul-terminated string, which must be freed with g_free(). Otherwise NULL and error will be set.
	 * Throws: GException on failure.
	 */
	public static string convertWithFallback(string str, gssize len, string toCodeset, string fromCodeset, string fallback, out gsize bytesRead, out gsize bytesWritten)
	{
		// gchar * g_convert_with_fallback (const gchar *str,  gssize len,  const gchar *to_codeset,  const gchar *from_codeset,  const gchar *fallback,  gsize *bytes_read,  gsize *bytes_written,  GError **error);
		GError* err = null;
		
		auto p = g_convert_with_fallback(Str.toStringz(str), len, Str.toStringz(toCodeset), Str.toStringz(fromCodeset), Str.toStringz(fallback), &bytesRead, &bytesWritten, &err);
		
		if (err !is null)
		{
			throw new GException( new ErrorG(err) );
		}
		
		return Str.toString(p);
	}
	
	/**
	 * Converts a string which is in the encoding used for strings by
	 * the C runtime (usually the same as that used by the operating
	 * system) in the current locale into a
	 * UTF-8 string.
	 * Params:
	 * opsysstring = a string in the encoding of the current locale. On Windows
	 * this means the system codepage.
	 * len = the length of the string, or -1 if the string is
	 * nul-terminated[1].
	 * bytesRead = location to store the number of bytes in the
	 * input string that were successfully converted, or NULL.
	 * Even if the conversion was successful, this may be
	 * less than len if there were partial characters
	 * at the end of the input. If the error
	 * G_CONVERT_ERROR_ILLEGAL_SEQUENCE occurs, the value
	 * stored will be the byte offset after the last valid
	 * input sequence.
	 * bytesWritten = the number of bytes stored in the output buffer (not
	 * including the terminating nul).
	 * Returns: The converted string, or NULL on an error.
	 * Throws: GException on failure.
	 */
	public static string localeToUtf8(string opsysstring, gssize len, out gsize bytesRead, out gsize bytesWritten)
	{
		// gchar * g_locale_to_utf8 (const gchar *opsysstring,  gssize len,  gsize *bytes_read,  gsize *bytes_written,  GError **error);
		GError* err = null;
		
		auto p = g_locale_to_utf8(Str.toStringz(opsysstring), len, &bytesRead, &bytesWritten, &err);
		
		if (err !is null)
		{
			throw new GException( new ErrorG(err) );
		}
		
		return Str.toString(p);
	}
	
	/**
	 * Determines the preferred character sets used for filenames.
	 * The first character set from the charsets is the filename encoding, the
	 * subsequent character sets are used when trying to generate a displayable
	 * representation of a filename, see g_filename_display_name().
	 * On Unix, the character sets are determined by consulting the
	 * environment variables G_FILENAME_ENCODING and
	 * G_BROKEN_FILENAMES. On Windows, the character set
	 * used in the GLib API is always UTF-8 and said environment variables
	 * have no effect.
	 * G_FILENAME_ENCODING may be set to a comma-separated list
	 * of character set names. The special token "@locale" is taken to
	 * mean the character set for the current
	 * locale. If G_FILENAME_ENCODING is not set, but
	 * G_BROKEN_FILENAMES is, the character set of the current
	 * locale is taken as the filename encoding. If neither environment variable
	 * is set, UTF-8 is taken as the filename encoding, but the character
	 * set of the current locale is also put in the list of encodings.
	 * The returned charsets belong to GLib and must not be freed.
	 * Note that on Unix, regardless of the locale character set or
	 * G_FILENAME_ENCODING value, the actual file names present
	 * on a system might be in any random encoding or just gibberish.
	 * Since 2.6
	 * Params:
	 * charsets = return location for the NULL-terminated list of encoding names
	 * Returns: TRUE if the filename encoding is UTF-8.
	 */
	public static int getFilenameCharsets(out string[] charsets)
	{
		// gboolean g_get_filename_charsets (G_CONST_RETURN gchar ***charsets);
		char** outcharsets = null;
		
		auto p = g_get_filename_charsets(&outcharsets);
		
		charsets = Str.toStringArray(outcharsets);
		return p;
	}
	
	/**
	 * Converts a filename into a valid UTF-8 string. The conversion is
	 * not necessarily reversible, so you should keep the original around
	 * and use the return value of this function only for display purposes.
	 * Unlike g_filename_to_utf8(), the result is guaranteed to be non-NULL
	 * even if the filename actually isn't in the GLib file name encoding.
	 * If GLib cannot make sense of the encoding of filename, as a last resort it
	 * replaces unknown characters with U+FFFD, the Unicode replacement character.
	 * You can search the result for the UTF-8 encoding of this character (which is
	 * "\357\277\275" in octal notation) to find out if filename was in an invalid
	 * encoding.
	 * If you know the whole pathname of the file you should use
	 * g_filename_display_basename(), since that allows location-based
	 * translation of filenames.
	 * Since 2.6
	 * Params:
	 * filename = a pathname hopefully in the GLib file name encoding
	 * Returns: a newly allocated string containing a rendition of the filename in valid UTF-8
	 */
	public static string filenameDisplayName(string filename)
	{
		// gchar * g_filename_display_name (const gchar *filename);
		return Str.toString(g_filename_display_name(Str.toStringz(filename)));
	}
	
	/**
	 * Returns the display basename for the particular filename, guaranteed
	 * to be valid UTF-8. The display name might not be identical to the filename,
	 * for instance there might be problems converting it to UTF-8, and some files
	 * can be translated in the display.
	 * If GLib cannot make sense of the encoding of filename, as a last resort it
	 * replaces unknown characters with U+FFFD, the Unicode replacement character.
	 * You can search the result for the UTF-8 encoding of this character (which is
	 * "\357\277\275" in octal notation) to find out if filename was in an invalid
	 * encoding.
	 * You must pass the whole absolute pathname to this function so that
	 * translation of well known locations can be done.
	 * This function is preferred over g_filename_display_name() if you know the
	 * whole path, as it allows translation.
	 * Since 2.6
	 * Params:
	 * filename = an absolute pathname in the GLib file name encoding
	 * Returns: a newly allocated string containing a rendition of the basename of the filename in valid UTF-8
	 */
	public static string filenameDisplayBasename(string filename)
	{
		// gchar * g_filename_display_basename (const gchar *filename);
		return Str.toString(g_filename_display_basename(Str.toStringz(filename)));
	}
	
	/**
	 * Converts a string from UTF-8 to the encoding used for strings by
	 * the C runtime (usually the same as that used by the operating
	 * system) in the current locale. On
	 * Windows this means the system codepage.
	 * Params:
	 * utf8string = a UTF-8 encoded string
	 * len = the length of the string, or -1 if the string is
	 * nul-terminated[1].
	 * bytesRead = location to store the number of bytes in the
	 * input string that were successfully converted, or NULL.
	 * Even if the conversion was successful, this may be
	 * less than len if there were partial characters
	 * at the end of the input. If the error
	 * G_CONVERT_ERROR_ILLEGAL_SEQUENCE occurs, the value
	 * stored will be the byte offset after the last valid
	 * input sequence.
	 * bytesWritten = the number of bytes stored in the output buffer (not
	 * including the terminating nul).
	 * Returns: The converted string, or NULL on an error.
	 * Throws: GException on failure.
	 */
	public static string localeFromUtf8(string utf8string, gssize len, out gsize bytesRead, out gsize bytesWritten)
	{
		// gchar * g_locale_from_utf8 (const gchar *utf8string,  gssize len,  gsize *bytes_read,  gsize *bytes_written,  GError **error);
		GError* err = null;
		
		auto p = g_locale_from_utf8(Str.toStringz(utf8string), len, &bytesRead, &bytesWritten, &err);
		
		if (err !is null)
		{
			throw new GException( new ErrorG(err) );
		}
		
		return Str.toString(p);
	}
	
	/**
	 * Obtains the character set for the current
	 * locale; you might use this character set as an argument to
	 * g_convert(), to convert from the current locale's encoding to some
	 * other encoding. (Frequently g_locale_to_utf8() and g_locale_from_utf8()
	 * are nice shortcuts, though.)
	 * On Windows the character set returned by this function is the
	 * so-called system default ANSI code-page. That is the character set
	 * used by the "narrow" versions of C library and Win32 functions that
	 * handle file names. It might be different from the character set
	 * used by the C library's current locale.
	 * The return value is TRUE if the locale's encoding is UTF-8, in that
	 * case you can perhaps avoid calling g_convert().
	 * The string returned in charset is not allocated, and should not be
	 * freed.
	 * Params:
	 * charset = return location for character set name
	 * Returns: TRUE if the returned charset is UTF-8
	 *
	 * [1] Note that some encodings may allow nul bytes to occur inside
	 * strings. In that case, using -1 for the len parameter is unsafe.
	 *
	 * [2] Despite the fact that bytes_read can return information about
	 * partial characters, the g_convert_... functions are not generally
	 * suitable for streaming. If the underlying converter being used
	 * maintains internal state, then this won't be preserved across
	 * successive calls to g_convert(), g_convert_with_iconv() or
	 * g_convert_with_fallback(). (An example of this is the GNU C converter
	 * for CP1255 which does not emit a base character until it knows that
	 * the next character is not a mark that could combine with the base
	 * character.)
	 */
	public static int getCharset(out string charset)
	{
		// gboolean g_get_charset (G_CONST_RETURN char **charset);
		char* outcharset = null;
		
		auto p = g_get_charset(&outcharset);
		
		charset = Str.toString(outcharset);
		return p;
	}
}
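
A minimal usage sketch, not part of the generated module: it calls the `convert` wrapper listed above to turn the UTF-8 file name from the class documentation into ISO-8859-1 and prints the resulting bytes. It assumes a D2 compiler with Phobos, a gtkD build on the import path, and a GLib shared library that gtkD can load at run time; the `main` function and the variable name `latin1` are illustrative only.

```d
import std.stdio;

import glib.CharacterSet;
import glib.GException;

void main()
{
	// gsize comes from gtkc.glibtypes, which glib.CharacterSet imports publicly.
	gsize bytesRead, bytesWritten;

	try
	{
		// UTF-8 in, ISO-8859-1 out; -1 means the input is nul-terminated.
		string latin1 = CharacterSet.convert("Presentación.sxi", -1,
			"ISO-8859-1", "UTF-8", bytesRead, bytesWritten);

		// The result is not valid UTF-8, so print it as raw bytes.
		writefln("%(%02x %)", cast(const(ubyte)[]) latin1);
		writefln("read %s bytes, wrote %s bytes", bytesRead, bytesWritten);
	}
	catch (GException e)
	{
		// Thrown by the wrapper when g_convert() reports a GError.
		writeln("conversion failed: ", e.msg);
	}
}
```

The printed bytes can be compared with the ISO-8859-1 hex dump in the class documentation. The wrapper reports failures by throwing `GException` rather than by returning NULL, which is why the call sits in a `try` block.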