Summary: Adds support for the "encoding" field to the new transactional interface.
Test Plan:
{F44189}
{F44190}
Some of the encodings in the second screen are from testing, and can no longer be set.
Reviewers: btrahan, chad
Reviewed By: btrahan
CC: aran
Differential Revision: https://secure.phabricator.com/D6035
81 lines
3.2 KiB
Plaintext
81 lines
3.2 KiB
Plaintext
@title User Guide: UTF-8 and Character Encoding
|
|
@group userguide
|
|
|
|
How Phabricator handles character encodings.
|
|
|
|
= Overview =
|
|
|
|
Phabricator stores all internal text data as UTF-8, processes all text data
|
|
as UTF-8, outputs in UTF-8, and expects all inputs to be UTF-8. Principally,
|
|
this means that you should write your source code in UTF-8. In most cases this
|
|
does not require you to change anything, because ASCII text is a subset of
|
|
UTF-8.
|
|
|
|
If you have a repository with source files that do not have UTF-8, you have two
|
|
options:
|
|
|
|
- Convert all files in the repository to ASCII or UTF-8 (see "Detecting and
|
|
Repairing Files" below). This is recommended, especially if the encoding
|
|
problems are accidental.
|
|
- Configure Phabricator to convert files into UTF-8 from whatever encoding
|
|
your repository is in when it needs to (see "Support for Alternate
|
|
Encodings" below). This is not completely supported, and repositories with
|
|
files that have multiple encodings are not supported.
|
|
|
|
= Detecting and Repairing Files =
|
|
|
|
It is recommended that you write source files only in ASCII text, but
|
|
Phabricator fully supports UTF-8 source files.
|
|
|
|
If you have a project which isn't valid UTF-8 because a few files have random
|
|
binary nonsense in them, there is a script in libphutil which can help you
|
|
identify and fix them:
|
|
|
|
project/ $ libphutil/scripts/utils/utf8.php
|
|
|
|
Generally, run this script on all source files with "-t" to find files with bad
|
|
byte ranges, and then run it without "-t" on each file to identify where there
|
|
are problems. For example:
|
|
|
|
project/ $ find . -type f -name '*.c' -print0 | xargs -0 -n256 ./utf8 -t
|
|
./hello_world.c
|
|
|
|
If this script exits without output, you're in good shape and all the files that
|
|
were identified are valid UTF-8. If it found some problems, you need to repair
|
|
them. You can identify the specific problems by omitting the "-t" flag:
|
|
|
|
project/ $ ./utf8.php hello_world.c
|
|
FAIL hello_world.c
|
|
|
|
3 main()
|
|
4 {
|
|
5 printf ("Hello World<0xE9><0xD6>!\n");
|
|
6 }
|
|
7
|
|
|
|
This shows the offending bytes on line 5 (in the actual console display, they'll
|
|
be highlighted). Often a codebase will mostly be valid UTF-8 but have a few
|
|
scattered files that have other things in them, like curly quotes which someone
|
|
copy-pasted from Word into a comment. In these cases, you can just manually
|
|
identify and fix the problems pretty easily.
|
|
|
|
If you have a prohibitively large number of UTF-8 issues in your source code,
|
|
Phabricator doesn't include any default tools to help you process them in a
|
|
systematic way. You could hack up ##utf8.php## as a starting point, or use other
|
|
tools to batch-process your source files.
|
|
|
|
= Support for Alternate Encodings =
|
|
|
|
Phabricator has some support for encodings other than UTF-8.
|
|
|
|
NOTE: Alternate encodings are not completely supported, and a few features will
|
|
not work correctly. Codebases with files that have multiple different encodings
|
|
(for example, some files in ISO-8859-1 and some files in Shift-JIS) are not
|
|
supported at all.
|
|
|
|
To use an alternate encoding, edit the repository in Diffusion and specify the
|
|
encoding to use.
|
|
|
|
Optionally, you can use the `--encoding` flag when running `arc`, or set
|
|
`encoding` in your `.arcconfig`.
|