Source Insight can load and save files using different character encodings. By default, Source Insight uses UTF-8 (Unicode) encoding.
Characters in files are stored using a character encoding scheme. Examples are:
ASCII
Shift-JIS, a character encoding for Japanese
Big5, a character encoding for Traditional Chinese
Unicode encodings
Non-Unicode encodings use code pages, which are tables that map byte values to the characters of a particular language or region.
Unicode was developed as a unifying encoding that could encode all characters. However, Unicode itself can be encoded in several flavors, such as UTF-8, UTF-16, and UTF-32.
Unfortunately, the code page encoding used in a file is not saved with the file. Therefore, it is possible to open a file assuming the wrong encoding. If you open a file with the wrong encoding, you will probably observe some characters are incorrect or look garbled. You can use the File > Reload As Encoding command to reload the file with the correct encoding.
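The effect of a wrong guess can be reproduced in a few lines of Python. This is only a sketch of the general problem, not of how Source Insight decodes files; the sample text and the choice of Windows-1252 as the wrong guess are assumptions made for the example.

```python
original = "日本語"

# The bytes as they would sit on disk. Nothing in them names the encoding.
on_disk = original.encode("shift_jis")

# Wrong guess: interpret the bytes as Windows-1252 (Western European).
garbled = on_disk.decode("cp1252", errors="replace")

# Right guess: interpret the bytes with the encoding that produced them.
restored = on_disk.decode("shift_jis")

print("wrong guess matches original:", garbled == original)
print("right guess matches original:", restored == original)
```

The bytes themselves are never changed by a wrong guess, which is why reloading with the correct encoding recovers the text.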
Another problem with code pages is that characters are not guaranteed to map between code pages, which could lead to files being corrupted. It is best to use a Unicode encoding to avoid those problems.
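The mapping problem can be sketched in Python as a round-trip check. The function name `survives` is hypothetical, and the encodings are only examples: the Japanese sample text has no equivalent in the Western European code page, but it does in Shift-JIS and in UTF-8.

```python
def survives(text, encoding):
    """Return True if `text` can be saved in `encoding` and read back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # At least one character has no equivalent in this encoding.
        return False


text = "日本語"
for encoding in ("shift_jis", "cp1252", "utf-8"):
    print(f"{encoding}: {'ok' if survives(text, encoding) else 'characters lost'}")

# Forcing the save anyway substitutes '?' for each unmappable character.
forced = text.encode("cp1252", errors="replace")
```

Once the substitution has been written to disk, the original characters cannot be recovered from the file.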
UTF-8 is now the recommended encoding for source files. Most web browsers and programming tools support UTF-8, and because it can encode every Unicode character, it covers a superset of the characters in the regional encodings.
Once you load a file with a specific encoding, that encoding type is associated with the file buffer as long as Source Insight keeps the file open. The encoding type is also stored in the session workspace so that it is preserved between Source Insight sessions. However, if you close a file, and then open it again, the default encoding will be used.
Source Insight uses UTF-8 (Unicode) encoding by default when you first install it. However, if you typically work with files that use a different encoding, such as Japanese Shift-JIS or Chinese Big5, you should change the default encoding.
To change the default encoding, select Options > Preferences: File, and change the setting for Default encoding. See: File Options.
The default encoding setting from the File Preferences is used to interpret files when you open them. It is also used when you create a new file buffer (using File > New) and save the buffer the first time.
If you open a file that has an explicit UTF signature, known as a Byte-Order-Mark (BOM), then the UTF format is used instead of the default. You can override this by using the File > Reload As Encoding command.
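The rule above, where a BOM takes precedence over the default, can be sketched in Python as follows. This is an assumption-laden illustration, not Source Insight's actual implementation: the function name `pick_encoding` is hypothetical, and the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# UTF-32 must be tested before UTF-16, because the UTF-32 little-endian BOM
# (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def pick_encoding(data, default):
    """Return the codec named by a leading BOM, or `default` if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default


# A UTF-8 file with a BOM is recognized even when the default is Shift-JIS.
with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")
print(pick_encoding(with_bom, default="shift_jis"))

# A file without a BOM falls back to the default.
without_bom = "hello".encode("shift_jis")
print(pick_encoding(without_bom, default="shift_jis"))
```

The codec names returned here (`utf-8-sig`, `utf-16`, `utf-32`) are the Python codecs that consume the BOM while decoding, so the mark does not appear in the decoded text.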
Source Insight performs slightly faster if you set the default encoding to "UTF-8" or "UTF-8 with BOM".
If you open a file and its text looks incorrect, you probably need to reload it with a specific encoding. This can happen when the default encoding differs from the encoding the file actually uses. Because the encoding is usually not saved with the file, Source Insight may open the file assuming the wrong one.
To reload a file that is already open using a different encoding, select File > Reload As Encoding. Select the encoding and click the Load button. If you edited the file before reloading it, Source Insight will ask you to save the file first, because reloading the file with a new encoding discards your changes and cannot be undone. See: Reload As Encoding.
If you have edited a file that needs to be reloaded and you want to keep your changes, do NOT save the file back over itself, because that will overwrite the original and could corrupt it. Instead, select File > Save As Encoding, pick the UTF-8 format, and save to a new file. After you reload the original file with the correct encoding, you can compare the two files and merge your changes back.
If a file is not currently open, and you want to open that file with a specific encoding, select File > Open As Encoding. Select the encoding and click the Open button to select the file. See: Open As Encoding.
Use Open As Encoding only for a file that is not already open. If the file is already open, use File > Reload As Encoding instead.
To save a file with a different encoding, select File > Save As Encoding. Select the desired encoding, then click the Save button. See: Save As Encoding.
Saving files in a Unicode encoding (such as UTF-8) is the best way to avoid corruption caused by incorrect decoding or encoding. For example, if you load a file encoded with one code page and save it using a different code page, the file could be corrupted because some of the characters have no equivalent in the new code page. If you save using a Unicode encoding instead, every character is preserved. UTF-8 is the preferred encoding in Source Insight because there is less conversion required to open and save files.
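For files handled outside the editor, the same conversion can be sketched in Python. The function name `convert_to_utf8` and its parameters are hypothetical, and the sketch assumes you already know the source encoding. It writes to a new file so the original stays intact.

```python
from pathlib import Path


def convert_to_utf8(src, src_encoding, dst):
    """Re-encode the file `src` (known to be in `src_encoding`) as UTF-8 at `dst`."""
    src, dst = Path(src), Path(dst)
    if dst.resolve() == src.resolve():
        raise ValueError("write to a new file so the original stays intact")
    # Strict decoding raises UnicodeDecodeError when the bytes are not valid in
    # src_encoding. It cannot catch every wrong guess: a single-byte code page
    # may accept bytes it was never meant to decode.
    text = src.read_bytes().decode(src_encoding)
    # Reading and writing raw bytes leaves the line endings untouched.
    dst.write_bytes(text.encode("utf-8"))
```

A call such as `convert_to_utf8("legacy.txt", "shift_jis", "legacy-utf8.txt")`, where both file names are placeholders, leaves `legacy.txt` unchanged and produces a UTF-8 copy beside it.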
If you open a file and the text looks garbled because it was loaded with the wrong encoding, do NOT save the file again using Save As Encoding. That will overwrite the file and may permanently alter its contents. Instead, select File > Reload As Encoding. See: Reload As Encoding.