Support forum of the software localization tool Sisulizer


Localization Tool for VB, Delphi, .NET, C#, VB.NET, XML, Online Help, HTML ... Home

Get in contact with the makers of Sisulizer.
Our forum is open for all questions around Sisulizer from customers and prospects.
Don't hesitate to register and ask. The Sisulizer team will answer ASAP.


Text File localisation and encodings - Usage - Three simple steps to localize - Technical Support
 Posted: Wed Oct 8th, 2008 11:15 pm
bikemike
Member
 

Joined: Tue Nov 20th, 2007
Location: New Zealand
Posts: 105
I have just encountered a problem with the customised NSIS installer strings used to deploy my application. In development all went well: installing in a non-original language, such as Russian, displayed the translated strings in Russian text.

Now, testing and installing on other PCs I have a problem where the strings are shown as a series of repeated characters something like this: πïSπïSπïS

The original text, for example, is this:
     is not installed

I think that on the development PC, without specific Russian support, if I open the file created locally I see this:
     ? ?????????????
If Russian support is available when the file is created, then I see this:
     íå èíñòàëëèðîâàí
If Russian is available at the time of creation and viewing I see this:
     не инсталлирован

I just looked at the Sisulizer project and see that the encoding for the Russian file is Windows Cyrillic (1251).
I also noticed that the C# program I use to post-process the file (regionalise the header line for each localised file - ;!insertmacro LANGFILE "Neutral" "Neutral") is not specifying any encoding.

inFileStream = new FileStream(inPathFileExt, FileMode.Open, FileAccess.Read);
reader = new StreamReader(inFileStream);
                       
outFileStream = new FileStream(outPathFileExt, FileMode.Create, FileAccess.Write);
writer = new StreamWriter(outFileStream);


What should I be doing to ensure I always see the Russian text when installing in Russian? Do I change the encoding in the Sisulizer project, and/or specify an encoding for the file handling in my C# app?







 Posted: Fri Oct 10th, 2008 03:05 am
bikemike
Member
 

Joined: Tue Nov 20th, 2007
Location: New Zealand
Posts: 105
OK, some views but no replies... perhaps I didn't phrase the question well.

What about looking at it this way: what is required for the Russian install to work?
In particular, it installed with readable Russian on my PC in the past, but now it does not, so I must have changed something...
  • Looking at the Text Source settings - Sisulizer seems to be using Windows Cyrillic (1251) encoding for the Russian output.
  • My C# application then reads and rewrites the file with a new header line and uses the default StreamWriter which specifies no encoding and so is UTF-8 by default.
    • "Initializes a new instance of the StreamWriter class for the specified stream, using UTF-8 encoding and the default buffer size"
  • The regional setup of the Build PC includes ANSI 1251
  • The regional setup of the Install PC includes ANSI 1251
  • The Russian text supplied for the standard NSIS strings shows in Russian. TextPad reports the code set of these files as ANSI.
  • It's just my own localisations that do not... TextPad reports their code set as UTF-8. The text is a repeating series of characters that looks like 043F 0457 0053 in 1251: пїЅпїЅпїЅпїЅ
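The repeating пїЅ pattern in the last bullet fits one specific chain of events, sketched below in C# under two assumptions: the Sisulizer output is Windows-1251 without a BOM, and the post-processor uses the default StreamReader/StreamWriter as in the snippet in the first post. Byte sequences that are not valid UTF-8 are decoded to the replacement character U+FFFD, the writer encodes U+FFFD as the three bytes EF BF BD, and those bytes viewed as Windows-1251 read п, ї, Ѕ (U+043F, U+0457, U+0405; the last one looks like a Latin S, which may explain the 0053 above). The class and method names are made up for illustration. On .NET Core / .NET 5+ the Windows code pages must be registered via CodePagesEncodingProvider; on .NET Framework they are built in.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class: reproduces the default-encoding round trip in memory.
static class MojibakeDemo
{
    static MojibakeDemo()
    {
        // Makes code page 1251 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Simulates: the file is written as cp1251, then read and rewritten
    // with the default StreamReader/StreamWriter (both UTF-8).
    public static byte[] DefaultRoundTrip(string russianText)
    {
        byte[] cp1251Bytes = Encoding.GetEncoding(1251).GetBytes(russianText);

        string decoded;
        using (var reader = new StreamReader(new MemoryStream(cp1251Bytes)))
        {
            decoded = reader.ReadToEnd(); // invalid UTF-8 becomes U+FFFD
        }

        var output = new MemoryStream();
        using (var writer = new StreamWriter(output))
        {
            writer.Write(decoded); // U+FFFD is encoded as EF BF BD
        }
        return output.ToArray(); // ToArray still works after the stream is closed
    }

    // What an editor set to Windows-1251 shows for the given bytes.
    public static string ViewAs1251(byte[] bytes) =>
        Encoding.GetEncoding(1251).GetString(bytes);
}
```

The Cyrillic letters are gone after the read step; the writer only faithfully stores the damage.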
So, should I change Sisulizer to write UTF-8 and keep the C# app using UTF-8? Would that work?

Should I keep the ANSI encoding in Sisulizer and have my C# app set the ANSI encoding for each of the custom installer string files it edits, so that they stay ANSI? Will this work?

Is there something else that needs to be done in Sisulizer, NSIS, the Build PC or the Install PC for either of these to work?

btw - I also have to consider Chinese, Korean, and other languages - mostly European.

Many thanks.
I've read this but I'm still not sure where the problem lies in my case
http://www.joelonsoftware.com/articles/Unicode.html




 Posted: Fri Oct 10th, 2008 09:58 am
Markus.Kreisel
Administrator


Joined: Sat Apr 8th, 2006
Location: Bedburg, Germany
Posts: 832
Hi Mike,

sorry for the late answer. There are many things that can go wrong with code page conversions.

The problem is simply that an 8-bit Cyrillic file uses the same 256 possible byte values as an English or German file. The same value, e.g. 192, can represent two different characters. If you load a file made for a Cyrillic code page, you need the Cyrillic code page on the target system.
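To make the 192 example concrete, the small C# sketch below (class name made up; the provider registration is only needed on .NET Core / .NET 5+) decodes the same single byte with Windows-1252 (Western European) and Windows-1251 (Cyrillic):

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes one byte value under a given Windows code page.
static class CodePageAmbiguity
{
    static CodePageAmbiguity()
    {
        // Makes the legacy Windows code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string Decode(byte value, int codePage) =>
        Encoding.GetEncoding(codePage).GetString(new[] { value });
}
```

Decode(192, 1252) returns "À" (U+00C0), while Decode(192, 1251) returns the Cyrillic "А" (U+0410). Byte 0xF6 is likewise "ö" in 1252 but "ц" in 1251. Nothing in the byte itself says which reading is meant.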

Sisulizer always uses Unicode internally. If it writes out an 8-bit file, it has to decide which code page to use. You can see which one is the default in Project - Edit Source -> Encodings.

If .NET reads the 8-bit file, it does not know in which code page the file was stored. 8-bit text files have no header that stores this information, so it cannot tell whether a byte should be a German umlaut or a Russian character. Code that falls back to the system's ANSI code page simply assumes an umlaut on German systems and a Cyrillic character on Russian systems.

This is the reason we see often Mojibake (http://en.wikipedia.org/wiki/Mojibake) :-(

If you use Windows conversion routines, you will often see ?????. The reason is simple: if you have Unicode text containing German umlauts and convert it into 8-bit Cyrillic, the converter does not know what to do, because a German ö is not part of the Cyrillic code page. In this case it writes ? for every character it cannot convert. The downside is that once it writes a ?, the original information is lost and cannot be converted back.
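The same loss can be reproduced in .NET, used here as an analogue of the Windows routines mentioned above rather than the same API. The sketch asks explicitly for a "?" replacement fallback, because Encoding.GetEncoding(1251) on its own may apply a best-fit mapping on some runtimes; the class and method names are illustrative.

```csharp
using System;
using System.Text;

// Hypothetical helper: Unicode -> 8-bit code page -> Unicode.
static class LossyConversion
{
    static LossyConversion()
    {
        // Makes the legacy Windows code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string RoundTrip(string text, int codePage)
    {
        // Characters the code page cannot represent are replaced by '?'.
        Encoding enc = Encoding.GetEncoding(
            codePage,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
        byte[] bytes = enc.GetBytes(text); // ö and ß have no slot in 1251 -> 0x3F
        return enc.GetString(bytes);       // a stored '?' can only decode as '?'
    }
}
```

RoundTrip("Größe", 1251) comes back as "Gr??e", whereas the same word survives a round trip through 1252 and Russian text survives 1251. The damage happens at encoding time and cannot be undone later.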

I'm glad to hear that you can use UTF-8 for your files. Yes, this is the solution to your problem. Convert your source file into UTF-8 or even UTF-16. You can use Windows Notepad to do that: simply change the Encoding setting in Notepad's Save As dialog.
Notepad will write a Byte Order Mark (BOM). If you read the article you linked to, you know what that is.

Sisulizer will then also default to UTF-8 or UTF-16 for your file. If you use Scan for Changes on an existing file, you might have to change the encoding manually in the settings mentioned above.
If you use UTF-16, .NET does not need to convert at all, because .NET strings are UTF-16 internally. UTF-8, on the other hand, produces smaller files for mostly Latin-script text. Both can represent all of Unicode and should be the end of your mojibake problems.
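If a tool such as the C# post-processor has to write these files instead of Notepad, a minimal sketch (hypothetical names) passes the encoding explicitly. new UTF8Encoding(true) makes StreamWriter emit the UTF-8 BOM (EF BB BF), Encoding.Unicode emits UTF-16 little-endian with the BOM FF FE, and a StreamReader created with detectEncodingFromByteOrderMarks: true picks the decoder from that BOM. Whether the program that finally consumes the file accepts UTF-8 or UTF-16 is a separate question.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helpers for BOM-marked Unicode text files.
static class UnicodeFiles
{
    // StreamWriter writes the encoding's preamble (BOM) at the start of a new file.
    public static void Write(string path, string text, Encoding encoding)
    {
        using (var writer = new StreamWriter(path, false, encoding))
        {
            writer.Write(text);
        }
    }

    // The BOM, if present, overrides the UTF-8 fallback given here.
    public static string Read(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage: UnicodeFiles.Write(path, text, new UTF8Encoding(true)) or UnicodeFiles.Write(path, text, Encoding.Unicode), then UnicodeFiles.Read(path) returns the original text in both cases.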

Markus



____________________
http://www.sisulizer.com - Three simple steps to localize

 Posted: Thu Oct 23rd, 2008 04:09 am
bikemike
Member
 

Joined: Tue Nov 20th, 2007
Location: New Zealand
Posts: 105
....
some time later
...

Would that the installer did handle UTF-8. It does not. I have since discovered that it is ANSI-only and that Unicode support is a separate side project. This was the missing link.

So I reverted to ANSI and simply changed my intermediate process to both read and write in explicitly specified code pages. So far it appears to work.
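The fix described above (keep the files in ANSI and make the post-processor read and write with an explicit code page) could look roughly like the C# sketch below. It is a reconstruction, not the poster's actual code. The method name, the header handling (replace the first line with a supplied one) and the language keys are assumptions for illustration; the numbers are the standard Windows ANSI code pages for some of the languages mentioned in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class AnsiPostProcessor
{
    static AnsiPostProcessor()
    {
        // Makes the legacy Windows code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Assumed lookup table: language key -> Windows ANSI code page.
    public static readonly Dictionary<string, int> AnsiCodePages = new Dictionary<string, int>
    {
        ["Russian"] = 1251,
        ["German"] = 1252,
        ["French"] = 1252,
        ["Polish"] = 1250,
        ["Greek"] = 1253,
        ["Japanese"] = 932,
        ["ChineseSimplified"] = 936,
        ["Korean"] = 949,
        ["ChineseTraditional"] = 950,
    };

    // Hypothetical: replaces the first line of an 8-bit file with newHeader.
    // Reading and writing with the same code page keeps the other lines' bytes intact.
    public static void RewriteHeader(string inPath, string outPath, int codePage, string newHeader)
    {
        Encoding ansi = Encoding.GetEncoding(codePage);
        string[] lines = File.ReadAllLines(inPath, ansi);
        if (lines.Length > 0)
        {
            lines[0] = newHeader;
        }
        File.WriteAllLines(outPath, lines, ansi); // code-page encodings write no BOM
    }
}
```

For the Russian file this would be called as RewriteHeader(inPath, outPath, AnsiPostProcessor.AnsiCodePages["Russian"], header). Note that File.WriteAllLines ends every line, including the last, with Environment.NewLine, so line endings may differ from the input.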

Thanks for your time.





