Today I ran into a problem with an XMLRPC call.
An application calls an external XMLRPC service via Ruby's xmlrpc/client library. Sometimes the server returns an error message such as: Invalid XMLRPC request "not well-formed (invalid token)". The XML document includes various data.
I suspected that the XML document contained some control characters. I dumped the data to a text file, read it with less(1), and found `ESC' and <U+200B>.
(ESC(I00;) First made<U+200B> <U+200B>.Lovely meal.
ESC is the escape character, the byte that starts an escape sequence. I opened the data in Emacs with US-ASCII encoding and finally found the raw escape character, `^['.
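The same search can be done in Ruby instead of less(1) and Emacs. This is only a minimal sketch, assuming Ruby 1.9 or later and UTF-8 input; `find_suspicious_chars` and `SUSPICIOUS` are hypothetical names, and the sample string is rebuilt from the line shown above.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper: report C0 control characters (except tab, LF, CR)
# and ZERO WIDTH SPACE together with their character offsets.
SUSPICIOUS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u200B]/

def find_suspicious_chars(text)
  found = []
  text.each_char.with_index do |ch, index|
    # Record the offset and the code point in U+XXXX notation.
    found << [index, format("U+%04X", ch.ord)] if ch =~ SUSPICIOUS
  end
  found
end

# Sample rebuilt from the dumped data: \e is ESC, \u200B is ZERO WIDTH SPACE.
sample = "(\e(I00;) First made\u200B \u200B.Lovely meal."
p find_suspicious_chars(sample)
# => [[1, "U+001B"], [19, "U+200B"], [21, "U+200B"]]
```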
<U+200B> is the Unicode character ZERO WIDTH SPACE. I saw \342\200\213 in the Emacs US-ASCII buffer. I'm not good with Unicode, but I can
In Ruby, the following code represents the ZERO WIDTH SPACE character.
https://gist.github.com/1098835
    RUBY_VERSION # =>
    zero_width_space = "\xE2\x80\x8B" # =>
    zero_width_space # =>
    zero_width_space.length # =>
    white_space = "" # =>
    white_space # =>
    white_space.length # =>
Here is the result. In Ruby 1.8.7, String#length counts bytes, so ZERO WIDTH SPACE has length 3. In Ruby 1.9.2, String#length counts characters, so it has length 1, although the string is still 3 bytes long.
    RUBY_VERSION # => "1.8.7"
    zero_width_space = "\xE2\x80\x8B" # => "\342\200\213"
    zero_width_space # => "\342\200\213"
    zero_width_space.length # => 3
    white_space = "" # => ""
    white_space # => ""
    white_space.length # => 0

    RUBY_VERSION # => "1.9.2"
    zero_width_space = "\xE2\x80\x8B" # => "​"
    zero_width_space # => "​"
    zero_width_space.length # => 1
    white_space = "" # => ""
    white_space # => ""
    white_space.length # => 0
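To see the difference between characters and bytes directly, something like the following may help. It is a small sketch, assuming Ruby 1.9 or later with a UTF-8 source file; `octal_dump` is a name introduced here only for illustration.

```ruby
# -*- coding: utf-8 -*-
zero_width_space = "\u200B"          # the same character as "\xE2\x80\x8B" in UTF-8

p zero_width_space.length            # => 1  (characters)
p zero_width_space.bytesize          # => 3  (bytes)
p zero_width_space.bytes.to_a        # => [226, 128, 139]
p zero_width_space.unpack("U*")      # => [8203]  (0x200B as a code point)

# The octal form that Ruby 1.8.7 and the Emacs US-ASCII buffer display.
octal_dump = zero_width_space.bytes.map { |b| format("\\%03o", b) }.join
puts octal_dump                      # prints \342\200\213
```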
I removed these characters with String#sub! and String#gsub!.
    string.sub!(/\e/, '')
    string.gsub!(/#{zero_width_space}/, '')
It worked well.
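A slightly more general variant is sketched below, assuming Ruby 1.9 or later and UTF-8 strings. `strip_unsafe_chars` is a hypothetical helper, not part of xmlrpc/client. It removes every C0 control character except tab, LF and CR (XML 1.0 does not allow the others), plus ZERO WIDTH SPACE. As far as I can tell, U+200B itself is a legal XML 1.0 character, so ESC is the likelier cause of the parser error; the sketch strips both only to mirror what I did above.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper: return a copy of text without C0 control characters
# (tab, LF and CR are kept) and without ZERO WIDTH SPACE.
UNSAFE_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u200B]/

def strip_unsafe_chars(text)
  # gsub (without "!") replaces every match and always returns a string,
  # whereas sub! touches only the first match and gsub! returns nil
  # when nothing was replaced.
  text.gsub(UNSAFE_CHARS, "")
end

dirty = "(\e(I00;) First made\u200B \u200B.Lovely meal."
puts strip_unsafe_chars(dirty)
# prints ((I00;) First made .Lovely meal.
```

Calling it on each string value before passing it to the XMLRPC client would keep the original data untouched, since the helper does not modify its argument.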
id:dayflower gives an easy-to-understand description of ZERO WIDTH SPACE.