I Thought He Came With You is Robert Ellison’s blog about software, marketing, politics, photography, time lapse and the occasional well deserved rant.

Space and multibyte character encoding for posting to Twitter using OAuth

I've spent the last day learning how to use OAuth and xAuth to post to Twitter. There are rumblings that Twitter will start to phase out basic authentication later this year, and more importantly you can only get the nice “via...” attribution if you use OAuth (this applies to new apps; old ones are grandfathered in).

I coded up my own OAuth implementation, referring to Twitter Development: The OAuth Specification on Wrox and the OAuthBase.cs class from the oauth project on Google Code. Both are great references, but both fail with multibyte characters. The problem is that each byte of a character's UTF-8 encoding needs to be escaped separately. OAuthBase.cs encodes each character as a single int rather than breaking out the bytes, and the Wrox article incorrectly suggests using Uri.EscapeDataString().
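To make the difference concrete, here is a minimal sketch that escapes a character both ways. The class and method names are made up for illustration and are not taken from either reference; EscapeAsInt is only an assumption about what "encoding a character as an int" amounts to.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EscapeComparison
{
    // Escapes the character's numeric value as one unit.
    // This is the approach that breaks for anything outside ASCII.
    public static string EscapeAsInt(char c)
    {
        return "%" + ((int)c).ToString("X2");
    }

    // Escapes each byte of the UTF-8 encoding separately.
    public static string EscapeAsUtf8Bytes(string character)
    {
        StringBuilder sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(character))
        {
            sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

For é (U+00E9), EscapeAsInt('\u00E9') gives %E9 while EscapeAsUtf8Bytes("\u00E9") gives %C3%A9. For 日 (U+65E5) the results are %65E5 and %E6%97%A5. Only the byte-wise form matches the UTF-8 encoding of the character.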

Here's a method to correctly encode parameters for OAuth:

// Requires these directives at the top of the file:
using System.Globalization;
using System.Text;

public static string OAuthUrlEncode(string s)
{
    if (string.IsNullOrEmpty(s))
    {
        return string.Empty;
    }

    StringBuilder sb = new StringBuilder(s.Length);

    // Convert the whole string to UTF-8 first. Encoding one char at a time
    // would split surrogate pairs and corrupt characters outside the
    // Basic Multilingual Plane.
    foreach (byte b in Encoding.UTF8.GetBytes(s))
    {
        if (b < 0x80 && NoEncodeChars.IndexOf((char)b) != -1)
        {
            // character is allowed
            sb.Append((char)b);
        }
        else
        {
            // every other byte is escaped separately as % plus two hex digits
            sb.AppendFormat(CultureInfo.InvariantCulture, "%{0:X2}", b);
        }
    }

    return sb.ToString();
}

NoEncodeChars is the list of characters that are permitted without encoding:

private const string NoEncodeChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

One consequence of this encoding is that spaces must be encoded as %20 rather than as a plus sign. I was worried that each space would end up counting as three characters towards the 140-character limit. I tested this and it isn't true, so use HttpUtility.UrlEncode() to calculate the number of characters in the post, and OAuthUrlEncode() or similar to actually encode the post parameters.
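As a rough sketch of the counting side, the helper below (TweetLength and CountedLength are hypothetical names, not part of any Twitter or .NET API) measures a post by the length of its HttpUtility.UrlEncode() form. It assumes a reference to System.Web.

```csharp
using System.Web; // HttpUtility; needs a reference to System.Web on .NET Framework

// Hypothetical helper, for illustration only.
public static class TweetLength
{
    // Length of the post once spaces have become '+' (one character each).
    public static int CountedLength(string status)
    {
        return HttpUtility.UrlEncode(status).Length;
    }
}
```

For "hello world" this returns 11, because the space becomes a single plus sign, even though the OAuth form hello%20world is 13 characters long. Note that HttpUtility.UrlEncode() also percent-escapes non-ASCII characters, so this count is only a one-for-one match for plain ASCII text.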
