Replace Microsoft chars in JavaScript

Okay, so have you ever had a user call you up and say “the website is putting garbage into my notes”, only for it to turn out that they are copying and pasting paragraphs from a Word document into the textarea on your site? Well, Microsoft Office apps automatically swap plain punctuation for typographic characters such as curly quotes, longer dashes, and a single-character ellipsis. These characters are in the Windows-1252 character set but not in ASCII. They all look very nice in the Word document, but when they are pasted into a textarea on your site they can come out as weird characters, typically because something between the page and the back end is not handling the encoding consistently.
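
If you want to see exactly which characters a paste brought in, a small helper along these lines can list each non-ASCII character by code point. This is only a diagnostic sketch; the name `findNonAscii` is made up here and is not part of the function further down.

```javascript
// Hypothetical helper (not from the original post): list each distinct
// non-ASCII character in a string as a "U+XXXX" code point.
function findNonAscii(text) {
    var found = [];
    var seen = {};
    for (var i = 0; i < text.length; i++) {
        var ch = text.charAt(i);
        var code = text.charCodeAt(i); // UTF-16 code unit
        if (code > 0x7F && !seen[ch]) {
            seen[ch] = true;
            var hex = code.toString(16).toUpperCase();
            // Pad to four hex digits, e.g. "U+00A0"
            found.push("U+" + ("0000" + hex).slice(-4));
        }
    }
    return found;
}
```

Running it over a pasted string gives you the list of code points to check against the replacements you already handle.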

There is a simple function you can use to replace them:

/**
 * Replace Word characters with ASCII equivalents
 */
function replaceWordChars(text) {
    var s = text;
    // smart single quotes and apostrophe
    // (no "|" inside the character class, or literal pipes get replaced too)
    s = s.replace(/[\u2018\u2019\u201A]/g, "'");
    // smart double quotes
    s = s.replace(/[\u201C\u201D\u201E]/g, "\"");
    // ellipsis
    s = s.replace(/\u2026/g, "...");
    // en and em dashes
    s = s.replace(/[\u2013\u2014]/g, "-");
    // circumflex
    s = s.replace(/\u02C6/g, "^");
    // open angle bracket
    s = s.replace(/\u2039/g, "<");
    // close angle bracket
    s = s.replace(/\u203A/g, ">");
    // small tilde
    s = s.replace(/\u02DC/g, "~");
    // non-breaking space
    s = s.replace(/\u00A0/g, " ");

    return s;
}
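
The same idea can also be written as a lookup table with a single pass over the text, which may be easier to extend when a new character turns up. This is a sketch of an alternative and covers only some of the characters; `WORD_CHAR_MAP` and `replaceWordCharsWithMap` are names invented for the example.

```javascript
// Hypothetical table-driven variant: one regex pass, one lookup per match.
var WORD_CHAR_MAP = {
    "\u2018": "'", "\u2019": "'", "\u201A": "'",
    "\u201C": "\"", "\u201D": "\"", "\u201E": "\"",
    "\u2026": "...",
    "\u2013": "-", "\u2014": "-",
    "\u00A0": " "
};

function replaceWordCharsWithMap(text) {
    // The character class lists exactly the keys of WORD_CHAR_MAP.
    return text.replace(/[\u2018\u2019\u201A\u201C\u201D\u201E\u2026\u2013\u2014\u00A0]/g, function (ch) {
        return WORD_CHAR_MAP[ch];
    });
}

var pasted = "\u201CIt\u2019s done\u201D \u2013 almost\u2026";
console.log(replaceWordCharsWithMap(pasted)); // "It's done" - almost...
```

Adding a character means adding it in two places, the map and the character class, so keep them in sync.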

Stick the above on an onblur event or something, and the problem goes away. Until the next version of Office anyway…
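
Hooking it up might look something like this. It is a minimal sketch that assumes the field is a textarea-like object with a `value` property and `addEventListener`; the name `cleanOnBlur` and the `notes` id in the usage line are made up for illustration.

```javascript
// Hypothetical wiring: run a cleaning function over a field's value
// whenever the field loses focus.
function cleanOnBlur(field, cleaner) {
    field.addEventListener("blur", function () {
        var cleaned = cleaner(field.value);
        // Only write back when something actually changed.
        if (cleaned !== field.value) {
            field.value = cleaned;
        }
    });
}
```

In a page you would call it once after the DOM has loaded, for example `cleanOnBlur(document.getElementById("notes"), replaceWordChars);`.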

Thanks to for this lovely code.