Have you ever had a user call you up and say “the website is putting garbage into my notes”, only to find out they are copying and pasting paragraphs from a Word document into a textarea on your site? Microsoft Office apps automatically swap in typographic characters such as curly quotes and longer dashes, which sit outside plain ASCII (they are part of Windows-1252, among other character sets). They look very nice in the Word document, but if your site does not handle the encoding consistently all the way through, they show up as weird characters once they have been pasted into a textarea and saved.
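If you want to see exactly which characters a user has pasted in, something like the following can help. It is only a rough diagnostic sketch: the name findNonAscii is made up for this example, and it reports UTF-16 code units, which is enough for the Word characters discussed here.

```javascript
// Hypothetical helper (not from the original post): list every character
// outside the 7-bit ASCII range, with its position and UTF-16 code unit.
function findNonAscii(text) {
    var found = [];
    for (var i = 0; i < text.length; i++) {
        var unit = text.charCodeAt(i);
        if (unit > 127) {
            found.push({
                index: i,
                char: text.charAt(i),
                // Format as U+XXXX so it can be looked up in a Unicode chart.
                code: "U+" + unit.toString(16).toUpperCase().padStart(4, "0")
            });
        }
    }
    return found;
}

// Two entries: U+201C at index 0 and U+201D at index 3.
var hits = findNonAscii("\u201CHi\u201D");
```

Running this over a problem note should show the curly quotes, dashes and so on that need handling.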
There is a simple function you can use to replace these characters with their ASCII equivalents:
/**
 * Replace Word characters with ASCII equivalents
 **/
function replaceWordChars(text) {
    var s = text;
    // Note: no | inside [...]; in a character class it would match a literal pipe.
    // smart single quotes and apostrophe
    s = s.replace(/[\u2018\u2019\u201A]/g, "'");
    // smart double quotes
    s = s.replace(/[\u201C\u201D\u201E]/g, "\"");
    // ellipsis
    s = s.replace(/\u2026/g, "...");
    // dashes (en dash and em dash)
    s = s.replace(/[\u2013\u2014]/g, "-");
    // circumflex
    s = s.replace(/\u02C6/g, "^");
    // open angle bracket
    s = s.replace(/\u2039/g, "<");
    // close angle bracket
    s = s.replace(/\u203A/g, ">");
    // small tilde and non-breaking space become a plain space
    s = s.replace(/[\u02DC\u00A0]/g, " ");
    return s;
}
Stick the above on the textarea's onblur event or something similar, and the problem goes away. Until the next version of Office, anyway…
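For the onblur wiring, here is one way it might look. Treat it as a sketch: cleanOnBlur is a name invented for this example, and it assumes the field is a standard textarea or input element with a value property.

```javascript
// Hypothetical helper (not from the original post): run a cleaning function
// over a form field's value whenever the field loses focus.
function cleanOnBlur(field, cleaner) {
    field.addEventListener("blur", function () {
        var cleaned = cleaner(field.value);
        // Only write back when something actually changed.
        if (cleaned !== field.value) {
            field.value = cleaned;
        }
    });
}
```

In a page you would call it with something like cleanOnBlur(document.getElementById("notes"), replaceWordChars), where "notes" is a placeholder id for your own textarea. Passing the cleaner in as an argument keeps the wiring separate from the replacement rules, so the same helper works if you later swap in a different function.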
Thanks to http://www.andornot.com/blog/post/Replace-MS-Word-special-characters-in-javascript-and-C.aspx for this lovely code.