Walt Mankowski

All glory to the hypnotoad

Finding Unicode Characters in LaTeX and BibTeX

Yesterday I discovered what I thought was an odd bug in BibTeX. For one of the journal articles in my bibliography, the BibTeX entry contained the field

journal = {Human–Computer Interaction},

but it was appearing in my bibliography as HumanComputer Interaction.

The problem turned out to be that the innocent-looking hyphen between “Human” and “Computer” was actually a Unicode en dash. I didn’t insert it intentionally, so I must have copied that bit of text from a Unicode-enabled website or email. LaTeX and BibTeX are happiest with plain ASCII characters, and once I replaced that character, the title looked fine.

But that got me wondering whether I had any more Unicode characters in my dissertation project. They’re nearly impossible to find by eye, so I wrote this little Perl one-liner to find them for me:

perl -ne "print if /[^[:ascii:]]/" *.bib *.tex
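The one-liner prints each matching line but not where in the line the stray character sits. If you want the exact spot, something like the following Python sketch might help. It is not what I actually ran: the function names find_non_ascii and report are made up for illustration, and it assumes the files are UTF-8 encoded.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, name) for each non-ASCII character in text."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on some
    # non-ASCII separators (such as U+2028) and hide them from the scan.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Fall back to a placeholder for characters without a name.
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, ch, name))
    return hits


def report(paths):
    """Print one 'file:line:column: U+XXXX NAME' line per hit (assumes UTF-8 files)."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line_no, col, ch, name in find_non_ascii(handle.read()):
                print(f"{path}:{line_no}:{col}: U+{ord(ch):04X} {name}")
```

Calling report on a list of .bib and .tex paths gives one line per offending character, with line and column numbers counted from 1.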

It turned up three more bad dashes, plus a stray smart quote.
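I fixed these by hand, but with more of them a small replacement table could do the job. Here is a minimal Python sketch, assuming the usual TeX conventions for dashes and quotes. The ASCII_EQUIVALENTS table and the asciify function are hypothetical names, and the table only covers the kinds of characters mentioned here.

```python
# Assumed mapping from common "smart" punctuation to TeX-friendly ASCII.
ASCII_EQUIVALENTS = {
    "\u2013": "--",   # en dash
    "\u2014": "---",  # em dash
    "\u2018": "`",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": "``",   # left double quote
    "\u201d": "''",   # right double quote
}


def asciify(text):
    """Replace the mapped characters and leave everything else untouched."""
    for ch, replacement in ASCII_EQUIVALENTS.items():
        text = text.replace(ch, replacement)
    return text
```

Note that this turns the en dash into the two-hyphen sequence, which LaTeX typesets as an en dash, rather than into a single hyphen. Anything outside the table is left alone, so it is worth re-running the search afterwards to see what remains.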