Research Tools

Here's what I found useful. Hopefully they will help you too. Research is better with friends!

Document Organization

Once you collect more than about four documents, organizing them all can be a prodigious pain. I use the Mendeley reference management software. It is free. It monitors my download folder, imports citation information, and renames files when I change the metadata. It has its pros and cons; I have heard good things about alternatives such as Zotero and EndNote.

Entering Pīnyīn

The accent marks or "diacritics" shown over Chinese words written in Pīnyīn can be entered in three ways.

  1. Google Translate will generate pinyin for any characters you have it translate. You can copy and paste the text where you need it.
  2. Mac computers allow you to enter letters with diacritics by holding the letter's key. Holding down a vowel makes a menu pop up where you can choose several diacritics. It's pretty fast.
  3. Windows computers need extra work to do this. You'll need to install a custom keyboard. Rob Rohan produced one you can download from here. If his installer doesn't work, you can create your own installer in 5 minutes by doing the following.
    1. Download Microsoft's Keyboard Layout Creator (MKLC) here
    2. Download Rob Rohan's source file here.
    3. Install the MKLC to your C drive. Installing to a different drive can cause errors.
    4. Run the MKLC and have it open the source file (File > Load Source File).
    5. Recreate the installer package. (Project > Build DLL and Setup Package).
    6. If a window pops up saying it has completed but there are warnings, hit cancel. The warning only reflects the fact that you are using nonstandard characters.
    7. When a window pops up to say it has completed and would you like to go to the directory, hit "okay".
    8. Double click on the "setup" file in that directory. After a few seconds, it will be installed!
    9. Set the newly installed keyboard to be your default. For help on that, see Microsoft's help page.
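
If you are comfortable running a small script, there is one more route that is not among the three above: type the tone as a number after each syllable (pin1yin1) and convert it afterwards. The sketch below is a minimal illustration, not a tool I rely on. The function names are made up for this example, and it assumes the usual placement rule (the mark goes on a or e if present, on the o in "ou", and otherwise on the last vowel) with "v" typed in place of "ü".

```python
import re

# Tone-marked forms of each vowel, indexed by tone number 1-4.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable, tone):
    """Return one pinyin syllable with its tone mark applied (hypothetical helper)."""
    syllable = syllable.replace("v", "ü").replace("V", "Ü")
    if tone not in (1, 2, 3, 4):
        return syllable  # tone 5 (or 0) is the neutral tone: no mark
    lower = syllable.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in TONE_MARKS]
        if not vowels:
            return syllable
        idx = vowels[-1]  # otherwise the last vowel carries the mark
    marked = TONE_MARKS[lower[idx]][tone - 1]
    if syllable[idx].isupper():
        marked = marked.upper()
    return syllable[:idx] + marked + syllable[idx + 1:]


def numbered_to_marked(text):
    """Convert every letters-plus-digit run in text, e.g. 'pin1yin1'."""
    return re.sub(
        r"([A-Za-züÜ]+)([0-5])",
        lambda m: mark_syllable(m.group(1), int(m.group(2))),
        text,
    )


print(numbered_to_marked("Han4yu3 pin1yin1 fang1'an4"))
```

Anything without a trailing digit is left alone, so ordinary English text mixed in with the numbered pinyin passes through unchanged.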

Automatic Character Recognition

Picture this: you have pages and pages of text you want to add to your document but it is too much to type by hand. What can you do? Use OCR of course!

OCR stands for Optical Character Recognition. It is a type of software that scans image and PDF files for characters and creates a text document with what it found. There are many options. I am happy to say that the best free one currently out there is by the company a9t9. It is incredibly fast and supports Chinese and English recognition. The only downside is that it can't do both at the same time; you have to run your document twice. But that only takes a second!

Any OCR software will have different levels of success with different documents. There will be errors so it's important to check your new document thoroughly. Even with that requirement, though, you'll end up saving a huge amount of time.  

Language Switching / Translation

It's important to remember that translations are an approximation. Every source is going to give a slightly different answer depending on how you word the question, who wrote the reference, and how they chose what to include. Mandarin is a huge language that has changed over time. While a fluent speaker can recognize tens of thousands of words constructed out of, oh, 3,000 characters, there have been tens of thousands of characters used across history. I've seen estimates that put it as high as 90,000+ characters. You wouldn't encounter most of those in your day-to-day, but if you are digging into ancient records, there is no telling what you may find. If one reference doesn't seem to have what you are looking for, turn to another.

Web Resources

For quick lookup of an unknown character, especially when I don't know if it's in simplified or traditional and provides the other version if there is one. It also offers definitions, but I've found those to omit more specialized meanings.

Google Translate: quick sanity check on pinyin of a character and provides tone marks for easy copying and pasting.

The website has a whole host of tools designed to make interacting with Chinese characters easier. The dictionary lookup tools are exceptional.


Google Translate app. INVALUABLE. This has fantastic optical character recognition, meaning from a picture or a live scan I can digitize written or printed Chinese, or get the pinyin so that I can type it in myself. It's even able to read some handwritten and stylized characters.


A large two-way dictionary. I use the hilariously misnamed "Pocket" Oxford Chinese Dictionary (1100 pages aren't about to fit in anybody's pocket), ISBN 978-0-19-800594-0. It takes care of most of my lookup needs, especially thanks to having traditional characters called out next to its simplified entries. It doesn't cover the same word sets in both directions: there is no English entry for "phoenix", for example, but the Chinese entry 凤/鳳 is translated as phoenix.

A large, one-way Chinese -> English dictionary. The thought here is that the focus on translating in only one direction allows for greater depth, breadth, and related words. I use "A Chinese-English Dictionary (Revised Edition)", ISBN 978-7-5600-1325-1, from the Foreign Language Teaching and Research Press. I don't know anything about them; it just happened to be a dictionary I picked up somewhere, and it has met my needs. The 1700+ pages give it a feel of authority.

A character lookup system. For when your optical character recognition isn't working and you don't know how the character is pronounced. The one I use is "Chinese Characters, A Genealogy and Dictionary", ISBN 978-0-9660750-0-7. It is sometimes time consuming to use, but it is an excellent tool when you've encountered something entirely new. This book contains 360 pages of character trees covering 4,000 characters. The printing is small.

Separating Chinese Characters from Roman text

When collecting lists of songs I found it necessary to separate the Chinese characters of the title from the translation. Most of my sources provided both in the same line, even if they presented the information in a table. I could copy and paste every title into two pieces, but boy does that take a while. Spreadsheet functions came to my rescue.

Microsoft Excel or Google Sheets can both do what you need. I made a column in a spreadsheet where each cell had the Chinese characters for the name, a space, and then the translated name. I put this function into the cell next to it: 

=LEFT(B2,FIND(" ",B2))

Read right to left, this function goes to cell B2, finds the position of the first space, and copies everything up to and including that space into a new cell. Voila! A similar function netted me the translated names:

=RIGHT(B2,LEN(B2)-LEN(C2))

This one counts the number of characters copied by the previous equation into cell C2 (the Chinese characters plus the space that follows them), subtracts that number from the total number of characters in the combined Chinese-English name, and copies that many characters into a new cell starting from the right. Since the Chinese characters are always on the left, I end up with only the English name on the right. This is much faster than copying every line in two pieces. Follow that up with Copy -> Paste Special -> Values Only and you have what you need, clean as can be.
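
If your list lives in a text file rather than a spreadsheet, the same split can be done in a few lines of Python. This is only a sketch of the spreadsheet approach above, not something from my own workflow. The function names are invented for the example, the sample title is just an illustration, and the second function assumes the Chinese part of the line consists only of characters in the common CJK Unicode blocks.

```python
import re


def split_at_first_space(combined):
    """Mirror the spreadsheet approach: cut the line at its first space."""
    chinese, _, english = combined.partition(" ")
    return chinese, english.strip()


# Assumption: titles use only the main CJK block and Extension A.
CJK_PREFIX = re.compile(r"^([\u3400-\u4dbf\u4e00-\u9fff]+)\s*(.*)$")


def split_at_script_change(combined):
    """Cut where the leading run of Chinese characters ends, space or not."""
    match = CJK_PREFIX.match(combined.strip())
    if match is None:
        return "", combined.strip()  # no leading Chinese characters found
    return match.group(1), match.group(2)


title = "高山流水 High Mountains and Flowing Water"
print(split_at_first_space(title))
print(split_at_script_change("高山流水High Mountains and Flowing Water"))
```

The second version is handy when a source forgot the space between the characters and the translation; the first-space version would return the whole line as the "Chinese" part in that case.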

Romanization Conversions

Romanization is the representation of Chinese words in the Latin alphabet. When digging through research from across the years and regions, it is common to see the same Chinese character represented with different combinations of Latin letters. To help you negotiate those differences, here is a table of romanization equivalences from UNESCO.

A bit about the systems: The most prevalent system today was created in China in the 1950s and is called Pīnyīn (拼音) (formal name 汉语拼音方案, Hànyǔ pīnyīn fāng'àn). Before that, foreign missionaries created various systems, with different levels of use, from the 1500s onward. In the last 200 years major systems have included the Wade and later Wade-Giles system of 1859/1892, the 1902 French-created École française d'Extrême-Orient or EFEO system, the French-created Postal Romanization of 1906, the Lessing-Othmer German system of 1912, and the Yale system of 1943. These are the big ones you might find in your search; they'll certainly help explain why the various sources I quote have four different ways to spell "zheng". There were other systems created for various other purposes but not as widely used; you can see more on the Wikipedia page.

The UNESCO table contains Pinyin, Wade-Giles, EFEO, and Yale, the four most common you'll see in English-language sources. It's missing those used for other languages and the earlier missionary systems. There are other partial comparisons and coverage of the French, German, and various other systems in this French article on Biblioweb. Perhaps an enterprising individual would be up for making a full table of equivalences? I'd recommend keying off of IPA tables and programmatically generating the table. Get in touch if you want to commiserate!
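
To give a feel for what a machine-readable equivalence table could look like, here is a small sketch. It is not the UNESCO table and not a complete converter: it holds six syllables I typed in by hand, covers only Wade-Giles and Yale, and every name in it is made up for the example. A full table would be generated rather than typed.

```python
from collections import defaultdict

# Illustrative rows only: pinyin syllable -> spelling in two older systems.
EQUIVALENCES = {
    "zheng": {"wade_giles": "cheng", "yale": "jeng"},
    "zhong": {"wade_giles": "chung", "yale": "jung"},
    "jin": {"wade_giles": "chin", "yale": "jin"},
    "qin": {"wade_giles": "ch'in", "yale": "chin"},
    "xiao": {"wade_giles": "hsiao", "yale": "syau"},
    "guan": {"wade_giles": "kuan", "yale": "gwan"},
}


def build_reverse_index(table):
    """Map (system, spelling) back to the pinyin syllables it can stand for."""
    index = defaultdict(list)
    for pinyin, spellings in table.items():
        for system, spelling in spellings.items():
            index[(system, spelling.lower())].append(pinyin)
    return index


REVERSE = build_reverse_index(EQUIVALENCES)


def to_pinyin(system, spelling):
    """Return the pinyin candidates for a spelling in the given system."""
    return REVERSE.get((system, spelling.lower()), [])


print(to_pinyin("wade_giles", "cheng"))
print(to_pinyin("wade_giles", "chin"))
print(to_pinyin("yale", "chin"))
```

The last two lookups show why the system has to be part of the key: "chin" is pinyin jin in Wade-Giles but pinyin qin in Yale.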

Other tools

I found various other utilities useful enough to recommend. These aren't specific to Chinese or musical instruments, but I found them valuable nonetheless.

  • PDF Compressor:
    Useful for when I turned a series of images into a PDF file. Shrank one file from almost 1GB to just 50MB.
  • Website Backup: creates a complete copy of my website on my computer. Having a local copy of all my work gives me great peace of mind. If running on a Mac, it requires MacPorts.
  • Image conversion software: JPG image files perform better on web pages than PNG files, often because they can be smaller. There are many options out there; I happen to use 
  • Image compression software: Helps keep images reasonable. I use; there are others out there.
  • Image editor such as GIMP.