Warnings about Sentences in the Tatoeba Corpus

  • Some members of the Tatoeba Project write sentences in languages other than their own, and those sentences aren't always quite right.
    • There are sentences that are grammatically wrong.
    • There are sentences that don't sound natural even though they are grammatically correct.
    • There are incorrect translations.
  • However, many members also contribute sentences in their own native languages. If you look carefully at usernames, you will eventually learn who the native speakers are and whose sentences you can trust.

What is the Tatoeba Project?

  • Tatoeba.org is a large database of example sentences translated into many languages by members who volunteer their time.
  • If you decide to join the project and help, I would encourage you to translate only from a foreign language you know into your native or strongest language. This helps prevent further errors, both sentences that are definitely wrong and sentences that merely sound slightly strange.

Warnings about Sentences from the Original Tanaka Corpus

  • About 300,000 sentences in the Tatoeba Corpus were imported from the Tanaka Corpus.
  • Most of the sentences were typed in by students as part of a project, but were not proofread by the teacher.
  • There are many sentences that don't sound natural in English.
    • Many sentences sound old-fashioned or archaic.
    • Some sentences sound as if they are computer translations.
  • There are many sentences that don't sound natural in Japanese.
    • Many Japanese sentences are obviously close, word-for-word translations of the English rather than the most natural way to express the ideas in Japanese.
