There's an interesting post at Language Log on speech-to-speech translation by computer. The technology appears to be on the verge of practical application.
Speech-to-speech translation is more difficult than text-to-text translation because it has to deal with speech at both ends of the process. On the input side, identifying speech sounds is a substantial problem, one that doesn't exist in text-to-text translation, where the input is generally taken to be electronic text. On the output side, generating natural-sounding speech is more difficult than generating a string of characters, which is trivial. The pronunciation of a phoneme varies with its context and, above and beyond the segmental phonemes, there are intonation contours and stress patterns to get right.
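To make the "both ends" point concrete, here is a toy sketch in Python. It assumes a simple three-stage cascade (recognize, translate, synthesize), which is only one possible design and not necessarily what any real system does. Every name in it (`Audio`, `recognize`, `translate_text`, `synthesize`, and the two lookup tables) is hypothetical, and the stages are stubs standing in for the hard parts.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Stand-in for a recorded waveform; here it only carries a label."""
    label: str


# Toy lookup tables standing in for real models (hypothetical data).
TOY_RECOGNIZER = {"audio:hello": "hello"}
TOY_LEXICON = {"hello": "bonjour"}


def recognize(audio: Audio) -> str:
    """Input end: speech to text. The hard part in practice; a lookup here."""
    return TOY_RECOGNIZER[audio.label]


def translate_text(text: str) -> str:
    """Middle: text to text. Word-by-word lookup, unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def synthesize(text: str) -> Audio:
    """Output end: text to speech. Real systems must also handle
    context-dependent pronunciation, intonation, and stress."""
    return Audio(label=f"audio:{text}")


def speech_to_speech(audio: Audio) -> Audio:
    """Chain the three stages; text-to-text translation is only the middle one."""
    return synthesize(translate_text(recognize(audio)))
```

In this sketch, text-to-text translation is just the call to `translate_text`. The speech-to-speech version has to wrap it in two more stages, and those are exactly the ones the stubs gloss over.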
Note that translation does not imply understanding. That's a different kettle of fish.