Modern Machine Translation Struggling with Traditional Problems

Machine translation (MT) has progressed significantly over the last decade. Translation quality is now considered good for several language pairs, but problems remain for others. These problems have two sources: the limited amount of data available for those pairs in the large corpora that MT systems are trained on, and differences in grammatical rules that make processing the source language into the target more complex.

Different languages have different rules and specifications. Morphological complexity is one feature that makes a language difficult not only for MT but also for other language tools. Highly inflected languages belong to this category. Because a single word can appear in many forms, its occurrences in a corpus are scattered across many variants, which leaves less data for each form and can confuse the system. Moreover, such languages are often low-resource languages, which is another obvious problem for MT.
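The scattering effect can be illustrated with a small sketch. The Python snippet below is only an illustration under stated assumptions: the token list is a hand-made toy corpus (a few Finnish forms of "talo", meaning "house", next to their English counterparts), and the helper name `sparsity` is hypothetical rather than part of any MT toolkit. It counts how many distinct surface forms a lemma has and how many occurrences each form gets on average.

```python
from collections import Counter

# Toy data: (surface form, lemma) pairs. The forms and counts are
# hand-picked for illustration and do not come from a real corpus.
tokens = [
    ("talo", "talo"), ("talossa", "talo"), ("talosta", "talo"),
    ("taloon", "talo"), ("talolla", "talo"), ("talot", "talo"),
    ("house", "house"), ("house", "house"), ("house", "house"),
    ("house", "house"), ("house", "house"), ("houses", "house"),
]


def sparsity(pairs, lemma):
    """Return (number of distinct forms, mean occurrences per form)."""
    forms = Counter(form for form, lem in pairs if lem == lemma)
    total = sum(forms.values())
    return len(forms), total / len(forms)


if __name__ == "__main__":
    for lemma in ("talo", "house"):
        n_forms, mean_count = sparsity(tokens, lemma)
        print(f"{lemma}: {n_forms} forms, {mean_count:.1f} occurrences per form")
```

In this toy sample both lemmas occur six times, but the inflected one spreads those occurrences over six different forms, so a system that treats every form as a separate word sees each of them only once.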

In several language pairs that include inflectional languages, MT often has difficulty obtaining the information it needs about the target language. Usage and form can differ greatly at both the word level and the sentence level, and the absence of equivalent translations for some particles and grammatical features adds further challenges.

Agglutinative languages, which have richer morphological variation, pose the same problems as inflectional ones. Their morphemes combine into many variants, each of which can carry a different grammatical meaning. This variety makes it difficult for an MT system to encode the input and decode the output. The results are often unreliable, and even the most recent neural MT systems still struggle with this problem.

With several problems still unsolved, MT quality for several language pairs needs to be evaluated. Nevertheless, MT has developed so much in recent years that it is now very reliable for several language pairs. For the other pairs, which involve more complexity, neural MT has a good chance of solving or at least narrowing the difficulties. A larger corpus is very important for training an active-learning MT system, and in the current situation building one seems very feasible.
