Omniscien Technologies

Omniscien Technologies
Type	Privately held company
Industry	Localization, eCommerce, Online Research and Publishing, Online Travel, Media, Enterprise and Government
Founded	2007
Founders	Philipp Koehn, Gregory Binger, Dion Wiggins, Bob Hayward
Headquarters	Singapore
Locations	Singapore, Thailand, The Netherlands
Key people	Andrew Rufener (CEO), Gregory Binger (COO), Dion Wiggins (CTO), Philipp Koehn (Chief Scientist)
Products	Language Studio™ Language Processing, Machine Translation and Machine Learning Platform
Services	Automated translation, custom machine translation engines, language processing and machine learning
Website	http://www.omniscien.com, http://www.languagestudio.com

Omniscien Technologies (formerly Asia Online) is a privately owned, multinational company delivering services and software for language processing, machine translation and machine learning. The company, led by CEO Andrew Rufener, was founded in 2007 by Prof. Dr. Philipp Koehn, a leading scientist in the field of machine translation; Gregory Binger, a technologist and IT/IP lawyer; and former Gartner senior analysts Bob Hayward and Dion Wiggins.^[1] Omniscien Technologies is headquartered in Singapore and has offices in Zoetermeer, the Netherlands (European and North American sales, and technical operations) and in Bangkok, Thailand (Asian sales and R&D).

The company provides a range of solutions, based on statistical machine translation (SMT) and hybrid neural machine translation (NMT) technology, for the localization industry as well as for government, eCommerce, online research and publishing, online travel, media and large enterprise customers. Omniscien Technologies currently supports more than 550 language pairs across 13 industry domains.

The company's statistical and neural translation software employs recent advances in automated translation as well as extensive data manufacturing technologies. Until the early 1990s, almost all production-level machine translation technology relied on collections of linguistic rules to analyze the source sentence and then map its syntactic and semantic structure into the target language. The company's current approach instead uses statistical techniques, which draw on ideas from cryptography, and/or neural techniques, applying machine learning algorithms that automatically acquire statistical models from existing parallel collections of human translations. This is the same general approach used by Google Translate and by systems built with Philipp Koehn's own open-source SMT toolkit, Moses.
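The idea of acquiring a statistical model from parallel human translations can be illustrated with IBM Model 1, the simplest of the classic word-alignment models behind SMT (Moses typically relies on an external aligner such as GIZA++ for this step). The following is a minimal Python sketch, not Omniscien Technologies' implementation: it assumes tokenised, sentence-aligned pairs and uses expectation-maximisation to estimate word translation probabilities t(target | source). The function name `train_ibm_model1` and the toy corpus are illustrative only.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t(tgt | src) with IBM Model 1.

    pairs: list of (src_tokens, tgt_tokens) sentence pairs.
    Returns a dict mapping (tgt_word, src_word) -> probability.
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start from a uniform distribution over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (tgt, src) links
        total = defaultdict(float)  # expected counts of links from each src word
        # E-step: share each target word among the source words it may align to.
        for src, tgt in pairs:
            candidates = ["NULL"] + list(src)  # NULL lets a word align to nothing
            for f in tgt:
                norm = sum(t[(f, e)] for e in candidates)
                for e in candidates:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


# Toy German-English corpus of sentence-aligned, tokenised pairs.
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for src_word in ["das", "Haus", "Buch", "ein"]:
    best = max((p, f) for (f, e), p in t.items() if e == src_word)
    print(src_word, "->", best[1], round(best[0], 2))
```

No word pair is ever labelled by hand: "Haus" ends up most strongly linked to "house" only because "the" is better explained by "das", which appears with "the" in two sentences. Production SMT systems build phrase tables and language models on top of alignments like these.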

Differences from other approaches

Google, Microsoft, Baidu, KantanMT, SDL, Systran and others have also employed SMT and, more recently, NMT systems, some of them publicly accessible. However, the approaches differ substantially depending on the desired outcome. While the large cloud providers mainly offer "gist" translation, and a few other providers aim to do the same within the confines of an enterprise, the SDL, KantanMT and Omniscien Technologies systems concentrate on providing customized solutions. At its core, the Omniscien Technologies system is not fundamentally different from other Moses-based systems; the differences lie mainly in data management. The specific differences in the Omniscien Technologies approach are:

Clean data: Omniscien Technologies focuses on clean data, in contrast to the traditional approach of harvesting content from the web (corporate sites, news articles and similar sources) where the same content is available in multiple languages but quality is not guaranteed. To keep the data as clean and accurate as possible, the company has invested in both machine and human resources. Its data is sourced from high-quality translations provided by book publishers and translation companies. Segments are first extracted from files and documents if they are not already in TMX format. The extracted segments are then aligned at the segment level (usually sentences) by machine, with humans validating the accuracy, and converted into a consistent format that the learning software can process. The data is converted to a common UTF-8 encoding for training the SMT system, small subsets are extracted to guide training, and finally the data is reviewed, cleaned and analyzed.
Multiple domains: the system allows training in many domains by extending a base set of information with multiple additional learning sources, including tuning for a specific writing style.
Feedback and unknown terms: real-time feedback loops and resolution of unknown terms.
Scalability and control: scaling up to billions of words per day, with extensive control over the workflow.
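The clean-data steps described above can be sketched as a small filtering pass over segment pairs. This is a minimal Python illustration under stated assumptions, not Omniscien Technologies' actual pipeline: it assumes segments have already been extracted (for example from TMX) and aligned into (source, target) pairs. The function name `clean_parallel_corpus`, the length-ratio filter, the `max_ratio` threshold and the `tune_size` default are all hypothetical choices made for the example.

```python
import random
import unicodedata


def _to_text(segment):
    """Decode bytes as UTF-8, collapse whitespace and normalise to NFC."""
    if isinstance(segment, bytes):
        segment = segment.decode("utf-8")
    return unicodedata.normalize("NFC", " ".join(segment.split()))


def clean_parallel_corpus(pairs, max_ratio=3.0, tune_size=1000, seed=0):
    """Filter aligned (source, target) segment pairs and hold out a tuning subset.

    max_ratio and tune_size are hypothetical defaults, not published settings.
    Returns (train_pairs, tune_pairs).
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = _to_text(src), _to_text(tgt)
        if not src or not tgt:
            continue  # one side is empty: nothing to learn from
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # very different lengths often indicate a misalignment
        if (src, tgt) in seen:
            continue  # exact duplicate of a pair already kept
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    # Shuffle reproducibly, then split off a small subset for tuning.
    random.Random(seed).shuffle(cleaned)
    return cleaned[tune_size:], cleaned[:tune_size]


raw = [
    ("Hello world", "Hallo Welt"),
    ("Hello world", "Hallo Welt"),  # exact duplicate
    ("", "leer"),  # empty source side
    ("Yes", "Ja das ist eine sehr sehr lange Antwort"),  # suspicious length ratio
    ("Cafe\u0301".encode("utf-8"), "Caf\u00e9".encode("utf-8")),  # bytes, decomposed accent
    ("Good  morning", "Guten Morgen"),  # irregular whitespace
    ("Thank you", "Danke"),
]
train, tune = clean_parallel_corpus(raw, tune_size=1)
print(len(train), "training pairs,", len(tune), "tuning pair")
```

Normalising to NFC before deduplication matters because the same accented character can be encoded in more than one way; without it, "Cafe" plus a combining accent and the precomposed "Café" would count as different segments. Human validation, as described above, would sit on top of automatic filters like these.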

Languages

The company currently offers more than 550 language pairs in baseline form and is progressively deploying 13 domains across each language pair. In addition, Omniscien Technologies offers more than 160 industry engines that can be used "off the shelf". Language coverage includes all major European languages, Middle Eastern and Asian languages, as well as a range of African languages.

References

1. https://www.omniscien.com

External links

This article "Omniscien Technologies" is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Omniscien Technologies. Articles copied from the Draft namespace on Wikipedia can be seen in Wikipedia's Draft namespace rather than the main namespace.
