XLIFF stands for XML Localization Interchange File Format. An OASIS open standard for the exchange of Localization and Translation content. XLIFF is considered the standard native file format for almost all Translation CAT Tools as well as for translators.
The concept behind the XLIFF format.
The idea here is to standardize the format of the content that is going to be translated. Almost all types of content may be subject to translation. Based on this, the CAT tool and translators must be aware of all types of files which is not possible! The original content can be created natively in any format like HTML, XML, InDesign, XML, Properties, and many other formats. To save time and cost we need to have only one format that can be recognized and manipulated by all in the localization industry and from here came the XLIFF.
The Structure of the XLIFF file
An XLIFF file has ONE standard structure regardless of the native file type. It always has the following pattern.
In an XLIFF file, each segment of the content is saved within a Translation-Unit. Each Translation-Unit has 2 sub-elements SOURCE and TARGET. Before translation, both elements will have the same text. The translator will read the content in the source element and overwrite the content in the target element with the equivalent translation.
Converting Content to XLIFF format
It is well-known that each CMS has its own specifications in handling content. To speed up the process of developing add-ons, that deal with different Content Management Systems, you can develop a standalone module that includes all the actions and properties of creating XLIFF files regardless of the CMS characteristics. So you can reuse it each time you are creating a connector for a CMS. Like this, you don’t need to reinvent the wheel each time.
How to prepare XLIFF files?
Now we converted the content into XLIFF. How to prepare it for translation? In most cases, the engineering preparation will be minimum. But again, for some Web or Software Content, there is more to do. Check the below piece of content in an XLIFF file:
All the content in red is not for translation and need to be excluded. The CAT tool is not able to detect and protect it. The Localization Engineer will have to do additional steps. Before the evolution of the CAT tools, we were developing and using custom scripts using C# to detect such content and protect it, then convert the file to the bilingual format which was TTX format for Trados. Now using the new versions of the CAT Tools like Trados Studio, I am using the Regular Expressions and XPATH in order to create rules to detect and protect such types of non-relevant content.
Why should we do additional steps during the preparation?
Ignoring and not protecting the non-relevant text for translation will cause the following:
- The Translation Memory will not be accurate.
- The total word count will be higher without reason due to this additional amount of non-translatable content.
- If any of this content (code) was changed by accident during the translation, the structure of the web page or the mobile App will be corrupted and the application or the web page will not work properly.
To conclude, XLIFF is the standard format for translation. It is recognized worldwide by all CAT tools and translators. Content must be converted from its original format to XLIFF before translation and the process depends here on the medium placeholder: Web, App, etc…
Nowadays, CAT Tools facilitates the process of localization engineering. In some cases, additional effort need to be done to ensure the accuracy of the final localized product.
Feel free to leave a comment or to Contact Me for an open discussion!