# Localization is essential for meeting cultural expectations and maintaining brand identity
Localization is a crucial process in the translation industry, particularly for multinational companies developing products for global markets. It involves adapting products and documentation to the specific language and culture of target markets. This adaptation is essential not only for meeting the cultural expectations of users but also for maintaining a consistent and recognizable brand identity across different regions [21](#page=21).
### 1.1 The evolution and challenges of the language industry
The increasing volume of documentation accompanying technical equipment and the economic development of companies have driven the need for efficient translation tools. This includes handling similar and updated documents, as well as those with diverse communicative functions. Furthermore, the rise of electronic content that needs reproduction in various formats has added complexity to the translation landscape [8](#page=8).
The industry faces several challenges, including:
* **Simultaneous product launches:** Multinational companies aim for "simship," meaning products must appear on all local markets simultaneously [21](#page=21).
* **Faster time-to-market:** The pace of product introduction is accelerating, requiring quicker translation and localization processes [21](#page=21).
* **Internationalization (I18N):** Products must be designed to avoid the need for re-design for each local market [21](#page=21).
* **Localization (L10N):** Products and documentation require adaptation to the language and culture of target markets [21](#page=21).
* **Technological advancements:** The development and specialization in computer software, including translation and localization software, are constant. This leads to a proliferation of tools like Translation Memory (TM), Alignment, Terminology Management, Terminology Extraction, Software Localization, Project Management, and Machine Translation. The industry also sees an increase in plug-ins, interfaces, features, options, and variants, alongside a high frequency and speed of updates. Compatibility issues and the necessity for continuous upgrading of both software and user skills are significant concerns [22](#page=22).
* **Electronic file formats:** The proliferation of diverse electronic file formats (Office, DTP, Markup, Software) and their continuous development presents ongoing challenges. Modifications in formats with new software versions require file preparation and post-processing, creating new areas of activity for translators. Translators need to continuously update their technical know-how, adapt workflows, and modify translation strategies due to new tools [23](#page=23).
### 1.2 History and development of translation memory systems
The origins of translation memory (TM) systems can be traced back to the early 1960s [9](#page=9).
#### 1.2.1 Early steps in TM development
* **1960-1965: Federal Armed Forces Translation Agency, Mannheim, Germany:** This initiative focused on a text-related glossary approach. Translators underlined English words for which they needed German equivalents. These words were then processed after morphological reduction and fed into a computer. Words found in the database were printed as text-related glossaries in the order they appeared in the text [9](#page=9).
* **1960-1965: European Coal and Steel Community, Luxembourg:** This project explored automatic dictionary look-up with context. When a translator underlined words, the entire sentence was keypunched and fed into the computer. The computer then searched for the stored sentences sharing the most lexical items with the input sentence and returned the desired items with their context. Data from each query was added to the database [11](#page=11).
* **1980-1990: Interactive Translation System (ITS), Alan Melby, Brigham Young University, USA:** This system proposed a three-level approach to Computer-Aided Translation (CAT) [13](#page=13).
    * Level 1 included an editor, terminology management, and telecommunication [13](#page=13).
    * Level 2 supported source text in electronic form, text analysis (dynamic concordance), automatic terminology look-up, and synchronized bilingual text files created from completed translations [13](#page=13).
    * Level 3 aimed for integration with machine translation systems [13](#page=13).
#### 1.2.2 First commercial systems
Several companies emerged in the mid-1980s and early 1990s, developing and commercializing TM software:
* **1984: TRADOS (TRAnslation & DOcumentation Software), Germany:** Established by Jochen Hummel and Iko zu Knyphausen, TRADOS transitioned from a translation provider to a software developer. Key product launches included TED (Translation Editor including TM) in 1988, MultiTerm (terminology management) in 1990, and Translator's Workbench (TM, editor, and MultiTerm) in 1992 for DOS. The company later moved to the Windows platform in 1993 and was acquired by SDL International in 2005 [14](#page=14).
* **1984: STAR (Software Translation Artwork Recording), Switzerland:** STAR offered translation and documentation services alongside software development. Their first TM system, Transit (DOS), was launched in 1991, including TermStar terminology management software. They transitioned to Windows with Transit/TermStar 2.0 in 1994 [15](#page=15).
* **1993: ATRIL, Spain:** ATRIL launched its first TM tool for Windows 3.1 in 1993, featuring an interface for MS Word for Windows. By 1996, the system was redesigned as 32-bit Windows software, creating an integrated translation environment with a proprietary two-column editor and terminology management module, dropping the Word interface entirely [16](#page=16).
* **1992: IBM Germany:** IBM launched Translation Manager/2 (TM/2) in 1992 for the OS/2 operating system. A Windows version followed. This system was notable for including linguistic resources for 19 languages, such as lemmatizers, morphological data, and inflection rules [17](#page=17).
#### 1.2.3 Current market situation
The market for CAT tools includes various players such as Across, DéjàVu, MemoQ, MultiTrans, SDL Trados, SDLX, and Wordfast [18](#page=18).
### 1.3 Translation workflow with CAT tools
The introduction of Translation Memory (TM) systems significantly impacts the translation workflow. When a new translation project begins using a TM system, the TM is initially empty [19](#page=19).
#### 1.3.1 Segmentation and retrieval
* The source language text is imported or opened in the editor and segmented into "translation units" based on predefined segmentation rules, typically punctuation, with user-defined exceptions for elements like abbreviations [19](#page=19).
* The active segment is automatically looked up in the TM [20](#page=20).
* If an identical or "similar" segment is found, the associated translation is displayed and can be selected or modified by the translator for insertion into the target text [20](#page=20).
* If no matching segment is found, the translator enters a new translation. This new translation is then stored in the TM alongside its source segment, becoming immediately available for future identical or similar source segments [20](#page=20).
* This process results in the TM being populated incrementally as the translation progresses [20](#page=20).
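The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CAT tool works: the abbreviation list, the 0.75 fuzzy threshold, the `ask_translator` callback, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation exceptions; real tools let users configure these.
ABBREVIATIONS = ("e.g.", "i.e.", "etc.", "Dr.", "Mr.")


def segment(text):
    """Split text into segments at sentence-final punctuation."""
    # Protect abbreviation periods so they do not trigger a split.
    for abbr in ABBREVIATIONS:
        text = text.replace(abbr, abbr.replace(".", "<DOT>"))
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.replace("<DOT>", ".") for p in parts if p]


def lookup(segment_text, tm, threshold=0.75):
    """Return (score, source, target) for the best TM match, or None."""
    best = None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment_text, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


def translate(text, tm, ask_translator):
    """Translate segment by segment, filling the TM as work progresses."""
    output = []
    for seg in segment(text):
        match = lookup(seg, tm)
        if match and match[0] == 1.0:
            # Identical segment: reuse the stored translation.
            target = match[2]
        else:
            # Similar or unknown segment: the translator decides,
            # optionally starting from the fuzzy suggestion.
            suggestion = match[2] if match else None
            target = ask_translator(seg, suggestion)
            tm[seg] = target  # immediately available for later segments
        output.append(target)
    return output
```

Starting with an empty dictionary as the TM, a text that repeats a sentence triggers the `ask_translator` callback only once; the second occurrence is served from the memory.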
#### 1.3.2 Post-translation steps
Following the initial translation, the workflow typically includes:
* Updating of resources (e.g., TM, terminology databases) [52](#page=52).
* Revision of the translated text [52](#page=52).
* Generation of the final translated document [52](#page=52).
* A final review [52](#page=52).
### 1.4 Key components of Computer-Aided Translation (CAT) tools
CAT tools are defined as a series of computer applications designed to efficiently assist translators. Their primary objective is to provide translators with rapid access to necessary resources. The use of CAT tools leads to increased productivity, as they streamline various tasks. Proficiency in CAT tools is a competency sought by agencies and institutions. The European Commission highlights the importance of mastering CAT tools and terminology management software, along with common office applications [32](#page=32) [35](#page=35).
It's crucial to distinguish between Machine Translation (MT) and Computer-Aided Translation (CAT). MT is performed by a machine (e.g., Google Translate), whereas CAT involves a human translator using various assistance tools. MT can be integrated into CAT systems [36](#page=36).
Essential CAT tools include [37](#page=37) [38](#page=38) [41](#page=41) [43](#page=43) [46](#page=46) [47](#page=47):
1. **Project Management Software:** Controls information flow, assigns tasks, manages quality control, analyzes content, generates reports (e.g., full/fuzzy matches, repetitions), counts words, and handles final delivery [37](#page=37).
2. **Translation Memory (TM) Software:** Stores previous translations, ensures terminological and phraseological consistency, and facilitates the retrieval of translation units for productivity. Typical file extensions for TM include `.tmx` (Translation Memory eXchange, a standard open format), `.sdltm` (proprietary format of SDL Trados Studio), and `.txt`/`.csv` (plain text or spreadsheet formats) [38](#page=38).
3. **Terminology Management Software:** Enables the creation and management of glossaries from ongoing translations. Popular examples include RWS Trados MultiTerm, Wordfast, and Memsource. Standard formats are `.tbx` (TermBase eXchange, an open standard) and `.sdltb` (proprietary format of SDL Trados Studio) [41](#page=41).
4. **Alignment Software:** Creates a TM from an original text and its translation by identifying segment correspondences. This function is useful for building a TM from existing documents (e.g., Word files) that were previously translated without CAT tools [43](#page=43) [46](#page=46).
5. **Localization Tools:** Specifically designed for translating software, video games, or websites [47](#page=47).
### 1.5 Understanding Translation Memory Files
A Translation Memory (TM) file is fundamentally a structured text file. While file extensions like `*.tmx` or `*.xliff` might suggest specialized software is needed, these files can be opened and understood using standard text editors. Most TM files are not "black boxes" but rather functional, structured text files, typically in XML (Extensible Markup Language) format [66](#page=66).
#### 1.5.1 Information stored in TM files
The primary information stored includes:
* Segments (source and target) [67](#page=67).
* Language information [67](#page=67).
* Creation dates and times [67](#page=67).
Additional data that may be stored:
* Author [67](#page=67).
* Usage count [67](#page=67).
* Change dates and times [67](#page=67).
* Creation tool [67](#page=67).
* Domain (field of expertise) [67](#page=67).
* Alternate translations [67](#page=67).
* Notes [67](#page=67).
#### 1.5.2 Typical TM file formats
The two most prevalent industry-standard file types are XLIFF and TMX, both based on XML. Spreadsheet formats like Excel (`.xls`) or comma-separated value (`.csv`) can also be used, though they store less data per translation unit [68](#page=68).
XML is favored for TM files due to several advantages:
* **Parsability:** XML's well-defined structure makes it easy to parse [68](#page=68).
* **Semantic tagging:** Descriptive tag names give meaning to the data they enclose [68](#page=68).
* **Tool support:** Numerous software tools are built to validate, import, parse, and search XML files [68](#page=68).
* **Interoperability:** A well-defined structure allows different applications and systems to exchange data effectively [68](#page=68).
##### 1.5.2.1 Header and Body structure
TMX and XLIFF files typically consist of a header and a body [69](#page=69) [72](#page=72).
* **Header:** Contains metadata about the file and the localization process. The semantic naming of XML tags makes headers human-readable. Examples of TMX and XLIFF headers are provided in the source [69](#page=69) [70](#page=70) [71](#page=71).
* **Body:** Contains the most critical data: translation units and segments. Examples of TMX and XLIFF bodies in the source illustrate this structure [72](#page=72) [73](#page=73) [74](#page=74).
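To make the header/body split concrete, here is a small sketch that reads a TMX file with Python's standard `xml.etree.ElementTree` module. The sample document is made up for illustration (the tool name, dates, and segments are not taken from the source), and `read_tmx` is a hypothetical helper, not part of any CAT tool.

```python
import xml.etree.ElementTree as ET

# A minimal, made-up TMX 1.4 document for illustration only.
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en-US"
          srclang="en-US" datatype="plaintext"/>
  <body>
    <tu creationdate="20240101T120000Z">
      <tuv xml:lang="en-US"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Öffnen Sie die Datei.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(xml_text):
    """Return (header attributes, list of translation units)."""
    root = ET.fromstring(xml_text)
    header = dict(root.find("header").attrib)  # file-level metadata
    units = []
    for tu in root.find("body").iter("tu"):  # one <tu> per translation unit
        segments = {
            tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")
        }
        units.append({"created": tu.get("creationdate"), "segments": segments})
    return header, units


header, units = read_tmx(TMX_SAMPLE)
print(header["srclang"])
print(units[0]["segments"])
```

Because the file is plain XML, the same content is equally readable in a text editor; the parser simply saves retyping.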
#### 1.5.3 Importance of TM files
TM files are vital for translators using CAT tools for several reasons [75](#page=75):
* **Efficiency:** Loading a TM file allows translators to leverage prior work. If a segment has been translated before, the tool alerts them to the match (or partial match), enabling faster translation [75](#page=75).
* **Consistency:** TM files help maintain consistency across projects and clients. Using "client-based" or "project-based" TMs ensures accuracy and adherence to specific terminology or phrases [75](#page=75).
### 1.6 Differences between TMX and XLIFF formats
Both TMX and XLIFF are industry-standard, XML-based file types with significant commonalities, including inline markup elements. However, they possess distinct structures and elements due to their slightly different origins and purposes [76](#page=76).
Key differences include:
* **Purpose:** XLIFF was developed to store extracted text and facilitate data transfer across localization process steps, while TMX was designed specifically for exchanging TM data between tools [76](#page=76).
* **Language support:** TMX supports any number of languages within a single document, whereas XLIFF is designed for one source and one target language [76](#page=76).
* **Inline codes:** TMX primarily uses encapsulation methods for inline codes, while XLIFF offers both encapsulation and a placeholder method (where native codes are removed and replaced by references) [76](#page=76).
* **Order and Rebuilding:** A collection of `<tu>` elements in TMX has no specified order and lacks a mechanism for rebuilding the original file. XLIFF, however, is more powerful for reconstructing or rebuilding the original file [76](#page=76) [78](#page=78).
* **Additional Data:** XLIFF includes data types and fields not present in TMX, such as pretranslation, history, versioning, and binary objects [76](#page=76).
* **Time/Date Data:** TMX files can store time and date data at the translation unit level, a capability XLIFF lacks [76](#page=76).
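For contrast with TMX, the sketch below reads a small XLIFF 1.2 document. The sample content and the `read_xliff` helper are invented for illustration. Note how each `<file>` element names exactly one source and one target language and points back to the original document, which is what makes rebuilding that document feasible.

```python
import xml.etree.ElementTree as ET

# A minimal, made-up XLIFF 1.2 document for illustration only.
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="manual.txt" source-language="en-US"
        target-language="de-DE" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Open the file.</source>
        <target>Öffnen Sie die Datei.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Save your work.</source>
        <target>Speichern Sie Ihre Arbeit.</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""

# XLIFF 1.2 elements live in this namespace.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_xliff(xml_text):
    """Return one dict per trans-unit, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for file_el in root.findall("x:file", NS):
        for unit in file_el.findall("x:body/x:trans-unit", NS):
            rows.append({
                "original": file_el.get("original"),
                "source_lang": file_el.get("source-language"),
                "target_lang": file_el.get("target-language"),
                "id": unit.get("id"),
                "source": unit.findtext("x:source", namespaces=NS),
                "target": unit.findtext("x:target", namespaces=NS),
            })
    return rows


for row in read_xliff(XLIFF_SAMPLE):
    print(row["id"], row["source"], "->", row["target"])
```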
#### 1.6.1 Choosing between TMX and XLIFF
Both TMX and XLIFF are robust and widely supported formats. The choice often depends on the specific project, the tools being used, or the format provided by a client. Regardless of the format, utilizing translation memory is vastly superior to not using it. Many tools allow exporting TM data in either format [77](#page=77).
For new projects, some authors prefer TMX for two main reasons:
* **Time-stamped translation units:** Allow for productivity analysis [78](#page=78).
* **Multiple target languages:** Can be stored in a single file [78](#page=78).
Conversely, XLIFF is a more powerful choice if reconstructing or rebuilding the original file is a priority [78](#page=78).
---
Localization goes beyond simple translation by adapting content to the specific cultural nuances, expectations, and local norms of a target audience. This comprehensive adaptation is crucial for businesses aiming for international success, ensuring a product or message feels authentically created for the local market, regardless of location, culture, or language. It encompasses not just linguistic adaptation but also adjustments to visual elements, formatting, and adherence to local laws and customs.
### 1.1 The difference between translation and localization
Translation focuses on converting content from a source language to a target language, adhering to grammar and syntax rules, and preserving the original meaning. It is a fundamental component but represents only one step within the broader localization process.
Localization, in contrast, is a more extensive process that tailors a message to local audiences, considering distinct regional variations even within the same language. This involves adapting marketing strategies and ensuring a customized message for each local audience to build trust. Cultural barriers can hinder understanding, making localization essential for effective communication beyond mere linguistic conversion.
### 1.2 Meeting cultural expectations through localization
Localization is vital for meeting the cultural expectations of local markets, allowing businesses to globalize effectively while maintaining a consistent brand identity worldwide. Companies like Coca-Cola demonstrate this by adapting campaigns to local markets, ensuring brand recognition through consistent elements like company colors, while adjusting marketing strategies to meet specific regional expectations.
In China, for example, Coca-Cola adapted its product name to "kekou kele" (delicious happiness) and developed a local marketing strategy, involving local experts to respect the distinct culture. This approach involves more than just translating content; it is about integrating with the local culture, so that the local public feels the content was built specifically for them.
### 1.3 Elements requiring localization beyond text
A successful localization process requires attention to numerous details beyond textual translation to bridge cultural barriers and enhance user experience. These include:
* **Colors:** Meanings of colors vary significantly across cultures. For instance, red might signify danger, white death, and orange mourning in different regions. Thorough research is necessary before targeting new audiences.
* **Layout:** Languages differ in how much space the same message needs, so a layout must be flexible enough to hold translated text of different lengths. Text translated from English can expand by 30% to 100%.
* **Visuals:** Images and photos must be adapted to local cultures. A Western depiction of a mother and child might not resonate with or could even offend audiences in other regions.
* **Units of Measurement:** Most countries use the metric system, so measurements given in other units need converting for clarity.
* **Contracts and Agreements:** Compliance with local regulations is essential when conducting business internationally to avoid legal issues, penalties, or website bans.
* **Currency units:** Currency amounts and their symbols need localization. For example, showing a dollar price in pounds sterling requires converting the amount, not just swapping the symbol.
* **Paper size:** Document designs may need adjustment if they are formatted for a different paper standard, such as A4 versus American letter size, which can affect formatting and page breaks.
* **Date formats:** Differences in date formats (e.g., MM/DD/YY vs. DD/MM/YY) are crucial and can lead to misinterpretation.
* **Text length:** Localization must account for text expansion, necessitating flexible text length in products or documents.
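The date-format item in the list above is easy to demonstrate. The following sketch, using only Python's standard `datetime` module, parses the same string under both conventions; the `interpretations` helper is a name made up for this example.

```python
from datetime import datetime


def parse_or_none(text, fmt):
    """Parse text with the given format, or return None if it does not fit."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def interpretations(text):
    """Return the (MM/DD/YY, DD/MM/YY) readings of the same date string."""
    return parse_or_none(text, "%m/%d/%y"), parse_or_none(text, "%d/%m/%y")


us_reading, eu_reading = interpretations("03/04/25")
print(us_reading.isoformat())  # read as month/day/year
print(eu_reading.isoformat())  # read as day/month/year
```

The two readings of `03/04/25` are a month apart, and nothing in the string says which one was meant. A string such as `25/12/24` is only valid as day/month/year, so the first reading comes back as `None`. Writing dates in an unambiguous form, for instance ISO 8601 (`2025-03-04`), avoids the problem in source material.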
### 1.4 Broader localization considerations
Beyond the immediate content, localization requires consideration of various conventions and legal requirements specific to the target locale.
* **Economic conventions:** This includes variations in paper sizes, preferred storage media, broadcast TV systems, phone number formats, delivery services, postal codes, postal address formats, currency symbols, measurement systems, and electrical standards.
* **Third-party providers:** Variations in payment service providers, weather reports, and online map presentations need to be addressed.
* **Time zones:** Translators must carefully consider differences in time zones.
* **Legal requirements:** Products may need customization or complete changes to comply with specific country regulations, such as privacy laws, disclaimers, consumer labeling, encryption and export restrictions, subpoena procedures, Internet censorship, tax collections (customs duties, VAT, sales tax), and accessibility requirements.
* **Political issues:** Sensitivity to political matters, including disputed borders and geographical naming disputes, is important.
* **Government numbers:** Consideration for numbers assigned by governments, such as national identification numbers and Social Security Numbers, is necessary.
* **Cultural appropriateness:** Local holidays, title conventions, personal name conventions, aesthetics, colors, images, local architecture, socioeconomic status, clothing, and ethnicity must be taken into account.
* **Local customs and taboos:** Care must be given to local customs, superstitions, religions, and social taboos.
### 1.5 Specific localization examples
Localization is applied across various media to ensure cultural relevance and user engagement.
* **Video game localization:** This process aims to make video games fully understandable to consumers. It involves an audit of materials, the localization process itself (which can take weeks or months), programming translated texts into the game, rigorous quality control to check for errors and system issues, and finally, manufacturer's approval to ensure the localized content meets original requirements.
* **Movie localization:** Given the high cost of reshooting entire movies in different languages, localization offers a more efficient and cost-effective solution for global distribution. There are two primary methods. In dubbing, voice actors replace the original dialogue, and timing is critical to match character movements and speech. In subtitling, translated lines are displayed at the bottom of the screen with limited characters and display times, which requires precision and synchronization with dialogue and actions.
> **Tip:** Localization is not merely a translation task; it is a strategic adaptation that requires a deep understanding of the target culture, market, and legal landscape.
> **Tip:** Effective localization builds trust and rapport with local audiences, which is essential for a business's success in foreign markets.
---
Localization is critical for businesses operating in global markets to ensure they resonate with diverse audiences and maintain a consistent brand image. This process goes beyond simple translation, involving the adaptation of products, services, and content to meet the specific cultural, linguistic, and market needs of a target region. Failure to localize effectively can lead to costly mistakes, misunderstandings, and damage to brand reputation.
### 1.1 The scope and importance of localization
Localization encompasses a wide range of materials, including websites, mobile applications, and software, all of which need to be adapted to gain traction and user engagement in different markets. For global businesses, a localized brand or product website is paramount to appearing local and avoiding a "foreign" perception.
### 1.2 Costly localization blunders
Several high-profile examples highlight the financial and reputational risks of inadequate localization:
* **HSBC's rebranding campaign**: A mistranslation of the tagline "Assume Nothing" as "Do Nothing" in several countries cost the company USD 10 million to rectify.
* **Pepsi's logo redesign in China**: The slogan "Pepsi Brings You Back to Life" was translated to mean "Pepsi Brings Your Ancestors Back from the Grave," leading to a cultural backlash.
* **NASA's Mars Climate Orbiter metric mix-up**: A failure to correctly convert between metric and imperial units resulted in the loss of the USD 125 million Mars Climate Orbiter mission.
* **Canadian Maple Leaf Coin error**: A mistranslation of "Souvenir du Canada" as "Souverain du Canada" led to the recall and replacement of coins worth approximately 30 million Canadian dollars.
* **2012 London Olympics ticket website**: The Welsh version mistranslated "See Tickets" as "Gweld Tocynnau" instead of "Prynu Tocynnau," misdirecting users and causing financial losses.
* **Siri's gender-biased responses**: Apple's virtual assistant faced criticism for perpetuating gender stereotypes in various languages; in Chinese, for example, its responses implied that certain job positions were exclusive to men.
### 1.3 Internationalization and localization principles for websites
Internationalization (i18n) is the foundation for effective localization. It involves designing and developing websites with the capacity for easy adaptation to different languages and cultures. Key principles include:
* **Unicode Standard**: Ensures compatibility with diverse writing systems and languages.
* **Separation of Content and Code**: Facilitates translation without extensive code modifications.
* **Flexible User Interface (UI)**: Accommodates varying text lengths and reading directions.
* **Locale-specific formats**: Adapting date, time, and number formats to regional conventions.
* **Culturally Neutral Images and Icons**: Using neutral visuals or providing alternatives for different regions.
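The principle of separating content from code can be illustrated with a tiny message catalog. This is a deliberately simplified sketch: the `CATALOG` contents, the keys, and the `t` lookup function are hypothetical, and real projects usually rely on an established i18n library and external resource files rather than an in-code dictionary.

```python
# Translatable strings live in a catalog keyed by locale, not in the code.
# In practice this data would sit in external resource files.
CATALOG = {
    "en": {"greeting": "Welcome, {name}!", "cart_empty": "Your cart is empty."},
    "de": {"greeting": "Willkommen, {name}!"},
}
DEFAULT_LOCALE = "en"


def t(key, locale, **params):
    """Look up a message, falling back from region to language to default."""
    for candidate in (locale, locale.split("-")[0], DEFAULT_LOCALE):
        message = CATALOG.get(candidate, {}).get(key)
        if message is not None:
            return message.format(**params)
    return key  # last resort: show the key so the gap is visible


print(t("greeting", "de-AT", name="Ana"))  # falls back from de-AT to de
print(t("cart_empty", "de"))               # falls back to the default locale
```

Adding a language then means adding catalog entries, with no change to the code that calls `t`.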
The localization (L10n) process for websites involves several stages:
* **Translation of Content**: Adapting text and multimedia, considering linguistic and cultural nuances.
* **Adaptation of Graphics and Multimedia**: Ensuring visuals are culturally appropriate.
* **Adjustment of Layout and Design**: Modifying the presentation to suit language-specific needs.
* **Integration of Local Regulations**: Complying with legal and regulatory requirements.
* **Testing and Quality Assurance**: Rigorous validation of functionality, accuracy, and cultural appropriateness.
### 1.4 Web localization versus other audiovisual products
Websites present unique localization challenges compared to static products like applications or games:
* **Dynamic Content**: Requires real-time updates, complicating the localization process.
* **SEO Considerations**: Effective localization of metadata, keywords, and tags is vital for search engine visibility.
* **Cultural Sensitivity**: As public-facing platforms, websites demand meticulous attention to cultural nuances.
* **Continuous Updates**: Frequent website updates necessitate ongoing localization efforts.
### 1.5 Translation for SEO: Enhancing international web presence
Professional translators play a crucial role in optimizing website content for international markets and enhancing global online visibility through Search Engine Optimization (SEO). Key considerations include:
* **Keyword Research**: Identifying relevant terms and phrases in the target language and region, including linguistic variations and colloquialisms.
* **Cultural Relevance**: Choosing keywords that resonate with the target audience and avoiding literal translations that may sound unnatural.
* **Localized Content**: Ensuring translated content is linguistically accurate and culturally appropriate, aligning with local customs and market trends.
* **Metadata Optimization**: Translating and optimizing meta titles, meta descriptions, and URL slugs to be compelling and incorporate relevant keywords.
* **Multilingual Link Building**: Collaborating to build high-quality, multilingual backlinks from reputable local websites.
* **Content Structure and Formatting**: Maintaining a user-friendly structure with headers and bullet points for readability and SEO value.
* **Mobile Optimization**: Ensuring translated content is mobile-friendly and optimizing media for fast loading times.
* **Regular Updates**: Staying informed about algorithm changes and updating content to reflect current trends.
* **Analytics and Reporting**: Monitoring website analytics to assess content performance and refine SEO strategies.
* **Communication with Clients**: Understanding business goals and collaborating on a strategy that aligns translation with marketing initiatives.
### 1.6 The significance of web page metadata and the role of professional translators
Metadata, including meta titles and descriptions, is pivotal for a website's visibility and search engine ranking. Professional translators are essential for optimizing this metadata for foreign markets by:
* **Improving Search Engine Visibility**: Ensuring content is indexed accurately by search engines for a global audience.
* **Enhancing User Click-Through Rates**: Crafting compelling and culturally relevant meta titles and descriptions that encourage clicks.
* **Ensuring Local Relevance**: Aligning metadata with local expectations and preferences to increase appeal.
* **Optimizing Keywords**: Incorporating region-specific terms for better search result placement.
* **Maintaining Global Brand Consistency**: Ensuring translated metadata aligns with the brand's tone and message.
* **Adhering to Character Limits**: Crafting concise translations that fit within search engine display limits.
* **Building Credibility and Trust**: Safeguarding content integrity to establish trust with international users.
* **Adapting to Market Trends**: Updating metadata to reflect linguistic and cultural shifts.
In essence, collaboration between professional translators and web developers/marketers is crucial for holistic web page optimization in foreign markets, bridging linguistic and cultural gaps to enhance global competitiveness.
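Because translated text tends to run longer than the source, the character-limit point lends itself to an automated check. The sketch below is illustrative only: the limits of 60 and 160 characters are assumed working values, not figures from the source, and actual display limits vary by search engine and device. `check_metadata` is a hypothetical helper.

```python
# Assumed working limits for illustration; real display limits vary
# by search engine and are not stated in the source material.
LIMITS = {"title": 60, "description": 160}


def check_metadata(meta):
    """Return a list of problems found in a page's translated metadata."""
    problems = []
    for field, limit in LIMITS.items():
        value = meta.get(field, "")
        if not value:
            problems.append(f"{field}: missing")
        elif len(value) > limit:
            problems.append(f"{field}: {len(value)} characters, limit {limit}")
    return problems


german_meta = {
    "title": "Übersetzungsdienstleistungen für technische Dokumentation und Software",
    "description": "Fachübersetzungen für Handbücher und Benutzeroberflächen.",
}
for problem in check_metadata(german_meta):
    print(problem)
```

A check like this only flags candidates for shortening; deciding how to shorten a title without losing the keyword or the brand tone remains the translator's job.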
### 1.7 Machine Translation (MT) and human roles: Pre-editing and Post-editing
The integration of Machine Translation (MT) has introduced new roles for human linguists, primarily pre-editing and post-editing.
#### 1.7.1 Pre-editing
Pre-editing involves revising technical documentation *before* it undergoes MT to improve the source text and enhance the quality of the raw MT output. An ideal pre-editor is a specialized human editor who can analyze text from an MT engine's perspective to anticipate potential errors. Pre-editing techniques include:
* Reducing sentence length.
* Avoiding complex or ambiguous syntactic structures.
* Ensuring term consistency.
* Using articles.
* Running automated revision tools like spell-checkers and grammar checkers.
* Tagging elements not to be translated.
These techniques are also valuable for human translation projects, promoting better downstream quality and productivity.
##### 1.7.1.1 Controlled Natural Language (CNL)
Controlled Natural Languages (CNLs) are subsets of natural languages with restricted grammar and vocabulary to reduce ambiguity and complexity. CNLs aim to improve readability for human readers and enable reliable automatic semantic analysis. Examples of simplified or technical languages include Caterpillar Technical English and Simplified Technical English. Writers are restricted by general rules such as keeping sentences short, avoiding pronouns, using only dictionary-approved words, and employing the active voice.
* **Examples of CNLs**: ASD Simplified Technical English, Attempto Controlled English, Aviation English, Basic English, E-Prime, and many others.
* **CNLs in Companies**: Avaya (ACE), Boeing (STE), Caterpillar (CTE, CFE), Dassault Aerospace, Ericsson, General Motors (CASL), IBM (Easy English), Kodak, Rolls-Royce, Xerox, and others utilize controlled languages.
##### 1.7.1.2 Controlled Language Rules (Example: CLOUT™)
Controlled language rules are designed to reduce ambiguity and are beneficial for machine translation. The CLOUT™ rule set, developed by Uwe Muegge, provides examples:
* **Rule 1**: Write sentences shorter than 25 words.
* **Rule 2**: Express only one idea per sentence.
* **Rule 3**: Use the same sentence structure for the same content.
* **Rule 4**: Write grammatically complete sentences.
* **Rule 5**: Use simple grammatical structures.
* **Rule 6**: Write in the active form.
* **Rule 7**: Repeat nouns instead of using pronouns.
* **Rule 8**: Use articles to identify nouns.
* **Rule 9**: Use words from a general dictionary.
* **Rule 10**: Use only words with correct spelling.
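Some of these rules can be checked mechanically. As a rough sketch, the function below flags sentences that break Rule 1. The sentence splitting is naive (it splits on `.`, `!`, and `?` followed by whitespace, and counts whitespace-separated tokens as words), so it should be read as an illustration rather than a real controlled-language checker.

```python
import re

MAX_WORDS = 25  # Rule 1: sentences must be shorter than 25 words


def long_sentences(text):
    """Return (word_count, sentence) for each sentence of 25 words or more."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count >= MAX_WORDS:
            flagged.append((count, sentence))
    return flagged


sample = (
    "Switch off the device. "
    "Before you open the cover of the device to replace the filter that sits "
    "behind the fan, make sure that you switch off the device and remove the "
    "power cable from the socket."
)
for count, sentence in long_sentences(sample):
    print(f"{count} words: {sentence[:40]}...")
```

The rules that concern meaning, such as expressing one idea per sentence, are much harder to automate and still depend on a human pre-editor.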
##### 1.7.1.3 When to consider pre-editing
Pre-editing yields the best return on investment (ROI) when a technical document is to be translated into more than three languages, and the benefit grows when translating into dozens of languages. However, if the source quality is already high and the MT engine is well-tuned, light post-editing might suffice.
##### 1.7.1.4 Tools for pre-editing
Tools can facilitate source creation and pre-editing:
* **Source content memory**: Provides feedback to writers on content similarity.
* **Generic pre-editing plugins/rules**: Help reformulate source text before MT.
* **Simplified Technical English/Controlled Language tools**: Automate writing rules for localization.
* **Custom tools**: Identify spelling, grammar, and preferred terminology. Grammarly is cited as an example.
#### 1.7.2 Post-editing
Post-editing (or postediting) is the process of amending machine-generated translations to achieve an acceptable final product. It is distinct from editing human-generated text (revision). Post-editing aims to correct MT output to meet negotiated quality levels.
* **Light post-editing**: Aims for basic understandability, often for inbound purposes or urgent needs.
* **Full post-editing**: Aims for understandability and stylistic appropriateness, suitable for dissemination and outbound use. At its highest level, it strives for quality indistinguishable from human translation.
##### 1.7.2.1 Post-editing strategies
The required level of post-editing varies, with key considerations being time, quality, and cost.
##### 1.7.2.2 Post-editing guidelines
The effort in post-editing depends on the quality of the raw MT output and the expected end quality.
* **For quality similar to human translation/revision ("publishable quality")**: Full post-editing is usually recommended. This involves aiming for a grammatically, syntactically, and semantically correct translation, ensuring correct terminology, no added or omitted information, and proper formatting.
* **For "good enough" or "fit for purpose" quality**: Light post-editing is recommended. At this level the text is comprehensible and accurate but may not be stylistically compelling, with potentially unusual syntax or minor grammatical imperfections. The focus is on semantic correctness, accuracy, and editing inappropriate content.
##### 1.7.2.3 Decision making in post-editing
Quick decision-making is key for successful post-editing. Linguists must promptly decide whether it is more efficient to post-edit MT suggestions or translate from scratch. Over-editing (making purely preferential or unnecessary amendments) and under-editing (leaving errors) should both be avoided. The principle is to use as much of the MT output as possible.
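One rough way to see how much of the MT output survived post-editing is to compare the raw and edited text word by word. The sketch below is only an illustration: it uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, and the function name `retained_ratio` is an assumption, not a metric named in the source.

```python
from difflib import SequenceMatcher


def retained_ratio(mt_output: str, post_edited: str) -> float:
    """Word-level similarity between raw MT and the post-edited text.

    1.0 means the MT output was kept unchanged; values near 0.0 mean
    the segment was effectively retranslated from scratch.
    """
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    return SequenceMatcher(None, mt_words, pe_words).ratio()


unchanged = retained_ratio("The cat sits on the mat", "The cat sits on the mat")
edited = retained_ratio("The cat sit on mat", "The cat sits on the mat")
```

A consistently low ratio on segments that were already acceptable may hint at over-editing, while a ratio of 1.0 on segments with known errors may hint at under-editing. Neither conclusion can be drawn from the number alone.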
##### 1.7.2.4 Post-editing and the language industry
Post-editing is considered a "nascent profession". While it overlaps with translating and editing, its specific skills are still being defined. Many professional translators dislike post-editing due to lower pay rates compared to conventional translations. Efficiency gains are measured by tracking the time linguists spend correcting MT.
While precise figures are scarce, a significant percentage of language service providers offer post-editing services, though it often accounts for a small portion of their throughput. Advances in MT, partly driven by feedback from post-edited text, suggest that post-editing will become more widespread as MT quality improves.
---
# History of translation memory systems
This section details the evolution of translation memory systems from their early conceptual stages to the emergence of commercial products and the current market landscape.
### 2.1 Early steps
The foundational concepts for translation memory systems began to emerge in the mid-20th century, driven by the need to automate and streamline translation processes.
#### 2.1.1 European Coal and Steel Community (ECSC) experiments
Between 1960 and 1965, the European Coal and Steel Community in Luxembourg explored early forms of automated dictionary lookup with contextual information. This system involved translators underlining words for which they needed assistance. The entire sentence would then be keypunched and processed by a computer. The computer would search its database for sentences that most closely matched the input sentences based on their lexical items. The translator would then receive the requested terms along with their context. Crucially, the data generated from each query was added back to the database, creating a continuously growing repository of translation knowledge [11](#page=11).
#### 2.1.2 Interactive Translation System (ITS)
In the period between 1980 and 1990, Alan Melby at Brigham Young University in the USA developed the Interactive Translation System (ITS). This system represented a three-level approach to Computer-Aided Translation (CAT) [13](#page=13).
* **Level 1:** Focused on the editor, terminology management, and telecommunications aspects of translation [13](#page=13).
* **Level 2:** Involved having the source text in an electronic format, performing text analysis (using a dynamic concordance system), enabling automatic terminology lookup, and creating synchronized bilingual text files from completed translations [13](#page=13).
* **Level 3:** Aimed at integrating the translator's workstation with a machine translation system [13](#page=13).
### 2.2 First commercial systems
The development of commercially available translation memory systems began in the mid-1980s, with several key companies establishing themselves and launching influential products.
#### 2.2.1 TRADOS (TRAnslation & DOcumentation Software)
TRADOS was established in Stuttgart, Germany, in 1984 by Jochen Hummel and Iko zu Knyphausen. The company transitioned from being a translation service provider to a dedicated software developer [14](#page=14).
* **1988:** Launched TED (Translation Editor including first Translation Memory) [14](#page=14).
* **1990:** Introduced MultiTerm (DOS), a terminology management software [14](#page=14).
* **1992:** Released Translator's Workbench for DOS, which integrated Translation Memory, an editor, and MultiTerm for DOS [14](#page=14).
* **1993:** The company switched to the Windows platform (Win 3.1) [14](#page=14).
* **2005:** TRADOS was acquired by SDL International [14](#page=14).
#### 2.2.2 STAR (Software Translation Artwork Recording)
STAR was established in Stein am Rhein, Switzerland, in 1984. This company also offered translation and documentation services alongside software development [15](#page=15).
* **1991:** The first version of Transit (DOS) was launched, which included the TermStar terminology management software [15](#page=15).
* **1994:** A switch to Windows occurred with Transit/TermStar 2.0 for Windows 3.1 [15](#page=15).
#### 2.2.3 ATRIL
ATRIL, based in Madrid, Spain, was founded in 1993 [16](#page=16).
* **1993:** Launched its first translation memory tool for Windows 3.1, which featured an interface for Microsoft Word for Windows [16](#page=16).
* **1996:** The system underwent a significant redesign, becoming 32-bit Windows software. This new version formed an integrated translation environment, including a proprietary two-column editor and a terminology management module, and the Word interface was discontinued [16](#page=16).
#### 2.2.4 IBM Germany
IBM Germany in Böblingen released the Translation Manager/2 (TM/2) in 1992, operating under IBM's OS/2 operating system [17](#page=17).
* A Windows version (Windows 3.1) was launched subsequently [17](#page=17).
* TM/2 was notable for being the first system to include linguistic resources for 19 languages, such as lemmatizers, morphological data, and inflection rules [17](#page=17).
### 2.3 Market situation
At the time of writing, several prominent translation memory systems were available on the market. These include [18](#page=18):
* Across
* DéjàVu
* MemoQ
* MultiTrans
* SDL Trados
* SDLX
* Wordfast
---
# Challenges in the translation industry
The translation industry faces significant challenges driven by globalization, rapid technological advancements, and evolving market demands, necessitating continuous adaptation from professionals and tools alike.
### 3.1 Globalization and market demands
Multinational companies are increasingly developing products for global markets, often aiming for simultaneous introduction across all local markets, a practice known as simship. This trend is compounded by a shrinking time-to-market, meaning product development cycles are becoming faster. To facilitate this, products must be designed in a way that avoids redesign for each local market, a process termed internationalization (I18N). Subsequently, these products and their accompanying documentation need to be adapted to the specific language and cultural nuances of the target markets, a crucial step known as localization (L10N) [21](#page=21).
### 3.2 Technological evolution in translation tools
The development and specialization of computer software have profoundly impacted the translation industry. This includes general office applications and, more specifically, translation and localization software. The landscape of these specialized tools is vast and continually expanding, encompassing technologies such as translation memory (TM), alignment tools, terminology management systems, terminology extraction software, software localization tools, project management solutions, and machine translation (MT) [22](#page=22).
The integration of these tools is further enhanced by plug-ins and interfaces, which can include filters and other utilities. The constant addition of new features, options, and variants to these software offerings, coupled with the frequency and speed of updates, presents a challenge. Users and software alike require continuous upgrading to maintain compatibility and functionality, a necessity that arises from potential compatibility problems between different versions or systems [22](#page=22).
> **Tip:** Staying abreast of the latest software developments and understanding their compatibility requirements is crucial for translators to maintain efficiency and avoid workflow disruptions.
### 3.3 File formats and evolving workflows
The proliferation of electronic file formats poses another significant challenge. The industry deals with a huge array of formats, including those from office suites, desktop publishing (DTP) software, markup languages, and software development environments. This ecosystem is dynamic, with continuous development of new file formats and modifications to existing ones in new software versions [23](#page=23).
As a result, file preparation and post-processing have emerged as new, critical areas of activity for translators. This necessitates a continuous updating of technical know-how and an adaptation of old workflows to accommodate new tools and processes. Consequently, translation strategies may need to be modified to leverage the capabilities of these new tools effectively [23](#page=23).
> **Example:** A translator might previously have worked with simple text files for a user manual. Now, with the advent of DTP software and complex markup languages, they must be proficient in preparing and post-processing files in formats like .indd or .xml, often requiring specialized software and techniques to ensure accurate translation and formatting.
---
# Metadata in computer-assisted translation
Metadata in computer-assisted translation (CAT) tools is crucial for managing, leveraging, and ensuring the quality of translation projects [85](#page=85).
### 4.1 Definition and purpose of metadata
Metadata is defined as "data that describes data, providing additional information about digital content and processes". In the context of CAT tools, it offers machine-understandable information about translation resources and the translation process itself. The primary purpose of metadata in CAT is to facilitate the efficient management and reuse of translation data, thereby increasing productivity and consistency. It allows for the filtering of previous translations to reuse more recent or trustworthy material [85](#page=85) [86](#page=86) [87](#page=87).
### 4.2 Types of metadata
There are three main types of metadata relevant to CAT tools:
* **Descriptive metadata:** Describes the content itself [85](#page=85).
* **Structural metadata:** Describes the organization of objects or components within the translation project [85](#page=85).
* **Administrative metadata:** Describes technical information, such as file types, creation dates, and usage statistics [85](#page=85).
### 4.3 Metadata within translation memory files
Translation Memory (TM) files, which are central to CAT tools, are essentially structured text files, often in XML format, that store translation and linguistic data. These files contain various types of information, including [66](#page=66):
* **Main information:**
* Segments (source and target) [67](#page=67).
* Language [67](#page=67).
* Creation dates and times [67](#page=67).
* **Additional data (metadata):**
* Author [67](#page=67).
* Usage count [67](#page=67).
* Change dates and times [67](#page=67).
* Creation tool [67](#page=67).
* Domain (field) [67](#page=67).
* Alternate translations [67](#page=67).
* Notes [67](#page=67).
The header of TM files typically contains metadata about the file and the localization process, making the files human-readable due to the semantic naming of XML tags. The body of the file contains the crucial translation units and segments [69](#page=69) [72](#page=72).
> **Tip:** Even though TM files might have specialized extensions like \*.tmx or \*.xliff, they can often be opened and inspected with standard text editors, revealing their structured nature [66](#page=66).
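To make the header/body split concrete, the sketch below parses a tiny hand-written TMX document with Python's standard `xml.etree.ElementTree`. The sample content, including the tool name `ExampleTool` and the translator ID, is invented for illustration. The attribute names follow TMX 1.4 conventions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; all values are hypothetical.
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en-US"
          srclang="en-US" datatype="plaintext"/>
  <body>
    <tu creationdate="20230115T103000Z" creationid="translator1" usagecount="3">
      <tuv xml:lang="en-US"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Ouvrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>"""

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

root = ET.fromstring(TMX_SAMPLE)
header = dict(root.find("header").attrib)  # file-level metadata

units = []
for tu in root.iter("tu"):  # translation units live in the body
    segments = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
    units.append({"meta": dict(tu.attrib), "segments": segments})
```

Here `header` holds the file-level metadata, while each entry in `units` pairs the unit-level metadata with its segments keyed by language code.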
### 4.4 Metadata in CAT tool functions
Metadata plays a role in various functions within CAT tools:
#### 4.4.1 Analysis and Parsing
* **Textual parsing:** Recognizes punctuation (e.g., differentiating between a full stop at the end of a sentence and in an abbreviation) and handles markup, which is a form of pre-editing. This is crucial for distinguishing translatable elements from untranslatable ones like proper names or codes [62](#page=62).
* **Linguistic parsing:** Reduces words to their base form for term retrieval from term banks and uses syntactic parsing to extract multi-word terms or phraseology, helping to normalize word order variations [62](#page=62).
#### 4.4.2 Segmentation
Segmentation divides source texts into smaller, manageable units (segments) for translation. Metadata associated with segments, such as translator, date, and time, allows for tracing and managing these units effectively. This enables translators to leverage more recent material or avoid segments with outdated terminology. Incorrect manual segmentation can lead to errors being repeated in future translations if not corrected [63](#page=63) [84](#page=84).
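As a rough illustration of rule-based segmentation, the sketch below splits text at sentence-final punctuation while treating a small set of abbreviations as non-breaking. The abbreviation list and the function name `segment` are assumptions. Real CAT tools use configurable, language-specific segmentation rules.

```python
# Assumed abbreviation list, for illustration only.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr.", "No."}


def segment(text: str) -> list[str]:
    """Split text into sentence-like segments, skipping known abbreviations."""
    segments: list[str] = []
    current: list[str] = []
    for token in text.split():
        current.append(token)
        # A token closes a segment if it ends in sentence punctuation
        # and is not a listed abbreviation.
        if token.endswith((".", "!", "?")) and token not in ABBREVIATIONS:
            segments.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        segments.append(" ".join(current))
    return segments


example_segments = segment("Dr. Smith opened the file. See e.g. the header. Done")
```

The full stop in "Dr." and "e.g." does not end a segment, which mirrors the textual-parsing distinction between sentence-final periods and periods in abbreviations.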
#### 4.4.3 Alignment
Alignment establishes correspondences between source and target segments. This process is fundamental to creating translation memories from existing parallel texts [63](#page=63).
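A very naive alignment can be sketched as pairing segments by position and flagging pairs whose lengths diverge strongly, since such pairs often indicate a segmentation mismatch. This is an assumption-laden illustration, not the algorithm of any particular tool. The `max_ratio` threshold and the function name `align` are made up for the example.

```python
def align(source_segments, target_segments, max_ratio=2.0):
    """Pair segments one-to-one by position.

    Returns (source, target, suspicious) tuples, where `suspicious` is True
    when one segment is more than `max_ratio` times longer than the other.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError("naive 1:1 alignment needs equal segment counts")
    pairs = []
    for src, tgt in zip(source_segments, target_segments):
        longer = max(len(src), len(tgt))
        shorter = max(1, min(len(src), len(tgt)))  # avoid division by zero
        pairs.append((src, tgt, longer / shorter > max_ratio))
    return pairs


aligned = align(
    ["Open the file.", "Save."],
    ["Abra el archivo.", "Guarde el archivo ahora mismo, por favor."],
)
```

Flagged pairs would be candidates for manual review before the aligned units are stored in a translation memory.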
#### 4.4.4 Retrieval and Matching
* **Exact match (100% match):** Occurs when a source segment in the current document is identical character-by-character to a segment already stored in the TM [64](#page=64).
* **In-Context Exact (ICE) match or Guaranteed Match:** An exact match occurring in the same context, such as the same location within a paragraph, considering surrounding sentences and file attributes [64](#page=64).
* **Fuzzy match:** Occurs when a segment is not an exact match but shares a degree of similarity (e.g., 0% to less than 100%) with a TM entry. The percentage scoring is system-dependent and not universally comparable [64](#page=64).
* **Concordance:** Allows searching for segment pairs based on specific words or phrases within a source segment, useful for finding translations of terms and idioms when a dedicated terminology database is not available [64](#page=64).
Metadata helps filter previous translations, ensuring that more recent or trustworthy material is reused [87](#page=87).
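The difference between exact and fuzzy matches can be illustrated with a small lookup function. The sketch below uses `difflib.SequenceMatcher` as a stand-in similarity score. As noted above, real tools use their own scoring methods, so the percentages are not comparable across systems. The function name `best_match` and the 0.75 threshold are assumptions.

```python
from difflib import SequenceMatcher


def best_match(segment, tm, threshold=0.75):
    """Return (kind, score, source, target) for the best TM entry, or None.

    `tm` maps stored source segments to their translations.
    """
    best = None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if best is None or score > best[1]:
            kind = "exact" if source == segment else "fuzzy"
            best = (kind, score, source, target)
    if best is None or best[1] < threshold:
        return None  # nothing similar enough to offer
    return best


tm = {
    "Press the red button.": "Appuyez sur le bouton rouge.",
    "Close the door.": "Fermez la porte.",
}
exact = best_match("Press the red button.", tm)
fuzzy = best_match("Press the green button.", tm)
```

An exact match scores 1.0. The second lookup returns the stored "red button" entry as a fuzzy match that the translator would still need to edit.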
#### 4.4.5 Updating and Management
TMs are updated with new translations once accepted by the translator. Entries can be modified or deleted, and multiple translations can be saved for the same source segment. CAT tools use metadata to trace segments back to translators, dates, and times, enabling effective TM resource management by language service providers. However, important metadata can be lost during format transfers, which can cause software interoperability issues and lock users into specific tools [64](#page=64) [84](#page=84).
#### 4.4.6 Networking
Networking features in CAT tools allow groups of translators to collaborate, sharing translated segments and TM data, which can accelerate the translation process and facilitate error correction among team members [65](#page=65).
### 4.5 Key TM file formats and metadata considerations
Two prevalent industry-standard file types for TM data are TMX (Translation Memory eXchange) and XLIFF (XML Localization Interchange File Format), both based on XML [68](#page=68).
* **TMX:**
* Designed for exchanging TM data between tools [76](#page=76).
* Can accommodate any number of languages within a single document [76](#page=76).
* Uses encapsulation methods for inline codes [76](#page=76).
* A collection of `<tu>` elements has no specific order and lacks a mechanism to rebuild the original file [76](#page=76).
* Can store time and date data at the translation unit level [76](#page=76).
* Authors sometimes prefer TMX for its time-stamping capabilities (useful for productivity analysis) and support for multiple target languages [78](#page=78).
* **XLIFF:**
* Created as a format to store extracted text and facilitate data transfer through the localization process [76](#page=76).
* Designed to work with one source and one target language at a time [76](#page=76).
* Provides both encapsulation and placeholder methods for inline codes [76](#page=76).
* Adds data types and fields not present in TMX, such as pre-translation, history, versioning, and binary objects [76](#page=76).
* More powerful for reconstructing or rebuilding the original file [78](#page=78).
While both formats are robust and widely supported, the choice often depends on the specific project, tool, or provided TM files. Regardless of the format, using translation memory is significantly more beneficial than not using it [77](#page=77).
> **Example:** A TMX file might contain metadata like `creationdate="20230115T103000Z"` for a translation unit, indicating when it was first created, which can be vital for productivity analysis. An XLIFF file, conversely, might include fields for versioning or history tracking that TMX lacks [67](#page=67) [76](#page=76).
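As a sketch of how such timestamps could feed a simple productivity analysis, the code below parses TMX-style `creationdate` values and counts translation units per day. The helper names `parse_tmx_date` and `units_per_day` are hypothetical, and the sample dates are invented.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_tmx_date(value: str) -> datetime:
    """Parse a timestamp such as 20230115T103000Z (UTC)."""
    parsed = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def units_per_day(creation_dates):
    """Count translation units per calendar day (ISO date strings as keys)."""
    return Counter(parse_tmx_date(d).date().isoformat() for d in creation_dates)


daily = units_per_day(
    ["20230115T103000Z", "20230115T154500Z", "20230116T090000Z"]
)
```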
### 4.6 Metadata and productivity/consistency
Metadata is fundamental to the efficiency and consistency gains provided by CAT tools. By storing and retrieving previously translated segments (exact matches) or similar segments (fuzzy matches), translators can significantly reduce the time and effort required for new translations. This also ensures terminology and phraseology remain consistent across projects and for different clients, especially when using project- or client-specific TMs. The ability to filter based on metadata (e.g., date, author) allows for the reuse of the most relevant and trustworthy translations [75](#page=75) [86](#page=86) [87](#page=87).
> **Tip:** Understanding the metadata stored in your TM files can help you make informed decisions about which segments to leverage and how to manage your translation resources for optimal productivity and quality [84](#page=84) [87](#page=87).
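Filtering by metadata can be sketched as follows: given several stored translations for the same source segment, keep only those from trusted authors changed after a cut-off date, then pick the most recent. The `Candidate` class, the field names, and the selection policy are assumptions for illustration. Actual tools expose such filters through their own settings.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    target: str
    author: str
    changed: date


def pick_translation(candidates, trusted_authors, not_before):
    """Return the most recently changed candidate that passes both filters."""
    eligible = [
        c for c in candidates
        if c.author in trusted_authors and c.changed >= not_before
    ]
    # default=None covers the case where nothing passes the filters.
    return max(eligible, key=lambda c: c.changed, default=None)


candidates = [
    Candidate("Ouvrir le fichier", "anna", date(2019, 3, 1)),
    Candidate("Ouvrez le fichier.", "anna", date(2023, 1, 15)),
    Candidate("Ouvre fichier", "unknown", date(2024, 2, 2)),
]
chosen = pick_translation(candidates, {"anna"}, date(2020, 1, 1))
```

In this example the newest entry is rejected because its author is not trusted, and the oldest because it predates the cut-off.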
---
# Definition and basic components of computer-assisted translation
Computer-assisted translation (CAT) is defined as a set of computer applications specifically designed to efficiently assist the translator in their task. The primary objective of a CAT system is to provide the translator with all the resources they might need for their work automatically and quickly. Using CAT tools significantly enhances translator productivity, as various software programs available today facilitate these tasks. Proficiency in CAT tools is a demanded skill in agencies and institutions that hire translators. The European Commission highlights mastering CAT and terminology tools, along with common office software, as the most important translation capabilities [32](#page=32) [35](#page=35).
### 5.1 Machine Translation vs. Computer-Assisted Translation
It is important to distinguish between Machine Translation (MT) and Computer-Assisted Translation (CAT) [36](#page=36).
* **Machine Translation (MT):** The translation is performed by a machine, such as Google Translate [36](#page=36).
* **Computer-Assisted Translation (CAT):** The translation is performed by a human with the assistance of various translation tools [36](#page=36).
MT can be integrated into CAT systems, as many CAT systems have access to MT engines [36](#page=36).
### 5.2 Essential CAT Tools
Essential CAT tools, as identified by Berinstein and Mermaud, include:
#### 5.2.1 Project Management Software
This software allows for:
* Control of information flow [37](#page=37).
* Assignment of translation tasks [37](#page=37).
* Quality control [37](#page=37).
* Content analysis [37](#page=37).
* Generation of reports, including full and fuzzy matches, and intra- and cross-file repetitions [37](#page=37).
* Word counts [37](#page=37).
* Final delivery to the client [37](#page=37).
> **Tip:** The generation of reports on repetitions (full and fuzzy matches) is crucial for managing client expectations regarding pricing, especially when dealing with repeated segments in a document [33](#page=33) [34](#page=34).
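A simplified version of such an analysis report can be sketched in a few lines. The function below counts words, exact TM matches, and intra-file repetitions. Fuzzy matches and cross-file repetitions are deliberately left out to keep the example short. The function name `analyse` and the report keys are assumptions.

```python
def analyse(segments, tm_sources):
    """Classify each segment as a TM exact match, a repetition, or new text."""
    report = {
        "segments": len(segments),
        "words": sum(len(s.split()) for s in segments),
        "tm_exact": 0,
        "repetitions": 0,
        "new": 0,
    }
    seen = set()
    for seg in segments:
        if seg in tm_sources:
            report["tm_exact"] += 1      # already stored in the TM
        elif seg in seen:
            report["repetitions"] += 1   # repeated within this file
        else:
            report["new"] += 1           # must be translated from scratch
        seen.add(seg)
    return report


report = analyse(
    ["Open the file.", "Save the file.", "Open the file.",
     "Close the file.", "Save the file."],
    tm_sources={"Close the file."},
)
```

A quote would typically weight these categories differently, which is why the breakdown matters for pricing discussions with clients.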
#### 5.2.2 Translation Memory Software
Translation memory software is used to:
* Store translations [38](#page=38).
* Establish terminological and phraseological consistency [38](#page=38).
* Retrieve translation units, thereby increasing productivity [38](#page=38).
Typical file extensions for translation memories include:
* `.tmx` (Translation Memory eXchange): This is the standard open format compatible with most translation tools like Trados, MemoQ, and Wordfast [38](#page=38).
* `.sdltm`: This is the proprietary format of SDL Trados Studio [38](#page=38).
* `.txt` / `.csv`: These formats are sometimes used to export memories in plain text or for manipulation in spreadsheet software like Excel [38](#page=38).
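A plain-text export can be sketched with Python's standard `csv` module. The two-column layout (source, target) and the function names below are assumptions, since each tool defines its own export columns. The example also shows why a proper CSV writer matters: segments often contain commas.

```python
import csv
import io


def export_csv(units):
    """Serialize (source, target) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source", "target"])
    writer.writerows(units)
    return buffer.getvalue()


def import_csv(text):
    """Read CSV text produced by export_csv back into (source, target) pairs."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [tuple(row) for row in reader]


units = [
    ("Open the file, then close it.", "Ouvrez le fichier, puis fermez-le."),
    ("Save.", "Enregistrez."),
]
roundtrip = import_csv(export_csv(units))
```

Note that metadata such as author or creation date is lost in this layout unless extra columns are added.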
---
# Process of translation with a CAT system
This section outlines the structured process involved in translating content using a Computer-Assisted Translation (CAT) system, emphasizing a systematic approach to achieve the final translation [49](#page=49).
### 6.1 Overview of the CAT translation process
Computer-Assisted Translation (CAT) systems are designed to streamline the translation workflow by breaking it down into manageable steps. Following a defined routine, as recommended by Oliver, is crucial for reaching the final translated output. The process generally involves several key stages [49](#page=49) [50](#page=50).
### 6.2 Stages of CAT translation
The process of translation with a CAT system typically includes the following steps:
#### 6.2.1 File format checking
Before any translation work begins, the format of the source file is meticulously checked to ensure compatibility with the CAT system and to identify any potential issues that might affect the translation process or the final output [50](#page=50).
#### 6.2.2 Resource assignment
In this phase, necessary resources are allocated for the translation project. This can include assigning translators, reviewers, project managers, and ensuring access to relevant translation memories, termbases, and style guides [50](#page=50).
#### 6.2.3 Segmentation
The source document is segmented into smaller, manageable units, typically sentences or phrases. The CAT tool automatically breaks down the text based on predefined rules or punctuation, presenting each segment to the translator for processing [50](#page=50).
#### 6.2.4 Translation
This is the core stage where the translator works on each segment. The CAT system aids the translator by:
* **Displaying source segments:** The original text segment is presented to the translator [50](#page=50).
* **Suggesting translations:** Based on the translation memory (TM) and termbases, the CAT system suggests previously translated segments or terminology. The translator can then either accept these suggestions, edit them, or input a new translation [49](#page=49).
* **Maintaining consistency:** By leveraging TM and termbases, CAT systems help ensure consistency in terminology and phrasing across the entire document, which is vital for professional translation [49](#page=49).
> **Tip:** The effectiveness of the translation stage heavily relies on the quality and comprehensiveness of the pre-existing translation memories and termbases.
> **Example:** If a sentence like "The quick brown fox jumps over the lazy dog" has been translated before, the CAT system will recall the previous translation of this exact segment or similar ones, saving the translator time and ensuring consistency.
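Terminology suggestions can be illustrated with a simple termbase lookup. The sketch below finds termbase entries that occur as whole words or phrases in a source segment. The dictionary-based termbase and the function name `find_terms` are assumptions. Real termbases also handle inflected forms, which this example does not.

```python
import re


def find_terms(segment, termbase):
    """Return termbase entries whose source term occurs in the segment."""
    hits = {}
    for term, target in termbase.items():
        # \b keeps "segment" from matching inside "Segmentation".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, segment, flags=re.IGNORECASE):
            hits[term] = target
    return hits


termbase = {
    "translation memory": "memoria di traduzione",
    "segment": "segmento",
}
suggestions = find_terms("The translation memory stores each segment.", termbase)
```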
The subsequent stages, while not detailed on the provided pages, would typically involve post-translation checks, such as quality assurance, proofreading, and final file formatting, before delivering the completed translation.
---
# Computer-assisted translation tools and functionalities
Computer-assisted translation (CAT) tools are software applications designed to aid human translators by automating repetitive tasks, ensuring consistency, and leveraging previously translated content.
### 7.1 General principles and functionalities
CAT tools operate through a combination of offline and online functions, managing translation memories (TMs) and providing various assistance mechanisms to translators.
#### 7.1.1 Offline functions
Offline functions in CAT tools typically involve the initial processing of texts and interaction with the translation memory before or after direct translation work.
* **Import:** This function is used to transfer a source text and its translation from a text file into the TM. This can be done from a raw format, which may require reprocessing by the user, or from the TM's native format [62](#page=62).
* **Analysis:** This process prepares texts for translation and term retrieval. It involves:
* **Textual parsing:** Correctly recognizing punctuation to distinguish between sentence-ending periods and those in abbreviations, and identifying special text elements that may require special handling or translation. Markup is often used to denote these elements [62](#page=62).
* **Linguistic parsing:** Reducing words to their base form to facilitate automatic term retrieval from term banks. Syntactic parsing can be used to extract multi-word terms or phraseology and to normalize word order variations within phrases [62](#page=62).
* **Retrieval:** Different types of matches can be retrieved from a TM when translating a new segment:
* **Exact match (100% match):** Occurs when the source segment in the TM is an exact character-by-character match to the current source segment [64](#page=64).
* **In-Context Exact (ICE) match or Guaranteed Match:** An exact match that also occurs in the same context, defined by surrounding sentences and attributes like document file name, date, and permissions [64](#page=64).
* **Fuzzy match:** Occurs when the match is not exact, with some systems assigning a percentage score between 0% and 100%. These percentages are not always comparable across different systems without specifying the scoring method [64](#page=64).
* **Concordance:** Allows the translator to search for segment pairs that match selected words or phrases, useful for finding translations of terms and idioms when a terminology database is unavailable [64](#page=64).
* **Updating:** A TM is updated with a new translation once it has been accepted by the translator. TMs can be modified by changing or deleting entries, and some systems allow for multiple translations of the same source segment to be stored [64](#page=64).
* **Export:** Transfers the text from the TM into an external text file. Import and export functions are expected to be inverse operations [63](#page=63).
#### 7.1.2 Online functions
Online functions are those that assist the translator in real-time as they work through a document.
* **Segmentation:** This function divides the source text into meaningful translation units, typically sentences, to be processed and stored in the TM. It uses superficial parsing and forms the basis for alignment. Manual correction of segmentation can lead to future errors if the system repeats its own initial segmentation [63](#page=63).
* **Alignment:** The process of establishing correspondences between source and target language segments. A good alignment algorithm should be able to correct initial segmentation and provide feedback for segmentation refinement [63](#page=63).
* **Term extraction:** This can utilize a prior dictionary or, for unknown terms, employ parsing based on text statistics. This is valuable for estimating the workload and scheduling of a translation project by counting words and repetition [63](#page=63).
* **Automatic translation:** CAT tools often provide automatic retrieval and substitution of TM results as the translator progresses through a document [65](#page=65).
* **Automatic substitution:** If an exact match is found, the software automatically inserts the stored translation. However, this can lead to the repetition of errors if the translator does not verify the translation against the source [65](#page=65).
> **Tip:** Always review automatically substituted translations to prevent the propagation of past errors.
* **Networking:** Enables collaborative translation by making sentences and phrases translated by one team member available to others. This can speed up the process and allows for error correction by other team members before the final translation [65](#page=65).
* **Text memory:** This concept is foundational to standards like the OSCAR xml:tm standard and includes author memory and translation memory [65](#page=65).
* **Translation memory (TM):** Remembers unique identifiers for text units to ensure precise alignment between source and target documents. Unchanged text units can be directly transferred to new document versions without translator intervention, embodying the concept of 'exact' or 'perfect' matching. xml:tm also supports in-document leveraged and fuzzy matching [65](#page=65).
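The automatic substitution function described in this list, together with the advice to review substituted text, can be sketched as a pre-translation pass that inserts exact matches but marks them for review. The status labels and the function name `pretranslate` are assumptions for illustration.

```python
def pretranslate(segments, tm):
    """Insert stored translations for exact matches and flag them for review."""
    result = []
    for seg in segments:
        if seg in tm:
            # Exact match: substitute, but never mark as final automatically,
            # so that past errors are not silently propagated.
            result.append({"source": seg, "target": tm[seg], "status": "needs-review"})
        else:
            result.append({"source": seg, "target": "", "status": "untranslated"})
    return result


draft = pretranslate(
    ["Open the file.", "Print the report."],
    {"Open the file.": "Ouvrez le fichier."},
)
```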
---
# Translation memory file formats and structure
This section details the information stored within translation memory files and explores the common formats used, focusing on their structure and advantages [67](#page=67) [68](#page=68).
### 8.1 Information stored in translation memory files
Translation memory (TM) files primarily store **translation units (TUs)**, which consist of paired source and target segments. Beyond this core information, TMs can also encompass a variety of additional metadata to enrich the data and provide context [67](#page=67).
#### 8.1.1 Core information
* **Segments:** The fundamental data includes the original source text segment and its corresponding translated target text segment [67](#page=67).
* **Language:** Information about the language pair of the translation is stored [67](#page=67).
* **Creation dates and times:** Timestamps indicating when the translation unit was created are typically recorded [67](#page=67).
#### 8.1.2 Additional data
TMs can also store a range of supplementary data points, enhancing their utility and providing a more comprehensive history of the translation process [67](#page=67):
* **Author:** Identifies the person or entity responsible for creating or modifying the translation [67](#page=67).
* **Usage count:** Tracks how many times a particular translation unit has been used [67](#page=67).
* **Change dates and times:** Records when a translation unit was last modified [67](#page=67).
* **Creation tool:** Specifies the software or tool used to create or import the translation unit [67](#page=67).
* **Domain (field):** Indicates the subject matter or industry to which the translation pertains [67](#page=67).
* **Alternate translations:** May include alternative translations for a given source segment [67](#page=67).
* **Notes:** Allows for the inclusion of any relevant annotations or comments related to the translation [67](#page=67).
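The fields listed in 8.1.1 and 8.1.2 can be pictured as a single record. The sketch below is only an illustration: the class name `TranslationUnit` and all field names are assumptions chosen for readability, not part of any TM specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TranslationUnit:
    # Core information (8.1.1)
    source: str
    target: str
    source_lang: str
    target_lang: str
    created: datetime
    # Additional data (8.1.2); every field here is optional
    author: Optional[str] = None
    usage_count: int = 0
    changed: Optional[datetime] = None
    creation_tool: Optional[str] = None
    domain: Optional[str] = None
    alternates: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


tu = TranslationUnit(
    source="Press the power button.",
    target="Drücken Sie die Ein/Aus-Taste.",
    source_lang="en-US",
    target_lang="de-DE",
    created=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    domain="consumer electronics",
)
tu.usage_count += 1  # the unit has been reused once
```

Keeping the additional data optional mirrors the point made above: a TM remains usable with only segments, languages, and a creation time, and the remaining fields add context where a tool supplies them.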
### 8.2 Typical formats of translation memory files
While various formats can be used, the industry primarily relies on XML-based formats for their robust structure and interoperability [68](#page=68).
#### 8.2.1 XML-based formats
The two most prevalent file types are XLIFF (XML Localization Interchange File Format) and TMX (Translation Memory eXchange). Both utilize the XML (Extensible Markup Language) format, which offers significant advantages over simpler text-based files [68](#page=68).
**Advantages of XML for TM storage:**
* **Well-defined structure:** XML files possess a clear and predictable structure, making them easy to parse by software [68](#page=68).
* **Semantic tags:** Tag names describe the meaning and type of the data they enclose, which makes the files easier for humans to read [68](#page=68).
* **Software support:** A wide array of software tools are designed to work with XML, facilitating validation, import, parsing, and searching operations [68](#page=68).
* **Interoperability:** The standardized and well-defined structure of XML files enables seamless data exchange between different applications and systems [68](#page=68).
#### 8.2.2 Other formats
TMs can also be stored in spreadsheet formats like Excel (XLS) or comma-separated value (CSV) text files. However, these formats typically store less information per translation unit, often limiting data to just the source and target segments and their languages. While XLS and CSV files tend to be smaller, this reduced data storage is a significant drawback [68](#page=68).
### 8.3 Structure of TM files: Header and Body
Translation memory files, particularly those in XML formats like TMX and XLIFF, generally consist of two main sections: the header and the body [69](#page=69).
#### 8.3.1 The header
The header section contains metadata pertaining to the file itself and the localization process. This metadata can include information about the translation memory file, its creation, and the project it belongs to. The use of semantic XML tags makes the header human-readable and understandable even without deep technical knowledge of the specification [69](#page=69).
##### 8.3.1.1 Example header in TMX
The source document shows an example of a header section within a TMX file [70](#page=70).
##### 8.3.1.2 Example header in XLIFF
The source document provides an illustration of a header section in an XLIFF file [71](#page=71).
#### 8.3.2 The body
Following the header is the body of the file, which contains the most critical data: the translation units (TUs) and their constituent segments. This section is where the source and target language pairs are stored, forming the core of the translation memory [72](#page=72).
##### 8.3.2.1 Example body in TMX
An example of the body section of a TMX file, illustrating how translation units are structured, appears in the source document [73](#page=73).
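For readers without the source figures at hand, here is a minimal sketch of what a TMX header and body can look like and how they might be read with Python's standard library. The sample file is invented for illustration (the tool name `ExampleTool` and the segments are assumptions); the element and attribute names follow TMX 1.4.

```python
import xml.etree.ElementTree as ET

# Invented sample: one header, one translation unit with three language variants.
TMX_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="ExampleTM" adminlang="en-US"
          srclang="en-US" datatype="plaintext"/>
  <body>
    <tu creationdate="20240115T093000Z">
      <tuv xml:lang="en-US"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Speichern Sie die Datei.</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

root = ET.fromstring(TMX_SAMPLE.encode("utf-8"))
header = root.find("header")

units = []
for tu in root.find("body").iter("tu"):
    # Map each language code to the text of its segment.
    variants = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
    units.append((tu.get("creationdate"), variants))
```

The header carries file-level metadata as attributes, while each `<tu>` in the body groups one `<tuv>` per language, which is how a single TMX file can hold several target languages.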
##### 8.3.2.2 Example body in XLIFF
The source document presents an example of the body section within an XLIFF file, demonstrating its structure for translation units [74](#page=74).
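As a rough counterpart for XLIFF, the sketch below reads an invented XLIFF 1.2 fragment. The file name, tool name, and note text are assumptions made up for the example; the element names and the namespace follow XLIFF 1.2.

```python
import xml.etree.ElementTree as ET

# Invented sample: one <file> with a header and a single trans-unit.
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="manual.txt" source-language="en-US" target-language="de-DE"
        datatype="plaintext">
    <header>
      <tool tool-id="example-tool" tool-name="ExampleTool"/>
      <note>Hypothetical project note for illustration.</note>
    </header>
    <body>
      <trans-unit id="1">
        <source>Save the file.</source>
        <target>Speichern Sie die Datei.</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

root = ET.fromstring(XLIFF_SAMPLE)
file_el = root.find("x:file", NS)

# Collect (id, source, target) for every trans-unit, in document order.
pairs = [
    (
        unit.get("id"),
        unit.findtext("x:source", namespaces=NS),
        unit.findtext("x:target", namespaces=NS),
    )
    for unit in file_el.findall("x:body/x:trans-unit", NS)
]
```

Here the language pair is declared once on `<file>`, and each `<trans-unit>` holds exactly one source and one target.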
> **Tip:** Understanding the header and body structure of TM files is crucial for effectively managing, importing, and exporting translation data. The semantic nature of XML tags greatly aids in deciphering the content of these files.
---
# Understanding translation memory files and their importance
Translation memory (TM) files are crucial tools for translators, enabling greater efficiency and consistency in their work by leveraging previously translated content [75](#page=75).
### 9.1 The function and benefits of translation memory files
Translation memory files are utilized within Computer-Assisted Translation (CAT) or Translation Environment Tool (TEnT) software to enhance translator productivity. When a segment in the current translation matches a segment previously translated and stored in the TM, the tool alerts the translator, offering automatic suggestions or partial matches. This process significantly speeds up translation by reducing the need to re-translate identical or similar content [75](#page=75).
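The lookup described above can be sketched in a few lines. This is a simplified illustration, not how any particular CAT tool scores matches: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, and the function name `best_match` and the 0.75 threshold are assumptions.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the closest stored source, or None."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None  # nothing similar enough: translate from scratch


memory = {
    "Click the Save button.": "Klicken Sie auf die Schaltfläche Speichern.",
    "Close the window.": "Schließen Sie das Fenster.",
}

exact = best_match("Click the Save button.", memory)    # identical segment
fuzzy = best_match("Click the Cancel button.", memory)  # similar segment
miss = best_match("Restart the device.", memory)        # no useful match
```

An exact match can be reused as is, a fuzzy match gives the translator a starting point to edit, and a miss means the segment is translated from scratch.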
Beyond efficiency, TM files are instrumental in maintaining consistency across translations, especially when working on various projects for different clients. By using "client-based" or "project-based" translation memories, translators can ensure that specific terminology and phrasing are applied accurately and uniformly throughout their work [75](#page=75).
> **Tip:** Always consider the project's specific requirements when deciding which translation memory to use to ensure adherence to client-specific terminology.
### 9.2 Industry standard file formats: TMX and XLIFF
TMX (Translation Memory eXchange) and XLIFF (XML Localization Interchange File Format) are both industry-standard, XML-based file types commonly used in translation workflows. While they share commonalities, including support for inline markup elements, they possess distinct structures and were developed for slightly different primary purposes [76](#page=76).
#### 9.2.1 Key differences between TMX and XLIFF
* **Purpose:** XLIFF was initially designed to store extracted text and facilitate data transfer throughout the localization process. In contrast, TMX was created specifically for exchanging translation memory data between different tools [76](#page=76).
* **Language Support:** TMX files can accommodate multiple languages within a single document. XLIFF, however, is structured to handle one source and one target language at a time [76](#page=76).
* **Inline Code Handling:** TMX exclusively uses encapsulation methods for inline codes, where native codes are enclosed within distinct elements. XLIFF offers both encapsulation (similar to TMX) and a placeholder method. In the placeholder method, native codes are removed to a "Skeleton file" and replaced with short elements that reference them, akin to OpenTag's approach [76](#page=76).
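* **Structural Organization:** A TMX file is a collection of `<tu>` (translation unit) elements that lack a specific order and do not contain mechanisms to reconstruct the original file. XLIFF, by contrast, is built around text extracted from a specific file and can be used to rebuild it [76](#page=76).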
* **Additional Data Fields:** XLIFF includes data types and fields not present in TMX, such as pretranslation, history, versioning, and binary objects [76](#page=76).
* **Time and Date Data:** TMX files can store time and date information at the translation unit level, a capability that XLIFF files do not possess [76](#page=76).
#### 9.2.2 Choosing between TMX and XLIFF
Both TMX and XLIFF are robust and widely supported by translation software. The choice often depends on the specific project requirements or the software being used. In many instances, translators do not need to actively choose, as their CAT tools can export translation memories in either format. Ultimately, utilizing any form of translation memory is vastly superior to not using one at all [77](#page=77).
However, for new translation projects, some professionals prefer TMX for two main reasons [78](#page=78):
* **Time-stamped translation units:** This feature enables productivity analysis of one's work at a later stage [78](#page=78).
* **Support for multiple target languages:** Multiple target languages can be stored within a single file [78](#page=78).
Conversely, if the ability to reconstruct or rebuild the original file from the TM file is a priority, XLIFF is the more powerful format for this purpose [78](#page=78).
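The productivity analysis that time stamps make possible might look like the following sketch. It assumes the creation dates have already been read from a TMX file as strings in the TMX date format (`YYYYMMDDThhmmssZ`); the function name `units_per_day` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def units_per_day(creation_dates):
    """Count translation units per calendar day from TMX-style time stamps."""
    days = Counter()
    for stamp in creation_dates:
        created = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")
        days[created.date().isoformat()] += 1
    return dict(days)


stamps = ["20240115T093000Z", "20240115T101500Z", "20240116T140000Z"]
per_day = units_per_day(stamps)
```

Counting units is a crude measure; a fuller analysis would weight each unit by word count, which this sketch leaves out.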
### 9.3 The role of metadata in translation memory files
Translation software divides texts into segments, and each segment is associated with metadata. This metadata can trace a segment back to the translator, the date, and the time of its creation. This information is valuable, allowing translators to opt for more recent or relevant content when leveraging TM data and to identify and remove segments with outdated terminology [84](#page=84).
Metadata also plays a critical role for Language Service Providers (LSPs) in effectively managing their TM resources. However, potential loss of essential metadata during format transfers can lead to interoperability issues, potentially restricting users to specific software tools [84](#page=84).
Metadata, in essence, is data that describes other data, providing supplementary information about digital content and processes. As defined by Berners-Lee in the context of the World Wide Web, it is "data about data" or "machine understandable information about web resources or other things." There are three primary types of metadata [85](#page=85):
* **Descriptive metadata:** Describes the content itself [85](#page=85).
* **Structural metadata:** Describes how objects or components are organized [85](#page=85).
* **Administrative metadata:** Provides technical information, such as file type [85](#page=85).
---
# Interchange formats for translation data
Interchange formats for translation data facilitate the seamless transfer of various types of linguistic information between different tools and stages of the localization process [88](#page=88).
### 10.1 Translation memory exchange formats
#### 10.1.1 TMX (Translation Memory eXchange)
TMX is designed for the transfer of translation memories between different translation tools. A translation memory (TM) itself is a repository of source text segments and their corresponding translations into one or more target languages. TMX enables users to maintain their organized TM data across various software solutions. The development and maintenance of TMX are overseen by the OSCAR Special Interest Group at LISA (the Localisation Industry Standards Association) [89](#page=89).
### 10.2 Localization data exchange formats
#### 10.2.1 XLIFF (XML Localization Interchange File Format)
XLIFF facilitates the transfer of localizable data that has been extracted from original files. This format supports the movement of data through different phases of the localization workflow, including merging the localized content back into its original structure. XLIFF also aids in maintaining organized localization projects. XLIFF is developed and maintained by the OASIS XLIFF Technical Committee [90](#page=90).
### 10.3 Lexical and terminological data exchange formats
#### 10.3.1 OLIF (Open Lexicon Interchange Format)
OLIF is a format specifically for transferring terminological and lexical data between translation tools. While similar in purpose to TBX, OLIF is more focused on Natural Language Processing (NLP) data, such as machine translation lexicons. The OLIF Consortium is responsible for its development and maintenance [91](#page=91).
#### 10.3.2 TBX (TermBase eXchange format)
TBX, also known as DXLT (Default XLT format), is used for transferring glossaries between translation tools. The format is based on the ISO 12200 standard, which is the Machine-Readable Terminology Interchange Format (MARTIF). The SALT (Standards-based Access service to multilingual Lexicons and Terminologies) group at BYU is involved with TBX [92](#page=92).
> **Tip:** Understanding these interchange formats is crucial for managing translation assets effectively and ensuring interoperability between different localization tools and workflows. They prevent vendor lock-in and allow for more flexible project management [88](#page=88).
---
# Translation memory alignment for legacy text
Alignment is a process used to create translation memories from existing legacy text when it is not already in TM format [98](#page=98).
### 11.1 The need for alignment
Translation Memory (TM) systems are essential in the translation and localization industry for reusing previously translated text. When a new project begins, existing translations are stored in a TM database, which can then be accessed as either 100% matches or fuzzy matches, leading to time and cost savings, as well as improved consistency. However, not all legacy material is available in TM format. This can occur for several reasons [97](#page=97):
* Translations were performed by in-country offices without access to TM systems [98](#page=98).
* A linguistic vendor did not deliver a TM as part of the project handover [98](#page=98).
* A linguistic vendor provided a TM, but its quality was poor, and subsequent improvements were only made to the translated files, not the TM itself [98](#page=98).
In such scenarios, existing translated work is not lost; TMs can be created from legacy text through the alignment process [98](#page=98).
### 11.2 What is alignment?
Alignment involves taking a source file and its corresponding translation and matching segments to each other. This process builds a repository of translation units that are then saved as a TM for use in future translation projects [99](#page=99).
### 11.3 The alignment process
The initial alignment is typically performed using automated alignment tools. These tools require a set of source and target files, which are loaded and linked based on their filenames. An automatic alignment is then executed for each file pair [100](#page=100).
Alignment tools analyze the structure of both the source and target files, matching source text with probable translations on a sentence-by-sentence basis. These tools have advanced significantly over time, and the results of automated alignment are generally very good. Some tools also produce a report that includes a quality score, calculated using internal algorithms, to indicate the success of the alignment [100](#page=100).
> **Tip:** Automated alignment tools can produce excellent results, but it's always advisable to review the quality score and, if possible, spot-check some of the aligned segments to ensure accuracy.
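To make the idea concrete, here is a deliberately naive sketch of sentence alignment with a per-pair score. It is not the algorithm of any real alignment tool, which typically uses more sophisticated methods: this version pairs sentences by position only and scores each pair by the ratio of their lengths. The function names and the scoring rule are assumptions.

```python
import re


def split_sentences(text):
    """Rough splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def align(source_text, target_text):
    """Pair sentences by position and attach a crude length-ratio score."""
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    pairs = []
    for s, t in zip(src, tgt):
        score = min(len(s), len(t)) / max(len(s), len(t))
        pairs.append((s, t, round(score, 2)))
    leftovers = abs(len(src) - len(tgt))  # sentences that found no partner
    return pairs, leftovers


pairs, leftovers = align(
    "Open the lid. Insert the battery.",
    "Öffnen Sie den Deckel. Legen Sie den Akku ein.",
)
```

A low score or a non-zero `leftovers` count is the kind of signal a quality report would surface for spot-checking.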
The source document refers to examples of these reports but does not reproduce any.
---
# Automated sentence alignment process
The automated sentence alignment process involves using specialized software tools to match sentences in a source text with their corresponding translations in a target text. This process is crucial for creating bilingual corpora that are essential for various natural language processing tasks [100](#page=100).
### 12.1 General principles of automated alignment
Automated alignment tools operate by analyzing the structural properties of both the source and target files. They then proceed to match sentences in the source text with their most probable translations in the target text on a sentence-by-sentence basis. The sophistication of these tools has advanced significantly over time, leading to generally high-quality alignment results [100](#page=100).
### 12.2 Reporting and quality assessment
Many alignment tools are capable of generating reports that include a quality score. This score is typically derived from internal algorithms and serves as an indicator of the success of the automated alignment process [100](#page=100).
### 12.3 Historical projects and evaluation metrics
Several international projects have been initiated to develop robust evaluation metrics and corpora for sentence alignment. Notable examples include:
* **Project ARCADE (1995-1996):** This project focused on producing a bilingual French-English corpus specifically designed for sentence alignment and its evaluation.
* **MULTEXT-East Project:** This initiative involved sentence-aligning six translations of George Orwell's novel "1984" with the original English text. The resulting alignments were subsequently validated manually.
* **Egypt Statistical Machine Translation Toolkit:** This project, along with its successor GIZA++, contributed to the development of tools and methodologies for training statistical translation models, which often rely on accurately aligned corpora.
---
# Linguistic verification and alignment factors
This section details the process of linguistic verification after an alignment project is generated and outlines key factors that influence the quality and accuracy of alignment results.
### 13.1 Linguistic verification
Linguistic verification is the crucial step where a linguist meticulously reviews each segment of an alignment project. The primary goal is to approve correct matches between source and target segments and to identify and rectify any incorrect matches.
#### 13.1.1 Handling incorrect matches
Incorrect matches can arise for various reasons. A common scenario is when the number of source sentences does not directly correspond to the number of target sentences due to linguistic flow or stylistic choices in translation. For instance, two English source sentences might be translated into a single German sentence to ensure proper sentence structure and readability. Alignment tools, which often rely on segment counts, may fail to recognize such discrepancies, leading to subsequent segments being misaligned.
> **Tip:** When an incorrect match is identified, the linguist can make the necessary corrections. After these adjustments, it is often possible to re-run the automatic alignment process from the point of correction onwards. This re-run updates all subsequent incorrect matches, ensuring a more accurate alignment.
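The two-sentences-into-one case and the re-run after correction can be illustrated with a toy example. The positional pairing below stands in for the automatic pass; the function names are hypothetical and real tools handle this interactively.

```python
def positional_align(src, tgt):
    """Pair sentences one-to-one by position (the naive automatic pass)."""
    return list(zip(src, tgt))


def merge_and_realign(src, tgt, index):
    """Merge source sentences index and index+1, then re-pair everything after."""
    merged = src[:index] + [src[index] + " " + src[index + 1]] + src[index + 2:]
    return positional_align(merged, tgt)


src = ["Turn off the device.", "Then unplug it.", "Wait ten seconds."]
tgt = [
    "Schalten Sie das Gerät aus und ziehen Sie den Stecker.",
    "Warten Sie zehn Sekunden.",
]

first_pass = positional_align(src, tgt)  # second pair is wrong
fixed = merge_and_realign(src, tgt, 0)   # linguist merges sentences 0 and 1
```

In `first_pass`, "Then unplug it." is paired with the German sentence about waiting, and the third source sentence has no partner. After the merge at index 0, every later pair lines up again without further manual work.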
#### 13.1.2 Exporting translation memories
Once the linguistic verification and any necessary corrections are complete, the approved segments from all files are exported into a Translation Memory (TM) format. This TM is then ready for subsequent use in translation workflows.
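As an illustration of the export step, the sketch below writes approved segment pairs to a TMX string using Python's standard library. The function name `export_tmx`, the tool name in the header, and the three-item row layout are assumptions for the example; the element and attribute names follow TMX 1.4.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this key as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def export_tmx(segments, src_lang, tgt_lang):
    """Build a TMX string from (source, target, approved) rows."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "alignment-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target, approved in segments:
        if not approved:
            continue  # rejected matches never reach the TM
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


doc = export_tmx(
    [
        ("Open the lid.", "Öffnen Sie den Deckel.", True),
        ("Insert the battery.", "Warten Sie.", False),  # rejected in review
    ],
    "en-US",
    "de-DE",
)
```

Only the pair the linguist approved ends up in the exported file, which keeps misaligned segments out of future projects.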
### 13.2 Factors to consider for improved alignment results
Several factors significantly impact the quality and success of the alignment process:
#### 13.2.1 File format consistency
It is essential that both the source and target files are in the same format. Translation Memory (TM) systems process different file formats, such as InDesign and Word, in slightly different ways. Formatting and variable information are converted into tags within the file for translation purposes. While alignment tools can utilize these tags as guides, disparities in tag structures between the source and target files can hinder accurate segment matching.
> **Example:** If a source document is in InDesign and its translated counterpart is in Word, the way formatting is converted into tags might differ. This difference could lead to the alignment tool struggling to accurately map segments, even if the text content is identical.
#### 13.2.2 File version consistency
The source and target files should ideally be the same version. Updates to a source file after the initial translation (e.g., adding information or removing redundant text) can complicate the alignment process if the translated file is not simultaneously updated to reflect these changes.
#### 13.2.3 Quality of translated files
The alignment process, by default, does not typically include a linguistic review of the existing translations. Therefore, it is crucial that the client is satisfied with the quality of these prior translations before undertaking alignment. While a linguist can review the files during the alignment process, this adds to the overall time required to complete the task.
---
# Using term bases and glossaries in translation
Term bases and glossaries serve as crucial databases for managing specific terminology in translation projects, enhancing consistency, quality, and efficiency.
### 14.1 What is a term base?
A term base or glossary is defined as a database containing single words or expressions pertinent to a particular subject matter. These terms are frequently presented in a bilingual or even multilingual format.
### 14.2 How does a term base work?
Term bases are integrated features within many Computer-Assisted Translation (CAT) tools. Users can import existing glossaries and subsequently add or update terms as they translate. It is possible to merge multiple bilingual glossaries into a single multilingual one, and to flag specific terms as forbidden.
### 14.3 How to use a term base
#### 14.3.1 Creating and maintaining a term base
When constructing a term base, it is essential to identify key terminology. To ensure a high-quality term base, it is important to utilize final source texts, approved translations, and carefully researched contextual information. A term base can either be created in conjunction with a new translation project or imported from previous translation endeavors. Once established, a term base requires ongoing maintenance to incorporate changes in source texts, translations, or contextual information. Neglecting this maintenance can negatively impact translation quality.
> **Tip:** Prioritize using final source texts and approved translations when building a term base to ensure accuracy and relevance.
> **Tip:** Regular maintenance of term bases is crucial for their effectiveness and to prevent a decline in translation quality.
#### 14.3.2 Benefits of using a term base
Using a term base offers several advantages for the translation process:
* **Increases consistency:** A well-constructed term base ensures that the core message remains consistent across multiple translation projects and collaborators within an organization.
* **Improves translation quality:** By managing terminology and defining forbidden terms, term bases help prevent the use of undesirable words or expressions by translators.
* **Speeds up translation:** The term base functionality in CAT tools is designed for easy and direct access to terminology resources, thereby accelerating the translation process.
* **Correct usage and spelling of corporate terminology:** Term bases guarantee the correct spelling of product or company names, which are often case-sensitive. They also inform translators about terms that should not be translated but retained in the source language.
### 14.4 Enhancing the translation process with terminology management
Language Service Providers (LSPs) should include term bases and specific instructions when delivering translation assignments to translators, directing them to align their work with the glossary. LSPs may also request translators to submit new terms to the glossary for review, thus transforming the glossary into a valuable asset that could potentially be offered to clients. Providing translators with all available terminology ensures they use client-preferred terms, saving time and money on proofreading and increasing client satisfaction through consistent translation output.
In large projects, Project Managers (PMs) may not have knowledge of every language involved. However, a term base can be used to verify terminological consistency even without linguistic expertise in a specific language. If a translation deviates from the glossary, PMs can return it to the translator for correction, further reducing proofreading costs and time.
> **Example:** A Project Manager who does not speak German can still verify the consistency of a German translation against a provided glossary, identifying any terminological discrepancies.
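A check of this kind needs no knowledge of the target language, which a short sketch can show. The function name `check_terms` and the sample data are assumptions; the matching is plain case-insensitive substring search, so inflected forms would be missed, which a real terminology checker would need to handle.

```python
def check_terms(segments, glossary, forbidden=()):
    """Return (segment_index, message) for each terminology problem found."""
    issues = []
    for i, (source, target) in enumerate(segments):
        # A glossary term in the source must appear translated in the target.
        for src_term, tgt_term in glossary.items():
            if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
                issues.append((i, f"expected '{tgt_term}' for '{src_term}'"))
        # Terms flagged as forbidden must not appear in the target at all.
        for term in forbidden:
            if term.lower() in target.lower():
                issues.append((i, f"forbidden term '{term}'"))
    return issues


glossary = {"user interface": "Benutzeroberfläche"}
segments = [
    ("Open the user interface.", "Öffnen Sie die Benutzeroberfläche."),
    ("The user interface is locked.", "Das Benutzerinterface ist gesperrt."),
]
issues = check_terms(segments, glossary, forbidden=["Benutzerinterface"])
```

The first segment passes. The second is flagged twice, once for the missing glossary term and once for the forbidden one, and can be returned to the translator.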
### 14.5 The case of Wordfast Anywhere
#### 14.5.1 Imposing specific terminology
The use of incorrect terminology can undermine an otherwise excellent translation. Many clients possess well-defined jargon, often compiled into glossaries, which they can provide to translators to enforce specific terminology. This approach, common in technical translation, aims to harmonize the translator's linguistic skills with the client's terminological requirements.
#### 14.5.2 Creating glossaries during translation
Sometimes, clients request translators to create a glossary of terms discovered during the translation research phase. In such instances, the translator must compile a glossary and incorporate specific terminology. This glossary building can occur either before the translation begins (during an initial terminology research phase) or concurrently with the translation process.
#### 14.5.3 Utilizing client-provided glossaries
In many scenarios, clients furnish a bilingual glossary that has been previously compiled from past translations. The translator's responsibility is then to adhere strictly to this glossary and, when appropriate, to contribute their own additions.
#### 14.5.4 Glossary function for general translations
For translations of a more general nature, or when a translator is still developing their general vocabulary in a source language, the glossary function in tools like Wordfast Anywhere (WFA) can be employed to itemize terminology encountered during the translation process.
#### 14.5.5 Wordfast Anywhere's glossary implementation
Wordfast Anywhere is designed to assist translators in all these situations through its glossary function. The WFA glossary is structured as a simple tab-delimited text document. Similar to translation memories, this glossary can be uploaded and downloaded from WFA and shared with other CAT programs as needed.
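Because the glossary is plain tab-delimited text, it is easy to read and write outside the tool. The sketch below assumes a column order of source term, target term, and comment; that layout, the sample rows, and the function names are assumptions for illustration and should be checked against an actual WFA export before use.

```python
import csv
import io

# Invented sample: assumed columns are source, target, comment.
GLOSSARY_TEXT = (
    "user interface\tinterfaz de usuario\tStandard UI term\n"
    "settings\tconfiguración\t\n"
)


def load_glossary(text):
    """Parse tab-delimited rows into a list of term dictionaries."""
    entries = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        entries.append({
            "source": row[0],
            "target": row[1],
            "comment": row[2] if len(row) > 2 else "",
        })
    return entries


def dump_glossary(entries):
    """Write entries back out as tab-delimited text."""
    return "".join(
        f"{e['source']}\t{e['target']}\t{e['comment']}\n" for e in entries
    )


entries = load_glossary(GLOSSARY_TEXT)
```

The simplicity of the format is what makes sharing with other CAT programs practical: any tool that can import tab-separated text can, in principle, read the same file.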
---
# Managing glossaries in WFA
This section details how to utilize, add to, and manage glossaries within the WFA translation environment to enhance efficiency and consistency.
### 15.1 Using existing glossaries
When WFA encounters a term present in an active glossary within the source segment, it highlights this term with a blue background. These highlighted terms function as "placeables," meaning they can be manipulated using dedicated navigation icons (Previous, Next) or keyboard shortcuts (Ctrl+Alt+Right, Ctrl+Alt+Left). Clicking on the term with the mouse or typing its initial letter followed by the Tab key also allows for manipulation. A key advantage is that using the Copy icon or the Ctrl+Alt+Down shortcut will copy the corresponding **translation** from the glossary to the target segment. The Auto-suggest feature, enabled by default, is a highly efficient method for copying target terms, proposing them as you type the first letter of the target term or the first three letters of the source term.
#### 15.1.1 Viewing glossary information
To preview the translation of a highlighted glossary term, the glossary panel can be activated. This can be achieved via the keyboard shortcut Ctrl+Alt+H or by navigating to the View tab and selecting the "Show/Hide Glossary" button. For more detailed information about a term, such as comments or data entered in the F1, F2, and F3 fields, users can simply hover their mouse pointer over the source term in the segment. This action will display the associated information in a pop-up bubble, similar to how translations are shown in the glossary panel.
> **Tip:** Activating the glossary panel is a quick way to verify translations without interrupting the translation flow.
### 15.2 Adding terms to the glossary
WFA facilitates the dynamic incorporation of new terms and their translations into an existing glossary directly within the translation process, eliminating the need to exit the software. This feature is invaluable for reinforcing a linguist's memory and preventing repetitive research for the same words or phrases.
#### 15.2.1 Steps for adding a new term
1. **Select the source term:** In the source text, click on the desired word or phrase. If it's a single word, simply clicking on it will suffice. Alternatively, use the Tab key (or Shift+Tab for backwards movement) to navigate to the term. The selected source term will be visually indicated with a red border.
2. **Select the target term:** In the target segment, click on the corresponding translated term. This selected target term will appear with a blue background.
3. **Invoke the Glossary Dialog Box:** Press Ctrl+Alt+T or click the "Add Term" button.
4. **Populate fields:**
   * If the term consists of a single word, the selected source and target terms should automatically populate the "Source" and "Target" fields in the dialog box.
   * For terms comprising multiple words, you may need to paste the text from your computer's clipboard or manually type it into the respective fields.
5. **Add additional information (optional):**
   * **Comment:** A comment field is available to record supplementary information that might be useful later, such as the specific context in which a translation was used.
   * **F1, F2, F3 fields:** These fields can be utilized to store information regarding the word's role, grammatical form, context, or any other relevant text-based data.
6. **Save the term:** Click the "Save" button to confirm the addition of the new term to the glossary.
> **Example:** You encounter the term "user interface" in the source text and translate it to "interfaz de usuario." You can select both terms, invoke the Add Term dialog, and save "user interface" as the source and "interfaz de usuario" as the target. You might also add a comment like "Standard UI term for web applications."
> **Tip:** Regularly adding new terms to your glossary helps build a personalized and highly relevant terminology resource, significantly reducing future research time.
---
# Localization versus translation
This topic explores the distinction between translation and localization, emphasizing that localization involves adapting content for specific local audiences beyond mere linguistic conversion.
### 16.1 Understanding translation
Translation is the process of converting content from a source language into a target language. This process adheres to the grammar rules and syntax of the target language, moving beyond literal word-for-word conversion. The goal of translation is to ensure the original meaning of the source text is accurately preserved in the target language.
Translation is required for various types of content, including user manuals, medical documents, technical publications, scientific journals, and literature.
### 16.2 Understanding localization
Localization is a more extensive process than translation, focusing on adapting content to resonate with local audiences. While translation is a component of localization, localization encompasses a broader scope of adaptation.
Localization is particularly relevant for digital and interactive content such as websites, mobile applications, software, video games, multimedia content, and voiceovers.
> **Tip:** Localization acknowledges that even within a single language, variations exist across different regions or countries, necessitating tailored approaches.
### 16.3 Key differences and the localization process
The primary distinction lies in the depth of adaptation. For example, Spanish used in Argentina, Mexico, and Spain requires different content strategies, similar to how English varies across the US, Australia, and Canada.
The localization process involves more than just a team of translators. It requires collaboration with local marketers and consultants to ensure that cultural aspects and local laws of each target market are respected.
> **Example:** To achieve business success in local markets, a client needs to localize their content, not just translate it, to gain the trust of the local public and overcome more than just language barriers. This involves crafting a customized message for each specific local audience.
In essence, translation addresses linguistic barriers, while localization tackles the broader challenge of creating a customized and culturally relevant message to connect with diverse local audiences.
---
# Localization goes beyond translation to adapt content for local audiences
Localization is a comprehensive process of adapting content for local audiences, extending beyond mere linguistic translation to encompass cultural, legal, and market-specific nuances. It aims to ensure that a message resonates deeply with a particular target demographic, fostering trust and understanding.
### 17.1 Understanding the core difference: Translation vs. Localization
#### 17.1.1 Translation: The linguistic foundation
Translation involves converting content from a source language to a target language while adhering to grammatical rules and syntax. It is crucial for technical documentation, scientific journals, literature, and user manuals, demanding accuracy to preserve the original meaning.
#### 17.1.2 Localization: Adapting for impact
Localization, conversely, focuses on tailoring the message to local audiences. This process is essential for digital content such as websites, mobile applications, software, video games, multimedia, and voiceovers. It acknowledges that even within a single language, significant regional variations exist.
> **Tip:** Think of translation as the first step, providing the fundamental linguistic bridge, while localization builds upon this to create a culturally congruent and relevant experience.
### 17.2 The multifaceted nature of localization
#### 17.2.1 Addressing linguistic diversity within a language
Even countries sharing an official language, such as Spanish-speaking nations like Argentina, Mexico, and Spain, require distinct approaches. Similarly, English varies considerably across the United States, Australia, and Canada. Localization requires attention to these local versions and dialects for effective marketing strategies .
#### 17.2.2 The broader localization team
Successful localization demands more than just skilled translators. It necessitates collaboration with local marketers and consultants to ensure respect for cultural aspects and local laws in each target market .
#### 17.2.3 Building trust and gaining market penetration
For businesses to succeed in foreign markets, regular translation is often insufficient. Localization is vital for earning the trust of the local populace, as selling in a new country involves overcoming more than just language barriers. It requires crafting a customized message specifically designed for each local audience .
### 17.3 Going beyond words: Cultural adaptation in action
#### 17.3.1 Overcoming cultural barriers
Cultural barriers can significantly impede the understanding of an original message. Localization strategies must address these inherent differences to ensure clarity and connection .
#### 17.3.2 Case study: KitKat in Japan
A prime example of successful localization is Nestlé's KitKat campaign in Japan. Instead of a direct translation of "Have a break, have a KitKat," the slogan was adapted to "Kitto Katsu," which translates to "surely win" in Japanese. Furthermore, exotic chocolate bar flavors were introduced to cater to local tastes .
> **Example:** This strategic adaptation by KitKat transformed their Japanese campaign into a localization triumph, illustrating the power of using language and cultural references that resonate with local consumers .
---
# Comprehensive localization involves numerous changes beyond text rewriting
Comprehensive localization extends far beyond mere text translation, requiring significant adjustments to various elements to ensure a website or product is culturally appropriate and user-friendly for target audiences .
### 18.1 Beyond translation: Adapting content for local markets
Effective localization aims to make a client's website appealing to diverse audiences by addressing numerous details that can overcome cultural barriers and enhance usability .
### 18.2 Key areas requiring adaptation
Beyond translation, several additional changes are necessary for an improved user experience:
#### 18.2.1 Color meanings
Colors carry diverse cultural connotations; what is acceptable in one region may be offensive or convey a different message in another. For instance, red can signify danger in some cultures, while white can signify death and orange mourning or loss in others. Thorough research is essential before translation, particularly for new target audiences.
#### 18.2.2 Layout flexibility
Different languages require varying amounts of space to express the same concepts. Therefore, a flexible layout is crucial to accommodate text of different lengths resulting from translation .
> **Tip:** Anticipate text expansion during translation. Languages like German or French can expand English text by 30% to 100%, necessitating adaptable design .
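As a rough sketch of how a layout budget could be derived from that range, the following Python snippet reserves space for a translated string. The function name `expanded_length` and the sample label are illustrative assumptions, and the rates are simply the 30% and 100% figures quoted in the tip, not measured values.

```python
import math


def expanded_length(source_chars: int, expansion_rate: float) -> int:
    """Return the character budget to reserve for a translated string.

    expansion_rate is a fraction, e.g. 0.30 for 30% growth.
    """
    if source_chars < 0 or expansion_rate < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(source_chars * (1 + expansion_rate))


label = "Add to cart"  # hypothetical English UI label, 11 characters
low = expanded_length(len(label), 0.30)   # lower end of the quoted range
high = expanded_length(len(label), 1.00)  # upper end of the quoted range
print(f"Reserve between {low} and {high} characters for '{label}'")
```

Character counts are only a proxy for rendered width, so a real layout check would measure the text in the target font.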
#### 18.2.3 Visual content adaptation
Photographs and other visuals must be adapted to local cultures. For example, imagery like blond mothers hugging their children might not resonate with a Chinese audience and could even offend customers in the Middle East .
#### 18.2.4 Units of measurement conversion
Most countries utilize the metric system. Converting units of measurement to the metric system is vital for making content easily understandable and followable for the target audience .
#### 18.2.5 Currency unit localization
Currency also needs localization, which involves changing from one currency to another, such as from 100 dollars to 100 pounds sterling. To show equivalent amounts, currency conversion is necessary, for example, "100 dollars (65 pounds sterling)" .
> **Tip:** When localizing currency, ensure you accurately reflect the equivalent value in the target currency.
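A minimal sketch of the dual-price pattern described above, written in Python. The exchange rate of 0.65 is an assumption taken only from the "100 dollars (65 pounds sterling)" illustration; the helper names `convert` and `dual_price` are hypothetical, and a real system would fetch a current rate.

```python
from decimal import Decimal, ROUND_HALF_UP


def convert(amount: Decimal, rate: Decimal) -> Decimal:
    """Convert an amount and round it to two decimal places."""
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def dual_price(amount: Decimal, source_name: str,
               rate: Decimal, target_name: str) -> str:
    """Show the original amount followed by its converted equivalent."""
    converted = convert(amount, rate)
    return f"{amount} {source_name} ({converted} {target_name})"


# Assumed rate for illustration only: 1 dollar = 0.65 pounds sterling.
text = dual_price(Decimal("100"), "dollars", Decimal("0.65"), "pounds sterling")
print(text)
```

`Decimal` is used instead of `float` so that monetary rounding behaves predictably.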
#### 18.2.6 Paper size considerations
Printed documents may be designed for specific paper sizes, such as the European A4 (210 by 297 mm, or 8.27 by 11.7 inches) instead of the American letter size (8.5 by 11 inches). These subtle differences can affect document formatting and page breaks .
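The paper dimensions quoted above can be cross-checked with a small unit conversion, which is also the kind of arithmetic involved in converting measurements for a metric audience. This is an illustrative Python sketch; the helper names are assumptions.

```python
MM_PER_INCH = 25.4  # exact by definition of the international inch


def mm_to_inches(mm: float) -> float:
    return mm / MM_PER_INCH


def inches_to_mm(inches: float) -> float:
    return inches * MM_PER_INCH


# A4 expressed in inches, rounded to two decimals.
a4_in = (round(mm_to_inches(210), 2), round(mm_to_inches(297), 2))
# US letter expressed in millimetres, rounded to one decimal.
letter_mm = (round(inches_to_mm(8.5), 1), round(inches_to_mm(11), 1))

print("A4 in inches:", a4_in)
print("Letter in mm:", letter_mm)
```

A4 comes out slightly narrower and taller than letter, which is why page breaks shift when a document moves between the two sizes.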
#### 18.2.7 Date format standardization
Understanding variations in date formats is critical. For instance, "4/5/15" could mean April 5 in the U.S. or May 4 in the UK, leading to crucial misunderstandings if not clarified .
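The ambiguity can be reproduced with Python's standard `datetime` module. The sketch below parses the same string under a month-first and a day-first convention, then prints both results in the unambiguous ISO 8601 form; the variable names are illustrative.

```python
from datetime import datetime

ambiguous = "4/5/15"

# Month-first reading (common in the U.S.).
us_date = datetime.strptime(ambiguous, "%m/%d/%y").date()
# Day-first reading (common in the UK).
uk_date = datetime.strptime(ambiguous, "%d/%m/%y").date()

print("Month-first:", us_date.isoformat())
print("Day-first:  ", uk_date.isoformat())
```

Storing dates in ISO 8601 (year-month-day) and formatting them per locale only at display time is one way to avoid this class of misunderstanding.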
#### 18.2.8 Legal compliance
When conducting business internationally, adherence to local regulations is mandatory. Respecting these rules is essential to avoid legal complications, potential penalties, or even website bans .
> **Example:** Failing to adapt contracts and agreements to the legal framework of a foreign country can lead to significant financial penalties or legal action.
---
# Localization versus translation and its importance
Localization is crucial for international business success as it involves adapting content and products to specific local cultures and expectations, going beyond mere translation to ensure resonance and engagement with the target audience .
### 19.1 Understanding the core concepts
#### 19.1.1 Translation vs. Localization
Translation focuses on converting text from one language to another, preserving the original meaning as closely as possible. Localization, however, is a broader process that encompasses translation but also adapts a product or content to a specific locale, taking into account cultural nuances, local customs, and user expectations. The primary goal of localization is to make the product or content feel as if it were originally created for the target audience, irrespective of geographic location, culture, or language .
> **Tip:** Think of translation as a component of localization, not as its entirety.
#### 19.1.2 The necessity of localization
In today's globalized marketplace, simply translating content is often insufficient to connect with a target market. To increase engagement and drive sales, businesses must tailor their offerings to meet local beliefs, traditions, and expectations. This strategic adaptation helps companies stand out in competitive markets and establish themselves as "local" entities .
### 19.2 The process and scope of localization
#### 19.2.1 Strategic planning and execution
Achieving a "local" feel requires significant strategic planning and the collaboration of individuals with diverse skill sets, both within the company and from external suppliers. This multi-faceted approach ensures that the final product is culturally relevant and appealing .
#### 19.2.2 Areas of application
Localization is applied across a wide range of products and content to enhance their appeal to specific markets. Common areas include:
* Websites
* Video games
* Movies
* Product information
* Mobile applications
* Software
* Whitepapers
* Tech support pages
* Help files
* Newsletters
### 19.3 The overarching objective of localization
The ultimate objective of localization is to create an experience for the end-user that feels tailor-made for their specific locale. This involves making the product or content resonate with the user's cultural background and linguistic preferences, thereby maximizing the client's investment and improving the likelihood of increased sales and global business growth .
---
# Technical considerations in language localization
Technical considerations are crucial for adapting products and content to specific target locales, ensuring they feel locally created and relevant. This involves a strategic planning process that goes beyond simple translation to encompass cultural adaptation and technical adjustments .
### 20.1 The objective and scope of language localization
The primary objective of language localization is to provide a product with the appearance and feel of being specifically designed for the target locale, irrespective of geographical location, culture, or language. This process is applied to a wide range of content and products, including websites, video games, movies, product information, mobile applications, software, whitepapers, tech support pages, help files, and newsletters .
### 20.2 Core components of language localization
The translation itself is often the most time-consuming component of language localization. However, localization involves several other key technical considerations:
#### 20.2.1 Translation of text and media
* **Subtitles and Dubbing:** For video, audio, and film, spoken words or music lyrics are translated for subtitles or dubbing.
* **Digital and Printed Materials:** Text in all printed materials and digital media, including documentation and error messages, requires translation.
* **Logos and Images:** Logos and images containing text may need alteration or replacement with more generic icons and pictures if their text requires translation.
#### 20.2.2 Design and layout adjustments
* **Content Adaptation:** Website design or written content may need to be altered to accommodate differences in character sizes and translation lengths between languages.
* **Complex Text Layout:** Some languages utilize complex text layouts where character shapes change based on context.
#### 20.2.3 Linguistic and stylistic considerations
* **Variety, Register, and Dialect:** For audio materials, localization must consider differences in variety, register, and specific dialects.
* **Writing Systems and Direction:** Different writing systems use distinct scripts or characters (symbols, logograms, syllabograms, letters). Writing direction can vary: left-to-right (e.g., European languages), right-to-left (e.g., Arabic, Hebrew), or boustrophedon (alternating direction from line to line). Some Asian languages can be written vertically.
* **Capitalization:** Capitalization exists in some writing systems and not in others, and the rules for applying it differ between languages.
* **Sorting Rules:** Different writing systems and languages have varying text sorting rules.
* **Numeral Systems:** Translators must account for languages that use different numeral systems.
* **Grammar and Pluralization:** Attention to detail is vital as pluralization and other grammatical rules vary across languages.
* **Punctuation:** The use of punctuation can differ; for example, French may use guillemets, akin to English double quotes.
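To illustrate why pluralization cannot be handled by appending an "s", here is a small Python sketch that selects a message form by plural category. The rules are a simplified assumption for non-negative integers in English and Russian, modeled on the commonly documented plural categories for those languages; the message table and function names are hypothetical.

```python
def plural_category_en(n: int) -> str:
    # English: singular for exactly 1, plural otherwise.
    return "one" if n == 1 else "other"


def plural_category_ru(n: int) -> str:
    # Russian (integers only, simplified): the form depends on the last digits.
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


RULES = {"en": plural_category_en, "ru": plural_category_ru}

# Hypothetical message table keyed by locale and plural category.
MESSAGES = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "ru": {"one": "{n} файл", "few": "{n} файла", "many": "{n} файлов"},
}


def format_count(n: int, locale: str) -> str:
    category = RULES[locale](n)
    return MESSAGES[locale][category].format(n=n)


for n in (1, 3, 11, 21):
    print(format_count(n, "en"), "|", format_count(n, "ru"))
```

Note that 21 takes the singular-style form in Russian but the plural form in English, so a string that hard-codes a two-way singular/plural choice cannot be translated correctly.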
#### 20.2.4 Data and format conventions
* **Number Formats:** Writing conventions for number formats, including digit grouping and decimal separators, need consideration.
* **Time and Date Formats:** Time and date formats, including the use of different calendars, must be adapted.
* **Standard Data:** Standard data relevant to the target locale should be incorporated.
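The digit-grouping and decimal-separator conventions mentioned above can be sketched with a small formatter. This Python example is illustrative only: `format_number` is a hypothetical helper, and the separator pairs passed to it are assumptions standing in for English-style, German-style, and space-grouped conventions.

```python
def format_number(value: float, group_sep: str, decimal_sep: str,
                  decimals: int = 2) -> str:
    """Format a number with caller-supplied grouping and decimal separators."""
    text = f"{value:,.{decimals}f}"  # e.g. "1,234,567.89"
    # Swap separators through a placeholder so the two replacements
    # do not interfere with each other.
    return (text.replace(",", "\x00")
                .replace(".", decimal_sep)
                .replace("\x00", group_sep))


amount = 1234567.891
print(format_number(amount, ",", "."))  # comma grouping, point decimal
print(format_number(amount, ".", ","))  # point grouping, comma decimal
print(format_number(amount, " ", ","))  # space grouping, comma decimal
```

In production code the separators would normally come from locale data rather than being passed by hand.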
#### 20.2.5 Economic and practical conventions
* **Cultural Differences:** Economic conventions vary significantly, impacting elements like paper sizes, preferred storage media, broadcast TV systems, phone number formats, delivery services, and postal address formats.
* **Currency:** Currency symbols, their position, and the use of currency markers must be adapted.
> **Tip:** Always write currency amounts with the currency name or code spelled out, never with a currency symbol. For example, use "100 dollars" or "50 USD", not "$100" or "50$".
* **Measurement Systems:** Localization requires consideration of different measurement systems.
* **Electrical Standards:** Standards for battery sizes, electric current, and voltage may need to be adjusted.
* **Third-Party Providers:** Variations in payment service providers, weather reports, and the presentation of online maps from third-party providers should be addressed.
* **Time Zones:** Translators must carefully consider variations in time zones.
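As a small illustration of the time zone point, the Python sketch below converts one announcement time into two local times using fixed UTC offsets. The launch time is invented for the example, and fixed offsets are a simplifying assumption: they ignore daylight saving time, which a real implementation would handle with named zones (for example via the standard `zoneinfo` module).

```python
from datetime import datetime, timedelta, timezone


def to_local(moment_utc: datetime, utc_offset_hours: float) -> datetime:
    """Convert a timezone-aware UTC datetime to a fixed-offset local time."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return moment_utc.astimezone(local_zone)


# Hypothetical launch time, expressed in UTC.
launch = datetime(2024, 3, 1, 18, 0, tzinfo=timezone.utc)

tokyo = to_local(launch, 9)      # Japan Standard Time is UTC+9
delhi = to_local(launch, 5.5)    # India Standard Time is UTC+5:30

print("UTC:  ", launch.isoformat())
print("Tokyo:", tokyo.isoformat())
print("Delhi:", delhi.isoformat())
```

The Tokyo result falls on the next calendar day, so a localized announcement may need a different date as well as a different time.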
#### 20.2.6 Legal and regulatory compliance
* **Varying Legal Requirements:** Legal requirements differ by country, necessitating product customization or complete changes to fit specific regulatory compliance. This includes:
  * Compliance with privacy laws
  * Additional disclaimers on packaging or websites
  * Different consumer labeling requirements
  * Regulations on encryption and export restrictions
  * Conformity with subpoena procedures or internet censorship
  * Accessibility requirements
  * Tax collections, such as customs duties, value-added tax, and sales tax
---
# Considerations for linguistic and cultural adaptation in localization
Localization requires careful attention to linguistic and cultural nuances to ensure content is appropriate and effective for target audiences.
### 21.1 Linguistic adaptation
Adapting content for a new language involves more than direct translation, encompassing variations in writing systems, grammar, and punctuation.
#### 21.1.1 Writing systems and directionality
Different writing systems employ distinct scripts or characters, which can be symbols, logograms, syllabograms, or letters. The direction of writing also varies significantly; European languages typically flow left-to-right, while Arabic and Hebrew are written right-to-left. Boustrophedon writing alternates direction from line to line, and certain Asian languages can be written vertically.
#### 21.1.2 Text layout and formatting
Some languages use complex text layouts, in which characters change shape based on context. Capitalization rules differ across languages, and some writing systems have no capitalization at all. Text sorting rules also vary between writing systems and languages.
#### 21.1.3 Numerals and grammar
Translators must account for different numeral systems used in various languages. Grammatical rules, including pluralization, also vary widely and require careful consideration .
#### 21.1.4 Punctuation
Punctuation usage can differ, with examples like the French language using guillemets (similar to double quotes) in some publications .
### 21.2 Cultural adaptation
Beyond language, localization involves adapting to a wide array of cultural and economic conventions, as well as legal and political considerations.
#### 21.2.1 Economic and technical conventions
Economic conventions vary by country and can affect practical aspects like paper sizes, preferred storage media, broadcast TV systems, phone number formats, and postal address structures. This also extends to currency symbols, their placement and usage, measurement systems, battery sizes, and electricity standards (current and voltage). Variations in payment service providers, weather report presentations, and online map displays from third parties also need consideration .
#### 21.2.2 Time zones and legal requirements
Time zone differences are critical and must be carefully managed by translators. Legal requirements can necessitate significant product customization or even complete product redesign to ensure regulatory compliance. These include adherence to privacy laws, additional disclaimers on packaging or websites, different consumer labeling requirements, regulations on encryption and export restrictions, conformity with subpoena procedures or internet censorship, accessibility standards, and tax collections such as customs duties, value-added tax, and sales tax .
#### 21.2.3 Political and social sensitivities
Localization efforts must be sensitive to political issues, including disputed borders and geographical naming disputes. Numbers assigned by governments, such as national identification numbers, Social Security Numbers, and passport information, also require careful handling .
#### 21.2.4 Personal and aesthetic considerations
Translators should be mindful of local holidays, title conventions, and personal name conventions. Aesthetics play a role, influencing the appropriateness of colors and images, local architecture, socioeconomic status, clothing, and ethnicity of people depicted .
#### 21.2.5 Local customs and taboos
Special care must be given to local customs, superstitions, religious practices, and social taboos to avoid causing offense .
> **Tip:** Understanding the target audience's cultural context is paramount. What is acceptable or even desirable in one culture may be offensive or misunderstood in another. This requires thorough research and often input from local experts.
---
# Principles and distinctions of website internationalization and localization
Internationalizing a website involves designing and developing it to be easily adapted for different languages and cultural preferences .
### 22.1 Internationalization (i18n) principles
Internationalization focuses on building a flexible foundation that supports future localization efforts. Key principles include:
#### 22.1.1 Unicode standard
Utilizing the Unicode standard is essential for compatibility with various writing systems, enabling the representation of diverse languages .
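A brief Python sketch of two practical consequences of working with Unicode: character counts differ from byte counts in UTF-8, and visually identical strings can have different underlying code points until they are normalized. The helper name `utf8_profile` and the sample strings are illustrative assumptions.

```python
import unicodedata


def utf8_profile(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


for sample in ("ma\u00f1ana", "日本語"):
    chars, size = utf8_profile(sample)
    print(f"{sample}: {chars} code points, {size} bytes in UTF-8")

# "e" followed by a combining acute accent looks like "é" but is two code points.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), composed == "\u00e9")
```

Normalizing text to one form before comparing or storing it helps avoid mismatches between strings that look the same to a reader.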
#### 22.1.2 Separation of content and code
Keeping content separate from the source code allows for easier translation without extensive coding modifications .
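One common way to realize this separation is a message catalog: the code refers to keys, and each locale supplies its own strings. The Python sketch below is a minimal illustration under that assumption; `CATALOG`, `translate`, and the sample strings are hypothetical, and real projects typically load catalogs from external resource files.

```python
# Hypothetical in-memory catalog; in practice this would live in
# per-locale resource files that translators can edit without touching code.
CATALOG = {
    "en": {"greeting": "Welcome, {name}!"},
    "es": {"greeting": "¡Bienvenido, {name}!"},
}


def translate(key: str, locale: str, *, default_locale: str = "en",
              **params: str) -> str:
    """Look up a message by key, falling back to the default locale."""
    messages = CATALOG.get(locale, {})
    template = messages.get(key) or CATALOG[default_locale][key]
    return template.format(**params)


print(translate("greeting", "es", name="Ana"))
print(translate("greeting", "fr", name="Ana"))  # no "fr" catalog: falls back
```

Adding a new language then means adding a catalog entry, with no change to the calling code.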
#### 22.1.3 Flexible user interface (UI)
Designing a flexible UI accommodates varying text lengths and supports languages with different reading directions .
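To support different reading directions, a UI first needs to know which direction a piece of text uses. The Python sketch below applies a simple "first strong character" heuristic using the standard `unicodedata` module; it is a simplification, not a full implementation of the Unicode Bidirectional Algorithm, and the function name `text_direction` is an assumption.

```python
import unicodedata


def text_direction(text: str, default: str = "ltr") -> str:
    """Guess the base direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-type or Arabic-type letters
            return "rtl"
        if bidi == "L":           # left-to-right letters
            return "ltr"
    return default                # digits, spaces, punctuation only


print(text_direction("Hello"))
print(text_direction("\u05e9\u05dc\u05d5\u05dd"))        # Hebrew sample
print(text_direction("\u0645\u0631\u062d\u0628\u0627"))  # Arabic sample
```

The result could be used, for example, to set a container's direction attribute so that alignment and layout mirror correctly.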
#### 22.1.4 Date, time, and number formats
Adapting to locale-specific formats for dates, times, and numbers is crucial for cultural relevance .
#### 22.1.5 Images and icons
Selecting culturally neutral images and icons, or providing alternatives for different regions, ensures inclusivity .
### 22.2 Localization (l10n) process for websites
Localization is the process of adapting an internationalized website to a specific locale, considering linguistic and cultural aspects. This process involves several key steps:
#### 22.2.1 Translation of content
This involves converting text and multimedia elements into the target language, paying close attention to linguistic nuances and cultural sensitivities .
#### 22.2.2 Adaptation of graphics and multimedia
Ensuring that images, videos, and other multimedia elements are culturally appropriate and resonate with the target audience is a critical step .
#### 22.2.3 Adjustment of layout and design
Modifying the layout and design is necessary to accommodate variations in text length, font styles, and other language-specific considerations .
#### 22.2.4 Integration of local regulations
Compliance with legal requirements and local regulations concerning content, privacy, and accessibility is paramount .
#### 22.2.5 Testing and quality assurance
Rigorous testing of the localized website is performed to ensure functionality, linguistic accuracy, and cultural appropriateness .
### 22.3 Web localization vs. other audiovisual products
While the core principles of localization apply broadly, websites present unique challenges compared to other media like applications or games:
* **Dynamic content:** Websites often feature dynamic content requiring real-time updates, which complicates the localization process more than with static products .
* **SEO considerations:** Effective localization of metadata, keywords, and tags is vital for search engine optimization (SEO), directly impacting a website's visibility in different regions .
* **Cultural sensitivity:** As public-facing platforms, websites demand careful attention to cultural nuances to prevent misunderstandings or unintentional offense .
* **Continuous updates:** The frequent updating of websites necessitates ongoing localization efforts to maintain current and culturally relevant content .
> **Tip:** Internationalization is the proactive design and development phase to prepare for localization, whereas localization is the reactive adaptation to a specific locale. Both are crucial for global web presence.
---
# Translation for search engine optimization and effective international web presence
This topic explores the vital role of translators in enhancing a client's global online visibility through Search Engine Optimization (SEO) and effective international web presence .
### 23.1 Key considerations for translators in international SEO
#### 23.1.1 Keyword research
Thorough keyword research in the target language and region is essential to identify terms and phrases that local audiences use in search queries. This includes considering linguistic variations, synonyms, and colloquial expressions .
> **Tip:** Understanding the specific search behaviors of the target audience is paramount for effective keyword selection.
#### 23.1.2 Cultural relevance
Translators must grasp cultural nuances and preferences to select keywords that genuinely resonate with the target audience. Literal translations should be avoided if they fail to capture the intended meaning or sound unnatural in the target language .
#### 23.1.3 Localized content
Translated content needs to be not only linguistically accurate but also culturally appropriate. This involves adapting the content to align with local customs, traditions, and market trends to enhance its relevance .
#### 23.1.4 Metadata optimization
Particular attention should be paid to translating and optimizing meta titles, meta descriptions, and URL slugs. Crafting compelling and concise meta descriptions that incorporate relevant keywords is crucial for encouraging click-throughs .
#### 23.1.5 Multilingual link building
Collaboration with web developers and marketers is key to building a network of high-quality, multilingual backlinks. Identifying reputable local websites and influencers for potential collaborations can significantly improve search engine rankings .
#### 23.1.6 Content structure and formatting
Ensuring that translated content maintains a user-friendly structure and formatting is important. Utilizing headers, bullet points, and other formatting elements enhances readability and SEO, as search engines value well-organized content .
#### 23.1.7 Mobile optimization
Recognizing the increasing importance of mobile search, translators must ensure that translated content is mobile-friendly. Optimizing images and other media for fast loading times on mobile devices positively impacts SEO rankings .
#### 23.1.8 Regular updates
Staying informed about changes in search engine algorithms and adapting SEO strategies accordingly is vital. Regularly updating translated content to reflect current trends ensures sustained visibility in international markets .
#### 23.1.9 Analytics and reporting
Working closely with clients to monitor website analytics and assess the performance of localized content is a crucial step. Providing insights and recommendations based on data analysis allows for the continuous refinement of SEO strategies .
#### 23.1.10 Communication with clients
Establishing clear communication channels with clients is essential to understand their business goals, target audience, and specific SEO objectives. Collaboration on a strategy that aligns translation efforts with broader marketing initiatives leads to a comprehensive international SEO approach .
---
# The role of professional translators in optimizing web page metadata for foreign markets
Professional translators are crucial in adapting web page metadata for foreign markets, ensuring effective optimization and global competitiveness .
### 24.1 Understanding web page metadata
Metadata, including meta titles and descriptions, is essential for improving a website's visibility and search engine ranking. These elements help search engines understand and index webpage content accurately .
### 24.2 The translator's contribution to metadata optimization
Professional translators play a pivotal role in optimizing metadata for international audiences by addressing several key areas:
#### 24.2.1 Enhancing search engine visibility
Translating metadata ensures that web content is accessible and understandable to global search engines, thereby improving a website's discoverability in foreign markets .
#### 24.2.2 Increasing user click-through rates
Skilled translators create compelling and linguistically accurate meta titles and descriptions. This localized content is more likely to resonate with users, encouraging them to click on search result links .
#### 24.2.3 Ensuring local relevance and cultural alignment
Translators possess an understanding of cultural nuances and audience preferences. By localizing metadata, they ensure that the content aligns with local expectations, making it more appealing and relevant to users in specific foreign markets .
> **Tip:** Localizing metadata goes beyond direct translation; it involves adapting content to fit the cultural context of the target audience.
#### 24.2.4 Optimizing for keywords in target languages
Effective Search Engine Optimization (SEO) relies on relevant keywords. Translators with expertise in keyword research within target languages can incorporate region-specific terms into metadata, increasing the chances of a webpage appearing in relevant search results .
#### 24.2.5 Maintaining global brand consistency
For businesses expanding internationally, maintaining a cohesive brand image is vital. Professional translators ensure that translated metadata aligns with the brand's established tone and message, presenting a unified global presence .
#### 24.2.6 Adhering to technical limitations
Search engines impose character limits on meta titles and descriptions. Translators are skilled in crafting concise yet impactful translations that fit within these constraints, preventing truncation in search engine result pages .
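A translated title or description can be checked against length limits automatically. In the Python sketch below, the limits of 60 and 160 characters are placeholder assumptions, since actual limits vary by search engine and are not specified here; `check_metadata` is a hypothetical helper.

```python
# Placeholder limits; adjust to the guidance of the target search engine.
TITLE_LIMIT = 60
DESCRIPTION_LIMIT = 160


def check_metadata(title: str, description: str,
                   title_limit: int = TITLE_LIMIT,
                   description_limit: int = DESCRIPTION_LIMIT) -> list[str]:
    """Return a list of length problems; an empty list means both fit."""
    issues = []
    if len(title) > title_limit:
        issues.append(f"title is {len(title)} characters (limit {title_limit})")
    if len(description) > description_limit:
        issues.append(
            f"description is {len(description)} characters "
            f"(limit {description_limit})"
        )
    return issues


print(check_metadata("Garden tools", "Buy garden tools online."))
print(check_metadata("x" * 61, "Buy garden tools online."))
```

Running such a check on every target language helps catch translations that expanded beyond the space available in search results.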
#### 24.2.7 Building credibility and trust
Inaccurate or poorly translated metadata can damage a website's credibility. Professional translators safeguard the integrity of the content, contributing to the establishment of trust with users in foreign markets .
#### 24.2.8 Adapting to market trends
Linguistic and cultural landscapes are constantly evolving. Translators who stay informed about these shifts can update and adapt metadata to reflect current market trends, ensuring sustained optimization .
> **Tip:** Regular review and adaptation of translated metadata are necessary to remain relevant in dynamic international markets.
### 24.3 The collaborative approach
The collaboration between professional translators and web developers or marketers is fundamental for a holistic approach to web page optimization in foreign markets. Translators effectively bridge linguistic and cultural gaps, ensuring that metadata is not just translated but truly optimized for the specific nuances of each target audience, ultimately boosting a website's global competitiveness .
---
# Pre-editing in machine translation processes
Pre-editing involves revising technical documentation before machine translation (MT) to improve the source text and enhance the raw MT output quality, thereby reducing post-editing effort .
### 25.1 The role of humans in MT processes
Humans play a crucial role in machine translation workflows, primarily through two distinct tasks: pre-editing and post-editing .
### 25.2 What is pre-editing?
Pre-editing is the process of modifying source text to improve its suitability for machine translation. The primary objective is to enhance the quality of the raw MT output by making the source text more accessible to the MT engine. Effective pre-editing can significantly reduce or even eliminate the need for post-editing .
#### 25.2.1 The pre-editor's perspective and actions
Ideally, a specialized human editor performs pre-editing. This editor analyzes text from the viewpoint of an MT engine to anticipate potential errors in the output. The pre-editor's actions are aimed at facilitating MT by implementing several strategies:
* **Sentence length reduction:** Shorter sentences are generally easier for MT systems to process accurately.
* **Simplifying syntax:** Avoiding complex or ambiguous grammatical structures helps prevent misinterpretations by the MT engine.
* **Ensuring term consistency:** Maintaining consistent terminology across the source text is vital for accurate translation.
* **Article usage:** Proper and consistent use of articles can improve MT accuracy.
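The sentence-length strategy in the list above lends itself to a simple automated check. The Python sketch below flags sentences that exceed a word limit; the threshold of 25 words and the naive punctuation-based sentence splitter are assumptions chosen for illustration, not established rules.

```python
import re


def long_sentences(text: str, max_words: int = 25) -> list[str]:
    """Return the sentences in text that contain more than max_words words."""
    # Naive split: a sentence ends at ".", "!" or "?" followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]


source = (
    "Stop the engine. "
    "Remove the oil filter and clean the housing."
)
# A low limit is used here only to make the sample trigger the check.
flagged = long_sentences(source, max_words=5)
print(flagged)
```

A pre-editor could review each flagged sentence and decide whether to split it before the text is sent to the MT engine.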
#### 25.2.2 Utilizing automated tools in pre-editing
Pre-editors can leverage automated revision tools to enhance their work:
* **Spell-checking:** Ensuring the source text is free of spelling errors, especially against a project-specific glossary.
* **Advanced grammar-checking:** Employing sophisticated grammar-checking tools to identify and correct grammatical issues.
* **Tagging untranslatable elements:** Identifying and marking parts of the source document that should not be translated.
> **Tip:** Implementing pre-editing techniques not only benefits machine translation projects but also offers advantages for human translation projects. Many organizations incorporate similar processes into their localization best practices when developing extensive mono- and multilingual materials. Writing with MT in mind from the outset has positive downstream effects on overall quality and productivity .
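Tagging untranslatable elements, one of the automated steps listed above, can be sketched as a protect-and-restore round trip: placeholders and URLs are swapped for neutral tags before translation and put back afterwards. In this Python example the tag format `<x id="…"/>`, the regular expression, and the function names are all hypothetical choices made for illustration.

```python
import re

# Assumed patterns for content that must not be translated:
# {placeholders}, printf-style %s / %d, and URLs.
PROTECTED = re.compile(r"\{[^{}]*\}|%[sd]|https?://\S+")


def protect(text: str) -> tuple[str, list[str]]:
    """Replace protected spans with numbered tags; return text and originals."""
    kept: list[str] = []

    def stash(match: re.Match) -> str:
        kept.append(match.group(0))
        return f'<x id="{len(kept)}"/>'

    return PROTECTED.sub(stash, text), kept


def restore(text: str, kept: list[str]) -> str:
    """Put the original spans back in place of their tags."""
    for index, original in enumerate(kept, start=1):
        text = text.replace(f'<x id="{index}"/>', original)
    return text


source = "Hello {name}, visit https://example.com now"
masked, kept = protect(source)
print(masked)
print(restore(masked, kept))
```

Because the tags are numbered, the translated sentence can reorder them freely and the originals will still be restored in the right places.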
---
# Controlled natural languages for improved documentation
Controlled natural languages (CNLs) are subsets of natural languages intentionally restricted in grammar and vocabulary to minimize ambiguity and complexity, thereby enhancing clarity and facilitating automatic semantic analysis. These languages are crucial for improving the quality of technical documentation and simplifying translation processes, both human and automatic .
### 26.1 Types of controlled languages
Traditionally, controlled languages are categorized into two main types:
* Those designed to enhance readability for human readers, particularly non-native speakers.
* Those engineered to enable reliable automatic semantic analysis of the language.
The first category, often referred to as "simplified" or "technical" languages, includes prominent examples used across industries. These languages impose restrictions on writers through general guidelines such as maintaining short sentences, avoiding pronouns, using only approved vocabulary, and employing the active voice.
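Two of the guidelines named above, approved vocabulary and avoidance of pronouns, can be checked mechanically. The Python sketch below is a toy illustration: the word lists are invented for the example and do not come from any actual controlled language specification, and `check_sentence` is a hypothetical helper.

```python
import re

# Invented sample lists; a real controlled language defines its own.
APPROVED_WORDS = {"stop", "the", "engine", "remove", "oil", "filter"}
PRONOUNS = {"it", "they", "them", "this", "that", "these", "those"}


def check_sentence(sentence: str,
                   approved: set[str] = APPROVED_WORDS) -> list[str]:
    """Return rule violations for one sentence; an empty list means it passes."""
    words = re.findall(r"[a-z]+", sentence.lower())
    issues = []
    pronouns = sorted(set(words) & PRONOUNS)
    if pronouns:
        issues.append("pronouns: " + ", ".join(pronouns))
    unapproved = sorted(set(words) - approved - PRONOUNS)
    if unapproved:
        issues.append("unapproved words: " + ", ".join(unapproved))
    return issues


print(check_sentence("Stop the engine."))
print(check_sentence("Deactivate it."))
```

Checks for sentence length or passive voice could be added in the same style, although reliable passive detection needs proper grammatical analysis rather than word lists.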
### 26.2 Examples of simplified technical languages
Several simplified and technical languages are employed in industry to elevate the quality of technical documentation and to streamline (semi-)automatic translation. These include:
* Caterpillar Technical English
* Simplified Technical English (STE)
* IBM's Easy English
#### 26.2.1 Caterpillar Fundamental English
Caterpillar Inc., a global manufacturer of heavy equipment, utilizes a controlled language to ensure consistency and high quality in its extensive technical documentation. This documentation covers a wide array of products and subsystems, necessitating materials like operations and maintenance manuals, testing and adjusting guides, disassembly and assembly instructions, and specifications. To achieve this, Caterpillar employs a restricted vocabulary of approximately 850 words, known as Caterpillar Fundamental English (CFE).
### 26.3 List of controlled natural languages
A broad range of controlled natural languages exists, reflecting diverse approaches and applications. Some notable examples include:
* ASD Simplified Technical English
* Attempto Controlled English
* Aviation English
* Basic English
* ClearTalk
* Common Logic Controlled English
* Distributed Language Translation Esperanto
* E-Prime
* Français fondamental
* Gellish Formal English
* Interlingua-IL sive Latino sine flexione (Giuseppe Peano)
* ModeLang
* Newspeak
* Processable English (PENG)
* Seaspeak
* Semantics of Business Vocabulary and Business Rules
* Special English
* Plain language movement (Lenguaje claro)
### 26.4 Companies utilizing controlled languages
Numerous companies across various sectors have adopted controlled languages to enhance their documentation processes. These adoptions often align with specific industry standards or internal initiatives. Prominent examples include:
* **Avaya:** Avaya Controlled English (ACE)
* **Boeing:** Simplified Technical English (STE), ASD-STE100
* **Caterpillar:** Caterpillar Technical English (CTE), Caterpillar Fundamental English (CFE)
* **Dassault Aerospace:** Français Rationalisé
* **European Aeronautic Defence and Space Company (EADS):** Simplified Technical English (STE), ASD-STE100
* **Ericsson:** Ericsson English
* **General Motors (GM):** Controlled Automotive Service Language (CASL)
* **IBM:** Easy English
* **Kodak:** International Service Language
* **Nortel:** Nortel Standard English (NSE)
* **Océ:** Controlled English
* **Rolls-Royce:** Simplified Technical English (STE), ASD-STE100
* **Saab Systems:** Simplified Technical English (STE), ASD-STE100
* **Scania:** Scania Swedish
* **Sun Microsystems:** Sun Controlled English
* **Xerox:** Xerox Multilingual Customized English
### 26.5 Rules for controlled languages
Grammar rules for controlled languages are not universal. They vary from one language to another, because no single rule set produces optimal results for every language. Even so, applying a set of controlled language rules can substantially reduce ambiguity in most texts, across many languages. Unambiguous texts are considered ideal input for machine translation.
The rules for controlled languages are exemplified by the CLOUT™ rule set. CLOUT is an acronym for Controlled Language Optimized for Uniform Translation, and this rule set was developed by Uwe Muegge.
> **Tip:** Pre-editing technical documentation before machine translation, using techniques similar to those in controlled languages, can significantly improve raw output quality and reduce post-editing effort.
> **Tip:** Automated revision tools, such as spell checkers that work against project-specific glossaries and advanced grammar checkers, are valuable components of the pre-editing process.
---
# Rules for writing controlled languages
This section outlines ten rules for writing controlled languages, designed to reduce ambiguity and facilitate machine translation. These rules are exemplified by the CLOUT™ rule set, developed by Uwe Muegge.
### 27.1 General principles of controlled language rules
The rules for controlled languages are language-specific and aim to minimize ambiguities, making texts more suitable for machine translation. The following ten rules are provided as examples of how to achieve this goal.
### 27.2 The ten controlled language rules
#### 27.2.1 Rule 1: Sentence length
**Rule:** Write sentences that are shorter than 25 words.
> **Example:**
> **Write:** The author performs the following tasks: Collect the necessary information. Analyze and evaluate the information. Write a structured draft.
> **Do not write:** Authors will approach any writing project by collecting the necessary information first, and after carefully analyzing and evaluating it, they will create a structured draft.
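Rule 1 lends itself to an automated pre-editing check. The following is a minimal sketch, not part of the CLOUT rule set itself: the function names and the naive sentence splitter (which breaks after `.`, `!`, `?`, or `:`) are assumptions made for illustration, and a production checker would need proper sentence segmentation.

```python
import re

MAX_WORDS = 25  # Rule 1: sentences must be shorter than 25 words


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!', '?' or ':' plus whitespace."""
    parts = re.split(r"(?<=[.!?:])\s+", text.strip())
    return [p for p in parts if p]


def long_sentences(text: str, max_words: int = MAX_WORDS) -> list[tuple[int, str]]:
    """Return (word_count, sentence) for each sentence with max_words or more words."""
    flagged = []
    for sentence in split_sentences(text):
        count = len(sentence.split())
        if count >= max_words:
            flagged.append((count, sentence))
    return flagged


good = (
    "The author performs the following tasks: Collect the necessary information. "
    "Analyze and evaluate the information. Write a structured draft."
)
bad = (
    "Authors will approach any writing project by collecting the necessary "
    "information first, and after carefully analyzing and evaluating it, "
    "they will create a structured draft."
)

print(long_sentences(good))
print(long_sentences(bad))
```

Counting whitespace-separated tokens is a rough approximation of "words", but it is enough to flag the "Do not write" sentence from the example while leaving the "Write" version untouched.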
#### 27.2.2 Rule 2: Single idea per sentence
**Rule:** Write sentences that express only one idea.
> **Example:**
> **Write:** Authors who optimize their texts for easy comprehension facilitate the translation process. These texts enable machine translation systems to produce better translation results.
> **Do not write:** By optimizing their texts for easy comprehension, authors facilitate the translation process, and doing so enables machine translation systems to create better translation results.
#### 27.2.3 Rule 3: Consistent sentence structure for repeated content
**Rule:** Write the same sentence if you want to express the same content.
> **Example:**
> **Write:** Printer Installation. 1) Remove the printer from the carton. 2) Remove the plastic wrapping.
> **Do not write:** Instructions for installing the printer. After unpacking the printer from the shipping carton, take the printer out of the plastic bag.
#### 27.2.4 Rule 4: Grammatically complete sentences
**Rule:** Write sentences that are grammatically complete.
> **Example:**
> **Write:** Do you wish to continue the installation of the software?
> **Do not write:** Continue installing software?
#### 27.2.5 Rule 5: Simple grammatical structure
**Rule:** Write sentences that have a simple grammatical structure.
> **Example:**
> **Write:** Show that you can organize your thoughts by using a simple sentence structure in your texts.
> **Do not write:** You, in your texts, to show that you can organize your thoughts, should use a simple sentence structure.
#### 27.2.6 Rule 6: Active voice
**Rule:** Write sentences in the active form.
> **Example:**
> **Write:** The program manager will send a summary of all questions to the responsible coworkers.
> **Do not write:** A summary of questions will be sent to the responsible individuals.
#### 27.2.7 Rule 7: Noun repetition over pronouns
**Rule:** Write sentences that repeat the noun instead of using a pronoun.
> **Example:**
> **Write:** You must check the spelling of your text before you publish your text.
> **Do not write:** You must check the spelling of your text before publishing it.
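A pre-editing tool could flag candidate violations of Rule 7 by searching for pronouns that depend on an antecedent. The sketch below is illustrative only: the `PRONOUNS` set is a hypothetical, deliberately small list of third-person pronouns, and the function only flags words for a human author to review rather than rewriting anything.

```python
import re

# Hypothetical, deliberately small list of third-person pronouns (assumption).
PRONOUNS = {"it", "they", "them", "he", "she", "him"}


def find_pronouns(sentence: str) -> list[str]:
    """Return the flagged pronouns in a sentence, in order of appearance."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return [w for w in words if w in PRONOUNS]


print(find_pronouns("You must check the spelling of your text before publishing it."))
print(find_pronouns("You must check the spelling of your text before you publish your text."))
```

The first call reports `it`, the pronoun that a machine translation system would have to resolve. The second call reports nothing, because the noun is repeated.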
#### 27.2.8 Rule 8: Use of articles
**Rule:** Write sentences that use articles to identify nouns.
> **Example:**
> **Write:** Test the installation.
> **Do not write:** Test installation.
#### 27.2.9 Rule 9: General dictionary words
**Rule:** Write sentences that use words from a general dictionary.
> **Example:**
> **Write:** Avoid ambiguity.
> **Do not write:** Eschew obfuscation.
#### 27.2.10 Rule 10: Correct spelling
**Rule:** Write sentences that use only words with correct spelling.
> **Example:**
> **Write:** Texts that contain spelling errors complicate the translation process.
> **Do not write:** Texts that contein speling misstakes complicate the translation procces.
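Rules 9 and 10 can both be approximated by checking every word against a reference word list, which is also how a spell checker backed by a project glossary works in principle. The sketch below assumes a tiny hard-coded `WORD_LIST` as a stand-in for a real general dictionary or approved vocabulary; the names are hypothetical.

```python
import re

# Stand-in word list (assumption); a real check would load a general
# dictionary or the project's approved vocabulary instead.
WORD_LIST = {
    "texts", "that", "contain", "spelling", "errors",
    "complicate", "the", "translation", "process",
}


def unknown_words(text: str, word_list: set[str]) -> list[str]:
    """Return every word in the text that is missing from the word list."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return [t for t in tokens if t not in word_list]


print(unknown_words(
    "Texts that contain spelling errors complicate the translation process.",
    WORD_LIST,
))
print(unknown_words(
    "Texts that contein speling misstakes complicate the translation procces.",
    WORD_LIST,
))
```

A word-list lookup cannot tell a misspelling from a rare but valid term, so the output is best treated as a list of items for the author to review.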
---
# Understanding post-editing in machine translation
Post-editing is the process of amending machine-generated translation to produce an acceptable final product.
### 28.1 Definition and scope
Post-editing (or postediting) involves human translators correcting machine translation output to meet a pre-defined quality level negotiated with the client. It is distinct from editing, which improves human-generated text, and revision, which proofreads human-generated text for simple mistakes.
A person who performs post-editing is called a post-editor. For optimal results, the process may involve pre-editing the source text (e.g., using controlled language principles) before the machine output is post-edited.
> **Tip:** Post-editing aims to improve machine translation (MT) output to a usable level, distinguishing it from editing human-generated text.
### 28.2 Use cases and efficiency
Post-editing is employed when raw machine translation is insufficient but full human translation is not strictly necessary. Industry recommendations suggest using post-editing when it can at least double the productivity of manual translation, potentially quadrupling it for light post-editing tasks.
However, predicting the efficiency of post-editing can be challenging. While studies and industry reports generally indicate that post-editing is faster than translating from scratch, regardless of language pairs or translator experience, there is no consensus on the exact time savings achievable. Industry figures often cite around 40% time savings, whereas some academic studies suggest practical savings in actual working conditions are more likely between 0–20%. Some professionals have even reported negative productivity gains, where corrections take longer than translating from scratch.
> **Tip:** While generally faster than translating from scratch, the actual time savings from post-editing can vary significantly and are a subject of ongoing debate.
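The figures in this section mix two measures: productivity multipliers ("double", "quadruple") and percentage time savings ("40%", "0–20%"). The sketch below converts between them under one stated assumption, namely that productivity means words processed per unit of time, so a time saving of s corresponds to a factor of 1 / (1 - s). The function names are hypothetical.

```python
def productivity_factor(time_saving: float) -> float:
    """Convert a fractional time saving (0.4 = 40%) into a productivity multiplier.

    Assumes productivity = words per unit of time, so factor = 1 / (1 - saving).
    """
    if time_saving >= 1.0:
        raise ValueError("time saving must be below 100%")
    return 1.0 / (1.0 - time_saving)


def time_saving(factor: float) -> float:
    """Convert a productivity multiplier back into a fractional time saving."""
    if factor <= 0:
        raise ValueError("productivity factor must be positive")
    return 1.0 - 1.0 / factor


for saving in (0.0, 0.2, 0.4, 0.5, 0.75):
    print(f"{saving:.0%} time saving -> {productivity_factor(saving):.2f}x productivity")
```

Under that assumption, doubling productivity means saving 50% of the time and quadrupling means saving 75%, whereas a 40% saving corresponds to roughly 1.67 times the manual throughput and a 20% saving to 1.25 times. A negative saving yields a factor below 1, which matches the reported cases where post-editing is slower than translating from scratch.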
### 28.3 Post-editing strategies and quality levels
The amount of post-editing required is project-dependent, making it crucial to define expectations early. Key considerations guiding the post-editing strategy are time, quality, and cost.
#### 28.3.1 Light post-editing
Light post-editing requires minimal intervention from the post-editor. The primary goal is to make the machine output understandable to the end-user. This approach is typically used for inbound purposes only (that is, for assimilation), especially when texts are needed urgently or will be relevant only for a short time.
#### 28.3.2 Full post-editing
Full post-editing involves a more substantial level of intervention. The objective is to achieve a quality level that is not only understandable but also stylistically appropriate. The resulting text can then be used for assimilation and dissemination, suitable for both inbound and outbound purposes.
At the higher end of full post-editing, the expectation is that the result is indistinguishable from human translation.
> **Example:** For a client needing to quickly grasp the gist of a large volume of internal documents, light post-editing might be sufficient. For a marketing brochure intended for external publication, full post-editing would be necessary to ensure stylistic accuracy and brand consistency.
### 28.4 The evolving landscape of post-editing
Historically, it was often assumed that translating directly from the source text required less effort than post-editing a machine-generated version. However, advancements in machine translation and artificial intelligence are changing this perception. For specific language pairs and tasks, and with MT engines customized using high-quality, domain-specific data, some clients are now requesting post-editing over manual translation, anticipating similar quality at a reduced cost.
> **Tip:** The increasing sophistication of MT engines means that for certain use cases, post-editing is becoming a viable and cost-effective alternative to traditional human translation.
---
# Guidelines for achieving different post-editing quality levels
This section outlines the guidelines for achieving different post-editing quality levels, distinguishing between "good enough" and quality similar to human translation. The effort required for post-editing is primarily dictated by the initial machine translation (MT) output quality and the desired final quality.
### 29.1 Defining post-editing quality levels
The two primary quality levels for post-editing are:
* **"Good enough" quality:** This level ensures the content is comprehensible and accurate, meaning the core message is understood and aligns with the source text. However, stylistic improvements are not prioritized, and the text might retain a machine-generated feel with potentially unusual syntax or imperfect grammar, as long as the meaning is conveyed accurately.
* **Quality similar to human translation:** This level aims for content that is not only comprehensible and accurate but also stylistically acceptable, with normal syntax, correct grammar, and punctuation. While it may not reach the highest standards of a native speaker, it is polished and error-free.
### 29.2 Guidelines for achieving "good enough" quality
To achieve a "good enough" quality level, the following guidelines should be applied:
* Prioritize semantic correctness in the translation.
* Verify that no information has been unintentionally added or omitted.
* Edit any content that is offensive, inappropriate, or culturally unacceptable.
* Maximize the use of the raw MT output, making only the minimal necessary changes.
* Adhere to basic spelling rules.
* Refrain from implementing corrections that are solely stylistic in nature.
* Avoid restructuring sentences solely to improve the natural flow of the text.
> **Tip:** The focus for "good enough" quality is on conveying the correct meaning accurately, with minimal stylistic intervention.
### 29.3 Guidelines for achieving quality similar or equal to human translation
To attain a quality level comparable to human translation, the following guidelines are essential:
* Ensure the translation is grammatically, syntactically, and semantically correct.
* Confirm that key terminology is translated accurately and that any untranslated terms are on the client's approved "Do Not Translate" list.
* Guarantee that no information has been accidentally added or omitted.
* Edit any offensive, inappropriate, or culturally unacceptable content.
* Utilize as much of the raw MT output as possible, making necessary corrections.
* Apply basic rules for spelling, punctuation, and hyphenation.
* Ensure that the final formatting is correct.
> **Tip:** Achieving human-equivalent quality requires meticulous attention to grammatical correctness, stylistic consistency, and adherence to client-specific terminology.
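The two guideline lists overlap substantially, and it can help to see them as a single checklist keyed by quality level. The sketch below is one possible encoding: the level names and guideline keys are paraphrases introduced here for illustration, not official identifiers from any standard.

```python
LEVELS = {"good_enough", "human_like"}

# Which checks apply at which level; keys paraphrase the guideline lists above.
GUIDELINES = {
    "semantic_correctness": {"good_enough", "human_like"},
    "no_added_or_omitted_information": {"good_enough", "human_like"},
    "edit_offensive_content": {"good_enough", "human_like"},
    "reuse_raw_mt_output": {"good_enough", "human_like"},
    "basic_spelling": {"good_enough", "human_like"},
    "grammar_and_syntax": {"human_like"},
    "key_terminology_and_do_not_translate_list": {"human_like"},
    "punctuation_and_hyphenation": {"human_like"},
    "final_formatting": {"human_like"},
}


def checklist(level: str) -> list[str]:
    """Return the checks that apply at the given post-editing quality level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return [name for name, levels in GUIDELINES.items() if level in levels]


extra = sorted(set(checklist("human_like")) - set(checklist("good_enough")))
print(extra)
```

The set difference lists the checks that only full, human-like post-editing adds. The two "good enough" guidelines that forbid purely stylistic corrections and restructuring are restrictions on the post-editor rather than checks on the text, so they are not modeled here.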
---
Glossary
| Term | Definition |
|------|------------|
| Translation Memory (TM) | A database that stores previously translated segments of text, enabling translators to reuse existing translations for new projects. This significantly improves efficiency, reduces costs, and ensures consistency across different documents and versions. |
| Computer-Aided Translation (CAT) | A set of software applications specifically designed to assist translators in their tasks. CAT tools provide resources like translation memories, terminology management, and project management to enhance productivity and quality. |
| Segmentation | The process of dividing a source text into smaller units, typically sentences or phrases, based on punctuation and user-defined rules. These segments are then processed individually for translation and storage in a translation memory. |
| Alignment | The process of matching corresponding segments between a source text and its translation. This is crucial for creating translation memories from existing translated documents that were not originally created using CAT tools. |
| Term Base (Glossary) | A specialized database containing single words or expressions related to a specific subject, often in multiple languages. Term bases are essential for maintaining terminological consistency and ensuring the correct usage of industry-specific jargon. |
| Localization (L10N) | The process of adapting a product or content to a specific local market and its culture. This goes beyond simple translation to include modifications to language, graphics, and other elements to meet local expectations and preferences. |
| Fuzzy Match | A type of match found in a translation memory where the current source segment is not an exact character-by-character replica of a stored segment. These matches are typically assigned a percentage of similarity, indicating how much of the existing translation can be leveraged. |
| Exact Match (100% Match) | A match found in a translation memory where the current source segment is identical to a stored segment. This allows for the direct reuse of the previously translated text, offering the highest level of efficiency. |
| Metadata | Data that describes other data, providing additional information about digital content and processes. In CAT tools, metadata can include details about the translator, creation dates, usage counts, and the tool used to create a translation unit. |
| Translation Unit (TU) | A pair of source and target text segments stored in a translation memory. Each TU represents a translated piece of content that can be reused in future translation projects. |
| Terminology Management | The systematic process of identifying, collecting, organizing, and maintaining terms and their translations within a specific domain or for a particular client. This is crucial for ensuring consistency and accuracy in translations. |
| Internationalization (I18N) | The design and development of a product or software in a way that allows it to be easily adapted to various languages and regions without requiring significant re-engineering. This is a prerequisite for effective localization. |
| Automatic Dictionary Look-up | A system where a computer assists translators by searching for and presenting relevant dictionary entries based on input words, often including contextual information. |
| Dynamic Concordance System | A feature within translation software that allows for the real-time searching and retrieval of previously translated segments or terms based on their occurrence in the source text. |
| Interactive Translation System (ITS) | An early approach to Computer-Aided Translation that involved multiple levels of functionality, from basic editing and terminology management to integration with machine translation systems. |
| Keypunched | The process of encoding data onto punched cards, a method used in early computing to input information into a machine. |
| Lexical Items | The individual words or vocabulary units that constitute a language. |
| Machine Translation (MT) | The use of computer software to translate text or speech from one language to another automatically, often used in conjunction with other CAT tools. |
| Translator's Workstation | A comprehensive software environment designed for translators, typically integrating a translation memory, editor, and terminology management tools. |
| Simship | The simultaneous release of a product across all local markets, requiring rapid and coordinated translation efforts to meet global launch deadlines. |
| Time-to-market | The duration from product conception to its availability on the market, a critical factor that necessitates faster translation and localization processes. |
| Software Localization | The adaptation of software applications to specific languages and cultures, including the translation of user interfaces, help files, and other textual elements. |
| File Preparation | The process of converting source files into a format suitable for translation tools and then reassembling them after translation, often becoming a new area of expertise for translators. |
| Post-processing | The steps taken after the translation and file assembly process to ensure the final translated content is accurate, formatted correctly, and ready for publication. |
| Computer-Assisted Translation (CAT) | A system that uses specialized software applications designed to efficiently assist translators in their tasks, aiming to provide them with necessary resources automatically and quickly. |
| Terminology Management Software | Software used for creating and managing glossaries or term bases, ensuring consistent use of specific terms and phrases across translations. |
| TermBase eXchange (TBX) | An open, industry-standard format for exchanging terminological data between different software tools, facilitating the sharing of glossaries. |
| Translation Memory eXchange (TMX) | An open, industry-standard format for exchanging Translation Memory data between different CAT tools, ensuring interoperability. |
| Extensible Markup Language (XML) | A text-based markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable, commonly used for structured data like TM files. |
| Project Management Software | Software used to control the flow of information in translation projects, including task assignment, quality control, content analysis, report generation, word counts, and final delivery to the client. |
| Full Matches | Segments of text that are identical to previously translated segments stored in a translation memory. |
| Fuzzy Matches | Segments of text that are similar but not identical to previously translated segments stored in a translation memory, requiring human review and adjustment. |
| Intra-file Repetitions | Identical segments of text that appear multiple times within the same document. |
| Cross-file Repetitions | Identical segments of text that appear in different documents. |
| Computer-Assisted Translation (CAT) System | A software system designed to aid human translators in the translation process by providing tools such as translation memory, terminology management, and quality assurance checks. |
| Translation Process | The series of steps undertaken by a translator to produce a final translated text, which, when using a CAT system, involves specific technological integrations. |
| File Format Check | An initial step in the CAT translation process that involves verifying the compatibility and integrity of the source file format to ensure it can be processed correctly by the CAT software. |
| Resource Assignment | The allocation of necessary resources, such as specific translation memories, termbases, or project instructions, to a translation task within a CAT system to ensure consistency and efficiency. |
| Translation | The core activity of converting text from a source language to a target language, which in a CAT system is facilitated by the software's tools and functionalities. |
| Import | This function is used to transfer a text and its translation from an external text file into a Translation Memory (TM). It can be performed from raw formats, where the source text and its translation are provided, or from the TM's native format, which is used for saving translation memories. |
| Textual Parsing | A process within analysis that focuses on correctly identifying punctuation to differentiate between sentence-ending periods and periods used in abbreviations. This often involves markup, which is a form of pre-editing, to distinguish special text elements that may or may not require translation or conversion. |
| Linguistic Parsing | A process that involves reducing words to their base forms to facilitate automatic term retrieval from a term bank. Syntactic parsing can also be employed to extract multi-word terms or phrases from a source text, normalizing word order variations to identify potential phrases. |
| Term Extraction | A function that can utilize a pre-existing dictionary or employ parsing based on text statistics to identify unknown terms. This process is valuable for estimating the workload of a translation project, aiding in planning and scheduling by counting words and assessing text repetition. |
| Export | The function that transfers translated text from a Translation Memory (TM) into an external text file. Ideally, the export function should be the inverse operation of the import function. |
| Exact Match | Occurs when a current source segment in a document precisely matches a stored segment in the Translation Memory (TM) character by character. This is also referred to as a "100% match" and indicates that the identical sentence has been translated previously. |
| In-Context Exact (ICE) Match | An exact match that also occurs in the identical context, meaning it is found in the same location within a paragraph. Context is often determined by surrounding sentences and attributes like document file names, dates, and permissions. |
| Concordance | A feature where the system retrieves segment pairs that match specified search criteria when a translator selects one or more words in a source segment. This is particularly useful for finding translations of terms and idioms when a dedicated terminology database is unavailable. |
| Updating | The process of adding a new translation to a Translation Memory (TM) after it has been accepted by the translator. This can involve modifying or deleting existing entries, and some systems allow for the storage of multiple translations for the same source segment. |
| Segment | A unit of text, typically a sentence or a phrase, that is translated and stored within a translation memory. |
| Source Segment | The original text segment in the source language that has been translated. |
| Target Segment | The translated text segment in the target language corresponding to the source segment. |
| XLIFF (XML Localization Interchange File Format) | An XML-based file format designed for the exchange of localizable software information between different tools and services, commonly used for translation memory. |
| TMX (Translation Memory eXchange) | An XML-based file format specifically designed for the exchange of translation memory data between different translation memory tools. |
| XML (Extensible Markup Language) | A markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable, widely used for data exchange due to its structured nature. |
| Header | The section of a translation memory file that contains metadata about the file itself and the localization process, such as language pairs and creation information. |
| Body | The main section of a translation memory file that contains the actual translation units, including source and target segments. |
| Semantic Tags | XML tags that describe the meaning or purpose of the data they enclose, such as `` or ``, making the file structure more understandable. |
| Computer-Assisted Translation (CAT) Tool | Software designed to assist human translators by providing features such as translation memory integration, terminology management, and quality assurance checks. |
| Translation Environment Tool (TEnT) | An alternative term for a Computer-Assisted Translation (CAT) tool, encompassing software that supports the translation process. |
| Consistency | The uniformity of terminology, style, and phrasing across a translation project or between different projects for the same client, which TM files help to maintain. |
| Inline Markup Elements | Special codes or tags within a translation file that represent formatting, tags, or other non-translatable elements of the source text. |
| Encapsulation Method | A method used in file formats where native codes are enclosed within specific elements of the file structure. |
| Placeholder Method | A method where native codes are removed from the main text and replaced by short elements that refer to them, often used in formats like XLIFF. |
| Skeleton File | In some localization formats, a file that holds the original structure and non-translatable elements, with placeholders in the main file referring to it. |
| TMX | A format for transferring translation memories between different translation tools. A translation memory is a database containing source text segments and their corresponding translations in one or more target languages. |
| XLIFF | A format for transferring localizable data extracted from original files through the various stages of the localization process, including merging the localized data back into its original format. |
| OLIF | Open Lexicon Interchange Format, designed for the transfer of terminological and lexical data between translation tools, particularly geared towards Natural Language Processing (NLP) data such as machine translation lexicons. |
| TBX | TermBase eXchange format, also known as DXLT, which facilitates the transfer of glossaries between translation tools. It is based on the ISO 12200 standard, MARTIF. |
| Glossary | A collection of terms and their definitions, often specific to a particular domain or project, used to ensure consistency in terminology. |
| Localization | The process of adapting a product or content to a specific locale or market, including translation, cultural adaptation, and technical adjustments. |
| Natural Language Processing (NLP) | A field of artificial intelligence and linguistics concerned with the interactions between computers and human language, enabling computers to understand, interpret, and generate human language. |
| Legacy Text | Existing translated material that is not currently in a Translation Memory (TM) format, often requiring a specific process to be converted into a usable TM for future projects. |
| Translation Unit | A pair of matched source and target text segments that form the fundamental building block of a Translation Memory (TM), representing a translated concept or phrase. |
| Automated Alignment Tool | Software used to automatically match source and target text segments by analyzing file structures and sentence content, forming the initial stage of the alignment process. |
| 100% Match | A translation segment found in a Translation Memory (TM) that is an exact match to the current source segment, allowing for direct reuse without modification. |
| Automated Sentence Alignment | The process of automatically matching sentences in a source text with their corresponding translations in a target text using specialized software tools. This process analyzes the structure of both texts to identify probable sentence correspondences. |
| Alignment Tool | Software designed to perform automated sentence alignment by analyzing source and target files, linking them based on filenames, and then matching sentences based on structural similarities and probable translations. |
| Quality Score | A metric generated by some alignment tools, based on internal algorithms, to provide an indication of the success and accuracy of the automated sentence alignment process. |
| Bilingual Corpus | A collection of texts that exists in two languages, specifically prepared or suited for tasks like sentence alignment and its evaluation, as exemplified by Project ARCADE. |
| Sentence Alignment Task | The specific objective of matching sentences between a source text and its translation(s) in a target language, which is a crucial step in many natural language processing and machine translation workflows. |
| Linguistic Verification | The process where a linguist reviews each segment of an alignment project, approving correct matches and correcting or deleting any incorrect matches to ensure accuracy. |
| Incorrect Match | A scenario where the alignment tool fails to correctly pair source and target segments, potentially due to structural differences like one source sentence being translated into multiple target sentences. |
| Formatting/Variable Information | Data within a file that is converted into tags by a Translation Memory system, which the alignment tool can use as a guide for matching segments. |
| Source and Target Files | The original document (source) and its translated version (target) that are processed for alignment. Consistency in format and version between these files is crucial for optimal results. |
| Quality of Translated Files | The standard of the existing translations significantly impacts the alignment process, as the alignment tool typically does not include a linguistic review of the translations themselves. |
| Term Base | A database designed to store single words or expressions pertinent to a specific subject matter, often presented in bilingual or multilingual formats. |
| CAT Tool | Computer-Assisted Translation tool that integrates features like term bases to aid translators in their work, often allowing for import and export of terminology resources. |
| Forbidden Terms | Specific words or expressions that translators are explicitly instructed not to use in their translations, helping to maintain brand consistency and avoid errors. |
| LSP | Language Service Provider, an organization that offers translation and localization services, often utilizing term bases and glossaries to manage client terminology. |
| Project Manager (PM) | The individual responsible for overseeing a translation project, who can use term bases to verify the consistency of terminology even without fluency in all target languages. |
| Placeables | Terms recognized from the glossary within the source segment, highlighted in blue. These terms can be manipulated using specific shortcuts or icons. |
| Auto-suggest Feature | A default feature in WFA that proposes target terms as the user types the first few letters of the source or target term, facilitating faster translation. |
| Glossary Panel | A panel within WFA that displays glossary information, including translations and associated comments, which can be activated via a keyboard shortcut or menu option. |
| Glossary Dialog Box | A pop-up window invoked by a specific shortcut or button, used for adding new terms and their translations, along with supplementary information, to the glossary. |
| F1, F2, F3 Fields | Designated areas within the Glossary Dialog Box for storing additional, text-based information about a term, such as its role, context, or grammatical form, to enhance understanding. |
| Source Language | The original language of a text or content that is being translated or localized. |
| Target Language | The language into which content is translated or localized. |
| Local Versions and Dialects | Variations of a language spoken in specific geographic regions or by particular groups, which may include differences in vocabulary, grammar, and pronunciation that need to be considered during localization. |
| Cultural Aspects | The social behaviors, customs, beliefs, and values specific to a particular region or group of people, which must be considered during localization. |
| Local Laws | The legal regulations and statutes that are specific to a particular country or region, which content must comply with during localization. |
| Cultural Barriers | Differences in customs, beliefs, values, and social norms between different cultures that can impede understanding or acceptance of content or products. |
| User Experience | The overall feeling and satisfaction a user has when interacting with a product, website, or service, influenced by factors like usability, accessibility, and cultural appropriateness. |
| Units of Measurement | Standard quantities used to express physical properties, such as length, weight, or volume, which often need to be converted to local standards (e.g., metric vs. imperial) for clarity. |
| Currency Conversion | The process of exchanging one currency for another, often necessary for financial transactions or displaying equivalent monetary values in different regions. |
| Date Formats | The various conventions used to represent dates, which differ across countries and can lead to misinterpretation if not localized (e.g., MM/DD/YY vs. DD/MM/YY). |
| Text Expansion | The phenomenon where translated text can increase in length compared to the source text, requiring flexible design to accommodate these variations. |
| Target Locale | The specific geographical region or cultural group for which a product or content is being adapted; localization aims to make the product feel native to this locale. |
| Engagement | The degree to which a target audience interacts with and connects to a message or product, which can be increased by tailoring marketing efforts to local expectations. |
| Globalization | The increasing interconnectedness and interdependence of the world's economies, cultures, and populations, which necessitates approaches like localization to effectively reach diverse markets. |
| Language Localization | A specific type of localization that focuses on adapting linguistic elements of a product or service, primarily involving translation, to ensure cultural and linguistic appropriateness for the target audience. |
| Subtitling | A method of presenting translated dialogue from audio or video content as text displayed at the bottom of the screen, allowing the original audio to remain audible. |
| Dubbing | The process of replacing the original spoken dialogue in audio or video content with translated dialogue performed by new voice actors, synchronized with the on-screen action. |
| Writing Systems | The various methods used to represent language in written form, employing different scripts, characters (symbols, logograms, syllabograms, letters), and writing directions (e.g., left-to-right, right-to-left, vertical). |
| Boustrophedon | A style of writing where lines alternate in direction, with one line proceeding from left to right and the next from right to left, resembling the path of an ox plowing a field. |
| Complex Text Layout | A property of some writing systems in which the shape of characters changes based on their context within a word or sentence, requiring advanced rendering capabilities. |
| Sorting Rules | Different algorithms and conventions used to order text alphabetically or lexicographically within various writing systems and languages, which can vary significantly. |
| Numeral System | A system for representing numbers, which can differ across languages and cultures, including the symbols used and the base of the system (e.g., decimal, binary). |
| Pluralization | The grammatical process of forming the plural of a noun, which varies in complexity and rules across different languages, often requiring specific attention during translation. |
| Guillemets | A type of quotation mark, specifically angle quotes (`« »`), commonly used in French and other European languages, serving a similar purpose to double quotes in English. |
| Economic Conventions | Variations in common practices and standards related to economic aspects across different countries, such as paper sizes, preferred storage media, currency formats, and measurement systems. |
| Text Layout | The arrangement of characters in written text, including contextual changes in character shape, capitalization rules, text sorting, and punctuation use, all of which can differ significantly across languages. |
| Numeral Systems | The distinct sets of numbers and associated symbols used by different languages and cultures; translators need to be aware of these variations during localization. |
| Grammatical Rules | The specific structures and conventions of a language, including variations in pluralization and other grammatical elements, which require careful attention to produce accurate and natural-sounding translations. |
| Time Zones | The different standard times observed across the world, which translators must consider so that temporal information in localized content is accurate and relevant to the target audience's location. |
| Legal Requirements | The laws and regulations specific to a country or region that may require a product to be customized or completely altered for compliance, including privacy laws, disclaimers, labeling, encryption regulations, censorship, accessibility standards, and tax collection procedures. |
| Political Issues | Sensitive political matters, such as disputed borders and geographical naming disputes, which require careful handling to avoid offense or misrepresentation in localized content. |
| Local Customs | The established practices, traditions, superstitions, religious beliefs, and social taboos of a particular community or region, which must be understood and respected during localization to ensure cultural appropriateness. |
| Personal Name Conventions | The established practices for naming individuals within a culture, including title conventions and the structure of personal names, which need to be accurately represented in localized content. |
| Unicode Standard | A universal character encoding standard that ensures compatibility with a wide range of writing systems, enabling the representation of diverse languages on a website. |
| Separation of Content and Code | A design principle where website content is kept distinct from the underlying source code, simplifying the translation process and reducing the need for extensive modifications to the code itself. |
| Flexible User Interface (UI) | A website interface designed to accommodate variations in text length and support languages with different reading directions (e.g., left-to-right or right-to-left). |
| Translation of Content | The act of converting textual and multimedia elements of a website into the language of the target audience, paying close attention to linguistic accuracy and cultural appropriateness. |
| Adaptation of Graphics and Multimedia | Ensuring that visual elements such as images and videos are culturally suitable for the target audience and effectively communicate the intended message without causing offense. |
| Adjustment of Layout and Design | Modifying the visual arrangement and aesthetic elements of a website to accommodate differences in text length, font styles, and other language-specific requirements. |
| Integration of Local Regulations | The process of ensuring that a website complies with the legal requirements, privacy policies, and accessibility standards specific to the target region or country. |
| Dynamic Content | Website content that is generated or updated in real-time, presenting a unique challenge for localization due to the need for continuous adaptation and consistency. |
| SEO Considerations | The practice of optimizing localized website elements, such as metadata, keywords, and tags, to improve search engine visibility and ranking within specific geographic regions. |
| Cultural Sensitivity | The careful consideration of cultural nuances, customs, and values when localizing a website to prevent misunderstandings, avoid unintentional offense, and foster positive user experiences. |
| Search Engine Optimization (SEO) | The practice of optimizing website content and structure to improve its visibility and ranking in search engine results pages, thereby increasing organic traffic. |
| Website Localization | The process of adapting a website's content, design, and functionality to a specific target market or region, considering linguistic and cultural differences to enhance user experience and relevance. |
| Keyword Research | The process of identifying and analyzing search terms and phrases that potential customers use when looking for products or services related to a business, crucial for targeting the right audience in international markets. |
| Cultural Nuances | Subtle differences in customs, traditions, values, and social behaviors that are specific to a particular culture, which must be understood and respected when translating content for international audiences. |
| Localized Content | Website material that has been translated and adapted to suit the linguistic, cultural, and market-specific preferences of a target audience, ensuring it resonates effectively and feels natural. |
| Meta Titles | The HTML title element that appears in search engine results pages and browser tabs, serving as a brief description of a page's content and a key factor in SEO. |
| Meta Descriptions | A short summary of a webpage's content that appears in search engine results, designed to entice users to click through to the page by accurately reflecting its content and including relevant keywords. |
| URL Slugs | The part of a URL that identifies a particular page on a website in human-readable form, often including keywords to improve SEO and user understanding. |
| Multilingual Link Building | The strategic process of acquiring backlinks from reputable websites in various languages, which helps to improve a website's authority and search engine rankings across different international markets. |
| Mobile Optimization | The process of ensuring that a website's content and design are effectively displayed and function well on mobile devices, which is critical for SEO given the prevalence of mobile search. |
| Search Engine Algorithms | The complex sets of rules and calculations used by search engines to determine the ranking of websites in search results, which are constantly updated and require ongoing SEO strategy adjustments. |
| Website Analytics | The process of collecting, measuring, analyzing, and reporting website data to understand user behavior and website performance, essential for assessing the effectiveness of SEO strategies. |
| Search Engine Visibility | The degree to which a webpage is discoverable and appears in relevant search engine results for a given query, which is enhanced by accurately translated and optimized metadata for foreign markets. |
| User Click-Through Rates | The percentage of users who, having seen a specific link in search engine results or on a webpage, click on it; this rate can be significantly influenced by compelling and culturally relevant translated meta titles and descriptions. |
| Local Relevance | The degree to which webpage content, particularly metadata, aligns with the cultural nuances, preferences, and expectations of a specific target audience in a foreign market, making it more appealing and relevant. |
| Keyword Optimization | The process of strategically incorporating relevant keywords, including region-specific terms, into metadata to improve a webpage's ranking in search engine results for targeted queries in a foreign language. |
| Global Brand Consistency | The maintenance of a unified brand tone, message, and identity across all international markets, ensured by professional translators who align translated metadata with the overall brand strategy. |
| Adherence to Character Limits | The practice of ensuring that translated metadata, such as meta titles and descriptions, fits within the specific character constraints imposed by search engines to prevent truncation and maintain readability in search results. |
| Credibility and Trust | The perception of a website's reliability and authenticity by users, which can be positively impacted by accurate and professional translations of metadata, fostering confidence in foreign markets. |
| Adaptation to Market Trends | The continuous process of updating and refining translated metadata to reflect current linguistic, cultural, and market shifts, ensuring sustained optimization and relevance for foreign audiences. |
| Pre-editing | The process of revising technical documentation before it is processed by a Machine Translation (MT) engine, with the goal of improving the source text to enhance the quality of the raw output. Effective pre-editing can significantly reduce or even eliminate the subsequent post-editing workload. |
| Post-editing | The process of revising the output generated by a Machine Translation (MT) system to improve its quality, accuracy, and fluency, making it suitable for its intended purpose. |
| Source Text | The original document or text that is intended to be translated into another language. |
| Raw Output | The initial, unedited translation produced by a Machine Translation (MT) system, which typically requires further human review and correction. |
| Term Consistency | The practice of using the same translation for a specific term throughout a document or a set of related documents to ensure uniformity and clarity. |
| Automated Revision Tools | Software applications designed to assist in the review and correction of text, such as spell checkers, grammar checkers, and tools that verify term consistency against a glossary. |
| Project-Specific Glossary | A curated list of terms and their approved translations relevant to a particular project, used to ensure consistency and accuracy in both pre-editing and translation. |
| Controlled Natural Language (CNL) | A subset of a natural language that is created by imposing restrictions on its grammar and vocabulary to minimize or eradicate ambiguity and complexity, facilitating both human comprehension and reliable automatic semantic analysis. |
| Simplified Technical English | A controlled language of the kind often called a "simplified" or "technical" language, used in industry to improve the quality of technical documentation and to ease semi-automatic translation by following specific writing rules. |
| Caterpillar Fundamental English | A restricted vocabulary of approximately 850 words developed by Caterpillar Inc. to ensure consistency and high quality in the authoring and translation of technical documents for their complex products across various target languages. |
| CLOUT™ rule set | A collection of grammar rules developed to reduce ambiguities in texts across many languages, making them more suitable for machine translation. CLOUT stands for Controlled Language Optimized for Uniform Translation. |
| Ambiguity | The quality of being open to more than one interpretation. Controlled natural languages aim to eliminate this inexactness to improve clarity and machine processability. |
| Controlled Language | A language that adheres to a specific set of rules designed to reduce ambiguity and improve clarity, particularly for machine translation. |
| Machine Translation | The use of computer software to translate text or speech from one language to another automatically. Controlled languages are optimized to enhance the accuracy of machine translation. |
| Sentence Length Rule | A guideline within controlled language writing that recommends keeping sentences below a certain word count (e.g., 25 words) to enhance readability and comprehension. |
| Single Idea Rule | A principle in controlled language writing that advocates for sentences to express only one distinct thought or concept, preventing complex sentence structures that can lead to misinterpretation. |
| Grammatically Complete Sentences | Sentences that contain all the necessary components to form a complete thought and follow the established rules of grammar, avoiding fragments or incomplete statements. |
| Simple Grammatical Structure | The use of straightforward sentence construction, typically involving a clear subject-verb-object order, to make texts easier to understand and process, especially for non-native speakers or machines. |
| Active Voice | A grammatical construction where the subject of the sentence performs the action of the verb, as opposed to the passive voice where the subject receives the action. Active voice generally leads to clearer and more direct communication. |
| Noun Repetition | A rule in controlled languages that encourages repeating a noun rather than using a pronoun to refer to it, thereby eliminating potential confusion about the antecedent. |
| Article Usage | The practice of employing articles (e.g., "a," "an," "the") to clearly identify nouns, which helps in specifying whether a noun is general or specific, thus reducing ambiguity. |
| General Dictionary Words | The use of vocabulary that is commonly understood and found in standard dictionaries, avoiding specialized jargon, slang, or obscure terms that might not be universally recognized. |
| Post-editor | A person who performs the task of post-editing, correcting machine translation output to meet agreed-upon quality standards. |
| Revision | The process of improving human-generated text, often referred to as editing in the field of translation. It is distinct from post-editing, which corrects machine translation output. |
| Light Post-editing | A type of post-editing that focuses on making machine translation output simply understandable, requiring minimal intervention to convey the core meaning. |
| Full Post-editing | A comprehensive approach to post-editing that aims to make machine translation output not only understandable but also stylistically appropriate for various uses. |
| Computer-Assisted Translation (CAT) Tools | Software applications designed to assist human translators in the translation process, many of which now support the post-editing of machine-translated output. |
| Raw Machine Translation | The direct output from a machine translation engine without any subsequent human correction or refinement. |
| Assimilation | In the context of post-editing, the use of translated content for internal understanding within an organization, where conveying the meaning matters more than polish. |
| Dissemination | In the context of post-editing, the distribution of translated content to a wider audience or for external communication. |
| Raw MT output | The initial, unedited text generated by a machine translation system, which serves as the basis for post-editing. |
| Publishable quality | A high standard of translation quality, equivalent to that produced by a human translator and subsequently revised, suitable for immediate publication without further review. |
| Good enough quality | A lower standard of translation quality that ensures the message is comprehensible and accurate, but may not be stylistically perfect or sound entirely natural, often referred to as "fit for purpose." |
| Semantically correct translation | A translation where the meaning of the source text is accurately conveyed, ensuring that the intended message is understood without distortion or misinterpretation. |
| Stylistically compelling | Refers to text that is not only accurate and comprehensible but also possesses a natural flow, engaging tone, and appropriate linguistic nuances, similar to well-written human prose. |