API Reference

Welcome to the API reference for EsTranslator. This section provides detailed information about the classes, methods, and attributes available in the EsTranslator library.

Core

es_translator.EsTranslator

EsTranslator(options)

Orchestrates translation of Elasticsearch documents.

Manages the translation workflow: searching for documents, translating them either directly or in parallel with a pool of worker processes, and saving the translated documents back to the index.

Attributes:

Name Type Description
url

Elasticsearch URL.

index

Index name to search and update.

source_language

Source language code.

target_language

Target language code.

intermediary_language

Optional intermediary language for indirect translation.

source_field

Field name containing source text.

target_field

Field name to store translated text.

query_string

Optional Elasticsearch query string.

data_dir

Directory for storing interpreter data.

scan_scroll

Scroll timeout for search.

dry_run

If True, skip saving translated documents.

force

Force re-translation of already translated documents.

pool_size

Number of parallel worker processes.

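pool_timeout

Timeout, in seconds, for putting a document on the worker queue.

throttle

Delay in milliseconds used for rate limiting (the direct path sleeps throttle / 1000 seconds after each document).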

progressbar

Show progress bar during translation.

interpreter_name

Name of translation interpreter to use.

max_content_length

Maximum content length to translate (-1 for unlimited).

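plan

If True, queue translations as Celery tasks for later execution instead of translating immediately.

device

Device for translation ('cpu', 'cuda', or 'auto'; defaults to 'auto'). Only passed to the Argos interpreter.

interpreter

Interpreter instance, created lazily by instantiate_interpreter().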

Parameters:

Name Type Description Default
options dict[str, Any]

Dictionary of configuration options.

required
Source code in es_translator/es_translator.py
def __init__(self, options: dict[str, Any]) -> None:
    """Initialize the Elasticsearch translator.

    Args:
        options: Dictionary of configuration options.
    """
    self.url = options['url']
    self.index = options['index']
    self.source_language = options['source_language']
    self.target_language = options['target_language']
    self.intermediary_language = options['intermediary_language']
    self.source_field = options['source_field']
    self.target_field = options['target_field']
    self.query_string = options['query_string']
    self.data_dir = options['data_dir']
    self.scan_scroll = options['scan_scroll']
    self.dry_run = options.get('dry_run', False)
    self.force = options['force']
    self.pool_size = options['pool_size']
    self.pool_timeout = options['pool_timeout']
    self.throttle = options['throttle']
    self.progressbar = options.get('progressbar', False)
    self.interpreter_name = options['interpreter']
    self.max_content_length = options.get('max_content_length', -1)
    self.plan = options.get('plan', False)
    self.device = options.get('device', 'auto')
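The constructor reads most options with options[...], so those keys are required and a missing one raises KeyError; dry_run, progressbar, max_content_length, plan, and device are read with options.get and fall back to defaults. A minimal sketch of assembling such a dictionary, where the URL, index, field names, and directory are hypothetical placeholders and missing_required_keys is an illustrative helper, not part of es_translator:

```python
from typing import Any

# Keys read with options[...] in EsTranslator.__init__ (KeyError if absent).
REQUIRED_KEYS = (
    'url', 'index', 'source_language', 'target_language', 'intermediary_language',
    'source_field', 'target_field', 'query_string', 'data_dir', 'scan_scroll',
    'force', 'pool_size', 'pool_timeout', 'throttle', 'interpreter',
)

# Keys read with options.get(...), with the defaults used in __init__.
OPTIONAL_DEFAULTS = {
    'dry_run': False,
    'progressbar': False,
    'max_content_length': -1,
    'plan': False,
    'device': 'auto',
}


def missing_required_keys(options: dict[str, Any]) -> list[str]:
    """Return the required keys that are absent from options (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in options]


# Hypothetical values; adapt them to your cluster and index.
options: dict[str, Any] = {
    'url': 'http://localhost:9200',
    'index': 'my-index',
    'source_language': 'fr',
    'target_language': 'en',
    'intermediary_language': None,
    'source_field': 'content',
    'target_field': 'content_translated',
    'query_string': None,
    'data_dir': '/tmp/es-translator',
    'scan_scroll': '5m',
    'force': False,
    'pool_size': 1,
    'pool_timeout': 1800,
    'throttle': 0,
    'interpreter': 'ARGOS',
}

assert missing_required_keys(options) == []
# With es_translator installed and a reachable cluster:
# from es_translator import EsTranslator
# EsTranslator(options).start()
```

Interpreter names are compared case-insensitively by init_interpreter, so 'ARGOS' and 'argos' select the same class.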

no_progressbar property

no_progressbar

Check if the progressbar option is set to False.

Returns:

Type Description
bool

True if the progressbar option is False, else False.

options property

options

Get configuration options as a dictionary.

Returns:

Type Description
dict[str, Any]

Dictionary containing all configuration options.

search_source property

search_source

Gets the list of fields to use in the search.

Returns:

Type Description
list[str]

List of fields to use in the search.

stdout_loglevel property

stdout_loglevel

Gets the log level of stdout.

Returns:

Type Description
int

The log level of stdout.

configure_search()

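Builds the search returned by search(), restricts the returned fields to search_source, and sets the scroll timeout (scan_scroll) and batch size (pool_size).

Returns:

Type Description
Search

A configured search object.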

Source code in es_translator/es_translator.py
def configure_search(self) -> Search:
    """Configures the search object.

    Returns:
        Search: A configured search object.
    """
    search = self.search()
    search = search.source(self.search_source)
    search = search.params(scroll=self.scan_scroll, size=self.pool_size)
    return search
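As a rough picture of what search() and configure_search() ask Elasticsearch for, the hypothetical helper below builds the equivalent plain-dict request body and request parameters. It is an illustration, not part of es_translator, and it assumes elasticsearch-dsl serializes query('query_string', query=...) as a query_string clause and source(fields) as _source; scroll and size are sent as request parameters, where size is the batch size per scroll page. Without a query string the body has no query, which Elasticsearch treats as match-all.

```python
from typing import Any, Optional


def build_scan_request(
    query_string: Optional[str], source_fields: list[str], scroll: str, size: int
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (body, params) roughly matching configure_search() (hypothetical helper)."""
    body: dict[str, Any] = {'_source': source_fields}
    if query_string:
        # Same clause search() adds when query_string is set.
        body['query'] = {'query_string': {'query': query_string}}
    # Like search.params(scroll=..., size=...): request parameters, not body fields.
    params = {'scroll': scroll, 'size': size}
    return body, params


body, params = build_scan_request('type:Document', ['content'], '5m', 2)
print(body)    # {'_source': ['content'], 'query': {'query_string': {'query': 'type:Document'}}}
print(params)  # {'scroll': '5m', 'size': 2}
```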

create_client

create_client()

Create an Elasticsearch client instance.

Returns:

Type Description
Elasticsearch

Configured Elasticsearch client.

Source code in es_translator/es_translator.py
def create_client(self) -> Elasticsearch:
    """Create an Elasticsearch client instance.

    Returns:
        Configured Elasticsearch client.
    """
    return Elasticsearch(self.url)

create_translated_hit

create_translated_hit(hit)

Create a TranslatedHit wrapper for a document hit.

Parameters:

Name Type Description Default
hit ObjectBase

Document hit object.

required

Returns:

Type Description
TranslatedHit

TranslatedHit instance ready for translation.

Source code in es_translator/es_translator.py
def create_translated_hit(self, hit: ObjectBase) -> TranslatedHit:
    """Create a TranslatedHit wrapper for a document hit.

    Args:
        hit: Document hit object.

    Returns:
        TranslatedHit instance ready for translation.
    """
    return TranslatedHit(hit, self.source_field, self.target_field, self.force)

create_translation_queue

create_translation_queue()

Creates a bounded queue (maximum size pool_size) that feeds documents to the parallel translation workers.

Returns:

Type Description
JoinableQueue

A queue for parallel document translation.

Source code in es_translator/es_translator.py
def create_translation_queue(self) -> JoinableQueue:
    """Creates a queue that can translate documents in parallel.

    Returns:
        JoinableQueue: A queue for parallel document translation.
    """
    return JoinableQueue(self.pool_size)

find_document

find_document(params)

Find a document by ID and routing.

Parameters:

Name Type Description Default
params dict[str, str]

Dictionary containing 'index', 'id', and optionally 'routing' (routing defaults to the document id).

required

Returns:

Type Description
Document

The found Document object.

Source code in es_translator/es_translator.py
def find_document(self, params: dict[str, str]) -> Document:
    """Find a document by ID and routing.

    Args:
        params: Dictionary containing 'index', 'id', and optionally 'routing'.

    Returns:
        The found Document object.
    """
    using = self.create_client()
    routing = params.get('routing', params['id'])
    return Document.get(index=params['index'], id=params['id'], routing=routing, using=using)
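The routing rule in find_document is: use params['routing'] when present, otherwise fall back to the document id. Because params is a dict, the lookup has to use dict.get; getattr() reads attributes, not keys, so getattr(params, 'routing', default) always returns the default. A minimal sketch, where resolve_routing is a hypothetical helper:

```python
from typing import Any


def resolve_routing(params: dict[str, Any]) -> str:
    # dict.get reads the key and falls back to the document id when absent.
    return params.get('routing', params['id'])


print(resolve_routing({'index': 'docs', 'id': 'abc'}))                     # abc
print(resolve_routing({'index': 'docs', 'id': 'abc', 'routing': 'root'}))  # root
# getattr ignores dict keys, so it silently returns the default:
print(getattr({'id': 'abc', 'routing': 'root'}, 'routing', 'abc'))        # abc
```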

init_interpreter

init_interpreter()

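Initializes the interpreter.

Selects the interpreter class (Apertium or Argos) whose name matches interpreter_name, case-insensitively, and instantiates it with the configured languages and a pack directory at data_dir/packs/<interpreter_name>. The device option is passed only to Argos. An unknown interpreter name raises StopIteration.

Returns:

Type Description
Any

The initialized interpreter.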

Source code in es_translator/es_translator.py
def init_interpreter(self) -> Any:
    """Initializes the interpreter.

    Returns:
        Any: The initialized interpreter.
    """
    pack_dir = path.join(self.data_dir, 'packs', self.interpreter_name)
    interpreters = (
        Apertium,
        Argos,
    )
    Interpreter = next(i for i in interpreters if i.name.lower() == self.interpreter_name.lower())
    # Pass device option only to Argos (Apertium doesn't support GPU)
    if Interpreter == Argos:
        return Interpreter(
            self.source_language, self.target_language, self.intermediary_language, pack_dir, self.device
        )
    return Interpreter(self.source_language, self.target_language, self.intermediary_language, pack_dir)
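init_interpreter picks the class whose name matches interpreter_name case-insensitively; because it calls next() without a default, an unknown name surfaces as a bare StopIteration. The sketch below reproduces the lookup with hypothetical stub classes and shows a variant that raises a clearer ValueError instead. It is an illustration, not the library's code.

```python
from typing import Optional


class ApertiumStub:
    name = 'APERTIUM'


class ArgosStub:
    name = 'ARGOS'


INTERPRETERS = (ApertiumStub, ArgosStub)


def pick_interpreter(interpreter_name: str) -> type:
    """Case-insensitive lookup with an explicit error for unknown names."""
    found: Optional[type] = next(
        (i for i in INTERPRETERS if i.name.lower() == interpreter_name.lower()), None
    )
    if found is None:
        known = ', '.join(i.name for i in INTERPRETERS)
        raise ValueError(f'Unknown interpreter {interpreter_name!r}; expected one of: {known}')
    return found


print(pick_interpreter('argos').__name__)  # ArgosStub
```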

instantiate_interpreter

instantiate_interpreter()

Returns the interpreter, creating it with init_interpreter() on the first call and reusing it afterwards.

Returns:

Type Description
Any

An instance of the interpreter.

Source code in es_translator/es_translator.py
def instantiate_interpreter(self) -> Any:
    """Instantiates the interpreter.

    Returns:
        Any: An instance of the interpreter.
    """
    if not hasattr(self, 'interpreter'):
        with self.print_done(f'Instantiating {self.interpreter_name} interpreter'):
            self.interpreter = self.init_interpreter()
    return self.interpreter
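instantiate_interpreter caches the interpreter on the instance the first time it is needed, so later calls (for example, one per document in translate_document) reuse the same object. The pattern in isolation, with a hypothetical counter standing in for the expensive init_interpreter() call:

```python
class LazyHolder:
    """Hypothetical class reproducing the hasattr-based caching pattern."""

    def __init__(self) -> None:
        self.init_calls = 0

    def init_interpreter(self) -> object:
        # Stand-in for loading translation models or packages.
        self.init_calls += 1
        return object()

    def instantiate_interpreter(self) -> object:
        # Same check as EsTranslator: create only if the attribute is missing.
        if not hasattr(self, 'interpreter'):
            self.interpreter = self.init_interpreter()
        return self.interpreter


holder = LazyHolder()
first = holder.instantiate_interpreter()
second = holder.instantiate_interpreter()
print(first is second, holder.init_calls)  # True 1
```

The cache lives on the instance, so each translator object builds its interpreter at most once.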

print_done

print_done(string)

Print progress message and yield, showing done/error status.

Parameters:

Name Type Description Default
string str

The status message to be printed.

required

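Yields:

Type Description
None

Control to the wrapped block. When the stdout log level is above INFO (20), the message is printed and followed by "done" on success, or by "error" and sys.exit(1) if a FatalTranslationException, ElasticsearchException, or Full is raised.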

Source code in es_translator/es_translator.py
@contextmanager
def print_done(self, string: str) -> Generator[None, None, None]:
    """Print progress message and yield, showing done/error status.

    Args:
        string: The status message to be printed.

    Returns:
        Generator for wrapping operations with status output.
    """
    logger.info(string)
    if self.stdout_loglevel > 20:
        string = f'\r{string}...'
        self.print_flush(string)
        try:
            yield
            print(f'{string} \033[92mdone\033[0m')
        except (FatalTranslationException, ElasticsearchException, Full) as error:
            logger.error(error, exc_info=True)
            print(f'{string} \033[91merror\033[0m')
            sys.exit(1)
    else:
        yield
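print_done is a generator-based context manager: it prints the message, runs the wrapped block at the yield, then appends a green "done" or, for the listed exception types, a red "error" and exits with status 1. A trimmed-down sketch of the same shape, with a hypothetical status() helper whose fatal and stream parameters are added so it can be tested; it omits the log-level check. Exceptions outside the fatal tuple propagate unchanged, as in the original.

```python
import io
import sys
from contextlib import contextmanager
from typing import Iterator, TextIO

GREEN, RED, RESET = '\033[92m', '\033[91m', '\033[0m'


@contextmanager
def status(
    message: str, fatal: tuple = (RuntimeError,), stream: TextIO = sys.stdout
) -> Iterator[None]:
    """Print message, run the block, then report done or error (hypothetical helper)."""
    line = f'\r{message}...'
    stream.write(line)
    stream.flush()
    try:
        yield  # the body of the with-statement runs here
        stream.write(f'{line} {GREEN}done{RESET}\n')
    except fatal:
        stream.write(f'{line} {RED}error{RESET}\n')
        sys.exit(1)  # raises SystemExit(1), as print_done does


buf = io.StringIO()
with status('Translating 3 documents', stream=buf):
    pass  # translation work would happen here
```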

print_flush

print_flush(string)

Print and flush a string to stdout.

Parameters:

Name Type Description Default
string str

The string to be printed.

required
Source code in es_translator/es_translator.py
def print_flush(self, string: str) -> None:
    """Print and flush a string to stdout.

    Args:
        string: The string to be printed.
    """
    sys.stdout.write(f'\r{string}')
    sys.stdout.flush()

process_document

process_document(
    translation_queue,
    hit,
    progress,
    task,
    shared_fatal_error,
)

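Puts a document on the translation queue, advances the progress bar, and raises FatalTranslationException if a worker has recorded a fatal error.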

Parameters:

Name Type Description Default
translation_queue JoinableQueue

A queue for parallel document translation.

required
hit Any

The document to be translated.

required
progress Progress

A progress object.

required
task Any

The current task.

required
shared_fatal_error Manager

A shared value (created with Manager().Value) holding the fatal error, if any.

required
Source code in es_translator/es_translator.py
def process_document(
    self, translation_queue: JoinableQueue, hit: Any, progress: Progress, task: Any, shared_fatal_error: Manager
) -> None:
    """Processes a document.

    Args:
        translation_queue (JoinableQueue): A queue for parallel document translation.
        hit (Any): The document to be translated.
        progress (Progress): A progress object.
        task (Any): The current task.
        shared_fatal_error (Manager): A shared manager for fatal errors.
    """
    translation_queue.put((self, hit), True, self.pool_timeout)
    progress.advance(task)
    if shared_fatal_error.value:
        raise FatalTranslationException(shared_fatal_error.value)

search

search()

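Builds the search for the configured index, adding a query_string query when query_string is set. The search is not executed until it is counted or scanned.

Returns:

Type Description
Search

The search object.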

Source code in es_translator/es_translator.py
def search(self) -> Search:
    """Builds a search for the configured index.

    Returns:
        Search: The search object (not yet executed).
    """
    using = self.create_client()
    search = Search(index=self.index, using=using)
    if self.query_string:
        search = search.query('query_string', query=self.query_string)
    return search

start

start()

Starts or plans the translation process.

Source code in es_translator/es_translator.py
def start(self) -> None:
    """Starts or plans the translation process."""
    if self.plan:
        self.start_later()
    else:
        self.start_now()

start_later

start_later()

Queue translation tasks for later execution via Celery.

Source code in es_translator/es_translator.py
def start_later(self) -> None:
    """Queue translation tasks for later execution via Celery."""
    self.instantiate_interpreter()
    total = self.search().count()
    plural = 's' if total != 1 else ''
    desc = f'Planning translation for {total} document{plural}'
    with self.print_done(desc):
        search = self.configure_search()
        for hit in search.scan():
            logger.info(f'Planned translation for doc {hit.meta.id}')
            translate_document_task.delay(self.options, hit.meta.to_dict())

start_now

start_now()

Start the translation process immediately.

Source code in es_translator/es_translator.py
def start_now(self) -> None:
    """Start the translation process immediately."""
    self.instantiate_interpreter()
    total = self.search().count()
    desc = f'Translating {total} document(s)'
    with self.print_done(desc):
        search = self.configure_search()
        if self.pool_size == 1:
            # Direct translation without multiprocessing (better for GPU)
            self.translate_documents_direct(search, total)
        else:
            translation_queue = self.create_translation_queue()
            with self.with_shared_fatal_error() as shared_fatal_error:
                self.translate_documents_in_pool(search, translation_queue, shared_fatal_error, total)

translate_document

translate_document(hit)

Translate a single document.

Parameters:

Name Type Description Default
hit ObjectBase

Document hit object to translate.

required
Source code in es_translator/es_translator.py
def translate_document(self, hit: ObjectBase) -> None:
    """Translate a single document.

    Args:
        hit: Document hit object to translate.
    """
    self.instantiate_interpreter()
    # Translate the document
    logger.info(f'Translating doc {hit.meta.id}')
    translated_hit = self.create_translated_hit(hit)
    translated_hit.add_translation(self.interpreter, max_content_length=self.max_content_length)
    logger.info(f'Translated doc {hit.meta.id}')
    # Save the translated document if not in dry run mode
    if not self.dry_run:
        translated_hit.save(self.create_client())
        logger.info(f'Saved translation for doc {hit.meta.id}')

translate_documents_direct

translate_documents_direct(search, total)

Translate documents directly without multiprocessing.

Used when pool_size=1 for simpler execution and better GPU compatibility.

Parameters:

Name Type Description Default
search Search

A search object.

required
total int

The total number of documents.

required
Source code in es_translator/es_translator.py
def translate_documents_direct(self, search: Search, total: int) -> None:
    """Translate documents directly without multiprocessing.

    Used when pool_size=1 for simpler execution and better GPU compatibility.

    Args:
        search: A search object.
        total: The total number of documents.
    """
    from time import sleep

    with Progress(disable=self.no_progressbar, transient=True) as progress:
        plural = 's' if total != 1 else ''
        task = progress.add_task(f'Translating {total} document{plural}', total=total)
        for hit in search.scan():
            try:
                self.translate_document(hit)
                sleep(self.throttle / 1000)
            except ElasticsearchException as error:
                logger.error(f'An error occurred when saving doc {hit.meta.id}')
                logger.error(error)
                raise FatalTranslationException(error)
            except Exception as error:
                logger.warning(f'Unable to translate doc {hit.meta.id}')
                logger.warning(error)
            finally:
                progress.advance(task)
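translate_documents_direct applies a simple per-document error policy: a save failure (ElasticsearchException) aborts the run as a FatalTranslationException, any other exception is logged and the document skipped, and progress advances either way. The throttle is expressed in milliseconds (sleep(throttle / 1000)). A stdlib-only sketch of that policy; translate_direct is hypothetical, and SaveError and FatalError are stand-ins for the Elasticsearch and es_translator exception types:

```python
import logging
import time
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


class SaveError(Exception):
    """Stand-in for ElasticsearchException."""


class FatalError(Exception):
    """Stand-in for FatalTranslationException."""


def translate_direct(
    doc_ids: Iterable[str], translate_one: Callable[[str], None], throttle_ms: int = 0
) -> tuple[int, list[str]]:
    """Translate documents one by one; return (processed count, skipped ids)."""
    processed = 0
    skipped: list[str] = []
    for doc_id in doc_ids:
        try:
            translate_one(doc_id)
            time.sleep(throttle_ms / 1000)  # throttle is in milliseconds
        except SaveError as error:
            logger.error('An error occurred when saving doc %s', doc_id)
            raise FatalError(error) from error  # stop the whole run
        except Exception as error:
            logger.warning('Unable to translate doc %s: %s', doc_id, error)
            skipped.append(doc_id)  # continue with the next document
        finally:
            processed += 1  # mirrors progress.advance(task)
    return processed, skipped
```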

translate_documents_in_pool

translate_documents_in_pool(
    search, translation_queue, shared_fatal_error, total
)

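Translates documents using a multiprocessing pool. Hits are streamed onto the queue with process_document, and the method returns once every queued document has been processed.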

Parameters:

Name Type Description Default
search Search

A search object.

required
translation_queue JoinableQueue

A queue for parallel document translation.

required
shared_fatal_error Manager

A shared value (created with Manager().Value) holding the fatal error, if any.

required
total int

The total number of documents.

required
Source code in es_translator/es_translator.py
def translate_documents_in_pool(
    self, search: Search, translation_queue: JoinableQueue, shared_fatal_error: Manager, total: int
) -> None:
    """Translates documents using a multiprocessing pool.

    Args:
        search (Search): A search object.
        translation_queue (JoinableQueue): A queue for parallel document translation.
        shared_fatal_error (Manager): A shared manager for fatal errors.
        total (int): The total number of documents.
    """
    with (
        Pool(self.pool_size, translation_worker, (translation_queue, shared_fatal_error)),
        Progress(disable=self.no_progressbar, transient=True) as progress,
    ):
        plural = 's' if total != 1 else ''
        task = progress.add_task(f'Translating {total} document{plural}', total=total)
        for hit in search.scan():
            self.process_document(translation_queue, hit, progress, task, shared_fatal_error)
        translation_queue.join()
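translate_documents_in_pool is a bounded producer/consumer loop: the parent scans hits and puts them on a JoinableQueue of size pool_size (blocking up to pool_timeout), checks the shared fatal-error slot after each put, and finally waits on translation_queue.join(). translation_worker is not documented in this section; it is passed as the Pool initializer, so it presumably loops over the queue. The sketch below reproduces the pattern with threads and queue.Queue instead of processes so it stays self-contained. translate_in_pool and FatalError are hypothetical, and unlike the original, the sketch treats every worker exception as fatal and checks the slot once more after join().

```python
import queue
import threading
from typing import Callable, Iterable, Optional


class FatalError(Exception):
    """Stand-in for FatalTranslationException."""


def translate_in_pool(
    docs: Iterable[str],
    translate: Callable[[str], str],
    pool_size: int = 2,
    put_timeout: float = 5.0,
) -> dict[str, str]:
    # Bounded like JoinableQueue(pool_size): the producer blocks when workers lag.
    tasks: queue.Queue = queue.Queue(maxsize=pool_size)
    results: dict[str, str] = {}
    fatal: list[Optional[str]] = [None]  # shared slot, like manager.Value('b', None)
    lock = threading.Lock()

    def worker() -> None:
        while True:
            doc = tasks.get()
            try:
                translated = translate(doc)
                with lock:
                    results[doc] = translated
            except Exception as exc:
                with lock:
                    if fatal[0] is None:
                        fatal[0] = str(exc)  # first fatal error wins
            finally:
                tasks.task_done()  # lets tasks.join() return

    for _ in range(pool_size):
        threading.Thread(target=worker, daemon=True).start()

    for doc in docs:
        tasks.put(doc, True, put_timeout)  # block=True, timeout=put_timeout
        if fatal[0] is not None:
            raise FatalError(fatal[0])
    tasks.join()  # wait until every queued document is marked done
    if fatal[0] is not None:  # extra check, not in the original
        raise FatalError(fatal[0])
    return results
```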

with_shared_fatal_error

with_shared_fatal_error()

Creates a context manager for managing shared fatal errors.

Yields:

Type Description
Any

A shared Value proxy (initially None) created by a multiprocessing Manager; the manager process stays alive until the context exits.

Source code in es_translator/es_translator.py
@contextmanager
def with_shared_fatal_error(self) -> Generator[Any, None, None]:
    """Creates a context manager for managing shared fatal errors.

    Returns:
        Generator yielding a shared manager value.
    """
    with Manager() as manager:
        yield manager.Value('b', None)

Interpreters

EsTranslator supports multiple translation backends (interpreters). Each interpreter has its own strengths and supported language pairs.

Argos

Argos Translate is a neural machine translation library that provides high-quality translations using offline models.

es_translator.interpreters.Argos

Argos(
    source=None,
    target=None,
    intermediary=None,
    pack_dir=None,
    device=None,
)

Bases: AbstractInterpreter

Argos translation interpreter using argostranslate.

This class handles translation tasks using the Argos neural machine translation engine. Note that Argos does not support intermediary languages or custom package directories.

Attributes:

Name Type Description
name

Identifier for this interpreter ('ARGOS').

Parameters:

Name Type Description Default
source Optional[str]

Source language code.

None
target Optional[str]

Target language code.

None
intermediary Optional[str]

Intermediary language code (not supported, will warn if provided).

None
pack_dir Optional[str]

Directory for language packs (not supported, will warn if provided).

None
device Optional[str]

Device for translation ('cpu', 'cuda', or 'auto'). Defaults to config value.

None

Raises:

Type Description
Exception

If the necessary language pair is not available.

Source code in es_translator/interpreters/argos/argos.py
def __init__(
    self,
    source: Optional[str] = None,
    target: Optional[str] = None,
    intermediary: Optional[str] = None,
    pack_dir: Optional[str] = None,
    device: Optional[str] = None,
) -> None:
    """Initialize the Argos interpreter.

    Args:
        source: Source language code.
        target: Target language code.
        intermediary: Intermediary language code (not supported, will warn if provided).
        pack_dir: Directory for language packs (not supported, will warn if provided).
        device: Device for translation ('cpu', 'cuda', or 'auto'). Defaults to config value.

    Raises:
        Exception: If the necessary language pair is not available.
    """
    super().__init__(source, target)
    # Configure device BEFORE any argostranslate imports
    self._device_preference = device or DEFAULT_DEVICE
    self._device_configured = False
    # Log a warning (but continue) if unsupported options are provided
    if intermediary is not None:
        logger.warning('Argos interpreter does not support intermediary language')
    if pack_dir is not None:
        logger.warning('Argos interpreter does not support custom pack directory')
    # Check pair availability - this will trigger argostranslate import
    # so we configure device first
    self._ensure_device_configured()
    if self.has_pair and not self.is_pair_available:
        try:
            self.download_necessary_languages()
        except ArgosPairNotAvailable:
            raise Exception(f'The pair {self.pair} is not available')
    else:
        logger.info(f'Existing package(s) found for pair {self.pair}')

is_pair_available property

is_pair_available

Check if the necessary language pair is available in installed packages.

Returns:

Type Description
bool

True if the language pair is available, False otherwise.

local_languages property

local_languages

Get the codes for the installed languages.

Returns:

Type Description
list[str]

List of installed language codes. Returns empty list if languages cannot be retrieved.

translation property

translation

Get Translation object for the source and target languages.

Returns:

Type Description
Any

Translation object configured for source to target language.

Raises:

Type Description
IndexError

If either the source or target language is not installed.

download_and_install_package

download_and_install_package(package)

Download and install a language package.

Uses file locking to prevent concurrent downloads of the same package. Skips installation if the package is already installed.

Parameters:

Name Type Description Default
package Any

The package to download and install.

required

Returns:

Type Description
Optional[Any]

Installation result or None if package was already installed.

Raises:

Type Description
ArgosPackageDownloadLockTimeout

If lock cannot be acquired within timeout.

Source code in es_translator/interpreters/argos/argos.py
def download_and_install_package(self, package: Any) -> Optional[Any]:
    """Download and install a language package.

    Uses file locking to prevent concurrent downloads of the same package.
    Skips installation if the package is already installed.

    Args:
        package: The package to download and install.

    Returns:
        Installation result or None if package was already installed.

    Raises:
        ArgosPackageDownloadLockTimeout: If lock cannot be acquired within timeout.
    """
    argospackage = _get_argos_package()
    try:
        temp_dir = Path(tempfile.gettempdir())
        lock_path = temp_dir / f'{package.from_code}_{package.to_code}.lock'

        with FileLock(lock_path, timeout=600).acquire(timeout=600):
            if self.is_package_installed(package):
                return None
            download_path = package.download()
            logger.info(f'Installing Argos package {package}')
            return argospackage.install_from_path(download_path)
    except Timeout as exc:
        raise ArgosPackageDownloadLockTimeout(
            f'Another instance of the program is downloading the package {package}. Please try again later.'
        ) from exc
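download_and_install_package serializes installs with one lock file per language pair in the system temp directory and re-checks installation after acquiring the lock, so concurrent processes install a package only once. The sketch below reproduces that double-checked pattern with the standard library only: an O_EXCL lock file stands in for filelock.FileLock, an in-memory set stands in for get_installed_packages(), and exclusive_lock, install_once, and LockTimeout are hypothetical names.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator, Optional


class LockTimeout(Exception):
    """Stand-in for ArgosPackageDownloadLockTimeout."""


@contextmanager
def exclusive_lock(path: Path, timeout: float, poll: float = 0.05) -> Iterator[None]:
    """Hold a lock file created with O_EXCL; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise LockTimeout(f'{path} is held by another process') from None
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(path)


def install_once(
    from_code: str,
    to_code: str,
    installed: set[tuple[str, str]],
    install: Callable[[str, str], str],
    timeout: float = 600,
) -> Optional[str]:
    # One lock file per pair in the temp dir, named like '<from>_<to>.lock'.
    lock_path = Path(tempfile.gettempdir()) / f'{from_code}_{to_code}.lock'
    with exclusive_lock(lock_path, timeout):
        # Re-check inside the lock: another process may have installed it meanwhile.
        if (from_code, to_code) in installed:
            return None
        result = install(from_code, to_code)
        installed.add((from_code, to_code))
        return result
```

One design difference: filelock relies on OS-level locks that are released if the holding process dies, whereas an O_EXCL lock file can be left behind as a stale lock after a crash.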

download_necessary_languages

download_necessary_languages()

Download necessary language packages if not installed.

Steps:

1. Updates the package index.
2. Finds the necessary package.
3. Downloads and installs the package.

Raises:

Type Description
ArgosPairNotAvailable

If the necessary language package could not be found.

ArgosPackageDownloadLockTimeout

If lock cannot be acquired within timeout.

Source code in es_translator/interpreters/argos/argos.py
def download_necessary_languages(self) -> None:
    """Download necessary language packages if not installed.

    Steps:
    1. Updates the package index.
    2. Finds the necessary package.
    3. Downloads and installs the package.

    Raises:
        ArgosPairNotAvailable: If the necessary language package could not be found.
        ArgosPackageDownloadLockTimeout: If lock cannot be acquired within timeout.
    """
    self.update_package_index()
    necessary_package = self.find_necessary_package()
    self.download_and_install_package(necessary_package)

find_necessary_package

find_necessary_package()

Find the necessary language package.

Searches available packages for one matching the source and target languages.

Returns:

Type Description
Any

The necessary language package object.

Raises:

Type Description
ArgosPairNotAvailable

If the necessary language package could not be found.

Source code in es_translator/interpreters/argos/argos.py
def find_necessary_package(self) -> Any:
    """Find the necessary language package.

    Searches available packages for one matching the source and target languages.

    Returns:
        The necessary language package object.

    Raises:
        ArgosPairNotAvailable: If the necessary language package could not be found.
    """
    argospackage = _get_argos_package()
    for package in argospackage.get_available_packages():
        if package.from_code == self.source_alpha_2 and package.to_code == self.target_alpha_2:
            return package
    raise ArgosPairNotAvailable
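find_necessary_package is a linear scan for a package that translates directly from the source to the target language, compared on two-letter codes (source_alpha_2, target_alpha_2); there is no chaining through other languages. A minimal sketch with a hypothetical Package tuple that carries only the two fields the method reads, and a stand-in exception:

```python
from typing import Iterable, NamedTuple


class Package(NamedTuple):
    """Minimal stand-in for an Argos package (only the fields used here)."""
    from_code: str
    to_code: str


class PairNotAvailable(Exception):
    """Stand-in for ArgosPairNotAvailable."""


def find_package(available: Iterable[Package], source: str, target: str) -> Package:
    # Return the first package translating directly from source to target.
    for package in available:
        if package.from_code == source and package.to_code == target:
            return package
    raise PairNotAvailable(f'No package for {source} -> {target}')


available_packages = [Package('en', 'es'), Package('es', 'en'), Package('fr', 'en')]
print(find_package(available_packages, 'fr', 'en'))  # Package(from_code='fr', to_code='en')
```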

is_package_installed

is_package_installed(package)

Check if a package is installed.

Parameters:

Name Type Description Default
package Any

The package to check.

required

Returns:

Type Description
bool

True if the package is installed, False otherwise.

Source code in es_translator/interpreters/argos/argos.py
def is_package_installed(self, package: Any) -> bool:
    """Check if a package is installed.

    Args:
        package: The package to check.

    Returns:
        True if the package is installed, False otherwise.
    """
    argospackage = _get_argos_package()
    return package in argospackage.get_installed_packages()

translate

translate(text_input)

Translate input text from source language to target language.

Parameters:

Name Type Description Default
text_input str

The input text in the source language.

required

Returns:

Type Description
str

The translated text in the target language.

Source code in es_translator/interpreters/argos/argos.py
def translate(self, text_input: str) -> str:
    """Translate input text from source language to target language.

    Args:
        text_input: The input text in the source language.

    Returns:
        The translated text in the target language.
    """
    # Always configure device before translation (needed for multiprocessing workers)
    self._ensure_device_configured()
    return self.translation.translate(text_input)

update_package_index

update_package_index()

Update the Argos package index to fetch latest available packages.

Source code in es_translator/interpreters/argos/argos.py
def update_package_index(self) -> None:
    """Update the Argos package index to fetch latest available packages."""
    argospackage = _get_argos_package()
    argospackage.update_package_index()

Apertium

Apertium is a rule-based machine translation platform that supports a wide variety of language pairs, especially for related languages.

es_translator.interpreters.Apertium

Apertium(
    source=None,
    target=None,
    intermediary=None,
    pack_dir=None,
)

Bases: AbstractInterpreter

Apertium translation interpreter.

Provides translation capabilities using the Apertium translation engine, with support for direct translation and intermediary language pairs.

Attributes:

Name Type Description
name

Identifier for this interpreter ('APERTIUM').

repository

ApertiumRepository instance for package management.

Parameters:

Name Type Description Default
source Optional[str]

Source language code.

None
target Optional[str]

Target language code.

None
intermediary Optional[str]

Optional intermediary language for indirect translation.

None
pack_dir Optional[str]

Directory for storing translation packages.

None

Raises:

Type Description
Exception

If the language pair is not available in the repository.

Source code in es_translator/interpreters/apertium/apertium.py
def __init__(
    self,
    source: Optional[str] = None,
    target: Optional[str] = None,
    intermediary: Optional[str] = None,
    pack_dir: Optional[str] = None,
) -> None:
    """Initialize the Apertium interpreter.

    Args:
        source: Source language code.
        target: Target language code.
        intermediary: Optional intermediary language for indirect translation.
        pack_dir: Directory for storing translation packages.

    Raises:
        Exception: If the language pair is not available in the repository.
    """
    super().__init__(source, target, intermediary, pack_dir)
    # A class to download necessary pair package
    self.repository = ApertiumRepository(self.pack_dir)
    # Raise an exception if the language pair is unknown
    # Note: has_pair must be checked first to avoid accessing local_pairs when no pair is set
    if self.has_pair and not self.is_pair_available:
        try:
            self.download_necessary_pairs()
        except StopIteration:
            raise Exception('The pair is not available')
    else:
        logger.info(f'Existing package(s) found for pair {self.pair}')

any_pair_variant_in_packages property

any_pair_variant_in_packages

Check if any variant of the current pair exists in packages.

Returns:

Type Description
bool

True if any variant of the pair is available in the remote repository.

intermediary_pairs property

intermediary_pairs

Get intermediary language pairs for indirect translation.

If no intermediary language is specified, one is found automatically by building a language tree and searching it for a path from source to target.

Returns:

Type Description
list[str]

List of two language pair strings for indirect translation.
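The exact tree-building algorithm behind intermediary_pairs is not shown in this section. A simpler sketch of the same goal, assuming pairs are 'src-dst' strings (as in the 'eng-spa' and 'spa-fra' examples) and that one intermediary hop is enough, since the property returns two pairs; find_intermediary_pairs is a hypothetical helper:

```python
from typing import Iterable, Optional


def find_intermediary_pairs(
    source: str, target: str, pairs: Iterable[str], intermediary: Optional[str] = None
) -> list[str]:
    """Return [source-X, X-target] for the first usable intermediary X."""
    available = list(pairs)
    if intermediary is None:
        # Candidate intermediaries: languages reachable directly from source.
        candidates = [p.split('-', 1)[1] for p in available if p.startswith(f'{source}-')]
        intermediary = next((x for x in candidates if f'{x}-{target}' in available), None)
    if intermediary is None:
        raise LookupError(f'No intermediary language links {source} and {target}')
    return [f'{source}-{intermediary}', f'{intermediary}-{target}']


remote = ['eng-spa', 'spa-fra', 'eng-cat']
print(find_intermediary_pairs('eng', 'fra', remote))  # ['eng-spa', 'spa-fra']
```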

intermediary_source_pair property

intermediary_source_pair

Get source-to-intermediary language pair.

Returns:

Type Description
str

Language pair string (e.g., 'eng-spa').

intermediary_source_pair_package property

intermediary_source_pair_package

Get package name for source-to-intermediary pair.

Returns:

Type Description
Optional[str]

Package name string or None if not found.

intermediary_target_pair property

intermediary_target_pair

Get intermediary-to-target language pair.

Returns:

Type Description
str

Language pair string (e.g., 'spa-fra').

intermediary_target_pair_package property

intermediary_target_pair_package

Get package name for intermediary-to-target pair.

Returns:

Type Description
Optional[str]

Package name string or None if not found.

is_pair_available property

is_pair_available

Check if the language pair is available locally.

Returns:

Type Description
bool

True if the pair is installed locally as a direct pair (no intermediary needed).

local_pairs property

local_pairs

Get locally installed language pairs.

Returns:

Type Description
list[str]

List of locally available language pair codes.

pair_package property

pair_package

Get the package name for the current language pair.

Returns:

Type | Description
Optional[str] | Package name string, or None if not found.

pairs_pipeline property

pairs_pipeline

Get the translation pipeline (direct, or via an intermediary language).

Returns:

Type | Description
list[str] | List of language pair codes to apply in sequence.
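The sketch below shows one way such a pipeline could be assembled. It assumes pairs are written as 'source-target' with '-' as the separator. build_pairs_pipeline is a hypothetical helper for illustration only and is not part of EsTranslator's API.

```python
from typing import Optional


def build_pairs_pipeline(source: str, target: str, intermediary: Optional[str] = None) -> list[str]:
    """Hypothetical helper: return the pair codes to apply, in order."""
    if intermediary is None:
        # Direct translation: a single pair
        return [f'{source}-{target}']
    # Indirect translation: source -> intermediary, then intermediary -> target
    return [f'{source}-{intermediary}', f'{intermediary}-{target}']


print(build_pairs_pipeline('eng', 'spa'))         # ['eng-spa']
print(build_pairs_pipeline('eng', 'fra', 'spa'))  # ['eng-spa', 'spa-fra']
```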

remote_pairs cached property

remote_pairs

Get the language pairs available in the remote repository.

Returns:

Type | Description
list[str] | List of available language pair codes from the repository.

download_intermediary_pairs

download_intermediary_pairs()

Download both intermediary language pairs for indirect translation.

Source code in es_translator/interpreters/apertium/apertium.py
def download_intermediary_pairs(self) -> None:
    """Download both intermediary language pairs for indirect translation."""
    for pair in self.intermediary_pairs:
        self.download_pair(pair)

download_necessary_pairs

download_necessary_pairs()

Download required language pair packages.

Downloads the direct pair if any variant of it is available in the repository; otherwise downloads the two intermediary pairs.

Source code in es_translator/interpreters/apertium/apertium.py
def download_necessary_pairs(self) -> None:
    """Download required language pair packages.

    Downloads either a direct pair or intermediary pairs depending on
    availability in the repository.
    """
    logger.info(f'Downloading necessary package(s) for {self.pair}')
    if self.any_pair_variant_in_packages:
        self.download_pair()
    else:
        self.download_intermediary_pairs()

download_pair

download_pair(pair=None)

Download and install a specific language pair package.

Parameters:

Name | Type | Description | Default
pair | Optional[str] | Language pair to download. If None, the current pair is used. | None

Returns:

Type | Description
str | Path to the installed package directory.

Source code in es_translator/interpreters/apertium/apertium.py
def download_pair(self, pair: Optional[str] = None) -> str:
    """Download and install a specific language pair package.

    Args:
        pair: Language pair to download. If None, uses current pair.

    Returns:
        Path to the installed package directory.
    """
    pair = self.pair_alpha_3 if pair is None else to_alpha_3_pair(pair)
    # All commands must be run from the pack dir
    return self.repository.install_pair_package(pair)

first_pairs_path

first_pairs_path(leaf, lang)

Find the first path from a tree leaf to a target language.

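Parameters:

Name | Type | Description | Default
leaf | dict | Tree node dictionary with 'lang' and 'children' keys. | required
lang | str | Target language to find a path to. | required

Returns:

Type | Description
list[str] | List of language codes forming the path, excluding the starting leaf. At each level, the first child whose subtree contains the target is followed.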

Source code in es_translator/interpreters/apertium/apertium.py
def first_pairs_path(self, leaf: dict, lang: str) -> list[str]:
    """Find the first path from a tree leaf to a target language.

    Args:
        leaf: Tree node dictionary with 'lang' and 'children' keys.
        lang: Target language to find path to.

    Returns:
        List of language codes forming the path.
    """
    path = []
    for child_leaf in leaf['children'].values():
        if self.leaf_has_lang(child_leaf, lang):
            path.append(child_leaf['lang'])
            path = path + self.first_pairs_path(child_leaf, lang)
            break
    return path

lang_tree

lang_tree(lang, pairs, depth=2)

Build a tree of language connections from available pairs.

Parameters:

Name | Type | Description | Default
lang | str | Root language for the tree. | required
pairs | list[list[str]] | List of language pairs, each a list of two language codes. | required
depth | int | Maximum depth to traverse. | 2

Returns:

Type | Description
dict | Tree with 'lang' and 'children' keys; 'children' maps each connected language to its own subtree.

Source code in es_translator/interpreters/apertium/apertium.py
def lang_tree(self, lang: str, pairs: list[list[str]], depth: int = 2) -> dict:
    """Build a tree of language connections from available pairs.

    Args:
        lang: Root language for the tree.
        pairs: List of language pair lists.
        depth: Maximum depth to traverse (default: 2).

    Returns:
        Dictionary tree structure with 'lang' and 'children' keys.
    """
    tree = {'lang': lang, 'children': {}}
    for pair in pairs:
        if lang in pair and depth > 0:
            child_lang = next(item for item in pair if item != lang)
            tree['children'][child_lang] = self.lang_tree(child_lang, pairs, depth - 1)
    return tree

leaf_has_lang

leaf_has_lang(leaf, lang)

Check if a tree leaf contains or leads to a target language.

Parameters:

Name | Type | Description | Default
leaf | dict | Tree node dictionary with 'lang' and 'children' keys. | required
lang | str | Target language to search for. | required

Returns:

Type | Description
bool | True if the language is among the leaf's children or any of their descendants.

Source code in es_translator/interpreters/apertium/apertium.py
def leaf_has_lang(self, leaf: dict, lang: str) -> bool:
    """Check if a tree leaf contains or leads to a target language.

    Args:
        leaf: Tree node dictionary with 'lang' and 'children' keys.
        lang: Target language to search for.

    Returns:
        True if the language is found in the leaf or its descendants.
    """
    children = leaf['children'].values()
    return lang in leaf['children'] or any(self.leaf_has_lang(child_leaf, lang) for child_leaf in children)
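first_pairs_path, lang_tree and leaf_has_lang work together when an intermediary language has to be found. The process builds a tree rooted at the source language, then follows the first branch whose subtree reaches the target.

The standalone sketch below copies the three methods as plain functions and adds a hypothetical find_intermediary helper. The pair list is made up, and the real intermediary_pairs property may combine these calls differently.

```python
from typing import Optional

Pairs = list[list[str]]


def lang_tree(lang: str, pairs: Pairs, depth: int = 2) -> dict:
    # Same recursion as lang_tree: link lang to every language it shares a pair with
    tree = {'lang': lang, 'children': {}}
    for pair in pairs:
        if lang in pair and depth > 0:
            child_lang = next(item for item in pair if item != lang)
            tree['children'][child_lang] = lang_tree(child_lang, pairs, depth - 1)
    return tree


def leaf_has_lang(leaf: dict, lang: str) -> bool:
    # True if lang is a child of leaf or appears anywhere below it
    children = leaf['children'].values()
    return lang in leaf['children'] or any(leaf_has_lang(child, lang) for child in children)


def first_pairs_path(leaf: dict, lang: str) -> list[str]:
    # Descend into the first child whose subtree reaches lang
    path = []
    for child_leaf in leaf['children'].values():
        if leaf_has_lang(child_leaf, lang):
            path.append(child_leaf['lang'])
            path = path + first_pairs_path(child_leaf, lang)
            break
    return path


def find_intermediary(source: str, target: str, pairs: Pairs) -> Optional[str]:
    # Hypothetical helper: first language bridging source and target, if any
    path = first_pairs_path(lang_tree(source, pairs), target)
    return path[0] if path else None


pairs = [['eng', 'spa'], ['spa', 'fra'], ['eng', 'cat']]
print(first_pairs_path(lang_tree('eng', pairs), 'fra'))  # ['spa']
print(find_intermediary('eng', 'fra', pairs))             # spa
print(find_intermediary('eng', 'deu', pairs))             # None
```

The recursion does not track visited languages, so the root can reappear below its own children (here 'eng' is a child of 'spa'). Only the depth limit keeps the tree finite. With the default depth of 2, the returned path contains at most one intermediary language.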

pair_to_pair_package

pair_to_pair_package(pair)

Convert language pair to package name.

Checks both the pair and its reverse for availability in remote packages.

Parameters:

Name | Type | Description | Default
pair | str | Language pair string (e.g., 'en-es'). | required

Returns:

Type | Description
Optional[str] | Package name if found, None otherwise.

Source code in es_translator/interpreters/apertium/apertium.py
def pair_to_pair_package(self, pair: str) -> Optional[str]:
    """Convert language pair to package name.

    Checks both the pair and its reverse for availability in remote packages.

    Args:
        pair: Language pair string (e.g., 'en-es').

    Returns:
        Package name if found, None otherwise.
    """
    pair_inversed = '-'.join(pair.split('-')[::-1])
    combinations = [to_alpha_3_pair(pair), to_alpha_3_pair(pair_inversed)]
    try:
        return next(p for p in self.remote_pairs if p in combinations)
    except StopIteration:
        return None
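The lookup normalizes both the pair and its reverse to three-letter codes, then returns the first remote pair that matches either one. The sketch below mirrors that logic under two substitutions:

- The library's to_alpha_3_pair is replaced by a hypothetical lookup table that covers only a few languages.
- The remote pair list is made up.

```python
from typing import Optional

# Hypothetical ISO 639-1 -> ISO 639-3 table; the library uses its own to_alpha_3_pair helper
ALPHA_3 = {'en': 'eng', 'es': 'spa', 'fr': 'fra', 'ca': 'cat'}


def to_alpha_3_pair(pair: str) -> str:
    # Convert each side of 'xx-yy' to a three-letter code, leaving unknown codes unchanged
    return '-'.join(ALPHA_3.get(code, code) for code in pair.split('-'))


def pair_to_pair_package(pair: str, remote_pairs: list[str]) -> Optional[str]:
    pair_inversed = '-'.join(pair.split('-')[::-1])
    combinations = [to_alpha_3_pair(pair), to_alpha_3_pair(pair_inversed)]
    # First remote pair matching either direction, else None
    return next((p for p in remote_pairs if p in combinations), None)


remote = ['eng-spa', 'spa-cat']
print(pair_to_pair_package('en-es', remote))  # eng-spa
print(pair_to_pair_package('ca-es', remote))  # spa-cat (found via the reversed pair)
print(pair_to_pair_package('en-fr', remote))  # None
```

Passing a default to next() is equivalent to the try/except StopIteration used in the method above.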

translate

translate(input)

Translate text through the translation pipeline.

If using an intermediary language, translates through multiple pairs.

Parameters:

Name | Type | Description | Default
input | str | Text to translate. | required

Returns:

Type | Description
str | Translated text string.

Source code in es_translator/interpreters/apertium/apertium.py
def translate(self, input: str) -> str:
    """Translate text through the translation pipeline.

    If using an intermediary language, translates through multiple pairs.

    Args:
        input: Text to translate.

    Returns:
        Translated text string.
    """
    for pair in self.pairs_pipeline:
        # Create a sub-process which can receive an input
        input = self.translate_with_apertium(input, pair)
    return input
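translate threads the text through each pair in turn, so the output of one step becomes the input of the next. The minimal sketch below expresses that loop as a fold. fake_translate is a hypothetical stand-in for translate_with_apertium, which needs the apertium binary.

```python
from functools import reduce


def fake_translate(text: str, pair: str) -> str:
    # Stand-in for translate_with_apertium: tag the text with the pair it went through
    return f'{text} [{pair}]'


def translate(text: str, pairs_pipeline: list[str]) -> str:
    # Equivalent to the for-loop in translate: each step's output feeds the next
    return reduce(fake_translate, pairs_pipeline, text)


print(translate('hello', ['eng-spa', 'spa-fra']))  # hello [eng-spa] [spa-fra]
```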

translate_with_apertium

translate_with_apertium(input, pair)

Translate text using Apertium for a specific language pair.

Parameters:

Name | Type | Description | Default
input | str | Text to translate. | required
pair | str | Language pair code (e.g., 'eng-spa'). | required

Returns:

Type | Description
str | Translated text string.

Raises:

Type | Description
Exception | If translation fails.

Source code in es_translator/interpreters/apertium/apertium.py
def translate_with_apertium(self, input: str, pair: str) -> str:
    """Translate text using Apertium for a specific language pair.

    Args:
        input: Text to translate.
        pair: Language pair code (e.g., 'eng-spa').

    Returns:
        Translated text string.

    Raises:
        Exception: If translation fails.
    """
    apertium = _get_apertium()
    try:
        # Works with a temporary file as buffer (opened in text mode)
        with NamedTemporaryFile(mode='w+t') as temp_input_file:
            temp_input_file.writelines(input)
            temp_input_file.seek(0)
            input_translated = apertium('-ud', self.pack_dir, pair, temp_input_file.name)
    except ErrorReturnCode:
        raise Exception('Unable to translate this string.')
    return str(input_translated)
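The method appears to invoke the apertium command-line tool through the sh library (ErrorReturnCode is sh's exception type), with a temporary file as input. Below is a rough standard-library equivalent using subprocess, with hypothetical function names.

- It assumes apertium is on PATH.
- It passes the same arguments as the source: '-ud', the package directory, the pair, and the input file.
- Like the original, it relies on the temporary file being readable by name while still open, which holds on POSIX systems.

```python
import subprocess
from tempfile import NamedTemporaryFile


def build_apertium_command(pack_dir: str, pair: str, input_path: str) -> list[str]:
    # Same argument order as the source: apertium -ud <pack_dir> <pair> <input file>
    return ['apertium', '-ud', pack_dir, pair, input_path]


def translate_with_apertium(text: str, pack_dir: str, pair: str) -> str:
    with NamedTemporaryFile(mode='w+t') as temp_input_file:
        temp_input_file.write(text)
        temp_input_file.flush()  # make sure the subprocess sees the full text
        try:
            result = subprocess.run(
                build_apertium_command(pack_dir, pair, temp_input_file.name),
                capture_output=True, text=True, check=True,
            )
        except subprocess.CalledProcessError as e:
            raise Exception('Unable to translate this string.') from e
    return result.stdout
```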

Repository Management

es_translator.interpreters.apertium.repository.ApertiumRepository

ApertiumRepository(cache_dir=None, arch=None)

Manages Apertium package repository operations.

Handles downloading, extracting, and installing Apertium translation pairs from the official repository.

Attributes:

Name | Description
cache_dir | Directory path for caching downloaded packages.
arch | System architecture ('amd64', 'i386', etc.).

Parameters:

Name | Type | Description | Default
cache_dir | Optional[str] | Directory for caching downloaded packages. If None, the current working directory is used. | None
arch | Optional[str] | Architecture string ('amd64', 'i386'). If None, auto-detect. | None

Source code in es_translator/interpreters/apertium/repository.py
def __init__(self, cache_dir: Optional[str] = None, arch: Optional[str] = None):
    """Initialize the Apertium repository handler.

    Args:
        cache_dir: Directory for caching downloaded packages. Defaults to None.
        arch: Architecture string ('amd64', 'i386'). If None, auto-detect.
    """
    self.cache_dir = abspath(cache_dir) if cache_dir else abspath('.')
    self.arch = arch

control_file_content cached property

control_file_content

Fetch and cache the content of the Packages control file.

Returns:

Type | Description
str | Decoded UTF-8 content of the Packages file.

Raises:

Type | Description
URLError | If the URL cannot be accessed.
HTTPError | If the HTTP request fails.

packages cached property

packages

Parse and cache the list of available packages.

Returns:

Type | Description
list[dict] | List of package metadata dictionaries.
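The packages property turns the Debian Packages control file into a list of dictionaries keyed by field name (Package, Provides, Filename and so on). Its parsing code is not shown on this page, so the following is only a sketch of one way to do it.

It assumes the usual stanza format: records separated by blank lines, each made of 'Field: value' lines, with continuation lines starting with whitespace. parse_packages is hypothetical, and the sample stanzas are made up.

```python
def parse_packages(content: str) -> list[dict]:
    """Hypothetical parser for a Debian Packages file (not the library's code)."""
    packages = []
    for stanza in content.strip().split('\n\n'):
        fields: dict[str, str] = {}
        last_key = None
        for line in stanza.splitlines():
            if line[:1] in (' ', '\t') and last_key:
                # Continuation line: append to the previous field's value
                fields[last_key] += '\n' + line.strip()
            elif ':' in line:
                key, _, value = line.partition(':')
                last_key = key.strip()
                fields[last_key] = value.strip()
        if fields:
            packages.append(fields)
    return packages


sample = """Package: apertium-eng-spa
Version: 1.0
Filename: pool/main/a/apertium-eng-spa/apertium-eng-spa_1.0_all.deb

Package: apertium-all-dev
Provides: apertium-dev
Description: meta package
 pulling in development tools
"""

for package in parse_packages(sample):
    print(package['Package'], package.get('Provides'))
```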

packages_file_url property

packages_file_url

Get the Packages file URL for the configured architecture.

pair_packages cached property

pair_packages

Get the filtered list of Apertium translation pair packages.

Returns:

Type | Description
list[dict] | List of package dictionaries that are translation pairs.

clear_modes

clear_modes()

Remove all mode files from the cache directory.

Source code in es_translator/interpreters/apertium/repository.py
def clear_modes(self) -> None:
    """Remove all mode files from the cache directory."""
    with pushd(self.cache_dir):
        rm('-Rf', 'modes')

create_pair_package_alias

create_pair_package_alias(package_dir)

Create symbolic links for alternative language code formats.

Creates aliases between ISO 639-1 (2-letter) and ISO 639-3 (3-letter) codes.

Parameters:

Name | Type | Description | Default
package_dir | str | Directory containing the extracted package. | required

Returns:

Type | Description
str | Path to the created alias directory.

Source code in es_translator/interpreters/apertium/repository.py
def create_pair_package_alias(self, package_dir: str) -> str:
    """Create symbolic links for alternative language code formats.

    Creates aliases between ISO 639-1 (2-letter) and ISO 639-3 (3-letter) codes.

    Args:
        package_dir: Directory containing the extracted package.

    Returns:
        Path to the created alias directory.
    """
    extraction_dir = dirname(package_dir) + '/'
    source, target = basename(package_dir).split('apertium-')[-1].split('-')

    # Determine alias codes based on current format
    if len(source) == 2:
        aliases = (to_alpha_3(source), to_alpha_3(target))
    else:
        aliases = (to_alpha_2(source), to_alpha_2(target))

    # Build the alias dir using the alias codes
    alias_dir = join(extraction_dir, f'apertium-{aliases[0]}-{aliases[1]}')
    mode_file = join(extraction_dir, 'modes', f'{source}-{target}.mode')
    mode_alias_file = join(extraction_dir, 'modes', f'{aliases[0]}-{aliases[1]}.mode')

    # Use symbolic links for aliases
    create_symlink(package_dir, alias_dir)
    create_symlink(mode_file, mode_alias_file)

    return alias_dir
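The alias logic parses the two language codes out of the package directory name and swaps each for its counterpart in the other ISO 639 format. The sketch below reproduces only the path computation; the real method then creates the symlinks with create_symlink.

TO_ALPHA_3 and TO_ALPHA_2 are hypothetical stand-ins for the library's to_alpha_3 and to_alpha_2 helpers and cover just two languages. alias_paths is also illustrative.

```python
from os.path import basename, dirname, join

# Hypothetical lookup tables; the library uses its own to_alpha_2 / to_alpha_3 helpers
TO_ALPHA_3 = {'en': 'eng', 'es': 'spa'}
TO_ALPHA_2 = {v: k for k, v in TO_ALPHA_3.items()}


def alias_paths(package_dir: str) -> tuple[str, str, str]:
    """Return (alias_dir, mode_file, mode_alias_file) for a package directory."""
    extraction_dir = dirname(package_dir) + '/'
    source, target = basename(package_dir).split('apertium-')[-1].split('-')
    # Two-letter codes get three-letter aliases, and vice versa
    table = TO_ALPHA_3 if len(source) == 2 else TO_ALPHA_2
    aliases = (table[source], table[target])
    alias_dir = join(extraction_dir, f'apertium-{aliases[0]}-{aliases[1]}')
    mode_file = join(extraction_dir, 'modes', f'{source}-{target}.mode')
    mode_alias_file = join(extraction_dir, 'modes', f'{aliases[0]}-{aliases[1]}.mode')
    return alias_dir, mode_file, mode_alias_file


print(alias_paths('/cache/apertium-eng-spa'))
```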

download_package

download_package(name, force=False)

Download a package from the repository.

Parameters:

Name | Type | Description | Default
name | str | Package name to download. | required
force | bool | If True, re-download even if the package already exists. | False

Returns:

Type | Description
str | Path to the downloaded .deb file.

Raises:

Type | Description
PackageNotFoundError | If the package cannot be found.

Source code in es_translator/interpreters/apertium/repository.py
def download_package(self, name: str, force: bool = False) -> str:
    """Download a package from the repository.

    Args:
        name: Package name to download.
        force: If True, re-download even if package already exists.

    Returns:
        Path to the downloaded .deb file.

    Raises:
        PackageNotFoundError: If package cannot be found.
    """
    package = self.find_package(name)
    if package is None:
        raise PackageNotFoundError(name)

    package_dir = join(self.cache_dir, name)
    package_file = join(package_dir, 'package.deb')
    mkdir('-p', package_dir)

    # Don't download the file twice
    if force or not isfile(package_file):
        logger.info(f'Downloading package {name}')

        # Try the URL from Packages file first
        package_url = f'{REPOSITORY_URL}/{package["Filename"]}'
        try:
            request.urlretrieve(package_url, package_file)
        except (URLError, HTTPError, OSError) as e:
            # If that fails, try to find the latest version in the pool directory
            logger.warning(f'Failed to download from Packages file URL: {e}')
            package_url = self.find_latest_package_in_pool(name, package['Filename'])
            request.urlretrieve(package_url, package_file)

    return package_file

download_pair_package

download_pair_package(pair)

Download a translation pair package.

Parameters:

Name | Type | Description | Default
pair | str | Language pair in format 'source-target'. | required

Returns:

Type | Description
str | Path to the downloaded .deb file.

Raises:

Type | Description
PairPackageNotFoundError | If no pair package is available for the given languages.

Source code in es_translator/interpreters/apertium/repository.py
def download_pair_package(self, pair: str) -> str:
    """Download a translation pair package.

    Args:
        pair: Language pair in format 'source-target'.

    Returns:
        Path to the downloaded .deb file.

    Raises:
        PairPackageNotFoundError: If no pair package is available for the given languages.
    """
    pair_package = self.find_pair_package(pair)
    if pair_package is not None:
        return self.download_package(pair_package.get('Package'))
    else:
        raise PairPackageNotFoundError(pair)

extract_pair_package

extract_pair_package(file, extraction_dir='.')

Extract a translation pair .deb package.

Parameters:

Name | Type | Description | Default
file | str | Path to the .deb file to extract. | required
extraction_dir | str | Directory to extract files into, relative to the directory containing the .deb file. | '.'

Returns:

Type | Description
str | Path to the working directory (the directory containing the .deb file) that holds the extracted files.

Source code in es_translator/interpreters/apertium/repository.py
def extract_pair_package(self, file: str, extraction_dir: str = '.') -> str:
    """Extract a translation pair .deb package.

    Args:
        file: Path to the .deb file to extract.
        extraction_dir: Directory to extract files into. Defaults to '.'.

    Returns:
        Path to the working directory containing extracted files.
    """
    workdir = dirname(file)
    with pushd(workdir):
        # Extract the file from the .deb
        dpkg_deb('-x', file, extraction_dir)
        # Copy the files we need
        cp('-rlf', glob('usr/share/apertium/*'), extraction_dir)
        # Remove everything else
        rm('-Rf', 'usr')
        # Rewrite paths in modes files to point to the working directory
        for mode in glob('modes/*.mode'):
            self.replace_in_file(mode, '/usr/share/apertium', workdir)
    return workdir

find_latest_package_in_pool

find_latest_package_in_pool(package_name, filename)

Find the latest package version in the pool directory.

Used as a fallback when the Packages file lists an outdated version.

Parameters:

Name | Type | Description | Default
package_name | str | Name of the package to find. | required
filename | str | Original filename from the Packages file (used to determine the pool directory). | required

Returns:

Type | Description
str | URL of the latest package version found (the last matching .deb link in the directory listing).

Raises:

Type | Description
PackageNotFoundError | If no matching package is found in the pool directory.
URLError | If the pool directory cannot be accessed.

Source code in es_translator/interpreters/apertium/repository.py
def find_latest_package_in_pool(self, package_name: str, filename: str) -> str:
    """Find latest package version from pool directory.

    Used as fallback when the Packages file lists an outdated version.

    Args:
        package_name: Name of the package to find.
        filename: Original filename from Packages file (used to determine pool directory).

    Returns:
        URL of the latest package version found.

    Raises:
        Exception: If no matching package is found in the pool directory.
        URLError: If the pool directory cannot be accessed.
    """
    logger.info('Attempting to find latest version from pool directory')

    # Extract the pool directory path
    filename_parts = filename.split('/')
    pool_dir_url = f'{REPOSITORY_URL}/{"/".join(filename_parts[:-1])}/'

    try:
        # Fetch the directory listing
        response = request.urlopen(pool_dir_url)
        html_content = response.read().decode('utf-8')
    except (URLError, HTTPError) as e:
        logger.error(f'Failed to access pool directory {pool_dir_url}: {e}')
        raise

    # Find all .deb files for this package
    pattern = rf'href="({re.escape(package_name)}_[^"]+\.deb)"'
    matches = re.findall(pattern, html_content)

    if matches:
        # Use the last one (likely the newest based on alphabetical sorting)
        latest_file = matches[-1]
        package_url = pool_dir_url + latest_file
        logger.info(f'Found latest version: {latest_file}')
        return package_url
    else:
        raise PackageNotFoundError(package_name)
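The fallback derives the pool directory from the Filename field, fetches its HTML index, and picks the last matching .deb link. The sketch below reproduces the URL derivation and the regular expression on an inline HTML snippet instead of a live request.

REPOSITORY_URL here is a placeholder, not the library's actual constant. pool_dir_url and latest_deb are illustrative names.

```python
import re

REPOSITORY_URL = 'https://repo.example.org'  # placeholder, not the library's real constant


def pool_dir_url(filename: str) -> str:
    # Keep only the directory part of the Filename field
    return f'{REPOSITORY_URL}/{"/".join(filename.split("/")[:-1])}/'


def latest_deb(package_name: str, html_content: str) -> str:
    # Same pattern as the source: href="<package>_<anything>.deb"
    pattern = rf'href="({re.escape(package_name)}_[^"]+\.deb)"'
    matches = re.findall(pattern, html_content)
    if not matches:
        raise LookupError(package_name)
    # Last link in page order, which assumes the listing is sorted by name
    return matches[-1]


listing = (
    '<a href="apertium-eng-spa_0.9_all.deb">old</a>'
    '<a href="apertium-eng-spa_1.0_all.deb">new</a>'
    '<a href="apertium-eng-cat_1.0_all.deb">other</a>'
)
filename = 'pool/main/a/apertium-eng-spa/apertium-eng-spa_0.9_all.deb'
print(pool_dir_url(filename) + latest_deb('apertium-eng-spa', listing))
```

Name order is not version order: as text, '1.10' sorts before '1.9'. A stricter implementation would therefore compare versions using Debian's version rules, for instance with python-apt.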

find_package

find_package(package)

Find a package by its name or by the value of its Provides field.

Parameters:

Name | Type | Description | Default
package | str | Package name to search for. | required

Returns:

Type | Description
Optional[dict] | Package metadata dictionary if found, None otherwise.

Source code in es_translator/interpreters/apertium/repository.py
def find_package(self, package: str) -> Optional[dict]:
    """Find a package by name or provided name.

    Args:
        package: Package name to search for.

    Returns:
        Package metadata dictionary if found, None otherwise.
    """

    def is_package(c: dict) -> bool:
        return c.get('Package') == package or c.get('Provides') == package

    try:
        return next(filter(is_package, self.packages))
    except StopIteration:
        logger.warning(f'Unable to find package {package}')
        return None
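The sketch below applies the same predicate to a made-up package list, as a standalone function. Unlike the method, it does not log a warning when nothing matches.

Provides is compared by equality, so a record whose Provides field lists several comma-separated names would not match any one of them.

```python
from typing import Optional


def find_package(packages: list[dict], package: str) -> Optional[dict]:
    # Match on the package's own name or on its Provides value
    def is_package(c: dict) -> bool:
        return c.get('Package') == package or c.get('Provides') == package

    return next(filter(is_package, packages), None)


packages = [
    {'Package': 'apertium-eng-spa'},
    {'Package': 'apertium-all-dev', 'Provides': 'apertium-dev'},
]
print(find_package(packages, 'apertium-dev'))  # {'Package': 'apertium-all-dev', 'Provides': 'apertium-dev'}
print(find_package(packages, 'apertium-xxx'))  # None
```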

find_pair_package

find_pair_package(pair)

Find a translation pair package.

Searches for both forward (source-target) and reverse (target-source) pairs.

Parameters:

Name | Type | Description | Default
pair | str | Language pair in format 'source-target'. | required

Returns:

Type | Description
Optional[dict] | Package metadata dictionary if found, None otherwise.

Source code in es_translator/interpreters/apertium/repository.py
def find_pair_package(self, pair: str) -> Optional[dict]:
    """Find a translation pair package.

    Searches for both forward (source-target) and reverse (target-source) pairs.

    Args:
        pair: Language pair in format 'source-target'.

    Returns:
        Package metadata dictionary if found, None otherwise.
    """
    pair = to_alpha_3_pair(pair)
    pair_inversed = '-'.join(pair.split('-')[::-1])

    def is_pair(c: dict) -> bool:
        package_name = c.get('Package', '')
        return package_name.endswith(pair) or package_name.endswith(pair_inversed)

    try:
        return next(filter(is_pair, self.pair_packages))
    except StopIteration:
        return None
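find_pair_package normalizes the pair to three-letter codes, then returns the first package whose name ends with the pair in either direction. The sketch below covers only the matching step, on made-up package records. It leaves out the to_alpha_3_pair normalization, so the pair is assumed to be in three-letter form already.

```python
from typing import Optional


def find_pair_package(pair_packages: list[dict], pair: str) -> Optional[dict]:
    # pair is assumed to be three-letter already (the method calls to_alpha_3_pair first)
    pair_inversed = '-'.join(pair.split('-')[::-1])

    def is_pair(c: dict) -> bool:
        package_name = c.get('Package', '')
        return package_name.endswith(pair) or package_name.endswith(pair_inversed)

    return next(filter(is_pair, pair_packages), None)


pair_packages = [{'Package': 'apertium-spa-cat'}, {'Package': 'apertium-eng-spa'}]
print(find_pair_package(pair_packages, 'cat-spa'))  # {'Package': 'apertium-spa-cat'}
```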

import_modes

import_modes(clear=True)

Import all mode files from installed packages into the modes directory.

Parameters:

Name | Type | Description | Default
clear | bool | If True, clear existing modes before importing. | True

Source code in es_translator/interpreters/apertium/repository.py
def import_modes(self, clear: bool = True) -> None:
    """Import all mode files from installed packages into the modes directory.

    Args:
        clear: If True, clear existing modes before importing. Defaults to True.
    """
    with pushd(self.cache_dir):
        if clear:
            self.clear_modes()
        mkdir('-p', 'modes')
        # Copy all the mode files from installed packages
        mode_files = glob('./*/modes/*.mode')
        if mode_files:
            cp(mode_files, './modes')
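import_modes gathers every */modes/*.mode file under the cache directory into a single top-level modes directory. The sketch below is a rough standard-library equivalent of that step, without sh or pushd, run against a throwaway directory tree. The function signature and the directory layout are illustrative.

```python
import glob
import os
import shutil
import tempfile


def import_modes(cache_dir: str, clear: bool = True) -> list[str]:
    modes_dir = os.path.join(cache_dir, 'modes')
    if clear:
        # Equivalent of clear_modes: drop the shared modes directory entirely
        shutil.rmtree(modes_dir, ignore_errors=True)
    os.makedirs(modes_dir, exist_ok=True)
    # Mode files shipped inside each extracted package directory
    for mode_file in glob.glob(os.path.join(cache_dir, '*', 'modes', '*.mode')):
        shutil.copy(mode_file, modes_dir)
    return sorted(os.listdir(modes_dir))


with tempfile.TemporaryDirectory() as cache:
    for package, mode in [('apertium-eng-spa', 'eng-spa.mode'), ('apertium-spa-cat', 'spa-cat.mode')]:
        os.makedirs(os.path.join(cache, package, 'modes'))
        open(os.path.join(cache, package, 'modes', mode), 'w').close()
    result = import_modes(cache)

print(result)  # ['eng-spa.mode', 'spa-cat.mode']
```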

install_pair_package

install_pair_package(pair)

Download, extract, and install a translation pair package.

Parameters:

Name | Type | Description | Default
pair | str | Language pair in format 'source-target'. | required

Returns:

Type | Description
str | Path to the installed package directory.

Source code in es_translator/interpreters/apertium/repository.py
def install_pair_package(self, pair: str) -> str:
    """Download, extract, and install a translation pair package.

    Args:
        pair: Language pair in format 'source-target'.

    Returns:
        Path to the installed package directory.
    """
    logger.info(f'Installing pair package {pair}')
    package_file = self.download_pair_package(pair)
    package_dir = self.extract_pair_package(package_file)
    self.create_pair_package_alias(package_dir)
    self.import_modes(clear=False)
    return package_dir

is_apertium_pair

is_apertium_pair(control)

Check if a package is an Apertium translation pair.

Parameters:

Name | Type | Description | Default
control | dict | Package metadata dictionary. | required

Returns:

Type | Description
bool | True if the package is a translation pair (format: apertium-XX-YY).

Source code in es_translator/interpreters/apertium/repository.py
def is_apertium_pair(self, control: dict) -> bool:
    """Check if a package is an Apertium translation pair.

    Args:
        control: Package metadata dictionary.

    Returns:
        True if package is a translation pair (format: apertium-XX-YY).
    """
    try:
        parts = control['Package'].split('-')
        return len(parts) == 3 and parts[0] == 'apertium'
    except KeyError:
        return False
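The check accepts package names that split into exactly three dash-separated parts, the first being 'apertium'. The pair_packages property presumably filters packages with this predicate, though its code is not shown here. The sketch below applies the rule to made-up records.

```python
def is_apertium_pair(control: dict) -> bool:
    # Same rule as the method above: exactly 'apertium-XX-YY'
    try:
        parts = control['Package'].split('-')
        return len(parts) == 3 and parts[0] == 'apertium'
    except KeyError:
        return False


packages = [
    {'Package': 'apertium-eng-spa'},
    {'Package': 'apertium'},
    {'Package': 'lttoolbox'},
    {'Version': '1.0'},  # no Package field
]
pair_packages = [p for p in packages if is_apertium_pair(p)]
print([p['Package'] for p in pair_packages])  # ['apertium-eng-spa']
```

The check is purely syntactic, so any three-part name beginning with 'apertium-' passes, whether or not it is actually a translation pair.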

replace_in_file

replace_in_file(file, target, replacement)

Replace all occurrences of target string in a file.

Parameters:

Name | Type | Description | Default
file | str | Path to the file to modify. | required
target | str | String to search for. | required
replacement | str | String to replace with. | required

Source code in es_translator/interpreters/apertium/repository.py
def replace_in_file(self, file: str, target: str, replacement: str) -> None:
    """Replace all occurrences of target string in a file.

    Args:
        file: Path to the file to modify.
        target: String to search for.
        replacement: String to replace with.
    """
    with FileInput(file, inplace=True) as fileinput:
        for line in fileinput:
            print(line.replace(target, replacement), end='')
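replace_in_file iterates over the file with fileinput and inplace=True, which redirects standard output into the file. extract_pair_package uses it to rewrite the hard-coded /usr/share/apertium prefix in each mode file.

An alternative sketch with pathlib follows. It reads the whole file into memory, which is reasonable for small mode files. The function name and the sample mode-file line are hypothetical.

```python
import tempfile
from pathlib import Path


def replace_in_file_pathlib(file: str, target: str, replacement: str) -> None:
    path = Path(file)
    # Read everything, substitute, and write back in one go
    path.write_text(path.read_text(encoding='utf-8').replace(target, replacement), encoding='utf-8')


with tempfile.TemporaryDirectory() as workdir:
    mode = Path(workdir, 'eng-spa.mode')
    mode.write_text('lt-proc /usr/share/apertium/apertium-eng-spa/eng-spa.automorf.bin\n', encoding='utf-8')
    replace_in_file_pathlib(str(mode), '/usr/share/apertium', '/opt/work')
    rewritten = mode.read_text(encoding='utf-8')

print(rewritten)
```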