Basic concepts and definitions¶
Before starting, here are a few definitions of concepts that we'll regularly use in this documentation.
The following concepts are important for the rest of this tutorial. Make sure you understand them properly!
Definitions¶
Tasks¶
Tasks (a.k.a. "async tasks" or "asynchronous tasks") are units of work that can be executed asynchronously.
Datashare has its own built-in tasks, such as indexing documents, finding named entities, or performing searches and downloads in batches. Tasks are visible on Datashare's tasks page.
The goal of this documentation is to let you implement your own custom tasks. They could be virtually anything:
- classifying documents
- extracting named entities from documents
- extracting structured content from documents
- translating documents
- tagging documents
- ...
Asynchronous¶
In our context, asynchronous means "executed in the background". Since tasks can take a long time, getting their result is not as simple as calling an API endpoint.
Instead, executing a task asynchronously implies:
- requesting the task execution by publishing the task name and arguments (the parameters needed to perform the task) on the broker
- receiving the task name and arguments from the broker and performing the actual task in the background inside a task worker (optionally publishing progress updates on the broker)
- monitoring the task progress
- saving the task results or errors
- accessing the task results or errors
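The steps above can be illustrated with a minimal, self-contained sketch. It is not Datashare's implementation: the broker is replaced by an in-memory `queue.Queue`, the worker by a background thread, and every name (`broker`, `TASKS`, `translate`, the `None` stop sentinel) is a hypothetical stand-in chosen for illustration.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the broker and the result storage.
broker = queue.Queue()
progress = {}
results = {}
errors = {}


def translate(task_id, text):
    # Toy task: "translates" by upper-casing and reports its progress.
    progress[task_id] = 0.5
    translated = text.upper()
    progress[task_id] = 1.0
    return translated


# Hypothetical mapping from task names to task functions.
TASKS = {"translate": translate}


def worker():
    # Receive task names and arguments, then perform the tasks in the background.
    while True:
        message = broker.get()
        if message is None:  # sentinel used only to stop this sketch
            break
        task_id, task_name, task_args = message
        try:
            results[task_id] = TASKS[task_name](task_id, **task_args)
        except Exception as e:
            errors[task_id] = e


background = threading.Thread(target=worker)
background.start()

# Request the execution by publishing the task name and arguments.
broker.put(("task-0", "translate", {"text": "bonjour"}))
# The caller is free to do other work here while the task runs.
broker.put(None)
background.join()

# Access the progress and the result once the task is done.
print(progress["task-0"], results["task-0"])  # 1.0 BONJOUR
```

The caller never invokes `translate` directly. It only publishes a message and reads the saved result later, which is what separates this flow from a plain function or API call.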
Workers¶
Workers (a.k.a. "async apps") are Python programs that run async tasks in an infinite loop.
The pseudocode for the worker loop is:
while True:
    task_name, task_args = get_next_task()
    task_fn = get_task_function_by_name(task_name)
    try:
        result = task_fn(**task_args)
    except Exception as e:
        save_error(e)
        continue
    save_result(result)
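To see how the loop behaves when a task fails, here is a runnable variant of the pseudocode above. It is only a sketch under stated assumptions: the task functions, the `TASK_FUNCTIONS` mapping and the `pending` queue are hypothetical, and unlike a real worker the loop stops once the queue is empty.

```python
from collections import deque


# Hypothetical task functions, for illustration only.
def add(a, b):
    return a + b


def divide(a, b):
    return a / b


TASK_FUNCTIONS = {"add": add, "divide": divide}

# Hypothetical queue of (task name, task arguments) pairs.
pending = deque(
    [
        ("divide", {"a": 1, "b": 0}),  # raises ZeroDivisionError
        ("add", {"a": 1, "b": 2}),
    ]
)
saved_results, saved_errors = [], []

# Unlike a real worker, this loop stops once the queue is empty.
while pending:
    task_name, task_args = pending.popleft()
    task_fn = TASK_FUNCTIONS[task_name]
    try:
        result = task_fn(**task_args)
    except Exception as e:
        # A failing task is recorded and the worker moves on to the next one.
        saved_errors.append((task_name, e))
        continue
    saved_results.append((task_name, result))

print(saved_results)  # [('add', 3)]
print([(name, type(e).__name__) for name, e in saved_errors])  # [('divide', 'ZeroDivisionError')]
```

The failing `divide` task does not stop the loop: its error is saved and the `add` task still runs.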
Task Manager¶
The task manager is the primary interface to interact with tasks. The task manager lets us:
- create tasks and send them to workers
- post task state and progress updates
- monitor task state and progress
- get task results and errors
- cancel tasks
- ...
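As a rough illustration of these responsibilities, the sketch below keeps everything in memory. It does not reflect the actual task manager API: the class name `InMemoryTaskManager`, its method names and the state names (`CREATED`, `RUNNING`, `DONE`, `CANCELLED`) are all assumptions made for this example.

```python
import itertools
from dataclasses import dataclass
from typing import Any


@dataclass
class TaskRecord:
    # Hypothetical record of everything known about a task.
    name: str
    args: dict
    state: str = "CREATED"
    progress: float = 0.0
    result: Any = None


class InMemoryTaskManager:
    # Hypothetical task manager storing tasks in a dictionary.
    def __init__(self):
        self._tasks = {}
        self._ids = itertools.count()

    def create_task(self, name, args):
        # A real manager would also send the task to a worker via the broker.
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = TaskRecord(name=name, args=args)
        return task_id

    def post_progress(self, task_id, progress):
        task = self._tasks[task_id]
        task.state, task.progress = "RUNNING", progress

    def save_result(self, task_id, result):
        task = self._tasks[task_id]
        task.state, task.progress, task.result = "DONE", 1.0, result

    def cancel(self, task_id):
        self._tasks[task_id].state = "CANCELLED"

    def get_task(self, task_id):
        return self._tasks[task_id]


manager = InMemoryTaskManager()
first = manager.create_task("classify", {"doc_id": "doc-1"})
second = manager.create_task("classify", {"doc_id": "doc-2"})
manager.post_progress(first, 0.5)
manager.save_result(first, "invoice")
manager.cancel(second)
print(manager.get_task(first).state, manager.get_task(first).result)  # DONE invoice
print(manager.get_task(second).state)  # CANCELLED
```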
App¶
Apps (a.k.a. "async apps") are collections of tasks. They act as a registry and bind a task name to an actual unit of work (i.e. a Python function).
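The registry idea can be sketched with a small decorator-based class. This is an illustrative assumption, not the library's actual `App` API: the `App` class, its `task` decorator, its `registry` attribute and the `tag_document` task below are all hypothetical.

```python
class App:
    # Hypothetical registry binding task names to Python functions.
    def __init__(self, name):
        self.name = name
        self.registry = {}

    def task(self, name):
        # Return a decorator that registers the function under the given name.
        def decorator(fn):
            self.registry[name] = fn
            return fn

        return decorator


app = App("my-app")


@app.task(name="tag_document")
def tag_document(doc_id, tag):
    return f"{doc_id} tagged with {tag}"


# A worker would look the function up by the task name it received.
task_fn = app.registry["tag_document"]
print(task_fn(doc_id="doc-1", tag="important"))  # doc-1 tagged with important
```

Binding functions to names lets a task be requested by a plain string, so the requester does not need to import the function itself.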
Next¶
- skip directly to learn more about tasks
- or continue to learn about advanced concepts