
Datashare

Better analyze information, in all its forms



Implement your own Datashare tasks, written in Python

Most AI, machine learning, and data engineering happens in Python. Datashare now lets you extend its backend with your own tasks implemented in Python.

Turning your own data-processing pipeline into a Datashare worker is very simple.

Let's turn this dummy pipeline function into a Datashare worker:

def hello(person: str) -> str:
    return f"Hello {person}"

We start by running the datashare-python CLI to create a worker project from a template:

macOS / Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh
uvx datashare-python project init hello-world
Initializing hello-world worker project in .
Project hello-world initialized !
cd hello-world

Windows:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
uvx datashare-python project init hello-world
Initializing hello-world worker project in .
Project hello-world initialized !
cd hello-world
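The template generates a ready-to-run uv project. Its layout looks roughly like this (illustrative; the exact files may vary between template versions):

hello-world/
├── pyproject.toml
├── uv.lock
└── hello_world/
    ├── __init__.py
    ├── activities.py
    └── workflows.py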

Datashare's asynchronous execution is backed by the Temporal durable execution framework. In Temporal, workflows are described in plain Python code from which tasks (activities, in Temporal's terminology) are executed.

We then implement a simple HelloWorld workflow that runs a single hello activity. Here is what the new activity should look like:

hello_world/activities.py
from temporalio import activity


@activity.defn(name="hello")
def hello(person: str) -> str:
    return f"Hello {person}"
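If your pipeline does I/O, the activity can also be defined as a coroutine. A minimal sketch (a variant, not part of the generated template):

from temporalio import activity


@activity.defn(name="hello")
async def hello(person: str) -> str:
    # Async activities run on the worker's event loop and must not block it;
    # keep CPU-heavy work in synchronous (def) activities instead.
    return f"Hello {person}"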

Next, we integrate our task/activity into a workflow:

hello_world/workflows.py
from datetime import timedelta

from temporalio import workflow

from .activities import hello


@workflow.defn(name="hello-world")
class HelloWorld:
    @workflow.run
    async def run(self, person: str) -> str:
        return await workflow.execute_activity(
            hello,
            person,
            start_to_close_timeout=timedelta(seconds=10),
        )
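Temporal requires an explicit timeout on each activity call, which is why start_to_close_timeout is set above. You can also bound retries; a hedged sketch of the same call with a retry policy (RetryPolicy comes from temporalio.common):

from temporalio.common import RetryPolicy

# Same call as above, with retries capped at three attempts (illustrative):
return await workflow.execute_activity(
    hello,
    person,
    start_to_close_timeout=timedelta(seconds=10),
    retry_policy=RetryPolicy(maximum_attempts=3),
)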

Finally, we set up dependencies and run our async Datashare worker:

uv run --frozen datashare-python worker start --activities hello --workflows hello-world --queue hello
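Under the hood this wires up a standard Temporal worker. Roughly equivalent plain-SDK code would look like the following sketch (the server address and executor settings are assumptions, not necessarily what the CLI uses):

import asyncio
from concurrent.futures import ThreadPoolExecutor

from temporalio.client import Client
from temporalio.worker import Worker

from hello_world.activities import hello
from hello_world.workflows import HelloWorld


async def main():
    # Assumes a Temporal server reachable at the default local address
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="hello",
        workflows=[HelloWorld],
        activities=[hello],
        # Synchronous (def) activities need an executor to run in
        activity_executor=ThreadPoolExecutor(max_workers=8),
    )
    await worker.run()


asyncio.run(main())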

You'll then be able to execute the workflow using the datashare-python CLI.
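If you prefer to trigger it directly through Temporal's Python client instead, here is a minimal sketch (again assuming a local Temporal server):

import asyncio

from temporalio.client import Client


async def main():
    client = await Client.connect("localhost:7233")
    result = await client.execute_workflow(
        "hello-world",          # workflow name, as registered above
        "world",                # argument passed to HelloWorld.run
        id="hello-world-demo",  # unique workflow execution id
        task_queue="hello",     # must match the worker's queue
    )
    print(result)  # Hello world


asyncio.run(main())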

Learn

Learn how to integrate data processing and machine learning pipelines into Datashare by following our tutorial.

Get started

Follow our getting started guide and learn how to clone the template repository and implement your own Datashare tasks!

Refine your knowledge

Follow our guides to learn how to implement complex tasks and deploy Datashare workers running your own tasks.