Implement your own Datashare tasks, written in Python¶
Most AI, Machine Learning, and Data Engineering work happens in Python. Datashare now lets you extend its backend with your own tasks implemented in Python.
Turning an existing data processing pipeline into a Datashare worker is straightforward.
Let's turn this dummy pipeline function into a Datashare worker:
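The pipeline snippet itself is not reproduced here; as an illustrative stand-in (the function name and behavior are assumptions, not the original code), a dummy pipeline could be as simple as:

```python
# Hypothetical stand-in for the dummy pipeline: clean a batch of raw
# documents by stripping whitespace and dropping empty entries.
def pipeline(docs: list[str]) -> list[str]:
    cleaned = (doc.strip() for doc in docs)
    return [doc for doc in cleaned if doc]


print(pipeline(["  hello ", "", "world"]))  # → ['hello', 'world']
```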
We start by running the datashare-python CLI to create a worker project from a template:
```shell
Project hello-world initialized!
cd hello-world
```
Datashare's asynchronous execution is backed by the Temporal durable execution framework. In Temporal, workflows are described in plain Python code which orchestrates the execution of tasks (called activities in Temporal's terminology).
We then implement a simple HelloWorld workflow running a single hello activity. Here is what our new activity should look like:
```python
from temporalio import activity


@activity.defn(name="hello")
def hello(person: str) -> str:
    return f"Hello {person}"
```
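Since the activity body is plain Python, it can be unit-tested directly, with no Temporal server involved. Here is a hypothetical quick check (the decorator is omitted so the snippet has no dependencies):

```python
# Same function body as the "hello" activity above, minus the
# @activity.defn decorator, so it runs standalone.
def hello(person: str) -> str:
    return f"Hello {person}"


assert hello("Datashare") == "Hello Datashare"
```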
Next, we integrate our task/activity into a workflow:
```python
from datetime import timedelta

from temporalio import workflow

from .activities import hello


@workflow.defn(name="hello-world")
class HelloWorld:
    @workflow.run
    async def run(self, person: str) -> str:
        return await workflow.execute_activity(
            hello,
            person,
            start_to_close_timeout=timedelta(seconds=10),
        )
```
Finally, we set up the project's dependencies and run our async Datashare worker.
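Under the hood, a Temporal worker connects the workflow and activity to a task queue. A minimal sketch using Temporal's Python SDK is shown below; the task queue name and server address are assumptions, and it requires a running Temporal server. The datashare-python template handles this wiring for you; the sketch only illustrates what runs underneath.

```python
import asyncio

from temporalio.client import Client
from temporalio.worker import Worker

from .activities import hello
from .workflows import HelloWorld


async def main() -> None:
    # Address and task queue name are illustrative; use the ones
    # your Datashare deployment expects.
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="hello-world",
        workflows=[HelloWorld],
        activities=[hello],
    )
    # Polls the task queue and executes workflows/activities until stopped.
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())
```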
You'll then be able to execute the workflow using the datashare-python CLI.
Learn¶
Learn how to integrate Data Processing and Machine Learning pipelines into Datashare by following our tutorial.
Get started¶
Follow our get started guide and learn how to clone the template repository and implement your own Datashare tasks!
Refine your knowledge¶
Follow our guides to learn how to implement complex tasks and deploy Datashare workers running your own tasks.