Understanding Kedro’s Namespace Pipelines
Kedro’s namespace pipelines are a powerful feature for flexible pipeline reuse, and they are especially handy for tasks like time series forecasting.
Getting Started
This blog post is based on this example Kedro project. Consider a basic function that makes monthly predictions:
def make_monthly_predictions(input_data):
    # Fill your actual logic here!
    output_data = input_data
    return output_data
Now, let’s wrap it in a simple one-node Kedro pipeline, which will be the building block of all our pipelines:
from kedro.pipeline import node
from kedro.pipeline.modular_pipeline import pipeline
base_pipeline = pipeline(
    [
        node(
            func=make_monthly_predictions,
            inputs=["input_data"],
            outputs=["output_data"],
        )
    ]
)
It simply wraps `make_monthly_predictions` in a node and declares which datasets it reads (`inputs`) and writes (`outputs`).
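To make the roles of `inputs` and `outputs` concrete, here is a rough plain-Python model of what running such a node amounts to: look up the named inputs in a catalog, call the function, and store the result under the output name. This is a simplified illustration, not Kedro’s runner; the `SimpleNode` class and the dictionary standing in for the data catalog are hypothetical, and the model supports only a single output name.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SimpleNode:
    """Hypothetical stand-in for a Kedro node: a function plus dataset names."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str  # simplified: a single output name

    def run(self, catalog: Dict[str, Any]) -> None:
        # Look up each declared input by name and pass the values positionally.
        args = [catalog[name] for name in self.inputs]
        # Store the function's return value under the declared output name.
        catalog[self.output] = self.func(*args)


def make_monthly_predictions(input_data):
    # Placeholder logic from the post: pass the data straight through.
    output_data = input_data
    return output_data


catalog = {"input_data": [1, 2, 3]}
SimpleNode(make_monthly_predictions, ["input_data"], "output_data").run(catalog)
print(catalog["output_data"])  # [1, 2, 3]
```

The node itself never touches storage directly; it only knows dataset names, which is exactly what makes renaming those names (namespacing) possible.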
Utilizing Namespace for Efficiency
Now that we have our base pipeline, let’s turn to time series forecasting, where each prediction depends on the previous result. [Namespace pipelines](https://docs.kedro.org/en/0.18.0/tutorial/namespace_pipelines.html) let us handle this efficiently. Start by creating a namespaced pipeline:
namespace_pipeline = pipeline(
    [
        node(
            func=make_monthly_predictions,
            inputs=["input_data"],
            outputs=["output_data"],
        )
    ],
    namespace="namespace",
)
The `namespace` argument automatically adds the prefix `namespace.` to the node’s inputs and outputs. You can inspect the result by printing the pipeline:
namespace_pipeline
Pipeline([
Node(make_monthly_predictions, ['namespace.input_data'], ['namespace.output_data'], None)
])
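The prefixing rule itself is simple enough to model in plain Python. The `namespace_node` helper below is a hypothetical illustration of the rule described above, not Kedro’s implementation, and it deliberately leaves out details such as how Kedro treats parameters.

```python
from typing import List, Tuple


def namespace_node(
    inputs: List[str], outputs: List[str], namespace: str
) -> Tuple[List[str], List[str]]:
    """Hypothetical model of namespacing: prefix every input and output name."""

    def prefix(names: List[str]) -> List[str]:
        return [f"{namespace}.{name}" for name in names]

    return prefix(inputs), prefix(outputs)


ins, outs = namespace_node(["input_data"], ["output_data"], "namespace")
print(ins, outs)  # ['namespace.input_data'] ['namespace.output_data']
```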
If you want to keep some datasets out of the namespace, list them in the `inputs` or `outputs` argument of the `pipeline` function to [override](https://docs.kedro.org/en/stable/nodes_and_pipelines/modular_pipelines.html#using-the-modular-pipeline-wrapper-to-provide-overrides) the prefix:
namespace_pipeline = pipeline(
    [
        node(
            func=make_monthly_predictions,
            inputs=["input_data"],
            outputs=["output_data"],
        )
    ],
    inputs=["input_data"],  # Keep input_data out of the namespace
    namespace="namespace",
)
namespace_pipeline
Pipeline([
Node(make_monthly_predictions, ['input_data'], ['namespace.output_data'], None)
])
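Overrides fit into the same kind of model. In this hypothetical sketch, which is not Kedro’s own code, a name passed as a string or a collection keeps its original name, a dictionary maps a name to a new one, and every other name is prefixed. The dictionary form is the one the time-series example relies on to route one month’s `output_data` into the next month’s `input_data`.

```python
from typing import Dict, Iterable, List, Union

Overrides = Union[str, Iterable[str], Dict[str, str], None]


def rename(names: List[str], namespace: str, overrides: Overrides = None) -> List[str]:
    """Hypothetical model of namespacing with overrides.

    - A name given as a string or in a collection keeps its original name.
    - A name used as a key in a dict is mapped to the dict's value.
    - Every other name gets the `namespace.` prefix.
    """
    if overrides is None:
        mapping: Dict[str, str] = {}
    elif isinstance(overrides, str):
        mapping = {overrides: overrides}
    elif isinstance(overrides, dict):
        mapping = dict(overrides)
    else:
        mapping = {name: name for name in overrides}
    return [mapping.get(name, f"{namespace}.{name}") for name in names]


# Keep input_data out of the namespace, as in the example above.
print(rename(["input_data"], "namespace", ["input_data"]))   # ['input_data']
print(rename(["output_data"], "namespace", ["input_data"]))  # ['namespace.output_data']
# Map output_data explicitly, the way months are chained together.
print(rename(["output_data"], "jan", {"output_data": "feb.input_data"}))  # ['feb.input_data']
```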
Building the Time-Series Pipeline
Now that we understand how namespaces work, let’s build a time-series pipeline by iterating over the months and connecting each month’s pipeline to the next:
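months = ["jan", "feb", "mar", "apr"]
def create_pipeline(months):
    pipelines = []
    # Pair each month with the following one: (jan, feb), (feb, mar), (mar, apr)
    for curr, nxt in zip(months[:-1], months[1:]):
        pipelines.append(
            pipeline(
                base_pipeline,
                # Route this month's output straight into next month's input
                outputs={"output_data": f"{nxt}.input_data"},
                # Prefix the remaining datasets (here, input_data) with the month
                namespace=curr,
            )
        )
    return pipeline(pipelines)
final_pipeline = create_pipeline(months)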
final_pipeline
Pipeline([
Node(make_monthly_predictions, ['jan.input_data'], ['feb.input_data'], None),
Node(make_monthly_predictions, ['feb.input_data'], ['mar.input_data'], None),
Node(make_monthly_predictions, ['mar.input_data'], ['apr.input_data'], None)
])
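To see why this wiring produces a forecasting chain, here is a plain-Python simulation that builds the same dataset names and runs the steps in order against a dictionary standing in for the data catalog. The `build_chain` and `run_chain` helpers and the toy `predict_next` function (add 1 to the previous value) are hypothetical. Kedro itself works out the execution order from the dataset dependencies between nodes rather than from list order.

```python
from typing import Any, Callable, Dict, List, Tuple


def build_chain(months: List[str]) -> List[Tuple[str, str]]:
    """Return the (input, output) dataset names for each month-to-month step."""
    return [
        (f"{curr}.input_data", f"{nxt}.input_data")
        for curr, nxt in zip(months[:-1], months[1:])
    ]


def run_chain(
    steps: List[Tuple[str, str]],
    func: Callable[[Any], Any],
    catalog: Dict[str, Any],
) -> Dict[str, Any]:
    """Run each step in order, reading its input and writing its output by name."""
    for input_name, output_name in steps:
        catalog[output_name] = func(catalog[input_name])
    return catalog


def predict_next(value: int) -> int:
    # Hypothetical stand-in for make_monthly_predictions.
    return value + 1


months = ["jan", "feb", "mar", "apr"]
steps = build_chain(months)
print(steps)
# [('jan.input_data', 'feb.input_data'), ('feb.input_data', 'mar.input_data'), ('mar.input_data', 'apr.input_data')]
catalog = run_chain(steps, predict_next, {"jan.input_data": 10})
print(catalog["apr.input_data"])  # 13
```

Only `jan.input_data` has to be supplied from outside; every later month is produced by the step before it, which matches the three chained nodes printed above.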
Visualizing the pipeline with `kedro viz` lets you see how each month’s output feeds into the next month’s input.