1. Read the stack trace to find the line of code that produces the error.
2. Find out which node this function belongs to.
3. Try to rerun the pipeline from just before this node.
4. If the node's input is not a persisted dataset, you need to change it in `catalog.yml` and re-run the pipeline; the error is thrown again.
5. A session can only be used once, so calling `session.run` again throws an error. (One user had a wrapper function that recreates the session and does something similar to `session.run` - see the sketch after this list.)
6. Create a new session, or `%reload_kedro`?
7. Now `catalog.load` that persisted dataset, i.e. `func(catalog.load("some_data"))`.
8. Copy the source code of `func` into the notebook. This works if the function itself is the node function, but if it is some function buried deep in the call stack, that means a lot more copy-pasting and possibly changing imports.
9. Change the source code and make it work in the notebook.
10. Rerun the pipeline to ensure everything works.
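A minimal sketch of the wrapper mentioned in step 5, assuming a standard Kedro project layout (the `rerun` name and its arguments are illustrative, not a Kedro API):

```python
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

def rerun(project_path=".", **run_kwargs):
    """Create a fresh session and run it, since a KedroSession can only run once."""
    bootstrap_project(project_path)  # register the project's settings and pipelines
    with KedroSession.create(project_path=project_path) as session:
        return session.run(**run_kwargs)

# e.g. rerun(from_nodes=["report_accuracy"])
```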
Running Session as Usual
```python
%reload_kedro
session.run()
```
```
[11/08/22 16:44:22] INFO     Resolved project path as:                                  __init__.py:132
                             /Users/Nok_Lam_Chan/dev/kedro_gallery/jupyter-debug-demo.
                             To set a different path, run '%reload_kedro <project_root>'
[11/08/22 16:44:24] INFO     Kedro project jupyter_debug_demo                           __init__.py:101
                    INFO     Defined global variable 'context', 'session', 'catalog'    __init__.py:102
                             and 'pipelines'
                    INFO     Running node: report_accuracy:                             node.py:327
                             report_accuracy([y_pred,y_test]) -> None
                    ERROR    Node 'report_accuracy: report_accuracy([y_pred,y_test])    node.py:352
                             -> None' failed with error: Simulate some bug here
                    WARNING  There are 1 nodes that have not run. You can resume the    runner.py:202
                             pipeline run from the nearest nodes with persisted inputs
                             by adding the following argument to your previous command:
                             --from-nodes "report_accuracy"
```
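The warning points at the resume mechanism: in a notebook, the CLI's `--from-nodes` flag corresponds to the `from_nodes` argument of `session.run` (after a `%reload_kedro`, since a session can only run once):

```python
%reload_kedro
session.run(from_nodes=["report_accuracy"])
```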
1. Read the stack trace to find the line of code that produces the error.
2. Find out which node this function belongs to.
3. Try to rerun the pipeline from just before this node.
4. If the node's input is not a persisted dataset, you need to change it in `catalog.yml` and re-run the pipeline; the error is thrown again.
5. A session can only be used once, so calling `session.run` again throws an error. (One user had a wrapper function that recreates the session and does something similar to `session.run`.)
6. Create a new session or `%reload_kedro` and re-run?
This is not efficient because, in an interactive workflow, these intermediate variables are likely stored in the catalog already.
```python
%reload_kedro
session.run()
```
```
[11/08/22 16:46:49] INFO     Resolved project path as:                                  __init__.py:132
                             /Users/Nok_Lam_Chan/dev/kedro_gallery/jupyter-debug-demo.
                             To set a different path, run '%reload_kedro <project_root>'
[11/08/22 16:46:50] INFO     Kedro project jupyter_debug_demo                           __init__.py:101
                    INFO     Defined global variable 'context', 'session', 'catalog'    __init__.py:102
                             and 'pipelines'
                    INFO     Running node: report_accuracy:                             node.py:327
                             report_accuracy([y_pred,y_test]) -> None
                    ERROR    Node 'report_accuracy: report_accuracy([y_pred,y_test])    node.py:352
                             -> None' failed with error: Simulate some bug here
                    WARNING  There are 1 nodes that have not run. You can resume the    runner.py:202
                             pipeline run from the nearest nodes with persisted inputs
                             by adding the following argument to your previous command:
                             --from-nodes "report_accuracy"
```
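Since the failing node's inputs are persisted, they can be loaded straight from the catalog instead of being recomputed (dataset names taken from the node signature in the log above):

```python
# Load the failing node's inputs from the catalog into the notebook namespace.
y_pred = catalog.load("y_pred")
y_test = catalog.load("y_test")
```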
8. Copy the source code of `func` into the notebook. This works if the function itself is the node function, but if it is some function buried deep in the call stack, that means a lot more copy-pasting and possibly changing imports.
```python
def report_accuracy(y_pred: pd.Series, y_test: pd.Series):
    """Calculates and logs the accuracy.

    Args:
        y_pred: Predicted target.
        y_test: True target.
    """
    raise ValueError("Simulate some bug here")
    accuracy = (y_pred == y_test).sum() / len(y_test)
    logger = logging.getLogger(__name__)
    logger.info("Model has accuracy of %.3f on test data.", accuracy)
```
```
Traceback (most recent call last):
  /var/folders/dv/bz0yz1dn71d2hygq110k3xhw0000gp/T/ipykernel_7863/1415042900.py:1 in <cell line: 1>
  [Errno 2] No such file or directory: '/var/folders/dv/bz0yz1dn71d2hygq110k3xhw0000gp/T/ipykernel_7863/1415042900.py'

NameError: name 'pd' is not defined
```
This won't work immediately; a couple more copy & pastes are needed:
* Manually copy the imports.
* Remove the function definition and copy its body as a cell instead.
```python
import pandas as pd
import logging
```
```python
raise ValueError("Simulate some bug here")
accuracy = (y_pred == y_test).sum() / len(y_test)
logger = logging.getLogger(__name__)
logger.info("Model has accuracy of %.3f on test data.", accuracy)
```
```
Traceback (most recent call last):
  /var/folders/dv/bz0yz1dn71d2hygq110k3xhw0000gp/T/ipykernel_7863/2816569123.py:1 in <cell line: 1>
  [Errno 2] No such file or directory: '/var/folders/dv/bz0yz1dn71d2hygq110k3xhw0000gp/T/ipykernel_7863/2816569123.py'

ValueError: Simulate some bug here
```
Assume we know that the first line is buggy; let's remove it.
```python
# raise ValueError("Simulate some bug here")
accuracy = (y_pred == y_test).sum() / len(y_test)
logger = logging.getLogger(__name__)
logger.info("Model has accuracy of %.3f on test data.", accuracy)
# It now works - let's copy this block back into the function and rerun
```
9. Change the source code and make it work in the notebook.
10. Rerun the pipeline to ensure everything works.
```python
%reload_kedro
session.run()
```
```
[11/08/22 16:50:48] INFO     Resolved project path as:                                  __init__.py:132
                             /Users/Nok_Lam_Chan/dev/kedro_gallery/jupyter-debug-demo.
                             To set a different path, run '%reload_kedro <project_root>'
[11/08/22 16:50:49] INFO     Kedro project jupyter_debug_demo                           __init__.py:101
                    INFO     Defined global variable 'context', 'session', 'catalog'    __init__.py:102
                             and 'pipelines'
                    INFO     Pipeline execution completed successfully.                 runner.py:90
```
{}
It works now! (The `{}` above is the return value of `session.run()`: a dictionary of any node outputs not captured by the catalog, empty in this case.)
Debugging in an interactive session is not uncommon, compared to IDE/breakpoint debugging:
* You can make plots and see the data.
* You can intercept variables and continue with the program - especially useful when the computation is intensive.
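This is close to what IPython's built-in post-mortem debugging already offers; a minimal example, assuming the exception has propagated to the notebook:

```python
# %debug is a standard IPython magic: run it in the cell right after the
# failure to open pdb at the raising frame and inspect locals in place.
%debug
```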
More to optimize (1st PoC):
* `%load_node` - populate all the necessary data where the node throws an error.
* When a pipeline fails, raise something like `%load_node debug=True` - the traceback should have information about which node the error is coming from.
* Is there anything we can use from Viz? Sometimes I get questions from people asking whether Kedro-Viz can help with debugging too.
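A rough sketch of what such a `%load_node` magic could do under the hood - this is proposal-level code, not an existing Kedro API; `load_node_inputs` and its arguments are illustrative:

```python
def load_node_inputs(node_name: str, pipeline, catalog) -> dict:
    """Load every input of a node from the catalog so it can be inspected
    or passed to the node function in a notebook."""
    node = next(n for n in pipeline.nodes if n.name == node_name)
    return {name: catalog.load(name) for name in node.inputs}

# e.g. with the globals defined by %reload_kedro:
# inputs = load_node_inputs("report_accuracy", pipelines["__default__"], catalog)
# report_accuracy(inputs["y_pred"], inputs["y_test"])
```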
More to optimize:
* What if the error is not in the node function but somewhere deeper in the call stack?
* Handle the case where the inputs are not in the catalog - how do we recompute the necessary inputs? Potentially we can use backtracking to do this more efficiently (see the sketch below).
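A hedged sketch of that backtracking idea, assuming the standard `pipeline`/`catalog` objects; `nodes_to_recompute` is an illustrative helper, not a Kedro API:

```python
def nodes_to_recompute(pipeline, catalog, target_node):
    """Walk upstream from target_node's inputs until every required dataset
    is persisted, collecting the nodes that must be re-run."""
    producers = {out: n for n in pipeline.nodes for out in n.outputs}
    needed, to_visit = [], list(target_node.inputs)
    while to_visit:
        dataset = to_visit.pop()
        if dataset in catalog.list() and catalog.exists(dataset):
            continue  # persisted - backtracking stops here
        node = producers.get(dataset)
        if node is None or node in needed:
            continue  # free input (e.g. parameters) or already collected
        needed.append(node)
        to_visit.extend(node.inputs)
    return needed
```

The collected nodes could then be run as a sub-pipeline before retrying the failing node.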