2026-03-23 11:31:25,006 - distributed.worker - ERROR - Worker stream died during communication: tcp://172.21.159.225:39121
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 869, in _read_to_buffer
    bytes_read = self.read_from_fd(buf)
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 1138, in read_from_fd
    return self.socket.recv_into(buf, len(buf))
ConnectionResetError: [Errno 104] Connection reset by peer
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2066, in gather_dep
    response = await get_data_from_worker(
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2892, in get_data_from_worker
    response = await send_recv(
  File "/opt/conda/lib/python3.10/site-packages/distributed/core.py", line 1024, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 241, in read
    convert_stream_closed_error(self, e)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 142, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) Ephemeral Worker->Worker for gather local=tcp://172.21.12.152:36842 remote=tcp://172.21.159.225:39121>: ConnectionResetError: [Errno 104] Connection reset by peer

2026-03-23 09:01:03,929 - distributed.worker - ERROR - Exception during execution of task ('simple-shuffle-3eb7ca30e5d51394af7746b3f9e0d38d', 5).
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2382, in _prepare_args_for_execution
    data[k] = self.data[k]
  File "/opt/conda/lib/python3.10/site-packages/distributed/spill.py", line 226, in __getitem__
    return super().__getitem__(key)
  File "/opt/conda/lib/python3.10/site-packages/zict/buffer.py", line 108, in __getitem__
    raise KeyError(key)
KeyError: "('split-simple-shuffle-3eb7ca30e5d51394af7746b3f9e0d38d', 5, 3)"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2259, in execute
    args2, kwargs2 = self._prepare_args_for_execution(ts, args, kwargs)
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2386, in _prepare_args_for_execution
    data[k] = Actor(type(self.state.actors[k]), self.address, k, self)
KeyError: "('split-simple-shuffle-3eb7ca30e5d51394af7746b3f9e0d38d', 5, 3)"

2026-03-23 07:15:12,392 - distributed.worker - INFO - -------------------------------------------------

2026-03-23 07:15:12,391 - distributed.worker - INFO - Registered to: tcp://dask-scheduler:8786

2026-03-23 07:15:11,196 - distributed.worker - INFO - -------------------------------------------------

2026-03-23 07:15:11,196 - distributed.worker - INFO - Local Directory: /tmp/dask-worker-space/worker-_lotqtwc

2026-03-23 07:15:11,196 - distributed.worker - INFO - Memory: 3.73 GiB

2026-03-23 07:15:11,196 - distributed.worker - INFO - Threads: 1

2026-03-23 07:15:11,195 - distributed.worker - INFO - -------------------------------------------------

2026-03-23 07:15:11,195 - distributed.worker - INFO - Waiting to connect to: tcp://dask-scheduler:8786

2026-03-23 07:15:11,195 - distributed.worker - INFO - dashboard at: 172.21.12.152:8790

2026-03-23 07:15:11,195 - distributed.worker - INFO - Listening to: tcp://172.21.12.152:46333

2026-03-23 07:15:11,195 - distributed.worker - INFO - Start worker at: tcp://172.21.12.152:46333