2026-03-23 11:31:24,963 - distributed.worker - ERROR - Worker stream died during communication: tcp://172.21.159.225:39121
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 869, in _read_to_buffer
    bytes_read = self.read_from_fd(buf)
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 1138, in read_from_fd
    return self.socket.recv_into(buf, len(buf))
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2066, in gather_dep
    response = await get_data_from_worker(
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2892, in get_data_from_worker
    response = await send_recv(
  File "/opt/conda/lib/python3.10/site-packages/distributed/core.py", line 1024, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 241, in read
    convert_stream_closed_error(self, e)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 142, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) Ephemeral Worker->Worker for gather local=tcp://172.21.160.14:49644 remote=tcp://172.21.159.225:39121>: ConnectionResetError: [Errno 104] Connection reset by peer

2026-03-23 09:31:41,329 - distributed.worker - WARNING - Compute Failed
Key:       shuffle-barrier-3c1be3a46f7dfb9b8d9f95ff4e32154d
Function:  shuffle_barrier
args:      ('3c1be3a46f7dfb9b8d9f95ff4e32154d', [133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133, 133])
kwargs:    {}
Exception: "RuntimeError('shuffle_barrier failed during shuffle 3c1be3a46f7dfb9b8d9f95ff4e32154d')"

2026-03-23 07:15:34,007 - distributed.worker - INFO - -------------------------------------------------

2026-03-23 07:15:34,007 - distributed.worker - INFO - Registered to: tcp://dask-scheduler:8786

2026-03-23 07:15:32,673 - distributed.worker - INFO - -------------------------------------------------

2026-03-23 07:15:32,673 - distributed.worker - INFO - Local Directory: /tmp/dask-worker-space/worker-i1st2_pp

2026-03-23 07:15:32,673 - distributed.worker - INFO - Memory: 3.73 GiB

2026-03-23 07:15:32,673 - distributed.worker - INFO - Threads: 1

2026-03-23 07:15:32,673 - distributed.worker - INFO - -------------------------------------------------

2026-03-23 07:15:32,673 - distributed.worker - INFO - Waiting to connect to: tcp://dask-scheduler:8786

2026-03-23 07:15:32,673 - distributed.worker - INFO - dashboard at: 172.21.160.14:8790

2026-03-23 07:15:32,672 - distributed.worker - INFO - Listening to: tcp://172.21.160.14:45603

2026-03-23 07:15:32,672 - distributed.worker - INFO - Start worker at: tcp://172.21.160.14:45603