2025-12-06 01:00:46,615 - distributed.worker - ERROR - Exception during execution of task ('sort_values-5952167633b7cd4496705a360fbe5405', 6).
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2382, in _prepare_args_for_execution
    data[k] = self.data[k]
  File "/opt/conda/lib/python3.10/site-packages/distributed/spill.py", line 226, in __getitem__
    return super().__getitem__(key)
  File "/opt/conda/lib/python3.10/site-packages/zict/buffer.py", line 108, in __getitem__
    raise KeyError(key)
KeyError: "('drop_by_shallow_copy-e3f702fb33ea97b5e71a9e068d3b5c18', 6)"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2259, in execute
    args2, kwargs2 = self._prepare_args_for_execution(ts, args, kwargs)
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2386, in _prepare_args_for_execution
    data[k] = Actor(type(self.state.actors[k]), self.address, k, self)
KeyError: "('drop_by_shallow_copy-e3f702fb33ea97b5e71a9e068d3b5c18', 6)"
2025-12-06 01:00:37,453 - distributed.worker - WARNING - Compute Failed
Key:       ('shuffle-p2p-d1ae0c101f210d7c5f8ea8c7dc3cf426', 6)
Function:  shuffle_unpack
args:      ('8f4adf35dc92d235c8357c5e96d5cfa6', 6, 21309)
kwargs:    {}
Exception: "RuntimeError('shuffle_unpack failed during shuffle 8f4adf35dc92d235c8357c5e96d5cfa6')"
2025-12-05 08:03:42,254 - distributed.worker - WARNING - Compute Failed
Key:       ('hash-join-503a0712c875f18d8e1f083a31665de2', 14)
Function:  merge_unpack
args:      ('6a467b9020e603a14a42abd7e57c8fc7', '07e04b3958ae560a7202e7743536d103', 14, 20562, 20564, 'inner', 'hashed_source_and_date', 'hashed_source_and_date', <distributed.protocol.serialize.Serialized object at 0x7fd67a16cc70>, ['_x', '_y'])
kwargs:    {}
Exception: "RuntimeError('Worker tcp://172.21.25.6:38605 left during active shuffle 6a467b9020e603a14a42abd7e57c8fc7')"
2025-12-05 08:03:15,694 - distributed.worker - WARNING - Compute Failed
Key:       ('hash-join-c9bcbabf5e1d277f0382eee9abe4bff6', 10)
Function:  merge_unpack
args:      ('e1f614eaa1c52cb9d9911c29f12ec66e', '6c9e712e0ae70a6452d1b9451a663bfe', 10, 20548, 20550, 'inner', 'hashed_source_and_date', 'hashed_source_and_date', <distributed.protocol.serialize.Serialized object at 0x7fd65dd27bb0>, ['_x', '_y'])
kwargs:    {}
Exception: "RuntimeError('Worker tcp://172.21.25.101:36249 left during active shuffle e1f614eaa1c52cb9d9911c29f12ec66e')"
2025-12-05 08:02:48,785 - distributed.worker - WARNING - Compute Failed
Key:       ('hash-join-5c8f964fdea0da526c6019829c1f7484', 10)
Function:  merge_unpack
args:      ('5ee862de629847ea081687dc1b08a1f6', '83920b22fbab96b1dff20ca3efb7ad13', 10, 20534, 20536, 'inner', 'hashed_source_and_date', 'hashed_source_and_date', <distributed.protocol.serialize.Serialized object at 0x7fd67a16e680>, ['_x', '_y'])
kwargs:    {}
Exception: "RuntimeError('Worker tcp://172.21.25.101:46203 left during active shuffle 5ee862de629847ea081687dc1b08a1f6')"
2025-12-05 08:02:22,636 - distributed.worker - WARNING - Compute Failed
Key:       ('hash-join-ac76a017c1999ccc3fa2be6896153632', 10)
Function:  merge_unpack
args:      ('7e7c8f7fd24ded13a690a4aa92887301', '453d74adcc7180dab29103a2b434236c', 10, 20520, 20522, 'inner', 'hashed_source_and_date', 'hashed_source_and_date', <distributed.protocol.serialize.Serialized object at 0x7fd63ce761d0>, ['_x', '_y'])
kwargs:    {}
Exception: "RuntimeError('Worker tcp://172.21.25.101:32879 left during active shuffle 7e7c8f7fd24ded13a690a4aa92887301')"
2025-12-05 08:01:59,144 - distributed.worker - ERROR - Worker stream died during communication: tcp://172.21.25.101:42193
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 869, in _read_to_buffer
    bytes_read = self.read_from_fd(buf)
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 1138, in read_from_fd
    return self.socket.recv_into(buf, len(buf))
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2066, in gather_dep
    response = await get_data_from_worker(
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2892, in get_data_from_worker
    response = await send_recv(
  File "/opt/conda/lib/python3.10/site-packages/distributed/core.py", line 1024, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 241, in read
    convert_stream_closed_error(self, e)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 142, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) Ephemeral Worker->Worker for gather local=tcp://172.21.159.210:53530 remote=tcp://172.21.25.101:42193>: ConnectionResetError: [Errno 104] Connection reset by peer
2025-12-05 06:31:33,018 - distributed.worker - WARNING - Compute Failed
Key:       shuffle-barrier-2690b3fa24a763d2ad62478112e17d8c
Function:  shuffle_barrier
args:      ('2690b3fa24a763d2ad62478112e17d8c', [20325, 20325, 20325, 20325, 20325, 20325, 20325, 20325, 20325, 20325, 20325, 20325, 20325, 20325, 20325, 20325])
kwargs:    {}
Exception: "RuntimeError('shuffle_barrier failed during shuffle 2690b3fa24a763d2ad62478112e17d8c')"
2025-12-04 13:02:05,173 - distributed.worker - WARNING - Compute Failed
Key:       shuffle-barrier-520f47582da650d63e845a0a5d37f79b
Function:  shuffle_barrier
args:      ('520f47582da650d63e845a0a5d37f79b', [19525, 19525, 19525, 19525, 19525, 19525, 19525, 19525, 19525, 19525, 19525, 19525, 19525, 19525, 19525, 19525])
kwargs:    {}
Exception: "RuntimeError('shuffle_barrier failed during shuffle 520f47582da650d63e845a0a5d37f79b')"
2025-12-04 08:30:58,857 - distributed.worker - ERROR - Worker stream died during communication: tcp://172.21.25.101:40477
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 869, in _read_to_buffer
    bytes_read = self.read_from_fd(buf)
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 1138, in read_from_fd
    return self.socket.recv_into(buf, len(buf))
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2066, in gather_dep
    response = await get_data_from_worker(
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2892, in get_data_from_worker
    response = await send_recv(
  File "/opt/conda/lib/python3.10/site-packages/distributed/core.py", line 1024, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 241, in read
    convert_stream_closed_error(self, e)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 142, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) Ephemeral Worker->Worker for gather local=tcp://172.21.159.210:38092 remote=tcp://172.21.25.101:40477>: ConnectionResetError: [Errno 104] Connection reset by peer
2025-12-03 04:00:54,478 - distributed.worker - INFO - -------------------------------------------------
2025-12-03 04:00:54,478 - distributed.worker - INFO - Registered to: tcp://dask-scheduler:8786
2025-12-03 04:00:54,033 - distributed.worker - INFO - -------------------------------------------------
2025-12-03 04:00:54,033 - distributed.worker - INFO - Local Directory: /tmp/dask-worker-space/worker-yueccuj8
2025-12-03 04:00:54,033 - distributed.worker - INFO - Memory: 3.73 GiB
2025-12-03 04:00:54,033 - distributed.worker - INFO - Threads: 1
2025-12-03 04:00:54,033 - distributed.worker - INFO - -------------------------------------------------
2025-12-03 04:00:54,033 - distributed.worker - INFO - Waiting to connect to: tcp://dask-scheduler:8786
2025-12-03 04:00:54,033 - distributed.worker - INFO - dashboard at: 172.21.159.210:8790
2025-12-03 04:00:54,033 - distributed.worker - INFO - Listening to: tcp://172.21.159.210:37695
2025-12-03 04:00:54,033 - distributed.worker - INFO - Start worker at: tcp://172.21.159.210:37695