2026-01-19 15:02:53,592 - distributed.worker - WARNING - Compute Failed
Key:       ('shuffle-transfer-ea0a90b691c3e303b96e36db26de1837', 12)
Function:  shuffle_transfer
args:      (          source_id      timestamp  ...  date_diff_ms  _partitions
0        2794848552  1768833777346  ...         398.0            2
1        2794848621  1768833808004  ...         410.0            1
2        2794848552  1768833808700  ...         652.0            2
3        2794848621  1768833813761  ...         436.0            1
4        2794848552  1768833813854  ...         499.0            2
...             ...            ...  ...           ...          ...
2071205  2794848605  1768833151105  ...         738.0            0
2071206  2794848523  1768833152797  ...         205.0            7
2071207  2794848523  1768833155750  ...         204.0            7
2071208  2794848523  1768833156564  ...         203.0            7
2071209  2794848523  1768833157379  ...         205.0            7

[2071210 rows x 5 columns], 'ea0a90b691c3e303b96e36db26de1837', 12, 16, '_partitions')
kwargs:    {}
Exception: "RuntimeError('shuffle_transfer failed during shuffle ea0a90b691c3e303b96e36db26de1837')"
2026-01-19 15:02:38,644 - distributed.worker - WARNING - Compute Failed
Key:       ('shuffle-transfer-dd0235e2b443afff522d4ae5c408aeda', 10)
Function:  shuffle_transfer
args:      (     schema_version  detector_id  ...  heading_diff_cur_and_next  _partitions
338               1   2794848456  ...                        0.7           11
338               1   2794848514  ...                        0.0           11
338               1   2794848504  ...                        0.2           14
338               1   2794848504  ...                        0.0           15
338               1   2794848459  ...                        0.0            4
..              ...          ...  ...                        ...          ...
371               1   2794848489  ...                        0.3           13
371               1   2794848489  ...                        0.0           15
371               1   2794848459  ...                        0.2            3
371               1   2794848516  ...                        0.0            2
371               1   2794848516  ...                        0.0            4

[544 rows x 21 columns], 'dd0235e2b443afff522d4ae5c408aeda', 10, 16, '_par
kwargs:    {}
Exception: "RuntimeError('shuffle_transfer failed during shuffle dd0235e2b443afff522d4ae5c408aeda')"
2026-01-19 15:01:29,918 - distributed.worker - WARNING - Compute Failed
Key:       ('shuffle-transfer-19b050aaaf35e9e0332052c16d1adfa7', 12)
Function:  shuffle_transfer
args:      (     schema_version  detector_id  ...  heading_diff_cur_and_next  _partitions
406               1   2794848459  ...                        0.1            4
406               1   2794848514  ...                        0.3           12
406               1   2794848504  ...                        0.0           13
406               1   2794848469  ...                       -0.1           13
406               1   2794848469  ...                       -0.1           12
..              ...          ...  ...                        ...          ...
439               1   2794848514  ...                        0.2            8
439               1   2794848470  ...                        0.0           13
439               1   2794848504  ...                        0.2           15
439               1   2794848504  ...                       -3.0           13
439               1   2794848450  ...                        0.1            9

[544 rows x 21 columns], '19b050aaaf35e9e0332052c16d1adfa7', 12, 16, '_par
kwargs:    {}
Exception: "RuntimeError('shuffle_transfer failed during shuffle 19b050aaaf35e9e0332052c16d1adfa7')"
2026-01-19 05:31:20,883 - distributed.worker - WARNING - Compute Failed
Key:       shuffle-barrier-5a497579a30c9658a4d67f6581a2f0a5
Function:  shuffle_barrier
args:      ('5a497579a30c9658a4d67f6581a2f0a5', [6032, 6032, 6032, 6032, 6032, 6032, 6032, 6032, 6032, 6032, 6032, 6032, 6032, 6032, 6032, 6032])
kwargs:    {}
Exception: "RuntimeError('shuffle_barrier failed during shuffle 5a497579a30c9658a4d67f6581a2f0a5')"
2026-01-19 05:30:50,265 - distributed.worker - ERROR - Worker stream died during communication: tcp://172.21.159.228:35277
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 869, in _read_to_buffer
    bytes_read = self.read_from_fd(buf)
  File "/opt/conda/lib/python3.10/site-packages/tornado/iostream.py", line 1138, in read_from_fd
    return self.socket.recv_into(buf, len(buf))
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2066, in gather_dep
    response = await get_data_from_worker(
  File "/opt/conda/lib/python3.10/site-packages/distributed/worker.py", line 2892, in get_data_from_worker
    response = await send_recv(
  File "/opt/conda/lib/python3.10/site-packages/distributed/core.py", line 1024, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 241, in read
    convert_stream_closed_error(self, e)
  File "/opt/conda/lib/python3.10/site-packages/distributed/comm/tcp.py", line 142, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) Ephemeral Worker->Worker for gather local=tcp://172.21.25.68:55104 remote=tcp://172.21.159.228:35277>: ConnectionResetError: [Errno 104] Connection reset by peer
2026-01-18 10:32:01,765 - distributed.worker - INFO - -------------------------------------------------
2026-01-18 10:32:01,764 - distributed.worker - INFO - Registered to: tcp://dask-scheduler:8786
2026-01-18 10:32:01,333 - distributed.worker - INFO - -------------------------------------------------
2026-01-18 10:32:01,333 - distributed.worker - INFO - Local Directory: /tmp/dask-worker-space/worker-ye1xvbkk
2026-01-18 10:32:01,333 - distributed.worker - INFO - Memory: 3.73 GiB
2026-01-18 10:32:01,333 - distributed.worker - INFO - Threads: 1
2026-01-18 10:32:01,333 - distributed.worker - INFO - -------------------------------------------------
2026-01-18 10:32:01,333 - distributed.worker - INFO - Waiting to connect to: tcp://dask-scheduler:8786
2026-01-18 10:32:01,333 - distributed.worker - INFO - dashboard at: 172.21.25.68:8790
2026-01-18 10:32:01,333 - distributed.worker - INFO - Listening to: tcp://172.21.25.68:33971
2026-01-18 10:32:01,333 - distributed.worker - INFO - Start worker at: tcp://172.21.25.68:33971