Agent Skill
2/7/2026

push-to-df-trans

Run the 'push to df_trans' workflow on the remote M3 Ultra machine. This skill should be used when the user asks to push new trading days to df_trans, update transaction data, or run known_good_version.py on the remote server. Handles SSH connection, PYTHONHASHSEED requirements, and sequential day processing.

tankygranny05
npx skills add tankygranny05/agent-box

SKILL.md


name: push-to-df-trans
description: Run the 'push to df_trans' workflow on the remote M3 Ultra machine. This skill should be used when the user asks to push new trading days to df_trans, update transaction data, or run known_good_version.py on the remote server. Handles SSH connection, PYTHONHASHSEED requirements, and sequential day processing.

Push to df_trans (Remote M3 Ultra)

[Created by Opus: 7c0d2c3a-dd9f-4ce3-a12a-7ac30231a354]

Overview

This skill executes the known_good_version.py workflow on a remote M3 Ultra machine to push new trading days into the df_trans_numerical.pickle file. The workflow labels transactions with hardcoded insider annotations and updates the main transaction database.

Critical Requirements

PYTHONHASHSEED=0 is MANDATORY

The remote pickle files were created with deterministic hashing (PYTHONHASHSEED=0). Python otherwise randomizes string hashes for each process, so running without this environment variable causes a KeyError: the hash values computed for stk/group/type strings won't match the stored hashes.

Symptom of missing PYTHONHASHSEED:

KeyError: -319328615819148925

Solution: Always prefix the command with PYTHONHASHSEED=0.
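The effect of the seed can be checked locally. The following is a minimal sketch, not part of known_good_version.py: the helper name string_hash_in_fresh_process and the sample string "stk" are illustrative assumptions. It computes hash() of a string in separate Python processes, with and without a fixed seed.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_process(text, seed=None):
    """Return hash(text) as computed by a brand-new Python interpreter.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from a clean state
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# With PYTHONHASHSEED=0, two separate processes agree on the hash.
fixed_a = string_hash_in_fresh_process("stk", seed=0)
fixed_b = string_hash_in_fresh_process("stk", seed=0)
print("seeded runs match:", fixed_a == fixed_b)

# Without the variable, each process picks its own random seed,
# so the two values will normally differ.
random_a = string_hash_in_fresh_process("stk")
random_b = string_hash_in_fresh_process("stk")
print("unseeded runs match:", random_a == random_b)
```

A lookup keyed on such hashes can only succeed when the process reading the pickle uses the same seed as the process that wrote it.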

Sequential Execution Required

Each run modifies the same pickle file (/db23/pickles/df_trans_numerical.pickle). Days must be processed sequentially, never in parallel, to avoid data corruption.

Connection Details

Parameter | Value
SSH Command | ssh -p 22222 14.161.37.223
User Alias | cm3u (defined in ~/.zshrc)
Python Path | ~/anaconda3/bin/python
Project Dir | ~/PycharmProjects/remote_to_m3_ultra
Script | workflows/known_good_version.py

File Locations

Remote Machine (M3 Ultra)

File | Path
Main script | ~/PycharmProjects/remote_to_m3_ultra/workflows/known_good_version.py
Transaction data | /db23/pickles/df_trans_numerical.pickle (~10.8 GB)
String mappings | /db23/pickles/dic_s2i_and_dic_i2s.pickle (~336 MB)
New day files | /db23/df_new_day_numerical/YYYY_MM_DD.pickle
Backup | /db23/df_trans_backups/df_trans_numerical.pickle

Local Machine

File | Path
Script (synced via PyCharm) | /Users/sotola/PycharmProjects/remote_to_m3_ultra/workflows/known_good_version.py

Workflow

Single Day Execution

To process a single day:

ssh -p 22222 14.161.37.223 "cd ~/PycharmProjects/remote_to_m3_ultra && \
  PYTHONHASHSEED=0 ~/anaconda3/bin/python workflows/known_good_version.py --day YYYY_MM_DD"

Example:

ssh -p 22222 14.161.37.223 "cd ~/PycharmProjects/remote_to_m3_ultra && \
  PYTHONHASHSEED=0 ~/anaconda3/bin/python workflows/known_good_version.py --day 2025_12_03"

Multiple Days (Sequential)

When processing multiple days, execute them one at a time in chronological order. Wait for each command to complete before starting the next.

Example for days 2025_12_03 through 2025_12_05:

# Day 1
ssh -p 22222 14.161.37.223 "cd ~/PycharmProjects/remote_to_m3_ultra && \
  PYTHONHASHSEED=0 ~/anaconda3/bin/python workflows/known_good_version.py --day 2025_12_03"

# Day 2 (run after Day 1 completes)
ssh -p 22222 14.161.37.223 "cd ~/PycharmProjects/remote_to_m3_ultra && \
  PYTHONHASHSEED=0 ~/anaconda3/bin/python workflows/known_good_version.py --day 2025_12_04"

# Day 3 (run after Day 2 completes)
ssh -p 22222 14.161.37.223 "cd ~/PycharmProjects/remote_to_m3_ultra && \
  PYTHONHASHSEED=0 ~/anaconda3/bin/python workflows/known_good_version.py --day 2025_12_05"
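The one-at-a-time rule can also be enforced by a small driver. This is a sketch under assumptions: push_days, build_command, and the runner parameter are hypothetical names, and it relies on subprocess.run blocking until the SSH command exits. It stops at the first failing day so that later days never run against a possibly inconsistent pickle.

```python
import subprocess

# Connection details copied from the tables in this document.
SSH_PREFIX = ["ssh", "-p", "22222", "14.161.37.223"]
REMOTE_TEMPLATE = (
    "cd ~/PycharmProjects/remote_to_m3_ultra && "
    "PYTHONHASHSEED=0 ~/anaconda3/bin/python "
    "workflows/known_good_version.py --day {day}"
)


def build_command(day):
    """Build the argv list for one remote run (hypothetical helper)."""
    return SSH_PREFIX + [REMOTE_TEMPLATE.format(day=day)]


def push_days(days, runner=subprocess.run):
    """Run each day in chronological order, one at a time.

    Sorting works as a chronological sort only because every day is
    given in the same YYYY_MM_DD format. `runner` defaults to
    subprocess.run, which does not return until the command finishes.
    """
    completed = []
    for day in sorted(days):
        result = runner(build_command(day))
        if result.returncode != 0:
            raise RuntimeError(
                f"{day} failed with exit code {result.returncode}; "
                "later days were not started"
            )
        completed.append(day)
    return completed


# Example call (commented out because it opens real SSH connections):
# push_days(["2025_12_03", "2025_12_04", "2025_12_05"])
```

Passing a different runner makes it possible to dry-run the loop without touching the remote machine.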

Day Format

Days can be specified in either format (the script normalizes them):

  • 2025_12_03 (preferred)
  • 20251203
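How the script normalizes the two formats is not shown here. One possible approach, as a sketch only (normalize_day is a hypothetical name, not the script's actual function), is to accept exactly these two shapes and validate the calendar date:

```python
import re
from datetime import datetime

# Accept either YYYY_MM_DD or YYYYMMDD, nothing else.
_DAY_SHAPE = re.compile(r"\d{4}_\d{2}_\d{2}|\d{8}")


def normalize_day(day):
    """Return the day as 'YYYY_MM_DD', raising ValueError on bad input."""
    if not _DAY_SHAPE.fullmatch(day):
        raise ValueError(f"unrecognized day format: {day!r}")
    digits = day.replace("_", "")
    # strptime rejects impossible dates such as 2025_02_30.
    parsed = datetime.strptime(digits, "%Y%m%d")
    return parsed.strftime("%Y_%m_%d")


print(normalize_day("20251203"))    # 2025_12_03
print(normalize_day("2025_12_03"))  # 2025_12_03
```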

Auto-detect Latest Day

To automatically process the latest available day file:

ssh -p 22222 14.161.37.223 "cd ~/PycharmProjects/remote_to_m3_ultra && \
  PYTHONHASHSEED=0 ~/anaconda3/bin/python workflows/known_good_version.py"

This works only when the --day flag is omitted and DAY = None is set in the script.
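The script's own detection logic is not reproduced in this document. As an illustration of the idea, assuming the latest day is chosen by filename from a directory of YYYY_MM_DD.pickle files, a hypothetical latest_day helper could look like this:

```python
import re
from pathlib import Path

# Only files named exactly YYYY_MM_DD.pickle count as day files.
_DAY_FILE = re.compile(r"\d{4}_\d{2}_\d{2}\.pickle")


def latest_day(new_day_dir):
    """Return the newest 'YYYY_MM_DD' found in new_day_dir, or None.

    Zero-padded YYYY_MM_DD names sort chronologically as plain strings,
    so no date parsing is needed.
    """
    days = sorted(
        path.stem
        for path in Path(new_day_dir).iterdir()
        if _DAY_FILE.fullmatch(path.name)
    )
    return days[-1] if days else None


# On the remote machine this would be called as:
# latest_day("/db23/df_new_day_numerical")
```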

Expected Output

A successful run shows:

  1. GlobalVariables initialization with file timestamps
  2. Loading df_trans (~2 seconds for 10+ GB)
  3. Building label mappings for group, type, insider, type_ah
  4. Adding hardcoded labels (50 entries)
  5. Pushing the new day
  6. Applying mappings and flagging insider trades
  7. Writing updated pickle (~2-3 seconds)
  8. Total runtime: ~55-60 seconds per day

Success indicators:

  • [√] Passed for sanity checks
  • Added 50 hardcoded label entries
  • Final write_pickle for df_trans_numerical.pickle
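When a run's output is captured, the indicators above can be checked mechanically. This is a rough sketch: the marker strings are copied from the list above, but their exact wording in the real log is an assumption, and missing_markers is a hypothetical helper.

```python
# Substrings taken from the success indicators listed above; the real
# log lines may be worded differently, so adjust these as needed.
SUCCESS_MARKERS = (
    "[√] Passed for sanity checks",
    "Added 50 hardcoded label entries",
    "df_trans_numerical.pickle",
)


def missing_markers(output):
    """Return the success markers that do not appear in the output."""
    return [marker for marker in SUCCESS_MARKERS if marker not in output]


# Made-up sample text for illustration, not a real log:
sample = (
    "Added 50 hardcoded label entries\n"
    "[√] Passed for sanity checks\n"
    "write_pickle /db23/pickles/df_trans_numerical.pickle\n"
)
print(missing_markers(sample))  # []
```

An empty list means every marker was found; anything else names what to look for in the log before pushing the next day.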

Syncing Local Changes to Remote

If local script changes need to be synced to remote:

scp -P 22222 /Users/sotola/PycharmProjects/remote_to_m3_ultra/workflows/known_good_version.py \
  14.161.37.223:~/PycharmProjects/remote_to_m3_ultra/workflows/known_good_version.py

Troubleshooting

ModuleNotFoundError: No module named 'lib'

The script needs the REPL-friendly import patch at the top. Ensure the local script has:

import sys
from pathlib import Path

try:
    ROOT = Path(__file__).resolve().parents[1]
    for path in (ROOT,):
        path_str = str(path)
        if path_str not in sys.path:
            sys.path.insert(0, path_str)
except NameError:
    # REPL environments don't set __file__, so skip the path patch there.
    pass

Then sync to remote.

KeyError with large negative number

The command was run without PYTHONHASHSEED=0. Always include it in the command.

command not found: python

Use the full anaconda path: ~/anaconda3/bin/python (not python or python3).
