wandb/server

The `--git-hash` option in `wandb job create` is not working.

zfhxi opened this issue · 5 comments

I created a job using wandb local:

wandb job create git  https://xxx.git --project="TEST"  \
    --entity="username" --entry-point="main.py" --name="test1" \
    --git-hash="b7baca74dd034cb900ea0e3f48c397ea51c4c481"

wandb local created the job in the TEST project, with the following wandb-job.json:

{
    "_version": "v0",
    "source_type": "repo",
    "runtime": "3.7",
    "source": {
        "git": {
            "remote": "https://gitee.com/julianchern/wtallite",
            "commit": "b7baca74dd034cb900ea0e3f48c397ea51c4c481"
        },
        "entrypoint": [
            "python3.7",
            "main.py"
        ],
        "notebook": false
    },
    // ...
}
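
As a quick sanity check, the commit pinned in the job spec can be read back programmatically. This is only a sketch: it assumes a job spec shaped like the one above (with the elided part left out, since `// ...` is not valid JSON), and `pinned_commit` is a hypothetical helper name, not part of wandb.

```python
import json

# Trimmed copy of the wandb-job.json shown above; the elided fields are omitted.
JOB_SPEC = """
{
    "_version": "v0",
    "source_type": "repo",
    "runtime": "3.7",
    "source": {
        "git": {
            "remote": "https://gitee.com/julianchern/wtallite",
            "commit": "b7baca74dd034cb900ea0e3f48c397ea51c4c481"
        },
        "entrypoint": ["python3.7", "main.py"],
        "notebook": false
    }
}
"""


def pinned_commit(job_spec_text):
    """Return (remote, commit) recorded under source.git in the job spec."""
    git_info = json.loads(job_spec_text)["source"]["git"]
    return git_info["remote"], git_info["commit"]


remote, commit = pinned_commit(JOB_SPEC)
print(remote, commit)
```

So the job spec itself does record the hash passed via --git-hash; the question is whether it is honored at launch time.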

After that, I modified my code and pushed it to the remote repository. The commits are as follows:

$  git log --pretty=oneline -10
8a6b803c530e800cdf3304d12c6467dcfd655bf5 (HEAD -> main, origin/main) now1001
49ae7364f743c1b699d7a00f51e9805030c38c18 now1000
b7baca74dd034cb900ea0e3f48c397ea51c4c481 now1002
# ...

Then I launched the job by pushing it to the existing queue.

After the run completed, I located the code that the wandb local server had cloned from the remote repository and checked its commit:

$ cd "/tmp/tmpavc8q10w" 
$ git log --pretty=oneline -10
8a6b803c530e800cdf3304d12c6467dcfd655bf5 (grafted, HEAD -> main, origin/main) now1001

The expected commit, as specified by --git-hash, should be b7baca74dd034cb900ea0e3f48c397ea51c4c481 rather than the HEAD commit!

This indicates that:

  1. The wandb local server clones the latest version of the remote repository when launching the job.
  2. The --git-hash option in wandb job create does not seem to take effect.
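
The same check can be scripted instead of reading `git log` by hand. The sketch below compares the checked-out HEAD with the expected hash and also reports whether the clone is shallow, which the `(grafted)` marker in the log above suggests. The helper names are hypothetical, `git` is assumed to be on the PATH, and `rev-parse --is-shallow-repository` needs Git 2.15 or newer.

```python
import subprocess


def git(workdir, *args):
    """Run a git command inside workdir and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", workdir, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def same_commit(head, expected):
    """True if `expected` (full or abbreviated hash) names the commit `head`."""
    head, expected = head.lower(), expected.lower()
    return bool(expected) and head.startswith(expected)


def describe_checkout(workdir, expected):
    """Summarize which commit is checked out and whether the clone is shallow."""
    head = git(workdir, "rev-parse", "HEAD")
    shallow = git(workdir, "rev-parse", "--is-shallow-repository") == "true"
    return {
        "head": head,
        "matches": same_commit(head, expected),
        "shallow": shallow,
    }
```

Given the log output above, calling `describe_checkout("/tmp/tmpavc8q10w", "b7baca74dd034cb900ea0e3f48c397ea51c4c481")` would be expected to report that the commits do not match.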

Can anyone help solve this?

Hello! Thank you for sending this information! Could you send a link to your workspace so we can look at it? Only wandb employees will be able to view your project if this is a private project.

Also, could you verify that the launch job you created corresponds to the run id avc8q10w? Just to make sure that we are looking at the same run as the one created.

Thank you for your response. I've created a demo at https://github.com/zfhxi/test_wandb_launch_job

After hours of work, I've found this solution:

import os
import argparse
import subprocess
import sys
from git import Repo

def restart_program():
    # Re-run this script with the same arguments, now that the working tree has changed.
    p = subprocess.Popen([sys.executable] + sys.argv)
    p.wait()
    print("Finished the subprocess!")
    sys.exit(0)
    
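def reset_commit(repo, commit_id):
    # Hard-reset HEAD, the index, and the working tree to the requested commit.
    # The workspace path is taken from the repo itself, so callers pass only two arguments.
    commit = repo.commit(commit_id)
    repo.head.reset(commit=commit, index=True, working_tree=True)
    print(f"Workspace {repo.working_dir} is checking out {commit_id} ...")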


def prerun(args):
    # Confirming if the current branch matches the specific job commit
    if bool(args.wandb_job_commit):
        repo = Repo(args.workspace)
        current_commit = repo.head.commit.hexsha
        # assert current_commit == args.wandb_job_commit, f"Current commit {current_commit} is not equal the job commit {args.wandb_job_commit}!"
        if current_commit != args.wandb_job_commit:
            print( f"Current commit {current_commit} is not equal the job commit {args.wandb_job_commit}!") # fmt: skip
            try:
                reset_commit(repo, args.wandb_job_commit)
            except Exception as e:
                print(e)
                print("Trying to fetch the latest 20 commits ...")
                origin = repo.remotes.origin
                repo.git.fetch(origin, "--depth=20")
                reset_commit(repo, args.wandb_job_commit)
            restart_program()
        else:
            print( f"Current commit {current_commit} == job commit {args.wandb_job_commit}!") # fmt: skip
    pass


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--wandb-job-commit", type=str, default=None, help="validating the commit hexsha")  # fmt: skip
    args = parser.parse_args()
    # The workspace is the directory that contains this script.
    args.workspace = os.path.dirname(os.path.abspath(__file__))

    prerun(args)
    # main code

The code does the following:

  1. Checks the commit of the current workspace.
  2. If the requested commit is not available locally, fetches the latest 20 commits from the remote repository.
  3. Resets the workspace to the requested commit.
  4. Restarts the current script.
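
A possible variant of steps 2 and 3, sketched here with plain `git` through `subprocess` rather than GitPython, is to fetch exactly the requested commit instead of guessing a depth of 20. This assumes the remote allows fetching a commit by its hash, which depends on the server configuration and has not been verified for the repository in this issue. The function names are hypothetical.

```python
import subprocess


def checkout_commands(commit, remote="origin"):
    """Build the git commands that fetch one commit and check it out detached."""
    return [
        # Fetch only the requested commit; works on a shallow clone if the server permits it.
        ["git", "fetch", "--depth=1", remote, commit],
        # Check out the commit without moving any branch.
        ["git", "checkout", "--detach", commit],
    ]


def checkout_commit(workdir, commit, remote="origin"):
    """Run the commands above inside workdir, raising if any of them fails."""
    for cmd in checkout_commands(commit, remote):
        subprocess.run(cmd, cwd=workdir, check=True)
```

Fetching by hash avoids the case where the requested commit is more than 20 commits behind HEAD, at the cost of depending on what the remote permits.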

I look forward to a more elegant solution!