Model Name: Reinforcement Learning Car Agent

Algorithm: Proximal Policy Optimization (PPO)

Environment: AWS DeepRacer Track

Reward Function:

Objective: Encourage the car to follow the centerline.

Structure:
- Markers:
  - Marker 1: closest to center (high reward)
  - Marker 2: moderate distance (medium reward)
  - Marker 3: far distance (low reward)
- Speed Bonus: additional reward for speeds above 2.0 m/s.
- Penalty: significant penalty for going off track or too far from the centerline.
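A minimal Python sketch of this structure, using the standard DeepRacer params dictionary (track_width, distance_from_center, speed, all_wheels_on_track). The marker fractions and reward magnitudes are illustrative assumptions, not the trained model's exact settings:

```python
def reward_function(params):
    """Reward centerline following, with a bonus for speed above 2.0 m/s."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    speed = params['speed']
    all_wheels_on_track = params['all_wheels_on_track']

    # Three markers at increasing fractions of the track width (assumed values).
    marker_1 = 0.10 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.50 * track_width

    # Significant penalty for leaving the track.
    if not all_wheels_on_track:
        return 1e-3

    if distance_from_center <= marker_1:
        reward = 1.0      # closest to center: high reward
    elif distance_from_center <= marker_2:
        reward = 0.5      # moderate distance: medium reward
    elif distance_from_center <= marker_3:
        reward = 0.1      # far distance: low reward
    else:
        reward = 1e-3     # too far from the centerline: penalty

    # Speed bonus above 2.0 m/s (bonus size is an assumption to tune).
    if speed > 2.0:
        reward += 0.5

    return float(reward)
```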

Training Duration:

Initial training for 1-2 hours; monitor performance and adjust as needed based on lap times and reward metrics.

Observations:

Monitor lap times, off-track incidents, and average rewards during training. Adjust reward values and thresholds based on model performance; experiment with speed thresholds to optimize lap times.
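As a rough aid for tracking these metrics, the sketch below summarizes per-episode statistics from a hypothetical CSV export (training_metrics.csv with episode, reward, lap_time_s, and off_track columns). The file name and columns are assumptions; DeepRacer's actual training logs would need to be parsed into this shape first.

```python
import csv
from statistics import mean

# Hypothetical per-episode export; the file name and column names are
# assumptions, not the native DeepRacer log format.
METRICS_FILE = "training_metrics.csv"  # columns: episode, reward, lap_time_s, off_track

def summarize(path=METRICS_FILE):
    rewards, lap_times, off_track_count = [], [], 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            rewards.append(float(row["reward"]))
            if row["off_track"] == "1":
                off_track_count += 1
            else:
                lap_times.append(float(row["lap_time_s"]))

    if not rewards:
        print("No episodes found.")
        return
    print(f"Episodes:           {len(rewards)}")
    print(f"Average reward:     {mean(rewards):.2f}")
    print(f"Off-track episodes: {off_track_count}")
    if lap_times:
        print(f"Best lap time (s):  {min(lap_times):.2f}")
        print(f"Avg lap time (s):   {mean(lap_times):.2f}")

if __name__ == "__main__":
    summarize()
```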

Next Steps:

Consider cloning the model after initial training for further improvements. Submit the model to the leaderboard for performance evaluation against peers.