Model Name: Reinforcement Learning Car Agent
Algorithm: Proximal Policy Optimization (PPO)
Environment: AWS DeepRacer Track
Reward Function:
Objective: Encourage the car to follow the centerline.
Structure:
- Markers:
  - Marker 1: Closest to the centerline (high reward)
  - Marker 2: Moderate distance from the centerline (medium reward)
  - Marker 3: Far from the centerline (low reward)
- Speed Bonus: Additional reward for speeds above 2.0 m/s.
- Penalty: Significant penalty for going off track or too far from the centerline.
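A minimal sketch of this reward structure, assuming the standard AWS DeepRacer params dictionary (keys such as track_width, distance_from_center, speed, and all_wheels_on_track); the marker fractions and bonus/penalty values below are illustrative starting points, not tuned results.

```python
def reward_function(params):
    """Centerline-following reward with a speed bonus, per the structure above."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    speed = params['speed']
    all_wheels_on_track = params['all_wheels_on_track']

    # Three markers at increasing distance from the centerline (illustrative fractions)
    marker_1 = 0.1 * track_width   # closest to center -> high reward
    marker_2 = 0.25 * track_width  # moderate distance -> medium reward
    marker_3 = 0.5 * track_width   # far from center   -> low reward

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # too far from the centerline

    # Speed bonus for speeds above 2.0 m/s
    if speed > 2.0:
        reward += 0.5

    # Significant penalty for going off track
    if not all_wheels_on_track:
        reward = 1e-3

    return float(reward)
```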
Training Duration:
Initial training run of 1-2 hours; monitor performance and adjust as needed based on lap times and reward metrics.
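For reference, a sketch of PPO hyperparameters one might start the initial run with; the names mirror the settings exposed in the DeepRacer console, and the values are assumed starting points rather than results from this model.

```python
# Illustrative PPO hyperparameters for the initial 1-2 hour run (assumed starting
# points; adjust based on lap times and reward metrics as noted above).
ppo_hyperparameters = {
    "batch_size": 64,                      # gradient descent batch size
    "learning_rate": 3e-4,
    "discount_factor": 0.999,
    "entropy": 0.01,                       # exploration bonus
    "loss_type": "huber",
    "num_epochs": 10,                      # passes over each batch of experience
    "num_episodes_between_training": 20,   # experience collected per policy update
}
```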
Observations:
Monitor lap times, off-track incidents, and average rewards during training. Adjust reward values and thresholds based on model performance, and experiment with speed thresholds to optimize lap times.
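One way to track these observations is to summarize per-episode metrics offline. The sketch below assumes a hypothetical CSV export with columns episode, lap_time_s, off_track, and total_reward; it does not reflect any specific DeepRacer log format.

```python
import csv

def summarize_training(log_path):
    """Compute average reward, average lap time, and off-track rate
    from a hypothetical per-episode CSV export."""
    lap_times, rewards, off_track_count, episodes = [], [], 0, 0
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            episodes += 1
            rewards.append(float(row["total_reward"]))
            if row["off_track"] == "1":
                off_track_count += 1
            else:
                lap_times.append(float(row["lap_time_s"]))
    return {
        "episodes": episodes,
        "avg_reward": sum(rewards) / len(rewards) if rewards else 0.0,
        "avg_lap_time_s": sum(lap_times) / len(lap_times) if lap_times else None,
        "off_track_rate": off_track_count / episodes if episodes else 0.0,
    }
```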
Next Steps:
Consider cloning the model after initial training for further improvements. Submit it to the leaderboard for performance evaluation against peers.