Design #1

Open

opened 2026-03-25 16:01:07 +00:00 by Quaternions · 0 comments

Quaternions commented

2026-03-25 16:01:07 +00:00

Owner

Inputs:

- Render 160x90 depth texture
  - 4 (?) frames of depth history
- Airtime & height difference to next surf/platform (sourced from bot)
  - Plan next 2 surfs/platforms ahead
- Velocity vector relative to camera orientation ❌ This is redundant, it is implied by the position history ✅ nvm it's rather important and not quite the same as the previous position
- Position/Angles of 4 (?) previous game ticks as a sort of contextual planning memory ("ah right, that's the curve I was going for")
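
For the camera-relative velocity input, a minimal sketch of the change of basis could look like the following. The axis convention (Y-up, right-handed, camera looking down -Z at zero yaw and pitch, angles in radians) and the function name `velocity_in_camera_space` are assumptions for illustration, not something the game or bot is known to use.

```rust
type Vec3 = [f32; 3];

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

/// Express a world-space velocity in the camera's (right, up, forward) basis.
/// ASSUMED convention: Y-up, right-handed, radians, camera looks down -Z
/// when yaw = pitch = 0, positive pitch looks up.
fn velocity_in_camera_space(velocity: Vec3, yaw: f32, pitch: f32) -> Vec3 {
    let (sy, cy) = yaw.sin_cos();
    let (sp, cp) = pitch.sin_cos();
    let forward = [-sy * cp, sp, -cy * cp];
    let right = [cy, 0.0, -sy];
    let up = cross(right, forward);
    [dot(velocity, right), dot(velocity, up), dot(velocity, forward)]
}
```

With this convention, a velocity of `[0.0, 0.0, -5.0]` at zero yaw and pitch maps to a forward component of 5 and no sideways or vertical component.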

Outputs:

- WASD Space (GameControls), mouse delta
- Always one strafe tick (0.01s)
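
One possible way to turn a raw network output into these controls is sketched below. The `Action` struct, the 7-value layout, and the "logit above zero means pressed" rule are all hypothetical; the real `GameControls` type may look quite different. Each decoded action would then be applied for one 0.01 s tick.

```rust
/// Hypothetical per-tick action: five buttons plus a mouse delta.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Action {
    forward: bool, // W
    left: bool,    // A
    back: bool,    // S
    right: bool,   // D
    jump: bool,    // Space
    mouse_dx: f32,
    mouse_dy: f32,
}

/// Decode an ASSUMED output layout: [W, A, S, D, Space, mouse_dx, mouse_dy].
/// Button entries are treated as logits: > 0 means the key is held this tick.
fn decode_action(raw: [f32; 7]) -> Action {
    let pressed = |logit: f32| logit > 0.0;
    Action {
        forward: pressed(raw[0]),
        left: pressed(raw[1]),
        back: pressed(raw[2]),
        right: pressed(raw[3]),
        jump: pressed(raw[4]),
        mouse_dx: raw[5],
        mouse_dy: raw[6],
    }
}
```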

Architecture:

- rust burn
- reinforcement learning

Training procedure:

- Use the WR bot for each map as a reference path.
- Reward the agent with the WR curve progress delta. Find the closest point on the WR curve, compare its progress to the previous best, and reward the difference. No negative reward for backwards progress. The agent gets a lump reward of `1/final_time` when it reaches the finish.
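
The reward rule above can be sketched as follows. This assumes the WR path is available as a polyline of positions and that "progress" means arc length along it; both are illustrative choices, and the names `progress_along`, `ProgressReward`, and `finish_bonus` are hypothetical.

```rust
type Vec3 = [f32; 3];

fn sub(a: Vec3, b: Vec3) -> Vec3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Arc length along `path` at the point closest to `pos`.
fn progress_along(path: &[Vec3], pos: Vec3) -> f32 {
    let mut best_dist_sq = f32::INFINITY;
    let mut best_progress = 0.0;
    let mut walked = 0.0; // arc length at the start of the current segment
    for seg in path.windows(2) {
        let (a, b) = (seg[0], seg[1]);
        let ab = sub(b, a);
        let len_sq = dot(ab, ab);
        // Project `pos` onto the segment and clamp to its endpoints.
        let t = if len_sq > 0.0 {
            (dot(sub(pos, a), ab) / len_sq).clamp(0.0, 1.0)
        } else {
            0.0
        };
        let closest = [a[0] + ab[0] * t, a[1] + ab[1] * t, a[2] + ab[2] * t];
        let d = sub(pos, closest);
        let len = len_sq.sqrt();
        if dot(d, d) < best_dist_sq {
            best_dist_sq = dot(d, d);
            best_progress = walked + t * len;
        }
        walked += len;
    }
    best_progress
}

/// Tracks the best progress so far and pays out only improvements.
struct ProgressReward {
    best: f32,
}

impl ProgressReward {
    fn new() -> Self {
        Self { best: 0.0 }
    }

    /// Reward is the gain over the previous best; never negative.
    fn step(&mut self, path: &[Vec3], pos: Vec3) -> f32 {
        let progress = progress_along(path, pos);
        let reward = (progress - self.best).max(0.0);
        self.best = self.best.max(progress);
        reward
    }
}

/// Lump reward granted once, when the agent reaches the finish.
fn finish_bonus(final_time: f32) -> f32 {
    1.0 / final_time
}
```

A plain nearest-point search like this can jump between distant parts of the path where the WR route passes close to itself, so a real implementation might restrict the search to a window around the previous best.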

Reference: StrafesNET/strafe-ai#1