By Christopher Pepe, Dragon of the West
My favorite class in college was Neural networks for non-linear control systems, which was way out of my league, but I wanted it, so I powered through. I majored in mechanical engineering and studied computer science and electrical engineering because of a passion for robotics. This class blew my mind. Life being what it is, I went off to build a career that took me away from machine learning until recently. Things have come a long way in the intervening years, and I wanted to recreate the approach we used in this class.
The animations that we created were what most captured my imagination because they would show the controller learning and improving.
My old code is unreadable. Between my decade and a half of improved coding style and the early NN libraries that we used, I didn't glean much from what code I still have. I spent a while trying to untangle that mess before just doing it from memory and filling in the gaps with frog DNA.
I created a simple universe in which a rocket can only move along the Y-axis. It is given the task of moving from its current position to a goal state along a "minimum jerk based trajectory." The rocket has a single engine that can fire downward. Since it has a mass of 1000kg, it needs to produce a thrust of 9800N to hover (I think, it has been a while). Anything less than that and the rocket will lose altitude; anything more and it will gain altitude.
This required some serious way back machining to build the physics engine, which tracks the current position, velocity, and acceleration of the rocket for a given timestep. Acceleration is calculated from the thrust, and velocity and position are integrated from there.
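As a rough illustration of that update step, here is a minimal sketch. It assumes semi-implicit Euler integration and g = 9.8 m/s² (which is what makes 9800N the hover thrust for 1000kg). The names `RocketState` and `step` are illustrative and not taken from the original code.

```python
from dataclasses import dataclass

G = 9.8         # gravitational acceleration in m/s^2 (assumed value)
MASS = 1000.0   # rocket mass in kg, as stated in the post


@dataclass
class RocketState:
    position: float            # height in m
    velocity: float            # m/s, positive is up
    acceleration: float = 0.0  # m/s^2, positive is up


def step(state, thrust, dt):
    """Advance the rocket by one time step (semi-implicit Euler)."""
    # Net acceleration: engine thrust pushing up, gravity pulling down.
    acceleration = thrust / MASS - G
    # Integrate acceleration into velocity, then velocity into position.
    velocity = state.velocity + acceleration * dt
    position = state.position + velocity * dt
    return RocketState(position, velocity, acceleration)


state = RocketState(position=15.0, velocity=0.0)
hovering = step(state, thrust=9800.0, dt=0.1)  # thrust balances gravity
falling = step(state, thrust=0.0, dt=0.1)      # engine off, free fall
```

With 9800N the state does not change, and with the engine off the rocket starts to drop, which matches the hover figure above.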
While this approach requires iterating over time, this update method is an easy way to measure the effect of changing the thrust over the timeSteps. Any thrust profile can be plugged in to see what the rocket would do. For example, one could leave the thruster off for 2s and then apply 10,000N of thrust. In that scenario, the rocket would free fall for 2s and then keep falling for a while, since 10,000N only barely exceeds the 9800N needed to hover, before it finally overcomes its downward velocity and gains altitude.
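The 2s example can be worked through with a small simulation. This is a sketch under assumptions: g = 9.8 m/s², a 0.01s time step, no ground to hit, and hypothetical names (`thrust_profile`, `simulate`). After 2s of free fall the rocket is moving down at 19.6 m/s, and the net upward acceleration at 10,000N is only 0.2 m/s², so it takes roughly another 98s to stop falling.

```python
G = 9.8         # gravitational acceleration in m/s^2 (assumed value)
MASS = 1000.0   # rocket mass in kg, as stated in the post


def thrust_profile(t):
    """Engine off for the first 2 s, then a constant 10,000 N."""
    return 0.0 if t < 2.0 else 10_000.0


def simulate(profile, duration, dt=0.01, position=0.0, velocity=0.0):
    """Integrate the rocket under a thrust profile; returns (t, position, velocity) rows."""
    n_steps = int(round(duration / dt))
    history = []
    for i in range(n_steps):
        t = i * dt
        acceleration = profile(t) / MASS - G
        velocity += acceleration * dt
        position += velocity * dt
        history.append((t + dt, position, velocity))
    return history


history = simulate(thrust_profile, duration=120.0)

# The lowest point marks where the rocket stops falling and starts to climb.
t_lowest, lowest, _ = min(history, key=lambda row: row[1])
print(f"lowest point {lowest:.1f} m below the start, reached at t = {t_lowest:.1f} s")
```

Swapping in a different `thrust_profile` is all it takes to try another scenario.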
The code below was used to allow the trained neural network controller to fly the rocket and try to achieve its goal. Given the current state (time to goal, position, distance to goal), the network outputs a thrust to apply for the next time step. Because everything in the neural network is normalized, we have to scale the output back up to newtons via hugeify(). My rocket is called dragon too.
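The sketch below is not the original listing, only a minimal stand-in for the control loop described above. Several things are assumptions: `hugeify()` is taken to be a simple linear scale by a made-up `MAX_THRUST`, the `fly` function and its signature are hypothetical, and `stub_network` is a placeholder that always asks for hover thrust where the trained model would go.

```python
G = 9.8                 # gravitational acceleration in m/s^2 (assumed value)
MASS = 1000.0           # rocket mass in kg, as stated in the post
MAX_THRUST = 20_000.0   # assumed normalization scale, not from the post


def hugeify(normalized_thrust):
    """Assumed inverse of the training normalization: [0, 1] -> newtons."""
    return normalized_thrust * MAX_THRUST


def stub_network(time_to_goal, position, distance_to_goal):
    """Placeholder for the trained model: always requests hover thrust."""
    return MASS * G / MAX_THRUST


def fly(network, start, goal, duration, dt=0.1):
    """Let the controller pick a thrust each step and integrate the result."""
    position, velocity = start, 0.0
    path = [position]
    n_steps = int(round(duration / dt))
    for i in range(n_steps):
        time_to_goal = duration - i * dt
        # State fed to the network: (time to goal, position, distance to goal).
        normalized = network(time_to_goal, position, goal - position)
        thrust = hugeify(normalized)
        acceleration = thrust / MASS - G
        velocity += acceleration * dt
        position += velocity * dt
        path.append(position)
    return path


path = fly(stub_network, start=15.0, goal=50.0, duration=45.0)
```

With the hover-only stub the rocket just sits at 15m; a trained model would replace `stub_network` with a call into the network.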
Naturally, I tuned all of the hyperparameters for the best results. I learned a bit about why this network performs well with those parameters. I again found that it took surprisingly few neurons to achieve the goal, and more neurons generally didn't improve things. Increasing the batch size improved the quality of the output (I assume because the network could see more of the time series in each training epoch). The dramatic change in learning takes place around 19 epochs, and beyond 100 epochs there is no real improvement in learning (presumably overfitting at this point).
The path is simple. Starting at a height of 15m, moving at 0 m/s, fly up to 50m over the course of 45s. The minimum jerk trajectory to achieve that goal looks like this
and after 100 epochs of training, the neural network produced a trajectory like this
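For reference, a minimum jerk path between two resting states has a well-known closed form: position follows the quintic blend 10τ³ − 15τ⁴ + 6τ⁵, where τ is the fraction of the total time elapsed. The sketch below assumes zero velocity and acceleration at both ends (the post only states the 0 m/s start), and `minimum_jerk` is an illustrative name rather than the original function.

```python
def minimum_jerk(start, goal, duration, t):
    """Position at time t on a rest-to-rest minimum jerk path."""
    # Clamp normalized time to [0, 1] so the path holds at its endpoints.
    tau = min(max(t / duration, 0.0), 1.0)
    blend = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return start + (goal - start) * blend


# The path from the post: 15 m up to 50 m over 45 s, sampled every 5 s.
for t in range(0, 46, 5):
    print(f"t = {t:2d} s  height = {minimum_jerk(15.0, 50.0, 45.0, t):6.2f} m")
```

The blend is symmetric, so the rocket passes the halfway height at the halfway time and eases in and out at both ends.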
What I trained on is the thrust for a given state (time to goal, position, distance to goal), and the thrust curve doesn't look as nice (by picking and choosing it can look better). On the left is my calculated minimum jerk trajectory; on the right is the output of the neural controller.
Here is an overly simplified animation of the resultant trajectories over a number of training epochs. Each path briefly flashes to the screen before the animation plays. Each marker remains at its final position so you can see how each run compares.
If you are curious, here is the output from the network after 100 epochs. The code is here. This was our approach in the aforementioned class: generate tons of positional data and then animate it, rather than have the simulation and controller working together in real time. I plan to move to a framework like OpenAI's Gym for future work.
For the animation, I wanted to use an actual rocket-looking figure and scale the fire coming out of the nozzle in proportion to the magnitude of thrust, but I will tackle better animation next time.
I really wanted to approach this problem with a recurrent neural network, since it is a time series problem as presented. I was unable to figure that out in the time I had for this effort but will revisit it. I think an approach like a text generator would work, but I need to noodle through some of the challenges unique to this problem.
Finally, I was happy to have a network that output a thrust rather than something like an acceleration or displacement but how I got there was weak. Generating training data requires solving for the physics to determine the "correct" thrust and this is flawed in a few ways:
- if we can perfectly model the system we don't need a NN controller to try to learn the dynamics
- this system cannot handle the dynamics changing (e.g. mass changing from burning fuel) or a hundred geese landing on the rocket's nose
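To make the "solving for the physics" step concrete, here is a sketch of how such training pairs could be generated. It assumes the rest-to-rest minimum jerk path, constant mass, and g = 9.8 m/s², and it leaves out whatever normalization the original used. The required thrust is simply mass times (path acceleration plus gravity). All names here are illustrative.

```python
G = 9.8         # gravitational acceleration in m/s^2 (assumed value)
MASS = 1000.0   # rocket mass in kg, assumed constant
START, GOAL, DURATION = 15.0, 50.0, 45.0  # the path described in the post


def reference_state(t):
    """Position and acceleration on the assumed minimum jerk path."""
    tau = t / DURATION
    span = GOAL - START
    position = START + span * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    # Second time derivative of the quintic blend above.
    acceleration = span / DURATION**2 * (60 * tau - 180 * tau**2 + 120 * tau**3)
    return position, acceleration


def training_samples(dt=0.1):
    """Build (state, thrust) pairs by inverting the rocket dynamics."""
    samples = []
    n_steps = int(round(DURATION / dt))
    for i in range(n_steps + 1):
        t = i * dt
        position, acceleration = reference_state(t)
        # Thrust must cancel gravity and supply the path's acceleration.
        thrust = MASS * (acceleration + G)
        features = (DURATION - t, position, GOAL - position)
        samples.append((features, thrust))
    return samples


samples = training_samples()
peak_thrust = max(thrust for _, thrust in samples)
print(f"{len(samples)} samples, peak thrust {peak_thrust:.1f} N")
```

Under these assumptions the required thrust stays within roughly 100N of the 9800N hover value for the whole flight. The sketch also shows both flaws listed above: the targets come straight from the same model the controller is supposed to learn, and `MASS` is baked in as a constant.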
Another area of interest is replacing the trajectory generator with yet another neural network. My current thinking is that I can use a Generative Adversarial Network (GAN) to train a generator on the trajectory creation skill and feed in actual minimum jerk trajectories to the discriminator. There are still problems with this approach but it will be worth the exercise if only to build a successful GAN.
Anyway, another fun toy project completed.