Thank you!
It runs on GPUs to the extent the TF framework supports it, but it has a few limitations. It can't distribute training across multiple GPUs, for example.
TF's distribution strategies are very gradient-descent oriented. This is something I want to work on next. The PSO algorithm involves a high volume of communication between particles, which is a major handicap for a distributed architecture, but I think we can find a suitable approach to minimize it.
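To make the communication cost concrete, here is a minimal NumPy sketch of a standard PSO step (not the author's TF implementation; objective, swarm size, and coefficients are illustrative). The global-best reduction at the top of each iteration is the swarm-wide synchronization point that would become an all-reduce across devices:

```python
import numpy as np

def sphere(x):
    # Toy objective: minimum 0 at the origin.
    return np.sum(x ** 2, axis=-1)

rng = np.random.default_rng(0)
n_particles, dim = 32, 5
w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients

pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest = pos.copy()
pbest_val = sphere(pbest)

for _ in range(200):
    # Global-best reduction: every particle must see the swarm-wide
    # best each iteration. In a multi-GPU setting this is a per-step
    # all-reduce, which is the communication overhead discussed above.
    gbest = pbest[np.argmin(pbest_val)]

    r1, r2 = rng.random((2, n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel

    val = sphere(pos)
    improved = val < pbest_val
    pbest[improved] = pos[improved]
    pbest_val[improved] = val[improved]

print(pbest_val.min())
```

One possible mitigation is to relax the synchronization: exchange the global best only every k iterations, or only between neighboring particles (a local-best topology), trading some convergence speed for much less communication.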