Hello r/tensorflow. I'm a backend engineer and quite unfamiliar with TensorFlow and ML in general, so please forgive me if this question seems unreasonable.
My lab needs a way to orchestrate TensorFlow jobs. We have one server with a powerful GPU and several users who want to run their TensorFlow jobs on it. Instead of agreeing on a schedule offline and having each person log in to the server individually, is there an open-source project I can deploy on the server to act as an orchestrator?
For example, it would provide a simple web UI where a user uploads a job and all the files it needs. The user then submits the job to a queue, and it runs once it reaches the front of the line. The tool would also report the job's progress and result.
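To make the queue behavior concrete, here is a rough sketch of what I have in mind, using only the Python standard library. It leaves out the web UI and file upload entirely, and every name in it (`GpuJobQueue`, `submit`, `status`, `wait_all`) is made up for illustration, not taken from any existing project. It also only reports coarse status (queued, running, finished, failed) and captured output, not live progress.

```python
import queue
import subprocess
import sys
import threading
import uuid


class GpuJobQueue:
    """Hypothetical FIFO queue: runs submitted commands one at a time."""

    def __init__(self):
        self._pending = queue.Queue()   # FIFO: first submitted, first run
        self._jobs = {}                 # job_id -> status record
        self._lock = threading.Lock()
        # A single worker thread means only one job uses the GPU at a time.
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, command):
        """Queue a command (list of arguments) and return its job id."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "queued", "returncode": None, "output": ""}
        self._pending.put((job_id, command))
        return job_id

    def status(self, job_id):
        """Return a copy of the job's current status record."""
        with self._lock:
            return dict(self._jobs[job_id])

    def wait_all(self):
        """Block until every job submitted so far has been processed."""
        self._pending.join()

    def _update(self, job_id, **fields):
        with self._lock:
            self._jobs[job_id].update(fields)

    def _worker(self):
        while True:
            job_id, command = self._pending.get()
            self._update(job_id, status="running")
            try:
                result = subprocess.run(command, capture_output=True, text=True)
                self._update(
                    job_id,
                    status="finished" if result.returncode == 0 else "failed",
                    returncode=result.returncode,
                    output=result.stdout,
                )
            except OSError as exc:
                # For example, the executable does not exist.
                self._update(job_id, status="failed", output=str(exc))
            finally:
                self._pending.task_done()
```

A user would call something like `q.submit([sys.executable, "train.py"])` (where `train.py` is a placeholder script name) and then poll `q.status(job_id)`. A real tool would also need persistence, per-user isolation, and the upload step, which is why I'm hoping something like this already exists.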
I assume an open-source project that fits this need already exists, but I haven't found one yet. Any pointers would be appreciated.