There is the DeepStack forum where folks into that sort of thing are doing those types of mods.
Learn, share and discuss on the DeepStack Platform
forum.deepstack.cc
Training the models is "just" taking your photos and delineating what you want it to recognize. I haven't done that yet, but someone here did it for plates, and he said it took a couple of hours to delineate around 100 photos.
docs.deepstack.cc
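To give a feel for what "delineating" a photo produces, here is a rough sketch. It assumes the labels end up in the YOLO-style text format (one line per box: class id, then box center and size as fractions of the image). The function name and the example numbers are made up for illustration, so check the docs for what your labeling tool actually writes.

```python
def to_yolo_line(class_id, box, image_size):
    """Turn one pixel bounding box into a YOLO-style label line.

    box        = (x_min, y_min, x_max, y_max) in pixels
    image_size = (width, height) in pixels
    Hypothetical helper for illustration, not part of DeepStack.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size

    # YOLO-style labels store the box center and size, normalized to 0..1.
    x_center = (x_min + x_max) / 2 / width
    y_center = (y_min + y_max) / 2 / height
    box_w = (x_max - x_min) / width
    box_h = (y_max - y_min) / height

    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"


# Example: a plate (class 0) boxed at (100, 200)-(300, 400) in a 1000x800 photo.
print(to_yolo_line(0, (100, 200, 300, 400), (1000, 800)))
```

In practice a labeling tool does this for you as you drag boxes, which is where the couple of hours for 100 photos goes.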
The main processing is in the live implementation of it. Obviously, the more cameras and the higher the MP you are using for DeepStack, the more powerful the machine needed LOL.
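As a back-of-the-envelope way to see how camera count adds up, something like the sketch below can help. Every number in it is an assumption you would have to measure on your own setup (how many images each camera sends per minute, and how long your machine takes per image, which is where higher MP would show up). The function name is hypothetical.

```python
def deepstack_load(cameras, images_per_minute_per_camera, seconds_per_image):
    """Rough fraction of the machine's time spent analyzing images.

    A value near or above 1.0 suggests images arrive faster than they can
    be processed. All inputs are assumed/measured values, not DeepStack settings.
    """
    images_per_second = cameras * images_per_minute_per_camera / 60
    return images_per_second * seconds_per_image


# Example with made-up numbers: 4 cameras, 30 images a minute each,
# 0.25 seconds of processing per image.
print(deepstack_load(4, 30, 0.25))
```

Doubling the cameras or the per-image time doubles the result, so it is easy to see when a bigger machine (or fewer analyzed images) is needed.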