Help training a model, batch and epochs #s?

EvanVanVan

Pulling my weight
Jul 29, 2022
152
105
NJ
I'm looking to train my own custom model, but I'm struggling through it lol. Following the Ultralytics tutorial (Train Custom Data), I have about 1000 images labeled and exported with Roboflow. After some trial and error I successfully ran the train.py script, which seems to work and produces a "best.pt."

Googling it, I came up with a starting batch size of 32 and 50 epochs. I also chose yolov5m.pt for one reason or another. Do those all seem like good starting points?

I used the following command:

python3 train.py --img 640 --batch 32 --epochs 50 --data data.yaml --weights yolov5m.pt --cache ram --cache disk
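
For a rough sense of what those two numbers mean together, here is a minimal sketch. The function name is made up for illustration, and the 994-image count is an assumption (roughly 1000 images minus the 3 validation and 3 test images mentioned below):

```python
import math

def batches_per_run(num_train_images, batch_size, epochs):
    """Return (batches per epoch, total batches over the whole run)."""
    per_epoch = math.ceil(num_train_images / batch_size)  # the last batch may be partial
    return per_epoch, per_epoch * epochs

# Assumed numbers: ~994 training images, --batch 32, --epochs 50
per_epoch, total = batches_per_run(994, 32, 50)
print(per_epoch, total)  # 32 1600
```

So with these settings each epoch is only about 32 batches, which is one reason small datasets are often trained for more epochs.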

The data.yaml file points to my training images in the following directories:

data.yaml:
train: ../train/images
val: ../valid/images
test: ../test/images

Currently, I use almost all of my images for training and only have 3 images each in validation and testing (fwiw, 3 images clearly aren't representative of all of the classes/labels). With only 1000 images I'm thinking I want as many used for training as possible, but train.py results in an error if there are no val and test images.

Lastly, if trying to use CPAI .NET module, I convert the resulting best.pt to onnx using the following command:

python3 export.py --weights best.pt --include torchscript onnx

Is that correct?

Thanks

Edit: There also seems to be an issue with high CPU usage with the model in use (even though it's using CPAI 6.2 CUDA version). There are random spikes to 15%-25%, rather than 2-3%. Would that be caused by some of these settings? Maybe they're a tougher read than MikeLud's models?
 
Wow, I just put the model in service this morning and I'm already seeing some pretty impressive results! Of course deer are definitely overrepresented in the model lol. (Again, "best.pt" is just the default name of the model.)

deer 1.jpg
The only model to identify a deer.

deer 2.jpg
The only model to correctly identify a deer (and at a high confidence).

deer 3.jpg
Identifying half a deer lol.

I don't mean this as a dig at @MikeLud1 's models but it does demonstrate the benefits of training a model with your own images that the AI will more easily be able to match to your recording.
 
Use the command below to convert to ONNX:
Code:
python3 export.py --weights best.pt --include onnx
 
Increase your validation and testing images to about 100 each. This should give you better training results.
 
The testing images give feedback to the training so it can improve. On second thought, you can get by with only about 25 images for validation, because validation does not help improve the training.
 
Very cool, thank you for the advice!

Any idea why the CPU usage would be so much higher with my own model (with, I'm assuming, fewer images) than with yours?
 
When you trained the model you used yolov5m.pt; this could be why. I trained all my models using yolov5s.pt. Post a screenshot of the results like the ones below.

1686262456287.png

1686262611501.png
 
Yeah, it could be yolov5m.pt, my model is 3x the size of yours (the same size difference as yolov5s.pt vs yolov5m.pt).

Lol please go easy on me, I don't know what any of these graphs mean.

Here is the original with only 3 val and test pictures:
results.png

I reran the training with yolov5s.pt and 25 val and test images each:
results-25.png

Lastly, while I was doing that I found something on the roboflow site recommending a 70/20/10 split so here's that one too lol:
results70-20-10.png

Annoyingly, there doesn't seem to be any way to shuffle the dataset on Roboflow? So for the 25 val/test version, they were almost exclusively bear pictures instead of a random representation. The 70-20-10 split is a little better.
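
One way around that would be to shuffle the file names locally before splitting. This is only a sketch: the function name, the seed and the made-up file names are assumptions, and each image's label file would need to be moved along with it so the images/labels pairs stay matched.

```python
import random

def split_dataset(image_names, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle file names and split them into train/valid/test lists."""
    names = sorted(image_names)          # fixed starting order so the seed is reproducible
    random.Random(seed).shuffle(names)   # shuffle so one class doesn't fill a whole split
    n_train = round(len(names) * ratios[0])
    n_valid = round(len(names) * ratios[1])
    return {
        "train": names[:n_train],
        "valid": names[n_train:n_train + n_valid],
        "test": names[n_train + n_valid:],  # whatever is left over
    }

# Hypothetical file names standing in for an exported dataset
splits = split_dataset([f"img_{i:04d}.jpg" for i in range(1000)])
print({k: len(v) for k, v in splits.items()})  # {'train': 700, 'valid': 200, 'test': 100}
```

Passing ratios=(0.9, 0.1, 0.0) would give a 90-10 split instead.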
 
I normally train with a 90-10 split; you do not need the validation images. You can also try training with 100 epochs. All my models are trained with 75 to 200 epochs using as many as 100,000 images, and training can take over 24 hours on an RTX 3090 GPU.

Also, below is another graph that shows how well your model trained; the closer to 1.0, the better the model is.

1686269826190.png
 
When training, you want to see the graphs highlighted in blue going down (lower numbers) and the graphs highlighted in green going up. If the model is not improving, then best.pt does not get updated.
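
As far as I know, YOLOv5 decides whether an epoch is the new "best" with a fitness score that weights mAP@0.5 at 10% and mAP@0.5:0.95 at 90%. A minimal sketch of that idea, with a made-up function name and made-up metric values purely for illustration:

```python
def best_epoch(history):
    """history: list of (mAP@0.5, mAP@0.5:0.95) per epoch. Returns (epoch index, fitness)."""
    best_idx, best_fit = -1, float("-inf")
    for epoch, (map50, map50_95) in enumerate(history):
        fit = 0.1 * map50 + 0.9 * map50_95   # assumed default fitness weighting
        if fit > best_fit:                    # only a strict improvement replaces best.pt
            best_idx, best_fit = epoch, fit
    return best_idx, best_fit

# Made-up numbers, not real training results
history = [(0.40, 0.20), (0.55, 0.30), (0.60, 0.28), (0.58, 0.29)]
print(best_epoch(history)[0])  # 1
```

Here epoch 2 has the highest mAP@0.5, but epoch 1 still stays the "best" because its mAP@0.5:0.95 is higher and that metric carries most of the weight.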

1686270459971.png
 
Sounds good, thank you.

100K images is insane!
Most of the images I use for training are from the COCO dataset, which contains over 330k images, and I use FiftyOne to extract the images I need.
 

When I'm (re)training a model after adding more images to the dataset, should I use --weights yolov5s.pt again, like I did originally? Or would using --weights best.pt continue to improve/build off the existing model? (Obviously, I don't really understand what the pretrained yolov5s.pt model is lol.)
 
Use either D1 or VGA for your Sub Stream. When training, it does not stretch the 480px side; it fills the rest of the image with the color gray.

View attachment 171915

View attachment 171916
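
To put numbers on that gray fill, here is a simplified sketch of fitting a frame into a 640x640 square without distortion. The function name is an assumption, and the real loader may have extra options this ignores:

```python
def letterbox_geometry(width, height, target=640):
    """Scale to fit inside target x target without distortion, then pad the short side."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w, pad_h = target - new_w, target - new_h   # total gray padding on each axis
    return new_w, new_h, pad_w, pad_h

print(letterbox_geometry(704, 480))  # D1:  (640, 436, 0, 204)
print(letterbox_geometry(640, 480))  # VGA: (640, 480, 0, 160)
```

So a D1 frame would end up as 640x436 of real picture with about 204 rows of gray in total, and a VGA frame as 640x480 with 160 rows of gray.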

Didn't want to clog up the BI DAT tool thread, but I've racked my brain with this very problem so I'll ask/continue it here lol.

I've spent a lot of time wondering what aspect ratio/size of images is best to train my model on... Here is an original image and 3 examples of preprocessing, trying to get the most accurate image to train on.

Starting with this ORIGINAL image:
original.jpg

1. Pre-crop the "long" edge to a square without changing the height. (Keeps a 1:1 image ratio. After shrinking to 640x640, keeps accurate/proportionate sized objects that the camera/AI will check.)
square2.jpg

2. Fill in pixels on the short edge. (Keeps the image square, but once the image gets shrunk to 640x640, the resulting objects are smaller than in reality, and smaller than the camera/AI will see/check in practice.)
filled-in2.jpg

3. No manual pre-processing at all. Roboflow will/could squish the image square. (Super unrealistic skinny objects to train on.)
squished2.jpg

Manually cropping images square is a lot more work than letting Roboflow "fill in" or "squish" them automatically during pre-processing, but so far I've figured it'll give my model the best data to train on.

In regards to using 704x480 (D1) substream images, I still think there would be some benefit to cropping them square to 480x480, then enlarging the image to 640x640 in both dimensions. That way, proportionally, the objects would still take up a realistic amount of pixels/screen space.
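
A quick sketch of how much each of the three options scales an object on its way to 640x640, using the D1 size as the example. The function and option names are made up for illustration:

```python
def object_scale(width, height, option, target=640):
    """Return (x_scale, y_scale) applied to objects when a frame becomes target x target."""
    if option == "crop":      # option 1: crop the long edge to a square, then resize
        s = target / min(width, height)
        return s, s
    if option == "pad":       # option 2: fill in the short edge to a square, then resize
        s = target / max(width, height)
        return s, s
    if option == "squish":    # option 3: stretch both edges independently
        return target / width, target / height
    raise ValueError(f"unknown option: {option}")

for option in ("crop", "pad", "squish"):
    sx, sy = object_scale(704, 480, option)
    print(option, round(sx, 2), round(sy, 2))
# crop 1.33 1.33
# pad 0.91 0.91
# squish 0.91 1.33
```

Options 1 and 2 both keep the x and y scale equal (no distortion) and only differ in how big the objects end up, while option 3 is the only one where the two scales differ.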

Thoughts?
 

When training, it would be best to use images preprocessed with both option 1 and option 2 (undistorted), because when the model is in use the image will be formatted like option 2 (undistorted).
If you resize the images to 480x480, the model will decrease in accuracy.