I'm looking to train my own custom model, but I'm struggling through it lol. Following the Ultralytics "Train Custom Data" tutorial, I have about 1000 images labeled and exported with Roboflow. After some trial and error I successfully ran the train.py script, which seems to work and produces a best.pt.
Googling around, I came up with a starting batch size of 32 and 50 epochs. I also chose yolov5m.pt for one reason or another. Do those all seem like good starting points?
I used the following command:
python3 train.py --img 640 --batch 32 --epochs 50 --data data.yaml --weights yolov5m.pt --cache ram
The data.yaml file points to my training images in the following directories:
data.yaml:
train: ../train/images
val: ../valid/images
test: ../test/images
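For completeness, the Roboflow export also writes the class count and names into data.yaml, something like the lines below (the names here are just placeholders, not my actual labels):

nc: 3
names: ['class_0', 'class_1', 'class_2']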
Currently I use almost all of my images for training and only have 3 images each in validation and testing (fwiw, 3 images clearly aren't representative of all of the classes/labels). With only 1000 images I'm thinking I want as many of them used for training as possible, but train.py errors out if there are no val and test images.
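If I do need a proper split, here's a rough sketch of how I could move a random 10% of the training images (and their matching label files) into the valid folder; the paths, the .jpg extension, and the 10% ratio are just assumptions on my part:

# Rough sketch: move a random 10% of train images (and their labels) into valid.
import random
import shutil
from pathlib import Path

train_img = Path("../train/images")
train_lbl = Path("../train/labels")
valid_img = Path("../valid/images")
valid_lbl = Path("../valid/labels")
valid_img.mkdir(parents=True, exist_ok=True)
valid_lbl.mkdir(parents=True, exist_ok=True)

images = sorted(train_img.glob("*.jpg"))
random.seed(0)
random.shuffle(images)
for img in images[: int(0.1 * len(images))]:
    lbl = train_lbl / (img.stem + ".txt")
    shutil.move(str(img), valid_img / img.name)
    if lbl.exists():  # background images may not have a label file
        shutil.move(str(lbl), valid_lbl / lbl.name)

The same idea would presumably work for carving out a small test split too.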
Lastly, to use the CPAI .NET module, I convert the resulting best.pt to ONNX using the following command:
python3 export.py --weights best.pt --include torchscript onnx
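As a sanity check on the export, I can load best.onnx with onnxruntime (assuming it's installed) and push a dummy frame through it; the input name and the 1x3x640x640 shape below are what the YOLOv5 exporter usually produces, so they may differ:

import numpy as np
import onnxruntime as ort

# Load the exported model on CPU and inspect its input
sess = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)  # typically 'images' and [1, 3, 640, 640]

# Run a blank frame through it just to confirm it executes
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = sess.run(None, {inp.name: dummy})
print([o.shape for o in outputs])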
Is that correct?
Thanks
Edit: There also seems to be an issue with high CPU usage while the model is in use (even though it's running the CPAI 6.2 CUDA version). There are random spikes to 15-25%, rather than the usual 2-3%. Would that be caused by some of these settings? Maybe they're a tougher read than MikeLud's models?