Any image analytics approach to this is going to require calibration of course, unless your camera has the ability to determine real world sizes (this area has seen quite a bit of development recently, but not in IP cams!).
Some day I'd like to try building software that can determine vehicle speed from a camera pointed at a road. Although I am not enough of a mathematician or computer vision expert to do a really good job of it, I think I could probably make a speedometer that is pretty close to reality, especially if the camera is positioned carefully.
For example, I have this camera pointed at the road going past my house, and I think it would be pretty good for measuring vehicle speed.
Some object detection code could be used to track a vehicle across multiple frames and determine the distance traveled in pixels. Then just multiply the number of pixels by the real world size of a pixel and you have a real world distance. Divide by the time between the two frames, and you have a speed. No markers required. No high frame rate required.
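To make the arithmetic concrete, here is a minimal sketch of that calculation in Python. It assumes the tracking is already done and you just have a vehicle's horizontal pixel position in two frames. The function name, the 0.05 feet-per-pixel scale, and the sample positions are all made up for illustration, not taken from any real camera.

```python
FPS_TO_MPH = 3600 / 5280  # feet per second -> miles per hour


def speed_mph(x1_px, x2_px, t1_s, t2_s, feet_per_pixel):
    """Estimate speed from two tracked positions.

    Assumes every pixel covers the same real world distance,
    which is only roughly true for a narrow, side-on view.
    """
    if t2_s <= t1_s:
        raise ValueError("the second frame must be later than the first")
    distance_ft = abs(x2_px - x1_px) * feet_per_pixel
    return distance_ft / (t2_s - t1_s) * FPS_TO_MPH


# Hypothetical numbers: the vehicle moved 440 pixels in half a second,
# and each pixel is assumed to cover 0.05 feet of road.
print(round(speed_mph(100, 540, 0.0, 0.5, 0.05), 1), "mph")
```

With those made-up inputs the vehicle covers 22 feet in half a second, which is 44 feet per second, or about 30 MPH.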
One hard part is knowing the amount of real world space covered by every pixel. This will vary across the frame, and also vary with the object's distance from the camera. A simple program, the kind I would write, would just assume that every pixel in the frame was the same size, and this would contribute to lower accuracy. That would probably be okay for a camera like mine, which has a fairly narrow view and looks at the road from close to a 90 degree angle.
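One simple way to get that single pixel size would be to measure something of known length in the frame, at the same distance as the lane you care about. This is only a sketch of the "every pixel is the same size" shortcut, not a real calibration, and the reference length and pixel coordinates below are hypothetical.

```python
def feet_per_pixel(ref_length_ft, ref_x1_px, ref_x2_px):
    """Uniform scale from a reference object of known length.

    The reference should sit at the same distance from the camera
    as the vehicles being measured, or the scale will be off.
    """
    ref_px = abs(ref_x2_px - ref_x1_px)
    if ref_px == 0:
        raise ValueError("reference endpoints must be different pixels")
    return ref_length_ft / ref_px


# Hypothetical: a 15 foot stretch of curb spans pixels 200 to 500.
near_scale = feet_per_pixel(15.0, 200, 500)

# The same 15 feet in a lane farther from the camera spans fewer pixels,
# so each pixel there covers more road.
far_scale = feet_per_pixel(15.0, 230, 480)

print(near_scale, far_scale)
# Using near_scale on the far lane would understate speeds there
# by this fraction:
print(1 - near_scale / far_scale)
```

In this invented example the far lane works out to 0.06 feet per pixel against 0.05 for the near lane, so reusing the near lane's scale would read roughly 17% low on the far lane.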
Of course, there are actually two roads in this camera's view, and one of those roads is at a worse angle to the camera. It would be much tougher to accurately measure the speed of vehicles on that road.
You also need to know very accurately and precisely how much time elapsed between the two video frames. Like, how many milliseconds apart the frames are. A vehicle going 30 MPH moves 1 foot in just 23 milliseconds, so being off just a little bit in the frame timing would have a huge impact on the speed calculation. If the camera isn't capable of providing that level of precision, then you would need to average the speed over many frames in order to cancel out the timing errors.
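Here is a rough sketch of what averaging over many frames could look like: instead of trusting any one pair of frames, fit a straight line through all the (timestamp, position) points and take its slope as the speed. The 15 frames per second rate, the 10 millisecond timestamp error, and the function name are assumptions picked for illustration, and a least-squares line is just one reasonable way to do the averaging.

```python
FPS_TO_MPH = 3600 / 5280  # feet per second -> miles per hour


def fit_speed_fps(times_s, positions_ft):
    """Least-squares slope of position against time, in feet per second."""
    n = len(times_s)
    if n < 2 or n != len(positions_ft):
        raise ValueError("need at least two matching samples")
    t_mean = sum(times_s) / n
    p_mean = sum(positions_ft) / n
    num = sum((t - t_mean) * (p - p_mean)
              for t, p in zip(times_s, positions_ft))
    den = sum((t - t_mean) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("timestamps must not all be identical")
    return num / den


# Made-up scenario: a vehicle at a steady 44 ft/s (30 MPH), filmed at
# 15 frames per second, with reported timestamps off by +/- 10 ms.
true_speed = 44.0
true_times = [i / 15 for i in range(15)]
positions = [true_speed * t for t in true_times]
reported = [t + (0.010 if i % 2 == 0 else -0.010)
            for i, t in enumerate(true_times)]

# Using only the first two frames, the timing error dominates.
two_frame = (positions[1] - positions[0]) / (reported[1] - reported[0])
# Using all 15 frames, the errors mostly cancel.
fitted = fit_speed_fps(reported, positions)

print("two frames:", round(two_frame * FPS_TO_MPH, 1), "mph")
print("15-frame fit:", round(fitted * FPS_TO_MPH, 1), "mph")
```

With these invented numbers the two-frame estimate comes out around 43 MPH, while the fit over all 15 frames lands very close to the true 30 MPH.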
So yeah, there's a lot of complexity here. It really makes me appreciate a piece of software that claims to be able to measure speed across multiple lanes of traffic.
Doing the same job at night also becomes a lot harder because you would need much smarter object detection code to figure out the vehicle outline when there are headlights and taillights on.