Good morning!
Search Google for a program called "Whisper" from OpenAI -- it's a good read given your question.
Secondly, "smart" phones traditionally haven't understood a word of what you're saying on their own. Instead, what they typically do is record "X" seconds of audio and send that clip to a server (Google's, for Android phones), where it is converted to text, and the text is then sent back to the phone. This isn't a standard... it's control, it's marketing (it's other things too, but we won't go there on the forum).
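To picture that record-and-send loop, here's a rough Python sketch using only the standard library. It chops a WAV recording into fixed "X"-second clips, one clip per upload. `send_to_server` is a made-up placeholder for the upload step (every vendor's recognition service has its own API, which this doesn't attempt to reproduce), and the 5-second chunk length and the generated test tone are arbitrary assumptions so it runs without a microphone. Treat it as an illustration of the idea, not how any particular phone actually does it.

```python
import io
import math
import struct
import wave

CHUNK_SECONDS = 5  # the "X" seconds; an arbitrary value chosen for this sketch


def split_wav(wav_bytes: bytes, chunk_seconds: float = CHUNK_SECONDS) -> list[bytes]:
    """Split a WAV recording into fixed-length clips, each a standalone WAV file."""
    clips = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # empty bytes means we've reached the end
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                dst.setparams(params)  # header frame count is corrected on write/close
                dst.writeframes(frames)
            clips.append(out.getvalue())
    return clips


def send_to_server(clip: bytes) -> str:
    """Hypothetical stand-in for the upload: a real phone would send the clip
    to its vendor's recognition service and get the text back."""
    return f"<transcript of {len(clip)}-byte clip>"


def make_test_recording(seconds: int, rate: int = 16000) -> bytes:
    """Generate a mono, 16-bit, 440 Hz tone to stand in for a microphone recording."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        samples = (
            int(8000 * math.sin(2 * math.pi * 440 * n / rate))
            for n in range(seconds * rate)
        )
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return out.getvalue()


if __name__ == "__main__":
    # A 12-second "recording" becomes three clips: 5 s, 5 s, and 2 s.
    for i, clip in enumerate(split_wav(make_test_recording(12))):
        print(i, send_to_server(clip))
```

The fixed-length chunking is just the simplest way to show "record X seconds, ship it off, get text back"; real services often stream audio continuously instead of sending discrete clips.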
Web browsers are content translators -- you tell them WHERE to go, and they show you what they found. That's a browser in a nutshell, though the details get much more involved depending on what the site you're visiting sends back. If you want Windows (for example) to discern what you just said, you typically have to install an app that does the same process described above for phones, because a typical desktop computer may not have the horsepower to process THAT amount of data and still stay even remotely responsive.
How many models of cars... never counted, but quite a few.
Without a steering wheel... 'had an old Jeep Cherokee that was rear-ended (long story). It no longer had one after that day!