My suggestion is to start by just grabbing the screen, with a codec that your CPU/GPU supports for encoding.
When that is correct and working fine, you can do exactly the same with the sound: just grab the sound and encode it with a suitable codec.
Then you combine the video and the sound streams, and see if that works well.
Don't try to solve two problems at once. That can add more problems on top of your first one.
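To make the three steps concrete, here is a minimal sketch that only builds the ffmpeg command lines, one per step, so you can run and check each one on its own. It assumes Linux with X11 (`x11grab`) and PulseAudio (`pulse`); the file names, the display `:0.0`, the screen size and the codecs `libx264`/`aac` are just placeholder assumptions, so swap in whatever your own system and ffmpeg build support.

```python
import shlex

# Step 1: grab only the screen and encode it (assumes X11 on Linux).
def grab_video_cmd(out="video.mkv", codec="libx264",
                   size="1920x1080", fps=30, display=":0.0"):
    return ["ffmpeg", "-f", "x11grab", "-framerate", str(fps),
            "-video_size", size, "-i", display, "-c:v", codec, out]

# Step 2: grab only the sound and encode it (assumes PulseAudio).
def grab_audio_cmd(out="audio.m4a", codec="aac", source="default"):
    return ["ffmpeg", "-f", "pulse", "-i", source, "-c:a", codec, out]

# Step 3: put the two already encoded streams into one file without re-encoding.
def combine_cmd(video="video.mkv", audio="audio.m4a", out="combined.mkv"):
    return ["ffmpeg", "-i", video, "-i", audio,
            "-map", "0:v:0", "-map", "1:a:0", "-c", "copy", out]

if __name__ == "__main__":
    # Print the commands so each step can be copied and tested separately.
    for cmd in (grab_video_cmd(), grab_audio_cmd(), combine_cmd()):
        print(shlex.join(cmd))
```

Recording the two files separately like this says nothing about sync between picture and sound; it is only meant to show that each part can be verified alone before you put them together.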
Nowadays I see that more and more streaming services use AV1 to encode their video streams,
like Netflix, Facebook, YouTube, Twitch and so on.
I am getting older and no longer keep up with the developments happening around me.
It blew my mind when I realized this. In Sweden we have some IPTV broadcast companies that are now changing their set-top box strategy: they are dropping MPEG and going over to AV1, throwing away the old boxes and investing in new ones that support AV1.
Then you get an IP set-top box running Android, with Wi-Fi 6 and lots of app support.
So you can install many apps yourself on your own IP set-top box.
The change from MPEG-4 to AV1 is because AV1 is royalty-free (unlike MPEG-2 and MPEG-4/H.264), and I have read that it is faster and takes less bandwidth. But I have also read that our CPUs/GPUs don't support that codec in hardware, so the CPU has to do the work in software (I have not confirmed that yet). Or it may be that our software, like ffmpeg, doesn't support encoding/decoding AV1 in hardware.
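One way to check the ffmpeg part on your own machine is to look at what `ffmpeg -encoders` reports. The sketch below picks out the encoder names that contain "av1" from that listing. The helper names `av1_encoders` and `list_av1_encoders` are made up for this example, and the parsing assumes the usual table layout (a `------` separator line, then flags, name and description on each row).

```python
import subprocess

def av1_encoders(encoders_text):
    """Return the encoder names containing 'av1' from `ffmpeg -encoders` output."""
    names = []
    in_table = False
    for line in encoders_text.splitlines():
        line = line.strip()
        if line.startswith("---"):
            # Everything before the separator is the flag legend.
            in_table = True
            continue
        if not in_table:
            continue
        parts = line.split()
        # Each row is: flags, encoder name, description.
        if len(parts) >= 2 and "av1" in parts[1].lower():
            names.append(parts[1])
    return names

def list_av1_encoders(ffmpeg="ffmpeg"):
    """Run the local ffmpeg binary and list the AV1 encoders it was built with."""
    result = subprocess.run([ffmpeg, "-hide_banner", "-encoders"],
                            capture_output=True, text=True, check=True)
    return av1_encoders(result.stdout)
```

Calling `list_av1_encoders()` needs ffmpeg on your PATH. As far as I know, names like `libaom-av1`, `libsvtav1` or `librav1e` are software encoders, while names ending in `_nvenc`, `_qsv`, `_amf` or `_vaapi` are hardware ones, and those only work if your GPU actually has an AV1 encoder.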
So next-gen streaming players, like Google's streaming player, use AV1 to decode the stream.
Here are some links to read more about it:
https://en.wikipedia.org/wiki/AV1
https://netflixtechblog.com/bringing-av1-streaming-to-netflix-members-tvs-b7fc88e42320
https://www.tomshardware.com/news/intel-av1-encoder-for-cpus
Happy hacking with our community's open source software and codecs.