This is a curious one, and I wish I had more information about the affected hardware configurations, but maybe somebody has run into this before and can offer a solution!
We just released a demo on Steam for our new game. The majority of players haven't run into this issue, but a small number have mentioned that our lip-synced portraits run at light speed, racing through an entire line's worth of phonemes in a handful of frames.
You can see the issue as recorded by a streamer here: https://youtu.be/umEgGabqeYc?t=101 (around the 1:40 mark)
We're currently using Rhubarb with a process speed of 1, and in all of our testing and for most of our players, this doesn't appear to be an issue. Does anyone have any ideas on a direction we could look in to chase this down?
Comments
What are your AC and Unity versions?
Rhubarb isn't a built-in supported lipsync option - how are you integrating it with AC?
Hi Chris, sorry for the delayed response!
We've been locked into AC version 1.77.3 and Unity version 2022.2.10. Our technical director says that all he added was updated text parsing to handle the Rhubarb text format; he didn't modify the runtime.
We dug into the issue a bit more and believe the issue is likely tied to Unity's global timer, possibly involving overclocked CPUs. I'll quote our TD here:
"The lip sync uses Unity's global timer to figure out which phoneme to show as time passes, and when we play a dialog, it will load our Rhubarb txt file and check the time offsets from the beginning for each frame. It then adds the current global time to each of those frames.
So when the global time passes a frame, it pulls it off the list and goes to the next frame.
I was able to reproduce the issue by forcing the timing of each frame to be zero. Since we only pull at most one phoneme off the list, that will force a new phoneme for each update / frame of the game."
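To make the behaviour described in the quote easier to reason about, here is a minimal, engine-independent sketch in Python. It is not AC's or our actual code: `schedule`, `update_one_per_frame`, and `simulate` are hypothetical names, and the offsets and frame times are made-up values. It only illustrates how popping at most one due phoneme per update behaves with normal offsets versus offsets forced to zero.

```python
from collections import deque


def schedule(offsets, start_time):
    """Convert (offset_seconds, shape) pairs into absolute due times."""
    return deque((start_time + off, shape) for off, shape in offsets)


def update_one_per_frame(queue, now):
    """Pop at most ONE phoneme whose due time has passed; return its shape or None."""
    if queue and queue[0][0] <= now:
        return queue.popleft()[1]
    return None


def simulate(offsets, start_time, frame_dt, frames):
    """Run a fixed number of updates and record (frame_index, shape) changes."""
    queue = schedule(offsets, start_time)
    shown = []
    for i in range(frames):
        now = start_time + i * frame_dt  # stand-in for the engine's global timer
        shape = update_one_per_frame(queue, now)
        if shape is not None:
            shown.append((i, shape))
    return shown


# Hypothetical data: three mouth shapes half a second apart.
normal = [(0.0, "A"), (0.5, "B"), (1.0, "C")]
# Same shapes with every offset forced to zero, as in the repro.
zeroed = [(0.0, "A"), (0.0, "B"), (0.0, "C")]

normal_frames = simulate(normal, start_time=10.0, frame_dt=0.25, frames=8)
zeroed_frames = simulate(zeroed, start_time=10.0, frame_dt=0.25, frames=8)
```

With the normal offsets the shapes change on frames 0, 2, and 4. With zeroed offsets they change on frames 0, 1, and 2, i.e. one new phoneme on every consecutive update, which matches the "light speed" look in the video.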
At the moment, we have a half-fix in: rather than pulling only one phoneme off the list each frame, we pull all phonemes that are past due. That removes the rapid-speaking issue, but on affected hardware configurations the VO syncing will obviously still be way off. It certainly looks better than every phoneme playing back on consecutive frames, though.
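For anyone following along, the half-fix amounts to something like the sketch below. Again this is plain Python rather than our Unity code, `update_drain_past_due` is a hypothetical name, and the queue contents are invented for illustration. The idea is simply to drain every entry that is already due and show only the most recent one.

```python
from collections import deque


def update_drain_past_due(queue, now):
    """Pop EVERY phoneme that is already due; return the latest shape or None."""
    latest = None
    while queue and queue[0][0] <= now:
        latest = queue.popleft()[1]
    return latest


# Hypothetical queue of (absolute_due_time, shape): three entries share a timestamp.
queue = deque([(10.0, "A"), (10.0, "B"), (10.0, "C"), (10.5, "D")])

first = update_drain_past_due(queue, 10.0)    # A, B, C are all due; only C is shown
second = update_drain_past_due(queue, 10.25)  # nothing new is due yet
third = update_drain_past_due(queue, 10.5)    # D becomes due
```

This stops the mouth from flickering through stale shapes, but it does nothing about why the due times are wrong in the first place, which is why we call it a half-fix.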
How so? By modifying AC's ProcessLipSync function? Outside of the edge case of having multiple frames share the same time value, I'm not clear on what the core issue is.