If you haven’t already, check out the release post: Introducing geminilive4s.
Today, geminilive4s 0.3.0 has been released, which simplifies the usage API while introducing a few useful changes:
- Let Gemini start the conversation.
- Manual voice activity.
Let’s dive into them.
Let Gemini start the conversation
Imagine that you are building a support agent phone line. Typically, as soon as the call connects, you would hear the agent explain what is going on. Gemini, however, has to wait for a message before speaking. Yes, even if 30 seconds go by with no input, Gemini will wait patiently.
Now, setting geminiMustSpeakFirst = true causes Gemini to start the conversation. This is done by automatically sending a Hello message right after the conversation starts. This simple trick has worked surprisingly well:
gemini.conversationPipe(geminiMustSpeakFirst = true)
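The idea behind the flag can be sketched in a few lines. This is only an illustration, not the library's implementation: the types `Input`, `Text`, `Audio` and the helper `withGreeting` are hypothetical stand-ins built on the standard library, and the real library presumably works on streams rather than iterators.

```scala
// Hypothetical stand-ins for illustration; these are NOT geminilive4s types.
sealed trait Input
final case class Text(value: String) extends Input
final case class Audio(bytes: Array[Byte]) extends Input

object SpeakFirstSketch {
  // Prepend a synthetic greeting so the model has something to answer
  // before any real user input arrives.
  def withGreeting(
      userInput: Iterator[Input],
      mustSpeakFirst: Boolean
  ): Iterator[Input] =
    if (mustSpeakFirst) Iterator.single[Input](Text("Hello")) ++ userInput
    else userInput
}
```

With `mustSpeakFirst = false` the input is passed through untouched, which matches the default behavior described above where Gemini waits for the user.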
Manual voice activity
Gemini has Voice Activity Detection (VAD) enabled by default, which means that Gemini stops speaking when it detects that the user is speaking (see Change voice activity detection settings). Still, there could be scenarios where this needs to be handled outside of Gemini, which can be done by setting disableAutomaticActivityDetection = true:
val config = GeminiConfig(
  ...
  // We disable the automatic voice detection since we are sending manual signals
  disableAutomaticActivityDetection = true,
  // required to disable VAD
  customApiVersion = Some(GeminiCustomApi.V1Alpha)
)
This requires sending the activity Start to let Gemini know that the user started speaking, and the activity End to signal that the user stopped speaking and it’s now Gemini’s turn:
// user started speaking
GeminiInputChunk(new Array(0), Some(GeminiInputChunk.ActivityEvent.Start))
// user is done speaking
GeminiInputChunk(new Array(0), Some(GeminiInputChunk.ActivityEvent.End))
This also requires listening to Gemini’s output events for the turnComplete flag set to true, which Gemini uses to signal that it is done speaking. In this example, we listen to Gemini’s output and send the activity Start event when Gemini completes its turn, which causes Gemini to pay attention to the user:
geminiOutputStream
  .filter(_.turnComplete)
  .map { _ =>
    GeminiInputChunk(
      new Array(0),
      Some(GeminiInputChunk.ActivityEvent.Start)
    )
  }
  .through(startTurnTopic.publish)
This snippet is part of a new example where the user signals that they are done speaking by pressing Enter: TakeManualTurns.scala
✅ Connected to Gemini Live API
Microphone recording started...
Speaker started...
Sending wake up signal
🤖 Gemini has finished speaking. You can speak now. Press [ENTER] when you are done
🤫 You manually ended your turn. Gemini is now responding...
🤖 Gemini has finished speaking. You can speak now. Press [ENTER] when you are done
🤫 You manually ended your turn. Gemini is now responding...
🤖 Gemini has finished speaking. You can speak now. Press [ENTER] when you are done
🤫 You manually ended your turn. Gemini is now responding...
🤖 Gemini has finished speaking. You can speak now. Press [ENTER] when you are done
🤫 You manually ended your turn. Gemini is now responding...
🤖 Gemini has finished speaking. You can speak now. Press [ENTER] when you are done
🤫 You manually ended your turn. Gemini is now responding...
🤖 Gemini has finished speaking. You can speak now. Press [ENTER] when you are done
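The alternation visible in the log above can be modeled as a small state machine. The following is a rough sketch under stated assumptions, not code from the example: `Activity`, `Event`, and `ManualTurnsSketch` are hypothetical names that only mirror the Start and End signals described earlier, and plain lists stand in for the real streams.

```scala
// Hypothetical model of the manual turn protocol; NOT geminilive4s types.
sealed trait Activity
case object Start extends Activity // user turn begins
case object End extends Activity   // user turn ends, Gemini may respond

sealed trait Event
case object GeminiTurnComplete extends Event // an output had turnComplete = true
case object UserPressedEnter extends Event   // the user says they are done

object ManualTurnsSketch {
  // Translate events into the activity signals to send to Gemini.
  // State: whether the user's turn is currently open.
  def signals(events: List[Event]): List[Activity] = {
    val (_, emitted) = events.foldLeft((false, List.empty[Activity])) {
      case ((false, acc), GeminiTurnComplete) => (true, Start :: acc)
      case ((true, acc), UserPressedEnter)    => (false, End :: acc)
      case (state, _)                         => state // ignore out-of-order events
    }
    emitted.reverse
  }
}
```

Tracking the open/closed state means a second Enter press while Gemini is still responding produces no extra End signal, so Start and End always alternate.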
Bonus
With these changes, it is possible to put Gemini on both sides of a conversation. Here they are discussing Laminar vs Slinky for Scala.js: