If you have been using Java libraries for a while, code like this probably looks familiar:
genai.types.LiveConnectConfig
  .builder()
  .inputAudioTranscription(
    genai.types.AudioTranscriptionConfig.builder().build()
  )
  .outputAudioTranscription(
    genai.types.AudioTranscriptionConfig.builder().build()
  )
  .responseModalities(genai.types.Modality.Known.AUDIO)
  .systemInstruction(...)
  .speechConfig(...)
  .tools(tool)
  .temperature(0.7f)
  .enableAffectiveDialog(true)
  .proactivity(
    genai.types.ProactivityConfig.builder().proactiveAudio(true).build()
  )
  .build()
Snippet from geminilive4s-0.2.0
Given that geminilive4s is a library, it can't assume that every user needs the same set of arguments. If I wanted to make proactivity configurable, I could simply receive it as an argument and do this:
  .proactivity(
    genai.types.ProactivityConfig.builder().proactiveAudio(proactiveAudio).build()
  )
Not so fast. With Google's genai, the session crashes at runtime because the API rejects parameters it does not expect:
[info] Live session closed with code: 1007 and reason: Invalid JSON payload received. Unknown name "proactivity" at 'setup': Cannot find field.
[error] com.google.genai.errors.GenAiIOException: WebSocket closed unexpectedly: Invalid JSON payload received. Unknown name "proactivity" at 'setup': Cannot find field.
[error] at com.google.genai.AsyncLive$GenAiWebSocketClient.onClose(AsyncLive.java:234)
[error] at org.java_websocket.client.WebSocketClient.onWebsocketClose(WebSocketClient.java:688)
[error] at org.java_websocket.WebSocketImpl.closeConnection(WebSocketImpl.java:557)
[error] at org.java_websocket.WebSocketImpl.eot(WebSocketImpl.java:612)
[error] at org.java_websocket.client.WebSocketClient.run(WebSocketClient.java:546)
[error] at java.base/java.lang.Thread.run(Thread.java:1583)
This fails because the parameter is only accepted when the v1alpha API is enabled, so the field has to be left out entirely unless the caller opts in. The same problem comes up when I want to enable transcripts conditionally, for example:
...
  .inputAudioTranscription(
    genai.types.AudioTranscriptionConfig.builder().build()
  )
A typical way to resolve this is to start from a base builder and apply each mutation conditionally, like:
// apply the default values
val base = genai.types.LiveConnectConfig.builder()
// keep the result of the conditional step so the next step can build on it
val baseWithProactivity =
  if (proactiveAudio)
    base.proactivity(
      genai.types.ProactivityConfig.builder().proactiveAudio(proactiveAudio).build()
    )
  else base
But we have a ton of configurable arguments, and it is insane to write base, then baseWithProactivity, then baseWithTranscription, and so on.
We’ll overcome this by leveraging Scala functions. Let’s define a transformation function that conditionally applies a given change to the builder object:
type Builder = LiveConnectConfig.Builder
def transform(when: Boolean)(f: Builder => Builder)(builder: Builder): Builder =
  if (when) f(builder)
  else builder
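To see what transform does in isolation, here is a minimal sketch that runs without the genai dependency. DemoBuilder and its set method are hypothetical stand-ins for LiveConnectConfig.Builder (they are not part of any library), and the field names are only labels.

```scala
// Hypothetical stand-in for LiveConnectConfig.Builder, so this sketch
// runs without the genai dependency. It only records which fields were set.
final case class DemoBuilder(fields: Map[String, String] = Map.empty) {
  def set(name: String, value: String): DemoBuilder =
    copy(fields = fields + (name -> value))
}

object TransformDemo {
  type Builder = DemoBuilder

  // Same shape as the function above: apply f only when the flag is on.
  def transform(when: Boolean)(f: Builder => Builder)(builder: Builder): Builder =
    if (when) f(builder) else builder

  val base: Builder = DemoBuilder().set("responseModalities", "AUDIO")

  // Flag on: the change is applied to the builder.
  val withProactivity: Builder =
    transform(when = true)(_.set("proactivity", "true"))(base)

  // Flag off: the builder comes back untouched, so the field is never set.
  val withoutProactivity: Builder =
    transform(when = false)(_.set("proactivity", "true"))(base)
}
```

The point of the last parameter list is that leaving it off gives a plain Builder => Builder function, which is what makes it possible to collect many of these in a list.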
Then, we can define all conditional transformations:
val options = List(
  transform(params.disableAutomaticActivityDetection)(
    _.realtimeInputConfig(
      RealtimeInputConfig
        .builder()
        .automaticActivityDetection(
          AutomaticActivityDetection.builder().disabled(true).build()
        )
        .build()
    )
  ),
  transform(params.inputAudioTranscription)(
    _.inputAudioTranscription(AudioTranscriptionConfig.builder().build())
  ),
  transform(params.outputAudioTranscription)(
    _.outputAudioTranscription(AudioTranscriptionConfig.builder().build())
  ),
  transform(params.enableAffectiveDialog)(_.enableAffectiveDialog(true)),
  transform(params.proactivity)(
    _.proactivity(ProactivityConfig.builder().proactiveAudio(true).build())
  )
)
Finally, we apply these transformations sequentially:
// apply the default values
val base = genai.types.LiveConnectConfig.builder()
options
  .foldLeft(base) { case (builder, apply) => apply(builder) }
  .build()
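The fold above is ordinary left-to-right function composition, so the standard library's Function.chain could be used in its place. The sketch below is an alternative, not what geminilive4s does; it uses a hypothetical stand-in where the "builder" is just a Vector of the option names applied so far, so it runs without genai.

```scala
object ChainDemo {
  // Hypothetical stand-in for the real builder: the option names set so far.
  type Builder = Vector[String]

  def transform(when: Boolean)(f: Builder => Builder)(builder: Builder): Builder =
    if (when) f(builder) else builder

  // The explicit element type lets each partially applied transform
  // be treated as a Builder => Builder function.
  val options: List[Builder => Builder] = List[Builder => Builder](
    transform(when = true)(_ :+ "inputAudioTranscription"),
    transform(when = false)(_ :+ "proactivity"), // skipped: flag is off
    transform(when = true)(_ :+ "enableAffectiveDialog")
  )

  val base: Builder = Vector("responseModalities")

  // The fold used in the post: apply each transformation in list order.
  val viaFold: Builder =
    options.foldLeft(base) { case (builder, apply) => apply(builder) }

  // Function.chain composes the list into one function, applied in the same order.
  val viaChain: Builder = Function.chain(options)(base)
}
```

Both values end up with the defaults plus the two enabled options, in list order. Whether chain reads better than foldLeft is a matter of taste; the fold makes the accumulation explicit, while chain hides it.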
The nice builder usage now looks like this:
def make(params: GeminiConfig): LiveConnectConfig = {
  def transform(when: Boolean)(
      f: LiveConnectConfig.Builder => LiveConnectConfig.Builder
  )(builder: LiveConnectConfig.Builder): LiveConnectConfig.Builder = {
    if (when) f(builder) else builder
  }

  val options = List(
    transform(params.outputAudioTranscription)(
      _.outputAudioTranscription(AudioTranscriptionConfig.builder().build())
    ),
    transform(params.enableAffectiveDialog)(_.enableAffectiveDialog(true)),
    // ... more transformations follow
  )

  val base = LiveConnectConfig
    .builder()
    .responseModalities(Modality.Known.AUDIO)
  // ... more defaults follow

  options
    .foldLeft(base) { case (builder, apply) => apply(builder) }
    .build()
}
Snippet from geminilive4s snapshot
As you can see, we can now easily extend this to support more configuration options while keeping the code readable.