gpt-omni/mini-omni

Architecture difference from technical report?


In the technical report, the TTS adapter is described as 6 additional transformer blocks:

[figure: TTS adapter architecture diagram from the technical report]

but I cannot find such an architecture in the config at the link. Maybe it is post_adapter?

However, there is no post_adapter in either the config or the checkpoint.

Hi @seaplus296, you are right: post_adapter is the tts_adapter. However, the open-source version does not include the tts_adapter.
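
To make the mapping concrete, here is a rough sketch of the config difference being discussed. The field names (`post_adapter`, `post_adapter_layers`) and the base layer count are assumptions inferred from this thread, not values copied from the released model_config.yaml.

```python
# Hypothetical config sketch; field names and numbers are assumptions,
# not the actual released config.

released_config = {
    "n_layer": 24,           # base decoder depth only
    "post_adapter": False,   # open-source checkpoint: no TTS-adapter blocks
}

report_config = {
    "n_layer": 24,
    "post_adapter": True,        # paper version: TTS adapter enabled
    "post_adapter_layers": 6,    # the 6 additional transformer blocks
}
```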

@mini-omni Then does the paper version have a 30-layer transformer for the TTS output? Does that mean the lm_head is not shared between the code vocab and the LM vocab?

By the way, if the released model differs from the one in the technical report, I think it would be good to note that here.

> @mini-omni Then does the paper version have a 30-layer transformer for the TTS output? Does that mean the lm_head is not shared between the code vocab and the LM vocab?

Yes, we add an extra 6 layers for the tts-adapter, and they do not share the lm_head (even in the version without the tts-adapter).
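
To illustrate what "an extra 6 layers plus a separate head" could look like, here is a minimal PyTorch sketch assuming a shared decoder trunk, a 6-block TTS adapter, and distinct heads for the text vocab and the audio-code vocab. Dimensions, layer counts, and vocab sizes are placeholders, not the real mini-omni configuration.

```python
import torch
import torch.nn as nn

class MiniOmniSketch(nn.Module):
    """Sketch only: shared trunk, 6-block TTS adapter, two output heads."""
    def __init__(self, d_model=896, n_head=8, n_base=24, n_adapter=6,
                 text_vocab=151_936, code_vocab=4_096):
        super().__init__()
        def block():
            return nn.TransformerEncoderLayer(
                d_model, n_head, batch_first=True, norm_first=True)
        self.trunk = nn.ModuleList(block() for _ in range(n_base))           # shared layers
        self.tts_adapter = nn.ModuleList(block() for _ in range(n_adapter))  # extra 6 blocks
        self.lm_head = nn.Linear(d_model, text_vocab, bias=False)            # text tokens
        self.code_head = nn.Linear(d_model, code_vocab, bias=False)          # audio codes, not shared

    def forward(self, hidden):
        for blk in self.trunk:
            hidden = blk(hidden)
        text_logits = self.lm_head(hidden)        # text branch uses the trunk output directly
        audio = hidden
        for blk in self.tts_adapter:
            audio = blk(audio)                    # audio branch also passes through the adapter
        return text_logits, self.code_head(audio)

# tiny smoke test with a small configuration and random hidden states
model = MiniOmniSketch(d_model=64, n_head=4, n_base=2, n_adapter=2,
                       text_vocab=1_000, code_vocab=128)
text_logits, code_logits = model(torch.randn(1, 8, 64))
print(text_logits.shape, code_logits.shape)  # (1, 8, 1000) (1, 8, 128)
```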

> By the way, if the released model differs from the one in the technical report, I think it would be good to note that here.

Thanks for the suggestion, I'll add it to the README.

@mini-omni Thanks. Finally, is there any performance difference between the report's architecture and the open-source version? Why was the architecture changed?

> @mini-omni Thanks. Finally, is there any performance difference between the report's architecture and the open-source version? Why was the architecture changed?

The open-source version is the one we used for parallel development and experimental comparison. In our subjective evaluation, the version with the TTS adapter has less impact on the model's text-processing capabilities.

@mini-omni Thank you for the kind replies.

@mini-omni Sorry, I have a follow-up question. If the open-source version's TTS adapter is just a vocab expansion, how is it trained in stage 1? Only the expanded parameters?

The open-source version does not include the tts-adapter; the tts-adapter mainly adds more layers.
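
On the stage-1 question: the thread does not spell out the training recipe, but one common way to train only newly added vocab parameters after expanding an embedding and output head is to zero the gradients on the original rows. The sketch below is a generic illustration under that assumption, not the actual mini-omni training code; all sizes are placeholders.

```python
# Generic sketch: expand the embedding/head for audio-code tokens and update
# only the new rows. Not the mini-omni recipe; sizes are placeholders.
import torch
import torch.nn as nn

old_vocab, extra, d_model = 151_936, 4_096, 896   # placeholder sizes
new_vocab = old_vocab + extra

emb = nn.Embedding(new_vocab, d_model)             # expanded input embedding
head = nn.Linear(d_model, new_vocab, bias=False)   # expanded output head

def keep_new_rows_only(grad):
    # zero the gradient on the original (text) rows so only the new
    # audio-code rows receive updates
    grad = grad.clone()
    grad[:old_vocab] = 0
    return grad

emb.weight.register_hook(keep_new_rows_only)
head.weight.register_hook(keep_new_rows_only)
# (the rest of the backbone would be frozen separately with requires_grad_(False))
```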