1】Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
Yonsei University
Paper link: https://arxiv