This text-to-image converter aims to examine the
conversion of data between modalities (text, image).
It is motivated by the evolution of human-machine
communication, which has introduced modalities that are
natural to humans, such as gestures, speech, sound,
and vision. In fact, one of the main
challenges of this "multimodal" learning is learning
a shared representation between the distinct modalities
and predicting the missing data (by retrieval or
synthesis) in one modality conditioned on
another. Researchers have worked on various kinds of
conversion: text-to-speech, speech-to-image, or
text-to-image synthesis, and vice versa. In this paper,
however, we focus on image-to-audio and image-to-text
synthesis.