Multi-modal AI

AI that mimics human intelligence can address broader enterprise needs

We humans perceive, process, attend to, and act upon signals received by multiple senses simultaneously. We can also call upon prior knowledge from multiple domains to make appropriate inferences. While AI has made great advances in the vision, language, and audio domains individually, comparatively little work has gone into combining models from different domains into an integrated whole. Can we combine all three signals for better modeling of ground truth?
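One common way to combine such signals is "late fusion": each modality is encoded separately, and the per-modality embeddings are concatenated into one joint representation for downstream modeling. The sketch below illustrates the idea only; the encoders, feature choices, and dimensions are hypothetical stand-ins, not any particular production architecture.

```python
# Minimal late-fusion sketch: each modality yields its own small
# embedding, and a joint vector is formed by concatenation.
# All encoders here are illustrative placeholders for real CV/NLP/speech models.

def embed_image(pixels):
    # Stand-in for a vision encoder: mean intensity and dynamic range.
    mean = sum(pixels) / len(pixels)
    return [mean, max(pixels) - min(pixels)]

def embed_text(tokens):
    # Stand-in for a language encoder: token count and mean token length.
    return [len(tokens), sum(len(t) for t in tokens) / len(tokens)]

def embed_audio(samples):
    # Stand-in for a speech encoder: mean energy and peak amplitude.
    return [sum(s * s for s in samples) / len(samples),
            max(abs(s) for s in samples)]

def late_fusion(*embeddings):
    # Concatenate per-modality embeddings into one joint representation.
    joint = []
    for e in embeddings:
        joint.extend(e)
    return joint

joint = late_fusion(
    embed_image([0.1, 0.5, 0.9]),
    embed_text(["multi", "modal", "ai"]),
    embed_audio([0.2, -0.4, 0.3]),
)
print(len(joint))  # a 6-dimensional joint embedding (2 per modality)
```

In practice the placeholder encoders would be pretrained neural networks, and the fused vector would feed a task-specific head; richer fusion schemes (cross-attention, early fusion) follow the same basic pattern of mapping modalities into a shared representation.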

At Aganitha, we very frequently combine signals from multiple domains to effectively address the business problems our customers bring to us. Hence, multi-modal AI, cutting across CV, NLP, and speech, is an important area of focus for us, and one that raises a number of open research questions.

We expect multi-modal AI to rapidly evolve and tackle these questions.