  • Sharing my notebook for the Google AI Assistants for Data Tasks with Gemma competition. The notebook (link at the end) covers the basic building blocks to adapt LLMs for your own use case. Here is an excerpt of the main findings so far: dataset generation, RAG with ColBERT and query strategy yield the best…

  • MLLMs (Multimodal Large Language Models) such as GPT-4V and Gemini can ingest data in multiple modalities, such as text, video, sound and images. Personally, I find UI navigation to be one of the most useful applications of MLLMs. As a SWE, you could have a web-based agent that runs Gherkin-like syntax tests without having…

  • The Generative Internet

    Generative AI is opening the doors to new products. It’s great to witness in real time how the Internet is evolving and to imagine what it might become in the coming years. Search, UI navigation and content generation are the trident technologies leading this evolution. Search: The Internet acts as the largest repository of knowledge…

  • Working with LLMs is shifting from human-machine interactions to a mix of human-machine and machine-machine interactions. This allows LLMs to do ever more complex tasks. This new form of interaction has been termed the AI agent. Threaded conversations lack the structure needed to complete complex tasks, so objective divergence is a common issue with AI agents. Objective divergence is the equivalent of…

  • LLMs are limited in how much input they can take in and how much output they can generate. In retrieval augmented generation (RAG) applications, a set of documents is first retrieved and added to the input alongside an instruction, thus creating the prompt. This is referred to as in-context learning. We can draw an analogy with computer architectures here: the LLM is the…
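
The prompt assembly described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the method used in the post: the word-overlap `score` function is a toy stand-in for a real retriever, and the names `retrieve`, `build_prompt`, `top_k` and `max_chars` are hypothetical. The `max_chars` budget is only a crude proxy for the limited input size mentioned above.

```python
def score(query: str, document: str) -> int:
    """Toy relevance score: count of distinct query words found in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents with the highest toy relevance score."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_k]


def build_prompt(instruction: str, query: str, documents: list[str],
                 top_k: int = 2, max_chars: int = 2000) -> str:
    """Combine the instruction, retrieved documents and query into one prompt.

    Retrieved documents are added in rank order until the (hypothetical)
    character budget max_chars would be exceeded.
    """
    context_lines = []
    used = 0
    for doc in retrieve(query, documents, top_k):
        if used + len(doc) > max_chars:
            break  # stop adding context once the budget is exhausted
        context_lines.append(f"- {doc}")
        used += len(doc)
    context = "\n".join(context_lines)
    return f"{instruction}\n\nContext:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    corpus = [
        "ColBERT is a late interaction retrieval model",
        "Gherkin is a syntax for tests",
        "Retrieval augmented generation adds documents to the prompt",
    ]
    print(build_prompt(
        "Answer using only the context.",
        "what is retrieval augmented generation",
        corpus,
    ))
```

The resulting string is what would be sent to the LLM. A real system would count tokens rather than characters and would replace `score` with a proper retriever.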

  • 🚀 Excited to share new work on “Harnessing Retrieval-Augmented Generation (RAG) for Uncovering Knowledge Gaps”. In this paper, I simulate how users search the Internet, but instead of searching for content that exists through traditional information retrieval methods, I search for the most relevant content, even if it doesn’t exist. Therefore, information retrieval shifts from…

  • Adept’s mission to enable computers to interact with UIs will enhance our productivity and save time. I have been eagerly awaiting their ACT-1 model for quite some time. However, while that is being developed, they have released FUYU, a multimodal LLM, or MLLM. The term ‘multimodal’ implies its ability to process both text and images.…
