The quest to build the universal personal assistant (UPA) has been reignited, with OpenAI and Google going head to head. But what would the ultimate personal assistant look like? Would it be an app, a browser, an OS, or even a physical robot? Let’s think first about the ideal product requirements for a universal personal assistant.
Introduction to Language-Image Pre-Training

Language-Image Pre-Training (LIP) has become a popular approach to obtaining robust visual and textual representations. It involves aligning the representations of paired images and texts, typically using a contrastive objective. The method was popularized by large-scale models like CLIP and ALIGN, which demonstrated the viability of this approach at massive scale.
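The contrastive alignment described above can be sketched as a symmetric InfoNCE loss over a batch of paired embeddings. This is a minimal NumPy illustration of the idea, not the actual CLIP or ALIGN training code; the function name and the whitespace-level details are my own.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss for paired image/text embeddings.

    image_emb, text_emb: (batch, dim) arrays where row i of each matrix
    comes from the same image-text pair.
    """
    # Normalize rows so dot products become cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature  # (batch, batch) similarities
    labels = np.arange(len(logits))                # i-th image matches i-th text

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions, as in CLIP.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls each image embedding toward its paired text embedding while pushing it away from every other text in the batch.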
Large Language Models (LLMs) have transformed artificial intelligence, driving significant advances in conversational AI and related applications. However, these models face limitations due to fixed-length context windows, which impede their performance on tasks that require handling extensive conversations and analyzing lengthy documents.

Limitations of Current LLMs

Despite their revolutionary impact, current LLMs are hindered by constrained context windows.
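A common workaround for the fixed context window in long conversations is a sliding window over the message history: keep only the most recent messages that fit a token budget. This is a naive sketch of that idea; the whitespace-split token counter is a stand-in for a real tokenizer, not an actual API.

```python
def fit_to_context(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Keep the most recent messages that fit within a fixed token budget.

    messages: conversation turns, oldest first.
    count_tokens: hypothetical token counter (whitespace split by default).
    Returns the kept messages, still ordered oldest first.
    """
    kept, budget = [], max_tokens
    # Walk backwards from the newest message, stopping once the budget is spent.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))
```

The trade-off is obvious: anything outside the window is simply forgotten, which is exactly the limitation that motivates long-context and memory-augmented approaches.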
Introduction to ColPali

ColPali is a novel document retrieval model that significantly improves the efficiency and accuracy of matching user queries to relevant documents. It leverages the advanced document understanding capabilities of recent Vision Language Models (VLMs) to produce high-quality, contextualized embeddings derived solely from images of document pages.

The Challenge in Modern Document Retrieval

Modern
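ColPali scores pages with ColBERT-style late interaction: each query token embedding is matched against every patch embedding of a page image, the best match per token is kept (MaxSim), and the maxima are summed. A minimal NumPy sketch of that scoring, assuming the embeddings have already been produced by the VLM (the function names here are illustrative, not from the ColPali codebase):

```python
import numpy as np

def maxsim_score(query_vecs, page_vecs):
    """Late-interaction (MaxSim) score between one query and one page.

    query_vecs: (n_query_tokens, dim) query token embeddings.
    page_vecs:  (n_patches, dim) patch embeddings of one page image.
    """
    sims = query_vecs @ page_vecs.T      # (n_query_tokens, n_patches)
    # For each query token, keep its best-matching patch, then sum.
    return sims.max(axis=1).sum()

def rank_pages(query_vecs, pages):
    """Return page indices sorted by descending MaxSim score."""
    scores = [maxsim_score(query_vecs, p) for p in pages]
    return sorted(range(len(pages)), key=lambda i: -scores[i])
```

Because each page keeps a bag of patch vectors rather than a single pooled vector, fine-grained matches (a table cell, a figure label) can dominate the score for the query tokens that care about them.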
I’ve put together this presentation to help anyone with a technical background build their own LLM-powered applications. The topic is quite broad, but the presentation covers all the basic moving parts you need to bear in mind.
Sharing my notebook for the Google AI Assistants For Data Task With Gemma competition. The notebook (link at the end) covers the basic building blocks for adapting LLMs to your own use case. Here is an excerpt of the main findings so far: dataset generation, RAG with ColBERT, and query strategy yield the best
MLLMs (Multimodal Large Language Models) such as GPT-4V and Gemini can ingest data in multiple modalities: text, video, sound, and images. Personally, one of the most useful applications of MLLMs is UI navigation. As a SWE, you could have a web-based agent that runs Gherkin-like syntax tests without having
Generative AI is opening the doors to new products. It’s great to witness in real time how the Internet is evolving and to imagine what it might become in the coming years. Search, UI navigation, and content generation are the trident of technologies leading this evolution.

Search

The Internet acts as the largest repository of knowledge
Working with LLMs is shifting from pure human-machine interactions to a mix of human-machine and machine-machine interactions, which allows LLMs to take on ever more complex tasks. This new mode of interaction has been coined the AI agent. Threaded conversations lack the structure needed to complete complex tasks, so objective divergence is a common issue with AI agents. Objective divergence is the equivalent of