In this talk I want to share the dimensions I use when evaluating LLMs, the path I've taken so far in building testing capabilities, and how much more there is to do to continue research in this space.
Benchmarks will come out and show that some new model is the best at coding, but that doesn't always align with real-world usage. This was frustrating to me, and it led me down the path of building an evaluation approach that mimics real-world usage.
Adam started programming professionally in 2005 in the video game industry, joining a startup as a founding engineer. He spent 7 years working on games of various sizes, from BurgerTime World Tour to Batman: Arkham Asylum. From 2012 to 2018, Adam built and ran an agency focused on mobile and web technology; along the way he cofounded and was CTO of several startups. His agency was purchased in 2018. From 2018 to 2022 he worked in fintech, seeing nCino through its IPO in 2020. When he left, he was Director of Engineering Analytics, responsible for building out nCino's data platform as well as its AI strategy. In 2022, he left nCino to start Raleon, which is focused on using AI to rethink the direct-to-consumer marketing space.