Tamás Takács
9 min read
International Olympiad in Artificial Intelligence (IOAI) 2025
To be honest, we got off to a late start this year. Planning should’ve kicked off right after the IAIO Olympiad wrapped up, but last year was so intense that everyone really needed a break, and maybe we took that a bit too seriously. It wasn’t until January that we started pulling the team back together, and even then only in bits and pieces. After months of pause, some team members had moved on, so this year’s organizing journey had a rough and rocky beginning.
The National Olympiad
Since we started organizing later than planned, our communication with schools, students, and sponsors was also delayed. This meant we had to jump into every communication channel we could, trying to reach as many people as possible in a short time. On top of that, we kicked off the year with just four core team members, so it took extra effort simply to stay coordinated among ourselves.
For student preparation, we opted for a more asynchronous approach this year. Many of the organizers are juggling teaching duties and PhD work, making regular live sessions difficult to sustain. So instead, we curated a broad set of topics, including Machine Learning, Deep Learning, Computer Vision, NLP, Reinforcement Learning, and various AI tools, and created a wide range of practice materials in Hungarian. These were sent to students to work on at their own pace and were designed to resemble both past Olympiad tasks and national qualifier problems.
Our Discord server is thankfully still alive and well. There’s a small group of students who are quite active there, which definitely lifts a weight off my shoulders—I often feel like I’m not the best at online communication with students, so it’s encouraging to see engagement. Hopefully we can keep that momentum going.
Besides the national on-site qualifier, we also launched a small Kaggle competition as an extra step in the selection process. This online challenge gave students two hours to solve a task without heavy supervision. The goal was to see how well they could navigate Kaggle, open a notebook, run it, and tweak the starter code provided. The task was a binary classification problem evaluated by ROC AUC, and students competed live on a public leaderboard. The dataset was fully synthetic and centered around university admissions—a little side project I built myself for this purpose.
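For readers new to the metric: ROC AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal sketch that computes it directly from that pairwise definition in plain Python follows; `roc_auc` is a hypothetical helper for illustration, not Kaggle's scorer.

```python
def roc_auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 ground-truth values
    scores: predicted scores, higher means "more likely positive"
    A tie between a positive and a negative counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs that are ranked correctly
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Since only the ordering of the predictions matters, any strictly increasing transformation of the scores leaves the AUC unchanged, so uncalibrated model outputs can be submitted directly. The double loop is O(P*N); for large datasets a sort-based rank formula is the usual choice.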

Snippet of the Leaderboard. Photo: Me 😊.
Online Round
The code behind the task was admittedly a bit convoluted, but intentionally so. I aimed to create a problem that was approachable yet challenging, where students could achieve a high ROC AUC score with smart application of ML models. I incorporated geospatial data using a GeoJSON file and included a key feature: the distance between a student’s home and the university they were applying to. In general, the closer someone lived to the university, the better their chances of admission.
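The generator below calls a `haversine` helper and indexes a `county_centroids` mapping whose definitions are not part of the excerpt. A minimal sketch of both follows, assuming the GeoJSON stores each county as a `Polygon` feature with `[lon, lat]` coordinates (the GeoJSON order) and a `name` property. The property name, the planar shoelace centroid, the kilometre output, and the helper names `ring_centroid` and `county_centroids_from_geojson` are all assumptions, not necessarily what the original notebook did.

```python
import json
import math


def haversine(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def ring_centroid(ring):
    """Area-weighted centroid of a ring of [lon, lat] pairs (shoelace formula).

    Treats lon/lat as planar coordinates, a reasonable approximation at county
    scale. Returns (lat, lon) to match the argument order of haversine().
    """
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i][:2]
        x1, y1 = ring[(i + 1) % n][:2]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate ring")
    return cy / (3 * area2), cx / (3 * area2)


def county_centroids_from_geojson(path, name_key="name"):
    """Map each county name to the centroid of its outer ring (Polygon features only)."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    centroids = {}
    for feature in collection["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            centroids[feature["properties"][name_key]] = ring_centroid(geom["coordinates"][0])
    return centroids
```

With these helpers, `county_centroids = county_centroids_from_geojson("counties.geojson")` (file name hypothetical) produces a mapping of the shape the generator expects, and `haversine(lat, lon, *uni_coords)` matches the call made there, assuming `uni_coords` is a `(lat, lon)` pair. `MultiPolygon` counties would need extra handling.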
Beyond that, I added a mix of features, some that had no real impact on the outcome, like gender or age, and others that had either positive or negative correlations with admission. I also introduced correlations between features using proxy variables, so the data wasn’t fully independent. From this setup, I generated a final admission score, applying a fixed threshold to select the top 8% and a probabilistic threshold for the next 33% to introduce a bit of noise into the dataset.
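The labelling step described above can be sketched as follows. The text fixes only the two band sizes (top 8% admitted outright, the next 33% decided at random); the admission probability inside the second band (`p_mid=0.5` by default here) and the function name `assign_labels` are assumptions for illustration.

```python
import numpy as np


def assign_labels(scores, top_frac=0.08, mid_frac=0.33, p_mid=0.5, seed=None):
    """Turn continuous admission scores into noisy 0/1 labels.

    - top `top_frac` of students: always admitted (fixed threshold)
    - next `mid_frac`: each admitted with probability `p_mid` (probabilistic band)
    - everyone else: rejected
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)

    # Score quantiles marking the two band boundaries
    top_cut = np.quantile(scores, 1 - top_frac)
    mid_cut = np.quantile(scores, 1 - top_frac - mid_frac)

    labels = np.zeros(len(scores), dtype=int)
    labels[scores >= top_cut] = 1
    in_mid = (scores >= mid_cut) & (scores < top_cut)
    labels[in_mid] = (rng.random(in_mid.sum()) < p_mid).astype(int)
    return labels
```

The random middle band is what keeps the problem from being perfectly separable: even a model that recovers the score formula exactly cannot predict those coin flips, which caps the achievable ROC AUC below 1.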
```python
import numpy as np
import pandas as pd

# Defined earlier in the notebook (not shown here): county_centroids
# (county -> (lat, lon)), uni_coords, haversine(), num_students,
# science_subjects, language_subjects and all_possible_emelt_subjects.
# Column names are kept in Hungarian, as in the competition dataset
# (e.g. 'Távolság' = distance, 'Stressz Szint' = stress level).

def random_county():
    """50% Zipf-distributed, 50% fully uniform county choice."""
    county_list = list(county_centroids.keys())
    if np.random.rand() < 0.5:
        # Heavy-tailed draw: counties early in the list are picked far more often
        zipf_indices = np.random.zipf(a=1.3, size=len(county_list))
        zipf_indices = np.clip(zipf_indices, 1, len(county_list))
        return county_list[zipf_indices[np.random.randint(0, len(zipf_indices))] - 1]
    else:
        return np.random.choice(county_list)

def gen_score(a, t=0.5):
    """Exam score (0-100) from a Beta distribution skewed upward by aptitude a and tendency t."""
    raw = np.clip(np.random.beta(1 + a * 10 + t * 3, 1 + (1 - a) * 3), 0, 1)
    return int(np.round(raw * 100))

data = []  # one dict per simulated student
for _ in range(num_students):
    # Noise features: no effect on the admission score
    age = np.random.randint(17, 20)
    gender = np.random.choice([0, 1])

    # Latent traits; most observable features below are noisy proxies of these
    academic_apt = np.clip(np.random.normal(0.6, 0.15), 0, 1)
    socioeconomic = np.clip(np.random.normal(0.6, 0.2), 0, 1)
    motivation = np.clip(np.random.normal(0.6, 0.2), 0, 1)
    humanities_tendency = np.clip(np.random.normal(academic_apt * 0.8 + motivation * 0.2, 0.1), 0, 1)
    science_tendency = np.clip(np.random.normal(academic_apt * 0.8 + (1 - motivation) * 0.2, 0.1), 0, 1)

    # Yearly grade averages for grades 9-12 on the 2-5 scale, rounded to 0.25
    avg_grades = {
        f'Osztályzat_{grade}': round(np.interp(np.random.beta(1 + academic_apt * 4, 2), [0, 1], [2, 5]) * 4) / 4
        for grade in range(9, 13)
    }

    # Mandatory exam subjects
    history = gen_score(academic_apt, humanities_tendency)
    math = gen_score(academic_apt, science_tendency)
    hungarian = gen_score(academic_apt, humanities_tendency)

    # One elective science and one foreign language; -1 marks "not taken"
    chosen_science = np.random.choice(science_subjects, p=[0.5, 0.2, 0.3])
    chosen_language = np.random.choice(language_subjects, p=[0.75, 0.25])
    sci_scores = {s: gen_score(academic_apt, science_tendency) if s == chosen_science else -1 for s in science_subjects}
    lang_scores = {s: gen_score(academic_apt) if s == chosen_language else -1 for s in language_subjects}

    # Two subjects drawn as advanced-level ("emelt") exams:
    # 1 = advanced, 0 = standard level, -1 = subject not taken
    emelt_subjects = np.random.choice(all_possible_emelt_subjects, size=2, replace=False,
                                      p=[0.2, 0.05, 0.1, 0.25, 0.05, 0.1, 0.15, 0.1])
    emelt = {}
    for subject in all_possible_emelt_subjects:
        if subject in ['Matematika', 'Történelem', 'Magyar Nyelv és Irodalom']:
            emelt[subject + '_emelt'] = 1 if subject in emelt_subjects else 0
        elif subject in science_subjects and sci_scores[subject] != -1:
            emelt[subject + '_emelt'] = 1 if subject in emelt_subjects else 0
        elif subject in language_subjects and lang_scores[subject] != -1:
            emelt[subject + '_emelt'] = 1 if subject in emelt_subjects else 0
        else:
            emelt[subject + '_emelt'] = -1

    # Background and behavioural features (proxies of the latent traits)
    parental_edu = int(np.clip(np.round(socioeconomic * 4 + np.random.normal(0, 0.5)), 1, 5))
    prestige = np.round(np.clip(socioeconomic * 8 + np.random.normal(1, 1), 1, 10), 2)
    extracurriculars = int(np.random.rand() < humanities_tendency)
    habits = int(np.clip(np.round(motivation * 7 + np.random.normal(0, 1)), 1, 8))
    work_exp = int(np.random.rand() < (0.05 + 0.1 * (1 - socioeconomic)))
    recommendation = int(np.clip(np.random.poisson(1.5 + motivation), 1, 6))
    competitions = int(np.random.rand() < (0.1 + 0.5 * academic_apt * motivation))

    # Home county and its distance to the university
    county = random_county()
    lat, lon = county_centroids[county]
    distance = haversine(lat, lon, *uni_coords)

    # Stress decreases with better study habits and competition experience
    stress = 1 - np.tanh(0.4 * habits + 1.2 * competitions + np.random.normal(0, 0.2))
    stress = np.clip(stress, 0, 1)

    data.append({
        'Életkor': age, 'Nem': gender, **avg_grades,
        'Történelem': history, 'Matematika': math, 'Magyar Nyelv és Irodalom': hungarian,
        **sci_scores, **lang_scores, **emelt,
        'Szülői Végzettség': parental_edu, 'Középiskola Presztízse': prestige,
        'Extrakurrikuláris Tevékenységek': extracurriculars, 'Tanulási Szokások': habits,
        'Munkatapasztalat': work_exp, 'Ajánlások Száma': recommendation,
        'Versenyeken Való Részvétel': competitions, 'Vármegye': county,
        'Távolság': distance, 'Stressz Szint': stress
    })

df = pd.DataFrame(data)
```
Code: Feature Engineering.
The admission score was computed with a formula that combines several nonlinear transformations of the features (tanh, sine, and square root terms, plus products of exam scores with their advanced-level flags). This design meant that more complex models were required to capture the underlying patterns effectively.
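```python
import numpy as np

# Regional adjustment: Budapest gets a small bonus, Zala a small penalty, other counties 0
county_bias_map = {'Budapest Vármegye': 0.5, 'Zala Vármegye': -0.2}

score_list = []
# df_score: the feature table to score (its preparation is not shown)
for _, r in df_score.iterrows():
    bias = county_bias_map.get(r['Vármegye'], 0)
    # Elective subjects that were not taken hold -1 in both the score and the
    # '_emelt' column, so their terms still contribute to the sum.
    score = (
        # Exam results, with extra weight for advanced-level (emelt) exams
        6.0 * np.tanh(r['Matematika']) +
        4.5 * r['Matematika_emelt'] * r['Matematika'] +
        3.0 * r['Fizika'] +
        3.0 * r['Informatika'] +
        3.0 * r['Informatika_emelt'] * r['Informatika'] +
        2.0 * r['Fizika_emelt'] * r['Fizika'] +
        # Achievements, habits and school background
        2.0 * r['Versenyeken Való Részvétel'] +
        2.0 * np.tanh(r['Ajánlások Száma'] / 2) +
        2.5 * np.sin(r['Tanulási Szokások'] * np.pi / 8) +
        2.0 * r['Középiskola Presztízse'] +
        1.5 * np.sqrt(r['Osztályzat_11'] + 1e-3) +
        2.0 * np.sqrt(r['Osztályzat_12'] + 1e-3) +
        0.8 * r['Munkatapasztalat'] -
        # Penalties: distance from the university and stress level
        4.0 * np.sqrt(r['Távolság']) -
        2.5 * r['Stressz Szint'] +
        bias
    )
    score_list.append(score)

scores = np.array(score_list)
```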
Code: Score Calculation.
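The row-by-row `iterrows` loop is easy to read but slow on large tables. A vectorized equivalent can be sketched as below; `score_vectorized` and `COUNTY_BIAS` are hypothetical names, and the function is not code from the original notebook. It accepts any mapping from column name to array-like (a pandas DataFrame works, since `df[col]` returns a Series) and applies the same formula term by term to whole columns at once.

```python
import numpy as np

# Same regional adjustment as county_bias_map in the loop version
COUNTY_BIAS = {'Budapest Vármegye': 0.5, 'Zala Vármegye': -0.2}


def score_vectorized(cols, county_bias=COUNTY_BIAS):
    """Apply the admission-score formula to whole columns at once."""
    def col(name):
        # Column as a float array, so the arithmetic below is element-wise
        return np.asarray(cols[name], dtype=float)

    # Per-row county bias, 0 for counties not in the map
    bias = np.array([county_bias.get(c, 0.0) for c in cols['Vármegye']])
    return (
        6.0 * np.tanh(col('Matematika'))
        + 4.5 * col('Matematika_emelt') * col('Matematika')
        + 3.0 * col('Fizika')
        + 3.0 * col('Informatika')
        + 3.0 * col('Informatika_emelt') * col('Informatika')
        + 2.0 * col('Fizika_emelt') * col('Fizika')
        + 2.0 * col('Versenyeken Való Részvétel')
        + 2.0 * np.tanh(col('Ajánlások Száma') / 2)
        + 2.5 * np.sin(col('Tanulási Szokások') * np.pi / 8)
        + 2.0 * col('Középiskola Presztízse')
        + 1.5 * np.sqrt(col('Osztályzat_11') + 1e-3)
        + 2.0 * np.sqrt(col('Osztályzat_12') + 1e-3)
        + 0.8 * col('Munkatapasztalat')
        - 4.0 * np.sqrt(col('Távolság'))
        - 2.5 * col('Stressz Szint')
        + bias
    )
```

With pandas, `score_vectorized(df_score)` should match `scores` up to floating-point rounding while avoiding the per-row Python overhead.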
The Kaggle competition went well, and we ended up with a very promising leaderboard. Note that scores from this online round did not carry over to the on-site National Qualifier; the round served primarily as a filtering stage to ensure participants could meet deadlines and handle the basics. Shortly after the online round, we held the on-site qualifier for the remaining 14 students, who faced challenges across four key areas: Machine Learning, Computer Vision, Natural Language Processing, and Reinforcement Learning.
Side Quest
I had the lovely opportunity to give a talk about the Olympiads at the 2025 INFO Éra Conference in Hajdúszoboszló. My 30-minute presentation focused on student success stories, the opportunities these events create, and the broader mission behind our work.
The organizers were truly some of the kindest people I’ve met. We were warmly welcomed, treated to delicious food, and even surprised with some backstage pálinka to wrap up the session, a perfect way to end our section of the event. Beyond the lecture itself, it was also a great chance for me and my colleagues to travel a bit and promote what we do in a relaxed, welcoming environment.

Photo: László Gulyás (ELTE).
National Qualifier
Summer Camp
The International Olympiad
This year’s International Olympiad in Artificial Intelligence will take place in Beijing, China, from August 2nd to 9th. The format has undergone a major shift, moving from a team-based competition to a fully individual one. However, the collaborative spirit lives on through an additional team round, which, while not influencing individual scores, encourages creativity.
A major highlight of this year’s competition is the use of NVIDIA’s Isaac Sim platform. Students will work on embodied AI applications using this tool and present their solutions during the event.
Another key change is that the take-home exercises no longer contribute to the final score. Even so, I believe they remain a valuable part of the Olympiad: since most students are not formally introduced to AI in high school, these preparatory tasks provide much-needed context and practice. For further details, refer to the organizers’ website: [IOAI 2025]
TBC…
1659 Words
04/30/2025 (Last updated: 2025-04-30 18:57:57 +0200)