Tamás Takács

International Olympiad in Artificial Intelligence (IOAI) 2025


To be honest, we got off to a late start this year. Planning should have kicked off right after the IAIO wrapped up, but last year was so intense that everyone needed a break, and maybe we took that a bit too seriously. It wasn’t until January that we started pulling the team back together, and even then only in bits and pieces. After months of pause, some team members had moved on, so this year’s organizing journey definitely had a rocky beginning.

[NJSZT]


The National Olympiad

Since we started organizing later than planned, our communication with schools, students, and sponsors was also delayed. This meant we had to quickly jump into every communication channel we could, trying to reach as many people as possible in a short time. On top of that, we kicked off the year with just four core team members, which meant extra effort was needed just to stay coordinated among ourselves.

For student preparation, we opted for a more asynchronous approach this year. Many of the organizers are juggling teaching duties and PhD work, making regular live sessions difficult to sustain. So instead, we curated a broad set of topics, including Machine Learning, Deep Learning, Computer Vision, NLP, Reinforcement Learning, and various AI tools, and created a wide range of practice materials in Hungarian. These were sent to students to work on at their own pace and were designed to resemble both past Olympiad tasks and national qualifier problems.

Our Discord server is thankfully still alive and well. There’s a small group of students who are quite active there, which definitely lifts a weight off my shoulders—I often feel like I’m not the best at online communication with students, so it’s encouraging to see engagement. Hopefully we can keep that momentum going.

Besides the national on-site qualifier, we also launched a small Kaggle competition as an extra step in the selection process. This online challenge gave students two hours to solve a task without heavy supervision. The goal was to see how well they could navigate Kaggle, open a notebook, run it, and tweak the starter code provided. The task was a binary classification problem evaluated by ROC AUC, and students competed live on a public leaderboard. The dataset was fully synthetic and centered around university admissions—a little side project I built myself for this purpose.
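For readers unfamiliar with the metric: ROC AUC can be read as the probability that a randomly chosen admitted student gets a higher predicted score than a randomly chosen rejected one. The sketch below computes it straight from that definition in plain Python. The function name `roc_auc` and the toy data are illustrative assumptions, not the competition's evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(P * N), which is fine for
    small illustrative inputs but not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up labels (1 = admitted) and predicted probabilities
y_true = [1, 0, 1, 0, 0]
y_pred = [0.9, 0.3, 0.4, 0.5, 0.1]
auc = roc_auc(y_true, y_pred)
print(f"ROC AUC: {auc:.3f}")
```

A score of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is why the metric suits a task where only the ordering of applicants matters.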

Kaggle

Snippet of the Leaderboard. Photo: Me 😊.


Online Round

The code behind the task was admittedly a bit convoluted, but intentionally so. I aimed to create a problem that was approachable yet challenging, where students could achieve a high ROC AUC score with smart application of ML models. I incorporated geospatial data using a GeoJSON file and included a key feature: the distance between a student’s home and the university they were applying to. In general, the closer someone lived to the university, the better their chances of admission.

Beyond that, I added a mix of features, some that had no real impact on the outcome, like gender or age, and others that had either positive or negative correlations with admission. I also introduced correlations between features using proxy variables, so the data wasn’t fully independent. From this setup, I generated a final admission score, applying a fixed threshold to select the top 8% and a probabilistic threshold for the next 33% to introduce a bit of noise into the dataset.

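import numpy as np
import pandas as pd

# county_centroids, haversine, uni_coords, the subject lists, num_students and
# data are assumed to be defined earlier in the script; they are not shown here.

def random_county():
    """50% Zipf-distributed, 50% fully uniform random county choice."""
    county_list = list(county_centroids.keys())

    if np.random.rand() < 0.5:
        # Draw Zipf-distributed ranks, clip them to valid list positions,
        # then pick one of the draws at random (ranks are 1-based).
        zipf_indices = np.random.zipf(a=1.3, size=len(county_list))
        zipf_indices = np.clip(zipf_indices, 1, len(county_list))
        return county_list[zipf_indices[np.random.randint(0, len(zipf_indices))] - 1]
    else:
        return np.random.choice(county_list)

def gen_score(a, t=0.5):
    """Return an integer score in [0, 100]; higher a and t shift the Beta draw upward."""
    raw = np.clip(np.random.beta(1 + a * 10 + t * 3, 1 + (1 - a) * 3), 0, 1)
    return int(np.round(raw * 100))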

for _ in range(num_students):
    age = np.random.randint(17, 20)
    gender = np.random.choice([0, 1])

    academic_apt = np.clip(np.random.normal(0.6, 0.15), 0, 1)
    socioeconomic = np.clip(np.random.normal(0.6, 0.2), 0, 1)
    motivation = np.clip(np.random.normal(0.6, 0.2), 0, 1)
    humanities_tendency = np.clip(np.random.normal(academic_apt * 0.8 + motivation * 0.2, 0.1), 0, 1)
    science_tendency = np.clip(np.random.normal(academic_apt * 0.8 + (1 - motivation) * 0.2, 0.1), 0, 1)

    avg_grades = {
        f'Osztályzat_{grade}': round(np.interp(np.random.beta(1 + academic_apt * 4, 2), [0, 1], [2, 5]) * 4) / 4
        for grade in range(9, 13)
    }

    history = gen_score(academic_apt, humanities_tendency)
    math = gen_score(academic_apt, science_tendency)
    hungarian = gen_score(academic_apt, humanities_tendency)

    chosen_science = np.random.choice(science_subjects, p=[0.5, 0.2, 0.3])
    chosen_language = np.random.choice(language_subjects, p=[0.75, 0.25])
    sci_scores = {s: gen_score(academic_apt, science_tendency) if s == chosen_science else -1 for s in science_subjects}
    lang_scores = {s: gen_score(academic_apt) if s == chosen_language else -1 for s in language_subjects}

    emelt_subjects = np.random.choice(all_possible_emelt_subjects, size=2, replace=False,
                                      p=[0.2, 0.05, 0.1, 0.25, 0.05, 0.1, 0.15, 0.1])
    emelt = {}
    for subject in all_possible_emelt_subjects:
        if subject in ['Matematika', 'Történelem', 'Magyar Nyelv és Irodalom']:
            emelt[subject + '_emelt'] = 1 if subject in emelt_subjects else 0
        elif subject in science_subjects and sci_scores[subject] != -1:
            emelt[subject + '_emelt'] = 1 if subject in emelt_subjects else 0
        elif subject in language_subjects and lang_scores[subject] != -1:
            emelt[subject + '_emelt'] = 1 if subject in emelt_subjects else 0
        else:
            emelt[subject + '_emelt'] = -1

    parental_edu = int(np.clip(np.round(socioeconomic * 4 + np.random.normal(0, 0.5)), 1, 5))
    prestige = np.round(np.clip(socioeconomic * 8 + np.random.normal(1, 1), 1, 10), 2)
    extracurriculars = int(np.random.rand() < humanities_tendency)
    habits = int(np.clip(np.round(motivation * 7 + np.random.normal(0, 1)), 1, 8))
    work_exp = int(np.random.rand() < (0.05 + 0.1 * (1 - socioeconomic)))
    recommendation = int(np.clip(np.random.poisson(1.5 + motivation), 1, 6))
    competitions = int(np.random.rand() < (0.1 + 0.5 * academic_apt * motivation))
    county = random_county()
    lat, lon = county_centroids[county]
    distance = haversine(lat, lon, *uni_coords)

    stress = 1 - np.tanh(0.4 * habits + 1.2 * competitions + np.random.normal(0, 0.2))
    stress = np.clip(stress, 0, 1)

    data.append({
        'Életkor': age, 'Nem': gender, **avg_grades,
        'Történelem': history, 'Matematika': math, 'Magyar Nyelv és Irodalom': hungarian,
        **sci_scores, **lang_scores, **emelt,
        'Szülői Végzettség': parental_edu, 'Középiskola Presztízse': prestige,
        'Extrakurrikuláris Tevékenységek': extracurriculars, 'Tanulási Szokások': habits,
        'Munkatapasztalat': work_exp, 'Ajánlások Száma': recommendation,
        'Versenyeken Való Részvétel': competitions, 'Vármegye': county,
        'Távolság': distance, 'Stressz Szint': stress
    })

df = pd.DataFrame(data)

Code: Feature Engineering.
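The `haversine` helper called in the snippet above is not shown in the post. A minimal sketch of what such a function could look like, assuming great-circle distance in kilometres on a spherical Earth with a mean radius of 6371 km. The coordinates in the usage lines are rough, illustrative values and not the ones used in the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against tiny floating-point overshoots above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical coordinates: a university roughly in Budapest, a home roughly in Szeged
uni_coords = (47.47, 19.06)
home_coords = (46.25, 20.15)
distance_km = haversine(*home_coords, *uni_coords)
print(f"Distance: {distance_km:.1f} km")
```

The argument order mirrors the call `haversine(lat, lon, *uni_coords)` in the generator, so the sketch could be dropped in without changing the calling code.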

The scores were calculated with a nonlinear formula that combines several nonlinear functions (tanh, sine, square root) and feature interactions into the final score. This design meant that more complex models were required to capture the underlying patterns effectively.

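score_list = []
# Small additive offsets for selected counties; every other county gets 0
county_bias_map = {'Budapest Vármegye': 0.5, 'Zala Vármegye': -0.2}
# df_score is the feature frame used for scoring (its construction is not shown here)
for _, r in df_score.iterrows():
    bias = county_bias_map.get(r['Vármegye'], 0)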
    score = (
        6.0 * np.tanh(r['Matematika']) +
        4.5 * r['Matematika_emelt'] * r['Matematika'] +
        3.0 * r['Fizika'] +
        3.0 * r['Informatika'] +
        3.0 * r['Informatika_emelt'] * r['Informatika'] +
        2.0 * r['Fizika_emelt'] * r['Fizika'] +
        2.0 * r['Versenyeken Való Részvétel'] +
        2.0 * np.tanh(r['Ajánlások Száma'] / 2) +
        2.5 * np.sin(r['Tanulási Szokások'] * np.pi / 8) +
        2.0 * r['Középiskola Presztízse'] +
        1.5 * np.sqrt(r['Osztályzat_11'] + 1e-3) +
        2.0 * np.sqrt(r['Osztályzat_12'] + 1e-3) +
        0.8 * r['Munkatapasztalat'] -
        4.0 * np.sqrt(r['Távolság']) -
        2.5 * r['Stressz Szint'] +
        bias
    )
    score_list.append(score)
scores = np.array(score_list)

Code: Score Calculation.
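The labeling step mentioned earlier (a fixed cut for the top 8% and a probabilistic one for the next 33%) is not shown in the post. One way it could be sketched in plain Python is below. The function name `assign_labels`, the 50% admission chance in the middle band, and the random demo scores are all assumptions made for illustration; the actual rule used for the dataset may differ.

```python
import random


def assign_labels(scores, top_frac=0.08, mid_frac=0.33, mid_prob=0.5, seed=0):
    """Turn raw scores into 0/1 admission labels.

    - top `top_frac` of scores: always admitted
    - next `mid_frac` of scores: admitted with probability `mid_prob` (assumed value)
    - everyone else: rejected
    """
    rng = random.Random(seed)
    n = len(scores)
    # Indices sorted from the highest score to the lowest
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_top = int(round(n * top_frac))
    n_mid = int(round(n * mid_frac))

    labels = [0] * n
    for rank, i in enumerate(order):
        if rank < n_top:
            labels[i] = 1
        elif rank < n_top + n_mid:
            labels[i] = int(rng.random() < mid_prob)
    return labels


# Demo on made-up scores (standard normal draws, not the real score formula)
demo_rng = random.Random(42)
demo_scores = [demo_rng.gauss(0.0, 1.0) for _ in range(1000)]
demo_labels = assign_labels(demo_scores)
print(f"Admitted: {sum(demo_labels)} of {len(demo_labels)}")
```

The random middle band is what keeps the task from being perfectly separable: two students with nearly identical scores can end up with different labels, so no model can reach a ROC AUC of exactly 1.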

The Kaggle competition went well, and we ended up with a very promising leaderboard. It’s important to note that the final scores from this online round didn’t carry over to the on-site National Qualifier; the round served primarily as a filtering stage to ensure participants could meet deadlines and handle the basics. Shortly after the online round, we held the on-site qualifier for the remaining 14 students, who faced challenges across key areas: Machine Learning, Computer Vision, Natural Language Processing, and Reinforcement Learning.

Side Quest

I had the lovely opportunity to give a talk about the Olympiads at the 2025 INFO Éra Conference in Hajdúszoboszló. My 30-minute presentation focused on student success stories, the opportunities these events create, and the broader mission behind our work.

The organizers were truly some of the kindest people I’ve met. We were warmly welcomed, treated to delicious food, and even surprised with some backstage pálinka to wrap up the session. It was a perfect way to end our section of the event. Beyond the lecture itself, it was also a great chance for me and my colleagues to travel a bit and promote what we do in a relaxed, welcoming environment.

InfoEra

Photo: László Gulyás (ELTE).


National Qualifier


Summer Camp


The International Olympiad

This year’s International Olympiad in Artificial Intelligence will take place in Beijing, China, from August 2nd to 9th. The format has undergone a major shift, moving from a team-based competition to a fully individual one. However, the collaborative spirit lives on through an additional team round, which, while not influencing individual scores, encourages creativity.

A major highlight of this year’s competition is the use of NVIDIA’s Isaac Sim platform. Students will work on embedded AI applications using this tool and present their solutions during the event.

Another key change is that the take-home exercises no longer contribute to the final score. Despite this, I believe they remain a valuable component of the Olympiad. Given that most students are not formally introduced to AI in high school, these preparatory tasks provide much-needed context and practice. For further details, refer to the organizer’s website: [IOAI 2025]


TBC…

04/30/2025 (Last updated: 2025-04-30 18:57:57 +0200)


Licensed under  CC BY-NC-ND 4.0 . © Tamás Takács, 2025.