Case Study
KInIT (Kempelen Institute of Intelligent Technologies) is an independent research institute focused on advancing artificial intelligence through a combination of cutting-edge research and real-world impact. Our mission is to develop trustworthy, human-centric AI technologies and to support their effective adoption in practice. Positioned at the intersection of academia and industry, KInIT plays a key role in strengthening the AI ecosystem in Slovakia and contributing to broader European efforts in building sovereign and competitive AI capabilities.
At KInIT, we focus on some of the most pressing challenges in contemporary AI research. A major part of our work is dedicated to generative AI and large language models, where we investigate not only their performance but also their limitations and risks. This includes evaluating how such models behave across different tasks, how they perform in low-resource languages such as Slovak, how they can be adapted efficiently to domain-specific use cases and different tasks, and how they can be aligned to reflect requirements on specific behavior (e.g., not to generate harmful content).
An important aspect of this research is understanding the societal implications of AI systems. For example, we study the ability of generative models to produce disinformation, as well as methods for detecting and mitigating such risks. At the same time, we explore efficient adaptation techniques, including various forms of fine-tuning and parameter-efficient learning, enabling organisations to deploy advanced AI systems without prohibitive computational costs.
Research at KInIT is primarily application-oriented. By researching and pushing the frontiers of various state-of-the-art methods, we build deep expertise that can be transferred into practice through collaborations with industry and public sector partners. Our research is applied across a range of domains, including natural language processing, multimodal AI, disinformation analysis, recommender systems, predictive modeling, and predictions related to financial data.
This research and its applications require not only strong methodological foundations but also the ability to experiment at scale, validate solutions, and deploy them in suitable environments.
To support this broad spectrum of activities, KInIT needs a comprehensive and versatile computing infrastructure based on open technologies. The infrastructure will serve as a foundation for the entire AI lifecycle – from data processing and experimentation to large-scale model training and deployment.
Unlike traditional HPC environments that are primarily optimized for batch processing, our infrastructure is designed to support both research experimentation and production-grade AI services. Built on an OpenStack-based architecture, it allows researchers to flexibly allocate resources, run complex workloads, and deploy services in isolated environments, including isolated on-demand Kubernetes clusters.
This enables a seamless workflow in which models can be developed, tested, scaled, and deployed within a single integrated platform. Researchers can start with small-scale experiments, iterate rapidly, and then scale their workloads to more powerful configurations as needed.
The computational capabilities of the infrastructure allow us to explore state-of-the-art and emerging AI paradigms. This includes training and evaluating large-scale generative models, experimenting with multimodal and multi-agent systems, and investigating advanced approaches such as reinforcement learning or physics-informed models.
In addition, the infrastructure supports research into explainability and mechanistic interpretability, helping us better understand how complex AI systems operate internally. This is an important component of developing trustworthy AI systems, as it enables deeper insights into model behavior, decision-making processes, and potential failure modes.
The ability to run computationally intensive experiments locally is essential for maintaining research agility and independence, while also allowing us to prepare and optimize workloads before scaling them to external high-performance computing systems when needed (e.g., national or European HPC infrastructure).
One of the key strengths of our infrastructure is its versatility. It is not limited to experimentation but is equally suited for deploying and operating AI systems in production-like environments. This includes hosting machine learning models, serving APIs, and running complex AI pipelines that integrate multiple components.
The platform’s design provides adequate infrastructural support for complex distributed setups like multi-agent systems, enabling experimentation with advanced AI architectures while also providing the reliability and scalability required for real-world applications. This makes it possible to bridge the gap between research prototypes and deployable solutions, accelerating the transfer of knowledge into practice.
Requirements were defined and prioritized as follows:
dNation won the tender for the cluster installation. The following text describes how they addressed the requirements above.
The Yaook distribution of OpenStack was used for seamless installation and operations. Yaook has been developed by ALASCA (Association for Operational, Open Cloud Infrastructures e.V.), a non-profit consortium.
Sovereign Cloud Stack (SCS) is a European non-profit initiative that creates an open, transparent, and vendor-neutral cloud ecosystem. One of its activities, covered by the Forum SCS-Standards, is to define, document, and develop standards and certifications that ensure the expected level of quality of sovereign clusters. Yaook is an SCS-certified solution.
Yaook runs OpenStack containerized within Kubernetes running on bare metal nodes:
Yaook’s architecture allows:
To provide a fast network, the following leaf-spine network topology was used:
Each cable from a node to a leaf switch provides 100 Gbps of network speed. Two cables are combined at the software level to form a bond with 200 Gbps of network speed.
Leaf switches are connected at layer 2 into a virtual chassis using the MLAG protocol, while communication with the spine switches takes place at layer 3.
If one cable in a bond fails, the other cable is used transparently. This improves both speed and resiliency, as there is no single point of failure.
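The bond arithmetic above can be illustrated with a minimal sketch. The `Bond` class and its field names are hypothetical and not part of the actual cluster tooling. The sketch assumes that both links carry traffic at the same time, so the aggregate is the sum of the healthy links; depending on the bonding mode, a single traffic flow may still be limited to the speed of one link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    """Hypothetical model of a software bond made of equal-speed links."""

    link_gbps: int         # speed of one cable, e.g. 100
    links_total: int       # number of cables in the bond, e.g. 2
    links_failed: int = 0  # number of cables currently down

    def capacity_gbps(self) -> int:
        """Aggregate capacity of the links that are still healthy."""
        healthy = max(self.links_total - self.links_failed, 0)
        return healthy * self.link_gbps

    def is_up(self) -> bool:
        """The bond stays usable as long as at least one link works."""
        return self.capacity_gbps() > 0


healthy = Bond(link_gbps=100, links_total=2)
degraded = Bond(link_gbps=100, links_total=2, links_failed=1)

print(healthy.capacity_gbps())   # both cables active
print(degraded.capacity_gbps())  # one cable failed, bond still up
print(degraded.is_up())
```

With both cables healthy the model reports 200 Gbps. With one cable down it reports 100 Gbps and the bond remains up, which is the failover behavior described above.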
Multiple storage types with 1 PB raw capacity have been used to support various storage needs:
1. Local NVMe
2. Redundant network Ceph storage
3. Cold storage
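Raw capacity is not the same as usable capacity once redundancy is applied. The sketch below shows the general relationship for two common redundancy schemes. The function names are hypothetical, and the replica count and erasure-coding profile are illustrative assumptions; the source does not state how the 1 PB is split between the storage types or which redundancy settings are used.

```python
def usable_replicated(raw_tb: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


def usable_erasure_coded(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity with k data chunks and m coding chunks per object."""
    if k < 1 or m < 0:
        raise ValueError("k must be >= 1 and m must be >= 0")
    return raw_tb * k / (k + m)


# Illustration only: treats the whole 1 PB (1000 TB) as a single pool,
# which is an assumption and not the actual cluster layout.
RAW_TB = 1000.0
print(round(usable_replicated(RAW_TB, replicas=3), 1))
print(round(usable_erasure_coded(RAW_TB, k=4, m=2), 1))
```

Under these assumptions, three-way replication leaves about one third of the raw capacity usable, while a 4+2 erasure-coding profile leaves about two thirds. Local NVMe and cold storage follow their own rules and are not modeled here.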
The cluster is protected from the Internet by OPNsense, an open source, FreeBSD-based firewall and routing software.
ALASCA Arko is a standardized platform for efficient monitoring of hybrid cloud infrastructures. We use it for continuous cluster monitoring, alerting, and support of Day 2 operations.
Arko follows these design principles:
OpenStack does not provide sufficient built-in reporting capabilities, so we used RacStack, a billing dashboard for OpenStack developed by Pacifico Digital Explorations.
Do not hesitate to contact us:
For AI research-related questions: info@kinit.sk
For infrastructure-related questions: cloud@dNation.cloud