TL;DR
Portugal’s state-funded AMÁLIA large language model is now operational and beats many open models on Portuguese benchmarks, but experts question its openness, the sufficiency of its native-language data, and its optimization goals. These issues highlight broader challenges for European sovereign LLMs.
Portugal’s €5.5 million investment in the AMÁLIA large language model has resulted in a functioning, benchmark-beating system that is now accessible to thousands of academic users. However, critical questions about its openness, native-language data, and strategic goals remain unanswered, raising concerns about the broader European sovereign-LLM movement.
AMÁLIA is a consortium project involving approximately 60 researchers from Portugal’s top institutions, including NOVA and IST. It was announced in December 2024, and the base version was completed by September 2025. The model, which handles Portuguese text, is built as a continuation of the EuroLLM multilingual foundation rather than trained from scratch. It has demonstrated strong performance on Portuguese benchmarks, surpassing many open models and beating Qwen 3-8B on most pt-PT tasks.
Despite its technical achievements, questions persist about how open the model truly is, given its development under a government-funded initiative with public accountability. Additionally, the model’s native Portuguese data constitutes about 5.8 billion tokens out of 107 billion in extended pre-training, raising doubts about whether this amount is sufficient for truly native-language competence. Finally, there is debate about what the model is optimized for, with some experts questioning whether current training strategies align with long-term strategic goals for Portuguese AI development.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmark tasks. So why are the most important questions still unanswered?
Last month, Duarte O. Carmo published the sharpest public analysis of AMÁLIA, Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly, and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O. Carmo’s framing maps cleanly onto the structural argument: openness (Q1), native-data volume (Q2), and optimization target (Q3). Each question lands specifically in AMÁLIA.
The three questions form a structural feedback loop. Q3 (optimization target) determines Q2 (data volume needed), which in turn conditions Q1 (whether openness is sufficient for community contribution). The European sovereign-LLM movement collectively benefits from these questions becoming standard methodology disclosure, not exceptional critique.

107 billion tokens. 5.8 billion clearly pt-PT.
This is the structurally tractable question, and it has a structurally surprising answer. For a model whose entire stated purpose is European Portuguese prioritization, the native-language share of extended pre-training is roughly 5.4% on the figures quoted here (5.8 billion of 107 billion tokens). The implications cascade into every other question.
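The share can be checked directly from the two token counts. The snippet below is a minimal sketch, assuming the rounded figures quoted in this piece; the function and constant names are illustrative, not taken from the AMÁLIA project. With those inputs the result is within rounding distance of the share stated above.

```python
# Token counts as quoted in the article (rounded, in billions of tokens).
TOTAL_TOKENS_B = 107.0   # extended pre-training budget
PT_PT_TOKENS_B = 5.8     # tokens identified as clearly European Portuguese


def native_share(native_b: float, total_b: float) -> float:
    """Return the native-language share of a token budget as a percentage."""
    if total_b <= 0:
        raise ValueError("total token count must be positive")
    if not 0 <= native_b <= total_b:
        raise ValueError("native token count must lie between 0 and the total")
    return 100.0 * native_b / total_b


share = native_share(PT_PT_TOKENS_B, TOTAL_TOKENS_B)
print(f"pt-PT share of extended pre-training: {share:.1f}%")
```

Because both inputs are rounded, the result should be read as an approximation; the exact share depends on the unrounded token counts.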

The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project defines what “fully open” operationally requires. Olmo doesn’t lead frontier benchmarks. That’s not the point. The point is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.
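One way to make “fully open” checkable is a per-artifact checklist. The sketch below is an assumption, not Olmo’s or AMÁLIA’s official criteria: the artifact list reflects what the Olmo releases are generally described as including (weights, training data, training code, intermediate checkpoints, evaluation code, training logs), and the class and function names are hypothetical. Every value for AMÁLIA is deliberately left unknown rather than guessed.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class OpennessChecklist:
    """Hypothetical checklist: True = released, False = withheld, None = not publicly documented."""
    model_weights: Optional[bool] = None
    training_data: Optional[bool] = None
    training_code: Optional[bool] = None
    intermediate_checkpoints: Optional[bool] = None
    evaluation_code: Optional[bool] = None
    training_logs: Optional[bool] = None


def unanswered(checklist: OpennessChecklist) -> List[str]:
    """List the artifacts whose release status is still unknown."""
    return [f.name for f in fields(checklist) if getattr(checklist, f.name) is None]


def is_fully_open(checklist: OpennessChecklist) -> bool:
    """Only a checklist with every artifact confirmed released counts as fully open."""
    return all(getattr(checklist, f.name) is True for f in fields(checklist))


# Illustrative only: no status is asserted for AMÁLIA, so every field stays None.
amalia = OpennessChecklist()
print("Fully open:", is_fully_open(amalia))
print("Still to be documented:", ", ".join(unanswered(amalia)))
```

Treating an unknown status as “not yet open” is a design choice: it puts the burden of documentation on the project making the claim, which matches the disclosure standard argued for here.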

Four strategic positions. AMÁLIA between two and three.
More than €100M in publicly disclosed European sovereign-LLM funding is spread across the major initiatives. The structural question every project faces: what is the actual competitive position you’re staking? There are four options, none mutually exclusive, but each requires different commitments.

Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O. Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed across every European sovereign-LLM project, not just AMÁLIA.
Impact of Unanswered Structural Questions on European AI
The unresolved questions about AMÁLIA’s openness, native data, and objectives are emblematic of broader challenges facing Europe’s sovereign-language AI efforts. These issues influence national policy, strategic autonomy, and the future of local language AI development. Addressing them is critical for ensuring that models like AMÁLIA meet both technical and societal expectations, and that European countries can confidently rely on their own AI systems without overdependence on external providers.
European Sovereign-Language Model Development Landscape
Across Europe, multiple countries are investing in native-language large language models, including Italy’s Minerva, models from Germany’s Aleph Alpha and France’s Mistral, and cross-border initiatives like OpenEuroLLM. These efforts are driven by strategic goals of technological sovereignty and linguistic preservation. However, they often face common challenges: defining openness, sourcing sufficient native data, and aligning model objectives with national priorities. Portugal’s AMÁLIA exemplifies these issues. It is one of the most closely scrutinized state-funded efforts to date, and its development is tied directly to public accountability and national strategy.
“AMÁLIA is an impressive piece of work, but it raises fundamental questions about openness, data, and goals that the community must address.”
– Duarte O. Carmo
Key Unknowns in AMÁLIA’s Development and Strategy
It is not yet clear how open AMÁLIA truly is, given the scope of its data sources and licensing. The sufficiency of native Portuguese data remains debated, and the strategic objectives of the model—whether it prioritizes openness, performance, or strategic autonomy—are still under discussion. Additionally, the final version’s capabilities and alignment with Portugal’s long-term AI goals are yet to be fully revealed, with several gaps expected to be addressed before June 2026.
Upcoming Milestones and Critical Evaluations for AMÁLIA
The next 12-24 months will be pivotal for AMÁLIA, with the final version scheduled for release in June 2026. During this period, researchers and policymakers will scrutinize its native data sources, assess its openness, and clarify its strategic objectives. Further benchmarks and real-world applications will inform whether AMÁLIA can serve as a model for other European languages and sovereign AI initiatives. Public and expert evaluations are expected to intensify as the project approaches its final release.
Key Questions
What are the main concerns about AMÁLIA’s openness?
Experts question whether the model’s “fully open source” claim holds up against an operational standard such as Olmo’s, given the open questions about its data sources and licensing.
How does AMÁLIA compare to other European language models?
AMÁLIA outperforms most open models on Portuguese benchmarks and beats Qwen 3-8B on many tasks, but still trails on some benchmarks like ALBA, indicating room for improvement.
Why is native-language data important for AMÁLIA?
Native data is crucial for the model to understand cultural nuances, idiomatic expressions, and context-specific language use, impacting its usefulness and accuracy for Portuguese speakers.
What are the strategic implications of AMÁLIA’s development?
AMÁLIA’s progress reflects Portugal’s aim for technological sovereignty and linguistic preservation, but uncertainties about its objectives could influence future policy and funding decisions.
When will the final version of AMÁLIA be available?
The final version is scheduled for release in June 2026, with ongoing evaluations and potential adjustments in the months leading up to it.
Source: ThorstenMeyerAI.com