MINA

Multilingual legal assistant agent for access to justice
ACL 2026 · 📑 Findings
Multilingual Legal NLP
LLM Agent · Two-Stage RAG
Bengali · English

MINA: A Multilingual LLM-Powered Legal Assistant Agent for Empowering Access to Justice in Bangladesh

Azmine Toushik Wasi, Wahid Faisal, Mst Rafia Islam, Md Rizwan Parvez

Computational Intelligence and Operations Laboratory (CIOL) • Cohere Labs Community • Shahjalal University of Science and Technology (SUST) • Independent University, Bangladesh (IUB) • Qatar Computing Research Institute (QCRI)

Correspondence: azmine32@student.sust.edu

Accepted to the Findings of the Association for Computational Linguistics: ACL 2026

📄 arXiv

Bangladesh's low-income population faces major barriers to affordable legal advice due to complex legal language, procedural opacity, and high costs. Existing AI legal assistants lack Bengali support and jurisdiction-specific adaptation. We present MINA, a multilingual LLM-based legal assistant for the Bangladeshi context. It combines multilingual embeddings with a RAG-based chain-of-tools framework for retrieval, reasoning, translation, and document generation, delivering context-aware legal drafts, citations, and plain-language explanations through an interactive chat interface. Evaluated by law faculty across the 2022 and 2023 Bangladesh Bar Council Examinations, MINA scored 75-80% on the MCQ, written, and simulated viva voce components, matching or surpassing average human performance while operating at roughly 0.1-0.6% of the cost of a human lawyer.

Figure: MINA system architecture. The Orchestrator Agent checks context and routes a Bengali or English query to a two-stage RAG Agent over an Acts and Sections vector database of Bangladeshi statutes, with a selective tool suite invoked only when needed, to produce a final grounded legal answer.

The Access-to-Justice Crisis

Bangladesh follows a Common Law system rooted in colonial-era codes such as the Code of Civil Procedure, 1908 and the Penal Code, 1860, with Farsi-influenced terminology that general models cannot interpret reliably. The judiciary carries a large case backlog while serving a population of more than 170 million, and unregulated fees place formal representation out of reach for most. The result is a structural access-to-justice gap that disproportionately affects marginalized groups.

  • 3.7-4.4M: pending cases in the judicial backlog
  • 2,100: judges nationwide (about 1 per 90,000 people)
  • 10-60 yr: typical delay before case resolution
  • 2k-10k ৳: cost of a single basic legal consultation

Legal NLP can narrow this gap by simplifying complex texts, automating document analysis, and providing accessible guidance. Yet existing tools are inadequate for Bangladesh: Bengali suffers from scarce annotated data and limited tooling, while English-centric models struggle with Bengali morphology, archaic legal registers, and jurisdiction-specific doctrine. MINA targets exactly this combination of linguistic, legal, and socio-economic constraints.

The System: An Agentic, RAG-Grounded Pipeline

MINA is anchored by an Orchestrator Agent that evaluates the user input, chat history, and any uploaded documents to select a response pathway. When internal context is insufficient, it delegates to a RAG Agent that retrieves relevant Acts and Sections using LLM-generated keywords and multilingual embeddings over Chroma vector stores. External tools are used only when necessary.

Both agents run inside a LangGraph-based state machine that maintains persistent memory across turns, supporting multi-turn consultations and conditional execution. A custom legal dictionary interprets colonial-era and Farsi-influenced terms, and a socio-economic simulation module contextualizes outcomes within structural inequality.
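
The routing decision described above can be sketched in a few lines of dependency-free Python. This is an illustration of the control flow only, not the LangGraph implementation: `ConsultationState`, `context_is_sufficient`, `route`, and `run_turn` are hypothetical names, and the sufficiency check is a crude term-overlap placeholder for what is presumably an LLM judgment in the real system.

```python
from dataclasses import dataclass, field


@dataclass
class ConsultationState:
    """State carried across turns (hypothetical field names)."""
    history: list = field(default_factory=list)    # (query, answer) pairs
    documents: list = field(default_factory=list)  # text of uploaded files


def context_is_sufficient(query: str, state: ConsultationState) -> bool:
    # Placeholder heuristic: every query term already appears in the
    # uploaded documents or in earlier answers.
    terms = query.lower().split()
    context = " ".join(state.documents + [a for _, a in state.history]).lower()
    return bool(terms) and all(t in context for t in terms)


def route(query: str, state: ConsultationState) -> str:
    """Pick a response pathway for this turn."""
    if context_is_sufficient(query, state):
        return "answer_directly"
    return "rag_agent"


def run_turn(query, state, rag_agent, answer_directly):
    """Run one turn and persist it so later turns can reuse the context."""
    path = route(query, state)
    handler = answer_directly if path == "answer_directly" else rag_agent
    answer = handler(query, state)
    state.history.append((query, answer))
    return path, answer
```

With an empty state the first question is delegated to the RAG agent; a follow-up whose terms are already covered by the stored answer is handled from context. In a graph framework the same decision would typically live in a conditional edge between nodes.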

Two-stage retrieval, separated to prevent statute conflation

01 — ACT LEVEL

Act retrieval

Semantic keywords retrieve the top n=5 Acts from a database of LLM-generated Act summaries, giving broad coverage of the relevant legislative units.

595 Acts
02 — SECTION LEVEL

Section retrieval

Retrieved Act IDs filter a chunked Section database to find the top n=10 provisions, providing precise grounding at the Section level.

18,023 Sections
03 — QUALITY CHECK

Relevance assessment

An LLM filtering step marks each retrieved Section as relevant or irrelevant. If the relevant Sections are insufficient, keywords are refined and retrieval is rerun, which reduces misapplied statutes.

refine and rerun

Two separate vector databases (Acts and Sections) prevent the naive failure mode in which content from unrelated Acts is combined. Retrieval uses Cohere embed-multilingual-light-v3.0, which supports semantic similarity across Bengali and English.
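
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: a bag-of-words `embed` function stands in for the multilingual embedding model, plain Python lists stand in for the Chroma stores, and `is_relevant` and `refine` are hypothetical callbacks in place of the LLM relevance filter and keyword refinement. Only the defaults n=5 and n=10 come from the description above.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in for a multilingual embedding model: a bag-of-words vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, items, text_of, k):
    """Return the k items whose text is most similar to the query."""
    ranked = sorted(items, key=lambda it: cosine(query_vec, embed(text_of(it))),
                    reverse=True)
    return ranked[:k]


def two_stage_retrieve(keywords, act_summaries, sections, n_acts=5, n_sections=10):
    query_vec = embed(" ".join(keywords))
    # Stage 1: rank Act summaries and keep the top Acts.
    acts = top_k(query_vec, list(act_summaries.items()), lambda kv: kv[1], n_acts)
    act_ids = {act_id for act_id, _ in acts}
    # Stage 2: only Sections belonging to the retrieved Acts are candidates,
    # so provisions from unrelated Acts cannot be mixed in.
    candidates = [s for s in sections if s["act_id"] in act_ids]
    return top_k(query_vec, candidates, lambda s: s["text"], n_sections)


def retrieve_with_refinement(keywords, act_summaries, sections,
                             is_relevant, refine, min_relevant=1, max_rounds=2):
    """Stage 3: filter hits for relevance; refine keywords and rerun if needed."""
    relevant = []
    for _ in range(max_rounds):
        hits = two_stage_retrieve(keywords, act_summaries, sections)
        relevant = [s for s in hits if is_relevant(s)]
        if len(relevant) >= min_relevant:
            break
        keywords = refine(keywords)
    return relevant
```

The Act-ID filter in stage 2 is the part that prevents conflation: a Section can only be returned if its parent Act was already judged relevant. The `min_relevant` and `max_rounds` values are illustrative, since the text does not state the stopping rule.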

The Tool Suite

MINA integrates specialized tools for preprocessing, retrieval, interpretation, and simulation. Tools are invoked selectively (up to three times per query), which controls per-query overhead while preserving accuracy. The averaged tool-call frequencies below come from system-log analysis across the 2022 and 2023 exams.

| Tool | Role in the workflow | Avg calls / query (MCQ) | Avg calls / query (Written) |
|---|---|---|---|
| Keyword Generator | Produces 5-10 semantic search terms; regex fallback for low context | 1.8-2.0 | 2.3-2.4 |
| Question Relevance Analyzer | Embeds current and prior queries to maintain multi-turn coherence | 1.2-1.3 | 1.5-1.6 |
| Legal Dictionary | Explains colonial-era and Farsi-influenced legal terms | 0.8-0.9 | 0.8-0.9 |
| Web Search | DuckDuckGo top results for external references | 0.4-0.5 | 0.8 |
| Web Page Parser | BeautifulSoup extraction of up to 5,000 visible characters | 0.3-0.4 | 0.7-0.8 |
| Socio-Economic Simulation | Models how income, literacy, and geography shape legal outcomes | 0.2-0.3 | 0.8-0.9 |
| File Content Reader | Parses uploaded .pdf, .docx, .pptx into clean text | 0.1 | 0.3 |
| Chat Analyzer | Reconstructs topic continuity and user intent in chat mode | 0.1 | 0.1 |

The Keyword Generator and Question Relevance Analyzer are the most frequently used tools, reflecting their central role in query formulation and retrieval accuracy. The File Content Reader and Chat Analyzer are rarely invoked, consistent with the single-turn, non-document nature of the exam.
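
Selective invocation with a call cap can be sketched as below. Two assumptions are worth flagging. First, the cap of three is treated here as applying per tool, since the averaged frequencies in the table sum to more than three calls per query; the text does not say which reading is intended. Second, `regex_keywords` is a hypothetical stand-in for the Keyword Generator's regex fallback and handles Latin-script input only.

```python
import re
from collections import Counter

MAX_CALLS_PER_TOOL = 3  # assumption: the cap applies per tool, per query


class ToolBudgetExceeded(RuntimeError):
    """Raised when a tool is requested more often than the budget allows."""


def regex_keywords(query, limit=10):
    """Hypothetical regex fallback: unique lowercase words of 4+ Latin letters."""
    seen = []
    for word in re.findall(r"[a-z]{4,}", query.lower()):
        if word not in seen:
            seen.append(word)
    return seen[:limit]


class ToolRunner:
    """Dispatches tool calls for one query and enforces the call budget."""

    def __init__(self, tools, max_calls_per_tool=MAX_CALLS_PER_TOOL):
        self.tools = tools
        self.max_calls_per_tool = max_calls_per_tool
        self.calls = Counter()  # tool name -> number of calls so far

    def call(self, name, *args, **kwargs):
        if self.calls[name] >= self.max_calls_per_tool:
            raise ToolBudgetExceeded(f"{name} already called {self.calls[name]} times")
        self.calls[name] += 1
        return self.tools[name](*args, **kwargs)
```

Creating one `ToolRunner` per query keeps the counters isolated, and the `calls` counter doubles as the kind of log from which per-query averages like those in the table could be computed.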

Results: The Bangladesh Bar Council Examination

We benchmark MINA against the national qualification exam for legal practice across its three stages (MCQ, Written, Viva Voce) for both 2022 and 2023. The MCQ stage is auto-marked; written and viva responses were scored by law faculty from leading Bangladeshi universities (five evaluators for the viva, three for the written exam), applying official Bar Council criteria. Inter-annotator agreement was high (Cohen's κ = 0.827).
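
For readers unfamiliar with the agreement statistic, Cohen's κ compares observed agreement with the agreement expected by chance. The sketch below computes it for two raters over categorical labels. The labels are invented for illustration and are not the study's data, and how the reported value was aggregated across more than two evaluators is not stated here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example labels (not from the evaluation).
a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
b = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "pass"]
kappa = cohens_kappa(a, b)  # observed 0.875, expected 0.5625
```

In this toy case the raters agree on 7 of 8 items, yet κ is noticeably lower than 0.875 because a good share of that agreement would be expected by chance.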

Figure: MCQ accuracy under four retrieval setups for Gemini-2.5-Flash and Command-A-8B (grouped bar chart; a dashed line marks the 50% human pass threshold). Two-step RAG with tools lifts both proprietary and open models well past the 50% pass threshold; the gain from retrieval strategy often exceeds the gain from model size.

Best configuration vs. human benchmarks

| Stage | Best MINA score (Gemini-2.5-Flash, 2-step RAG + tools) | Human benchmark |
|---|---|---|
| Preliminary MCQ | 77.0 | Pass rate 17.96% (2023) to 25.86% (2022); threshold 50% |
| Written | 81.0 - 81.8 | Typical candidate average 40-60% |
| Viva Voce | 81.0 | Pass rate about 96-97% |

Open-source models stay competitive (2022, 2-step RAG + tools)

| Model | MCQ | Written | Viva |
|---|---|---|---|
| Qwen3-30B-A3B-Instruct | 70.8 | 78.2 | 79.4 |
| Llama3.1-70B-Instruct | 42.4 | 79.8 | 80.2 |
| Command-A-8B | 47.4 | 74.4 | 71.2 |
| Gemma-3-27B-it | 64.4 | 72.4 | 72.4 |

Mid-tier and large open-source models, paired with structured retrieval, approach proprietary performance on written and viva tasks, supporting cost-effective deployment in Global South contexts. A full-set evaluation without confidence-based answer selection changed scores by only 1.0-3.8%, leaving model rankings unchanged.

COST ANALYSIS

High-quality legal assistance at public-service prices

Per-query inference costs roughly 0.2-0.6 cents for MCQ-style queries and 0.8-2.0 cents for longer written queries, with non-LLM tool overhead negligible. Even a conservative 10-cent multi-turn interaction (about 12.2 BDT) is only 0.12%-0.61% of the 2,000-10,000 BDT cost of a basic consultation, a cost reduction of approximately 99.4%-99.9% relative to human-provided legal services.
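
The arithmetic behind these percentages can be reproduced directly. The exchange rate of 122 BDT per USD is not stated explicitly; it is inferred here from the "10 cents is about 12.2 BDT" conversion above.

```python
BDT_PER_USD = 122.0          # inferred from 10 cents being about 12.2 BDT
INTERACTION_USD = 0.10       # conservative multi-turn interaction cost

interaction_bdt = INTERACTION_USD * BDT_PER_USD  # about 12.2 BDT


def cost_share_percent(consultation_bdt):
    """MINA's cost as a percentage of a human consultation fee."""
    return interaction_bdt / consultation_bdt * 100


def reduction_percent(consultation_bdt):
    """Cost reduction relative to the human consultation fee."""
    return 100 - cost_share_percent(consultation_bdt)


share_low_fee = cost_share_percent(2_000)    # about 0.61 %
share_high_fee = cost_share_percent(10_000)  # about 0.12 %
saving_low_fee = reduction_percent(2_000)    # about 99.39 %
saving_high_fee = reduction_percent(10_000)  # about 99.88 %
```

The upper end of the share (0.61%) corresponds to the cheapest consultation, so the quoted 99.4%-99.9% reduction range is a rounding of roughly 99.39%-99.88%.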

Key Findings and Error Analysis

  • Retrieval is the operational core, not an add-on. Two-step RAG dramatically amplifies weak baselines; for example, Command-A-8B rose from 10 to 47 on the 2022 MCQ, and Gemini-2.5-Flash rose from 35.2% to 78.6% on the written exam.
  • Strategy can outweigh scale. Structured retrieval and tool integration close much of the gap between small and large models, so retrieval pipelines and prompt design deserve as much attention as parameter count.
  • Hierarchical and procedural reasoning is the bottleneck. Models flatten jurisdictional hierarchy and misclassify doctrine, indicating that retrieval alone is insufficient without explicit structured reasoning.
  • Oral tasks need interaction. In viva, models rarely asked clarifying questions and applied rules before resolving fact-sensitive details; dialogue-state tracking and clarification policies improved adaptive accuracy in pilot tests.
  • Tools are precision enhancers. Calculators and re-ranking add incremental but high-impact gains on procedural arithmetic; they are not a substitute for reasoning capacity.

Representative failure modes

LINGUISTIC NUANCE

Conjunction "O" (and) vs. "ba" (or)

A civil suit can concern property or office; the model read the conjunction as "and", narrowing the legal scope. Small lexical shifts in Bengali carry large legal weight.

DOCTRINE MAPPING

Res Judicata: Section 11, not 151

The model assigned Res Judicata to Section 151 (inherent powers) instead of Section 11, exposing difficulty in linking abstract doctrine to its codified provision.

FACTUAL RELIABILITY

Hallucinated case law

Some answers fabricated cases with false procedural detail, reflecting fluency-focused generation overriding verified knowledge in high-stakes settings.

DOMAIN CONFUSION

Civil vs. criminal conflation

In loan-recovery scenarios the model treated breach of contract as a criminal offense, misapplying Penal Code Section 420 without fraudulent intent.

Where MINA goes next

  • Hierarchical and symbolic reasoning. A low-latency symbolic validation layer to enforce procedural constraints in real time.
  • Deterministic procedural calculators. Correct handling of limitation-period arithmetic and statutory tolling.
  • Interactive clarification and dialogue-state tracking. Targeted follow-up questions and cross-turn fact verification for viva-style use.
  • Contrastive fine-tuning and post-generation verification. Training on paired correct/incorrect bilingual exam answers, plus a layer that validates cited statutes and flags missing prerequisites.
  • Multilingual and dialectal expansion. Evaluation across Bengali dialects and formal versus informal registers.
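
Of the directions listed above, post-generation verification of cited statutes is simple enough to sketch. The example below is one possible shape for such a layer, not the planned design: `STATUTE_INDEX`, `DOCTRINE_TO_SECTION`, and `verify_answer` are hypothetical, and the tiny index contains only the provisions named in the failure modes above.

```python
import re

CPC = "Code of Civil Procedure, 1908"
PENAL = "Penal Code, 1860"

# Hypothetical index of known provisions: (act, section) -> short heading.
STATUTE_INDEX = {
    (CPC, "11"): "Res judicata",
    (CPC, "151"): "Inherent powers of court",
    (PENAL, "420"): "Cheating and dishonestly inducing delivery of property",
}

# Hypothetical doctrine map: doctrine name -> provision that codifies it.
DOCTRINE_TO_SECTION = {
    "res judicata": (CPC, "11"),
}


def verify_answer(answer, act):
    """Return a list of issues found in the citations of a generated answer."""
    issues = []
    cited = re.findall(r"[Ss]ection\s+(\d+[A-Z]?)", answer)
    # Check 1: every cited Section must exist in the named Act.
    for section in cited:
        if (act, section) not in STATUTE_INDEX:
            issues.append(f"Section {section} not found in {act}")
    # Check 2: a doctrine mentioned in the answer must cite its own provision.
    lowered = answer.lower()
    for doctrine, (d_act, d_section) in DOCTRINE_TO_SECTION.items():
        if doctrine in lowered and d_act == act and d_section not in cited:
            issues.append(f"'{doctrine}' should cite Section {d_section} of {d_act}")
    return issues
```

Run against the doctrine-mapping failure described earlier, `verify_answer("Res judicata applies under Section 151.", CPC)` flags the missing Section 11 citation, while the same sentence citing Section 11 passes. A production layer would need a complete statute index and a far richer doctrine map than this illustration.
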
RESPONSIBLE USE

A supportive tool, not a replacement

MINA is designed for legal professionals, legal aid workers, and exam preparation. It is not infallible: outputs must be used under human supervision and cross-checked against authoritative statutes and case law. Liability for any legal action remains with the human professional using the system.

Citation

Please cite the paper as follows:

@inproceedings{wasi2026mina,
  title     = {{MINA}: A Multilingual {LLM}-Powered Legal Assistant Agent
               for Empowering Access to Justice in Bangladesh},
  author    = {Wasi, Azmine Toushik and Faisal, Wahid and
               Islam, Mst Rafia and Parvez, Md Rizwan},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2026},
  year      = {2026},
  publisher = {Association for Computational Linguistics}
}