Deductly (Tax Analytics RAG)
RAG-powered financial tool for tax-saving suggestions via semantic search.
Project Scope
What this project covers — systems owned, responsibilities, and integrations.
The Story
Tax laws are labyrinthine, opaque, and constantly shifting. For the average freelancer or small business owner, finding valid deductions feels like searching for a needle in a haystack, and mistakes often mean overpaying or inviting an audit. Deductly uses Retrieval-Augmented Generation (RAG) to address this. By indexing tax codes and financial datasets, it lets users ask plain-English questions and receive contextual tax-saving suggestions grounded in the underlying legal text. It turns a stressful compliance chore into a strategic advantage.
Challenges Faced
Real technical and design problems encountered during development — and how they were resolved.
Grounding LLM answers in dense legal text
Generic vector search returned irrelevant passages from unrelated sections. The fix was a two-tier retrieval strategy: Tier 1 runs a precision search filtered on metadata (section or rule ID); Tier 2 falls back to semantic search. This nearly eliminated hallucinated deduction claims.
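The two-tier idea can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the `Chunk` schema, the toy corpus (placeholder wording, not statute text), and the bag-of-words `embed` function, which stands in for whatever embedding model and vector store the project actually uses.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    section_id: str  # metadata tag such as "80C" (hypothetical schema)
    text: str


# Toy corpus with placeholder wording, not quoted from any statute.
CORPUS = [
    Chunk("80C", "deduction for life insurance premium and provident fund contributions"),
    Chunk("80D", "deduction for health insurance premium paid for self and family"),
    Chunk("24(b)", "deduction for interest paid on a housing loan"),
]


def embed(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_section_ids(query: str, known_ids: set[str]) -> set[str]:
    """Return the known section IDs that the query mentions explicitly."""
    q = query.lower()
    return {
        sid for sid in known_ids
        if re.search(rf"(?<!\w){re.escape(sid.lower())}(?!\w)", q)
    }


def retrieve(query: str, corpus: list[Chunk], top_k: int = 2) -> tuple[int, list[Chunk]]:
    """Tier 1: restrict to chunks whose metadata matches a section ID in the query.
    Tier 2: if no section ID is mentioned, rank the whole corpus semantically."""
    ids = find_section_ids(query, {c.section_id for c in corpus})
    tier = 1 if ids else 2
    candidates = [c for c in corpus if c.section_id in ids] if ids else corpus
    q_vec = embed(query)
    ranked = sorted(candidates, key=lambda c: cosine(q_vec, embed(c.text)), reverse=True)
    return tier, ranked[:top_k]
```

A question that names a section, such as "What is the limit under section 80C?", is answered only from chunks tagged `80C`, so passages from unrelated sections cannot enter the prompt. A question with no section ID falls through to similarity ranking over the full corpus.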
OCR quality on scanned government PDFs
CBDT notifications and ITR forms were scanned images with low DPI, causing poor text extraction. Used PyMuPDF + Tesseract with custom pre-processing (contrast enhancement, deskew) to raise OCR accuracy to a usable level for chunking.
Stateful conversation resumption across sessions
LangGraph's human-in-the-loop interrupt pattern required persisting graph state between HTTP requests. Used MongoDB as the LangGraph checkpointer, enabling users to close the browser and resume an interrupted deduction session hours later.
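The pause-and-resume pattern can be sketched without any framework. This is not the LangGraph API: the in-memory `CHECKPOINTS` dict is a stand-in for the MongoDB-backed checkpointer, and the function names, state fields, and follow-up question are all hypothetical. It only shows why state keyed by a session (thread) ID lets a later HTTP request continue where an earlier one stopped.

```python
import json

# Stand-in for a MongoDB collection keyed by thread ID (assumption: the real
# project uses a LangGraph checkpointer backed by MongoDB instead).
CHECKPOINTS: dict[str, str] = {}


def save_checkpoint(thread_id: str, state: dict) -> None:
    # Serialize to JSON so nothing survives by sharing in-process references,
    # which mimics writing to an external store between requests.
    CHECKPOINTS[thread_id] = json.dumps(state)


def load_checkpoint(thread_id: str) -> dict:
    return json.loads(CHECKPOINTS[thread_id])


def start_session(thread_id: str, question: str) -> str:
    """First request: run until user input is needed, persist, and pause."""
    state = {
        "question": question,
        "pending": "Did you pay health insurance premiums this year?",
        "answers": [],
    }
    save_checkpoint(thread_id, state)
    return state["pending"]  # sent back to the browser as the follow-up prompt


def resume_session(thread_id: str, answer: str) -> dict:
    """Later request: reload the paused state and continue with the answer."""
    state = load_checkpoint(thread_id)
    state["answers"].append({"prompt": state["pending"], "answer": answer})
    state["pending"] = None
    save_checkpoint(thread_id, state)
    return state
```

Because each request reads and writes the full state through the store, the server process holds nothing between calls, which is what allows a session to resume hours later.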
Real-World Impact
Measurable outcomes and meaningful results this project delivered.
5 legal document collections indexed
Income Tax Act 2025, Rules 1962, CBDT Notifications, Capital Gain Case Laws, and ITR-1 Forms — all queryable by the RAG pipeline.
10 deductions calculated deterministically
Statutory deductions (80C, 80D, 24(b), 80G, etc.) are calculated using hard-coded FY 2024-25 limits rather than estimated by the LLM, eliminating a major source of financial inaccuracy.
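A deterministic calculator of this kind can be as small as a lookup table plus a cap. The sketch below is an assumption about the shape of such a calculator, not the project's code: the function name is hypothetical, only three sections are shown, and the limits are illustrative values for the common case noted in each comment, which should be verified against the statute before use.

```python
# Illustrative caps in rupees for FY 2024-25 (assumed values for the common
# case in each comment; verify against the statute before relying on them).
LIMITS_FY_2024_25 = {
    "80C": 150_000,    # investments and specified payments
    "80D": 25_000,     # health insurance, self and family below 60
    "24(b)": 200_000,  # home loan interest, self-occupied property
}


def allowed_deduction(section: str, amount_paid: int) -> int:
    """Return the deductible amount: the amount paid, capped at the statutory limit.

    Raises KeyError for sections the calculator does not support, so an
    unsupported section can never silently produce a number.
    """
    if section not in LIMITS_FY_2024_25:
        raise KeyError(f"unsupported section: {section}")
    if amount_paid < 0:
        raise ValueError("amount_paid must be non-negative")
    return min(amount_paid, LIMITS_FY_2024_25[section])
```

For example, a user who paid 180,000 under 80C gets `allowed_deduction("80C", 180_000)`, which is capped at the table value, while a payment below the cap passes through unchanged. The LLM only explains the result; it never produces the figure.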
Zero hallucination on supported deductions
The guardrail system (topic-locked system prompts + Pydantic validation + deterministic calculator) prevents the model from fabricating deduction amounts or sections.
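The validation layer of such a guardrail might look like the following sketch. The project uses Pydantic; to stay dependency-free this example swaps in a standard-library dataclass with the same role. The `DeductionClaim` schema, the allow-list contents, and the JSON field names are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Allow-list of sections the deterministic calculator supports (assumed subset).
SUPPORTED_SECTIONS = frozenset({"80C", "80D", "24(b)", "80G"})


@dataclass(frozen=True)
class DeductionClaim:
    section: str
    amount_paid: int  # what the user reported paying, in rupees

    def __post_init__(self) -> None:
        if self.section not in SUPPORTED_SECTIONS:
            raise ValueError(f"unsupported section: {self.section!r}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.amount_paid, bool) or not isinstance(self.amount_paid, int):
            raise ValueError("amount_paid must be an integer")
        if self.amount_paid < 0:
            raise ValueError("amount_paid must be non-negative")


def parse_llm_output(raw: str) -> Optional[DeductionClaim]:
    """Return a validated claim, or None if the model output must be discarded."""
    try:
        data = json.loads(raw)
        return DeductionClaim(section=data["section"], amount_paid=data["amount_paid"])
    except (KeyError, TypeError, ValueError):  # JSONDecodeError is a ValueError
        return None
```

Output that names a section outside the allow-list, omits a field, or is not valid JSON is dropped before it reaches the calculator or the user, so a fabricated section cannot flow into a suggestion.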
SWOT Analysis
Strengths
- Context-aware RAG pipeline reduces generic LLM hallucinations
- Semantic search maps plain-English questions to complex tax codes
- User-friendly chat interface for non-technical users
Weaknesses
- Accuracy depends on the quality and frequency of vector DB updates
- Does not replace CPA sign-off (legal liability constraint)
Opportunities
- Direct integration with accounting software (QuickBooks, Xero)
- Expansion to multi-region compliance mapping
Threats
- Changes in fintech regulatory compliance requirements
- Hallucinations causing incorrect tax guidance