
Deductly (Tax Analytics RAG)

FinTech
Jun 2025 - Ongoing

RAG-powered financial tool for tax-saving suggestions via semantic search.

TypeScript, Python, RAG

Project Scope

What this project covers — systems owned, responsibilities, and integrations.

Multi-collection Qdrant RAG knowledge base (Income Tax Act, Rules, CBDT notifications, ITR forms)
LangGraph stateful agentic pipeline with human-in-the-loop interruption
Deterministic tax calculator for 10 Indian IT deductions (FY 2024-25)
OCR-based PDF ingestion pipeline for scanned government documents
Full-stack: React/Vite frontend + FastAPI backend + MongoDB session persistence

The Story

Tax law is dense, opaque, and constantly changing. For the average freelancer or small business owner, finding valid deductions feels like searching for a needle in a haystack, and mistakes often lead to overpayment or audit exposure. Deductly uses Retrieval-Augmented Generation (RAG) to address this. By indexing large volumes of tax law and financial datasets, it lets users ask plain-English questions and receive contextual, legally grounded suggestions for tax savings. It turns a stressful compliance chore into a strategic advantage.

Challenges Faced

Real technical and design problems encountered during development — and how they were resolved.

1. Grounding LLM answers in dense legal text

Generic vector search returned irrelevant passages from unrelated sections. The fix was a two-tier retrieval strategy: Tier 1 runs a precision search filtered on metadata (section or rule ID); Tier 2 falls back to semantic search. This nearly eliminated hallucinated deduction claims.
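The two-tier idea can be sketched without any vector database. The snippet below is a minimal, dependency-free illustration, not the project's Qdrant code: the `Chunk` type, the `section_id` field, and the `retrieve` function are hypothetical names, and a simple token-overlap score stands in for real embedding similarity.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Chunk:
    section_id: str  # hypothetical metadata field, e.g. "80C"
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, text: str) -> float:
    """Jaccard overlap between query and chunk tokens (toy similarity)."""
    q, t = tokenize(query), tokenize(text)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(query: str, chunks: List[Chunk], top_k: int = 2) -> Tuple[int, List[Chunk]]:
    """Return (tier_used, ranked_chunks)."""
    known_ids = {c.section_id.upper() for c in chunks}
    mentioned = {tok.upper() for tok in tokenize(query)} & known_ids

    if mentioned:
        # Tier 1: restrict the candidate set by metadata before ranking.
        tier = 1
        candidates = [c for c in chunks if c.section_id.upper() in mentioned]
    else:
        # Tier 2: no section ID in the query, so rank the whole collection.
        tier = 2
        candidates = list(chunks)

    ranked = sorted(candidates, key=lambda c: overlap_score(query, c.text), reverse=True)
    return tier, ranked[:top_k]


chunks = [
    Chunk("80C", "Deduction for life insurance premium, provident fund and tuition fees"),
    Chunk("80D", "Deduction for health insurance premium"),
    Chunk("24B", "Deduction for interest on housing loan"),
]
tier_a, hits_a = retrieve("What is the limit under section 80C?", chunks)
tier_b, hits_b = retrieve("Can I claim my health insurance premium?", chunks)
```

In a real deployment the Tier 1 branch would presumably become a payload filter on the vector store query rather than a list comprehension, but the control flow stays the same.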

2. OCR quality on scanned government PDFs

CBDT notifications and ITR forms were scanned images with low DPI, causing poor text extraction. Used PyMuPDF + Tesseract with custom pre-processing (contrast enhancement, deskew) to raise OCR accuracy to a usable level for chunking.
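To make the contrast-enhancement step concrete, here is a small dependency-free sketch of percentile-based contrast stretching on a grayscale pixel grid. It is an illustration only: the actual pipeline renders pages with PyMuPDF and passes images to Tesseract, and the function name `stretch_contrast` and its percentile defaults are assumptions, not the project's settings.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values in the range 0-255


def stretch_contrast(pixels: Gray, low_pct: float = 2.0, high_pct: float = 98.0) -> Gray:
    """Linearly rescale intensities so the chosen percentiles map to 0 and 255."""
    flat = sorted(v for row in pixels for v in row)
    if not flat:
        return []

    # Pick the intensity values at the low and high percentiles.
    lo = flat[int((len(flat) - 1) * low_pct / 100)]
    hi = flat[int((len(flat) - 1) * high_pct / 100)]
    if hi <= lo:
        # Flat image: nothing to stretch, return a copy unchanged.
        return [row[:] for row in pixels]

    scale = 255.0 / (hi - lo)
    # Clamp so values outside the percentile window saturate at 0 or 255.
    return [
        [max(0, min(255, round((v - lo) * scale))) for v in row]
        for row in pixels
    ]


# A washed-out 2x2 "scan" whose values occupy only a narrow band.
faded = [[100, 110], [120, 130]]
enhanced = stretch_contrast(faded, low_pct=0.0, high_pct=100.0)
```

Faint scans tend to cluster in a narrow intensity band, so spreading that band across the full range gives the OCR engine sharper edges between ink and paper.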

3. Stateful conversation resumption across sessions

LangGraph's human-in-the-loop interrupt pattern required persisting graph state between HTTP requests. Used MongoDB as the LangGraph checkpointer, enabling users to close the browser and resume an interrupted deduction session hours later.
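The persist-then-resume pattern can be shown without LangGraph or MongoDB. The sketch below is a stand-in, not LangGraph's API: `SessionState`, `InMemoryCheckpointStore`, and `handle_request` are hypothetical names, and a plain dict plays the role of the MongoDB collection keyed by thread ID.

```python
import copy
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class SessionState:
    thread_id: str
    step: str = "start"
    answers: Dict[str, float] = field(default_factory=dict)
    pending_question: Optional[str] = None  # set while waiting on the user


class InMemoryCheckpointStore:
    """Stand-in for a MongoDB collection of checkpoints keyed by thread_id."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def save(self, state: SessionState) -> None:
        self._docs[state.thread_id] = asdict(state)

    def load(self, thread_id: str) -> Optional[SessionState]:
        doc = self._docs.get(thread_id)
        return SessionState(**copy.deepcopy(doc)) if doc else None


def handle_request(
    store: InMemoryCheckpointStore, thread_id: str, user_reply: Optional[float] = None
) -> SessionState:
    """One HTTP request: load the checkpoint, advance one step, save it again."""
    state = store.load(thread_id) or SessionState(thread_id=thread_id)

    if state.pending_question is not None and user_reply is not None:
        # Resume: the user answered the question that paused the run.
        state.answers[state.pending_question] = user_reply
        state.pending_question = None
        state.step = "done"
    elif state.pending_question is None and state.step != "done":
        # Interrupt: pause the run and wait for human input.
        state.pending_question = "80C investment amount"
        state.step = "awaiting_user"

    store.save(state)
    return state


store = InMemoryCheckpointStore()
first = handle_request(store, "thread-1")             # run pauses for input
resumed = handle_request(store, "thread-1", 150000.0)  # later request resumes
```

Because each request reloads state from the store, nothing depends on the server process or browser tab that started the session, which is what makes resuming hours later possible.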

Real-World Impact

Measurable outcomes and meaningful results this project delivered.

5 legal document collections indexed

Income Tax Act 2025, Rules 1962, CBDT Notifications, Capital Gain Case Laws, and ITR-1 Forms — all queryable by the RAG pipeline.

10 deductions calculated deterministically

Statutory deductions (80C, 80D, 24b, 80G, etc.) are calculated using hard-coded FY 2024-25 limits, not estimated by the LLM — eliminating a major source of financial inaccuracy.
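A deterministic calculator of this kind reduces to a lookup table plus a cap. The following is a minimal sketch under stated assumptions: the names `FY_2024_25_LIMITS` and `compute_deduction` are hypothetical, only three of the ten deductions are shown, and the rupee caps are commonly cited old-regime figures (80D for a non-senior taxpayer, 24b for a self-occupied property) that should be verified against the statute before use.

```python
from dataclasses import dataclass
from typing import Dict

# Illustrative caps in rupees; assumed values, not the project's actual table.
FY_2024_25_LIMITS: Dict[str, int] = {
    "80C": 150_000,  # investments such as PPF, ELSS, life insurance
    "80D": 25_000,   # health insurance premium, taxpayer under 60
    "24b": 200_000,  # home loan interest, self-occupied property
}


@dataclass(frozen=True)
class DeductionResult:
    section: str
    claimed: int
    allowed: int
    disallowed: int


def compute_deduction(section: str, claimed: int) -> DeductionResult:
    """Cap the claimed amount at the hard-coded limit; no LLM is involved."""
    if section not in FY_2024_25_LIMITS:
        raise KeyError(f"Unsupported section: {section}")
    if claimed < 0:
        raise ValueError("claimed amount must be non-negative")

    allowed = min(claimed, FY_2024_25_LIMITS[section])
    return DeductionResult(section, claimed, allowed, claimed - allowed)


over_cap = compute_deduction("80C", 180_000)
under_cap = compute_deduction("80D", 18_000)
```

Keeping the limits in one table means a new financial year is a data change rather than a prompt change, and the same input always yields the same output.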

Zero hallucination on supported deductions

The guardrail system (topic-locked system prompts + Pydantic validation + deterministic calculator) prevents the model from fabricating deduction amounts or sections.
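The validation layer of such a guardrail can be sketched as a check that rejects any model output whose section is unsupported or whose amount differs from the calculator's result. The project uses Pydantic for this; the dependency-free version below is only a stand-in, and `SUPPORTED_SECTIONS`, `GuardrailError`, and `validate_suggestion` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed allow-list; the real system supports ten deductions.
SUPPORTED_SECTIONS = {"80C", "80D", "24b", "80G"}


class GuardrailError(ValueError):
    """Raised when model output fails validation."""


@dataclass(frozen=True)
class Suggestion:
    section: str
    amount: int


def validate_suggestion(raw: dict, computed_amount: int) -> Suggestion:
    """Accept model output only if it agrees with the deterministic calculator."""
    section = raw.get("section")
    amount = raw.get("amount")

    if section not in SUPPORTED_SECTIONS:
        raise GuardrailError(f"unsupported or fabricated section: {section!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(amount, int) or isinstance(amount, bool) or amount < 0:
        raise GuardrailError(f"amount must be a non-negative integer: {amount!r}")
    if amount != computed_amount:
        raise GuardrailError(
            f"model amount {amount} disagrees with calculator amount {computed_amount}"
        )
    return Suggestion(section, amount)


accepted = validate_suggestion({"section": "80C", "amount": 150_000}, 150_000)
```

A rejected suggestion can then be retried or replaced with the calculator's figure, so a fabricated section or amount is caught before it is shown to the user.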

SWOT Analysis

Strengths

  • Context-aware RAG pipeline prevents generic LLM hallucinations
  • Semantic search accurately maps plain-English to complex tax codes
  • User-friendly chat interface for non-technical users

Weaknesses

  • Accuracy is limited by the quality and frequency of vector DB updates
  • Does not replace certified CPA sign-off (Legal liability constraint)

Opportunities

  • Direct integration with accounting software (QuickBooks, Xero)
  • Expansion to multi-region compliance mapping

Threats

  • Changes to strict fintech regulatory compliance requirements
  • Hallucinations causing incorrect tax guidance