Subscription Sleuth

Project Blueprint: Subscription Sleuth

1. The Business Problem (Why build this?)

In today's digital economy, consumers are constantly signing up for subscription services – from streaming platforms and productivity tools to fitness apps and online courses. While these services offer immense value, they collectively create a significant challenge: "subscription fatigue" and "bill shock." The average individual juggles multiple recurring payments, often spread across various financial institutions, credit cards, and email statements. This fragmentation makes it incredibly difficult to maintain a clear overview of one's recurring financial commitments.

The core problems users face are:

Lack of Visibility: Subscriptions are often "out of sight, out of mind." A service signed up for months or years ago might still be billing a user without their active awareness or continued use.
Hidden Costs: Small, individual subscription fees accumulate rapidly, leading to a substantial drain on monthly budgets that users only discover when reviewing large aggregated statements.
Difficulty in Cancellation: Identifying billing cycles, finding cancellation policies, or remembering to cancel before the next charge can be cumbersome, leading to unintended renewals.
Manual Tracking Burden: Attempting to manually track subscriptions using spreadsheets or personal notes is time-consuming, prone to error, and rarely kept up-to-date.
Fraud and Unwanted Charges: Forgotten subscriptions can sometimes mask fraudulent activities or simply represent services that are no longer desired or used, yet continue to incur costs.

"Subscription Sleuth" directly addresses these pain points by providing an automated, centralized, and intelligent solution. It aims to empower users to reclaim control over their recurring expenses, promoting better financial health and reducing unnecessary spending. For beginners in personal finance, this tool offers a low-friction entry point into understanding and managing one of the most pervasive aspects of modern spending.

2. Solution Overview

Subscription Sleuth is a personal finance application designed to autonomously detect, analyze, and help users manage all their recurring subscriptions. It acts as a vigilant financial assistant, uncovering forgotten services and providing actionable insights into spending patterns.

The core solution workflow involves:

Secure Data Ingestion: Users securely upload financial statements (bank, credit card, utility bills) in PDF format. The application is designed with privacy in mind, processing sensitive document text locally where possible.
AI-Powered Detection: Leveraging advanced AI (Gemini API), the system intelligently scans the extracted text for patterns indicative of recurring subscriptions, identifying merchant names, amounts, and billing frequencies.
Consolidated View: All detected subscriptions are aggregated into a single, intuitive dashboard, providing a transparent overview of all recurring financial commitments.
Recurring Cost Analysis: The application calculates total monthly and annual subscription spending, breaks down costs by category, and highlights the most expensive services.
Proactive Reminders: Users receive timely reminders for upcoming subscription renewals, free trial expirations, and cancellation deadlines, preventing unwanted charges.
Spending Insights: Interactive visualizations and reports help users understand their spending habits, identify areas for cost reduction, and make informed financial decisions.

The application’s goal is to transform the complex and often frustrating task of subscription management into an effortless and empowering experience, making users the "sleuths" of their own finances.

3. Architecture & Tech Stack Justification

The architecture for Subscription Sleuth is designed for agility and scalability, and it builds on managed serverless services. It follows a client-server model: the browser handles the UI and PDF text extraction, while serverless functions handle AI detection and persistence.

Overall Architecture Diagram:

[User Device]
      |
      V
[Next.js Frontend (Client-side Rendering/Hydration)]
      |
      | 1. Upload PDF (PDF.js extracts text locally)
      V
[Next.js API Routes (Serverless Functions on Firebase Hosting/Cloud Functions)]
      |
      | 2. Raw Text & User ID
      V
[Firebase Cloud Functions (Backend Logic & Orchestration)]
      |
      | 3. Gemini API Prompt
      V
[Google Gemini API (AI-powered Detection)]
      |
      | 4. Structured Subscription Data (JSON)
      V
[Firebase Cloud Functions]
      |
      | 5. Store/Update Data
      V
[Google Firestore (NoSQL Database)]
      ^
      | 6. Real-time Subscription Data & Insights
      |
[Next.js Frontend]
      |
      V
[User Dashboard / Notifications]

Tech Stack Justification:

Next.js (Frontend & API Routes):
- Full-stack Capabilities: Next.js allows us to build both the user interface (React) and backend API endpoints within a single framework. This simplifies development, tooling, and deployment.
- Server-Side Rendering (SSR) / Static Site Generation (SSG): While the core application will be behind authentication, Next.js provides flexibility for a performant landing page or marketing site. For authenticated routes, client-side rendering with pre-rendering (hydration) offers excellent responsiveness.
- Developer Experience: Leveraging the React ecosystem ensures a rich, interactive, and maintainable user interface.
- API Routes: These act as lightweight, serverless functions, perfect for proxying requests to Firebase Cloud Functions or directly handling simple data operations, without needing a separate backend server setup.
Google Firebase (Backend Services):
- Firebase Authentication: Provides a comprehensive and secure identity platform. It supports email/password, social logins (Google, Apple, etc.), and anonymous authentication, significantly reducing development effort for user management.
- Firestore (NoSQL Database): A highly scalable, flexible, and real-time NoSQL document database. Its real-time synchronization capabilities are ideal for instantly updating user dashboards with newly detected subscriptions or modified data. Its schema-less nature accommodates the evolving needs of subscription data.
- Cloud Functions for Firebase: A serverless execution environment perfect for backend logic. This is where the heavy lifting of interacting with the Gemini API, performing data transformations, and sending notifications will occur. It scales automatically and eliminates server management overhead.
- Firebase Storage: Provides secure object storage. It is not required for the core flow, because PDF.js extracts text in the browser and the PDF itself is not uploaded; it remains available as temporary storage should server-side PDF processing ever be needed.
- Firebase Hosting: Offers fast, secure, and globally distributed hosting for the Next.js frontend, complete with SSL and CDN capabilities. It integrates seamlessly with Cloud Functions for API routing.
Gemini API (AI-Powered Subscription Detection):
- Core Intelligence: Gemini is fundamental to the "Subscription Detection" feature. Its advanced natural language understanding and ability to process long text inputs are crucial for extracting structured data from unstructured financial statements.
- Multimodal Potential: While initially used for text analysis, Gemini's multimodal capabilities offer future expansion opportunities, such as analyzing payment screenshots or invoices.
- Google Ecosystem Integration: Seamless integration with other Google Cloud services (like Cloud Functions) and robust infrastructure.
PDF.js (Client-side PDF Parsing):
- Privacy-First Approach: A critical choice for handling sensitive financial documents. PDF.js allows us to extract text directly within the user's browser (client-side). This means the raw PDF document never leaves the user's device. Only the extracted text is then sent to the backend for AI processing. This significantly enhances user trust and privacy compliance.
- Efficiency: Client-side processing offloads compute from the server, improving responsiveness and reducing backend load.
- Open-Source & Robust: A well-maintained and widely used library for PDF rendering and text extraction.

This integrated stack provides a powerful, scalable, and cost-effective foundation for Subscription Sleuth, allowing us to focus on delivering core features rather than infrastructure management.

4. Core Feature Implementation Guide

This section details the implementation strategy for the key features, outlining data flow, pseudo-code, and specific technologies.

A. User Onboarding & Data Ingestion

Authentication (Firebase Auth):

Flow: Users sign up or log in using email/password or Google SSO. Firebase UI library or custom React components will handle the UI.

Implementation:

import { getAuth, signInWithEmailAndPassword, createUserWithEmailAndPassword, GoogleAuthProvider, signInWithPopup } from 'firebase/auth';
const auth = getAuth();

// Email/Password Signup
async function signUp(email, password) {
    try {
        const userCredential = await createUserWithEmailAndPassword(auth, email, password);
        console.log('User signed up:', userCredential.user.uid);
    } catch (error) {
        console.error('Signup error:', error.message);
    }
}

// Google Sign-in
async function signInWithGoogle() {
    const provider = new GoogleAuthProvider();
    try {
        const result = await signInWithPopup(auth, provider);
        console.log('Signed in with Google:', result.user.uid);
    } catch (error) {
        console.error('Google Sign-in error:', error.message);
    }
}

Data Upload & PDF Parsing (Client-side with PDF.js):

UI: A drag-and-drop file input component (e.g., using react-dropzone).
Processing:
- User uploads a PDF file (e.g., bank_statement.pdf).
- The browser's FileReader reads the file into an ArrayBuffer, which is passed to PDF.js.
- It then loads the PDF document and extracts text content from each page.
- The aggregated raw text is then sent to a Next.js API route.

Implementation (Next.js client component):

import * as pdfjsLib from 'pdfjs-dist/build/pdf';
pdfjsLib.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.min.js`;

async function extractTextFromPdf(file) {
    const reader = new FileReader();
    reader.readAsArrayBuffer(file);

    return new Promise((resolve, reject) => {
        reader.onload = async () => {
            const typedarray = new Uint8Array(reader.result);
            try {
                const pdf = await pdfjsLib.getDocument({ data: typedarray }).promise;
                let fullText = '';
                for (let i = 1; i <= pdf.numPages; i++) {
                    const page = await pdf.getPage(i);
                    const textContent = await page.getTextContent();
                    fullText += textContent.items.map(item => item.str).join(' ') + '\n';
                }
                resolve(fullText);
            } catch (error) {
                console.error('Error extracting text from PDF:', error);
                reject(error);
            }
        };
        reader.onerror = (error) => reject(error);
    });
}

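// In your React component onChange handler for file input:
// (assumes `import { getAuth } from 'firebase/auth';` at the top of the file)
async function handleFileUpload(event) {
    const file = event.target.files[0];
    if (!file || file.type !== 'application/pdf') {
        return;
    }
    const user = getAuth().currentUser;
    if (!user) {
        console.error('User must be signed in before uploading a statement.');
        return;
    }
    try {
        const rawText = await extractTextFromPdf(file);
        // The API route verifies this Firebase ID token before processing
        const idToken = await user.getIdToken();
        const response = await fetch('/api/process-statement', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                Authorization: `Bearer ${idToken}`,
            },
            body: JSON.stringify({ rawText, fileName: file.name })
        });
        const result = await response.json();
        console.log('API response:', result);
    } catch (error) {
        console.error('Statement upload failed:', error);
    }
}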

B. Subscription Detection (Gemini & Cloud Functions)

This is the core intelligence pipeline.

Next.js API Route (/api/process-statement):

Receives rawText and fileName from the client.
Authenticates the user (e.g., using Firebase Auth token from the client).
Calls a Firebase Cloud Function securely.

Implementation (Next.js API Route pages/api/process-statement.js):

import { getAuth } from 'firebase-admin/auth'; // Using firebase-admin for server-side
import { initializeApp, getApps, cert } from 'firebase-admin/app';
import serviceAccount from 'path/to/your/serviceAccountKey.json'; // Securely load credentials

if (!getApps().length) {
  initializeApp({
    credential: cert(serviceAccount),
    projectId: process.env.NEXT_PUBLIC_FIREBASE_PROJECT_ID,
  });
}

// URL of the deployed callable function. FUNCTIONS_BASE_URL is an assumed environment
// variable, e.g. https://us-central1-<project-id>.cloudfunctions.net
const DETECT_SUBSCRIPTIONS_URL = `${process.env.FUNCTIONS_BASE_URL}/detectSubscriptions`;

export default async function handler(req, res) {
    if (req.method !== 'POST') {
        return res.status(405).json({ error: 'Method Not Allowed' });
    }

    const idToken = req.headers.authorization?.split('Bearer ')[1];
    if (!idToken) {
        return res.status(401).json({ error: 'Unauthorized' });
    }

    try {
        await getAuth().verifyIdToken(idToken); // Rejects invalid or expired tokens
        const { rawText, fileName } = req.body;

        // The Admin SDK does not provide an httpsCallable client, so call the function
        // over HTTPS using the callable protocol: POST { data: ... } and forward the
        // caller's ID token so that context.auth is populated inside the function.
        const fnResponse = await fetch(DETECT_SUBSCRIPTIONS_URL, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${idToken}` },
            body: JSON.stringify({ data: { rawText, fileName } }),
        });
        const payload = await fnResponse.json();
        if (!fnResponse.ok) {
            return res.status(fnResponse.status).json({ error: payload.error?.message || 'Function call failed' });
        }

        res.status(200).json({ success: true, data: payload.result });

    } catch (error) {
        console.error('Error in /api/process-statement:', error);
        res.status(500).json({ error: error.message });
    }
}

Firebase Cloud Function (detectSubscriptions):

Trigger: HTTPS Callable function.
Inputs: rawText (from PDF.js via Next.js API), userId.
Process:
1. Constructs a detailed Gemini prompt with rawText.
2. Calls the Gemini API (model.generateContent).
3. Parses the JSON response from Gemini.
4. Validates and sanitizes the extracted data.
5. Stores the detected subscriptions in Firestore under users/{userId}/subscriptions.

Firestore Data Model for subscriptions:

{
  "name": "Netflix Premium",
  "amount": 15.99,
  "currency": "USD",
  "frequency": "monthly", // "monthly", "annually", "weekly", "quarterly", "bi-weekly"
  "lastBilledDate": "2024-02-15", // YYYY-MM-DD
  "nextBilledDate": "2024-03-15", // Calculated
  "status": "active", // "active", "paused", "cancelled", "pending_review"
  "sourceDocumentId": "doc_xyz123", // Reference to the uploaded document (optional)
  "sourceFileName": "statement_feb_2024.pdf",
  "confidenceScore": 0.95, // Optional, if Gemini provides or can be inferred
  "createdAt": "Timestamp",
  "updatedAt": "Timestamp"
}

Implementation (Firebase Cloud Function functions/index.js):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { GoogleGenerativeAI } = require('@google/generative-ai');

admin.initializeApp();
const db = admin.firestore();

const GEMINI_API_KEY = functions.config().gemini.api_key; // Stored securely in Firebase config
const genAI = new GoogleGenerativeAI(GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-pro" });

const generateGeminiPrompt = (rawText) => {
    // See Section 5 for detailed prompt construction
    return `You are an expert financial assistant... [FULL PROMPT HERE] ...\nNow, analyze the following text:\n${rawText}`;
};

exports.detectSubscriptions = functions.https.onCall(async (data, context) => {
    if (!context.auth) {
        throw new functions.https.HttpsError('unauthenticated', 'The request requires authentication.');
    }
    const { rawText, fileName } = data;
    const userId = context.auth.uid;

    if (!rawText || typeof rawText !== 'string' || rawText.length === 0) {
        throw new functions.https.HttpsError('invalid-argument', 'The `rawText` parameter is required and must be a non-empty string.');
    }

    try {
        const prompt = generateGeminiPrompt(rawText);
        const result = await model.generateContent(prompt);
        const responseText = result.response.text();

        let detectedSubscriptions;
        try {
            detectedSubscriptions = JSON.parse(responseText);
            // Basic validation: ensure it's an array and contains objects with expected fields
            if (!Array.isArray(detectedSubscriptions)) {
                throw new Error('Gemini response is not a JSON array.');
            }
            detectedSubscriptions = detectedSubscriptions.filter(sub =>
                sub.name && typeof sub.name === 'string' &&
                typeof sub.amount === 'number' && sub.amount > 0 &&
                sub.currency && typeof sub.currency === 'string' &&
                sub.frequency && typeof sub.frequency === 'string' &&
                sub.lastBilledDate && typeof sub.lastBilledDate === 'string'
            );

        } catch (jsonError) {
            console.error('Failed to parse Gemini response or validation failed:', jsonError, 'Raw response:', responseText);
            throw new functions.https.HttpsError('internal', 'Failed to process AI response.');
        }

        const batch = db.batch();
        const userSubscriptionsRef = db.collection(`users/${userId}/subscriptions`);

        detectedSubscriptions.forEach(sub => {
            const subRef = userSubscriptionsRef.doc(); // Auto-generate document ID
            const nextBilledDate = calculateNextBilledDate(sub.lastBilledDate, sub.frequency); // Helper function
            batch.set(subRef, {
                ...sub,
                nextBilledDate: nextBilledDate, // Store as string 'YYYY-MM-DD'
                sourceFileName: fileName,
                createdAt: admin.firestore.FieldValue.serverTimestamp(),
                status: 'pending_review', // User must review/confirm first
                updatedAt: admin.firestore.FieldValue.serverTimestamp(),
            });
        });

        await batch.commit();

        return { success: true, count: detectedSubscriptions.length, detected: detectedSubscriptions };

    } catch (error) {
        console.error('Error detecting subscriptions:', error);
        throw new functions.https.HttpsError('internal', 'An error occurred during subscription detection.', error.message);
    }
});

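// Helper function for calculating next billed date (all arithmetic in UTC)
function calculateNextBilledDate(lastBilledDateStr, frequency) {
    const lastDate = new Date(`${lastBilledDateStr}T00:00:00Z`);
    if (Number.isNaN(lastDate.getTime())) {
        return null; // Unparseable date
    }

    // Add whole months, clamping to the last day of the target month so that
    // e.g. Jan 31 + 1 month gives Feb 28/29 instead of rolling over into March.
    const addMonths = (months) => {
        const target = new Date(Date.UTC(lastDate.getUTCFullYear(), lastDate.getUTCMonth() + months, 1));
        const daysInTarget = new Date(Date.UTC(target.getUTCFullYear(), target.getUTCMonth() + 1, 0)).getUTCDate();
        target.setUTCDate(Math.min(lastDate.getUTCDate(), daysInTarget));
        return target;
    };
    const addDays = (days) => new Date(lastDate.getTime() + days * 24 * 60 * 60 * 1000);

    let nextDate;
    switch (String(frequency).toLowerCase()) {
        case 'weekly': nextDate = addDays(7); break;
        case 'bi-weekly': nextDate = addDays(14); break;
        case 'monthly': nextDate = addMonths(1); break;
        case 'quarterly': nextDate = addMonths(3); break;
        case 'annually':
        case 'yearly': nextDate = addMonths(12); break;
        default:
            return null; // Handle unknown frequencies
    }
    return nextDate.toISOString().split('T')[0]; // Format as YYYY-MM-DD
}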
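Step 3 of the process ("Parses the JSON response from Gemini") can fail if the model wraps its answer in a Markdown code fence, and the basic filter above does not check the frequency values or date format requested in the prompt. The following is a minimal sketch of a more tolerant parser, not the definitive implementation: the helper name parseSubscriptionsResponse and the constants are hypothetical, and the allowed frequency list is taken from the data model above.

```javascript
// Hypothetical helper (not part of any SDK): parse and validate the model's reply.
const ALLOWED_FREQUENCIES = ['weekly', 'bi-weekly', 'monthly', 'quarterly', 'annually', 'yearly'];
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;
const FENCE = '`'.repeat(3); // Markdown code fence marker
const LEADING_FENCE = new RegExp('^' + FENCE + '(?:json)?\\s*', 'i');
const TRAILING_FENCE = new RegExp('\\s*' + FENCE + '$');

function parseSubscriptionsResponse(responseText) {
    // Strip an optional Markdown fence around the JSON payload.
    const cleaned = responseText
        .trim()
        .replace(LEADING_FENCE, '')
        .replace(TRAILING_FENCE, '');

    const parsed = JSON.parse(cleaned); // Throws SyntaxError on malformed JSON
    if (!Array.isArray(parsed)) {
        throw new Error('Model response is not a JSON array.');
    }

    // Keep only entries that match the schema requested in the prompt.
    return parsed.filter(sub =>
        sub !== null && typeof sub === 'object' &&
        typeof sub.name === 'string' && sub.name.trim() !== '' &&
        typeof sub.amount === 'number' && Number.isFinite(sub.amount) && sub.amount > 0 &&
        typeof sub.currency === 'string' && /^[A-Z]{3}$/.test(sub.currency) &&
        typeof sub.frequency === 'string' && ALLOWED_FREQUENCIES.includes(sub.frequency.toLowerCase()) &&
        typeof sub.lastBilledDate === 'string' && ISO_DATE.test(sub.lastBilledDate)
    );
}
```

Inside detectSubscriptions, a helper like this could stand in for the JSON.parse call and the inline filter; entries that fail validation are dropped rather than stored.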

C. Recurring Cost Analysis

Data Retrieval (Client-side):

The Next.js frontend fetches subscription data directly from Firestore using the Firebase JS SDK, leveraging real-time listeners (onSnapshot) for immediate updates.

Implementation (React Hook):

import { useState, useEffect } from 'react';
import { getFirestore, collection, query, where, onSnapshot } from 'firebase/firestore';
import { getAuth, onAuthStateChanged } from 'firebase/auth';

function useSubscriptions() {
    const [subscriptions, setSubscriptions] = useState([]);
    const [loading, setLoading] = useState(true);
    const [error, setError] = useState(null);
    const auth = getAuth();
    const db = getFirestore();

    useEffect(() => {
        let unsubscribeSnapshot = () => {};

        // auth.currentUser is null until Firebase restores the session,
        // so wait for the auth state before attaching the Firestore listener.
        const unsubscribeAuth = onAuthStateChanged(auth, (user) => {
            unsubscribeSnapshot(); // Drop any listener from a previous user
            unsubscribeSnapshot = () => {};
            if (!user) {
                setSubscriptions([]);
                setLoading(false);
                setError(new Error('User not logged in.'));
                return;
            }

            setError(null);
            const q = query(collection(db, `users/${user.uid}/subscriptions`), where('status', '==', 'active'));
            unsubscribeSnapshot = onSnapshot(q, (snapshot) => {
                const subsData = snapshot.docs.map(doc => ({ id: doc.id, ...doc.data() }));
                setSubscriptions(subsData);
                setLoading(false);
            }, (err) => {
                console.error('Error fetching subscriptions:', err);
                setError(err);
                setLoading(false);
            });
        });

        return () => { // Clean up both listeners
            unsubscribeAuth();
            unsubscribeSnapshot();
        };
    }, [auth, db]);

    return { subscriptions, loading, error };
}

Calculations & Aggregation (Client-side):
- Frontend logic aggregates amount based on frequency to calculate total monthly/annual spend.
- Normalization Logic:
  - weekly: amount * 52 / 12 for monthly, amount * 52 for annual.
  - bi-weekly: amount * 26 / 12 for monthly, amount * 26 for annual.
  - monthly: amount for monthly, amount * 12 for annual.
  - quarterly: amount / 3 for monthly, amount * 4 for annual.
  - annually/yearly: amount / 12 for monthly, amount for annual.
- Display: Render charts (e.g., using Chart.js or Recharts) for monthly spend trends, category breakdowns, and costliest subscriptions.
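The normalization rules above can be sketched as a small pure function. This is an illustrative sketch, not the project's actual code: the names PERIODS_PER_YEAR, toAnnualCost, and summarizeSpend are hypothetical. It uses 52 and 26 billing periods per year for weekly and bi-weekly charges and derives the monthly figure as annual / 12, so the two totals stay consistent with each other.

```javascript
// Billing periods per year for each supported frequency.
const PERIODS_PER_YEAR = {
    'weekly': 52,
    'bi-weekly': 26,
    'monthly': 12,
    'quarterly': 4,
    'annually': 1,
    'yearly': 1,
};

// Annualized cost of one subscription; unknown frequencies contribute 0.
function toAnnualCost(sub) {
    const periods = PERIODS_PER_YEAR[String(sub.frequency).toLowerCase()];
    return periods === undefined ? 0 : sub.amount * periods;
}

// Total monthly and annual spend across a list of subscriptions, rounded to cents.
function summarizeSpend(subscriptions) {
    const annual = subscriptions.reduce((sum, sub) => sum + toAnnualCost(sub), 0);
    const roundToCents = (value) => Math.round(value * 100) / 100;
    return { monthly: roundToCents(annual / 12), annual: roundToCents(annual) };
}
```

A component could call summarizeSpend(subscriptions) with the array returned by useSubscriptions and pass the result to the chart layer. The sketch assumes all amounts share one currency; mixed currencies would need conversion first.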

D. Cancellation Reminders

User Configuration: Users can set a reminderDate (e.g., 2024-03-08) and reminderMethod (e.g., 'email', 'in-app') for each subscription in Firestore.

Cloud Function (sendCancellationReminders):

Trigger: Scheduled Cloud Function (e.g., pubsub.schedule('every 24 hours')).
Process:
1. Queries all active subscriptions whose reminderDate falls on or before the end of a configurable window (e.g., today + 7 days), so overdue reminders are included, and whose reminderSent flag is false.
2. For each matching subscription, it triggers an email or an in-app notification.
3. Updates the reminderSent flag to true to prevent duplicate notifications.

Implementation (Firebase Cloud Function):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
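if (!admin.apps.length) {
    admin.initializeApp(); // Required before using Firestore, unless already initialized elsewhere
}
const db = admin.firestore();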

// Consider using a Firebase Extension for sending emails (e.g., "Trigger Email")
// Or integrate with a service like SendGrid, Nodemailer.

exports.sendCancellationReminders = functions.pubsub.schedule('every 24 hours').onRun(async (context) => {
    const today = new Date();
    const sevenDaysFromNow = new Date();
    sevenDaysFromNow.setDate(today.getDate() + 7);

    const q = db.collectionGroup('subscriptions') // Query across all user subscriptions
        .where('status', '==', 'active')
        .where('reminderDate', '<=', sevenDaysFromNow.toISOString().split('T')[0])
        .where('reminderSent', '==', false);

    const snapshot = await q.get();
    const updates = db.batch();
    const emailsToSend = [];

    snapshot.forEach(doc => {
        const sub = doc.data();
        const userId = doc.ref.parent.parent.id; // Extract userId from path

        // Prepare notification data
        emailsToSend.push({
            to: `users/${userId}/profile/email`, // Needs to fetch user email
            subject: `Subscription Sleuth Reminder: ${sub.name} is due soon!`,
            html: `<p>Your subscription for <b>${sub.name}</b> (${sub.amount} ${sub.currency} ${sub.frequency}) is due to renew on ${sub.nextBilledDate}.</p>
                   <p>Consider reviewing or cancelling. <a href="[APP_LINK]/subscriptions/${doc.id}">Manage Subscription</a></p>`,
            // Use a template if using SendGrid or similar
        });

        updates.update(doc.ref, { reminderSent: true, updatedAt: admin.firestore.FieldValue.serverTimestamp() });
    });

    // In a real scenario, fetch user emails for userId in emailsToSend, then send.
    // Example using a hypothetical email service:
    // for (const emailData of emailsToSend) {
    //     const userDoc = await db.collection('users').doc(emailData.to.split('/')[1]).get();
    //     if (userDoc.exists) {
    //         const userEmail = userDoc.data().email;
    //         await sendEmailService.send({ to: userEmail, subject: emailData.subject, html: emailData.html });
    //     }
    // }

    await updates.commit();
    console.log(`Sent reminders for ${emailsToSend.length} subscriptions.`);
    return null;
});

E. Spending Insights

Dashboard View: A central hub showing a quick summary.
Key Metrics:
- Total Monthly Spend
- Total Annual Spend
- Upcoming Bills (list of next 5-10 subscriptions)
- Top 3 Costliest Subscriptions
Visualizations:
- Monthly Spend Trend: Line chart showing total monthly subscription spending over the last 12-24 months. (Requires storing monthly_summary data in Firestore, e.g., users/{userId}/summaries/{YYYY-MM}).
- Category Breakdown: Pie or bar chart showing spend distribution across categories (requires users to categorize subscriptions).
Interactive List: A table or card view of all subscriptions, with filtering, sorting, and search capabilities.
Implementation: All insights are derived from the users/{userId}/subscriptions collection, potentially augmented by a users/{userId}/summaries collection updated by a daily/monthly scheduled Cloud Function. Frontend UI components fetch this data and render it using libraries like Recharts or Chart.js.
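Two of the key metrics, "Upcoming Bills" and "Top 3 Costliest Subscriptions", can be derived on the client from the subscription documents alone. The sketch below is one possible way to do that under stated assumptions: the function name dashboardMetrics and the ANNUAL_MULTIPLIER table are hypothetical, nextBilledDate is assumed to be stored as a YYYY-MM-DD string as in the data model, and costs are ranked by annualized amount.

```javascript
// Billing periods per year, used to compare subscriptions on an annual basis.
const ANNUAL_MULTIPLIER = {
    'weekly': 52, 'bi-weekly': 26, 'monthly': 12, 'quarterly': 4, 'annually': 1, 'yearly': 1,
};

function dashboardMetrics(subscriptions, todayStr, upcomingLimit = 5) {
    // Attach an annualized cost to each subscription (0 for unknown frequencies).
    const withAnnual = subscriptions.map(sub => ({
        ...sub,
        annualCost: sub.amount * (ANNUAL_MULTIPLIER[sub.frequency] || 0),
    }));

    // Top 3 by annualized cost; copy before sorting so the input is untouched.
    const topCostliest = [...withAnnual]
        .sort((a, b) => b.annualCost - a.annualCost)
        .slice(0, 3);

    // YYYY-MM-DD strings sort chronologically, so string comparison is enough.
    const upcomingBills = subscriptions
        .filter(sub => typeof sub.nextBilledDate === 'string' && sub.nextBilledDate >= todayStr)
        .sort((a, b) => a.nextBilledDate.localeCompare(b.nextBilledDate))
        .slice(0, upcomingLimit);

    return { topCostliest, upcomingBills };
}
```

Passing today's date in as a string keeps the function deterministic and easy to test; the caller would typically supply new Date().toISOString().split('T')[0].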

5. Gemini Prompting Strategy

The effectiveness of "Subscription Sleuth" hinges on Gemini's ability to accurately parse unstructured financial text. A well-crafted, few-shot prompt is crucial for achieving high accuracy and consistent output.

Core Principles for Gemini Prompting:

Clear Role and Goal: Define Gemini's persona and the exact task.
Strict Output Format: Insist on a JSON array matching a predefined schema. This is paramount for programmatic parsing.
Contextual Information: Specify the type of input text (e.g., "bank statement transaction list," "email receipt").
Few-Shot Examples: Provide 1-2 good examples of input text and their corresponding desired JSON output. This significantly guides the model to the correct output structure and content.
Handling Edge Cases: Instruct on how to handle scenarios like no subscriptions found, ambiguous dates, or partial information.
Iterative Refinement: Real-world financial statements are messy. Expect to iterate on the prompt with diverse examples to cover various formats and nuances.

Example Gemini Prompt Structure (for Bank Statement):

You are an expert financial assistant. Your task is to meticulously analyze raw financial transaction text from bank statements or credit card statements. Your primary objective is to identify and extract all recurring subscription services.

You MUST return the results as a strict JSON array of objects, where each object represents a detected subscription. If no recurring subscriptions are found, return an empty JSON array: `[]`.

**Expected JSON Output Schema:**
```json
[
  {
    "name": "string (the name of the service or merchant, e.g., 'Netflix', 'Spotify Premium', 'Adobe Creative Cloud')",
    "amount": "number (the recurring charge amount, e.g., 15.99)",
    "currency": "string (the three-letter ISO 4217 currency code, e.g., 'USD', 'EUR', 'GBP')",
    "frequency": "string (the inferred or explicit billing cycle, choose from: 'monthly', 'annually', 'weekly', 'quarterly', 'bi-weekly', 'yearly'. Prioritize monthly/annually if ambiguous.)",
    "lastBilledDate": "string (the date of the most recent recurring transaction for this subscription, in YYYY-MM-DD format. Infer if only month/year or approximate date is given.)"
  }
]
```

Instructions for Analysis:

Look for patterns of consistent merchant names and transaction amounts over time.
Ignore one-time purchases, refunds, or irregular charges.
Only include active or recently active subscriptions (within the last 3-6 months).
If a frequency is not explicitly stated, infer it based on the timing of transactions for the same merchant and amount. For example, if transactions appear roughly every 30 days, assume 'monthly'.
If multiple transactions for the same service with slightly varying amounts exist (e.g., due to tax changes), extract the most recent consistent amount.
Be cautious with descriptions that might look like subscriptions but are not (e.g., "PAYMENT FOR LOAN", "TRANSFER TO SAVINGS"). Focus strictly on external service providers.

Example Input 1 (with multiple subscriptions):

Date       Description                  Amount
01/05/2024 NETFLIX.COM LTD              -15.99 USD
01/10/2024 Spotify Premium              -10.99 USD
01/12/2024 AMAZON PRIME MEMBERSHIP      -14.99 USD
02/05/2024 NETFLIX.COM LTD              -15.99 USD
02/10/2024 Spotify Premium              -10.99 USD
02/12/2024 AMAZON PRIME MEMBERSHIP      -14.99 USD
03/05/2024 NETFLIX.COM LTD              -15.99 USD
03/10/2024 Spotify Premium              -10.99 USD
03/12/2024 AMAZON PRIME MEMBERSHIP      -14.99 USD
03/15/2024 COFFEE SHOP PURCHASE         -4.50 USD

Example Output 1:

[
  {
    "name": "Netflix",
    "amount": 15.99,
    "currency": "USD",
    "frequency": "monthly",
    "lastBilledDate": "2024-03-05"
  },
  {
    "name": "Spotify Premium",
    "amount": 10.99,
    "currency": "USD",
    "frequency": "monthly",
    "lastBilledDate": "2024-03-10"
  },
  {
    "name": "Amazon Prime Membership",
    "amount": 14.99,
    "currency": "USD",
    "frequency": "monthly",
    "lastBilledDate": "2024-03-12"
  }
]

Example Input 2 (no subscriptions):

Date       Description                  Amount
04/01/2024 Grocery Store                -85.20 USD
04/02/2024 Gas Station                  -50.00 USD
04/03/2024 Restaurant                   -35.75 USD

Example Output 2:

[]

Now, analyze the following raw financial transaction text: [INSERT_RAW_TEXT_HERE]
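One way to keep a prompt like this maintainable is to assemble it from parts instead of hard-coding a single string, so that few-shot examples can be added or swapped during refinement. The sketch below is a hypothetical way to fill in the placeholder body of generateGeminiPrompt; the function name buildDetectionPrompt and the shape of its parts argument are assumptions, not part of the Gemini SDK.

```javascript
// Assemble the detection prompt from a role, schema, instructions, and few-shot examples.
function buildDetectionPrompt({ role, schema, instructions, examples }, rawText) {
    // Render each few-shot example as an input/output pair, numbered from 1.
    const exampleBlocks = examples.map((ex, i) =>
        `Example Input ${i + 1}:\n${ex.input}\n\nExample Output ${i + 1}:\n${JSON.stringify(ex.output, null, 2)}`
    );

    return [
        role,
        'You MUST return the results as a strict JSON array of objects. ' +
            'If no recurring subscriptions are found, return an empty JSON array: [].',
        `Expected JSON Output Schema:\n${JSON.stringify(schema, null, 2)}`,
        `Instructions for Analysis:\n${instructions.map(line => `- ${line}`).join('\n')}`,
        ...exampleBlocks,
        `Now, analyze the following raw financial transaction text:\n${rawText}`,
    ].join('\n\n');
}
```

Serializing the example outputs with JSON.stringify guarantees that the examples shown to the model are valid JSON, which matters because the model tends to imitate their format.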


**Refinement Strategy:**

*   **Test with real data:** Collect anonymized snippets from various bank/credit card statements.
*   **Handle ambiguities:** What if "Amazon" appears for a subscription and a regular purchase? Add instructions for disambiguation (e.g., look for "Prime," "Membership," "Subscription").
*   **Currency detection:** Explicitly instruct on currency code inference if not present.
*   **Date formats:** Provide examples of common date formats and instruct Gemini to normalize to YYYY-MM-DD.
*   **Length constraints:** Be mindful of Gemini's context window. For very long statements, consider chunking the text and processing chunks separately, then consolidating.
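The chunk-and-consolidate idea from the last point can be sketched as two small helpers. This is an illustration under stated assumptions rather than a tuned implementation: chunkStatementText and mergeDetections are hypothetical names, the budget is measured in characters rather than tokens, and duplicates are identified by name, amount, and currency. Splitting a statement can also separate repeated charges from the same merchant, which may weaken frequency inference within a single chunk.

```javascript
// Split statement text into chunks of at most maxChars, breaking only at line ends.
// A single line longer than maxChars is kept whole in its own chunk.
function chunkStatementText(rawText, maxChars) {
    const chunks = [];
    let current = '';
    for (const line of rawText.split('\n')) {
        // Start a new chunk if adding this line (plus a newline) would exceed the budget.
        if (current.length > 0 && current.length + line.length + 1 > maxChars) {
            chunks.push(current);
            current = '';
        }
        current = current.length > 0 ? `${current}\n${line}` : line;
    }
    if (current.length > 0) {
        chunks.push(current);
    }
    return chunks;
}

// Merge per-chunk detection results, keeping the most recently billed entry per subscription.
function mergeDetections(perChunkResults) {
    const byKey = new Map();
    for (const sub of perChunkResults.flat()) {
        const key = `${sub.name.toLowerCase()}|${sub.amount}|${sub.currency}`;
        const existing = byKey.get(key);
        // YYYY-MM-DD strings compare chronologically.
        if (!existing || sub.lastBilledDate > existing.lastBilledDate) {
            byKey.set(key, sub);
        }
    }
    return [...byKey.values()];
}
```

Each chunk would be sent through the same prompt, and the arrays returned for the chunks passed together to mergeDetections before writing to Firestore.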

## 6. Deployment & Scaling

### Deployment Strategy (Firebase Ecosystem)

1.  **Firebase Project Setup:**
    *   Create a new Firebase project in the Firebase console.
    *   Enable Firebase Authentication, Firestore, Cloud Functions, and Hosting.
2.  **Next.js Application Deployment:**
    *   **Build:** `npm run build` (or `yarn build`) to generate the optimized production build in the `.next` directory.
    *   **Firebase Hosting Configuration (`firebase.json`):** Configure rewrites to direct API routes to Cloud Functions.
        ```json
        {
          "hosting": {
            "public": ".next/static", // Or "out" for static export, but we need SSR/API routes
            "ignore": [
              "firebase.json",
              "**/.*",
              "**/node_modules/**"
            ],
            "headers": [ // Recommended for security and performance
              {
                "source": "**",
                "headers": [
                  { "key": "Strict-Transport-Security", "value": "max-age=31536000; includeSubDomains; preload" },
                  { "key": "X-Content-Type-Options", "value": "nosniff" },
                  { "key": "X-Frame-Options", "value": "DENY" },
                  { "key": "X-XSS-Protection", "value": "1; mode=block" },
                  { "key": "Content-Security-Policy", "value": "default-src 'self'; script-src 'self' 'unsafe-eval' https://www.gstatic.com; connect-src 'self' https://firebase-url.com https://generativelanguage.googleapis.com; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self' data:;" }
                ]
              }
            ],
            "rewrites": [
              {
                "source": "/api/**",
                "function": "nextJsApi" // Points to a Cloud Function wrapping Next.js API routes
              },
              {
                "source": "**",
                "destination": "/index.html" // Fallback for client-side routing, or specific functions for SSR pages
              }
            ]
          },
          "functions": {
            "source": "functions", // Directory for Cloud Functions
            "runtime": "nodejs20"
          }
        }
        ```
    *   **Next.js as Cloud Function:** For full Next.js SSR and API route support, deploy the Next.js app itself as a Cloud Function (e.g., using the Firebase CLI's web frameworks support, enabled with `firebase experiments:enable webframeworks`). This allows Firebase Hosting to proxy requests to the Next.js serverless function.
        ```bash
        firebase deploy --only hosting,functions
        ```
3.  **Cloud Functions Deployment:**
    *   Place all Cloud Functions code in the `functions/` directory.
    *   Install dependencies: `cd functions && npm install`.
    *   Configure Gemini API key: `firebase functions:config:set gemini.api_key="YOUR_GEMINI_KEY"` (keeps the key out of source control; the function reads it at runtime via `functions.config()`).
    *   Deploy: `firebase deploy --only functions`.
4.  **Firestore Security Rules:** Essential for protecting user data.
    ```
    rules_version = '2';
    service cloud.firestore {
      match /databases/{database}/documents {
        // User profiles
        match /users/{userId} {
          allow read, write: if request.auth != null && request.auth.uid == userId;

          // User's specific subscriptions
          match /subscriptions/{subscriptionId} {
            allow read, write: if request.auth != null && request.auth.uid == userId;
          }

          // User's specific summaries (e.g., monthly spending reports)
          match /summaries/{summaryId} {
            allow read: if request.auth != null && request.auth.uid == userId;
          }
        }
      }
    }
    ```
    Deploy rules: `firebase deploy --only firestore:rules`.

### Scaling Considerations

1.  **Firebase Authentication:** A managed service that scales to millions of users with no capacity planning on the application side.
2.  **Firestore:**
    *   **Horizontal Scaling:** Automatically scales with data volume and reads/writes.
    *   **Indexing:** Ensure proper indexes are defined for all queries to maintain performance. `collectionGroup` queries require specific index setup.
    *   **Data Model Optimization:** Avoid overly large documents; flatten data or use sub-collections for growing lists (e.g., individual transactions within a statement if stored).
    *   **Batch Operations:** Use `writeBatch` for multiple writes to reduce network calls and ensure atomicity.
3.  **Cloud Functions:**
    *   **Automatic Scaling:** Serverless nature means functions scale up/down based on demand.
    *   **Cold Starts:** For less frequently called functions, a cold start might introduce latency. For critical, latency-sensitive functions (like `detectSubscriptions`), consider setting minimum instances (though this incurs constant cost).
    *   **Memory/CPU Allocation:** Adjust memory and CPU based on function complexity (e.g., Gemini calls can be memory/CPU intensive for large text inputs).
    *   **Concurrency:** Tune function concurrency settings for optimal performance without overwhelming downstream services.
4.  **Gemini API:**
    *   **Quotas and Rate Limits:** Monitor usage in Google Cloud Console. Implement robust retry mechanisms with exponential backoff for API calls to handle transient errors or rate limit excursions.
    *   **Payload Size:** Be mindful of the maximum input token limit for Gemini. For extremely large PDF texts, consider segmenting the text and processing in multiple calls, then aggregating results.
5.  **Next.js Frontend:**
    *   **CDN:** Firebase Hosting automatically uses a global CDN, ensuring low-latency asset delivery for users worldwide.
    *   **Bundle Size:** Keep client-side JavaScript bundles small for fast loading times.
6.  **Monitoring & Logging:**
    *   **Cloud Logging:** Crucial for monitoring Cloud Function invocations, errors, and performance.
    *   **Firebase Performance Monitoring:** Integrate for client-side performance data in the Next.js app. Crashlytics targets mobile platforms and has no web SDK, so browser errors need a separate error-reporting tool.
    *   **Alerting:** Set up alerts in Google Cloud for critical errors, resource utilization thresholds, or unexpected Gemini API spending spikes.
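
The retry-with-exponential-backoff recommendation for Gemini calls can be sketched like this. It is a minimal sketch under stated assumptions: `withBackoff` and its options are hypothetical names, and deciding which errors count as retryable (for example, by inspecting a status code on the thrown error) depends on the error shape of the client library in use.

```javascript
// Hypothetical helper: retries an async operation with exponential backoff and jitter.
async function withBackoff(operation, { maxAttempts = 5, baseDelayMs = 500, isRetryable = () => true } = {}) {
    for (let attempt = 1; ; attempt++) {
        try {
            return await operation();
        } catch (error) {
            // Give up when attempts are exhausted or the error is not transient
            if (attempt >= maxAttempts || !isRetryable(error)) throw error;
            // Delay doubles each attempt (base, 2x, 4x, ...) plus random jitter up to the base delay
            const delayMs = baseDelayMs * 2 ** (attempt - 1) + Math.random() * baseDelayMs;
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
    }
}
```

Inside `detectSubscriptions`, the Gemini call could then be wrapped as `await withBackoff(() => model.generateContent(prompt))`. The jitter spreads retries out so that many function instances hitting a rate limit at once do not all retry at the same moment.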

### Security & Compliance

*   **Data Minimization & Encryption:** PDF.js processes files locally, sending only extracted text. All Firebase data is encrypted at rest and in transit.
*   **Authentication & Authorization:** Firebase Auth for robust user identity, Firestore Security Rules for data access control (ensuring users only access their own data).
*   **API Key Management:** Gemini API key is stored securely in Firebase Cloud Functions environment variables, never exposed client-side.
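*   **Input Validation:** Validate and sanitize all user inputs before processing (especially `rawText` before sending to Gemini). Sanitization reduces the risk of prompt injection but cannot eliminate it, so the model's output must also be validated before it is stored.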
*   **GDPR/CCPA/PIPEDA:** Transparent privacy policy, data consent, and proper handling of user financial data are paramount. Users should be aware of what data is collected, how it's used (e.g., "AI processing for detection"), and for how long it's retained.
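
As an illustration of the input validation point, the sketch below normalizes extracted text before it is placed in a prompt. `sanitizeRawText` and the `maxChars` cap are hypothetical, and this kind of cleanup only limits malformed or oversized input; it is not a complete defense against prompt injection.

```javascript
// Hypothetical helper: normalizes extracted statement text before it is placed in a prompt.
function sanitizeRawText(rawText, maxChars = 100000) {
    if (typeof rawText !== 'string') {
        throw new TypeError('rawText must be a string');
    }
    return rawText
        // Remove control characters except tab, newline, and carriage return
        .replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '')
        // Collapse runs of spaces and tabs left over from PDF layout
        .replace(/[ \t]+/g, ' ')
        // Cap the length so a huge upload cannot inflate API cost
        .slice(0, maxChars)
        .trim();
}
```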

By adhering to this blueprint, Subscription Sleuth can be built as a robust, scalable, and secure application, effectively addressing the "subscription fatigue" problem with the power of Google's AI and cloud infrastructure.


2. Solution Overview

The core solution workflow involves:

Secure Data Ingestion: Users securely upload financial statements (bank, credit card, utility bills) in PDF format. The application is designed with privacy in mind, processing sensitive document text locally where possible.
AI-Powered Detection: Leveraging advanced AI (Gemini API), the system intelligently scans the extracted text for patterns indicative of recurring subscriptions, identifying merchant names, amounts, and billing frequencies.
Consolidated View: All detected subscriptions are aggregated into a single, intuitive dashboard, providing a transparent overview of all recurring financial commitments.
Recurring Cost Analysis: The application calculates total monthly and annual subscription spending, breaks down costs by category, and highlights the most expensive services.
Proactive Reminders: Users receive timely reminders for upcoming subscription renewals, free trial expirations, and cancellation deadlines, preventing unwanted charges.
Spending Insights: Interactive visualizations and reports help users understand their spending habits, identify areas for cost reduction, and make informed financial decisions.

3. Architecture & Tech Stack Justification

Overall Architecture Diagram:

[User Device]
      |
      V
[Next.js Frontend (Client-side Rendering/Hydration)]
      |
      | 1. Upload PDF (PDF.js extracts text locally)
      V
[Next.js API Routes (Serverless Functions on Firebase Hosting/Cloud Functions)]
      |
      | 2. Raw Text & User ID
      V
[Firebase Cloud Functions (Backend Logic & Orchestration)]
      |
      | 3. Gemini API Prompt
      V
[Google Gemini API (AI-powered Detection)]
      |
      | 4. Structured Subscription Data (JSON)
      V
[Firebase Cloud Functions]
      |
      | 5. Store/Update Data
      V
[Google Firestore (NoSQL Database)]
      ^
      | 6. Real-time Subscription Data & Insights
      |
[Next.js Frontend]
      |
      V
[User Dashboard / Notifications]

Tech Stack Justification:

Next.js (Frontend & API Routes):
- Full-stack Capabilities: Next.js allows us to build both the user interface (React) and backend API endpoints within a single framework. This simplifies development, tooling, and deployment.
- Server-Side Rendering (SSR) / Static Site Generation (SSG): While the core application will be behind authentication, Next.js provides flexibility for a performant landing page or marketing site. For authenticated routes, client-side rendering with pre-rendering (hydration) offers excellent responsiveness.
- Developer Experience: Leveraging the React ecosystem ensures a rich, interactive, and maintainable user interface.
- API Routes: These act as lightweight, serverless functions, perfect for proxying requests to Firebase Cloud Functions or directly handling simple data operations, without needing a separate backend server setup.
Google Firebase (Backend Services):
- Firebase Authentication: Provides a comprehensive and secure identity platform. It supports email/password, social logins (Google, Apple, etc.), and anonymous authentication, significantly reducing development effort for user management.
- Firestore (NoSQL Database): A highly scalable, flexible, and real-time NoSQL document database. Its real-time synchronization capabilities are ideal for instantly updating user dashboards with newly detected subscriptions or modified data. Its schema-less nature accommodates the evolving needs of subscription data.
- Cloud Functions for Firebase: A serverless execution environment perfect for backend logic. This is where the heavy lifting of interacting with the Gemini API, performing data transformations, and sending notifications will occur. It scales automatically and eliminates server management overhead.
- Firebase Storage: Provides secure object storage, which can be used for temporary storage of uploaded PDFs before processing (though PDF.js will handle most of this client-side).
- Firebase Hosting: Offers fast, secure, and globally distributed hosting for the Next.js frontend, complete with SSL and CDN capabilities. It integrates seamlessly with Cloud Functions for API routing.
Gemini API (AI-Powered Subscription Detection):
- Core Intelligence: Gemini is fundamental to the "Subscription Detection" feature. Its advanced natural language understanding and ability to process long text inputs are crucial for extracting structured data from unstructured financial statements.
- Multimodal Potential: While initially used for text analysis, Gemini's multimodal capabilities offer future expansion opportunities, such as analyzing payment screenshots or invoices.
- Google Ecosystem Integration: Seamless integration with other Google Cloud services (like Cloud Functions) and robust infrastructure.
PDF.js (Client-side PDF Parsing):
- Privacy-First Approach: A critical choice for handling sensitive financial documents. PDF.js allows us to extract text directly within the user's browser (client-side). This means the raw PDF document never leaves the user's device. Only the extracted text is then sent to the backend for AI processing. This significantly enhances user trust and privacy compliance.
- Efficiency: Client-side processing offloads compute from the server, improving responsiveness and reducing backend load.
- Open-Source & Robust: A well-maintained and widely used library for PDF rendering and text extraction.

This integrated stack provides a powerful, scalable, and cost-effective foundation for Subscription Sleuth, allowing us to focus on delivering core features rather than infrastructure management.

4. Core Feature Implementation Guide

This section details the implementation strategy for the key features, outlining data flow, pseudo-code, and specific technologies.

A. User Onboarding & Data Ingestion

Authentication (Firebase Auth):

Flow: Users sign up or log in using email/password or Google SSO. Firebase UI library or custom React components will handle the UI.

Implementation:

import { getAuth, signInWithEmailAndPassword, createUserWithEmailAndPassword, GoogleAuthProvider, signInWithPopup } from 'firebase/auth';
const auth = getAuth();

// Email/Password Signup
async function signUp(email, password) {
    try {
        const userCredential = await createUserWithEmailAndPassword(auth, email, password);
        console.log('User signed up:', userCredential.user.uid);
    } catch (error) {
        console.error('Signup error:', error.message);
    }
}

// Google Sign-in
async function signInWithGoogle() {
    const provider = new GoogleAuthProvider();
    try {
        const result = await signInWithPopup(auth, provider);
        console.log('Signed in with Google:', result.user.uid);
    } catch (error) {
        console.error('Google Sign-in error:', error.message);
    }
}

Data Upload & PDF Parsing (Client-side with PDF.js):

UI: A drag-and-drop file input component (e.g., using react-dropzone).
Processing:
- User uploads a PDF file (e.g., bank_statement.pdf).
- The file is read into an ArrayBuffer with FileReader and handed to PDF.js.
- It then loads the PDF document and extracts text content from each page.
- The aggregated raw text is then sent to a Next.js API route.

Implementation (Next.js client component):

import * as pdfjsLib from 'pdfjs-dist/build/pdf';
pdfjsLib.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.min.js`;

async function extractTextFromPdf(file) {
    const reader = new FileReader();
    reader.readAsArrayBuffer(file);

    return new Promise((resolve, reject) => {
        reader.onload = async () => {
            const typedarray = new Uint8Array(reader.result);
            try {
                const pdf = await pdfjsLib.getDocument({ data: typedarray }).promise;
                let fullText = '';
                for (let i = 1; i <= pdf.numPages; i++) {
                    const page = await pdf.getPage(i);
                    const textContent = await page.getTextContent();
                    fullText += textContent.items.map(item => item.str).join(' ') + '\n';
                }
                resolve(fullText);
            } catch (error) {
                console.error('Error extracting text from PDF:', error);
                reject(error);
            }
        };
        reader.onerror = (error) => reject(error);
    });
}

// In your React component onChange handler for file input.
// Requires: import { getAuth } from 'firebase/auth';
async function handleFileUpload(event) {
    const file = event.target.files[0];
    const user = getAuth().currentUser;
    if (user && file && file.type === 'application/pdf') {
        const rawText = await extractTextFromPdf(file);
        // The API route rejects requests that lack a Firebase ID token
        const idToken = await user.getIdToken();
        // Send rawText to Next.js API route for further processing
        const response = await fetch('/api/process-statement', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                Authorization: `Bearer ${idToken}`,
            },
            body: JSON.stringify({ rawText, fileName: file.name })
        });
        const result = await response.json();
        console.log('API response:', result);
    }
}

B. Subscription Detection (Gemini & Cloud Functions)

This is the core intelligence pipeline.

Next.js API Route (/api/process-statement):

Receives rawText and fileName from the client.
Authenticates the user (e.g., using Firebase Auth token from the client).
Calls a Firebase Cloud Function securely.

Implementation (Next.js API Route pages/api/process-statement.js):

import { getAuth } from 'firebase-admin/auth'; // Using firebase-admin for server-side token verification
import { initializeApp, getApps, cert } from 'firebase-admin/app';
import serviceAccount from 'path/to/your/serviceAccountKey.json'; // Securely load credentials

if (!getApps().length) {
  initializeApp({
    credential: cert(serviceAccount),
    projectId: process.env.NEXT_PUBLIC_FIREBASE_PROJECT_ID,
  });
}

// The Admin SDK has no client for callable functions, so this route calls the
// function's HTTPS endpoint directly using the callable request format.
// Assumption: a 1st-gen function deployed to the default us-central1 region.
const DETECT_URL = `https://us-central1-${process.env.NEXT_PUBLIC_FIREBASE_PROJECT_ID}.cloudfunctions.net/detectSubscriptions`;

export default async function handler(req, res) {
    if (req.method !== 'POST') {
        return res.status(405).json({ error: 'Method Not Allowed' });
    }

    const idToken = req.headers.authorization?.split('Bearer ')[1];
    if (!idToken) {
        return res.status(401).json({ error: 'Unauthorized' });
    }

    try {
        // Reject invalid or expired tokens before doing any work
        await getAuth().verifyIdToken(idToken);
        const { rawText, fileName } = req.body;

        // Forward the user's ID token so context.auth is populated inside the function
        const fnResponse = await fetch(DETECT_URL, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${idToken}` },
            body: JSON.stringify({ data: { rawText, fileName } }),
        });
        const payload = await fnResponse.json();
        if (!fnResponse.ok) {
            throw new Error(payload.error?.message || 'detectSubscriptions call failed');
        }

        // Callable functions wrap their return value in a "result" field
        res.status(200).json({ success: true, data: payload.result });

    } catch (error) {
        console.error('Error in /api/process-statement:', error);
        res.status(500).json({ error: error.message });
    }
}

Firebase Cloud Function (detectSubscriptions):

Trigger: HTTPS Callable function.
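Inputs: rawText (from PDF.js via the Next.js API route) and fileName. The user ID comes from the verified auth context (context.auth.uid), not from the request payload.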
Process:
1. Constructs a detailed Gemini prompt with rawText.
2. Calls the Gemini API (model.generateContent).
3. Parses the JSON response from Gemini.
4. Validates and sanitizes the extracted data.
5. Stores the detected subscriptions in Firestore under users/{userId}/subscriptions.

Firestore Data Model for subscriptions:

{
  "name": "Netflix Premium",
  "amount": 15.99,
  "currency": "USD",
  "frequency": "monthly", // "monthly", "annually", "weekly", "quarterly", "bi-weekly"
  "lastBilledDate": "2024-02-15", // YYYY-MM-DD
  "nextBilledDate": "2024-03-15", // Calculated
  "status": "active", // "active", "paused", "cancelled", "pending_review"
  "sourceDocumentId": "doc_xyz123", // Reference to the uploaded document (optional)
  "sourceFileName": "statement_feb_2024.pdf",
  "confidenceScore": 0.95, // Optional, if Gemini provides or can be inferred
  "createdAt": "Timestamp",
  "updatedAt": "Timestamp"
}

Implementation (Firebase Cloud Function functions/index.js):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { GoogleGenerativeAI } = require('@google/generative-ai');

admin.initializeApp();
const db = admin.firestore();

const GEMINI_API_KEY = functions.config().gemini.api_key; // Stored securely in Firebase config
const genAI = new GoogleGenerativeAI(GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-pro" });

const generateGeminiPrompt = (rawText) => {
    // See Section 5 for detailed prompt construction
    return `You are an expert financial assistant... [FULL PROMPT HERE] ...\nNow, analyze the following text:\n${rawText}`;
};

exports.detectSubscriptions = functions.https.onCall(async (data, context) => {
    if (!context.auth) {
        throw new functions.https.HttpsError('unauthenticated', 'The request requires authentication.');
    }
    const { rawText, fileName } = data;
    const userId = context.auth.uid;

    if (!rawText || typeof rawText !== 'string' || rawText.length === 0) {
        throw new functions.https.HttpsError('invalid-argument', 'The `rawText` parameter is required and must be a non-empty string.');
    }

    try {
        const prompt = generateGeminiPrompt(rawText);
        const result = await model.generateContent(prompt);
        const responseText = result.response.text();

        let detectedSubscriptions;
        try {
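            // The model may wrap its JSON in a Markdown code fence; strip it before parsing
            const jsonText = responseText.trim().replace(/^```(?:json)?\s*/i, '').replace(/\s*```$/, '');
            detectedSubscriptions = JSON.parse(jsonText);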
            // Basic validation: ensure it's an array and contains objects with expected fields
            if (!Array.isArray(detectedSubscriptions)) {
                throw new Error('Gemini response is not a JSON array.');
            }
            detectedSubscriptions = detectedSubscriptions.filter(sub =>
                sub.name && typeof sub.name === 'string' &&
                typeof sub.amount === 'number' && sub.amount > 0 &&
                sub.currency && typeof sub.currency === 'string' &&
                sub.frequency && typeof sub.frequency === 'string' &&
                sub.lastBilledDate && typeof sub.lastBilledDate === 'string'
            );

        } catch (jsonError) {
            console.error('Failed to parse Gemini response or validation failed:', jsonError, 'Raw response:', responseText);
            throw new functions.https.HttpsError('internal', 'Failed to process AI response.');
        }

        const batch = db.batch();
        const userSubscriptionsRef = db.collection(`users/${userId}/subscriptions`);

        detectedSubscriptions.forEach(sub => {
            const subRef = userSubscriptionsRef.doc(); // Auto-generate document ID
            const nextBilledDate = calculateNextBilledDate(sub.lastBilledDate, sub.frequency); // Helper function
            batch.set(subRef, {
                ...sub,
                nextBilledDate: nextBilledDate, // Store as string 'YYYY-MM-DD'
                sourceFileName: fileName,
                createdAt: admin.firestore.FieldValue.serverTimestamp(),
                status: 'pending_review', // User must review/confirm first
                updatedAt: admin.firestore.FieldValue.serverTimestamp(),
            });
        });

        await batch.commit();

        return { success: true, count: detectedSubscriptions.length, detected: detectedSubscriptions };

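    } catch (error) {
        console.error('Error detecting subscriptions:', error);
        // Preserve specific HttpsErrors thrown above instead of masking them
        if (error instanceof functions.https.HttpsError) throw error;
        throw new functions.https.HttpsError('internal', 'An error occurred during subscription detection.', error.message);
    }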
});

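// Helper function for calculating next billed date.
// Works in UTC throughout so the result does not depend on the server's time zone.
function calculateNextBilledDate(lastBilledDateStr, frequency) {
    const lastDate = new Date(`${lastBilledDateStr}T00:00:00Z`);
    if (Number.isNaN(lastDate.getTime())) return null; // Unparseable date

    // Adds whole months and clamps the day, so Jan 31 + 1 month is the last day of February
    const addMonths = (date, months) => {
        const result = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth() + months, 1));
        const daysInMonth = new Date(Date.UTC(result.getUTCFullYear(), result.getUTCMonth() + 1, 0)).getUTCDate();
        result.setUTCDate(Math.min(date.getUTCDate(), daysInMonth));
        return result;
    };
    const addDays = (date, days) => new Date(date.getTime() + days * 24 * 60 * 60 * 1000);

    let nextDate;
    switch (String(frequency).toLowerCase()) {
        case 'weekly': nextDate = addDays(lastDate, 7); break;
        case 'bi-weekly': nextDate = addDays(lastDate, 14); break;
        case 'monthly': nextDate = addMonths(lastDate, 1); break;
        case 'quarterly': nextDate = addMonths(lastDate, 3); break;
        case 'annually':
        case 'yearly': nextDate = addMonths(lastDate, 12); break;
        default:
            return null; // Handle unknown frequencies
    }
    return nextDate.toISOString().split('T')[0]; // Format as YYYY-MM-DD
}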

C. Recurring Cost Analysis

Data Retrieval (Client-side):

The Next.js frontend fetches subscription data directly from Firestore using the Firebase JS SDK, leveraging real-time listeners (onSnapshot) for immediate updates.

Implementation (React Hook):

import { useState, useEffect } from 'react';
import { getFirestore, collection, query, where, onSnapshot } from 'firebase/firestore';
import { getAuth, onAuthStateChanged } from 'firebase/auth';

function useSubscriptions() {
    const [subscriptions, setSubscriptions] = useState([]);
    const [loading, setLoading] = useState(true);
    const [error, setError] = useState(null);

    useEffect(() => {
        const auth = getAuth();
        const db = getFirestore();
        let unsubscribeSnapshot = () => {};

        // auth.currentUser is null until Firebase restores the session,
        // so subscribe to auth state instead of reading it once.
        const unsubscribeAuth = onAuthStateChanged(auth, (user) => {
            unsubscribeSnapshot(); // Stop listening to the previous user's data
            unsubscribeSnapshot = () => {};
            if (!user) {
                setSubscriptions([]);
                setLoading(false);
                setError(new Error('User not logged in.'));
                return;
            }

            setError(null);
            const q = query(collection(db, `users/${user.uid}/subscriptions`), where('status', '==', 'active'));
            unsubscribeSnapshot = onSnapshot(q, (snapshot) => {
                const subsData = snapshot.docs.map(doc => ({ id: doc.id, ...doc.data() }));
                setSubscriptions(subsData);
                setLoading(false);
            }, (err) => {
                console.error('Error fetching subscriptions:', err);
                setError(err);
                setLoading(false);
            });
        });

        return () => { // Clean up both listeners
            unsubscribeAuth();
            unsubscribeSnapshot();
        };
    }, []);

    return { subscriptions, loading, error };
}

Calculations & Aggregation (Client-side):
- Frontend logic aggregates amount based on frequency to calculate total monthly/annual spend.
- Normalization Logic:
  - weekly: amount * 52 / 12 for monthly, amount * 52 for annual.
  - bi-weekly: amount * 26 / 12 for monthly, amount * 26 for annual.
  - monthly: amount for monthly, amount * 12 for annual.
  - quarterly: amount / 3 for monthly, amount * 4 for annual.
  - annually/yearly: amount / 12 for monthly, amount for annual.
- Display: Render charts (e.g., using Chart.js or Recharts) for monthly spend trends, category breakdowns, and costliest subscriptions.
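
The normalization rules above can be sketched as a small aggregation helper. This is an illustrative sketch: `summarizeSpend` and `PERIODS_PER_YEAR` are hypothetical names, and it assumes 52 weekly and 26 bi-weekly billing periods per year so that the monthly figure is always the annual figure divided by 12.

```javascript
// Number of billing periods per year for each supported frequency
const PERIODS_PER_YEAR = {
    weekly: 52,
    'bi-weekly': 26,
    monthly: 12,
    quarterly: 4,
    annually: 1,
    yearly: 1,
};

// Hypothetical helper: returns total monthly and annual spend for a list of subscriptions.
function summarizeSpend(subscriptions) {
    let annual = 0;
    for (const sub of subscriptions) {
        const periods = PERIODS_PER_YEAR[String(sub.frequency).toLowerCase()];
        if (!periods) continue; // Skip unknown frequencies rather than guessing
        annual += sub.amount * periods;
    }
    // Round to cents to avoid floating-point noise in the UI
    const round = (value) => Math.round(value * 100) / 100;
    return { monthly: round(annual / 12), annual: round(annual) };
}
```

Computing the annual total first and deriving the monthly figure from it keeps the two numbers consistent with each other on the dashboard.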

D. Cancellation Reminders

User Configuration: Users can set a reminderDate (e.g., 2024-03-08) and reminderMethod (e.g., 'email', 'in-app') for each subscription in Firestore.

Cloud Function (sendCancellationReminders):

Trigger: Scheduled Cloud Function (e.g., pubsub.schedule('every 24 hours')).
Process:
1. Queries all active subscriptions where reminderDate is within a configurable window (e.g., today to today + 7 days) and reminderSent is false.
2. For each matching subscription, it triggers an email or an in-app notification.
3. Updates the reminderSent flag to true to prevent duplicate notifications.

Implementation (Firebase Cloud Function):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const db = admin.firestore();

// Consider using a Firebase Extension for sending emails (e.g., "Trigger Email")
// Or integrate with a service like SendGrid, Nodemailer.

exports.sendCancellationReminders = functions.pubsub.schedule('every 24 hours').onRun(async (context) => {
    const today = new Date();
    const sevenDaysFromNow = new Date();
    sevenDaysFromNow.setDate(today.getDate() + 7);

    const q = db.collectionGroup('subscriptions') // Query across all user subscriptions
        .where('status', '==', 'active')
        .where('reminderDate', '<=', sevenDaysFromNow.toISOString().split('T')[0])
        .where('reminderSent', '==', false);

    const snapshot = await q.get();
    const updates = db.batch();
    const emailsToSend = [];

    snapshot.forEach(doc => {
        const sub = doc.data();
        const userId = doc.ref.parent.parent.id; // Extract userId from path

        // Prepare notification data; the recipient address is looked up from the user document below
        emailsToSend.push({
            userId,
            subject: `Subscription Sleuth Reminder: ${sub.name} is due soon!`,
            html: `<p>Your subscription for <b>${sub.name}</b> (${sub.amount} ${sub.currency} ${sub.frequency}) is due to renew on ${sub.nextBilledDate}.</p>
                   <p>Consider reviewing or cancelling. <a href="[APP_LINK]/subscriptions/${doc.id}">Manage Subscription</a></p>`,
            // Use a template if using SendGrid or similar
        });

        updates.update(doc.ref, { reminderSent: true, updatedAt: admin.firestore.FieldValue.serverTimestamp() });
    });

    // In a real scenario, fetch each user's email for the userId in emailsToSend, then send.
    // Example using a hypothetical email service:
    // for (const emailData of emailsToSend) {
    //     const userDoc = await db.collection('users').doc(emailData.userId).get();
    //     if (userDoc.exists) {
    //         const userEmail = userDoc.data().email;
    //         await sendEmailService.send({ to: userEmail, subject: emailData.subject, html: emailData.html });
    //     }
    // }

    await updates.commit();
    console.log(`Sent reminders for ${emailsToSend.length} subscriptions.`);
    return null;
});

E. Spending Insights

Dashboard View: A central hub showing a quick summary.
Key Metrics:
- Total Monthly Spend
- Total Annual Spend
- Upcoming Bills (list of next 5-10 subscriptions)
- Top 3 Costliest Subscriptions
Visualizations:
- Monthly Spend Trend: Line chart showing total monthly subscription spending over the last 12-24 months. (Requires storing monthly_summary data in Firestore, e.g., users/{userId}/summaries/{YYYY-MM}).
- Category Breakdown: Pie or bar chart showing spend distribution across categories (requires users to categorize subscriptions).
Interactive List: A table or card view of all subscriptions, with filtering, sorting, and search capabilities.
Implementation: All insights are derived from the users/{userId}/subscriptions collection, potentially augmented by a users/{userId}/summaries collection updated by a daily/monthly scheduled Cloud Function. Frontend UI components fetch this data and render it using libraries like Recharts or Chart.js.
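
As a rough sketch of how the dashboard metrics could be derived on the client, the helper below computes the costliest subscriptions and the upcoming bills from the subscription documents. `buildInsights` and its options are hypothetical names, and it assumes dates are stored as YYYY-MM-DD strings as in the data model above.

```javascript
// Hypothetical helper: derives dashboard metrics from a list of subscription documents.
function buildInsights(subscriptions, { today = new Date().toISOString().slice(0, 10), upcomingLimit = 5 } = {}) {
    const perYear = { weekly: 52, 'bi-weekly': 26, monthly: 12, quarterly: 4, annually: 1, yearly: 1 };

    // Normalize every subscription to a monthly cost so different frequencies are comparable
    const withMonthlyCost = subscriptions
        .filter((sub) => perYear[sub.frequency])
        .map((sub) => ({ ...sub, monthlyCost: (sub.amount * perYear[sub.frequency]) / 12 }));

    // Top 3 costliest subscriptions by normalized monthly cost
    const topCostliest = [...withMonthlyCost]
        .sort((a, b) => b.monthlyCost - a.monthlyCost)
        .slice(0, 3);

    // Upcoming bills: YYYY-MM-DD strings sort chronologically as plain strings
    const upcomingBills = subscriptions
        .filter((sub) => sub.nextBilledDate && sub.nextBilledDate >= today)
        .sort((a, b) => a.nextBilledDate.localeCompare(b.nextBilledDate))
        .slice(0, upcomingLimit);

    return { topCostliest, upcomingBills };
}
```

The result can be passed directly to chart or list components; the input array is never mutated because `filter` and the spread copy both produce new arrays before sorting.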

5. Gemini Prompting Strategy

Core Principles for Gemini Prompting:

Clear Role and Goal: Define Gemini's persona and the exact task.
Strict Output Format: Insist on a JSON array matching a predefined schema. This is paramount for programmatic parsing.
Contextual Information: Specify the type of input text (e.g., "bank statement transaction list," "email receipt").
Few-Shot Examples: Provide 1-2 good examples of input text and their corresponding desired JSON output. This significantly guides the model to the correct output structure and content.
Handling Edge Cases: Instruct on how to handle scenarios like no subscriptions found, ambiguous dates, or partial information.
Iterative Refinement: Real-world financial statements are messy. Expect to iterate on the prompt with diverse examples to cover various formats and nuances.

Example Gemini Prompt Structure (for Bank Statement):

You are an expert financial assistant. Your task is to meticulously analyze raw financial transaction text from bank statements or credit card statements. Your primary objective is to identify and extract all recurring subscription services.

You MUST return the results as a strict JSON array of objects, where each object represents a detected subscription. If no recurring subscriptions are found, return an empty JSON array: `[]`.

**Expected JSON Output Schema:**
```json
[
  {
    "name": "string (the name of the service or merchant, e.g., 'Netflix', 'Spotify Premium', 'Adobe Creative Cloud')",
    "amount": "number (the recurring charge amount, e.g., 15.99)",
    "currency": "string (the three-letter ISO 4217 currency code, e.g., 'USD', 'EUR', 'GBP')",
    "frequency": "string (the inferred or explicit billing cycle, choose from: 'monthly', 'annually', 'weekly', 'quarterly', 'bi-weekly', 'yearly'. Prioritize monthly/annually if ambiguous.)",
    "lastBilledDate": "string (the date of the most recent recurring transaction for this subscription, in YYYY-MM-DD format. Infer if only month/year or approximate date is given.)"
  }
]
```

Instructions for Analysis:

Look for patterns of consistent merchant names and transaction amounts over time.
Ignore one-time purchases, refunds, or irregular charges.
Only include active or recently active subscriptions (within the last 3-6 months).
If a frequency is not explicitly stated, infer it based on the timing of transactions for the same merchant and amount. For example, if transactions appear roughly every 30 days, assume 'monthly'.
If multiple transactions for the same service with slightly varying amounts exist (e.g., due to tax changes), extract the most recent consistent amount.
Be cautious with descriptions that might look like subscriptions but are not (e.g., "PAYMENT FOR LOAN", "TRANSFER TO SAVINGS"). Focus strictly on external service providers.

Example Input 1 (with multiple subscriptions):

Date       Description                  Amount
01/05/2024 NETFLIX.COM LTD              -15.99 USD
01/10/2024 Spotify Premium              -10.99 USD
01/12/2024 AMAZON PRIME MEMBERSHIP      -14.99 USD
02/05/2024 NETFLIX.COM LTD              -15.99 USD
02/10/2024 Spotify Premium              -10.99 USD
03/05/2024 NETFLIX.COM LTD              -15.99 USD
03/10/2024 Spotify Premium              -10.99 USD
03/15/2024 COFFEE SHOP PURCHASE         -4.50 USD

Example Output 1:

[
  {
    "name": "Netflix",
    "amount": 15.99,
    "currency": "USD",
    "frequency": "monthly",
    "lastBilledDate": "2024-03-05"
  },
  {
    "name": "Spotify Premium",
    "amount": 10.99,
    "currency": "USD",
    "frequency": "monthly",
    "lastBilledDate": "2024-03-10"
  },
  {
    "name": "Amazon Prime Membership",
    "amount": 14.99,
    "currency": "USD",
    "frequency": "monthly",
    "lastBilledDate": "2024-01-12"
  }
]

Example Input 2 (no subscriptions):

Date       Description                  Amount
04/01/2024 Grocery Store                -85.20 USD
04/02/2024 Gas Station                  -50.00 USD
04/03/2024 Restaurant                   -35.75 USD

Example Output 2:

[]

Now, analyze the following raw financial transaction text: [INSERT_RAW_TEXT_HERE]
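
Because the prompt demands a strict JSON array, the calling code can check the reply before storing anything. The following is a minimal sketch of such a check, not the blueprint's actual implementation: `DetectedSubscription`, `parseSubscriptions`, and `isSubscription` are hypothetical names, and dropping invalid entries (rather than rejecting the whole reply) is an assumed policy.

```typescript
// Hypothetical type mirroring the output schema described in the prompt.
interface DetectedSubscription {
  name: string;
  amount: number;
  currency: string;
  frequency: string;
  lastBilledDate: string;
}

// 'yearly' is accepted here as a synonym for 'annually' (assumption).
const FREQUENCIES: string[] = [
  "weekly", "bi-weekly", "monthly", "quarterly", "annually", "yearly",
];

// Returns true only when every field has the type and format the prompt asks for.
function isSubscription(value: unknown): value is DetectedSubscription {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" && v.name.length > 0 &&
    typeof v.amount === "number" && isFinite(v.amount) &&
    typeof v.currency === "string" && /^[A-Z]{3}$/.test(v.currency) &&
    typeof v.frequency === "string" && FREQUENCIES.indexOf(v.frequency) !== -1 &&
    typeof v.lastBilledDate === "string" && /^\d{4}-\d{2}-\d{2}$/.test(v.lastBilledDate)
  );
}

// Parses the model's reply. Throws if it is not valid JSON or not an array;
// entries that do not match the schema are silently dropped.
function parseSubscriptions(reply: string): DetectedSubscription[] {
  const parsed: unknown = JSON.parse(reply.trim());
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of subscriptions");
  }
  return parsed.filter(isSubscription);
}
```

Logging the dropped entries is worth considering during prompt refinement, since they show where the model drifts from the schema.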


**Refinement Strategy:**

*   **Test with real data:** Collect anonymized snippets from various bank/credit card statements.
*   **Handle ambiguities:** What if "Amazon" appears for a subscription and a regular purchase? Add instructions for disambiguation (e.g., look for "Prime," "Membership," "Subscription").
*   **Currency detection:** Tell Gemini explicitly how to infer the currency code when the statement omits it (e.g., from a currency symbol or the account's home currency).
*   **Date formats:** Provide examples of common date formats and instruct Gemini to normalize to YYYY-MM-DD.
*   **Length constraints:** Be mindful of Gemini's context window. For very long statements, consider chunking the text and processing chunks separately, then consolidating.
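
The chunking idea in the last bullet could look like the sketch below. It is an illustration under stated assumptions: `chunkStatement` and `mergeSubscriptions` are hypothetical helpers, a character budget stands in for a real token count, and merged results are keyed on merchant name plus currency.

```typescript
interface DetectedSubscription {
  name: string;
  amount: number;
  currency: string;
  frequency: string;
  lastBilledDate: string; // YYYY-MM-DD, so plain string comparison orders dates
}

// Splits statement text at line boundaries into chunks of roughly maxChars
// characters. The last overlapLines lines of each chunk are repeated at the
// start of the next one. A single line longer than maxChars is kept whole.
function chunkStatement(text: string, maxChars: number, overlapLines: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let size = 0;
  for (const line of text.split("\n")) {
    if (current.length > 0 && size + line.length + 1 > maxChars) {
      chunks.push(current.join("\n"));
      current = overlapLines > 0 ? current.slice(-overlapLines) : [];
      size = current.reduce((n, l) => n + l.length + 1, 0);
    }
    current.push(line);
    size += line.length + 1;
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}

// Consolidates per-chunk results: one entry per merchant and currency,
// keeping the entry with the most recent lastBilledDate.
function mergeSubscriptions(perChunk: DetectedSubscription[][]): DetectedSubscription[] {
  const byKey: Record<string, DetectedSubscription> = {};
  for (const results of perChunk) {
    for (const sub of results) {
      const key = sub.name.trim().toLowerCase() + "|" + sub.currency;
      const existing = byKey[key];
      if (existing === undefined || sub.lastBilledDate > existing.lastBilledDate) {
        byKey[key] = sub;
      }
    }
  }
  return Object.keys(byKey).map((key) => byKey[key]);
}
```

One caveat with this design: recurrence is detected from repeated charges, so chunks that each cover less than two billing cycles may hide a monthly pattern. Splitting by merchant, or keeping chunks several months long, may work better than splitting purely by size.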

## 6. Deployment & Scaling

### Deployment Strategy (Firebase Ecosystem)

1.  **Firebase Project Setup:**
    *   Create a new Firebase project in the Firebase console (this also creates or links the underlying Google Cloud project).
    *   Enable Firebase Authentication, Firestore, Cloud Functions, and Hosting.
2.  **Next.js Application Deployment:**
    *   **Build:** `npm run build` (or `yarn build`) to generate the optimized production build in the `.next` directory.
    *   **Firebase Hosting Configuration (`firebase.json`):** Configure rewrites to direct API routes to Cloud Functions.
        ```json
        {
          "hosting": {
            "public": ".next/static", // Or "out" for static export, but we need SSR/API routes
            "ignore": [
              "firebase.json",
              "**/.*",
              "**/node_modules/**"
            ],
            "headers": [ // Recommended for security and performance
              {
                "source": "**",
                "headers": [
                  { "key": "Strict-Transport-Security", "value": "max-age=31536000; includeSubDomains; preload" },
                  { "key": "X-Content-Type-Options", "value": "nosniff" },
                  { "key": "X-Frame-Options", "value": "DENY" },
                  { "key": "X-XSS-Protection", "value": "1; mode=block" },
                  { "key": "Content-Security-Policy", "value": "default-src 'self'; script-src 'self' 'unsafe-eval' https://www.gstatic.com; connect-src 'self' https://firebase-url.com https://generativelanguage.googleapis.com; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self' data:;" }
                ]
              }
            ],
            "rewrites": [
              {
                "source": "/api/**",
                "function": "nextJsApi" // Points to a Cloud Function wrapping Next.js API routes
              },
              {
                "source": "**",
                "destination": "/index.html" // Fallback for client-side routing, or specific functions for SSR pages
              }
            ]
          },
          "functions": {
            "source": "functions", // Directory for Cloud Functions
            "runtime": "nodejs20"
          }
        }
        ```
    *   **Next.js as Cloud Function:** For full Next.js SSR and API route support, deploy the Next.js app itself as a Cloud Function, e.g., through the Firebase CLI's framework-aware Hosting support (enabled with `firebase experiments:enable webframeworks`). This allows Firebase Hosting to proxy requests to the Next.js serverless function.
        ```bash
        firebase deploy --only hosting,functions
        ```
3.  **Cloud Functions Deployment:**
    *   Place all Cloud Functions code in the `functions/` directory.
    *   Install dependencies: `cd functions && npm install`.
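    *   Configure Gemini API key: `firebase functions:config:set gemini.api_key="YOUR_GEMINI_KEY"` (keeps the key out of source code). For 2nd-gen functions, which do not support `functions.config()`, store it as a secret instead with `firebase functions:secrets:set GEMINI_API_KEY`.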
    *   Deploy: `firebase deploy --only functions`.
4.  **Firestore Security Rules:** Essential for protecting user data.
    ```
    rules_version = '2';
    service cloud.firestore {
      match /databases/{database}/documents {
        // User profiles
        match /users/{userId} {
          allow read, write: if request.auth != null && request.auth.uid == userId;

          // User's specific subscriptions
          match /subscriptions/{subscriptionId} {
            allow read, write: if request.auth != null && request.auth.uid == userId;
          }

          // User's specific summaries (e.g., monthly spending reports)
          match /summaries/{summaryId} {
            allow read: if request.auth != null && request.auth.uid == userId;
          }
        }
      }
    }
    ```
    Deploy rules: `firebase deploy --only firestore:rules`.

### Scaling Considerations

1.  **Firebase Authentication:** A managed service that scales to millions of users without capacity planning on the application side.
2.  **Firestore:**
    *   **Horizontal Scaling:** Automatically scales with data volume and reads/writes.
    *   **Indexing:** Ensure proper indexes are defined for all queries to maintain performance. Collection group queries (`collectionGroup`) need an index with collection-group scope rather than the default collection scope.
    *   **Data Model Optimization:** Avoid overly large documents; flatten data or use sub-collections for growing lists (e.g., individual transactions within a statement if stored).
    *   **Batch Operations:** Use `writeBatch` for multiple writes to reduce network calls and ensure atomicity.
3.  **Cloud Functions:**
    *   **Automatic Scaling:** Serverless nature means functions scale up/down based on demand.
    *   **Cold Starts:** For less frequently called functions, a cold start might introduce latency. For critical, latency-sensitive functions (like `detectSubscriptions`), consider setting minimum instances (though this incurs constant cost).
    *   **Memory/CPU Allocation:** Adjust memory and CPU based on function complexity. The Gemini call itself runs remotely, so the function mostly waits on the network, but holding and parsing large statement text can still raise memory needs.
    *   **Concurrency:** Tune function concurrency settings for optimal performance without overwhelming downstream services.
4.  **Gemini API:**
    *   **Quotas and Rate Limits:** Monitor usage in Google Cloud Console. Retry transient failures and rate-limit responses (such as HTTP 429) with exponential backoff and a cap on the number of attempts.
    *   **Payload Size:** Be mindful of the maximum input token limit for Gemini. For extremely large PDF texts, consider segmenting the text and processing in multiple calls, then aggregating results.
5.  **Next.js Frontend:**
    *   **CDN:** Firebase Hosting automatically uses a global CDN, ensuring low-latency asset delivery for users worldwide.
    *   **Bundle Size:** Keep client-side JavaScript bundles small for fast loading times.
6.  **Monitoring & Logging:**
    *   **Cloud Logging:** Crucial for monitoring Cloud Function invocations, errors, and performance.
    *   **Firebase Performance Monitoring:** Integrate for client-side performance data in the Next.js app. Crashlytics supports mobile apps only, so client-side error reporting on the web needs a separate tool.
    *   **Alerting:** Set up alerts in Google Cloud for critical errors, resource utilization thresholds, or unexpected Gemini API spending spikes.
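
The retry advice for the Gemini API above can be sketched as follows. This is an assumption-laden illustration: `withRetry`, `backoffDelayMs`, the default limits, and the choice of "full jitter" are not part of the blueprint, and `isRetryable` is left to the caller because the shape of the error depends on the client library used.

```typescript
// Delay ceiling before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Calls `operation` until it succeeds, a non-retryable error occurs,
// or `maxAttempts` calls have been made. The last error is rethrown.
async function withRetry<T>(
  operation: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 5,
  baseMs = 500,
  maxMs = 16000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isRetryable(error)) throw error;
      // Full jitter: wait a random time between 0 and the exponential ceiling,
      // so that many clients do not retry in lockstep.
      await sleep(Math.random() * backoffDelayMs(attempt, baseMs, maxMs));
    }
  }
}
```

The injectable `sleep` parameter exists so the retry loop can be unit-tested without real delays. In a Cloud Function, the total worst-case wait should stay below the function's timeout.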

### Security & Compliance

*   **Data Minimization & Encryption:** PDF.js processes files locally, sending only extracted text. All Firebase data is encrypted at rest and in transit.
*   **Authentication & Authorization:** Firebase Auth for robust user identity, Firestore Security Rules for data access control (ensuring users only access their own data).
*   **API Key Management:** Gemini API key is stored securely in Firebase Cloud Functions environment variables, never exposed client-side.
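*   **Input Validation:** Validate and sanitize all user inputs before processing (especially `rawText` before sending it to Gemini). This reduces, but cannot fully eliminate, the risk of prompt injection and related vulnerabilities, so the model's output should be validated as well.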
*   **GDPR/CCPA/PIPEDA:** Transparent privacy policy, data consent, and proper handling of user financial data are paramount. Users should be aware of what data is collected, how it's used (e.g., "AI processing for detection"), and for how long it's retained.
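
As an illustration of the input-validation point, the sketch below cleans `rawText` and inserts it into the prompt template at the `[INSERT_RAW_TEXT_HERE]` placeholder. The helper names, the length cap, and the `<statement>` delimiter are assumptions made for this example. Delimiters only make injected instructions less likely to be followed; they are not a guarantee.

```typescript
// Hypothetical cap; tune it to the model's context window.
const MAX_RAW_TEXT_CHARS = 200000;

// Normalizes line endings, drops control characters (keeping tab and newline),
// and truncates the text to maxChars characters.
function sanitizeRawText(rawText: string, maxChars: number = MAX_RAW_TEXT_CHARS): string {
  if (typeof rawText !== "string" || rawText.trim().length === 0) {
    throw new Error("rawText must be a non-empty string");
  }
  const withoutControlChars = rawText
    .replace(/\r\n?/g, "\n")
    .replace(/[\u0000-\u0008\u000B-\u001F\u007F]/g, "");
  return withoutControlChars.slice(0, maxChars);
}

// Wraps the sanitized statement in delimiters and substitutes it for the
// placeholder used in the prompt template.
function buildPrompt(template: string, rawText: string): string {
  // Remove any delimiter tags the input tries to smuggle in.
  const body = sanitizeRawText(rawText).replace(/<\/?statement>/gi, "");
  const wrapped = "<statement>\n" + body + "\n</statement>";
  // A replacer function avoids special handling of "$" sequences in the statement text.
  return template.replace("[INSERT_RAW_TEXT_HERE]", () => wrapped);
}
```

If this approach is used, the prompt should also state that everything between the delimiters is data to analyze, never instructions to follow.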

By adhering to this blueprint, Subscription Sleuth can be built as a robust, scalable, and secure application, effectively addressing the "subscription fatigue" problem with the power of Google's AI and cloud infrastructure.
