# AIMeet - Requirements Document

## 1. Functional Requirements

### 1.1 User Management

- FR-1.1: Users must be able to register with username, email, and password
- FR-1.2: Users must be able to log in with credentials
- FR-1.3: Users must be able to log out
- FR-1.4: Users must be able to reset their password via email
- FR-1.5: User profiles must store name, email, and profile picture

### 1.2 Meeting Management

- FR-2.1: Users can create a meeting with a title, description, and maximum number of participants
- FR-2.2: System generates a unique, shareable room code for each meeting
- FR-2.3: Users can join meetings using the room code
- FR-2.4: The meeting host can end the meeting
- FR-2.5: Meeting state is tracked as one of: active, ended, archived
- FR-2.6: Users can view a list of their meetings (hosted and joined)
- FR-2.7: Users can delete or archive completed meetings

### 1.3 Real-Time Video & Audio

- FR-3.1: Video streaming using Agora RTC SDK (VP8 codec)
- FR-3.2: Audio streaming using Agora RTC SDK
- FR-3.3: Dynamic bitrate adjustment based on network conditions
- FR-3.4: Participants can mute/unmute their audio and video
- FR-3.5: Host can remove (kick) participants
- FR-3.6: Screen sharing capability (optional, future)

### 1.4 Recording

- FR-4.1: Meeting audio is recorded automatically in the browser using the MediaRecorder API
- FR-4.2: Recording is saved locally in WebM format
- FR-4.3: Users can upload the recording after the meeting
- FR-4.4: Recordings are uploaded to AWS S3
- FR-4.5: System stores recording metadata (size, duration, upload time)
- FR-4.6: Presigned URLs are generated for private S3 access

### 1.5 Transcription

- FR-5.1: Uploaded recordings are sent to AssemblyAI for transcription
- FR-5.2: System polls AssemblyAI for transcription status
- FR-5.3: Completed transcripts are saved to the database
- FR-5.4: Transcript status is tracked as one of: not_started, processing, completed, failed
- FR-5.5: Each transcript is linked to its meeting record

### 1.6 Knowledge Processing (RAG)

- FR-6.1: Users can trigger "Prepare for Search" to process a transcript
- FR-6.2: System chunks the transcript using recursive character splitting (500 tokens, 50 overlap)
- FR-6.3: Chunks are stored in the TranscriptChunk model
- FR-6.4: Chunks are embedded using OpenAI text-embedding-3-small
- FR-6.5: Embeddings are stored in the Qdrant vector database
- FR-6.6: Processing is idempotent: timestamps are checked so an already-processed transcript is not reprocessed

### 1.7 Question Answering (RAG Query)

- FR-7.1: Users can ask questions about meeting content
- FR-7.2: The question is embedded using the same OpenAI model
- FR-7.3: System searches Qdrant for the top-5 most similar chunks
- FR-7.4: Conversation history is retrieved for context
- FR-7.5: GPT-4o is called with the retrieved context, conversation history, and question
- FR-7.6: The generated response is displayed to the user
- FR-7.7: Each Q&A turn is saved to ConversationHistory

### 1.8 Meeting Preparation (Sticky Notes)

- FR-8.1: When a new meeting is created, the system suggests related past meetings
- FR-8.2: Suggestions are based on keywords in the meeting title/agenda
- FR-8.3: Suggestions show what was previously discussed on the same topics
- FR-8.4: Users can expand sticky notes to see the full context
- FR-8.5: Sticky notes help prevent duplicate discussions

### 1.9 Document Management

- FR-9.1: Users can upload documents (PDF, DOCX, TXT)
- FR-9.2: Documents are stored in S3
- FR-9.3: Document text is extracted and stored
- FR-9.4: Documents are chunked the same way as transcripts
- FR-9.5: Document chunks are embedded and stored in Qdrant
- FR-9.6: Users can view the list of documents for each meeting
- FR-9.7: Users can delete documents

### 1.10 Unified Search

- FR-10.1: Questions search both transcripts and documents
- FR-10.2: Results include the source type (meeting transcript vs. document)
- FR-10.3: Search results show relevance scores
- FR-10.4: Source metadata (timestamps, document names) is included

### 1.11 Chat

- FR-11.1: Real-time chat during meetings using WebSocket
- FR-11.2: Chat messages are saved to the database
- FR-11.3: Users can view chat history
- FR-11.4: Message timestamps are tracked
- FR-11.5: Messages are linked to the user and the meeting

### 1.12 Reporting & Analytics (Future)

- FR-12.1: Meeting duration and participant count
- FR-12.2: Transcript statistics (word count, duration)
- FR-12.3: Q&A usage statistics
- FR-12.4: Most discussed topics across meetings

---

## 2. Non-Functional Requirements

### 2.1 Performance

- NFR-1.1: Q&A response time: <4 seconds (including LLM latency)
- NFR-1.2: Vector search latency: <500ms
- NFR-1.3: API response time: <1 second for non-AI endpoints
- NFR-1.4: Page load time: <3 seconds
- NFR-1.5: Concurrent users: 100+ with auto-scaling
- NFR-1.6: Transcript processing: <1 minute for a typical meeting

### 2.2 Scalability

- NFR-2.1: Horizontal scaling via EC2 Auto Scaling Groups
- NFR-2.2: Database: RDS with read replicas
- NFR-2.3: S3 storage scales without capacity planning
- NFR-2.4: Qdrant Cloud manages vector scaling
- NFR-2.5: Support growth from 10 to 10,000 users

### 2.3 Reliability

- NFR-3.1: 99.5% uptime SLA
- NFR-3.2: Automated daily database backups
- NFR-3.3: Multi-AZ RDS for failover
- NFR-3.4: CloudFront CDN for static assets
- NFR-3.5: Graceful error handling and user feedback

### 2.4 Security

- NFR-4.1: HTTPS for all communications
- NFR-4.2: Password hashing with bcrypt
- NFR-4.3: JWT tokens for API authentication
- NFR-4.4: SQL injection protection via the Django ORM
- NFR-4.5: XSS protection via template auto-escaping
- NFR-4.6: CSRF protection on forms
- NFR-4.7: S3 encryption at rest (AES-256)
- NFR-4.8: RDS encryption at rest (AWS KMS)
- NFR-4.9: API keys stored in Secrets Manager (no hardcoding)
- NFR-4.10: Private S3 access via presigned URLs
- NFR-4.11: RDS in a private subnet (no public IP)
- NFR-4.12: Rate limiting: 100 requests/minute per user

### 2.5 Usability

- NFR-5.1: Responsive design for mobile (375px+) and desktop
- NFR-5.2: Accessibility: WCAG 2.1 Level AA compliance
- NFR-5.3: Intuitive UI with clear navigation
- NFR-5.4: Error messages explain what went wrong
- NFR-5.5: Dark and light mode support (future)

### 2.6 Maintainability

- NFR-6.1: Code documented with docstrings
- NFR-6.2: DRY principle: minimize code duplication
- NFR-6.3: Clear separation of concerns
- NFR-6.4: Comprehensive logging with timestamps
- NFR-6.5: Automated testing (unit + integration)

### 2.7 Compatibility

- NFR-7.1: Browser support: Chrome, Firefox, Safari, Edge (latest 2 versions)
- NFR-7.2: Mobile support: iOS Safari, Android Chrome
- NFR-7.3: Python 3.13+ support
- NFR-7.4: PostgreSQL 12+ support

---

## 3. System Requirements

### 3.1 Software Requirements

- **Backend**: Django 4.x, Python 3.13+
- **Database**: PostgreSQL 12+ (or SQLite for development)
- **Web Server**: Gunicorn + Nginx
- **Vector DB**: Qdrant 1.x
- **Task Queue** (future): Celery + Redis

### 3.2 Hardware Requirements (Production)

- **Compute**: EC2 t3.medium (2 vCPU, 4GB RAM) minimum
  - Development: t3.small sufficient
  - Production: t3.large+ with auto-scaling (2-10 instances)
- **Database**: RDS db.t4g.medium (2 vCPU, 4GB RAM)
  - Storage: 100GB gp3 (auto-scaling)
- **Bandwidth**: 10 Mbps minimum (up to 1 Gbps for scaling)

### 3.3 Browser Requirements

- Minimum: Chrome 90+, Firefox 88+, Safari 14+, Edge 90+
- WebRTC support required for video
- LocalStorage and SessionStorage support
- WebSocket support

---

## 4. Dependencies

### 4.1 Backend Dependencies

```
Django==4.2
djangorestframework==3.14.0
psycopg2-binary==2.9.0
python-dotenv==1.0.0

# AI & ML
openai==2.16.0
qdrant-client==1.16.2
requests==2.31.0

# Transcription
# AssemblyAI: REST API only, no package required

# Cloud
boto3==1.26.137

# Real-time
pusher==3.3.1

# Video
agora-rtm               # Agora SDK
agora-token-builder     # Agora token generation

# Utilities
python-dateutil==2.8.2
pytz==2023.3
Pillow==10.0.0
```

### 4.2 Frontend Dependencies

```
Agora RTC SDK v4.24.2 (JavaScript)
Bootstrap 5.3
jQuery 3.6 (optional, for DOM manipulation)
```

### 4.3 External Services

- **OpenAI API**: Embeddings (text-embedding-3-small) + LLM (GPT-4o)
- **AssemblyAI API**: Speech-to-text transcription
- **Qdrant Cloud**: Vector database hosting
- **AWS Services**: EC2, RDS, S3, CloudWatch, Secrets Manager, ALB
- **Agora**: Video/audio RTC
- **Pusher**: WebSocket for chat
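Each external service in 4.3 needs credentials at runtime, and NFR-4.9 rules out hardcoding them. The following is a minimal sketch of a fail-fast startup check, assuming the secrets reach the process as environment variables (for example, injected from Secrets Manager at deploy time, or loaded from a `.env` file with python-dotenv during development). All variable names are hypothetical except `OPENAI_API_KEY`, which the OpenAI Python SDK reads by default; AWS is omitted because boto3 resolves credentials through its own provider chain.

```python
import os
from collections.abc import Mapping

# Hypothetical environment variable names for the services in section 4.3.
# AWS is omitted on purpose: boto3 finds credentials via its own provider chain.
REQUIRED_SETTINGS: dict[str, tuple[str, ...]] = {
    "OpenAI": ("OPENAI_API_KEY",),
    "AssemblyAI": ("ASSEMBLYAI_API_KEY",),
    "Qdrant Cloud": ("QDRANT_URL", "QDRANT_API_KEY"),
    "Agora": ("AGORA_APP_ID", "AGORA_APP_CERTIFICATE"),
    "Pusher": ("PUSHER_APP_ID", "PUSHER_KEY", "PUSHER_SECRET", "PUSHER_CLUSTER"),
}


def missing_settings(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Return {service: [missing variable names]}; blank values count as missing."""
    missing: dict[str, list[str]] = {}
    for service, names in REQUIRED_SETTINGS.items():
        absent = [name for name in names if not env.get(name, "").strip()]
        if absent:
            missing[service] = absent
    return missing


def check_settings(env: Mapping[str, str] = os.environ) -> None:
    """Raise at startup instead of failing on the first external API call."""
    missing = missing_settings(env)
    if missing:
        details = "; ".join(f"{svc}: {', '.join(names)}" for svc, names in missing.items())
        raise RuntimeError(f"Missing configuration: {details}")
```

Calling `check_settings()` once during application startup surfaces a misconfigured deployment before any meeting depends on a service.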
---

## 5. API Requirements

### 5.1 REST API Specifications

- **Base URL**: `/api/` or `/` (depending on endpoint)
- **Content-Type**: `application/json`
- **Authentication**: Django session + optional JWT for API clients
- **Response Format**: JSON with `status`, `data`, and `error` fields
- **Pagination**: Limit + offset for list endpoints
- **Versioning**: Not required initially (v1 implicit)

### 5.2 WebSocket Requirements

- **Protocol**: WebSocket (Pusher-managed)
- **Channels**: Per-meeting chat channels
- **Message Format**: JSON
- **Auto-reconnect**: Client-side retry logic

### 5.3 Rate Limiting

- 100 requests/minute per user
- 1000 requests/minute per IP
- Q&A queries: 10 per minute per user

---

## 6. Infrastructure Requirements

### 6.1 AWS Services Required

- **Compute**: EC2 (application server)
- **Database**: RDS PostgreSQL (relational data)
- **Storage**: S3 (recordings, documents)
- **CDN**: CloudFront (static assets, S3 downloads)
- **Load Balancer**: Application Load Balancer (ALB)
- **Monitoring**: CloudWatch (logs, metrics, alarms)
- **Secrets**: Secrets Manager (API keys, credentials)
- **Networking**: VPC, Security Groups, NAT Gateway

### 6.2 Third-Party Services Required

- **Qdrant Cloud**: Vector database (managed)
- **OpenAI**: API access (embeddings + GPT-4o)
- **AssemblyAI**: Transcription API
- **Agora**: RTC infrastructure
- **Pusher**: WebSocket infrastructure

### 6.3 Monitoring & Logging

- CloudWatch Logs: All application logs
- CloudWatch Metrics: CPU, memory (EC2 memory metrics require the CloudWatch agent), request latency
- CloudWatch Alarms: Errors, latency spikes, service degradation
- Application Insights: APM for performance tracking (optional)
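NFR-6.4 asks for timestamped logs, and 6.3 routes all application logs to CloudWatch Logs. One possible approach, sketched here with only the standard library, is a `logging.Formatter` that writes one JSON object per line; CloudWatch Logs Insights discovers fields in JSON log events, so values such as a meeting ID become queryable. The field names and the `meeting_id`/`user_id`/`latency_ms` context keys are assumptions, and shipping the process output to CloudWatch (for example with the CloudWatch agent) is outside this sketch.

```python
import json
import logging
from datetime import datetime, timezone

# Optional context fields callers may attach via `extra=` (assumed names).
CONTEXT_FIELDS = ("meeting_id", "user_id", "latency_ms")


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # ISO-8601 UTC timestamp taken from the record's creation time
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in CONTEXT_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON values (e.g. UUIDs) from raising
        return json.dumps(payload, default=str)


def build_logger(name: str = "aimeet") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    handler = logging.StreamHandler()  # defaults to sys.stderr
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

A call such as `build_logger().info("Q&A answered in %d ms", 812, extra={"meeting_id": 123})` then emits one JSON line carrying the timestamp, level, message, and meeting ID.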
---

## 7. Data Requirements

### 7.1 Database Schema

- **Users**: id, username, email, password_hash, created_at
- **MeetingRoom**: id, room_code, host_id, title, description, status, recording data, transcript data, embedding metadata
- **TranscriptChunk**: id, meeting_id, chunk_text, chunk_index, embedding_vector_id
- **DocumentUpload**: id, meeting_id, file_name, file_type, s3_url, raw_text
- **DocumentChunk**: id, document_id, chunk_text, chunk_index, embedding_vector_id
- **ConversationHistory**: id, meeting_id, user_id, user_question, assistant_response, relevant_chunks
- **ChatMessage**: id, meeting_id, user_id, content, created_at

### 7.2 Vector Database Schema

- **Collection**: meeting_transcripts
  - Dimension: 1536 (OpenAI text-embedding-3-small)
  - Distance: Cosine similarity
  - Payload: meeting_id, chunk_index, text, timestamps

### 7.3 Storage (S3) Structure

```
s3://aimeet-s3-bucket/
├── recordings/
│   ├── meeting_123_audio.webm
│   └── meeting_124_audio.webm
├── documents/
│   ├── document_456.pdf
│   └── document_457.txt
└── transcripts/
    ├── transcript_123.txt
    └── transcript_124.txt
```

### 7.4 Data Retention Policy

- Recordings: Keep indefinitely (archive to Glacier after 90 days)
- Transcripts: Keep indefinitely
- Chat messages: Keep indefinitely
- Documents: Keep indefinitely
- Database backups: 35-day retention
- Logs: 30-day retention

---

## 8. Integration Requirements

### 8.1 External API Integrations

- **OpenAI API**: Embeddings (batch and single)
- **AssemblyAI API**: Transcription (async polling)
- **Qdrant API**: Vector search and storage
- **AWS SDK (Boto3)**: S3 operations
- **Agora SDK**: Token generation and RTC
- **Pusher API**: WebSocket messaging

### 8.2 Authentication Integrations

- Django authentication (built-in)
- Optional: OAuth2 (Google, GitHub) - future
- Optional: SAML - future
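For the asynchronous AssemblyAI integration (8.1, FR-5.2), the polling loop can back off exponentially instead of hitting the API at a fixed rate. In this minimal sketch, `fetch_status` is a hypothetical callable that performs one status request and returns the provider's raw status string. The raw values `queued`, `processing`, `completed`, and `error`, and their mapping onto the FR-5.4 states, are assumptions to confirm against AssemblyAI's API reference. `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

# Assumed mapping from AssemblyAI's raw transcript statuses to the FR-5.4 states.
STATUS_MAP = {
    "queued": "processing",
    "processing": "processing",
    "completed": "completed",
    "error": "failed",
}


def poll_transcript(
    fetch_status: Callable[[], str],
    *,
    max_attempts: int = 60,
    base_delay: float = 3.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the transcript reaches a terminal state; return an FR-5.4 status.

    Returns "processing" if the job is still running after max_attempts polls,
    so the caller can reschedule instead of blocking.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        # Unknown statuses are treated as failures rather than polled forever.
        status = STATUS_MAP.get(fetch_status(), "failed")
        if status != "processing":
            return status
        if attempt + 1 < max_attempts:  # no pointless wait after the final poll
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return "processing"
```

With the defaults, the 59 waits add up to 1695 seconds (3 + 6 + 12 + 24, then 55 waits of 30 seconds), roughly 28 minutes. Returning `processing` rather than raising leaves the retry decision to the caller, which fits the Celery background jobs listed under 11.2.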
---

## 9. Testing Requirements

### 9.1 Unit Testing

- Models: Test data validation and relationships
- Views: Test API endpoints with mocks
- Utilities: Test embedding, chunking, RAG functions
- Target: >80% code coverage

### 9.2 Integration Testing

- End-to-end meeting flow
- Recording upload and transcription
- RAG pipeline (chunk → embed → search → query)
- Document upload and search

### 9.3 Performance Testing

- Load test: 100 concurrent users
- Transcription processing time
- Q&A response latency
- Vector search speed

### 9.4 Security Testing

- OWASP Top 10 vulnerability scanning
- SQL injection attempts
- XSS payloads
- CSRF validation

---

## 10. Documentation Requirements

### 10.1 Code Documentation

- Docstrings for all functions/methods
- Inline comments for complex logic
- README.md for setup and usage
- API documentation (Swagger/OpenAPI)

### 10.2 User Documentation

- Quick start guide
- Feature tutorials
- FAQ
- Troubleshooting guide

### 10.3 System Documentation

- ARCHITECTURE.md (system design)
- DESIGN.md (diagrams and flows)
- REQUIREMENTS.md (this document)
- Deployment guide

---

## 11. Future Enhancements

### 11.1 Planned Features

- Speaker diarization (identify who said what)
- Automatic action item detection
- Topic summaries and key moments
- Calendar integration
- Role-based access control
- Multi-language support
- Slack/Teams integration
- Custom embedding models

### 11.2 Optimization Opportunities

- Redis caching layer (conversation history, user sessions)
- Celery background jobs (transcription polling, document processing)
- WebRTC data channels (peer-to-peer communication)
- Progressive Web App (PWA) capabilities
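The chunking and RAG tests in 9.1 and 9.2 can run offline if OpenAI and Qdrant are replaced by deterministic test doubles. The sketch below illustrates that approach under stated assumptions; it is not the production pipeline. `chunk_text` is a simplified fixed-window character splitter standing in for the recursive splitter in FR-6.2, `embed` is a bag-of-words counter standing in for text-embedding-3-small, and `search` ranks chunks by cosine similarity (the distance configured in 7.2) and keeps the top k (5 by default, as in FR-7.3). All three names are hypothetical; the `test_` functions follow pytest conventions.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters.

    Simplified stand-in for the FR-6.2 splitter: units are characters, not tokens.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Start a new window only while the previous one has not reached the end.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Test double for the embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(question: str, chunks: list[str], k: int = 5) -> list[tuple[float, int]]:
    """Return (score, chunk_index) pairs for the k chunks most similar to the question."""
    q = embed(question)
    scored = [(cosine(q, embed(chunk)), i) for i, chunk in enumerate(chunks)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def test_chunks_cover_text_with_overlap():
    text = "".join(chr(97 + i % 26) for i in range(1000))
    chunks = chunk_text(text)
    assert [len(c) for c in chunks] == [500, 500, 100]
    assert chunks[0][-50:] == chunks[1][:50]  # consecutive chunks share 50 characters
    assert chunks[-1].endswith(text[-50:])


def test_search_ranks_relevant_chunk_first():
    chunks = [
        "We agreed to ship the billing migration in March.",
        "The design review covered onboarding screens.",
        "Budget for the Qdrant cluster was approved.",
    ]
    top = search("When does the billing migration ship?", chunks, k=1)
    assert top[0][1] == 0
```

Because the doubles are deterministic, these tests can run in CI without API keys; the same assertions can later be pointed at the real embedding and Qdrant calls in an integration suite.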
---

## 12. Success Criteria

### 12.1 Functional Success

- All FR requirements fully implemented
- All tests passing
- No critical bugs in production

### 12.2 Performance Success

- Page load time <3 seconds (95th percentile)
- Q&A response time <4 seconds (95th percentile)
- 99.5% uptime maintained
- Vector search latency <1 second

### 12.3 User Success

- User registration completion rate >90%
- Meeting creation to Q&A within 5 minutes
- More than 80% of users try the Q&A feature within their first week

### 12.4 Business Success

- Support 1000+ concurrent users
- Cost <$1000/month at 1000-user scale
- Documents uploaded for >50% of meetings
- Sticky notes used in >40% of meetings

---

## 13. Constraints & Assumptions

### 13.1 Constraints

- OpenAI API rate limits (depend on plan)
- AssemblyAI transcription queue times
- AWS service quotas
- Budget limitations for cloud services

### 13.2 Assumptions

- Users have a stable internet connection (>2 Mbps)
- Meetings typically last 30 minutes to 2 hours
- Transcripts are typically 5K-20K tokens
- Users have modern browsers (2020+)
- Organizations want to keep data private (not shared)

---

## 14. Compliance & Standards

### 14.1 Security Standards

- TLS 1.3 for encryption in transit
- OWASP Top 10 compliance
- GDPR compliance (user data protection)
- HIPAA compliance (if health data involved) - future

### 14.2 Coding Standards

- PEP 8 for Python code style
- Django best practices
- RESTful API design
- Semantic versioning for releases

### 14.3 Accessibility Standards

- WCAG 2.1 Level AA compliance
- Keyboard navigation support
- Screen reader compatibility
- Color contrast ratio of at least 4.5:1 for normal text (3:1 for large text)
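The contrast requirement in 14.3 (and NFR-5.2) can be checked automatically, for example against the UI color palette in a unit test. This sketch implements the WCAG 2.1 relative-luminance and contrast-ratio formulas for `#RRGGBB` colors; the function names are illustrative, and only six-digit hex input is handled.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light, per the WCAG 2.1 definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color (0.0 = black, 1.0 = white)."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1:1 to 21:1."""
    lighter, darker = sorted((relative_luminance(color_a), relative_luminance(color_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """WCAG 2.1 SC 1.4.3 (Level AA): at least 4.5:1, or 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

For example, `#767676` text on a white background gives about 4.54:1 and passes, while `#777777` gives about 4.48:1 and only passes for large text.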