Aimeetapplication / REQUIREMENTS.md
prashantdubeypng
Deploy Aimeet to HuggingFace Spaces
4db0a21

AIMeet - Requirements Document

1. Functional Requirements

1.1 User Management

  • FR-1.1: Users must be able to register with username, email, and password
  • FR-1.2: Users must be able to log in with credentials
  • FR-1.3: Users must be able to log out
  • FR-1.4: Users must be able to reset password via email
  • FR-1.5: User profiles must store name, email, profile picture

1.2 Meeting Management

  • FR-2.1: Users can create a meeting with title, description, and max participants
  • FR-2.2: System generates unique shareable room code for each meeting
  • FR-2.3: Users can join meetings using room code
  • FR-2.4: Meeting host can end the meeting
  • FR-2.5: Meeting state tracks: active, ended, archived
  • FR-2.6: Users can view list of their meetings (hosted and joined)
  • FR-2.7: Users can delete or archive completed meetings

1.3 Real-Time Video & Audio

  • FR-3.1: Video streaming using Agora RTC SDK
  • FR-3.2: Audio streaming with VP8 codec
  • FR-3.3: Dynamic bitrate adjustment based on network
  • FR-3.4: Participants can mute/unmute audio and video
  • FR-3.5: Host can kick participants
  • FR-3.6: Screen sharing capability (optional, future)

1.4 Recording

  • FR-4.1: Audio is automatically recorded during meeting using MediaRecorder
  • FR-4.2: Recording saved as WebM format locally
  • FR-4.3: Users can upload recording after meeting
  • FR-4.4: Recording uploaded to AWS S3
  • FR-4.5: System stores recording metadata (size, duration, upload time)
  • FR-4.6: Presigned URLs generated for private S3 access

1.5 Transcription

  • FR-5.1: Uploaded recordings sent to AssemblyAI for transcription
  • FR-5.2: System polls AssemblyAI for transcription status
  • FR-5.3: Completed transcripts saved to database
  • FR-5.4: Transcript status tracked: not_started, processing, completed, failed
  • FR-5.5: Transcript linked to meeting record

1.6 Knowledge Processing (RAG)

  • FR-6.1: Users can trigger "Prepare for Search" to process transcript
  • FR-6.2: System chunks transcript using recursive character splitting (500 tokens, 50 overlap)
  • FR-6.3: Chunks stored in TranscriptChunk model
  • FR-6.4: Chunks embedded using OpenAI text-embedding-3-small
  • FR-6.5: Embeddings stored in Qdrant vector database
  • FR-6.6: Idempotent processing: check timestamps to avoid reprocessing

1.7 Question Answering (RAG Query)

  • FR-7.1: Users can ask questions about meeting content
  • FR-7.2: Question embedded using same OpenAI model
  • FR-7.3: System searches Qdrant for top-5 similar chunks
  • FR-7.4: Conversation history retrieved for context
  • FR-7.5: GPT-4o called with context + history + question
  • FR-7.6: Response generated and displayed to user
  • FR-7.7: Q&A turn saved to ConversationHistory

1.8 Meeting Preparation (Sticky Notes)

  • FR-8.1: When creating new meeting, system suggests related past meetings
  • FR-8.2: Suggestions based on meeting title/agenda keywords
  • FR-8.3: Shows what was discussed about same topics before
  • FR-8.4: Users can expand sticky notes to see full context
  • FR-8.5: Helps prevent duplicate discussions

1.9 Document Management

  • FR-9.1: Users can upload documents (PDF, DOCX, TXT)
  • FR-9.2: Documents stored in S3
  • FR-9.3: Document text extracted and stored
  • FR-9.4: Documents chunked same way as transcripts
  • FR-9.5: Document chunks embedded and stored in Qdrant
  • FR-9.6: Users can view list of documents per meeting
  • FR-9.7: Users can delete documents

1.10 Unified Search

  • FR-10.1: Questions search both transcripts and documents
  • FR-10.2: Results include source type (meeting transcript vs document)
  • FR-10.3: Search results show relevance scores
  • FR-10.4: Source metadata (timestamps, document names) included

1.11 Chat

  • FR-11.1: Real-time chat during meetings using WebSocket
  • FR-11.2: Chat messages saved to database
  • FR-11.3: Users can view chat history
  • FR-11.4: Message timestamps tracked
  • FR-11.5: Messages linked to user and meeting

1.12 Reporting & Analytics (Future)

  • FR-12.1: Meeting duration and participant count
  • FR-12.2: Transcript statistics (word count, duration)
  • FR-12.3: Q&A usage statistics
  • FR-12.4: Most discussed topics across meetings

2. Non-Functional Requirements

2.1 Performance

  • NFR-1.1: Q&A response time: <4 seconds (including LLM latency)
  • NFR-1.2: Vector search latency: <500ms
  • NFR-1.3: API response time: <1 second for non-AI endpoints
  • NFR-1.4: Page load time: <3 seconds
  • NFR-1.5: Concurrent users: 100+ with auto-scaling
  • NFR-1.6: Transcript processing: <1 minute for typical meeting

2.2 Scalability

  • NFR-2.1: Horizontal scaling via EC2 Auto Scaling Groups
  • NFR-2.2: Database: RDS with read replicas
  • NFR-2.3: S3 handles unlimited storage
  • NFR-2.4: Qdrant Cloud manages vector scaling
  • NFR-2.5: Support growth from 10 to 10,000 users

2.3 Reliability

  • NFR-3.1: 99.5% uptime SLA
  • NFR-3.2: Automated daily database backups
  • NFR-3.3: Multi-AZ RDS for failover
  • NFR-3.4: CloudFront CDN for static assets
  • NFR-3.5: Graceful error handling and user feedback

2.4 Security

  • NFR-4.1: HTTPS for all communications
  • NFR-4.2: Password hashing with bcrypt
  • NFR-4.3: JWT tokens for API authentication
  • NFR-4.4: SQL injection protection via ORM
  • NFR-4.5: XSS protection via template escaping
  • NFR-4.6: CSRF protection on forms
  • NFR-4.7: S3 encryption at rest (AES-256)
  • NFR-4.8: Database encryption (KMS)
  • NFR-4.9: API keys in Secrets Manager (no hardcoding)
  • NFR-4.10: Private S3 access via presigned URLs
  • NFR-4.11: Private subnet for RDS (no public IP)
  • NFR-4.12: Rate limiting: 100 requests/minute per user

2.5 Usability

  • NFR-5.1: Responsive design for mobile (375px+) and desktop
  • NFR-5.2: Accessibility: WCAG 2.1 Level AA compliance
  • NFR-5.3: Intuitive UI with clear navigation
  • NFR-5.4: Error messages explain what went wrong
  • NFR-5.5: Dark and light mode support (future)

2.6 Maintainability

  • NFR-6.1: Code documented with docstrings
  • NFR-6.2: DRY principle: no code duplication
  • NFR-6.3: Clear separation of concerns
  • NFR-6.4: Comprehensive logging with timestamps
  • NFR-6.5: Automated testing (unit + integration)

2.7 Compatibility

  • NFR-7.1: Browser support: Chrome, Firefox, Safari, Edge (latest 2 versions)
  • NFR-7.2: Mobile support: iOS Safari, Android Chrome
  • NFR-7.3: Python 3.13+ support
  • NFR-7.4: PostgreSQL 12+ support

3. System Requirements

3.1 Software Requirements

  • Backend: Django 4.x, Python 3.13+
  • Database: PostgreSQL 12+ (or SQLite for dev)
  • Web Server: Gunicorn + Nginx
  • Vector DB: Qdrant 1.x
  • Message Queue (future): Celery + Redis

3.2 Hardware Requirements (Production)

  • Compute: EC2 t3.medium (2 vCPU, 4GB RAM) minimum
    • Development: t3.small sufficient
    • Production: t3.large+ with auto-scaling 2-10 instances
  • Database: RDS t4g.medium (2 vCPU, 1GB RAM)
    • Storage: 100GB gp3 (auto-scaling)
  • Bandwidth: 10 Mbps minimum (up to 1 Gbps for scaling)

3.3 Browser Requirements

  • Minimum: Chrome 90+, Firefox 88+, Safari 14+, Edge 90+
  • WebRTC support required for video
  • LocalStorage and SessionStorage support
  • WebSocket support

4. Dependencies

4.1 Backend Dependencies

Django==4.2
djangorestframework==3.14.0
psycopg2-binary==2.9.0
python-dotenv==1.0.0

# AI & ML
openai==2.16.0
qdrant-client==1.16.2
requests==2.31.0

# Transcription
AssemblyAI (API, no package)

# Cloud
boto3==1.26.137

# Real-time
pusher==3.3.1

# Video
agora-rtm (Agora SDK)
agora-token-builder (Token generation)

# Utilities
python-dateutil==2.8.2
pytz==2023.3
Pillow==10.0.0

4.2 Frontend Dependencies

Agora RTC SDK v4.24.2 (JavaScript)
Bootstrap 5.3
jQuery 3.6 (optional, for DOM manipulation)

4.3 External Services

  • OpenAI API: Embeddings (text-embedding-3-small) + LLM (GPT-4o)
  • AssemblyAI API: Speech-to-text transcription
  • Qdrant Cloud: Vector database hosting
  • AWS Services: EC2, RDS, S3, CloudWatch, Secrets Manager, ALB
  • Agora: Video/audio RTC
  • Pusher: WebSocket for chat

5. API Requirements

5.1 REST API Specifications

  • Base URL: /api/ or / (depending on endpoint)
  • Content-Type: application/json
  • Authentication: Django session + optional JWT for API clients
  • Response Format: JSON with status, data, and error fields
  • Pagination: Limit + offset for list endpoints
  • Versioning: Not required initially (v1 implicit)

5.2 WebSocket Requirements

  • Protocol: WebSocket (Pusher-managed)
  • Channels: Per-meeting chat channels
  • Message Format: JSON
  • Auto-reconnect: Client-side retry logic

5.3 Rate Limiting

  • 100 requests/minute per user
  • 1000 requests/minute per IP
  • Q&A queries: 10 per minute per user

6. Infrastructure Requirements

6.1 AWS Services Required

  • Compute: EC2 (application server)
  • Database: RDS PostgreSQL (relational data)
  • Storage: S3 (recordings, documents)
  • CDN: CloudFront (static assets, S3 downloads)
  • Load Balancer: Application Load Balancer (ALB)
  • Monitoring: CloudWatch (logs, metrics, alarms)
  • Secrets: Secrets Manager (API keys, credentials)
  • Networking: VPC, Security Groups, NAT Gateway

6.2 Third-Party Services Required

  • Qdrant Cloud: Vector database (managed)
  • OpenAI: API access (embeddings + GPT-4o)
  • AssemblyAI: Transcription API
  • Agora: RTC infrastructure
  • Pusher: WebSocket infrastructure

6.3 Monitoring & Logging

  • CloudWatch Logs: All application logs
  • CloudWatch Metrics: CPU, memory, request latency
  • CloudWatch Alarms: Errors, latency spikes, service degradation
  • Application Insights: APM for performance tracking (optional)

7. Data Requirements

7.1 Database Schema

  • Users: id, username, email, password_hash, created_at
  • MeetingRoom: id, room_code, host_id, title, description, status, recording data, transcript data, embedding metadata
  • TranscriptChunk: id, meeting_id, chunk_text, chunk_index, embedding_vector_id
  • DocumentUpload: id, meeting_id, file_name, file_type, s3_url, raw_text
  • DocumentChunk: id, document_id, chunk_text, chunk_index, embedding_vector_id
  • ConversationHistory: id, meeting_id, user_id, user_question, assistant_response, relevant_chunks
  • ChatMessage: id, user_id, content, created_at

7.2 Vector Database Schema

  • Collection: meeting_transcripts
    • Dimension: 1536 (OpenAI text-embedding-3-small)
    • Distance: Cosine Similarity
    • Payload: meeting_id, chunk_index, text, timestamps

7.3 Storage (S3) Structure

s3://aimeet-s3-bucket/
β”œβ”€β”€ recordings/
β”‚   β”œβ”€β”€ meeting_123_audio.webm
β”‚   └── meeting_124_audio.webm
β”œβ”€β”€ documents/
β”‚   β”œβ”€β”€ document_456.pdf
β”‚   └── document_457.txt
└── transcripts/
    β”œβ”€β”€ transcript_123.txt
    └── transcript_124.txt

7.4 Data Retention Policy

  • Recordings: Keep indefinitely (archive to Glacier after 90 days)
  • Transcripts: Keep indefinitely
  • Chat messages: Keep indefinitely
  • Documents: Keep indefinitely
  • Database backups: 35-day retention
  • Logs: 30-day retention

8. Integration Requirements

8.1 External API Integrations

  • OpenAI API: Embeddings (batch and single)
  • AssemblyAI API: Transcription (async polling)
  • Qdrant API: Vector search and storage
  • AWS SDK (Boto3): S3 operations
  • Agora SDK: Token generation and RTC
  • Pusher API: WebSocket messaging

8.2 Authentication Integrations

  • Django authentication (built-in)
  • Optional: OAuth2 (Google, GitHub) - future
  • Optional: SAML - future

9. Testing Requirements

9.1 Unit Testing

  • Models: Test data validation and relationships
  • Views: Test API endpoints with mocks
  • Utilities: Test embedding, chunking, RAG functions
  • Target: >80% code coverage

9.2 Integration Testing

  • End-to-end meeting flow
  • Recording upload and transcription
  • RAG pipeline (chunk β†’ embed β†’ search β†’ query)
  • Document upload and search

9.3 Performance Testing

  • Load test: 100 concurrent users
  • Transcription processing time
  • Q&A response latency
  • Vector search speed

9.4 Security Testing

  • OWASP Top 10 vulnerability scanning
  • SQL injection attempts
  • XSS payloads
  • CSRF validation

10. Documentation Requirements

10.1 Code Documentation

  • Docstrings for all functions/methods
  • Inline comments for complex logic
  • README.md for setup and usage
  • API documentation (Swagger/OpenAPI)

10.2 User Documentation

  • Quick start guide
  • Feature tutorials
  • FAQ
  • Troubleshooting guide

10.3 System Documentation

  • ARCHITECTURE.md (system design)
  • DESIGN.md (diagrams and flows)
  • REQUIREMENTS.md (this document)
  • Deployment guide

11. Future Enhancements

11.1 Planned Features

  • Speaker diarization (identify who said what)
  • Automatic action item detection
  • Topic summaries and key moments
  • Calendar integration
  • Role-based access control
  • Multi-language support
  • Slack/Teams integration
  • Custom embedding models

11.2 Optimization Opportunities

  • Redis caching layer (conversation history, user sessions)
  • Celery background jobs (transcription polling, document processing)
  • WebRTC data channels (peer-to-peer communication)
  • Progressive Web App (PWA) capabilities

12. Success Criteria

12.1 Functional Success

  • All FR requirements fully implemented
  • All tests passing
  • No critical bugs in production

12.2 Performance Success

  • Page load time <3 seconds (95th percentile)
  • Q&A response time <4 seconds (95th percentile)
  • 99.5% uptime maintained
  • <1 second vector search latency

12.3 User Success

  • User registration completion rate >90%
  • Meeting creation to Q&A within 5 minutes
  • 80% of users try Q&A feature within first week

12.4 Business Success

  • Support 1000+ concurrent users
  • Cost <$1000/month at 1000-user scale
  • Document uploaded for >50% of meetings
  • Sticky notes used in >40% of meetings

13. Constraints & Assumptions

13.1 Constraints

  • OpenAI API rate limits (depends on plan)
  • AssemblyAI transcription queue
  • AWS service quotas
  • Budget limitations for cloud services

13.2 Assumptions

  • Users have stable internet connection (>2 Mbps)
  • Meetings typically 30 minutes to 2 hours
  • Transcripts typically 5K-20K tokens
  • Users have modern browsers (2020+)
  • Organizations want to keep data private (not shared)

14. Compliance & Standards

14.1 Security Standards

  • SSL/TLS 1.3 for encryption
  • OWASP Top 10 compliance
  • GDPR compliance (user data protection)
  • HIPAA compliance (if health data involved) - future

14.2 Coding Standards

  • PEP 8 for Python code style
  • Django best practices
  • RESTful API design
  • Semantic versioning for releases

14.3 Accessibility Standards

  • WCAG 2.1 Level AA compliance
  • Keyboard navigation support
  • Screen reader compatibility
  • Color contrast ratios >4.5:1