Spaces:
Runtime error
Runtime error
AIMeet - Requirements Document
1. Functional Requirements
1.1 User Management
- FR-1.1: Users must be able to register with username, email, and password
- FR-1.2: Users must be able to log in with credentials
- FR-1.3: Users must be able to log out
- FR-1.4: Users must be able to reset password via email
- FR-1.5: User profiles must store name, email, profile picture
1.2 Meeting Management
- FR-2.1: Users can create a meeting with title, description, and max participants
- FR-2.2: System generates unique shareable room code for each meeting
- FR-2.3: Users can join meetings using room code
- FR-2.4: Meeting host can end the meeting
- FR-2.5: Meeting state tracks: active, ended, archived
- FR-2.6: Users can view list of their meetings (hosted and joined)
- FR-2.7: Users can delete or archive completed meetings
1.3 Real-Time Video & Audio
- FR-3.1: Video streaming using Agora RTC SDK
- FR-3.2: Audio streaming with VP8 codec
- FR-3.3: Dynamic bitrate adjustment based on network
- FR-3.4: Participants can mute/unmute audio and video
- FR-3.5: Host can kick participants
- FR-3.6: Screen sharing capability (optional, future)
1.4 Recording
- FR-4.1: Audio is automatically recorded during meeting using MediaRecorder
- FR-4.2: Recording saved as WebM format locally
- FR-4.3: Users can upload recording after meeting
- FR-4.4: Recording uploaded to AWS S3
- FR-4.5: System stores recording metadata (size, duration, upload time)
- FR-4.6: Presigned URLs generated for private S3 access
1.5 Transcription
- FR-5.1: Uploaded recordings sent to AssemblyAI for transcription
- FR-5.2: System polls AssemblyAI for transcription status
- FR-5.3: Completed transcripts saved to database
- FR-5.4: Transcript status tracked: not_started, processing, completed, failed
- FR-5.5: Transcript linked to meeting record
1.6 Knowledge Processing (RAG)
- FR-6.1: Users can trigger "Prepare for Search" to process transcript
- FR-6.2: System chunks transcript using recursive character splitting (500 tokens, 50 overlap)
- FR-6.3: Chunks stored in TranscriptChunk model
- FR-6.4: Chunks embedded using OpenAI text-embedding-3-small
- FR-6.5: Embeddings stored in Qdrant vector database
- FR-6.6: Idempotent processing: check timestamps to avoid reprocessing
1.7 Question Answering (RAG Query)
- FR-7.1: Users can ask questions about meeting content
- FR-7.2: Question embedded using same OpenAI model
- FR-7.3: System searches Qdrant for top-5 similar chunks
- FR-7.4: Conversation history retrieved for context
- FR-7.5: GPT-4o called with context + history + question
- FR-7.6: Response generated and displayed to user
- FR-7.7: Q&A turn saved to ConversationHistory
1.8 Meeting Preparation (Sticky Notes)
- FR-8.1: When creating new meeting, system suggests related past meetings
- FR-8.2: Suggestions based on meeting title/agenda keywords
- FR-8.3: Shows what was discussed about same topics before
- FR-8.4: Users can expand sticky notes to see full context
- FR-8.5: Helps prevent duplicate discussions
1.9 Document Management
- FR-9.1: Users can upload documents (PDF, DOCX, TXT)
- FR-9.2: Documents stored in S3
- FR-9.3: Document text extracted and stored
- FR-9.4: Documents chunked same way as transcripts
- FR-9.5: Document chunks embedded and stored in Qdrant
- FR-9.6: Users can view list of documents per meeting
- FR-9.7: Users can delete documents
1.10 Unified Search
- FR-10.1: Questions search both transcripts and documents
- FR-10.2: Results include source type (meeting transcript vs document)
- FR-10.3: Search results show relevance scores
- FR-10.4: Source metadata (timestamps, document names) included
1.11 Chat
- FR-11.1: Real-time chat during meetings using WebSocket
- FR-11.2: Chat messages saved to database
- FR-11.3: Users can view chat history
- FR-11.4: Message timestamps tracked
- FR-11.5: Messages linked to user and meeting
1.12 Reporting & Analytics (Future)
- FR-12.1: Meeting duration and participant count
- FR-12.2: Transcript statistics (word count, duration)
- FR-12.3: Q&A usage statistics
- FR-12.4: Most discussed topics across meetings
2. Non-Functional Requirements
2.1 Performance
- NFR-1.1: Q&A response time: <4 seconds (including LLM latency)
- NFR-1.2: Vector search latency: <500ms
- NFR-1.3: API response time: <1 second for non-AI endpoints
- NFR-1.4: Page load time: <3 seconds
- NFR-1.5: Concurrent users: 100+ with auto-scaling
- NFR-1.6: Transcript processing: <1 minute for typical meeting
2.2 Scalability
- NFR-2.1: Horizontal scaling via EC2 Auto Scaling Groups
- NFR-2.2: Database: RDS with read replicas
- NFR-2.3: S3 handles unlimited storage
- NFR-2.4: Qdrant Cloud manages vector scaling
- NFR-2.5: Support growth from 10 to 10,000 users
2.3 Reliability
- NFR-3.1: 99.5% uptime SLA
- NFR-3.2: Automated daily database backups
- NFR-3.3: Multi-AZ RDS for failover
- NFR-3.4: CloudFront CDN for static assets
- NFR-3.5: Graceful error handling and user feedback
2.4 Security
- NFR-4.1: HTTPS for all communications
- NFR-4.2: Password hashing with bcrypt
- NFR-4.3: JWT tokens for API authentication
- NFR-4.4: SQL injection protection via ORM
- NFR-4.5: XSS protection via template escaping
- NFR-4.6: CSRF protection on forms
- NFR-4.7: S3 encryption at rest (AES-256)
- NFR-4.8: Database encryption (KMS)
- NFR-4.9: API keys in Secrets Manager (no hardcoding)
- NFR-4.10: Private S3 access via presigned URLs
- NFR-4.11: Private subnet for RDS (no public IP)
- NFR-4.12: Rate limiting: 100 requests/minute per user
2.5 Usability
- NFR-5.1: Responsive design for mobile (375px+) and desktop
- NFR-5.2: Accessibility: WCAG 2.1 Level AA compliance
- NFR-5.3: Intuitive UI with clear navigation
- NFR-5.4: Error messages explain what went wrong
- NFR-5.5: Dark and light mode support (future)
2.6 Maintainability
- NFR-6.1: Code documented with docstrings
- NFR-6.2: DRY principle: no code duplication
- NFR-6.3: Clear separation of concerns
- NFR-6.4: Comprehensive logging with timestamps
- NFR-6.5: Automated testing (unit + integration)
2.7 Compatibility
- NFR-7.1: Browser support: Chrome, Firefox, Safari, Edge (latest 2 versions)
- NFR-7.2: Mobile support: iOS Safari, Android Chrome
- NFR-7.3: Python 3.13+ support
- NFR-7.4: PostgreSQL 12+ support
3. System Requirements
3.1 Software Requirements
- Backend: Django 4.x, Python 3.13+
- Database: PostgreSQL 12+ (or SQLite for dev)
- Web Server: Gunicorn + Nginx
- Vector DB: Qdrant 1.x
- Message Queue (future): Celery + Redis
3.2 Hardware Requirements (Production)
- Compute: EC2 t3.medium (2 vCPU, 4GB RAM) minimum
- Development: t3.small sufficient
- Production: t3.large+ with auto-scaling 2-10 instances
- Database: RDS t4g.medium (2 vCPU, 1GB RAM)
- Storage: 100GB gp3 (auto-scaling)
- Bandwidth: 10 Mbps minimum (up to 1 Gbps for scaling)
3.3 Browser Requirements
- Minimum: Chrome 90+, Firefox 88+, Safari 14+, Edge 90+
- WebRTC support required for video
- LocalStorage and SessionStorage support
- WebSocket support
4. Dependencies
4.1 Backend Dependencies
Django==4.2
djangorestframework==3.14.0
psycopg2-binary==2.9.0
python-dotenv==1.0.0
# AI & ML
openai==2.16.0
qdrant-client==1.16.2
requests==2.31.0
# Transcription
AssemblyAI (API, no package)
# Cloud
boto3==1.26.137
# Real-time
pusher==3.3.1
# Video
agora-rtm (Agora SDK)
agora-token-builder (Token generation)
# Utilities
python-dateutil==2.8.2
pytz==2023.3
Pillow==10.0.0
4.2 Frontend Dependencies
Agora RTC SDK v4.24.2 (JavaScript)
Bootstrap 5.3
jQuery 3.6 (optional, for DOM manipulation)
4.3 External Services
- OpenAI API: Embeddings (text-embedding-3-small) + LLM (GPT-4o)
- AssemblyAI API: Speech-to-text transcription
- Qdrant Cloud: Vector database hosting
- AWS Services: EC2, RDS, S3, CloudWatch, Secrets Manager, ALB
- Agora: Video/audio RTC
- Pusher: WebSocket for chat
5. API Requirements
5.1 REST API Specifications
- Base URL:
/api/or/(depending on endpoint) - Content-Type:
application/json - Authentication: Django session + optional JWT for API clients
- Response Format: JSON with status, data, and error fields
- Pagination: Limit + offset for list endpoints
- Versioning: Not required initially (v1 implicit)
5.2 WebSocket Requirements
- Protocol: WebSocket (Pusher-managed)
- Channels: Per-meeting chat channels
- Message Format: JSON
- Auto-reconnect: Client-side retry logic
5.3 Rate Limiting
- 100 requests/minute per user
- 1000 requests/minute per IP
- Q&A queries: 10 per minute per user
6. Infrastructure Requirements
6.1 AWS Services Required
- Compute: EC2 (application server)
- Database: RDS PostgreSQL (relational data)
- Storage: S3 (recordings, documents)
- CDN: CloudFront (static assets, S3 downloads)
- Load Balancer: Application Load Balancer (ALB)
- Monitoring: CloudWatch (logs, metrics, alarms)
- Secrets: Secrets Manager (API keys, credentials)
- Networking: VPC, Security Groups, NAT Gateway
6.2 Third-Party Services Required
- Qdrant Cloud: Vector database (managed)
- OpenAI: API access (embeddings + GPT-4o)
- AssemblyAI: Transcription API
- Agora: RTC infrastructure
- Pusher: WebSocket infrastructure
6.3 Monitoring & Logging
- CloudWatch Logs: All application logs
- CloudWatch Metrics: CPU, memory, request latency
- CloudWatch Alarms: Errors, latency spikes, service degradation
- Application Insights: APM for performance tracking (optional)
7. Data Requirements
7.1 Database Schema
- Users: id, username, email, password_hash, created_at
- MeetingRoom: id, room_code, host_id, title, description, status, recording data, transcript data, embedding metadata
- TranscriptChunk: id, meeting_id, chunk_text, chunk_index, embedding_vector_id
- DocumentUpload: id, meeting_id, file_name, file_type, s3_url, raw_text
- DocumentChunk: id, document_id, chunk_text, chunk_index, embedding_vector_id
- ConversationHistory: id, meeting_id, user_id, user_question, assistant_response, relevant_chunks
- ChatMessage: id, user_id, content, created_at
7.2 Vector Database Schema
- Collection: meeting_transcripts
- Dimension: 1536 (OpenAI text-embedding-3-small)
- Distance: Cosine Similarity
- Payload: meeting_id, chunk_index, text, timestamps
7.3 Storage (S3) Structure
s3://aimeet-s3-bucket/
βββ recordings/
β βββ meeting_123_audio.webm
β βββ meeting_124_audio.webm
βββ documents/
β βββ document_456.pdf
β βββ document_457.txt
βββ transcripts/
βββ transcript_123.txt
βββ transcript_124.txt
7.4 Data Retention Policy
- Recordings: Keep indefinitely (archive to Glacier after 90 days)
- Transcripts: Keep indefinitely
- Chat messages: Keep indefinitely
- Documents: Keep indefinitely
- Database backups: 35-day retention
- Logs: 30-day retention
8. Integration Requirements
8.1 External API Integrations
- OpenAI API: Embeddings (batch and single)
- AssemblyAI API: Transcription (async polling)
- Qdrant API: Vector search and storage
- AWS SDK (Boto3): S3 operations
- Agora SDK: Token generation and RTC
- Pusher API: WebSocket messaging
8.2 Authentication Integrations
- Django authentication (built-in)
- Optional: OAuth2 (Google, GitHub) - future
- Optional: SAML - future
9. Testing Requirements
9.1 Unit Testing
- Models: Test data validation and relationships
- Views: Test API endpoints with mocks
- Utilities: Test embedding, chunking, RAG functions
- Target: >80% code coverage
9.2 Integration Testing
- End-to-end meeting flow
- Recording upload and transcription
- RAG pipeline (chunk β embed β search β query)
- Document upload and search
9.3 Performance Testing
- Load test: 100 concurrent users
- Transcription processing time
- Q&A response latency
- Vector search speed
9.4 Security Testing
- OWASP Top 10 vulnerability scanning
- SQL injection attempts
- XSS payloads
- CSRF validation
10. Documentation Requirements
10.1 Code Documentation
- Docstrings for all functions/methods
- Inline comments for complex logic
- README.md for setup and usage
- API documentation (Swagger/OpenAPI)
10.2 User Documentation
- Quick start guide
- Feature tutorials
- FAQ
- Troubleshooting guide
10.3 System Documentation
- ARCHITECTURE.md (system design)
- DESIGN.md (diagrams and flows)
- REQUIREMENTS.md (this document)
- Deployment guide
11. Future Enhancements
11.1 Planned Features
- Speaker diarization (identify who said what)
- Automatic action item detection
- Topic summaries and key moments
- Calendar integration
- Role-based access control
- Multi-language support
- Slack/Teams integration
- Custom embedding models
11.2 Optimization Opportunities
- Redis caching layer (conversation history, user sessions)
- Celery background jobs (transcription polling, document processing)
- WebRTC data channels (peer-to-peer communication)
- Progressive Web App (PWA) capabilities
12. Success Criteria
12.1 Functional Success
- All FR requirements fully implemented
- All tests passing
- No critical bugs in production
12.2 Performance Success
- Page load time <3 seconds (95th percentile)
- Q&A response time <4 seconds (95th percentile)
- 99.5% uptime maintained
- <1 second vector search latency
12.3 User Success
- User registration completion rate >90%
- Meeting creation to Q&A within 5 minutes
80% of users try Q&A feature within first week
12.4 Business Success
- Support 1000+ concurrent users
- Cost <$1000/month at 1000-user scale
- Document uploaded for >50% of meetings
- Sticky notes used in >40% of meetings
13. Constraints & Assumptions
13.1 Constraints
- OpenAI API rate limits (depends on plan)
- AssemblyAI transcription queue
- AWS service quotas
- Budget limitations for cloud services
13.2 Assumptions
- Users have stable internet connection (>2 Mbps)
- Meetings typically 30 minutes to 2 hours
- Transcripts typically 5K-20K tokens
- Users have modern browsers (2020+)
- Organizations want to keep data private (not shared)
14. Compliance & Standards
14.1 Security Standards
- SSL/TLS 1.3 for encryption
- OWASP Top 10 compliance
- GDPR compliance (user data protection)
- HIPAA compliance (if health data involved) - future
14.2 Coding Standards
- PEP 8 for Python code style
- Django best practices
- RESTful API design
- Semantic versioning for releases
14.3 Accessibility Standards
- WCAG 2.1 Level AA compliance
- Keyboard navigation support
- Screen reader compatibility
- Color contrast ratios >4.5:1