Spaces:

prashantbackenddev
/

Aimeetapplication

Runtime error

App Files Files Community

Aimeetapplication / REQUIREMENTS.md

prashantdubeypng

Deploy Aimeet to HuggingFace Spaces

4db0a21 3 months ago

preview code

raw

history blame contribute delete

14.9 kB

AIMeet - Requirements Document

1. Functional Requirements

1.1 User Management

FR-1.1: Users must be able to register with username, email, and password
FR-1.2: Users must be able to log in with credentials
FR-1.3: Users must be able to log out
FR-1.4: Users must be able to reset password via email
FR-1.5: User profiles must store name, email, profile picture

1.2 Meeting Management

FR-2.1: Users can create a meeting with title, description, and max participants
FR-2.2: System generates unique shareable room code for each meeting
FR-2.3: Users can join meetings using room code
FR-2.4: Meeting host can end the meeting
FR-2.5: Meeting state tracks: active, ended, archived
FR-2.6: Users can view list of their meetings (hosted and joined)
FR-2.7: Users can delete or archive completed meetings

1.3 Real-Time Video & Audio

FR-3.1: Video streaming using Agora RTC SDK
FR-3.2: Audio streaming with VP8 codec
FR-3.3: Dynamic bitrate adjustment based on network
FR-3.4: Participants can mute/unmute audio and video
FR-3.5: Host can kick participants
FR-3.6: Screen sharing capability (optional, future)

1.4 Recording

FR-4.1: Audio is automatically recorded during meeting using MediaRecorder
FR-4.2: Recording saved as WebM format locally
FR-4.3: Users can upload recording after meeting
FR-4.4: Recording uploaded to AWS S3
FR-4.5: System stores recording metadata (size, duration, upload time)
FR-4.6: Presigned URLs generated for private S3 access

1.5 Transcription

FR-5.1: Uploaded recordings sent to AssemblyAI for transcription
FR-5.2: System polls AssemblyAI for transcription status
FR-5.3: Completed transcripts saved to database
FR-5.4: Transcript status tracked: not_started, processing, completed, failed
FR-5.5: Transcript linked to meeting record

1.6 Knowledge Processing (RAG)

FR-6.1: Users can trigger "Prepare for Search" to process transcript
FR-6.2: System chunks transcript using recursive character splitting (500 tokens, 50 overlap)
FR-6.3: Chunks stored in TranscriptChunk model
FR-6.4: Chunks embedded using OpenAI text-embedding-3-small
FR-6.5: Embeddings stored in Qdrant vector database
FR-6.6: Idempotent processing: check timestamps to avoid reprocessing

1.7 Question Answering (RAG Query)

FR-7.1: Users can ask questions about meeting content
FR-7.2: Question embedded using same OpenAI model
FR-7.3: System searches Qdrant for top-5 similar chunks
FR-7.4: Conversation history retrieved for context
FR-7.5: GPT-4o called with context + history + question
FR-7.6: Response generated and displayed to user
FR-7.7: Q&A turn saved to ConversationHistory

1.8 Meeting Preparation (Sticky Notes)

FR-8.1: When creating new meeting, system suggests related past meetings
FR-8.2: Suggestions based on meeting title/agenda keywords
FR-8.3: Shows what was discussed about same topics before
FR-8.4: Users can expand sticky notes to see full context
FR-8.5: Helps prevent duplicate discussions

1.9 Document Management

FR-9.1: Users can upload documents (PDF, DOCX, TXT)
FR-9.2: Documents stored in S3
FR-9.3: Document text extracted and stored
FR-9.4: Documents chunked same way as transcripts
FR-9.5: Document chunks embedded and stored in Qdrant
FR-9.6: Users can view list of documents per meeting
FR-9.7: Users can delete documents

1.10 Unified Search

FR-10.1: Questions search both transcripts and documents
FR-10.2: Results include source type (meeting transcript vs document)
FR-10.3: Search results show relevance scores
FR-10.4: Source metadata (timestamps, document names) included

1.11 Chat

FR-11.1: Real-time chat during meetings using WebSocket
FR-11.2: Chat messages saved to database
FR-11.3: Users can view chat history
FR-11.4: Message timestamps tracked
FR-11.5: Messages linked to user and meeting

1.12 Reporting & Analytics (Future)

FR-12.1: Meeting duration and participant count
FR-12.2: Transcript statistics (word count, duration)
FR-12.3: Q&A usage statistics
FR-12.4: Most discussed topics across meetings

2. Non-Functional Requirements

2.1 Performance

NFR-1.1: Q&A response time: <4 seconds (including LLM latency)
NFR-1.2: Vector search latency: <500ms
NFR-1.3: API response time: <1 second for non-AI endpoints
NFR-1.4: Page load time: <3 seconds
NFR-1.5: Concurrent users: 100+ with auto-scaling
NFR-1.6: Transcript processing: <1 minute for typical meeting

2.2 Scalability

NFR-2.1: Horizontal scaling via EC2 Auto Scaling Groups
NFR-2.2: Database: RDS with read replicas
NFR-2.3: S3 handles unlimited storage
NFR-2.4: Qdrant Cloud manages vector scaling
NFR-2.5: Support growth from 10 to 10,000 users

2.3 Reliability

NFR-3.1: 99.5% uptime SLA
NFR-3.2: Automated daily database backups
NFR-3.3: Multi-AZ RDS for failover
NFR-3.4: CloudFront CDN for static assets
NFR-3.5: Graceful error handling and user feedback

2.4 Security

NFR-4.1: HTTPS for all communications
NFR-4.2: Password hashing with bcrypt
NFR-4.3: JWT tokens for API authentication
NFR-4.4: SQL injection protection via ORM
NFR-4.5: XSS protection via template escaping
NFR-4.6: CSRF protection on forms
NFR-4.7: S3 encryption at rest (AES-256)
NFR-4.8: Database encryption (KMS)
NFR-4.9: API keys in Secrets Manager (no hardcoding)
NFR-4.10: Private S3 access via presigned URLs
NFR-4.11: Private subnet for RDS (no public IP)
NFR-4.12: Rate limiting: 100 requests/minute per user

2.5 Usability

NFR-5.1: Responsive design for mobile (375px+) and desktop
NFR-5.2: Accessibility: WCAG 2.1 Level AA compliance
NFR-5.3: Intuitive UI with clear navigation
NFR-5.4: Error messages explain what went wrong
NFR-5.5: Dark and light mode support (future)

2.6 Maintainability

NFR-6.1: Code documented with docstrings
NFR-6.2: DRY principle: no code duplication
NFR-6.3: Clear separation of concerns
NFR-6.4: Comprehensive logging with timestamps
NFR-6.5: Automated testing (unit + integration)

2.7 Compatibility

NFR-7.1: Browser support: Chrome, Firefox, Safari, Edge (latest 2 versions)
NFR-7.2: Mobile support: iOS Safari, Android Chrome
NFR-7.3: Python 3.13+ support
NFR-7.4: PostgreSQL 12+ support

3. System Requirements

3.1 Software Requirements

Backend: Django 4.x, Python 3.13+
Database: PostgreSQL 12+ (or SQLite for dev)
Web Server: Gunicorn + Nginx
Vector DB: Qdrant 1.x
Message Queue (future): Celery + Redis

3.2 Hardware Requirements (Production)

Compute: EC2 t3.medium (2 vCPU, 4GB RAM) minimum
- Development: t3.small sufficient
- Production: t3.large+ with auto-scaling 2-10 instances
Database: RDS t4g.medium (2 vCPU, 1GB RAM)
- Storage: 100GB gp3 (auto-scaling)
Bandwidth: 10 Mbps minimum (up to 1 Gbps for scaling)

3.3 Browser Requirements

Minimum: Chrome 90+, Firefox 88+, Safari 14+, Edge 90+
WebRTC support required for video
LocalStorage and SessionStorage support
WebSocket support

4. Dependencies

4.1 Backend Dependencies

Django==4.2
djangorestframework==3.14.0
psycopg2-binary==2.9.0
python-dotenv==1.0.0

# AI & ML
openai==2.16.0
qdrant-client==1.16.2
requests==2.31.0

# Transcription
AssemblyAI (API, no package)

# Cloud
boto3==1.26.137

# Real-time
pusher==3.3.1

# Video
agora-rtm (Agora SDK)
agora-token-builder (Token generation)

# Utilities
python-dateutil==2.8.2
pytz==2023.3
Pillow==10.0.0

4.2 Frontend Dependencies

Agora RTC SDK v4.24.2 (JavaScript)
Bootstrap 5.3
jQuery 3.6 (optional, for DOM manipulation)

4.3 External Services

OpenAI API: Embeddings (text-embedding-3-small) + LLM (GPT-4o)
AssemblyAI API: Speech-to-text transcription
Qdrant Cloud: Vector database hosting
AWS Services: EC2, RDS, S3, CloudWatch, Secrets Manager, ALB
Agora: Video/audio RTC
Pusher: WebSocket for chat

5. API Requirements

5.1 REST API Specifications

Base URL: /api/ or / (depending on endpoint)
Content-Type: application/json
Authentication: Django session + optional JWT for API clients
Response Format: JSON with status, data, and error fields
Pagination: Limit + offset for list endpoints
Versioning: Not required initially (v1 implicit)

5.2 WebSocket Requirements

Protocol: WebSocket (Pusher-managed)
Channels: Per-meeting chat channels
Message Format: JSON
Auto-reconnect: Client-side retry logic

5.3 Rate Limiting

100 requests/minute per user
1000 requests/minute per IP
Q&A queries: 10 per minute per user

6. Infrastructure Requirements

6.1 AWS Services Required

Compute: EC2 (application server)
Database: RDS PostgreSQL (relational data)
Storage: S3 (recordings, documents)
CDN: CloudFront (static assets, S3 downloads)
Load Balancer: Application Load Balancer (ALB)
Monitoring: CloudWatch (logs, metrics, alarms)
Secrets: Secrets Manager (API keys, credentials)
Networking: VPC, Security Groups, NAT Gateway

6.2 Third-Party Services Required

Qdrant Cloud: Vector database (managed)
OpenAI: API access (embeddings + GPT-4o)
AssemblyAI: Transcription API
Agora: RTC infrastructure
Pusher: WebSocket infrastructure

6.3 Monitoring & Logging

CloudWatch Logs: All application logs
CloudWatch Metrics: CPU, memory, request latency
CloudWatch Alarms: Errors, latency spikes, service degradation
Application Insights: APM for performance tracking (optional)

7. Data Requirements

7.1 Database Schema

Users: id, username, email, password_hash, created_at
MeetingRoom: id, room_code, host_id, title, description, status, recording data, transcript data, embedding metadata
TranscriptChunk: id, meeting_id, chunk_text, chunk_index, embedding_vector_id
DocumentUpload: id, meeting_id, file_name, file_type, s3_url, raw_text
DocumentChunk: id, document_id, chunk_text, chunk_index, embedding_vector_id
ConversationHistory: id, meeting_id, user_id, user_question, assistant_response, relevant_chunks
ChatMessage: id, user_id, content, created_at

7.2 Vector Database Schema

Collection: meeting_transcripts
- Dimension: 1536 (OpenAI text-embedding-3-small)
- Distance: Cosine Similarity
- Payload: meeting_id, chunk_index, text, timestamps

7.3 Storage (S3) Structure

s3://aimeet-s3-bucket/
├── recordings/
│   ├── meeting_123_audio.webm
│   └── meeting_124_audio.webm
├── documents/
│   ├── document_456.pdf
│   └── document_457.txt
└── transcripts/
    ├── transcript_123.txt
    └── transcript_124.txt

7.4 Data Retention Policy

Recordings: Keep indefinitely (archive to Glacier after 90 days)
Transcripts: Keep indefinitely
Chat messages: Keep indefinitely
Documents: Keep indefinitely
Database backups: 35-day retention
Logs: 30-day retention

8. Integration Requirements

8.1 External API Integrations

OpenAI API: Embeddings (batch and single)
AssemblyAI API: Transcription (async polling)
Qdrant API: Vector search and storage
AWS SDK (Boto3): S3 operations
Agora SDK: Token generation and RTC
Pusher API: WebSocket messaging

8.2 Authentication Integrations

Django authentication (built-in)
Optional: OAuth2 (Google, GitHub) - future
Optional: SAML - future

9. Testing Requirements

9.1 Unit Testing

Models: Test data validation and relationships
Views: Test API endpoints with mocks
Utilities: Test embedding, chunking, RAG functions
Target: >80% code coverage

9.2 Integration Testing

End-to-end meeting flow
Recording upload and transcription
RAG pipeline (chunk → embed → search → query)
Document upload and search

9.3 Performance Testing

Load test: 100 concurrent users
Transcription processing time
Q&A response latency
Vector search speed

9.4 Security Testing

OWASP Top 10 vulnerability scanning
SQL injection attempts
XSS payloads
CSRF validation

10. Documentation Requirements

10.1 Code Documentation

Docstrings for all functions/methods
Inline comments for complex logic
README.md for setup and usage
API documentation (Swagger/OpenAPI)

10.2 User Documentation

Quick start guide
Feature tutorials
FAQ
Troubleshooting guide

10.3 System Documentation

ARCHITECTURE.md (system design)
DESIGN.md (diagrams and flows)
REQUIREMENTS.md (this document)
Deployment guide

11. Future Enhancements

11.1 Planned Features

Speaker diarization (identify who said what)
Automatic action item detection
Topic summaries and key moments
Calendar integration
Role-based access control
Multi-language support
Slack/Teams integration
Custom embedding models

11.2 Optimization Opportunities

Redis caching layer (conversation history, user sessions)
Celery background jobs (transcription polling, document processing)
WebRTC data channels (peer-to-peer communication)
Progressive Web App (PWA) capabilities

12. Success Criteria

12.1 Functional Success

All FR requirements fully implemented
All tests passing
No critical bugs in production

12.2 Performance Success

Page load time <3 seconds (95th percentile)
Q&A response time <4 seconds (95th percentile)
99.5% uptime maintained
<1 second vector search latency

12.3 User Success

User registration completion rate >90%
Meeting creation to Q&A within 5 minutes
80% of users try Q&A feature within first week

12.4 Business Success

Support 1000+ concurrent users
Cost <$1000/month at 1000-user scale
Document uploaded for >50% of meetings
Sticky notes used in >40% of meetings

13. Constraints & Assumptions

13.1 Constraints

OpenAI API rate limits (depends on plan)
AssemblyAI transcription queue
AWS service quotas
Budget limitations for cloud services

13.2 Assumptions

Users have stable internet connection (>2 Mbps)
Meetings typically 30 minutes to 2 hours
Transcripts typically 5K-20K tokens
Users have modern browsers (2020+)
Organizations want to keep data private (not shared)

14. Compliance & Standards

14.1 Security Standards

SSL/TLS 1.3 for encryption
OWASP Top 10 compliance
GDPR compliance (user data protection)
HIPAA compliance (if health data involved) - future

14.2 Coding Standards

PEP 8 for Python code style
Django best practices
RESTful API design
Semantic versioning for releases

14.3 Accessibility Standards

WCAG 2.1 Level AA compliance
Keyboard navigation support
Screen reader compatibility
Color contrast ratios >4.5:1