# Embase Conference Abstract Packaging Scheduler

## Architecture Overview

This project follows **Clean Architecture** principles, with clear separation of concerns across layers and adherence to Microsoft development best practices.

```
┌──────────────────────────────────────────────────────────────┐
│                    Worker Layer (Host)                       │
│  - Program.cs (Composition Root)                             │
│  - Quartz Job Definitions                                    │
│  - DI Configuration                                          │
└────────────────────┬─────────────────────────────────────────┘
                     │
         ┌───────────┴───────────┐
         │                       │
┌────────▼──────────┐   ┌────────▼───────────────────┐
│   Application     │   │   Infrastructure           │
│ - Business Logic  │   │ - Database (Dapper)        │
│ - Orchestration   │   │ - SFTP (SSH.NET)           │
│ - Services        │   │ - File Operations          │
└────────┬──────────┘   └────────┬───────────────────┘
         │                       │
         └───────────┬───────────┘
                     │
            ┌────────▼──────────┐
            │      Domain       │
            │ - Entities        │
            │ - Interfaces      │
            │ - Configuration   │
            └───────────────────┘
```

### Layer Responsibilities

#### 1. **Domain Layer** (Core)
- **No dependencies** on other layers
- Contains:
  - Business entities (`ConferenceAbstractArticle`, `DispatchRecord`)
  - Repository and service interfaces (contracts)
  - Configuration models (`SftpSettings`, `PackagingSettings`, `SchedulerSettings`)
  - Pure business logic and domain rules

#### 2. **Application Layer**
- **Depends on**: Domain
- Contains:
  - Business orchestration (`PackagingService`)
  - Use case implementations
  - Application-level services
  - Coordination of domain objects and infrastructure

#### 3. **Infrastructure Layer**
- **Depends on**: Domain
- Contains:
  - Database implementation (`ConferenceAbstractRepository` using Dapper + Npgsql)
  - External service implementations (`SftpService`, `ZipService`)
  - Third-party integrations
  - Implementations of the interfaces defined in Domain

#### 4. **Worker Layer** (Composition Root)
- **Depends on**: Domain, Application, Infrastructure
- Contains:
  - `Program.cs` with DI container setup
  - Quartz.NET job definitions
  - Configuration extensions
  - Entry point for the application
  - Wiring of all dependencies

---

## Technology Stack

| Component | Technology |
|-----------|------------|
| **Framework** | .NET 8 |
| **Database** | PostgreSQL with Dapper (micro-ORM) |
| **Scheduler** | Quartz.NET |
| **SFTP** | SSH.NET |
| **Logging** | Serilog |
| **Containerization** | Docker (Linux) |

---

## Project Structure

```
Embase_Conference_Workflow_Scheduler/
├── EmbaseConferenceScheduler.sln              # Solution file
│
├── src/
│   ├── EmbaseConferenceScheduler.Domain/
│   │   ├── Entities/
│   │   │   ├── ConferenceAbstractArticle.cs
│   │   │   └── DispatchRecord.cs
│   │   ├── Interfaces/
│   │   │   ├── IConferenceAbstractRepository.cs
│   │   │   └── IFileServices.cs
│   │   └── Configuration/
│   │       └── Settings.cs
│   │
│   ├── EmbaseConferenceScheduler.Application/
│   │   └── Services/
│   │       └── PackagingService.cs
│   │
│   ├── EmbaseConferenceScheduler.Infrastructure/
│   │   ├── Persistence/
│   │   │   └── ConferenceAbstractRepository.cs
│   │   ├── FileTransfer/
│   │   │   └── SftpService.cs
│   │   └── FileOperations/
│   │       └── ZipService.cs
│   │
│   └── EmbaseConferenceScheduler.Worker/
│       ├── Program.cs
│       ├── Jobs/
│       │   └── ConferenceAbstractPackagingJob.cs
│       ├── Configuration/
│       │   ├── DependencyInjection.cs
│       │   └── QuartzConfiguration.cs
│       ├── appsettings.json                   # Base/common settings
│       ├── appsettings.Development.json       # Dev overrides
│       ├── appsettings.Staging.json           # Staging overrides
│       └── appsettings.Production.json        # Production overrides
│
├── Database/
│   └── create_tracking_table.sql              # Database schema
│
├── Dockerfile                                 # Multi-stage build
├── .gitignore
└── README_Architecture.md                     # This file
```

---

## Configuration Management

### Environment-Specific Settings

The application uses **hierarchical configuration** following .NET conventions:

1. **appsettings.json** - Common settings shared across all environments
2. **appsettings.{Environment}.json** - Environment-specific overrides
3. **Environment variables** - Runtime overrides (Docker/K8s)

### Configuration Hierarchy (least to most specific)

```
appsettings.json
    ↓ (overridden by)
appsettings.Development.json / appsettings.Staging.json / appsettings.Production.json
    ↓ (overridden by)
Environment Variables
    ↓ (overridden by)
Command-line arguments
```

### Settings Sections

| Section | Purpose | Location |
|---------|---------|----------|
| `ConnectionStrings` | PostgreSQL connection | All appsettings + env vars |
| `Sftp` | SFTP server configuration | All appsettings + env vars |
| `Packaging` | File paths and naming | All appsettings |
| `Scheduler` | Quartz CRON schedule | All appsettings |
| `Serilog` | Logging configuration | appsettings.json (common) |

---

## Database Integration (Dapper)

### Why Dapper?

- **Performance**: Minimal overhead, close to raw ADO.NET speed
- **Control**: Full control over SQL queries
- **Simplicity**: No heavy ORM abstractions
- **PostgreSQL Native**: Works seamlessly with Npgsql

### Repository Pattern

All database operations are abstracted through `IConferenceAbstractRepository`:

```csharp
public interface IConferenceAbstractRepository
{
    // Generic type arguments are inferred from the entities described above;
    // check them against IConferenceAbstractRepository.cs.
    Task<IEnumerable<ConferenceAbstractArticle>> GetUnprocessedArticlesAsync(CancellationToken ct);
    Task<int> GetNextSequenceNumberAsync(CancellationToken ct);
    Task SaveDispatchRecordsAsync(IEnumerable<DispatchRecord> records, CancellationToken ct);
}
```

The implementation uses:
- **Dapper** for query mapping
- **Npgsql** for PostgreSQL connectivity
- **Transactions** for atomicity

---

## Dependency Injection

All services are registered in `DependencyInjection.cs`:

```csharp
// Bind each configuration section to its strongly-typed settings class.
services.Configure<SftpSettings>(config.GetSection(SftpSettings.SectionName));
services.Configure<PackagingSettings>(config.GetSection(PackagingSettings.SectionName));
services.Configure<SchedulerSettings>(config.GetSection(SchedulerSettings.SectionName));

// Service registrations; the type arguments are inferred from the
// interfaces and implementations named in this document.
services.AddSingleton<IConferenceAbstractRepository, ConferenceAbstractRepository>();
services.AddSingleton<ISftpService, SftpService>();
services.AddSingleton<IZipService, ZipService>();
services.AddSingleton<PackagingService>(); // type argument inferred from the Application layer description
```

**Benefits:**
- Testability (easy to mock)
- Loose coupling
- Single Responsibility Principle
- Inversion of Control

---

## Build & Deployment

### Prerequisites

1. .NET 8 SDK
2. Docker (for containerized deployment)
3. PostgreSQL database with the schema created

### Local Development

```bash
# Restore dependencies
dotnet restore

# Build solution
dotnet build

# Run Worker (Development environment)
cd src/EmbaseConferenceScheduler.Worker
dotnet run -- --environment Development
```

### Environment-Specific Builds

```bash
# Staging
dotnet run -- --environment Staging

# Production
dotnet run -- --environment Production
```

### Docker Build & Run

```bash
# 1. Create tracking table
psql -d embase -f Database/create_tracking_table.sql

# 2. Configure appsettings files
# Edit src/EmbaseConferenceScheduler.Worker/appsettings.Production.json
# Update database connection, SFTP settings, etc.

# 3. Build Docker image
docker build -t embase-conference-scheduler:latest .

# 4. Run container
docker run -d \
  -e DOTNET_ENVIRONMENT=Production \
  -v /data/production/articles/pdf:/production/articles/pdf:ro \
  -v embase-logs:/logs \
  --name embase-conference-scheduler \
  embase-conference-scheduler:latest

# 5. View logs
docker logs -f embase-conference-scheduler
```

---

## Scheduler Configuration

### CRON Expressions

Default schedules per environment:

| Environment | CRON | Description |
|-------------|------|-------------|
| **Development** | `0 */5 * * * ?` | Every 5 minutes (testing) |
| **Staging** | `0 0 3 * * ?` | Daily at 03:00 IST |
| **Production** | `0 0 2 * * ?` | Daily at 02:00 IST |

### Override via Environment Variable

```bash
docker run -e "Scheduler__CronExpression=0 0 4 * * ?" ...
```

---

## Business Workflow

```
┌──────────────────────────────────────────────────────┐
│ 1. Scheduler Triggers (Daily CRON)                   │
└───────────────────┬──────────────────────────────────┘
                    │
┌───────────────────▼──────────────────────────────────┐
│ 2. Query Unprocessed Articles from PostgreSQL        │
│    (tbldiscardeditemreport JOIN tblEmbaseConference  │
│     WHERE lotid NOT IN dispatched)                   │
└───────────────────┬──────────────────────────────────┘
                    │
┌───────────────────▼──────────────────────────────────┐
│ 3. Get Next Sequence Number (emconflumXXXXXXX)       │
└───────────────────┬──────────────────────────────────┘
                    │
┌───────────────────▼──────────────────────────────────┐
│ 4. Group Articles by SourceId                        │
│    (One ZIP per source)                              │
└───────────────────┬──────────────────────────────────┘
                    │
        ┌───────────┴───────────┐
        │                       │
┌───────▼────────┐     ┌────────▼─────────┐
│ 5a. Copy PDFs  │     │ 5b. Create ZIP   │
│  to temp folder│────▶│   from bundle    │
└────────────────┘     └────────┬─────────┘
                                │
                       ┌────────▼─────────┐
                       │ 6. Upload SFTP   │
                       └────────┬─────────┘
                                │
                       ┌────────▼─────────┐
                       │ 7. Save Dispatch │
                       │    Records to DB │
                       └──────────────────┘
```

---

## Testing Strategy

### Unit Tests (Future)

```
- EmbaseConferenceScheduler.Domain.Tests
- EmbaseConferenceScheduler.Application.Tests
- EmbaseConferenceScheduler.Infrastructure.Tests
```

**Mock external dependencies:**
- `IConferenceAbstractRepository` → in-memory fake
- `ISftpService` → mock SFTP
- `IZipService` → mock file system

### Integration Tests (Future)

- Test against a real PostgreSQL instance (Docker Testcontainers)
- Test SFTP against a test server
- End-to-end workflow validation

---

## Design Patterns Used

| Pattern | Location | Purpose |
|---------|----------|---------|
| **Repository** | Infrastructure | Abstract database access |
| **Dependency Injection** | Worker (Program.cs) | IoC container |
| **Options Pattern** | All layers | Strongly-typed configuration |
| **Factory (Quartz)** | Worker | Job instantiation |
| **Strategy** | Infrastructure | SFTP auth (password vs key) |

---

## Security Best Practices

### Secrets Management

1. **Never commit secrets** to source control
2. **Configure per environment** in appsettings files:
   - `appsettings.Development.json` - Local development (can be committed with dummy values)
   - `appsettings.Staging.json` - Staging secrets (git-ignored or stored in CI/CD)
   - `appsettings.Production.json` - Production secrets (git-ignored or stored in CI/CD)
3. Use **Docker secrets** for SFTP keys (mounted as files)
4. Use **environment variables** to override sensitive settings at runtime
5. Use **Azure Key Vault** / AWS Secrets Manager in cloud deployments

### Configuration Priority

Settings are loaded in this order (last wins):

1. `appsettings.json` (base/common settings)
2. `appsettings.{Environment}.json` (environment-specific)
3. Environment variables (runtime overrides)
4. Command-line arguments (highest priority)

### Connection Strings

Override via environment variables in Docker:

```yaml
# In docker-compose.yml or at runtime
environment:
  ConnectionStrings__EmbaseDb: "Host=secure-db;Port=5432;Database=embase;Username=user;Password=secret"
  Sftp__Password: "sftp-secret-password"
```

---

## Troubleshooting

### Common Issues

#### Issue: Job not running
**Check:**
- CRON expression validity: https://www.freeformatter.com/cron-expression-generator-quartz.html
- Timezone setting matches the server timezone
- Logs for Quartz scheduler startup

#### Issue: Database connection failure
**Check:**
- Connection string format
- Network connectivity to PostgreSQL
- Database user permissions
- Firewall rules

#### Issue: SFTP upload fails
**Check:**
- SFTP server reachability (`ping`, `telnet`)
- Authentication credentials
- Private key file permissions
- Remote path exists

---

## Performance Considerations

1. **Dapper** provides near-native ADO.NET performance
2. **Batch operations** reduce database round-trips
3. **Cancellation tokens** allow graceful shutdown
4. **Serilog** async file writing reduces I/O blocking
5. **DisallowConcurrentExecution** prevents job overlap

---

## Future Enhancements

- [ ] Add Polly for retry policies (transient fault handling)
- [ ] Implement comprehensive unit tests
- [ ] Add health checks (liveness/readiness probes for K8s)
- [ ] Metrics export (Prometheus)
- [ ] Distributed tracing (OpenTelemetry)
- [ ] Background job status dashboard
- [ ] Email notifications on failure

---

## License

Proprietary - Lumina Datamatics

---

## Support

For issues or questions, contact the Lumina Technology Team.
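
## Appendix: Environment Variable Names

The override examples earlier in this document (`Scheduler__CronExpression`, `ConnectionStrings__EmbaseDb`, `Sftp__Password`) rely on the .NET convention that a double underscore in an environment variable name stands for the `:` separator of a configuration key. The following is a minimal sketch of that mapping. `to_env_name` is a hypothetical helper that is not part of this repository, and the key names are taken from the examples in this README, so verify them against `Settings.cs` before relying on them.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): convert a .NET
# configuration key such as "Scheduler:CronExpression" into the
# environment-variable form by replacing every ':' with '__'.
to_env_name() {
  printf '%s\n' "$1" | sed 's/:/__/g'
}

# Example: print the docker flag that overrides the CRON schedule.
cron_var="$(to_env_name "Scheduler:CronExpression")"
echo "docker run -e \"${cron_var}=0 0 4 * * ?\" ..."
```

The same helper works for nested sections, since every `:` in the key is replaced, not just the first one.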