Embase Conference Abstract Packaging Scheduler

Architecture Overview

This project follows Clean Architecture principles: concerns are separated across four layers (Domain, Application, Infrastructure, Worker), dependencies point inward toward Domain, and the code follows Microsoft's .NET development conventions.

┌──────────────────────────────────────────────────────────────┐
│                    Worker Layer (Host)                       │
│  - Program.cs (Composition Root)                             │
│  - Quartz Job Definitions                                    │
│  - DI Configuration                                          │
└────────────────────┬─────────────────────────────────────────┘
                     │
         ┌───────────┴───────────┐
         │                       │
┌────────▼──────────┐   ┌────────▼───────────────────┐
│  Application      │   │   Infrastructure           │
│  - Business Logic │   │   - Database (Dapper)      │
│  - Orchestration  │   │   - SFTP (SSH.NET)         │
│  - Services       │   │   - File Operations        │
└────────┬──────────┘   └────────┬───────────────────┘
         │                       │
         └───────────┬───────────┘
                     │
            ┌────────▼──────────┐
            │     Domain        │
            │  - Entities       │
            │  - Interfaces     │
            │  - Configuration  │
            └───────────────────┘

Layer Responsibilities

1. Domain Layer (Core)

  • No dependencies on other layers
  • Contains:
    • Business entities (ConferenceAbstractArticle, DispatchRecord)
    • Repository and service interfaces (contracts)
    • Configuration models (SftpSettings, PackagingSettings, SchedulerSettings)
  • Pure business logic and domain rules

2. Application Layer

  • Depends on: Domain
  • Contains:
    • Business orchestration (PackagingService)
    • Use case implementations
    • Application-level services
  • Coordinates domain objects and reaches infrastructure only through the interfaces defined in Domain

3. Infrastructure Layer

  • Depends on: Domain
  • Contains:
    • Database implementation (ConferenceAbstractRepository using Dapper + Npgsql)
    • External service implementations (SftpService, ZipService)
    • Third-party integrations
  • Implements interfaces defined in Domain

4. Worker Layer (Composition Root)

  • Depends on: Domain, Application, Infrastructure
  • Contains:
    • Program.cs with DI container setup
    • Quartz.NET job definitions
    • Configuration extensions
    • Entry point for the application
  • Wires up all dependencies
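
The dependency rule described above can be illustrated with a minimal sketch. The names `IArticleSource`, `InMemoryArticleSource`, and `PackagingCoordinator` are hypothetical stand-ins, not the project's real types; the comments mark which layer each type would belong to.

```csharp
using System.Collections.Generic;
using System.Linq;

// --- Domain: owns the contract, depends on nothing (hypothetical name) ---
public interface IArticleSource
{
    IReadOnlyList<string> GetPendingArticleIds();
}

// --- Infrastructure: implements the Domain contract ---
// A real implementation would query PostgreSQL; this one holds data in memory.
public sealed class InMemoryArticleSource : IArticleSource
{
    private readonly List<string> _ids;

    public InMemoryArticleSource(IEnumerable<string> ids) => _ids = ids.ToList();

    public IReadOnlyList<string> GetPendingArticleIds() => _ids;
}

// --- Application: sees only the Domain interface, never the Infrastructure type ---
public sealed class PackagingCoordinator
{
    private readonly IArticleSource _source;

    public PackagingCoordinator(IArticleSource source) => _source = source;

    public int CountPending() => _source.GetPendingArticleIds().Count;
}
```

Only the Worker layer would know both concrete types and pass `new InMemoryArticleSource(...)` into `PackagingCoordinator`, which is what makes it the composition root.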

Technology Stack

Component         Technology
Framework         .NET 8
Database          PostgreSQL with Dapper ORM
Scheduler         Quartz.NET
SFTP              SSH.NET
Logging           Serilog
Containerization  Docker (Linux)

Project Structure

Embase_Conference_Workflow_Scheduler/
├── EmbaseConferenceScheduler.slnx           # Solution file
│
├── src/
│   ├── EmbaseConferenceScheduler.Domain/
│   │   ├── Entities/
│   │   │   ├── ConferenceAbstractArticle.cs
│   │   │   └── DispatchRecord.cs
│   │   ├── Interfaces/
│   │   │   ├── IConferenceAbstractRepository.cs
│   │   │   └── IFileServices.cs
│   │   └── Configuration/
│   │       └── Settings.cs
│   │
│   ├── EmbaseConferenceScheduler.Application/
│   │   └── Services/
│   │       └── PackagingService.cs
│   │
│   ├── EmbaseConferenceScheduler.Infrastructure/
│   │   ├── Persistence/
│   │   │   └── ConferenceAbstractRepository.cs
│   │   ├── FileTransfer/
│   │   │   └── SftpService.cs
│   │   └── FileOperations/
│   │       └── ZipService.cs
│   │
│   └── EmbaseConferenceScheduler.Worker/
│       ├── Program.cs
│       ├── Jobs/
│       │   └── ConferenceAbstractPackagingJob.cs
│       ├── Configuration/
│       │   ├── DependencyInjection.cs
│       │   └── QuartzConfiguration.cs
│       ├── appsettings.json                 # Base/common settings
│       ├── appsettings.Development.json     # Dev overrides
│       ├── appsettings.Staging.json         # Staging overrides
│       └── appsettings.Production.json      # Production overrides
│
├── Database/
│   └── create_tracking_table.sql            # Database schema
│
├── Dockerfile                               # Multi-stage build
├── .gitignore
└── README_Architecture.md                   # This file

Configuration Management

Environment-Specific Settings

The application uses hierarchical configuration following .NET conventions:

  1. appsettings.json - Common settings shared across all environments
  2. appsettings.{Environment}.json - Environment-specific overrides
  3. Environment variables - Runtime overrides (Docker/K8s)
  4. Command-line arguments - Highest-priority overrides at launch

Configuration Hierarchy (least to most specific)

appsettings.json
  ↓ (overridden by)
appsettings.Development.json / appsettings.Staging.json / appsettings.Production.json
  ↓ (overridden by)
Environment Variables
  ↓ (overridden by)
Command-line arguments
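
The "last source wins" rule can be modelled in a few lines. This is a simplified sketch of the behaviour, not the actual Microsoft.Extensions.Configuration implementation; `LayeredSettings`, `Merge`, and `NormalizeEnvKey` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LayeredSettings
{
    // Environment-variable names use "__" where configuration keys use ":".
    public static string NormalizeEnvKey(string name) => name.Replace("__", ":");

    // Merges sources in order; a later source overrides an earlier one for the same key.
    public static Dictionary<string, string> Merge(
        params IReadOnlyDictionary<string, string>[] sources)
    {
        // .NET configuration keys are case-insensitive.
        var merged = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var source in sources)
        {
            foreach (var pair in source)
            {
                merged[pair.Key] = pair.Value;
            }
        }
        return merged;
    }
}
```

For example, if the base file supplies `Scheduler:CronExpression` and an environment variable named `Scheduler__CronExpression` is normalized and merged after it, the environment value is the one that remains.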

Settings Sections

Section            Purpose                    Location
ConnectionStrings  PostgreSQL connection      All appsettings + env vars
Sftp               SFTP server configuration  All appsettings + env vars
Packaging          File paths and naming      All appsettings
Scheduler          Quartz CRON schedule       All appsettings
Serilog            Logging configuration      appsettings.json (common)

Database Integration (Dapper)

Why Dapper?

  • Performance: Minimal overhead, close to ADO.NET speed
  • Control: Full control over SQL queries
  • Simplicity: No heavy ORM abstractions
  • PostgreSQL Native: Works seamlessly with Npgsql

Repository Pattern

All database operations are abstracted through IConferenceAbstractRepository:

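public interface IConferenceAbstractRepository
{
    // Articles that have not been dispatched yet (workflow step 2).
    Task<IReadOnlyList<ConferenceAbstractArticle>> GetUnprocessedArticlesAsync(CancellationToken ct);

    // Next sequence number used to name a package (workflow step 3).
    Task<long> GetNextSequenceNumberAsync(CancellationToken ct);

    // Persists dispatch records for the processed articles (workflow step 7).
    Task SaveDispatchRecordsAsync(IEnumerable<DispatchRecord> records, CancellationToken ct);
}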

Implementation uses:

  • Dapper for query mapping
  • Npgsql for PostgreSQL connectivity
  • Transactions for atomicity

Dependency Injection

All services are registered in DependencyInjection.cs:

services.Configure<SftpSettings>(config.GetSection(SftpSettings.SectionName));
services.Configure<PackagingSettings>(config.GetSection(PackagingSettings.SectionName));
services.Configure<SchedulerSettings>(config.GetSection(SchedulerSettings.SectionName));

services.AddSingleton<IConferenceAbstractRepository, ConferenceAbstractRepository>();
services.AddSingleton<IZipService, ZipService>();
services.AddSingleton<ISftpService, SftpService>();
services.AddSingleton<IPackagingService, PackagingService>();

Benefits:

  • Testability (easy to mock)
  • Loose coupling
  • Single Responsibility Principle
  • Inversion of Control
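
Each `Configure<T>` call above binds one configuration section to a settings class. The sketch below shows the shape such classes might take. `SectionName` appears in the registration code, and `CronExpression` and `Password` are inferred from the `Scheduler__CronExpression` and `Sftp__Password` variables used elsewhere in this README; `Host` is a hypothetical property added for illustration, and the real classes may differ.

```csharp
// Bound from the "Scheduler" configuration section.
public sealed class SchedulerSettings
{
    public const string SectionName = "Scheduler";

    // Quartz CRON expression, e.g. "0 0 2 * * ?".
    public string CronExpression { get; set; } = string.Empty;
}

// Bound from the "Sftp" configuration section.
public sealed class SftpSettings
{
    public const string SectionName = "Sftp";

    // Hypothetical property; not confirmed by this README.
    public string Host { get; set; } = string.Empty;

    // Overridable at runtime through the Sftp__Password environment variable.
    public string Password { get; set; } = string.Empty;
}
```

Keeping the section name as a constant on the class means the registration code and the JSON key cannot drift apart silently.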

Build & Deployment

Prerequisites

  1. .NET 8 SDK
  2. Docker (for containerized deployment)
  3. PostgreSQL database with schema created

Local Development

# Restore dependencies
dotnet restore

# Build solution
dotnet build

# Run Worker (Development environment)
cd src/EmbaseConferenceScheduler.Worker
dotnet run --environment Development

Environment-Specific Runs

# Staging
dotnet run --environment Staging

# Production
dotnet run --environment Production

Docker Build & Run

# 1. Create tracking table
psql -d embase -f Database/create_tracking_table.sql

# 2. Configure appsettings files
# Edit src/EmbaseConferenceScheduler.Worker/appsettings.Production.json
# Update database connection, SFTP settings, etc.

# 3. Build Docker image
docker build -t embase-conference-scheduler:latest .

# 4. Run container
docker run -d \
  -e DOTNET_ENVIRONMENT=Production \
  -v /data/production/articles/pdf:/production/articles/pdf:ro \
  -v embase-logs:/logs \
  --name embase-conference-scheduler \
  embase-conference-scheduler:latest

# 5. View logs
docker logs -f embase-conference-scheduler

Scheduler Configuration

CRON Expressions

Default schedules per environment:

Environment  CRON           Description
Development  0 */5 * * * ?  Every 5 minutes (testing)
Staging      0 0 3 * * ?    Daily at 03:00 IST
Production   0 0 2 * * ?    Daily at 02:00 IST
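
Quartz CRON expressions start with a seconds field, so they have six required fields (seconds, minutes, hours, day-of-month, month, day-of-week) plus an optional year. The helper below is a hypothetical sketch that only labels the fields of an expression such as those in the table above; it does not validate values the way Quartz does.

```csharp
using System;
using System.Collections.Generic;

public static class QuartzCron
{
    private static readonly string[] FieldNames =
    {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    // Splits a Quartz CRON expression into named fields (6 required, year optional).
    public static IReadOnlyDictionary<string, string> Describe(string expression)
    {
        var parts = expression.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 6 || parts.Length > 7)
        {
            throw new FormatException($"Expected 6 or 7 fields, got {parts.Length}.");
        }

        var fields = new Dictionary<string, string>();
        for (var i = 0; i < parts.Length; i++)
        {
            fields[FieldNames[i]] = parts[i];
        }
        return fields;
    }
}
```

Reading the Production default this way gives seconds `0`, minutes `0`, hours `2`, which is why it fires once a day at 02:00.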

Override via Environment Variable

docker run -e "Scheduler__CronExpression=0 0 4 * * ?" ...

Business Workflow

┌──────────────────────────────────────────────────────┐
│  1. Scheduler Triggers (Daily CRON)                  │
└───────────────────┬──────────────────────────────────┘
                    │
┌───────────────────▼──────────────────────────────────┐
│  2. Query Unprocessed Articles from PostgreSQL       │
│     (tbldiscardeditemreport JOIN tblEmbaseConference │
│      WHERE lotid NOT IN dispatched)                  │
└───────────────────┬──────────────────────────────────┘
                    │
┌───────────────────▼──────────────────────────────────┐
│  3. Get Next Sequence Number (emconflumXXXXXXX)      │
└───────────────────┬──────────────────────────────────┘
                    │
┌───────────────────▼──────────────────────────────────┐
│  4. Group Articles by SourceId                       │
│     (One ZIP per source)                             │
└───────────────────┬──────────────────────────────────┘
                    │
        ┌───────────┴───────────┐
        │                       │
┌───────▼────────┐     ┌────────▼─────────┐
│  5a. Copy PDFs │     │  5b. Create ZIP  │
│  to temp folder│────▶│  from bundle     │
└────────────────┘     └────────┬─────────┘
                                │
                       ┌────────▼─────────┐
                       │  6. Upload SFTP  │
                       └────────┬─────────┘
                                │
                       ┌────────▼─────────┐
                       │  7. Save Dispatch│
                       │  Records to DB   │
                       └──────────────────┘
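
Steps 3 and 4 of the workflow above can be sketched as follows. This is a sketch under stated assumptions, not the project's actual `PackagingService`: `ArticleRef`, `Bundle`, and `BundlePlanner` are hypothetical types, the seven X's in `emconflumXXXXXXX` are assumed to be a zero-padded sequence number, and each bundle is assumed to take the next consecutive number.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced shape of an article row: only the fields this sketch needs.
public sealed record ArticleRef(string SourceId, string PdfFileName);

// One planned ZIP package and the articles that go into it.
public sealed record Bundle(string PackageName, string SourceId, IReadOnlyList<ArticleRef> Articles);

public static class BundlePlanner
{
    // Assumption: the sequence number is zero-padded to seven digits.
    public static string PackageName(long sequence) => $"emconflum{sequence:D7}";

    // Groups articles by SourceId (one bundle per source), keeping first-seen order.
    // Assumption: bundles are numbered consecutively starting at firstSequence.
    public static IReadOnlyList<Bundle> Plan(IEnumerable<ArticleRef> articles, long firstSequence)
    {
        return articles
            .GroupBy(a => a.SourceId)
            .Select((group, index) => new Bundle(
                PackageName(firstSequence + index),
                group.Key,
                group.ToList()))
            .ToList();
    }
}
```

Planning before touching the file system keeps the grouping logic free of I/O, so it can be unit-tested without PDFs, ZIP files, or an SFTP server.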

Testing Strategy

Unit Tests (Future)

  • EmbaseConferenceScheduler.Domain.Tests
  • EmbaseConferenceScheduler.Application.Tests
  • EmbaseConferenceScheduler.Infrastructure.Tests

Mock external dependencies:

  • IConferenceAbstractRepository → in-memory fake
  • ISftpService → mock SFTP
  • IZipService → mock file system
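
An in-memory fake for the repository might look like the sketch below. The interface matches the one shown in this README; the entity shapes (`LotId`, `SourceId`, `PackageName`) are hypothetical and reduced, since the real classes live in the Domain project, and `InMemoryConferenceAbstractRepository` is an illustrative name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, reduced entity shapes for the purpose of this sketch.
public sealed record ConferenceAbstractArticle(long LotId, string SourceId);
public sealed record DispatchRecord(long LotId, string PackageName);

public interface IConferenceAbstractRepository
{
    Task<IReadOnlyList<ConferenceAbstractArticle>> GetUnprocessedArticlesAsync(CancellationToken ct);
    Task<long> GetNextSequenceNumberAsync(CancellationToken ct);
    Task SaveDispatchRecordsAsync(IEnumerable<DispatchRecord> records, CancellationToken ct);
}

// In-memory fake for unit tests: no database, state held in lists.
public sealed class InMemoryConferenceAbstractRepository : IConferenceAbstractRepository
{
    private readonly List<ConferenceAbstractArticle> _articles;
    private readonly List<DispatchRecord> _dispatched = new();
    private long _sequence;

    public InMemoryConferenceAbstractRepository(IEnumerable<ConferenceAbstractArticle> articles)
        => _articles = articles.ToList();

    // Exposed so tests can assert on what was saved.
    public IReadOnlyList<DispatchRecord> Dispatched => _dispatched;

    public Task<IReadOnlyList<ConferenceAbstractArticle>> GetUnprocessedArticlesAsync(CancellationToken ct)
    {
        // Mirrors the "lotid NOT IN dispatched" filter of the real query.
        var done = _dispatched.Select(d => d.LotId).ToHashSet();
        IReadOnlyList<ConferenceAbstractArticle> pending =
            _articles.Where(a => !done.Contains(a.LotId)).ToList();
        return Task.FromResult(pending);
    }

    // Returns 1, 2, 3, ... on successive calls.
    public Task<long> GetNextSequenceNumberAsync(CancellationToken ct)
        => Task.FromResult(++_sequence);

    public Task SaveDispatchRecordsAsync(IEnumerable<DispatchRecord> records, CancellationToken ct)
    {
        _dispatched.AddRange(records);
        return Task.CompletedTask;
    }
}
```

A hand-written fake like this keeps state between calls, so a test can run the workflow twice and check that already dispatched articles are not picked up again.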

Integration Tests (Future)

  • Test against a real PostgreSQL instance (Testcontainers for .NET, running in Docker)
  • Test SFTP with test server
  • End-to-end workflow validation

Design Patterns Used

Pattern               Location             Purpose
Repository            Infrastructure       Abstract database access
Dependency Injection  Worker (Program.cs)  IoC container
Options Pattern       All layers           Strongly-typed configuration
Factory (Quartz)      Worker               Job instantiation
Strategy              Infrastructure       SFTP auth (password vs key)

Security Best Practices

Secrets Management

  1. Never commit secrets to source control
  2. Configure per environment in appsettings files:
    • appsettings.Development.json - Local development (can commit with dummy values)
    • appsettings.Staging.json - Staging secrets (git-ignored or stored in CI/CD)
    • appsettings.Production.json - Production secrets (git-ignored or stored in CI/CD)
  3. Use Docker secrets for SFTP keys (mounted as files)
  4. Use environment variables to override sensitive settings at runtime
  5. Use Azure Key Vault / AWS Secrets Manager in cloud deployments

Configuration Priority

Settings are loaded in this priority (last wins):

  1. appsettings.json (base/common settings)
  2. appsettings.{Environment}.json (environment-specific)
  3. Environment variables (runtime overrides)
  4. Command-line arguments (highest priority)

Connection Strings

Override via environment variables in Docker:

# In docker-compose.yml or at runtime
environment:
  ConnectionStrings__EmbaseDb: "Host=secure-db;Port=5432;Database=embase;Username=user;Password=secret"
  Sftp__Password: "sftp-secret-password"

Troubleshooting

Common Issues

Issue: Job not running

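Check:

  • Scheduler CRON expression for the active environment (including any Scheduler__CronExpression override)
  • DOTNET_ENVIRONMENT value passed to the container
  • Container logs (docker logs -f embase-conference-scheduler)
  • Whether a previous run is still in progress (DisallowConcurrentExecution prevents overlap)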

Issue: Database connection failure

Check:

  • Connection string format
  • Network connectivity to PostgreSQL
  • Database user permissions
  • Firewall rules

Issue: SFTP upload fails

Check:

  • SFTP server reachability (ping, telnet)
  • Authentication credentials
  • Private key file permissions
  • Remote path exists
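
Where `telnet` is not available inside the container, the reachability check can be approximated from code. The sketch below is a hypothetical helper (`TcpProbe` is not part of the project) that only tests whether a TCP connection can be opened; it does not authenticate or speak the SFTP protocol.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class TcpProbe
{
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    public static async Task<bool> CanConnectAsync(string host, int port, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        using var client = new TcpClient();
        try
        {
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false; // timed out
        }
        catch (SocketException)
        {
            return false; // refused, unreachable, or host name not resolved
        }
    }
}
```

A call such as `await TcpProbe.CanConnectAsync("sftp.example.com", 22, TimeSpan.FromSeconds(5))` (placeholder host name, standard SSH port) separates network problems from credential or path problems.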

Performance Considerations

  1. Dapper provides near-native ADO.NET performance
  2. Batch operations reduce database round-trips
  3. Cancellation tokens allow graceful shutdown
  4. Serilog async file writing reduces I/O blocking
  5. DisallowConcurrentExecution prevents job overlap
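
The graceful-shutdown point can be sketched as a loop that checks the token between units of work. This is a hypothetical illustration (`BundleProcessor` and the `processBundle` delegate are not project types), assuming a bundle that has started should be allowed to finish before the job stops.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BundleProcessor
{
    // Processes bundles one at a time and stops between bundles once
    // cancellation is requested. Returns the number of bundles completed.
    public static async Task<int> ProcessAllAsync(
        IEnumerable<string> bundleNames,
        Func<string, CancellationToken, Task> processBundle,
        CancellationToken ct)
    {
        var completed = 0;
        foreach (var name in bundleNames)
        {
            if (ct.IsCancellationRequested)
            {
                break; // leave remaining bundles for the next scheduled run
            }

            await processBundle(name, ct);
            completed++;
        }
        return completed;
    }
}
```

Bundles that were skipped stay unprocessed in the database, so the next scheduled run would pick them up again.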

Future Enhancements

  • Add Polly for retry policies (transient fault handling)
  • Implement comprehensive unit tests
  • Add health checks (liveness/readiness probes for K8s)
  • Metrics export (Prometheus)
  • Distributed tracing (OpenTelemetry)
  • Background job status dashboard
  • Email notifications on failure

License

Proprietary - Elsevier Embase Team


Support

For issues or questions, contact the Embase Data Engineering team.