From a technology perspective, the solution is a distributed, asynchronous multi-agent computing platform that runs translation quality algorithms and uses RabbitMQ as its service bus. The system exposes a RESTful API to external systems.
Strict requirements were set for system reliability, availability, performance and elasticity.
A critical requirement was the ability to dynamically adjust the number of provisioned cloud servers to the volume of files being processed at any given moment. The delivered solution scales up to 100 distributed nodes and processes 30-100 execution orders per minute (i.e. 15 Mb/s).
Running the algorithms takes anywhere from 2-3 seconds to 3-4 hours per task, depending on the configuration and on the input files, which range from 100 kb to 10 Mb each.
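The elasticity requirement can be illustrated with a minimal sketch of a scaling rule. This is not the delivered implementation: the function name `desired_node_count` and the parameters `orders_per_node` and `min_nodes` are assumptions made for the example. Only the upper bound of 100 nodes comes from the description above.

```python
import math

MAX_NODES = 100  # upper bound stated in the project description


def desired_node_count(pending_orders: int,
                       orders_per_node: int = 1,
                       min_nodes: int = 1,
                       max_nodes: int = MAX_NODES) -> int:
    """Return how many nodes to provision for the current backlog.

    orders_per_node and min_nodes are hypothetical tuning parameters,
    not values taken from the real system.
    """
    if pending_orders < 0:
        raise ValueError("pending_orders must be non-negative")
    if orders_per_node < 1:
        raise ValueError("orders_per_node must be at least 1")
    # Nodes needed to work through the backlog, rounded up.
    needed = math.ceil(pending_orders / orders_per_node)
    # Clamp the result to the allowed pool size.
    return max(min_nodes, min(max_nodes, needed))
```

Because task duration varies from seconds to hours, a real scaling policy would likely also weigh expected run time per order, not just the number of pending orders.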
- Internal load balancing and routing mechanisms
- Fault tolerance of computing nodes
- Logging transparency
- Performance analysis
- Centralized system reconfiguration
- Exclusion of “sick” nodes/orders
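
Two of the capabilities listed above, internal load balancing and exclusion of “sick” nodes, can be illustrated together with a small in-memory sketch. It is a simplified assumption of how such routing might work, not the platform's actual mechanism: the class `NodeRouter`, its methods, and the `failure_threshold` value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeRouter:
    """Routes orders to the least-loaded healthy node (illustrative only)."""
    failure_threshold: int = 3  # hypothetical: consecutive failures before exclusion
    load: dict = field(default_factory=dict)      # node name -> orders in flight
    failures: dict = field(default_factory=dict)  # node name -> consecutive failures

    def add_node(self, node: str) -> None:
        self.load[node] = 0
        self.failures[node] = 0

    def healthy_nodes(self) -> list:
        # A node is "sick" once it reaches the failure threshold.
        return [n for n in self.load if self.failures[n] < self.failure_threshold]

    def pick_node(self) -> str:
        candidates = self.healthy_nodes()
        if not candidates:
            raise RuntimeError("no healthy nodes available")
        # Least-loaded routing; ties go to the node registered first.
        node = min(candidates, key=lambda n: self.load[n])
        self.load[node] += 1
        return node

    def report(self, node: str, ok: bool) -> None:
        # Called when an order finishes; a success resets the failure streak.
        self.load[node] = max(0, self.load[node] - 1)
        self.failures[node] = 0 if ok else self.failures[node] + 1
```

A usage example: register nodes with `add_node`, call `pick_node` for each incoming order, and call `report` when the order completes. A node that fails `failure_threshold` times in a row stops receiving orders until a successful `report` resets its counter, which in this sketch would have to come from an external health check.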