Telecommunication Allowance, Meal Allowance, Transportation Allowance, Housing Allowance, Medical Reimbursement
Description
Location
Japan | Full-time
Department
Operations Department
Job Responsibilities
System Stability and Performance Optimization
Responsible for the deployment, monitoring, troubleshooting, and capacity planning of the company's platform systems (website, app, API, database, middleware, etc.), ensuring efficient and stable operation.
Establishment of Automated Operations System
Build and maintain CI/CD pipelines, automated deployment, container orchestration (Kubernetes), and configuration management and infrastructure as code (Ansible/Terraform) to achieve efficient operational automation.
High Availability Architecture and Disaster Recovery Design
Design and deploy highly available system architecture across availability zones and regions, and establish comprehensive disaster recovery and backup mechanisms to ensure 24/7 service continuity.
Operational Security and Compliance Development
Assist the security team in implementing access control, data encryption, firewall policies, DDoS protection, audit log management, and related measures to build a comprehensive security system.
Monitoring and Emergency Response
Use tools such as Prometheus, Grafana, Zabbix, and ELK to establish monitoring and alerting systems, and respond to and resolve abnormal events quickly.
Collaboration with R&D and Support for Rapid Delivery
Collaborate with the development team to support testing, gradual rollouts, and traffic switching, ensuring system stability and iteration efficiency under agile development.
Cloud Platform Resource Management
Manage resources on cloud platforms such as AWS, Alibaba Cloud, and GCP, covering cost control and elastic architecture management, and enable multi-cloud or hybrid cloud deployment.
Operational Documentation and Process Standardization
Write and maintain operational manuals, emergency plans, log records, and other documents to improve team collaboration efficiency and standardization.
Requirements
Job Requirements
Bachelor's degree or higher in a computer-related field.
Over 3 years of experience in operations and maintenance for large-scale internet or financial systems; a background in trading systems is preferred.
Proficient in Linux operating systems and well-versed in common network protocols (TCP/IP, HTTP, DNS).
Familiar with system components such as Docker, Kubernetes, Nginx, Redis, MySQL, and MongoDB.
Expertise in monitoring and logging tools such as ELK, Prometheus, and Zabbix.
Proficient in scripting and programming languages such as Shell, Python, and Go, with an understanding of GitOps/DevOps principles.
Familiar with at least one cloud platform (AWS, Alibaba Cloud, GCP, Azure).
Good communication skills, a strong sense of responsibility, and the ability to work under pressure and adapt to a 24/7 response mechanism.
Experience in blockchain, exchanges, or fintech is preferred.
Preferred Qualifications
Experience in hybrid cloud/cross-region architecture implementation.
Familiarity with financial-grade high availability and disaster recovery deployment standards.
Experience in operations and maintenance of large-scale microservice architectures or trade-matching systems.