Building an API That Processes Thousands of Payments per Second

This post was uploaded on May 28, 2025 at 01:33:42.

Author: do_hyuk

✅ Goal: learn how to handle traffic with queues and asynchronous processing

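Requirements

Accept payment requests via POST /payments.
Store and process them internally through a queue (e.g. ConcurrentLinkedQueue) or asynchronous tasks.
Keep working even when 100,000 requests are sent at the same time.
Log processing time and results.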
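The requirements allow either a queue or asynchronous tasks. As a framework-free sketch of the queue variant (plain JDK, no Spring), requests are put on a ConcurrentLinkedQueue, a fixed ExecutorService pool drains it, and a CountDownLatch lets the caller wait for all workers and log the elapsed time. PaymentRequest, process() and the worker count are hypothetical illustrations, not the post's code.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class QueuedPaymentProcessor {

    // Hypothetical request type; the post's real DTO is PaymentSaveRequestDto.
    record PaymentRequest(String userName, long amount) {}

    // Enqueues `count` requests, drains them with `workers` threads, logs the elapsed time
    // and returns how many requests were processed.
    public static long process(int count, int workers) {
        ConcurrentLinkedQueue<PaymentRequest> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < count; i++) {
            // Producer side: in the real API this would happen inside POST /payments.
            queue.offer(new PaymentRequest("user" + i, (i % 10_000) + 1));
        }

        AtomicLong processed = new AtomicLong();
        CountDownLatch done = new CountDownLatch(workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        long start = System.nanoTime();

        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    PaymentRequest req;
                    // poll() never blocks; it returns null once the queue is empty.
                    while ((req = queue.poll()) != null) {
                        if (req.amount() > 0) { // stand-in for validating and persisting the payment
                            processed.incrementAndGet();
                        }
                    }
                } finally {
                    done.countDown(); // signal that this worker has finished draining
                }
            });
        }

        try {
            done.await(); // block until every worker has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }

        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.printf("processed=%d elapsed=%dms%n", processed.get(), elapsedMs);
        return processed.get();
    }

    public static void main(String[] args) {
        process(100_000, 8);
    }
}
```

Because the queue is filled before the workers start, an empty poll() here really means all work is done; a long-running server would instead keep its workers polling, or use a BlockingQueue so they can wait for new items.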

🔧 Keywords: @Async, ThreadPoolTaskExecutor, ConcurrentLinkedQueue, ExecutorService, CountDownLatch

✅ Development environment

Java version: Java 17 (LTS)
Spring Boot: 3.x (using Spring Web)
Build tool: Gradle or Maven (Gradle recommended)
IDE: IntelliJ IDEA (Community Edition or higher)

✅ Required libraries (Gradle)

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.springframework.boot:spring-boot-starter'
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
    implementation 'org.springframework.boot:spring-boot-starter-aop'

    compileOnly 'org.projectlombok:lombok'
    annotationProcessor 'org.projectlombok:lombok'

    runtimeOnly 'com.mysql:mysql-connector-j'

    // Test
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
    testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
}

✅ Basic setup

1. Enabling async

To use @Async, asynchronous processing must first be enabled with @EnableAsync:

@SpringBootApplication
@EnableAsync
public class PaymentApplication {
    public static void main(String[] args) {
        SpringApplication.run(PaymentApplication.class, args);
    }
}

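✅ Preparing performance/stress tests

Tool	Pros	Cons	Recommended for
k6	Code-based, easy to manage scenarios	Slight learning curve if you are not used to JavaScript	✅ Practice-oriented learning, writing advanced scenarios
hey	Very simple and fast	Almost no room to extend scenarios	🔹 Simple benchmarks
wrk	Very fast, supports Lua scripts	Advanced features are complex to set up	🔹 Very high-throughput benchmarks
JMeter	GUI, supports complex tests	Heavy and slow, complex configuration	🔹 Complex enterprise environments / legacy system integration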

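If you want to prepare in a way that is close to real-world practice, k6 is the top recommendation.
If you only want to check TPS, hey is a quick option.
k6 also pays off later when automating load tests in CI/CD.

Since I want this to be close to real-world practice, I will use k6.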

✅ Implementation

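@RestController
@RequestMapping("/payment")
@RequiredArgsConstructor // Lombok: generates the constructor that injects the final field
public class PaymentController {

    private final PaymentService paymentService;

    // Returning CompletableFuture<ResponseEntity<Long>> lets Spring MVC process the request
    // asynchronously and write the saved id once the future completes.
    @PostMapping
    public CompletableFuture<ResponseEntity<Long>> save(@RequestBody PaymentSaveRequestDto requestDto) {
        return paymentService.save(requestDto).thenApply(ResponseEntity::ok);
    }
}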

@Configuration
public class AsyncConfig {
    @Bean(name = "paymentExecutor")
    public Executor paymentExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(20);
        executor.setMaxPoolSize(100);
        executor.setQueueCapacity(5000);
        executor.setThreadNamePrefix("payment-async-");
        executor.initialize();
        return executor;
    }
}

@Slf4j
@Service
@RequiredArgsConstructor // Lombok: constructor injection for the final repository field
public class PaymentService {

    private final PaymentRepository paymentRepository;

    // Runs on a "payment-async-*" thread from paymentExecutor, not on the request thread.
    @Async("paymentExecutor")
    @Transactional
    public CompletableFuture<Long> save(PaymentSaveRequestDto requestDto) {
        try {
            Payment saved = paymentRepository.save(requestDto.toEntity());
            log.info("✅ Saved: user={}, history={}", requestDto.userName(), requestDto.history());
            return CompletableFuture.completedFuture(saved.getId());
        } catch (Exception e) {
            log.error("❌ Save failed: {}", e.getMessage(), e);
            CompletableFuture<Long> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed;
        }
    }

}

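I implemented save asynchronously and made it return CompletableFuture<Long>.

CompletableFuture<T>, added in Java 8, is a container for the result of an asynchronous task.

It holds the result (T) of work that will complete in the future,
so the caller can move on to other work instead of waiting for it to finish.

save() returns CompletableFuture<Long> so that the save can run asynchronously.

Combined with @Async, the method runs asynchronously on a separate thread (here, one from paymentExecutor).
In other words, the controller does not have to wait for the save logic to finish.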

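Benefit	Description
⏱️ Better responsiveness	The request thread is not blocked while the save runs, so it can keep serving other requests and server resources are used more efficiently.
🧵 Separate worker threads	@Async("paymentExecutor") runs the save on its own thread pool, reducing the load on the request-handling threads.
🔁 Chained follow-up logic	Post-processing can be chained with .thenApply(), .exceptionally(), and similar methods.
🛠️ Flexible error handling	completeExceptionally() and similar methods give failures an explicit exceptional path.

In short, CompletableFuture is a container for the result of an asynchronous save; with it, the server stays responsive instead of blocking until the save finishes.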
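The chaining and error-handling points in the table can be tried without Spring. The sketch below is a rough, hypothetical analogue rather than what @Async generates: saveAsync hands the work to a dedicated executor with CompletableFuture.supplyAsync (similar in spirit to the hand-off @Async("paymentExecutor") performs), failures are signalled with completeExceptionally() as in the post's service, and describe chains thenApply and exceptionally. The id sequence and method names are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class CompletableFutureDemo {

    // Hypothetical id generator standing in for the database-generated payment id.
    private static final AtomicLong ID_SEQUENCE = new AtomicLong();

    // Hypothetical stand-in for PaymentService.save(): the work runs on `executor`, not the caller's thread.
    public static CompletableFuture<Long> saveAsync(long amount, ExecutorService executor) {
        if (amount <= 0) {
            // Same failure pattern as the post: create a future and complete it exceptionally.
            CompletableFuture<Long> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalArgumentException("amount must be positive"));
            return failed;
        }
        return CompletableFuture.supplyAsync(ID_SEQUENCE::incrementAndGet, executor);
    }

    // Chains follow-up logic: format the id on success, turn any failure into a message.
    public static String describe(long amount, ExecutorService executor) {
        return saveAsync(amount, executor)
                .thenApply(id -> "saved id=" + id)
                .exceptionally(ex -> {
                    // Dependent stages wrap the original exception in a CompletionException.
                    Throwable cause = (ex instanceof CompletionException && ex.getCause() != null)
                            ? ex.getCause() : ex;
                    return "failed: " + cause.getMessage();
                })
                .join(); // blocking only so the demo can return a plain String
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            System.out.println(describe(500, executor)); // saved id=1
            System.out.println(describe(0, executor));   // failed: amount must be positive
        } finally {
            executor.shutdown();
        }
    }
}
```

The unwrap step in exceptionally matters in practice: an exception raised before thenApply reaches the handler wrapped in a CompletionException, so reading getMessage() directly would include the wrapper's class name.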

Troubleshooting

import http from 'k6/http';
import { check } from 'k6';

export const options = {
    vus: 100000, // number of concurrent virtual users
    duration: '1s', // test duration
};

export default function () {
    const url = 'http://host.docker.internal:8080/payment';

    const randomAmount = Math.floor(Math.random() * 10000) + 1; // amount between 1 and 10000
    const user = `user${__VU}_${__ITER}`; // unique user name built from the VU id and iteration number

    const payload = JSON.stringify({
        userName: user,
        history: randomAmount
    });

    const params = {
        headers: {
            'Content-Type': 'application/json',
        },
    };

    const res = http.post(url, payload, params);

    check(res, {
        '✅ status is 200': (r) => r.status === 200,
        '✅ body has id': (r) => r.body && r.body !== 'null',
    });
}

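Using k6, I checked whether the save API still behaves correctly when 100,000 users call it concurrently for 1 second,

and got the following result (note that the progress lines in this excerpt report 100 VUs over 10s):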

k6-1  | running (01.0s), 100/100 VUs, 5032 complete and 0 interrupted iterations
k6-1  | default   [  10% ] 100 VUs  01.0s/10s
...
k6-1  |
k6-1  | running (33.0s), 100/100 VUs, 13853 complete and 0 interrupted iterations
k6-1  | default ↓ [ 100% ] 100 VUs  10s
k6-1  | time="2025-05-27T08:01:17Z" level=warning msg="Request Failed" error="Post \"http://host.docker.internal:8080/payment\": dial: i/o timeout"
...
k6-1  | time="2025-05-27T08:01:17Z" level=warning msg="Request Failed" error="Post \"http://host.docker.internal:8080/payment\": dial: i/o timeout"
k6-1  |
k6-1  | running (34.0s), 068/100 VUs, 13885 complete and 0 interrupted iterations
k6-1  | default ↓ [ 100% ] 100 VUs  10s
...
k6-1  | time="2025-05-27T08:01:18Z" level=warning msg="Request Failed" error="Post \"http://host.docker.internal:8080/payment\": dial: i/o timeout"
k6-1  |
k6-1  |      ✗ ✅ status is 200
k6-1  |       ↳  69% — ✓ 9672 / ✗ 4281
k6-1  |      ✗ ✅ body has id
k6-1  |       ↳  99% — ✓ 13853 / ✗ 100
k6-1  |
...
k6-1  |      http_req_duration..............: avg=25.4ms   min=0s       med=18.08ms  max=182.67ms p(90)=56.24ms p(95)=73.44ms
k6-1  |        { expected_response:true }...: avg=17.77ms  min=590.1µs  med=15.45ms  max=153.25ms p(90)=31.92ms p(95)=39.81ms
k6-1  |      http_req_failed................: 30.68% 4281 out of 13953
...
k6-1  |      iteration_duration.............: avg=244.77ms min=658.72µs med=20.08ms  max=30s      p(90)=68.28ms p(95)=91.71ms
k6-1  |      iterations.....................: 13953  402.297265/s
k6-1  |      vus............................: 68     min=68             max=100
k6-1  |      vus_max........................: 100    min=100            max=100
k6-1  |
k6-1  | running (34.7s), 000/100 VUs, 13953 complete and 0 interrupted iterations
k6-1  | default ✓ [ 100% ] 100 VUs  10s
k6-1 exited with code 0

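The API handled requests normally for a while, but some requests timed out:

✔ status is 200 → 69% succeeded (9,672 of 13,953)
✔ body has id → 99% passed (13,853 of 13,953). This check only verifies that the body is non-empty and not the string 'null', so it also passes for error responses that carry a body.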

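📈 Additional metrics

Metric	Value	Description
http_req_duration	avg=25ms	Average time per request
http_req_failed	30.68%	❗ Share of failed requests
iteration_duration	avg=244ms	Time per iteration (one run of the default function)
vus	68 to 100	Number of active virtual users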
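As a consistency check on this first run, using only numbers from the log excerpt: the two halves of the status check add up to the iteration count, the failure ratio matches http_req_failed, and the reported iteration rate implies the total running time.

```latex
\begin{aligned}
\text{iterations} &= 9672 + 4281 = 13953 \\
\text{http\_req\_failed} &= \frac{4281}{13953} \approx 0.3068 = 30.68\% \\
\text{elapsed} &\approx \frac{13953}{402.297\ \text{iterations/s}} \approx 34.7\ \text{s}
\end{aligned}
```

So the run took about 34.7 s of wall-clock time although the stage was 10 s long, which lines up with the slowest iterations (iteration_duration max=30s) hanging on timed-out connections.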

I suspected that the timeouts were related to the AsyncConfig settings.

executor.setCorePoolSize(20);       // number of core threads
executor.setMaxPoolSize(100);       // maximum number of threads
executor.setQueueCapacity(5000);    // number of tasks the queue can hold

These settings configure the thread pool used by async methods annotated with @Async("paymentExecutor").

That is, when PaymentService.save() runs via @Async, it follows this ThreadPoolTaskExecutor's settings.

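🧠 How these settings relate to the k6 failures

When a flood of requests arrives during the test
and the async method cannot run because threads are exhausted or the queue is full,
either an exception is thrown before a CompletableFuture is even returned (by default a saturated ThreadPoolTaskExecutor rejects new tasks with a TaskRejectedException),
or processing is delayed internally, so responses slow down and the client (k6) may hit timeouts.

e.g.
Many POST /payment requests arrive at once →
more tasks are pending than the maximum pool size (100) plus the queue capacity (5,000) can hold →
new requests cannot be handled right away →
this may show up on the client as a network-level timeout (dial: i/o timeout)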
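How many tasks it takes to saturate this executor depends on the queuing rules of the JDK ThreadPoolExecutor that ThreadPoolTaskExecutor wraps: new tasks first get core threads, then wait in the queue, extra threads up to maxPoolSize are created only once the queue is full, and only after that are tasks rejected (AbortPolicy by default, which Spring surfaces as TaskRejectedException). With core 20 / queue 5,000 / max 100, only 20 threads run until 5,000 tasks are waiting. The sketch below reproduces this with the plain JDK class and scaled-down, purely illustrative numbers (core 2, max 4, queue 3); Outcome and submit are hypothetical helpers.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSaturationDemo {

    // Snapshot taken right after submitting: accepted/rejected counts, live threads, queued tasks.
    record Outcome(int accepted, int rejected, int poolSize, int queued) {}

    // Submits `tasks` tasks that block until released, mimicking slow saves piling up.
    static Outcome submit(int core, int max, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueCapacity),   // bounded queue, like setQueueCapacity(...)
                new ThreadPoolExecutor.AbortPolicy());      // reject with RejectedExecutionException
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy so later tasks must queue
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        Outcome outcome = new Outcome(accepted, rejected, pool.getPoolSize(), pool.getQueue().size());
        release.countDown(); // let the blocked tasks finish
        pool.shutdown();
        return outcome;
    }

    public static void main(String[] args) {
        for (int tasks : new int[] {2, 5, 7, 10}) {
            System.out.println(tasks + " tasks -> " + submit(2, 4, 3, tasks));
        }
    }
}
```

With these numbers, 5 tasks still use only the 2 core threads (3 wait in the queue), 7 tasks grow the pool to 4 threads, and from the 8th task on submissions are rejected immediately: the executor's capacity is maxPoolSize + queueCapacity.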

💡 First improvement

(Assuming the server has enough resources) I increased the pool sizes and the queue capacity.

executor.setCorePoolSize(50);
executor.setMaxPoolSize(200);
executor.setQueueCapacity(10000);

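After increasing the pool sizes and queue capacity and rerunning the test, it terminated with the error "k6 exited with code 137".

Exit code 137 means the process was killed with SIGKILL (128 + 9), which in a container usually means it ran out of memory (OOM) or was forcibly stopped.

In other words, the load was so heavy that k6 or its Docker container was killed.

Thinking it over, firing 100,000 VUs within 1 second is closer to a DDoS attack than to a load test,

and more than the server or the local network could handle. I needed to tune the VU count and duration appropriately.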

💡 Second improvement

(A realistic performance test setup) I switched to ramping VUs up gradually.

export const options = {
    stages: [
        { duration: '10s', target: 50 },  // ramp VUs from 0 to 50
        { duration: '20s', target: 50 },  // hold at 50
        { duration: '10s', target: 0 },   // ramp down to 0 (end)
    ],
};
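k6 changes the number of VUs linearly between stage targets. To make the three stages above concrete, the hypothetical helper below (not part of k6, just an illustration in Java) computes the target VU count at a given second under that linear-ramp assumption; it ignores scheduling details such as graceful ramp-down.

```java
import java.util.List;

public class RampProfile {

    // One k6-style stage: move linearly to `target` VUs over `seconds`.
    record Stage(int seconds, int target) {}

    // Returns the linearly interpolated VU target `t` seconds into the test, starting from 0 VUs.
    static double targetAt(List<Stage> stages, double t) {
        double start = 0;   // VU count at the beginning of the current stage
        double elapsed = 0; // seconds consumed by previous stages
        for (Stage s : stages) {
            if (s.seconds() > 0 && t <= elapsed + s.seconds()) {
                double progress = (t - elapsed) / s.seconds();
                return start + (s.target() - start) * progress;
            }
            elapsed += s.seconds();
            start = s.target();
        }
        return start; // after the last stage the target stays at its final value
    }

    public static void main(String[] args) {
        List<Stage> stages = List.of(new Stage(10, 50), new Stage(20, 50), new Stage(10, 0));
        for (int t = 0; t <= 40; t += 5) {
            System.out.printf("t=%2ds -> %.0f VUs%n", t, targetAt(stages, t));
        }
    }
}
```

For these stages it gives 25 VUs at 5s, 50 VUs from 10s through 30s, 25 VUs at 35s and 0 at 40s, so the server sees a gradual ramp instead of a single burst.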

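📈 Key metrics

Item	Value
Total requests	27,621
Successful requests	27,588
Failed requests	about 50 (all i/o timeout)
Request blocked time (http_req_blocked)	avg = 5.5ms, max = 7.24s
Data sent / received	about 5.3MB / 5.8MB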
전송/수신량	약 5.3MB / 5.8MB

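💡 blocking max=7.24s → some requests waited more than 7 seconds before they could even be sent (k6's http_req_blocked measures time spent waiting for a free TCP connection slot), which points to a serious bottleneck.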

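Suspected cause	Possible fix
Server thread or connection limits	Tune Tomcat/Netty settings such as max threads and max connections (for embedded Tomcat in Spring Boot: server.tomcat.threads.max, server.tomcat.max-connections, server.tomcat.accept-count)
Client-side request handling limits	Check k6 keep-alive and connection reuse settings; limit and reuse connections in line with the VU count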

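🎯 Project goal recap

"The core goal of this test was to observe how the server reacts to traffic that grows over a short period, identify bottlenecks from that, and work out how to address them."

Although the tests ran locally rather than in a real deployment,
they surfaced performance degradation, timeouts, and growing request wait times,
and since these symptoms can easily reappear under real traffic, they gave me meaningful insight.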

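🔍 Key insights

When requests pile up, server or network bottlenecks surface immediately.
Starting the server is not the end of the story: concurrent request capacity, connection management, I/O waits, and so on all need to be considered.
A single error message such as i/o timeout can originate anywhere: in the server, the client, or the network.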