Changelog
Release Version: v0.14.2-pb7-mpsc10 | Release Date: Feb 10, 2025
What's updated
- Fixed distinct_values repeatedly updating the schema
- Fixed a cache miss when getting the schema
- Fixed a CircuitBreaker recovery issue
Search without ingester
ZO_FEATURE_QUERY_SKIP_WAL=true
Slow request detail
- Added more detail to the debug logs, including the original JSON
- Rolled back actix-web from 4.9 to 4.8
- Slow request logs are now printed only for requests that take longer than 5s
ZO_ACTIX_SLOW_LOG_THRESHOLD=5
This prints logs like the following. Bulk request detail:
total: {} ms, prepare: {} ms, flatten: {} ms, convert_to_uds: {} ms, json_parse: {} ms, format_stream_name: {} ms, get_uds_and_original: {} ms, handle_timestamp: {} ms, before_write: {} ms, write_to_channel: {} ms
If before_write also takes longer than 5s, the original line is printed as well:
original_line: {}
Write to channel detail:
[write_logs] total time: {} ms, get_schema: {} ms, check_schema: {} ms, validate: {} ms, get_distinct: {} ms, write_distinct: {} ms, get_writer: {} ms
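The per-stage breakdown above can be illustrated with a small timing helper. This is only a sketch of the idea, not the actual implementation: the StageTimer class, its method names, and the injectable clock are hypothetical, and the only details taken from the notes above are the "name: N ms" line format and the rule that a line is emitted only when the total exceeds the threshold (ZO_ACTIX_SLOW_LOG_THRESHOLD, in seconds).

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: records per-stage durations for one request."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the sketch can be tested
        self._start = clock()
        self.stages = {}             # stage name -> elapsed milliseconds

    @contextmanager
    def stage(self, name):
        begin = self._clock()
        try:
            yield
        finally:
            elapsed_ms = int((self._clock() - begin) * 1000)
            self.stages[name] = self.stages.get(name, 0) + elapsed_ms

    def report(self, threshold_secs=5):
        """Return a log line only if the request took longer than the threshold."""
        total_ms = int((self._clock() - self._start) * 1000)
        if total_ms <= threshold_secs * 1000:
            return None              # fast request: nothing is printed
        parts = [f"total: {total_ms} ms"]
        parts += [f"{name}: {ms} ms" for name, ms in self.stages.items()]
        return ", ".join(parts)
```

A caller would wrap each step (for example json_parse or write_to_channel) in `with timer.stage("json_parse"):` and log the result of `timer.report()` when it is not None.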
Circuit Breaker
ZO_ACTIX_SLOW_LOG_THRESHOLD=5 // seconds
ZO_CIRCUIT_BREAKER_ENABLED=true
ZO_CIRCUIT_BREAKER_WATCHING_WINDOW=60 // seconds
ZO_CIRCUIT_BREAKER_SLOW_REQUEST_THRESHOLD=100 // slow requests
ZO_CIRCUIT_BREAKER_RESET_WINDOW_NUM=3 // reset state window num
We also added a new CircuitBreaker that tells the load balancer (LB) when this node is overloaded and should stop receiving new traffic. We found that some nodes always have more connections than others, and the LB does not seem to balance its scheduling policy on its own, so the node now reports its state explicitly.
This feature analyzes HTTP requests. If there are more than 100 slow requests within 60 seconds, the node state is set to unSchedulable and the /schedulez API returns an error, which should make the LB stop sending new traffic to the node.
The state then resets after the next 3 watching windows (3 * 60s), and the /schedulez API returns ok again if the number of slow requests no longer exceeds the limit.
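The trip and reset behavior described above can be sketched as a small state machine. This is a minimal illustration, not the actual implementation. The class and method names are hypothetical. It also assumes fixed (non-sliding) watching windows, and assumes the node recovers once 3 consecutive windows close without exceeding the slow request limit. The default values mirror the environment variables listed above.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreakerSketch:
    """Hypothetical model of the breaker; times are in seconds."""
    slow_log_threshold: float = 5.0     # ZO_ACTIX_SLOW_LOG_THRESHOLD
    watching_window: float = 60.0       # ZO_CIRCUIT_BREAKER_WATCHING_WINDOW
    slow_request_threshold: int = 100   # ZO_CIRCUIT_BREAKER_SLOW_REQUEST_THRESHOLD
    reset_window_num: int = 3           # ZO_CIRCUIT_BREAKER_RESET_WINDOW_NUM
    schedulable: bool = True
    _window_start: float = 0.0
    _slow_count: int = 0
    _quiet_windows: int = 0

    def _roll(self, now: float) -> None:
        # Close every watching window that has fully elapsed by `now`.
        while now - self._window_start >= self.watching_window:
            if self._slow_count > self.slow_request_threshold:
                self._quiet_windows = 0          # window was over the limit
            elif not self.schedulable:
                self._quiet_windows += 1         # one more window under the limit
                if self._quiet_windows >= self.reset_window_num:
                    self.schedulable = True      # assumed recovery rule
                    self._quiet_windows = 0
            self._slow_count = 0
            self._window_start += self.watching_window

    def record(self, duration: float, now: float) -> None:
        """Record one finished HTTP request and trip the breaker if needed."""
        self._roll(now)
        if duration > self.slow_log_threshold:
            self._slow_count += 1
            if self._slow_count > self.slow_request_threshold:
                self.schedulable = False
                self._quiet_windows = 0

    def is_schedulable(self, now: float) -> bool:
        """What a /schedulez handler would consult: True means ok, False means error."""
        self._roll(now)
        return self.schedulable
```

With the defaults, the 101st slow request inside one window marks the node unSchedulable, and it becomes schedulable again only after three further full windows pass without exceeding the limit.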