
This is what I previously used for Apache 2.4. It's very out of date compared to what I use now, since the company I'm with isn't on Apache, but the gist is there. I don't have the nginx one readily in front of me.

I snipped a few fields out of this that were specific to my setup, so there's a small chance the formatting is off. The field variables for nginx are a hell of a lot less obtuse than the Apache ones.

LogFormat "{ \"application_name\":\"%v\", \"application_canonical-port\":\"%p\", \"application_client-ip\":\"%a\", \"application_local-ip\":\"%A\", \"application_local-port\":\"%{local}p\", \"application_pid\":\"%P\", \"fastly-client-ip\":\"%{fastly-client-ip}i\", \"request_x-forwarded-for\":\"%{X-Forwarded-For}i\", \"request_x-tracer\":\"%{X-TRACER}i\", \"request_geo-ip\":\"%{GEOIP_ADDR}e\", \"request_geo-continent\":\"%{GEOIP_CONTINENT_CODE}e\", \"request_geo-country\":\"%{GEOIP_COUNTRY_CODE}e\", \"request_host\":\"%{Host}i\", \"request_auth-user\":\"%u\", \"request_content-type\":\"%{Content-Type}i\", \"request_timestamp\":\"%t\", \"request_uri\":\"%r\", \"request_referer\":\"%{Referer}i\", \"request_user-agent\":\"%{User-Agent}i\", \"response_code\":\"%>s\", \"response_bytes\":\"%b\", \"response_seconds\":\"%T\", \"response_microseconds\":\"%D\", \"response_content-type\":\"%{Content-Type}o\" }" extendedcombined

It'll come out looking like this (thrown through jsonlint.com, with my notes on each field after the #):

{

"application_name": "%v", # Server name from the vhost config (not necessarily the hostname if you have aliases, and depending on how you handle bare IP requests and whether you use named vhosts)

"application_canonical-port": "%p", # 80 or 443 depending on whether you're on TLS or not; I never figured out the point of this and it's actively confusing

"application_client-ip": "%a", # The client IP calling Apache, not necessarily the user's IP if you have load balancers/reverse proxies in the mix

"application_local-ip": "%A", # Host ip the application is running on

"application_local-port": "%{local}p", # Actual port the application is running on

"application_pid": "%P", # Apache pid that handled the request

"fastly-client-ip": "%{fastly-client-ip}i", # This is a header that the Fastly service adds to tell you the actual client IP. We don't use them anymore, but they're good (and expensive). They actively defend this field: I spent an hour or so poking at it to see if I could inject false data, without success (of course, if you don't protect your origins, someone could falsify the data there instead).

"request_x-forwarded-for": "%{X-Forwarded-For}i", # X-Forwarded-For, occasionally useful if I suspect the client was monkeying with request data. There's a similar header in AWS land (X-Forwarded-Proto) for the protocol the load balancer saw

"request_x-tracer": "%{X-TRACER}i", # I used to use this pattern for when we wanted to set an arbitrary header we could trace a request with, usually by QA

"request_geo-ip": "%{GEOIP_ADDR}e", # Relevant to mod_geoip, I don't use this anymore

"request_geo-continent": "%{GEOIP_CONTINENT_CODE}e", # Relevant to mod_geoip, I don't use this anymore

"request_geo-country": "%{GEOIP_COUNTRY_CODE}e", # Relevant to mod_geoip, I don't use this anymore

"request_host": "%{Host}i", # Hostname the client sent

"request_auth-user": "%u", # Relevant if you're using basic auth, usually not relevant

"request_content-type": "%{Content-Type}i", # Content-Type header the client sent with the request, comes up if we suspect request monkeying

"request_timestamp": "%t", # Request timestamp

"request_uri": "%r", # First line of the request (method, path including the query string, and protocol), so the GET fields are logged

"request_referer": "%{Referer}i", # Request referer header if there is one

"request_user-agent": "%{User-Agent}i", # Request user agent

"response_code": "%>s", # Response we got

"response_bytes": "%b", # Response size

"response_seconds": "%T", # Response time (not necessarily how long it took the client to get it) in seconds

"response_microseconds": "%D", # Same thing in microseconds

"response_content-type": "%{Content-Type}o" # Content-Type header returned in the response, occasionally relevant if we suspect monkeying

}
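
If you ever need to consume these lines yourself instead of through a log platform, here's a minimal Python sketch of what that could look like. The sample line is made up and trimmed to a few fields, and `parse_access_line` is just a name I picked for illustration. The thing to watch is that the LogFormat above quotes everything, so every value arrives as a string, and `%b` writes "-" rather than 0 when no body bytes were sent.

```python
import json

# Hypothetical sample line, trimmed to a few fields; all values are made up.
SAMPLE = (
    '{ "application_name":"www.example.com", '
    '"request_timestamp":"[10/Oct/2023:13:55:36 +0000]", '
    '"request_uri":"GET /index.html?x=1 HTTP/1.1", '
    '"response_code":"200", "response_bytes":"-", '
    '"response_microseconds":"1532" }'
)


def parse_access_line(line):
    """Parse one JSON access log line and convert the numeric fields."""
    record = json.loads(line)
    # The LogFormat wraps every field in quotes, so numbers come in as strings.
    record["response_code"] = int(record["response_code"])
    # %b logs "-" instead of 0 when no body bytes were sent.
    raw_bytes = record["response_bytes"]
    record["response_bytes"] = 0 if raw_bytes == "-" else int(raw_bytes)
    record["response_microseconds"] = int(record["response_microseconds"])
    return record


record = parse_access_line(SAMPLE)
print(record["response_code"], record["response_bytes"], record["response_microseconds"])
```

This assumes every line is valid JSON. Apache escapes some special characters in request and header values in its own way, so a real consumer should expect the occasional line that fails to parse and decide what to do with it.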

When this makes its way into Splunk or ELK, they will automatically parse out the fields. Sumologic will do it if you pass a query through "| json auto nodrop".

Sumologic will handle substructures (if you're passing an object where one of the fields is itself a hash object; not relevant for Apache/nginx). I don't know about the others. Sumologic will also intelligently handle non-JSON data around the payload (like the timestamps and tagging rsyslog adds).
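
To illustrate the rsyslog point: if you're handling a line like that yourself, the rough idea is to skip whatever comes before the first "{" and parse the rest. This is only a sketch under that assumption, not how Sumologic or any other tool does it internally; the sample line and the `extract_json` name are hypothetical.

```python
import json


def extract_json(line):
    """Return the JSON object embedded in a log line, or None if there isn't one."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        # Parse everything from the first "{" to the end of the line.
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


# Hypothetical rsyslog-style line: timestamp, host and tag, then the JSON payload.
LINE = 'Oct 10 13:55:36 web01 apache: { "request_host":"www.example.com", "response_code":"200" }'
parsed = extract_json(LINE)
print(parsed)
```

It will fall over if the prefix itself ever contains a "{", so treat it as a starting point rather than something robust.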