Proxy Log Generator to Load Test NiFi

When it comes to streaming, writing your own data generator is often the easiest way forward. I’ve written a simple Python script that produces realistic-looking web proxy logs. Among other features (configurable volume, frequency, DoS attack simulation, etc.), it can stream the traffic to a remote HTTP endpoint.

This post shows how to plug the generated output into NiFi, which creates a nice setup for clickstream analytics, web server monitoring and other useful streamed data processing scenarios.

The proxy log generator is a simple Python script. It accepts a range of arguments for testing different scenarios; the use case I want to focus on today, however, is streaming inbound traffic to NiFi.

Source code and full documentation are available on GitHub. To generate logs and stream them to the assumed endpoint via HTTP, I start the generator as follows:

python src/log_generator.py \
--stream 100 \
--url http://localhost:8081/contentListener
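Under the hood, the streaming mode amounts to POSTing each serialized record to the given URL. Here is a minimal sketch of that step using only the standard library (the function names are mine, not the script’s; the endpoint is whatever --url points at):

```python
import json
import urllib.request

def build_request(record, url):
    """Wrap one log record in a JSON POST request for the HTTP endpoint."""
    data = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

def stream_record(record, url):
    """Send a single record; returns the HTTP status code."""
    with urllib.request.urlopen(build_request(record, url)) as resp:
        return resp.status
```

Because the body is one JSON object per request, the receiving side can treat each POST as a self-contained event.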

Immediately, the generated logs show up in the console, wrapped as JSON objects. Mind you, all the data is made up, including links, user names and response codes.

{"auth": "kourtney.buckley", "url": "http://www.alysonhunt.com/51/865/711.png", "timestamp": "2016-12-04 20:43:21", "user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0", "ip": "112.170.1.143", "res_status": "200", "res_size": 1850015}
{"auth": "-", "url": "http://www.sergio-ewing.com/283/123.xml", "timestamp": "2016-12-04 20:43:21", "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36", "ip": "148.143.230.40", "res_status": "302", "res_size": 2973508}

However, after three unsuccessful attempts to connect to the remote service, the generator gives up:

The remote service is inaccessible, terminating ..
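That give-up behavior is easy to approximate: a bounded retry loop around the POST. A sketch (the attempt count and message mirror what the script prints; the injected send callable would typically wrap urllib.request.urlopen against the --url endpoint):

```python
import urllib.error

def post_with_retries(send, attempts=3):
    """Call send() up to `attempts` times, then give up like the generator."""
    for _ in range(attempts):
        try:
            return send()
        except (urllib.error.URLError, OSError):
            continue  # endpoint not reachable yet, try again
    raise SystemExit("The remote service is inaccessible, terminating ..")
```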

Time to start NiFi. Once it’s up and running, I go ahead and drop a ListenHTTP processor onto the canvas.

ListenHTTP as a Web Proxy Endpoint
Listener configuration: URL, port and a base path
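Before restarting the generator, it can be handy to confirm the listener actually answers. A quick standard-library check (the URL and base path match the configuration above; the helper name is my own):

```python
import urllib.error
import urllib.request

def listener_ready(url="http://localhost:8081/contentListener", timeout=2):
    """Return True if the ListenHTTP endpoint accepts a JSON POST."""
    req = urllib.request.Request(
        url, data=b"{}", headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False
```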

As soon as the processor is running, I restart the generator and let it stream logs to the endpoint. Once again, here is the command:

python src/log_generator.py \
--stream 100 \
--url http://localhost:8081/contentListener

This time, everything runs just fine and the proxy logs are captured in NiFi.

Proxy logs captured by the ListenHTTP processor

That’s it for now; I hope you enjoyed this brief demo of my approach to setting up streaming in NiFi. In the next post, I will focus on data transformation and data flow performance tuning. Thanks for reading, and don’t forget to check out the source code.
