Proxy Log Generator to Load Test NiFi

When it comes to streaming, writing your own data generator is often the easiest way forward. I’ve written a simple Python script that produces realistic-looking web proxy logs. Among other features (configurable volume, frequency, DoS attack simulation, etc.), it can stream the traffic to a remote HTTP endpoint.

This post shows how to plug the generated output into NiFi, which creates a nice setup for clickstream analytics, web server monitoring and other useful streamed data processing scenarios.

The proxy log generator is a simple Python script. It accepts a range of arguments for testing different scenarios; the use case I want to focus on today, however, is streaming inbound traffic to NiFi.

Source code and full documentation are available on GitHub. To generate logs and stream them to the assumed endpoint via HTTP, I start the generator as follows:

python src/log_generator.py \
--stream 100 \
--url http://localhost:8081/contentListener
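Under the hood, the streaming mode amounts to POSTing each serialized record to the given URL. Here is a minimal sketch of that step using only the standard library (the function names are mine, not the script’s; the endpoint is whatever --url points at):

```python
import json
import urllib.request

def build_request(record, url):
    """Wrap one log record in a JSON POST request for the HTTP endpoint."""
    data = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

def stream_record(record, url):
    """Send a single record; returns the HTTP status code."""
    with urllib.request.urlopen(build_request(record, url)) as resp:
        return resp.status
```

Because the body is one JSON object per request, the receiving side can treat each POST as a self-contained event.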

Immediately, the generated logs show up in the console, wrapped as JSON objects. Mind you, all the data is made up, including links, user names and response codes.

{"auth": "kourtney.buckley", "url": "http://www.alysonhunt.com/51/865/711.png", "timestamp": "2016-12-04 20:43:21", "user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0", "ip": "112.170.1.143", "res_status": "200", "res_size": 1850015}
{"auth": "-", "url": "http://www.sergio-ewing.com/283/123.xml", "timestamp": "2016-12-04 20:43:21", "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36", "ip": "148.143.230.40", "res_status": "302", "res_size": 2973508}

However, after three unsuccessful attempts to connect to the remote service, the generator gives up:

The remote service is inaccessible, terminating ..
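That give-up behavior is easy to approximate: a bounded retry loop around the POST. A sketch (the attempt count and message mirror what the script prints; the injected send callable would typically wrap urllib.request.urlopen against the --url endpoint):

```python
import urllib.error

def post_with_retries(send, attempts=3):
    """Call send() up to `attempts` times, then give up like the generator."""
    for _ in range(attempts):
        try:
            return send()
        except (urllib.error.URLError, OSError):
            continue  # endpoint not reachable yet, try again
    raise SystemExit("The remote service is inaccessible, terminating ..")
```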

Time to start NiFi. Once it’s up and running, I go ahead and drop a ListenHTTP processor onto the canvas.

ListenHTTP as a Web Proxy Endpoint
Listener configuration: URL, port and a base path
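Before restarting the generator, it can be handy to confirm the listener actually answers. A quick standard-library check (the URL and base path match the configuration above; the helper name is my own):

```python
import urllib.error
import urllib.request

def listener_ready(url="http://localhost:8081/contentListener", timeout=2):
    """Return True if the ListenHTTP endpoint accepts a JSON POST."""
    req = urllib.request.Request(
        url, data=b"{}", headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False
```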

As soon as the processor is running, I restart the generator and let it stream logs to the endpoint. Once again, here is the command:

python src/log_generator.py \
--stream 100 \
--url http://localhost:8081/contentListener

This time, everything runs just fine and the proxy logs are captured in NiFi.

Proxy logs captured by the ListenHTTP processor

That’s it for now; I hope you enjoyed this brief demo of my approach to setting up streaming in NiFi. In the next post, I will focus on data transformation and data flow performance tuning. Thanks for reading, and don’t forget to check out the source code.
