Splunking NetFlow with Splunk Stream - Part 1: Getting NetFlow data into Splunk | Splunk (2024)

Why Splunking NetFlow?

Well, NetFlow provides a lens into the why of network use: what applications are being used, what is driving the observed load on the network, and what are users ultimately trying to accomplish with their network access? By combining NetFlow data with other, lower-level network and infrastructure data, you can build a rich picture of network use, end-to-end and cross-layer, peering into the applications and services that depend on the network infrastructure! This can help solve a myriad of use cases, from capacity management and peering strategy to security and performance optimization.

And NetFlow is (mostly) standard and it’s ubiquitous! Essentially every IP network device supports NetFlow. As a rich, passively generated, standard data source available out of the box from your gear, it’s just waiting to be added to the mix, providing some additional insights and goodness!

So, now that we understand the importance of NetFlow data, let’s see how Splunk can help us. Ready to take this journey? Let’s go!

Splunk Stream

First of all, let me introduce you to our sherpa in this Himalayan ascent, my good friend Splunk Stream.

The Splunk Stream app lets you capture, filter, index, and analyze streams of network event data. The software acts as a network traffic sniffing tool. Through a simple web GUI, you can select which protocol metadata to index, aggregate event data for statistical analysis, collect NetFlow data, capture full packet streams, monitor network trends and app performance, and much more!

So, we will set up our base camp by deploying the Splunk Stream app in our Splunk environment.

Establishing the Everest Base Camp: Environment Setup

A solid base camp is crucial for the success of our ascent. There are different architectures available for deploying Splunk Stream, both on-premises and on Splunk Cloud. We'll assume that we have a functioning Splunk on-premises distributed environment where we need to deploy the Splunk Stream components.

In the following figure you can see the general architecture for Splunk Stream in a distributed Splunk deployment:

[Figure: General architecture for Splunk Stream in a distributed Splunk deployment]

The figure above shows a typical multilayer Splunk deployment, with a search head layer, an indexing layer, and a forwarding layer, each extended with the Splunk Stream components required to make Splunk Stream work.

Note that for the forwarding layer we can use Splunk universal forwarders, Splunk heavy forwarders, or the Independent Stream Forwarder (ISF). The ISF is a standalone Stream forwarder: it sends captured network data to Splunk using the HTTP Event Collector and does not require a Splunk universal forwarder to collect wire data, which makes it helpful in networks and deployments where a universal forwarder cannot be installed.

NetFlow is generated by network equipment such as routers and switches, which send NetFlow logs directly to the forwarding layer via UDP; universal forwarders are unlikely to be installed on this type of equipment. Moreover, Splunk's best practices for scaling flow ingestion recommend Independent Stream Forwarders over universal forwarders. Therefore, we will use Independent Stream Forwarders as our forwarding layer for NetFlow traffic. The ISF will forward NetFlow traffic to the indexers via the HTTP Event Collector (HEC) configured at the indexing layer.

Taking into account all the facts explained above, the specific architecture for Splunking NetFlow in a distributed Splunk environment should be similar to this:

[Figure: Architecture for Splunking NetFlow in a distributed Splunk deployment]

Now that we know specifically which architecture we need to set up, we will pick the specific steps to follow from the general guide "Install Splunk Stream in a distributed deployment".

Step 1: Install Splunk App for Stream on search heads

The sub-steps to be taken:

  1. Go to http://splunkbase.splunk.com/app/1809.
  2. Click Download. The installation package downloads to your local host.
  3. On the search head, untar the installation file into $SPLUNK_HOME/etc/apps/.
  4. Restart Splunk Enterprise, if prompted. This installs the Splunk App for Stream (Splunk_app_stream) in $SPLUNK_HOME/etc/apps.

Step 2: Install the Splunk Add-on for Stream Wire Data on search heads and indexers

The sub-steps to be taken:

  1. Go to http://splunkbase.com/app/5234.
  2. Click Download. The installation package downloads to your local host.
  3. Log into Splunk Web.
  4. Click Manage Apps > Install app from file.
  5. Upload the installer file.
  6. Restart Splunk Enterprise if prompted.

Step 3: Set up Splunk Stream

Now we will configure Splunk Stream to receive data from remote machines via the HTTP Event Collector (HEC). To do that, log in to the search head and launch the Stream app. You will be prompted to configure Stream for the first time:

[Screenshot: Splunk Stream first-time setup prompt]

We will check Collect data from other machines.

  • If you see "HTTP Event Collector streamfwd token configuration has been enabled," then the HTTP Event Collector endpoint is configured to receive data.
  • If you see "HTTP Event Collector streamfwd token configuration has been disabled," click View Configuration. This opens the HTTP Event Collector page. Click Enable for the streamfwd input to enable the HTTP Event Collector for streamfwd data input.

Then we will click on Let’s get started.

Step 4: Install your ISFs and configure them to push data to HEC

Now we will install a standalone Stream forwarder and configure it to send captured network data to the Indexing layer using the HTTP event collector. Splunk provides the binary code required to install the ISF on compatible Linux machines.

To perform the installation and configuration to push data to HEC you simply need to follow the following guide: “Install an Independent Stream Forwarder”
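After installation, the ISF is pointed at HEC through its streamfwd.conf. A minimal sketch follows; the token and indexer URI are placeholders, and the attribute names are taken from the Stream documentation, so verify them against your version:

```
[streamfwd]
httpEventCollectorToken = <your-streamfwd-HEC-token>
indexer.0.uri = https://your-indexer.example.com:8088
```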

Cool! We have successfully settled our base camp. Now we have our environment ready to start configuring NetFlow collection!

Reaching Everest camp 1: NetFlow stream setup

Now we need to climb to the first Everest camp: camp 1. For that, we will use the comprehensive Splunk documentation on Using Splunk Stream to ingest NetFlow and IPFIX data, extracting the specific steps we need to take to get there fast and safely.

Step 1: Set up a new NetFlow stream in the Stream app

In this step, we will configure in the Stream app a new NetFlow stream that will be collected via HEC from the ISF and indexed in netflow_index.

The sub-steps to be taken:

1) Log in to the search head where the Splunk App for Stream is installed.

2) Navigate to the Splunk App for Stream, then click Configuration > Configure Streams.

[Screenshot: Configuration > Configure Streams]

3) Click New Stream > Metadata.

[Screenshot: New Stream > Metadata]

4) Enter Name as netflow_test.

5) Select NetFlow as the protocol. Note that the NetFlow option works for the NetFlow, sFlow, jFlow, and IPFIX protocols.

6) Enter a description, then click Next.

[Screenshot: New NetFlow stream protocol and description]

7) Select No in the Aggregation box then click Next.

8) (Optional) Deselect any fields that do not apply to your use case then click Next.

9) (Optional) Develop filters to reduce noise from high traffic devices then click Next.

10) Select the index for this collection, for example netflow_index (which you should have previously created), click Enable, then click Next.

[Screenshot: Index selection for the new stream]

11) Select the Default group and click Create Stream.

Optionally, you could set a Forwarder Group at Configuration > Distributed Forwarder Management to help you manage your NetFlow-dedicated ISF.

Also optionally you could set aggregation options. Aggregated streams group events into aggregation buckets, with one bucket allocated for each unique collection of Key fields. At the end of the time interval, the app emits an object that represents each bucket.

For example, if you apply the mean and values aggregation functions to the bytes_in field over a 60 second interval and select src_ip as the only Key field, Stream aggregates the mean and values of bytes_in into a separate bucket for each src_ip seen in the selected interval.

Splunk Stream lets you apply aggregation to network data at capture-time on the collection endpoint before data is sent to indexers. You can use aggregation to enhance your data with a variety of statistics that provide additional insight into activities on your network. When you apply aggregation to a Stream, only the aggregated data is sent to indexers. Using aggregation can help you decrease both storage requirements and license usage.
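This bucket-and-emit behavior can be sketched in Python. This is a toy illustration of the semantics only; the real aggregation happens inside streamfwd at capture time, and the flow tuples below are made-up data:

```python
from collections import defaultdict
from statistics import mean

# Toy flow records: (timestamp_seconds, src_ip, bytes_in) -- illustrative only
flows = [
    (3,  "10.0.0.1", 100),
    (10, "10.0.0.2", 50),
    (45, "10.0.0.1", 300),
    (70, "10.0.0.1", 80),   # falls into the second 60-second interval
]

INTERVAL = 60  # aggregation interval in seconds

# One bucket per (interval, key-field) combination, mirroring Stream's
# Key fields: here src_ip is the only key.
buckets = defaultdict(list)
for ts, src_ip, bytes_in in flows:
    buckets[(ts // INTERVAL, src_ip)].append(bytes_in)

# At the end of each interval, one object is emitted per bucket carrying
# the selected aggregation functions (here: mean and values of bytes_in).
for (interval, src_ip), values in sorted(buckets.items()):
    print(interval, src_ip, mean(values), values)
```

Only these per-bucket summaries would travel to the indexers, which is why aggregation reduces both storage and license usage.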

Step 2: Set up NetFlow traffic ingestion at the Independent Stream Forwarder

In this step, we will configure the Independent Stream Forwarder. To ingest flow data, configure streamfwd to receive data at a specific IP address and port, and specify the flow protocol. To do this, add a set of flow configuration parameters to streamfwd.conf as follows:

1) Edit local/streamfwd.conf (on an ISF this typically lives under the streamfwd installation directory, e.g. /opt/streamfwd).

2) Add parameters that specify the IP address to bind to, the port number to bind to, and the flow protocol.

For example, to receive NetFlow data on port 9995 and sFlow data on port 6343 at IP address 172.23.245.122, configure streamfwd.conf accordingly.
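A sketch of that configuration (the netflowReceiver.* attribute names follow the Stream documentation; verify them against your streamfwd version):

```
[streamfwd]
netflowReceiver.0.ip = 172.23.245.122
netflowReceiver.0.port = 9995
netflowReceiver.0.decoder = netflow
netflowReceiver.1.ip = 172.23.245.122
netflowReceiver.1.port = 6343
netflowReceiver.1.decoder = sflow
```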

3) For high NetFlow volume, configure additional NetFlow processing threads.
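For instance (the decodingThreads attribute name is per the Stream documentation and the value is only a starting point; verify both for your version):

```
netflowReceiver.0.decodingThreads = 8
```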

4) Save your changes.

5) Restart your Splunk platform deployment.

6) On the Independent Stream Forwarder host, edit /etc/sysctl.conf to adjust kernel settings, increasing buffer sizes for high-volume packet capture.
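For example (the values below are illustrative assumptions, not recommendations; size them to your traffic volume and available memory):

```
net.core.rmem_max = 33554432
net.core.rmem_default = 33554432
net.core.netdev_max_backlog = 10000
```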

7) Reload the settings by running sysctl -p.

8) Restart the streamfwd service (for example, sudo systemctl restart streamfwd, or your init system's equivalent).

To see more configuration options (e.g., clustered indexers), have a look at "Use Splunk Stream to ingest Netflow and IPFIX data".

Step 3: Set up your network devices to send NetFlow to your ISF

Finally, configure your network devices to send NetFlow to your ISF's receiver IP address on the NetFlow receiver port.
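As an illustration only, classic NetFlow export on a Cisco IOS device might look like the following; the interface name is a placeholder, and command syntax varies by platform and OS version, so check your vendor documentation:

```
ip flow-export version 9
ip flow-export destination 172.23.245.122 9995
interface GigabitEthernet0/1
 ip flow ingress
```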

If you do not have any network device ready yet, or you just want to test your setup before going to production, you can use a NetFlow simulator, which is easy to set up and works perfectly well for testing purposes.
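If you prefer a do-it-yourself test, the sketch below hand-crafts a minimal NetFlow v5 datagram (one 24-byte header plus one 48-byte flow record, per the published v5 layout) that you could send to your ISF over UDP. All addresses, ports, and counter values are made up for illustration:

```python
import socket
import struct
import time

def netflow_v5_datagram(src_ip, dst_ip, n_bytes):
    """Build a minimal NetFlow v5 datagram with one flow record.

    Field layout follows the published v5 format; the counter values
    are toy data for testing purposes only.
    """
    header = struct.pack(
        "!HHIIIIBBH",
        5,                  # version
        1,                  # count: one flow record follows
        60000,              # sys_uptime (ms)
        int(time.time()),   # unix_secs
        0,                  # unix_nsecs
        0,                  # flow_sequence
        0, 0,               # engine_type, engine_id
        0,                  # sampling_interval
    )
    record = struct.pack(
        "!4s4s4sHHIIIIHHBBBBHHBBH",
        socket.inet_aton(src_ip),       # srcaddr
        socket.inet_aton(dst_ip),       # dstaddr
        socket.inet_aton("0.0.0.0"),    # nexthop
        0, 0,                           # input/output ifindex
        10,                             # dPkts
        n_bytes,                        # dOctets
        1000, 2000,                     # first/last (sysuptime ms)
        12345, 443,                     # srcport, dstport
        0,                              # pad1
        0x10,                           # tcp_flags (ACK)
        6,                              # protocol: TCP
        0,                              # tos
        0, 0,                           # src_as, dst_as
        24, 24,                         # src_mask, dst_mask
        0,                              # pad2
    )
    return header + record

if __name__ == "__main__":
    payload = netflow_v5_datagram("10.0.0.1", "10.0.0.2", 1500)
    print(len(payload))  # 72 bytes: one header plus one record
    # To actually send it to your ISF (address and port are placeholders):
    # socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(
    #     payload, ("172.23.245.122", 9995))
```

Sending a few of these and then searching the index confirms the whole pipeline without touching production gear.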

Step 4: Check NetFlow traffic is indexed in Splunk

Before going straight to the Splunk Web UI and running searches, check the Independent Stream Forwarder's web interface (http://ISF_IP:8089) and verify that traffic is going out (Bytes Out increases over time):

[Screenshot: ISF web interface showing Bytes Out increasing]

If traffic is going out, open the Search & Reporting app and have a look at what is being indexed in sources by clicking Data Summary and filtering by NetFlow:

[Screenshot: Data Summary filtered by NetFlow]

Finally, check that NetFlow data is being indexed in the netflow_index we previously created:

[Screenshot: NetFlow events indexed in netflow_index]
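From the search bar, a quick sanity check might look like this SPL sketch; it assumes the netflow_index from Step 1, stream:netflow as the sourcetype the Stream add-on assigns to NetFlow events, and src_ip/dest_ip from the Stream NetFlow field vocabulary:

```
index=netflow_index sourcetype=stream:netflow
| stats count by src_ip, dest_ip
```

If this returns rows, flows are arriving end-to-end from your exporters through the ISF and HEC into the index.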

Remember that Splunk offers a reduced-cost license that allows you to ingest the NetFlow sourcetype at a lower per-GB cost than your normal license. If you want to read more about it and learn about other sourcetypes with a reduced-cost license available, have a look at Splunk Licensed Capacity.

Credit to Raúl Marín for creating much of this content and providing a detailed step-by-step guide on deploying a single instance with an Independent Stream Forwarder: “NetFlow traffic ingestion with Splunk Stream and an Independent Stream Forwarder”

Credit to Matt Olson for his guidance and support in the publication of this blog series and for his introduction on why anybody should care about NetFlow.

What’s Next?

Awesome, we have reached Everest camp 1.

Now, imagine you want some help to quickly get value from your NetFlow data in Splunk: not only playing with real-time data, but also with long-term data (i.e., months), getting trends, or even applying advanced analytics on top. Does that sound good? Do you want to reach Everest camp 2? Then do not miss the next chapter of this series!

Meanwhile...happy Splunking!

----------------------------------------------------
Thanks!
Lucas Alados
