Visualizing Code

In my reverse engineering course, we would occasionally watch videos, one of which was a TED Talk by Chris Domas titled “The 1s and 0s behind Cyber Warfare.”  I strongly recommend you watch the video, because Chris Domas is an interesting guy and a great speaker, but if you absolutely can’t spare the time, here is the TL;DR version: the future of reverse engineering and cybersecurity rests upon converting binary code into colorful pictures that can be easily identified by humans.

Chris Domas describes a situation where he was looking at a huge amount of binary code and trying to make sense of what it does.  He explains that it is difficult to find patterns because the human brain is just not wired to work in binary.  Ultimately, he realizes that he has spent several hours sifting through the binary of…a picture of a cat.

How tragic!  How frustrating!  But also, hopefully, hyperbole.  I sincerely hope that Chris was not actually poring over raw binary (which he displays as a block of 1s and 0s on the screen).  There are many tools at the disposal of a reverse engineer, and staring at binary is generally not (dare I say, never) the best option.  For example, a simple Unix “file” command might have been sufficient to determine the type of the file, provided the recovered binary was intact enough to be in one contiguous block (possible, albeit rare).  If Chris had some idea that this binary was, in fact, code, he might pull the file into IDA Pro, which disassembles it into assembly instructions and even offers some powerful pattern recognition tools of its own.  … Read the rest
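As an aside on how that kind of identification actually works: tools like “file” mostly read the first few “magic” bytes of a file and compare them against a table of known signatures. The following Java sketch is my own illustration of that idea, not anything from the talk or from the original post:

import java.io.FileInputStream;
import java.io.IOException;

public class MagicSniffer {
    public static void main(String[] args) throws IOException {
        // Read only the first four bytes of the file given on the command line.
        byte[] header = new byte[4];
        try (FileInputStream in = new FileInputStream(args[0])) {
            if (in.read(header) < 4) {
                System.out.println("file too short to identify");
                return;
            }
        }
        // Compare against a few well-known signatures.
        if ((header[0] & 0xFF) == 0x89 && header[1] == 'P' && header[2] == 'N' && header[3] == 'G') {
            System.out.println("PNG image");
        } else if ((header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xD8 && (header[2] & 0xFF) == 0xFF) {
            System.out.println("JPEG image");
        } else if (header[0] == 0x7F && header[1] == 'E' && header[2] == 'L' && header[3] == 'F') {
            System.out.println("ELF executable");
        } else {
            System.out.println("unknown (this toy only checks a few signatures)");
        }
    }
}

The real “file” utility knows thousands of signatures and falls back to heuristics when nothing matches, but the core idea is that simple, which is why staring at raw 1s and 0s is rarely the right first move.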

Analyzing Sony and Staples Breaches with Sentiment Analysis

While I am admittedly a newcomer to the field of sentiment analysis, I was interested in seeing how the breaches at Sony and Staples affected the public perception of the companies.  The language processing engine I used was Semantria, created by Lexalytics, a company that specializes in language processing APIs.  So, I fed Semantria some Twitter posts associated with different companies to see how they compared.  I wanted to see if the breach had any impact on the results for Sony and Staples relative to their competitors.

Let’s start with the end result and then work back through how we got there:

[Chart: sentiment scores for each company]
On the x-axis, we have the names of some companies which I selected as “competitors” for Sony and Staples, as well as Christmas, because, well, why not.

So indeed, the biggest losers in sentiment score were Staples and Sony.  Now I’d like to add a grain or two of salt to these results for a few reasons.  My selection of competitors was somewhat arbitrary, based on my limited knowledge of the industries in which Sony and Staples compete.  Additionally, what we really want is to see the change in Sony and Staples sentiment scores over time.  Of course, that is tough to do, considering I had no way of knowing ahead of time which companies were going to be hacked, so there is no pre-breach baseline to compare against.

I would also like to note that “OfficeMax” has a higher sentiment score than “Christmas.”  This may seem surprising at first, but there is a very logical reason for it.  … Read the rest

Exploring Big Data Through the Twitterverse

On episode 13 of the podcast, we explored how to start working with the Twitter Streaming API to feed your development sandbox with some interesting data about our society – tweets!

To get you started, I’ve provided a start-up kit to accelerate your development or help you learn how to rebuild the kit yourself. Just download the ZIP and import it into Eclipse or your Java IDE of choice to get it running.

Twitterverse Streaming API Starter Kit

Once you’ve opened up the starter kit, navigate to the source file to see where main executes. You will need to enter your Twitter authentication credentials for the program to start successfully. If you haven’t already done so, head over to Twitter to register a developer API app so that you can authenticate with their API resources.

Using the Code

At the start of the driver, we create a new TweetMiner object. This is just a class structure and naming convention I used to encapsulate the authentication references in the twitter4j package that I use to work with Twitter. Nothing of note here, but you might consider choosing a better class name than I did and expanding it as you see fit.
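The starter kit ships with the real class, but here is a rough sketch of what such a wrapper might look like, assuming the standard twitter4j ConfigurationBuilder, a four-argument constructor, and a getTwitterStreamInstance helper (none of which are necessarily identical to the kit’s code):

import twitter4j.Twitter;
import twitter4j.TwitterFactory;
import twitter4j.TwitterStream;
import twitter4j.TwitterStreamFactory;
import twitter4j.conf.Configuration;
import twitter4j.conf.ConfigurationBuilder;

// Hypothetical reconstruction of the kit's wrapper: it just holds the four
// OAuth credentials and hands back authenticated twitter4j clients.
public class TweetMiner {
    private final Configuration config;

    public TweetMiner(String consumerKey, String consumerSecret,
                      String accessToken, String accessTokenSecret) {
        config = new ConfigurationBuilder()
                .setOAuthConsumerKey(consumerKey)
                .setOAuthConsumerSecret(consumerSecret)
                .setOAuthAccessToken(accessToken)
                .setOAuthAccessTokenSecret(accessTokenSecret)
                .build();
    }

    // REST client, used by the driver below.
    public Twitter getTwitterInstance() {
        return new TwitterFactory(config).getInstance();
    }

    // Streaming client, used later when we attach a StatusListener.
    public TwitterStream getTwitterStreamInstance() {
        return new TwitterStreamFactory(config).getInstance();
    }
}

In the driver itself, the instantiation then looks like this: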

TweetMiner tm = new TweetMiner("YOUR_CONSUMER_KEY",
        "YOUR_CONSUMER_SECRET",
        "YOUR_ACCESS_TOKEN",
        "YOUR_ACCESS_TOKEN_SECRET");


From here, we retrieve an authenticated Twitter object from our TweetMiner instance.

Twitter twitter = tm.getTwitterInstance();

Next, we register a StatusListener with the Twitter Streaming API. It is the onStatus method that we override with our own custom code to process incoming tweets however we wish.… Read the rest
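The excerpt cuts off here, but for orientation, here is a minimal sketch of what wiring up that listener with twitter4j typically looks like. The driver class name, the getTwitterStreamInstance helper, and the body of onStatus are my own placeholders rather than the starter kit’s actual code:

import twitter4j.StallWarning;
import twitter4j.Status;
import twitter4j.StatusDeletionNotice;
import twitter4j.StatusListener;
import twitter4j.TwitterStream;

public class StreamingDriver {
    public static void main(String[] args) {
        TweetMiner tm = new TweetMiner("YOUR_CONSUMER_KEY", "YOUR_CONSUMER_SECRET",
                "YOUR_ACCESS_TOKEN", "YOUR_ACCESS_TOKEN_SECRET");
        TwitterStream stream = tm.getTwitterStreamInstance();

        stream.addListener(new StatusListener() {
            @Override
            public void onStatus(Status status) {
                // Custom tweet processing goes here; printing is just a stand-in.
                System.out.println("@" + status.getUser().getScreenName() + ": " + status.getText());
            }
            @Override public void onDeletionNotice(StatusDeletionNotice notice) { }
            @Override public void onTrackLimitationNotice(int numberOfLimitedStatuses) { }
            @Override public void onScrubGeo(long userId, long upToStatusId) { }
            @Override public void onStallWarning(StallWarning warning) { }
            @Override public void onException(Exception ex) { ex.printStackTrace(); }
        });

        stream.sample();   // begin receiving the public sample stream
    }
}

Calling sample() subscribes to the public sample stream; stream.filter(new FilterQuery().track("keyword")) is the usual alternative when you only want tweets matching specific terms.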

Getting started with distributed HBase and ZooKeeper

If you have made it through setting up Hadoop, you are fortunate enough to have created the basis for many other useful applications, two of which are ZooKeeper and HBase.  This post will outline how to set those two applications up and give a brief explanation of why you would want them in the first place.

In the beginning, there was HDFS…

Hadoop provides an awesome resource in terms of distributed file systems.  The convenience of using commodity hardware to redundantly store and easily access data is a huge advantage when working with large datasets.  But one of the Big Problems of Big Data is that it often lacks structure.  For example, let’s say we have 100 gigabytes of temperature data collected over the last 10 years at 30-second intervals.  What if I want to look at the range of values on Tuesdays from 8:00 A.M. to 10:00 P.M. starting in 2010 and ending in 2012?  How easy this is depends heavily on how the data is organized.  If it is stored in a convenient database with logically named fields, then that data could be a single simple command away.  If the data is stored in CSV files, things get more complicated.  You would probably need to write a Pig script or even a custom Hadoop job to retrieve the data you are interested in, which can be a time-consuming and challenging task.

HBase is a NoSQL database that will hopefully let you access data the way the first case above describes. … Read the rest
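To make the temperature example a little more concrete, here is a rough sketch (my own, not from the post) of what a time-range read could look like with the HBase Java client, assuming a table named “temperature” whose row key is a zero-padded reading timestamp and whose readings live in a “d:temp” column:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TemperatureRangeScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("temperature"))) {

            // Row keys are assumed to be epoch-millisecond timestamps, so a
            // start/stop row pair selects a contiguous time window.
            Scan scan = new Scan();
            scan.setStartRow(Bytes.toBytes("1325376000000"));   // example window start
            scan.setStopRow(Bytes.toBytes("1356998400000"));    // example window end

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    byte[] value = row.getValue(Bytes.toBytes("d"), Bytes.toBytes("temp"));
                    System.out.println(Bytes.toString(row.getRow()) + " -> " + Bytes.toString(value));
                }
            }
        }
    }
}

The start/stop rows do the heavy lifting of narrowing the scan to a time window on the server side; picking out only Tuesdays between 8:00 A.M. and 10:00 P.M. would still need an additional filter, but that is a far cry from writing a full MapReduce job over raw CSV files.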

Installing Hadoop 2.5.1 On CentOS 7

This post is a continuation of our previous article on creating a self-contained, multi-node big data sandbox. In this part, we’ll provide some cliff notes for getting Hadoop 2.5.1 installed on CentOS 7 for the development environment previously documented.

1. Set Up Passwordless SSH for the hadoop Service User

On all nodes, run:

# create the hadoop service account and generate an RSA key pair
useradd hadoop
passwd hadoop
su - hadoop
ssh-keygen -t rsa
# authorize the new public key locally, then push it to the data nodes
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@data2
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@data3

varying the source and destination servers accordingly for each node.

2. Install the Oracle JDK

I’ve seen strange things happen with Hadoop using the native OpenJDK platform, so I always advise folks to use the Oracle JDK instead.

tar -zxvf jdk-8u11-linux-x64.tar.gz -C /opt    # extracts to /opt/jdk1.8.0_11
/usr/sbin/alternatives --install /usr/bin/java java /opt/jdk1.8.0_11/bin/java 2
alternatives --config java
java -version
export JAVA_HOME=/opt/jdk1.8.0_11/
export PATH=$JAVA_HOME/bin:$PATH

3. Add Host Entries

nano /etc/hosts

[add an entry mapping each node’s IP address to its hostname, for example:]

# example addresses; substitute your own
192.168.1.1    router
192.168.1.10   master
192.168.1.11   data2
192.168.1.12   data3

4. Download Hadoop 2.5.1 and Compile From Source

Prerequisites: Protocol Buffers (protobuf) 2.5.0 (this exact version) and Maven. You can compile protobuf 2.5.0 from source as shown below.

yum install cmake
yum install kernel-headers gcc-c++

mkdir /downloads
cd /downloads                    # place the protobuf-2.5.0.tar.gz source tarball here
tar -xf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make
make install

sudo yum install ant
sudo yum install maven

# build Hadoop from source (requires protoc 2.5.0 on the PATH)
tar -xf hadoop-2.5.1-src.tar.gz
cd hadoop-2.5.1-src
mvn package -Pdist,docs,src -Dtar -DskipTests
cd hadoop-dist/target

At this point, you will have a compiled, installable build. Move the contents of the target folder to a directory like /home/hadoop/hdfs (a location where the hadoop service account has permission to execute these files). Verify that the transferred folder is owned by hadoop; if not, run chown -R hadoop:hadoop /home/hadoop. Once moved, use scp to transfer the install directory to the same location on the data nodes, i.e.… Read the rest

Creating a Self-Contained Big Data Development Sandbox

Setting up a development and testing platform to explore newer big data technologies such as Apache Spark and Hadoop 2.5.X in a cost-effective manner is not necessarily a trivial task – there are a lot of different ways to do it and each has its own advantages and disadvantages. While some may argue it’s just as easy to spin up a few virtual machines on AWS or use AWS Elastic MapReduce, I wanted a self-contained platform that runs a multi-node sandbox on one condensed piece of hardware for relatively little money. In this article, I’ll walk you through my hardware choices for the cluster, the virtualization technology used to create the sandbox, and how to install Hadoop 2.5.X to get started on the platform.

1. The Hardware

One of my design requirements was having the entire sandbox self-contained on a single system that had strong performance characteristics and could support a multi-node distributed environment (virtually) for relatively little out-of-pocket money. From time to time, as data centers clean out older racks, they put rack servers up on eBay for more or less dirt cheap. In my case, I picked up a Dell CS24-SC with dual-socket quad-core Xeons, 16GB of RAM, four 10K SAS drives, and 2x 1Gbps Ethernet NICs for a $200 price tag – not bad. This 1U server was all I needed to run a decent hypervisor and get a sandbox online that I could use to evaluate the newer software in the big data ecosystem.… Read the rest

Setting up Spark and Hortonworks Sandbox

In this post we cover how to get up and running with the Hortonworks Sandbox, which is a simple way to get access to a bunch of cool open source projects currently hosted by Apache, including HDFS.  Then, using the Hortonworks Sandbox VM for HDFS, we show how to use Spark to calculate the value of pi.  In fact, this two-part tutorial is separate enough that the first step can be skipped if you are only interested in using Spark (although, obviously, if you want to use HDFS with Spark you will need it to be hosted elsewhere).
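As a preview of that pi calculation, here is a minimal sketch of the usual Monte Carlo approach with Spark’s Java API. It mirrors the SparkPi example that ships with Spark, but this particular code is my own illustration rather than the exact steps used later in the post:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class JavaSparkPi {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("JavaSparkPi");
        JavaSparkContext sc = new JavaSparkContext(conf);

        int slices = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        int n = 100000 * slices;                     // number of random samples
        List<Integer> samples = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            samples.add(i);
        }

        // Count how many random points in the unit square fall inside the unit circle.
        long inside = sc.parallelize(samples, slices).filter(i -> {
            double x = Math.random() * 2 - 1;
            double y = Math.random() * 2 - 1;
            return x * x + y * y <= 1;
        }).count();

        System.out.println("Pi is roughly " + 4.0 * inside / n);
        sc.stop();
    }
}

The estimate works because the fraction of random points that land inside the circle approaches pi/4, so multiplying the observed fraction by 4 recovers pi.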

Hortonworks Sandbox

Perhaps the easiest way to get started with the newest technologies from Hortonworks is the Hortonworks Sandbox.  It can be run in Hyper-V (Windows), VMware Fusion (Mac), VMware Player 5.0+ (Windows), or VirtualBox (free for Windows and Mac).

This basic tutorial will get you running with all of the great options available from the Hortonworks Sandbox, which include Hadoop 2.4, Apache Hive, Apache HBase, Apache Pig, Apache Storm, and many more (a full list is available on the Hortonworks Sandbox Overview Page).

In addition to the pre-installed packages listed above, this tutorial will also give simple instructions on how to install Spark on top of HDFS.  Let’s get started!

Installing Hortonworks Sandbox

The documentation for installing the sandbox is excellent, and can be found along with the downloads at the Hortonworks Downloads Page.  Follow along by reading the instructions for your operating system (the instructions assume Mac or Windows). … Read the rest

Cyber Frontier Labs Launched

Cyber Frontier Labs has been created in partnership with the Cyber Frontiers podcast on The Average Guy Network – where we talk cybersecurity, big data, and the technologies shaping the future from an academic perspective.

With the launch of this lab, we will be focusing on showcasing projects that utilize the new technologies at the center of these key areas. From learning about the Internet of Things to understanding how to effectively use machine learning libraries – the projects discussed in our labs will bring together many of the frontier topics that are advancing the computer science industry and research community.

We will announce our first major project shortly! If you are looking for content in the meantime, consider checking out the Cyber Frontiers podcast.… Read the rest