Channel: Data Mining – CCRi
Browsing latest articles

Median Age as Predictor Variable

The TIGER Census files on the U.S. Census Bureau site contain a ton of information. Unfortunately, that information is not easily mapped to geolocations. I had to get the tract-level shapefiles and then transform...

Python Static Dictionaries in Nearest Neighbor Queries

A standard query on geospatial data is the nearest neighbor query, e.g., select the five police stations closest to a given point. The brute-force approach to this problem is joining the two tables...

Latent Semantic Analysis in Solr using Clojure

I recently pushed a very alpha Solr plugin to GitHub that does unsupervised clustering on unstructured text documents.  The plugin is written in Clojure and utilizes the Incanter and associated...

Destructuring in Mathematica

A technique that I have found particularly useful in Lisp-like languages such as Mathematica and Clojure is destructuring. Destructuring is a mechanism for extracting parts of an expression. The Lisp “code as...

Stochastic Gradient Descent

Most machine learning algorithms and statistical inference techniques operate on the entire dataset.  Think of ordinary least squares regression or estimating generalized linear models.  The...

Data Science Meetup

CCRi was delighted to host the second meeting of the Cville Data Science group earlier this month. A full house packed our conference room, and a good time was had by all. The lineup for the talks...

Going beyond tabulating comentions

In this and my next post, I’ll be showing a few quick analyses we performed using a new tool we developed, called Elias. In today’s post, we’ll see how topic modeling can be used to characterize how...

Which Armstrong?

In my last post, I described how we used Elias, an exploratory analysis tool for large-scale information extractions, to look at which (person,location) pairs are mentioned the most together, and then...

Calculating Feature Importance in Data Streams with Concept Drift using...

I had the privilege of presenting my work on “Calculating Feature Importance in Data Streams with Concept Drift using Online Random Forest” at IEEE Big 2014 in Washington, DC this past week. The...

Cloud Computing With Spark: Using All Your Executors

Sometimes data scientists find themselves with a map-reduce cloud architecture and a computation that needs to be done on a large scale, even though the data isn’t actually cloud scale. One great way to get the...
