






Finding Value in Chaos
The Online Community (e.g., blogs, message boards, opinion sites and other online public forums) often appears chaotic and unstructured. Umbria turns it into valuable consumer insights. Using proprietary natural language processing and machine learning techniques, Umbria is able to derive meaning from the hundreds of thousands of conversations taking place on the Internet every day, telling you what's being said about brands, products, people and topics of interest.
Data Collection
Multiple times per day we mine hundreds of thousands of public conversations taking place in more than 60 million blogs (as of October 2006), on message boards and in other online public forums. We can even collect data from email suggestion boxes. This data is captured, cleansed (filtering out Splogs* and other data, such as false positives, that can skew analysis results) and placed in a repository for analysis.
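To make the cleansing step concrete, here is a minimal Python sketch. It is not Umbria's proprietary method: the `Post` type, the link-ratio heuristic, the `CONTEXT_WORDS` vocabulary and the sample posts are all hypothetical, chosen only to illustrate dropping spam blogs and out-of-context matches before data reaches the repository.

```python
import re
from dataclasses import dataclass

@dataclass
class Post:
    url: str
    text: str

# Hypothetical vocabulary signalling that a brand term is used in a relevant context.
CONTEXT_WORDS = {"mac", "ipod", "laptop", "software", "computer"}
LINK_PATTERN = re.compile(r"https?://\S+")

def tokens(text):
    """Lowercase word tokens of a text."""
    return re.findall(r"[a-z0-9']+", text.lower())

def looks_like_splog(post, max_link_ratio=0.2):
    """Toy heuristic (an assumption): flag posts where links dominate the content."""
    links = LINK_PATTERN.findall(post.text)
    words = tokens(LINK_PATTERN.sub(" ", post.text))
    total = len(links) + len(words)
    return total == 0 or len(links) / total > max_link_ratio

def mentions_in_context(post, query):
    """True if the query term appears alongside at least one context word."""
    found = set(tokens(post.text))
    return query in found and bool(found & CONTEXT_WORDS)

def cleanse(posts, query):
    """Keep only posts that are relevant to the query and do not look like spam."""
    return [p for p in posts if mentions_in_context(p, query) and not looks_like_splog(p)]

posts = [
    Post("blog-a", "I just bought a new Apple laptop and the software is great."),
    Post("blog-b", "My grandmother baked an apple pie yesterday."),
    Post("blog-c", "apple ipod cheap http://a.example http://b.example http://c.example"),
]
kept = cleanse(posts, "apple")
print([p.url for p in kept])
```

In this sketch, the second post is dropped as a false positive (the term appears with no relevant context) and the third as a likely splog (half of its content is links). A production system would presumably rely on trained classifiers rather than fixed word lists and thresholds.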
Analysis
Umbria uses proprietary natural language processing and machine learning techniques to analyze key attributes of both the speech and the speaker across all of the online data sources.
By analyzing attributes of speech, we are able not only to discern the topics of most interest or concern to the Online Community, but also to dig deeper and illuminate the most important sub-topics, along with the favorability or level of satisfaction associated with each level. For example, the majority of consumers speaking online about a new brand of jeans may be speaking in the context of the colors, materials, texture and/or fit of those jeans. Further, relative to the topic of jeans and the sub-topic of color, consumers may be conversing about lower-level attributes such as fading or shades of the color blue, and whether they like or dislike these attributes. Analyzing attributes of speech enables marketers to understand the context of what is most important to their consumers.
By analyzing attributes of the speaker, we are able to segment the Online Community by age and gender as well as other demographics. We analyze many variables, including words, emoticons, semantic expressions and other speaker attributes, in order to ascertain the demographic characteristics of a significant portion of the blogosphere. This information is then used to provide even greater insight into who's talking and in what context they're talking about brands, products, people and topics of interest. For example, despite Kobe Bryant's negative publicity and legal troubles, the Online Community showed that he was the second most positively perceived basketball player among Generation Y males, even though all other market segments demonstrated disdain for him based on his criminal allegations. Maybe this is why Nike continues its sponsorship of Kobe.
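The topic, sub-topic and attribute levels described above can be pictured as a simple roll-up. The Python sketch below is illustrative only: the mentions are invented, hand-labeled examples, and the like/dislike label is assumed to come from some upstream sentiment step that is not shown. It counts mentions and computes the share of favorable mentions at each level of the hierarchy.

```python
from collections import defaultdict

# Hypothetical, hand-labeled mentions: (topic, sub_topic, attribute, liked)
mentions = [
    ("jeans", "color", "fading", True),
    ("jeans", "color", "fading", False),
    ("jeans", "color", "shade of blue", True),
    ("jeans", "fit", "waist", False),
    ("jeans", "fit", "waist", False),
    ("jeans", "material", "stretch", True),
]

def favorability_by_level(mentions):
    """Return {path: (mention_count, share_liked)} for every level of the hierarchy."""
    counts = defaultdict(lambda: [0, 0])  # path -> [total mentions, liked mentions]
    for topic, sub_topic, attribute, liked in mentions:
        path = (topic, sub_topic, attribute)
        # Credit the mention to the topic, the sub-topic and the attribute.
        for depth in range(1, len(path) + 1):
            entry = counts[path[:depth]]
            entry[0] += 1
            entry[1] += int(liked)
    return {path: (total, liked / total) for path, (total, liked) in counts.items()}

report = favorability_by_level(mentions)
for path, (total, share) in sorted(report.items()):
    print(" > ".join(path), total, f"{share:.0%}")
```

Keying each level by its full path (for example `("jeans", "color")`) keeps sub-topics with the same name under different topics separate, so volume and favorability can be read at whichever depth a marketer cares about.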
Reporting
Once data is captured, cleansed and analyzed, deep-dive reports are produced, enabling clients to review the latest findings relative to their products, marketing and competition. Reports include demographic analysis as well as verbatim quotes that link to the actual online data sources, giving clients greater insight into context.
* - "Splogs" are fake or spam blogs established to "fool" search engines and drive traffic to sites offering everything from consumer electronics to pornography.
REMOVING "SPLOGS" & FALSE POSITIVES RESULTS IN HIGHER DATA INTEGRITY AND RELEVANCY

False positives and blog spam (also known as "Splogs") can significantly skew the accuracy of the information collected. Umbria filters out both.
Shown below are the results from a blog search done on "Apple Computer."
A false positive occurs when the term you're searching for appears in an irrelevant context, and looks like this:

A spam blog, or "splog," is a fake blog created to "spoof" or fool search engines in order to boost search engine rankings, and might look like this:
