Monday 24 March 2014

Lesson 6: How categorization affects the quality of user-generated information in different social media platforms

The rise of social media platforms has been an important factor in the massive increase in information available on the web. The growth has been so fast that, according to Google’s then-CEO Eric Schmidt, in 2010 humanity produced in just two days as much information as had been produced from the beginning of civilization up until 2003. [1] In times of such information overflow, the platforms that can give end users the most relevant, high-quality information often end up enjoying massive success. One of the key factors in collecting and serving information of high quality and relevance is efficient categorization of the content. This article looks at some popular online platforms that are heavily rooted in user-generated content and analyzes how they use categorization to serve their users with high-quality information.

What qualifies as good information
When discussing quality information, it is useful to first define what we mean by the term. In other words, how can we judge whether the quality of a piece of content is good or bad? It’s worth mentioning that several systems, such as the dimensions of information quality by Wang and Strong [2], have been developed to classify information quality as objectively as possible. For the purposes of this article, we will break quality down into two main areas: factual correctness and relevance to the user. Both criteria need to be fulfilled for the information to have maximum value for the consumer. A piece of information can be factually correct, but if it’s not relevant to the current needs of the end user, it can’t be considered valuable at that moment in time. As an example, let’s say a user is searching for information on whether a certain type of expense is tax deductible. If the search produces a factually correct description of how to bake a cake, the information is clearly not relevant and should be considered poor quality in the context of the situation. Likewise, information that fits the user’s current needs, and is relevant in that sense, doesn’t help if it is factually incorrect. To continue with the same example, the user might find a perfectly relevant statement about the tax deductibility of the expense in question, but if it was written by a person with an incorrect or incomplete understanding of the issue, and is therefore factually wrong, we should consider it to be of low quality.

Why categorization matters
Being well organized is one of the key factors when you need to find things among a large pool of options. As a practical example, think of a typical supermarket. These places contain tens of thousands of items, and customers quickly get frustrated if finding specific items proves challenging and time-consuming. To improve the shopping experience, shop owners have carefully analyzed and categorized each item in order to build a logical structure for placing them. We can only imagine how chaotic the attempt to find the desired products would be without this categorization. Finding relevant information is in many ways similar to finding the right products in a shop. Well-structured categories can make a big difference.

Well-made categorization within user-generated content platforms can bring the following benefits:

Locating the information
Categorization makes specific information easier to find. Searching for information nowadays is often keyword-based because of well-performing search engines like Google. While this can reduce the need to categorize content, it doesn’t remove the usefulness of categorization entirely. First of all, search engines themselves value clearly divided sections of content: your website is more likely to perform well in search rankings if the targeted keywords and pages are surrounded by large amounts of relevant information. Secondly, categories can be helpful when the user doesn’t have a clear picture of the terminology of the subject and thus doesn’t know the keywords that would produce high-quality search results.

Information collections
Categorization makes it possible for users to browse specialized collections of information relevant to their needs. Think of a traditional library: because the books are well categorized, not only can you easily find the book you need, but you also get an opportunity to look at other books on the same subject, since they sit close to your target book. In the context of social media platforms, you can get a good picture of a specific topic by looking at previous discussions related to that topic. Let’s say you are studying the physics engine of some game creation software. A forum category focused on that specific topic lets you see the questions other users have asked about it in the past. That helps give you a good overall picture of the issues you too might encounter.

Focused experts
Categorization can let people focus on their area of expertise in their interactions with other users. Division of labor has been an important factor in the creation of efficient societies. What it essentially means is that people are able to focus on their particular subset of skills and become highly specialized in those areas. This allows the development of experts with deep knowledge of their specific craft. Social media platforms can benefit from the same division by creating well-structured categories for user content. Take an accounting forum as an example: one user can focus solely on discussions related to foreign trade situations. Without categories, these topics would be much harder to find, making valuable contributions more difficult.

Avoiding duplicate content
Categories can help in the fight against duplicate content. There are many reasons why duplicate content is usually considered bad practice on online platforms. One problem is that the people answering questions quickly become frustrated when the same thing gets asked over and over again, even when they have already given a comprehensive answer in a previous discussion. Avoiding duplicate content is also helpful when the information needs to be updated to reflect changing circumstances. To continue with our accounting forum example, it’s easier to record that the Finnish tax code has now completely removed the tax deductibility of so-called representational expenses if the discussion about them is confined to one or two topics instead of dozens of similar threads.

Differences in categorization between popular social media platforms
Different social media platforms approach the categorization of information in different ways. The categorization rules and methods inherent in these systems can produce dramatic differences in information quality and relevance. In the following sections we study some of the most popular social media platforms and how they handle categorization.

Facebook
Let’s begin by looking at what Facebook does. On Facebook the categorization of information is heavily connected to individual people. In other words, the typical way we sort information on Facebook is by the person who produced it. Users typically find content based not on its topic but on its author; Facebook might insert a friend’s posts into my feed regardless of the topic they have written about. This can give us good information about the social lives of our friends, but it is rarely useful for finding other types of information. Chances are you don’t go to Facebook when you need to search for things unrelated to your social relationships. Facebook has made attempts to gain ground in other types of discussions as well by allowing users to form groups around topics of common interest. While these groups help its case, they don’t really compete in functionality with more advanced discussion platforms. One of the big problems with Facebook groups has been the absence of subcategories, which means that discussions can be grouped only by the main topic of the group. This causes even useful conversations to eventually end up buried deep in the page structure.

Twitter
Categorization on Twitter is similar to Facebook in that it usually revolves more around interesting individuals than topics of discussion. The difference is that on Facebook you only get the content of your friends (and perhaps what’s written inside the groups you participate in). Twitter lets you follow almost any individual who is of interest to you, which essentially widens the pool of potential quality content. The users of Twitter developed the famous hashtag system to help categorize content better, but this method has several problems: anyone can use any hashtag regardless of the actual content of their tweet, and there is no proper subcategorization, to name just two. Thus it’s probably a safe bet that your first choice for investigating a subject would not be going to Twitter and searching for information by hashtag.

Google+
Google+ has its own approach to grouping discussions, based on a feature called “circles”. Circles allow users to sort people into groups like friends, family or colleagues and then limit the distribution of content to the suitable circles. Google+ as a whole has had a rather slow start, perhaps because regular users struggle to see why they should switch to it from other social media services like Facebook and Twitter. It’s worth noting that besides Google+, Google also has a product called Google Groups, which in many ways resembles the message boards discussed in the section below.

Internet forums
Internet forums, often also called message boards, usually excel at categorizing content. While there are many different software engines for creating these forums, two of the big ones are phpBB and Simple Machines Forum. The websites run by these engines are typically divided into several categories and subcategories, which helps keep information organized. For example, a forum focused on accounting might be divided into main categories based on company type, ranging from publicly traded companies to sole proprietors. Each of these categories might then have further subcategories for more specific topics like travel expenses, salaries and so on. These types of forums usually enjoy the categorization benefits discussed earlier in this article. Even this model has weaknesses: it usually relies on the users’ ability to select a suitable category for their content and, to the frustration of the moderators, users are notoriously bad at this, leading to systematic categorization failures that the moderators need to correct manually. Forum administrators can try to minimize this problem by making sure the categories are logical and easy to understand. A common mistake is to create a “general discussion” category, because many users will simply assign their content to it without giving the other categories much thought.
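The category and subcategory structure described above can be sketched as a small tree. This is only an illustration under assumptions: the `Category` class, its methods and the thread titles are hypothetical, and it does not reflect how phpBB or Simple Machines Forum actually store their data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """A forum category that can hold threads and nested subcategories."""
    name: str
    children: dict[str, Category] = field(default_factory=dict)
    threads: list[str] = field(default_factory=list)

    def add_child(self, name: str) -> Category:
        # Create the subcategory if it does not exist yet, then return it.
        return self.children.setdefault(name, Category(name))

    def find(self, path: list[str]) -> Category:
        # Walk down the tree one subcategory name at a time.
        node = self
        for part in path:
            node = node.children[part]
        return node

    def all_threads(self) -> list[str]:
        # Collect threads from this category and everything below it.
        found = list(self.threads)
        for child in self.children.values():
            found.extend(child.all_threads())
        return found


# Hypothetical accounting forum, following the example in the text.
forum = Category("Accounting forum")
sole = forum.add_child("Sole proprietors")
sole.add_child("Travel expenses").threads.append("Can I deduct a train ticket?")
sole.add_child("Salaries").threads.append("Paying myself a salary")
public = forum.add_child("Publicly traded companies")
public.add_child("Travel expenses").threads.append("Per diem rules abroad")

travel = forum.find(["Sole proprietors", "Travel expenses"])
print(travel.threads)
print(len(forum.all_threads()))
```

Browsing a single subcategory corresponds to `find`, while `all_threads` on a main category gives the kind of topic-wide collection a reader would skim to get an overall picture.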

Vote-based platforms
The final type of social media platform we discuss from the perspective of categorization is vote-based boards like Reddit and the Stack Exchange sites. What separates these from the other examples is their ability to use regular users as moderators. For example, if somebody posts a topic with low information quality on Stack Overflow, respected members of the community will close it quickly once it gathers enough votes. Users with a large number of reputation points have higher privileges, and this works so well that it dramatically reduces the need for dedicated moderators. The actual method of categorization varies between vote-based systems: Reddit has thousands of subcategories called subreddits, while the Stack Exchange sites categorize content with tags (sometimes called labels elsewhere). Thus the main strength of these platforms lies not in the exact structure of their categorization but rather in the clever mechanics that enforce it. People who post low-quality information on these sites are quickly downvoted, which makes their content less likely to be seen by others. This system, while not perfect, generally seems to produce very coherent information, as evidenced by the fact that search engines often place Stack Exchange hits among the top results.
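As a rough illustration of the enforcement mechanics described above, the sketch below combines tag-based categorization, score-based ordering and reputation-gated close votes. It is not the actual algorithm of Reddit or Stack Exchange: the thresholds `CLOSE_VOTES_NEEDED` and `MIN_REP_TO_CLOSE`, the `Post` class and all function names are assumptions made up for this example.

```python
from dataclasses import dataclass, field

CLOSE_VOTES_NEEDED = 3    # assumed threshold, not a real site's value
MIN_REP_TO_CLOSE = 1000   # assumed reputation needed to cast a close vote


@dataclass
class Post:
    title: str
    tags: set
    score: int = 0
    close_voters: set = field(default_factory=set)

    @property
    def closed(self) -> bool:
        # A post counts as closed once enough distinct users voted to close it.
        return len(self.close_voters) >= CLOSE_VOTES_NEEDED


def vote(post: Post, delta: int) -> None:
    """Apply an upvote (+1) or a downvote (-1) to a post."""
    post.score += delta


def vote_to_close(post: Post, user: str, reputation: int) -> bool:
    """Record a close vote only if the user has enough reputation."""
    if reputation < MIN_REP_TO_CLOSE:
        return False
    post.close_voters.add(user)
    return True


def visible_posts(posts: list, tag: str) -> list:
    """Return open posts carrying the tag, highest score first."""
    matching = [p for p in posts if tag in p.tags and not p.closed]
    return sorted(matching, key=lambda p: p.score, reverse=True)


good = Post("How are travel expenses deducted?", {"tax", "travel"})
vague = Post("Help with taxes??", {"tax"})
spam = Post("Buy cheap watches", {"tax"})

vote(good, +1)
vote(good, +1)
vote(vague, -1)
for name in ("ann", "ben", "cat"):
    vote_to_close(spam, name, reputation=2500)
vote_to_close(good, "newbie", reputation=10)  # ignored: too little reputation

print([p.title for p in visible_posts([vague, spam, good], "tax")])
```

In this sketch the low-scoring post stays visible but sinks to the bottom of its tag, while the post closed by high-reputation users disappears from the listing altogether.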

What the future might hold
Google became one of the largest companies in the world because it helped users find relevant information among the massive amounts of online content. Since then, the highest valuations have typically gone to companies like Facebook, Twitter, LinkedIn, YouTube, Pinterest and Instagram. The one thing all these companies have in common is that they rely heavily on user-generated content. Looking at the past, we can predict that the companies that get their users to create high-quality content, and let others easily find that content when it’s relevant to their needs, will become successful. Quickly improving artificial intelligence systems will likely help these companies serve our information needs even better. There are many possibilities. Think of information that is automatically fact-checked by intelligent software. Technologies like machine vision will let computers classify visual information in ways that were previously impossible to automate. Semantic technologies will help computers gain a better understanding of languages, which might let users ask them actual questions instead of simply providing keywords for search. Semantic analysis of content will also help software systems recognize and penalize poorly written content. As the amount of online information keeps growing almost exponentially, there is likely to be great demand for companies and products that can analyze it, categorize it and finally select the very best pieces of it for you.





Sources:

[1]

[2]
