Hla Hla Htay's Notes


Blogging and Natural Language Processing (NLP)
July 14, 2008, 4:34 pm
Filed under: NLP

After coming back from Kharagpur, I wanted to write about this topic. I just want to share what I know. I don’t think I am good enough to explain it properly, and it would be better if I gave some references, but right now I am not in the mood to read further. I will write down whatever I understand right now. :D
Prince helped me correct my writing. Thank you, Prince.
Here are some computer technologies that can make use of blogs.


Corpus
Blogs are in fact web logs: anyone can write whatever they want to disclose to the public or to a group of colleagues, to share information and inspiration. Anything you write can help with text processing. In computing terms, such a collection of text is called a corpus (plural: corpora).
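To make the idea of a corpus a little more concrete, here is a minimal Python sketch. The two posts are made up for illustration, and the tokenizer is a simplifying assumption that only handles English letters; it is not how a real corpus tool has to work.

```python
import re
from collections import Counter

# Hypothetical blog posts standing in for a real corpus.
posts = [
    "I love taking photos of flowers.",
    "The football match was great, I love sport.",
]

def tokenize(text):
    """Lowercase the text and split it into simple English word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(corpus):
    """Count how often each word occurs across all posts in the corpus."""
    counts = Counter()
    for post in corpus:
        counts.update(tokenize(post))
    return counts

freqs = word_frequencies(posts)
print(freqs.most_common(3))
```

Even simple counts like these are the raw material for the applications below.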
Text Categorization
Blogs are a good text source for text categorization. When a blogger publishes a post, he or she usually assigns it a category: songs, technology, sport, plants, movies, and so on. Using these categories and the post texts as training input, we can give the system a new text and see which category it belongs to, that is, guess whether the new text is about movies, sport, plants, technology, etc.
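One possible way to do this is a naive Bayes classifier. This is my choice for a small sketch, not the only method. The training posts and category names below are invented for illustration, and the code uses only the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple English word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_posts):
    """Count words per category and posts per category."""
    word_counts = defaultdict(Counter)
    post_counts = Counter()
    for text, category in labeled_posts:
        post_counts[category] += 1
        word_counts[category].update(tokenize(text))
    return word_counts, post_counts

def classify(text, word_counts, post_counts):
    """Return the category with the highest naive Bayes log score."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_posts = sum(post_counts.values())
    best_category, best_score = None, float("-inf")
    for category in post_counts:
        # Prior: how common the category is among the training posts.
        score = math.log(post_counts[category] / total_posts)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not give log(0).
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical posts, each with the category chosen by its blogger.
training_posts = [
    ("The team won the football match", "sport"),
    ("The striker scored a late goal", "sport"),
    ("The actor starred in a new film", "movie"),
    ("The director released the movie trailer", "movie"),
    ("My rose and jasmine plants are blooming", "plant"),
    ("Water the lily in the garden every morning", "plant"),
]

model = train(training_posts)
print(classify("Who scored the goal in the match", *model))
```

With real blogs, the training posts would be the bloggers' own posts and the labels would be the categories they picked.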
Information Extraction
The next one is intelligently guessing the blogger’s age, sex (male or female), what the blog is about (general, political, technology), and so on. How can we do that? Some bloggers write about politics, some about technology, some about literature, some about computers, etc. As an example, if the blogger is female, she may be interested in flowers, plants, and things in a girlish style. The input is the blog address, and the output is a guess at some information about the blogger.
Emotion Detection
The most interesting one for me is emotion detection. It means hypothesizing whether the blogger is in a good or bad mood at a given time. When the blogger is happy, the post will show it, and he or she will probably use words like “happy, laugh, LOL, :), :P, :D”, etc. Different smileys are also very helpful for this. If the blogger is feeling bad, the posts sound like crying, and he or she will probably use words like “angry, sad, :(, depressed”.
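The word and smiley idea above can be sketched as a tiny lexicon lookup. The two word lists only contain the examples from this post, and the scoring rule (one point per match) is my own simplifying assumption; a real system would need much larger lists or a trained model.

```python
# Small mood lexicons built from the example words and smileys above.
POSITIVE = {"happy", "laugh", "lol", ":)", ":p", ":d"}
NEGATIVE = {"angry", "sad", "depressed", ":("}

def guess_mood(post):
    """Guess 'good', 'bad', or 'unknown' from mood words and smileys."""
    score = 0
    for token in post.lower().split():
        # Drop punctuation around words; smileys are left untouched
        # because ':', '(' and ')' are not in the stripped set.
        token = token.strip('.,!?"')
        if token in POSITIVE:
            score += 1
        elif token in NEGATIVE:
            score -= 1
    if score > 0:
        return "good"
    if score < 0:
        return "bad"
    return "unknown"

print(guess_mood("I am so happy today :D"))
print(guess_mood("Feeling sad and angry :("))
```

A post with no mood words at all gets "unknown" instead of a forced guess.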
Image Classification and Image Retrieval
Another application is image classification or image retrieval. I keep posting photos of flowers, and I tag them with the flowers’ names, such as lily, rose, or jasmine. I give flowers just as an example. When I upload photos, most of them are taken from different angles, but the tag, title, or caption is the same for all of those photos. You may wonder how this could be applied in practice. One example: the inputs are satellite photos of a particular area, and the task is to hypothesize whether a particular object is a house, a tree, a building, a plain, etc. An image retrieval system is a computer system for browsing, searching, and retrieving images from a large database of digital images.
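As a rough sketch of the retrieval side only, the tags that bloggers attach to photos can be turned into an index from tag to photos. The file names and tags below are hypothetical, and this does not look at the pixels at all, so it is not image classification; it only shows how tagged blog photos could be searched.

```python
from collections import defaultdict

# Hypothetical uploaded photos and the tags their blogger gave them.
photos = {
    "photo_001.jpg": ["lily", "flower"],
    "photo_002.jpg": ["rose", "flower"],
    "photo_003.jpg": ["lily", "garden"],
}

def build_tag_index(photo_tags):
    """Map each lowercase tag to the set of photos that carry it."""
    index = defaultdict(set)
    for filename, tags in photo_tags.items():
        for tag in tags:
            index[tag.lower()].add(filename)
    return index

def search(index, *tags):
    """Return the sorted file names that carry all of the given tags."""
    matches = [index.get(tag.lower(), set()) for tag in tags]
    if not matches:
        return []
    return sorted(set.intersection(*matches))

index = build_tag_index(photos)
print(search(index, "lily"))
print(search(index, "lily", "flower"))
```

The same tagged photos could also serve as labelled training data for an image classifier, since photos taken from different angles share one tag.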

Finally, I come to my conclusion. I just want to ask Myanmar bloggers to help Myanmar language processing by writing a word, a phrase, a sentence, or a post in the Myanmar language or other national languages. The languages of Myanmar very much need to be explored in detail. You may then ask me why I am writing in English. Well, right now I am writing my thesis, so pardon me for a while. I do like to read in the Myanmar language. I want to practice my English writing for the moment, and English is necessary to document what we find in our research.