Useful Pipes for the Lazy Man’s Job Search

February 3rd, 2010 by admin

Like 350,000 of my dear countrymen I’m currently searching for a job. This translates to: reading job advertisements, applying for jobs, hoping for the opportunity of a job interview, performing as well as possible in the interviews and hoping to finally get a decent job. The process is lengthy and repetitive but actually not boring. I always feel that I have learned something about myself after each job interview. Most of the jobs which are relevant for me are posted on job portals like monster.at and are delivered to me via email. But some of the companies (mainly the big ones) don’t post all their vacant jobs there. They have a section on their website (e.g. the APA career page) where they maintain their open positions. So if you want to know whether any of these companies has announced a new open position on its website, you have to check it regularly. This task is boring, time-consuming and seldom leads to new insights (companies don’t announce new jobs that often). In other words, it’s a waste of time.

I was actually quite surprised that even though most of the big companies that maintain their own career pages use state-of-the-art content management systems, none of them thought it could be useful for job seekers to provide an RSS feed. RSS feeds are convenient because once you subscribe to one, it keeps you informed about updates to a website (without you having to visit it directly). So I wondered whether it’s possible to automatically generate RSS feeds from such career pages with very little effort (i.e. no programming). The first thing that came to my mind was Yahoo Pipes. To be honest, I had always been tempted to do something with Yahoo Pipes, but I never had a specific problem at hand to use it for.

The Yahoo Pipes project was launched in 2007 as an effort to help internet users adapt (manipulate) the content of web pages to their needs without requiring them to know or learn a programming language. Yahoo Pipes lets you pick from several input sources like a web page or a Google spreadsheet (CSV) document and run their content through several processing elements that filter and/or change the content. The processing elements of a pipe are put together (wired) using a graphical editor. Each pipe element has its own attributes that can be customized by the pipe’s creator. The output of every pipe is an RSS feed which can be subscribed to using a feed aggregator like Google Reader or a web browser like Firefox.

The following picture shows a pipe that generates an RSS feed for the APA career page mentioned above. The first processing element (upper left corner) is “Fetch Page”, where the URL of the career page is specified. It also holds the markers that tell the processing element which part of the web page (the HTML code) to take, and a delimiter that separates the continuous input into separate items (the open positions).

The second processing element, “Regex”, performs some cleanup in the HTML code of the separated items: it deletes image tags and line feeds. Up to this point each item consists of just one field, “content”, which contains the extracted HTML code from the web page. The “Rename” processing element makes two copies of the “content” field (date and jobtitle) and renames the original field to joblink. Another “Regex” processing element does the main work: for each item it extracts the date from the “date” field, as well as the name of the job offer and the link to it. The remaining processing elements before “Create RSS” only serve to delete the first and the last item of the data stream. “Create RSS” does some renaming of the fields in the items. The output of this pipe looks like this:

And this is how it looks if you subscribe to it using Google Reader:

As you can see in the picture above I also created pipes/feeds for:

So instead of having to go to all of these web pages separately and check whether there’s an update, I just check Google Reader, which gives me this information in a fraction of the time. You’re invited to use these pipes and alter them.
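
The pipe itself is wired together in a graphical editor, so there is no code to show. For readers who would rather script the same idea, here is a minimal Python sketch that approximates the chain Fetch Page → Regex → Rename → Regex → Create RSS using only the standard library. It works on an HTML string instead of downloading the page, and the markers, the delimiter and the row pattern in `JOB_PATTERN` are hypothetical: they assume each job row looks like `12.01.2010 <a href="URL">Job title</a>`, which will not match the real APA page without adjustment. The separate “Rename” and second “Regex” steps are folded into one regular expression with three groups.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical row format: '12.01.2010 ... <a href="URL">Job title</a>'.
# Adjust this pattern to the HTML of the career page you want to watch.
JOB_PATTERN = re.compile(r'(\d{2}\.\d{2}\.\d{4}).*?<a href="([^"]+)">([^<]+)</a>')


def fetch_items(html, start_marker, end_marker, delimiter):
    """Like 'Fetch Page': keep the HTML between two markers, split it into items."""
    start = html.index(start_marker) + len(start_marker)
    end = html.index(end_marker, start)
    return html[start:end].split(delimiter)


def clean(item):
    """Like the first 'Regex' step: remove <img> tags and line feeds."""
    item = re.sub(r"<img[^>]*>", "", item)
    return item.replace("\r", "").replace("\n", "")


def parse_job(item):
    """Like 'Rename' + second 'Regex': pull date, link and title out of one item."""
    match = JOB_PATTERN.search(item)
    if match is None:
        return None
    date, link, title = match.groups()
    return {"date": date, "link": link, "title": title.strip()}


def to_rss(jobs, channel_title, channel_link):
    """Like 'Create RSS': map the extracted fields onto RSS 2.0 elements."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Open positions"
    for job in jobs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = job["title"]
        ET.SubElement(item, "link").text = job["link"]
        ET.SubElement(item, "description").text = "Posted " + job["date"]
    return ET.tostring(rss, encoding="unicode")


def career_page_to_rss(html, start_marker, end_marker, delimiter, title, link):
    items = [clean(i) for i in fetch_items(html, start_marker, end_marker, delimiter)]
    items = items[1:-1]  # like the pipe: drop the first and last item of the stream
    jobs = [job for job in map(parse_job, items) if job is not None]
    return to_rss(jobs, title, link)
```

To turn this into a real feed, the page would first be downloaded (for example with `urllib.request.urlopen`) and the resulting XML string saved to a file that a feed reader can poll.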

Good luck with your job search!

Posted in Uncategorized, computer science, data mining, music, sports, work | No Comments »

Google Wave

October 19th, 2009 by admin


About two weeks ago I had the honor of becoming one of the lucky 100,000 Google Wave beta users, thanks to Johannes from Blackwhale. At first it felt very charming to be one of the few who may lay their hands on Google’s latest, hottest product, but soon I realized that it’s hard to evaluate a communication tool like Google Wave without the people you normally communicate with. First you have to find some friends on Google Wave and then you have to come up with an artificial topic, which gives the whole evaluation a slight bias. But anyway, I managed to overcome these obstacles, and after about a week I was participating in a wave with over 15 members.

So what is Wave? Well, according to Google, Wave is “Email as it would look like if it was invented today.” I’d rather say Wave is a combination of fancy new Web 2.0 stuff (wikis, interactive multi-user applications) and other stuff from the web that has stood the test of time (Internet Relay Chat, forums). In contrast to an email, a wave is always hosted on a server. This means that it isn’t sent to a recipient. You start a wave, write some text in it (such a message is called a blip) and add people from your contact list to the wave. From then on these contacts can see what’s happening in the wave and contribute to it. You can actually watch another participant editing the wave as they type, in an IRC-like style. This is kind of annoying because people often make typing errors, so you also see a lot of corrections.

The fancy Web 2.0 stuff I was talking about can be seen in the following picture. It shows a blip with a YouTube video in it and a poll application that lets the wave’s participants interactively decide whether they liked the video or not.

The next picture shows a part of the largest wave I have participated in so far. As you can see, it looks pretty much like a regular forum: one participant posts something and the others reply.

What differentiates Wave from a regular forum is its high degree of interactivity, responsiveness and extensibility. Developers can, for example, design robots that can be added to a wave like human contacts. These bots can perform tasks like keeping the wave clean (e.g. sweepy, which deletes empty blips), changing the look of users’ input (e.g. emoticony, which replaces textual emoticons with pictures) or acting as communication partners (e.g. eliza).
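
The robot idea is easy to picture with a toy model. The Python sketch below is not the Google Wave Robots API; the `Wave` class and the two robot functions are hypothetical stand-ins that only mimic the behaviour described above: the wave lives in one shared place (the “server”), robots are participants that get notified after every new blip, and each robot may rewrite the wave, here by deleting empty blips (like sweepy) or replacing textual emoticons with pictures (like emoticony).

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical mapping from textual emoticons to "pictures" (Unicode emoji here).
EMOTICONS = {":)": "\U0001F642", ":(": "\U0001F641"}


@dataclass
class Wave:
    """Toy server-side wave: one shared list of blips plus robot participants."""
    blips: List[str] = field(default_factory=list)
    robots: List[Callable[["Wave"], None]] = field(default_factory=list)

    def add_blip(self, text: str) -> None:
        self.blips.append(text)
        # Every robot participant is notified after each change, in join order.
        for robot in self.robots:
            robot(self)


def sweepy_like(wave: Wave) -> None:
    """Delete blips that contain nothing but whitespace."""
    wave.blips = [blip for blip in wave.blips if blip.strip()]


def emoticony_like(wave: Wave) -> None:
    """Replace textual emoticons with pictures."""
    for i, blip in enumerate(wave.blips):
        for emoticon, picture in EMOTICONS.items():
            blip = blip.replace(emoticon, picture)
        wave.blips[i] = blip
```

With `wave = Wave(robots=[sweepy_like, emoticony_like])`, adding the blips `"Nice video :)"` and `"   "` leaves a single blip in which the smiley has been replaced by a picture; the empty one is swept away.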

All in all I like Google Wave a lot; it’s like email on steroids. If Google can improve the speed of Wave and finally release it to the public, it will become a great product. I’m always looking for new “wave friends”, so if you have an account, don’t hesitate to add johannes.liegl@googlewave.com

Posted in Uncategorized | No Comments »

Italy & Information Retrieval

September 21st, 2009 by admin

Two weeks ago I attended the European Summer School in Information Retrieval (ESSIR), which took place in Padova (Italy). ESSIR is held every two years and brings together renowned researchers from the field of information retrieval, students and young academics. Normally I’d be quite careful about describing people with attributes like “renowned”, “top” or “high-class”. But when talking about people who defined or created a field of study, wrote THE books in their fields and give you a lecture about it, I guess it’s quite appropriate to use these labels.

So, ESSIR 09 was five days of lectures about information retrieval. It started with a brief introduction to information retrieval given by C. J. van Rijsbergen. Basically he talked about the widely known vector space model and some setups for IR experiments. For me the next interesting lecture was given by Norbert Fuhr; he talked about probabilistic IR models and highlighted their theoretical superiority over the much more widely adopted vector space model. Another interesting talk was given by Stephen Robertson about evaluation in IR. He emphasized that evaluation should be taken as seriously as model building. An outstanding talk was given by Stefan Rueger about multimedia information retrieval. It was outstanding because it was informative and entertaining at the same time. Mr. Rueger has a presentation style that makes it very easy to follow, even on more complicated topics; I wish more presenters were like him. Another outstanding lecture was held by Hugo Zaragoza of Yahoo! Research. He talked about the relation between machine learning and information retrieval, and the importance of feature selection and ranking functions. Ricardo Baeza-Yates talked about distributed web search and its difficulties. The numbers he presented to illustrate the size of the web were impressive. Even more impressive are the search engines and the techniques they employ to cope with the ever-growing amount of information on the web. (This is something that is so beautiful in IR: it’s one neat trick / algorithm / data structure after the other. On their own they’re quite easy to grasp and actually wouldn’t stand out, but as a whole, in a system, they work together so nicely. Get a copy of Manning’s book “Introduction to Information Retrieval” and read the first seven chapters to see the beauty.) The last talk was given by James G. Shanahan, an independent machine learning consultant. He talked about advertising and advertising models on the web and how machine learning is used for targeting.

There were some other presentations too, but the ones I mentioned above were the most interesting to me. In general I’d say that the morning sessions from 9:00 to 12:00 were better than the ones in the afternoon. The sessions were separated by short coffee breaks and a long lunch break. The snacks they offered in a tent outside the university were really tasty, but what else do you expect in Italy.

Besides listening to the lectures there was always plenty of time to get in contact with other students from all over Europe. It was interesting to see what others are working on and how similar some problems are among students. Some people I met there are working on sentiment detection too, so we talked about evaluation methods and test sets. During the whole week the weather was perfect, so we could sit outside at the cafes and bars in the city of Padova until late at night.

All in all I’d say that participating in ESSIR was a good idea. I heard some interesting talks, got a feeling for what I already know in IR and what I should learn about it in the future, met interesting new people from all over Europe and had a good time in the city. So I’d recommend going to the next ESSIR!

C.J. van Rijsbergen
Norbert Fuhr
Stefan Rueger
Ricardo Baeza-Yates

Posted in computer science, conference, science | No Comments »

Pigs, Panic and Piles of Money

May 18th, 2009 by admin

No, unfortunately this is not the title of Guy Ritchie’s latest movie. As everybody who turned on the radio/TV, glanced at the cover of any newspaper/magazine or listened to the conversation of two random people on the train/tube over the last 3 weeks knows, a new virus has entered the game of life with the power to become a full-blown pandemic: H1N1, the Swine Flu.

First spotted in Mexico, where it is suspected to have killed at least 60 people, it spread to the U.S. (supposedly killing up to 5 people) and also to the rest of the world, where it hasn’t done much harm so far (we had a case here in Austria too; the patient was released from the hospital fully recovered after being there for a week).

I don’t want to write about the seriousness of this new kind of influenza; I’m no doctor, so I really can’t judge it. But during these 3 weeks from the outbreak of H1N1 in Mexico until now I kept asking myself one question over and over again: How much money are the pharmaceutical companies making now?

Two facts we should keep in mind for the rest of the discussion:

1) Fear is no good advisor. (Ever thought about this wisdom when you were in fear?)

2) Media companies have big problems making money from their online content, and fewer and fewer people are buying newspapers nowadays.

Knowing these two facts, the amount of reporting and the number of articles about the Swine Flu, the fight against it, how to stay healthy and what to do if you have it (or think you have it) should have come as no surprise to us. It was clear that every media company (online and print) would jump on the Swine Flu bandwagon to increase its sales. Why? Well, people fear that they could die from it, so they turn their attention to the information that is available (interesting how fast this new virus could be analysed; again, I’m no expert, but the speed with which ever new information about it was “created” impressed me). So people click on the web portals of media companies, where they find the information they need to reduce their uncertainty about the seriousness of the situation (at least a little bit), plus online advertising. On the web, eyeballs are money: the more eyeballs on your content, the more money you make. The same is true for traditional (paper-based) news media. They don’t finance themselves through their cover prices; again, it’s the advertising.

So the first “players” who are profiting from the Swine Flu (or, more precisely, from giving some kind of information about it to their customers) are the media companies. This should be obvious if you keep the 2 facts from the beginning in mind. So thanks to the media we now have a lot of frightened people. They want to protect themselves from the Swine Flu or, if they get it, at least not die from it. Well, even if a country’s government knows that you cannot trust the media on everything, the public is putting some pressure on it to take precautions, which in our case would mean buying masks for a big part of the country’s population.

So the next player profiting from the Swine Flu would be the mask-producing companies. I heard an “expert” saying on the radio that you would have to change the mask every 8 hours because its filter wears out after that time. This would mean you would need about 2 masks per day if you are a working person. I’d guess there are about 4 million working people in Austria, so the cost would be 2 x 4,000,000 x numberOfDaysPandemicLasts x priceOfOneMask. If it lasts 14 days and one mask sells for 0.5€, a government would have to pay about 56 million Euro. This estimate neglects the law of supply and demand: if every country on this planet ordered its masks at the same time, the price could go up a little bit.
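
Written out as a formula, with N working people, m masks per person per day, d days of pandemic and a price p per mask (all four numbers are the rough assumptions from the paragraph above):

$$
C_{\text{masks}} = N \cdot m \cdot d \cdot p = 4\,000\,000 \cdot 2 \cdot 14 \cdot 0.5\ \text{EUR} = 56\,000\,000\ \text{EUR}
$$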

Now we have taken action to protect our population from the flu. But what if it spreads despite our actions? Well, the good news is that there are antiviral drugs on the market. A famous one is Tamiflu by Roche; another is Antiflu by Cipla. Cipla, a Mumbai-based company, doesn’t take intellectual property that seriously (for good reasons, because they want to give cutting-edge pharmaceuticals to underdeveloped countries), so Antiflu should be a cheaper Tamiflu. From a recent issue of the Financial Times I know that Cipla is charging 7.4 € ($10) per treatment, while Roche is charging 15 € in richer and 12 € in poorer countries. These are prices for governments filling their pandemic stockpiles, not for individuals. So in either case the pharmaceutical companies make money: if the pandemic doesn’t come, they sell to governments to fill their stockpiles; if it comes, they also sell, but maybe at a higher price. Again, Austria: 8,000,000 inhabitants. Let’s say half the country gets ill and the Austrian government listens to the wishes/fears of its population: 4,000,000 x 15 € = 60 million Euro. Together with the 56 million Euro for masks, in this scenario Austria would spend about 116 million Euro to protect itself from the Swine Flu. Again, why would it spend this huge amount of money? 65 people died in Mexico and the USA, supposedly from the Swine Flu. Hmmm.

Ever thought about buying some pharmaceutical stocks? They should go up if there is a strong demand for their products. So let’s have a look at Roche and Cipla stocks during the last 3 weeks. But first we should check when people had an increased demand for information about the Swine Flu. Therefore I conducted a simple search on Google Trends. The query was “swine flu” and the region was “Mexico”. Here’s the result:


So Mexicans’ searches for “swine flu” started on April 23rd and reached a peak on April 26th. We’ll keep an eye on those dates. Now here’s the Cipla stock chart:

And here is a chart of Roche’s stock:

In the stock charts the small grey line marks the 23rd of April. It’s interesting that both stocks (Cipla and Roche) rise on that day. On the 26th both stocks reach a peak. Remember, the Google Trends chart had its peak on the same day. Correlation or causality? I leave this decision to the reader. Either way, I don’t want to blame the pharmaceutical companies. In the end they are the ones who develop life-saving drugs such as Tamiflu and Antiflu. They’re just clever at using people’s fear, and no law forbids that. If anyone is to blame (which isn’t certain at all), it is the unthinking readers.
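
If you wanted to go one step beyond eyeballing the charts, a minimal Python sketch like the one below would put a number on how closely two daily series move together and find each series’ peak day. It assumes the series are given as `{ISO date string: value}` dictionaries (for example exported Google Trends values and closing prices; no such data is included here). Because stock exchanges don’t trade on weekends while search interest is recorded every day, the hypothetical helper `aligned` keeps only the dates present in both series. Even a coefficient close to 1 would only show correlation, not causality.

```python
from math import sqrt


def aligned(a, b):
    """Keep only the dates present in both {date: value} series, in date order."""
    dates = sorted(set(a) & set(b))  # ISO date strings sort chronologically
    return [a[d] for d in dates], [b[d] for d in dates]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


def peak_date(series):
    """Date with the highest value in a {date: value} series."""
    return max(series, key=series.get)
```

Typical use would be `pearson(*aligned(trends, closing_prices))` together with a comparison of `peak_date(trends)` and `peak_date(closing_prices)`.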

Posted in data mining, finance, onLife | No Comments »

Totally on the Fru

February 17th, 2009 by admin

 

The Empyrean

 

Recently John Frusciante (the guitarist of the Red Hot Chili Peppers) released his new album “The Empyrean”. Well, to make this review short: It is great, go and buy it! There is, at least for me, no bad song on the album. And this is not common in the year 2009. Listen to “Song to the Siren” and “Dark/Light” to get a first impression.

Posted in Uncategorized | No Comments »

The iTunes Store

February 15th, 2009 by admin

Recently I found myself buying a lot of music from the iTunes Store (in addition to ordering CDs on a quite regular basis from amazon.com). Apart from the DRM limitations of the files, the shop is fine. But I’m wondering whether Apple could squeeze a lot more money out of people if they gave you the opportunity to listen to one full song of your choice (maybe several times) after you purchased a song. This model would totally work for me. I think the current 30-second preview is too short to get the right impression of a song. Maybe another store is doing this already; please let me know if you know of one!

 
Here’s a fine tune I purchased recently:

Posted in music, onLife | No Comments »

“The Profession of IT”

February 4th, 2009 by admin
 

… is the title of an article I stumbled upon today in one of last year’s issues of Communications of the ACM. It explains very nicely why the number of computer science students at universities is steadily declining. It does so by letting six computing professionals, and one potential CS student, speak passionately about their work. Here is a link to the online version of this article:

Look Inside >> 
August 2008

Posted in programming, science | No Comments »

New Radio Station

August 9th, 2008 by admin

 

Radio Superfly

Superfly is a Vienna-based radio station that plays black music all day long (jazz, funk, rap). I have been listening to it for a week now and I can say: it is the only alternative to FM4.

Go, check out the live-stream!

Posted in Uncategorized | 2 Comments »

ECAI 08

August 3rd, 2008 by admin

 

 

Last week I gave a talk at the CAFFEi (Computational Aspects of Affectual and Emotional Interaction) workshop at this year’s European Conference on Artificial Intelligence (ECAI), which took place in Patras (Greece). The content of my talk was the paper my colleague Stefan Gindl and I submitted to the conference.
The paper contains an evaluation of different methods for sentiment detection (SD). The evaluation was carried out on web-based reviews from Amazon, TripAdvisor and IMDb. The range of SD methods we tested was rather broad. A couple of methods relied on a tagged dictionary that contains sentiment words and their associated sentiment values on a scale from -1 to +1. These SD methods took advantage of the dictionary by spotting the sentiment words in the reviews and summing up their sentiment values. The “summing up” was carried out a little differently in each of these so-called “arithmetic SD methods”.
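
The basic idea behind the arithmetic SD methods can be sketched in a few lines of Python. The dictionary below is a tiny hypothetical stand-in for our tagged dictionary (its words and values are made up for illustration), and the tokenizer is a naive lowercase word splitter; this is an illustration of the principle, not the code we evaluated.

```python
import re

# Hypothetical toy dictionary: sentiment words with values on the -1..+1 scale.
LEXICON = {"great": 0.8, "good": 0.5, "nice": 0.4,
           "boring": -0.5, "bad": -0.6, "awful": -0.9}


def tokenize(text):
    """Naive tokenizer: lowercase words (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=LEXICON):
    """Spot the sentiment words in the text and sum up their values."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


def polarity(text, lexicon=LEXICON):
    """Map the summed score to a label; a score of 0 counts as neutral."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `polarity("The room was nice but the food was awful")` returns `"negative"`, because 0.4 + (-0.9) = -0.5.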

One method, for example, only looked at adjectives, which we considered the most useful part of speech for sentiment detection. Another one looked for emphasizing words like “very” or “great” directly in front of sentiment words (from the dictionary). On spotting such an emphasizing word, the value of the sentiment word was increased by a constant factor.
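
The two variants just described could look roughly like this. It is a sketch under assumptions: tokens arrive already part-of-speech tagged with Penn Treebank tags (adjectives start with `JJ`), the dictionary is a made-up toy, and `BOOST = 1.5` is a hypothetical value, since the constant factor isn’t stated in this post.

```python
# Hypothetical toy dictionary (values on the -1..+1 scale).
LEXICON = {"good": 0.5, "nice": 0.4, "bad": -0.6, "awful": -0.9}
INTENSIFIERS = {"very", "great"}  # emphasizing words mentioned in the post
BOOST = 1.5  # hypothetical constant factor


def adjective_score(tagged, lexicon=LEXICON):
    """Sum dictionary values of adjectives only; `tagged` is [(word, tag), ...]."""
    return sum(lexicon.get(word.lower(), 0.0)
               for word, tag in tagged if tag.startswith("JJ"))


def intensified_score(tokens, lexicon=LEXICON,
                      intensifiers=INTENSIFIERS, boost=BOOST):
    """Sum dictionary values, scaling a sentiment word by `boost` when the
    token directly in front of it is an emphasizing word."""
    total = 0.0
    previous = None
    for token in tokens:
        word = token.lower()
        value = lexicon.get(word, 0.0)
        if value and previous in intensifiers:
            value *= boost  # grows the magnitude of positive and negative words alike
        total += value
        previous = word
    return total
```

Multiplying rather than adding keeps the sign of the sentiment word, so “very bad” becomes more negative instead of drifting towards zero.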

Besides these rather simple arithmetic SD methods we also investigated the performance of some machine learning algorithms on the task of sentiment detection. We adapted the machine learning algorithms provided by the LingPipe and OpenNLP frameworks to our task and trained them on parts of the web-based reviews we had previously crawled from Amazon, TripAdvisor and IMDb. After training we applied the SD classifiers to the parts of the reviews that were not used for training. Our “machine learning” group covered the following algorithms: language models, Naive Bayes and maximum entropy models.
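
For readers unfamiliar with the machine learning side, here is a from-scratch multinomial Naive Bayes classifier in Python, one of the algorithm families listed above. It is neither the LingPipe nor the OpenNLP API and not our actual setup; it assumes whitespace tokenization, uses add-one (Laplace) smoothing and simply ignores words it never saw during training.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            logp = math.log(count / n_docs)  # log prior of the class
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    logp += math.log((self.word_counts[label][word] + 1)
                                     / (self.totals[label] + v))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

Working in log space avoids numeric underflow when long reviews multiply many small word probabilities together.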

Finally, we also had a group of SD methods that took a combined approach: before applying the arithmetic SD methods to the reviews, a subjectivity filter was used to filter out the sentences that don’t contain sentiment.
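
The combined approach boils down to filtering first and summing afterwards. In the Python sketch below the subjectivity filter is a pluggable function; `toy_is_subjective`, which treats sentences containing a first-person pronoun as subjective, is a hypothetical placeholder for a real subjectivity classifier, and the dictionary is again a made-up toy.

```python
import re

# Hypothetical toy dictionary (values on the -1..+1 scale).
LEXICON = {"great": 0.8, "good": 0.5, "bad": -0.6, "awful": -0.9}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def lexicon_score(sentence):
    """Sum the dictionary values of all words in one sentence."""
    return sum(LEXICON.get(t, 0.0) for t in re.findall(r"[a-z']+", sentence.lower()))


def toy_is_subjective(sentence):
    """Placeholder filter: a sentence with a first-person pronoun counts as subjective."""
    return re.search(r"\b(i|we|my|our)\b", sentence.lower()) is not None


def filtered_score(text, is_subjective):
    """Drop sentences the filter rejects, then sum the scores of the remaining ones."""
    return sum(lexicon_score(s) for s in split_sentences(text) if is_subjective(s))
```

In the review “The hotel is called Good Inn. We had an awful night.” the unfiltered sum is 0.5 + (-0.9) = -0.4, while the filtered score is -0.9, because the objective first sentence (with “Good” only in a name) is discarded.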

All in all we came to the conclusion that the simple arithmetic methods are not as bad as we thought, but they were outperformed by the more sophisticated maximum entropy model (machine learning approach). The advantage we see in the simple methods that rely on the tagged dictionary is that they can be applied at the sentence level and are more generally applicable (no training is necessary). Due to the lack of training data at the sentence level we couldn’t train our machine learning algorithms to detect the sentiment of sentences, only of whole articles.

To overcome this shortcoming we developed a symmetric verification game (in the spirit of Luis von Ahn, who created the first “game with a purpose”) to create a training corpus at the sentence level, leveraging the vast number of Facebook users.
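
The rules of the game aren’t spelled out here, so the following Python sketch only illustrates the general agreement principle behind von Ahn’s games with a purpose: a label for a sentence is accepted once a hypothetical quorum of independent players has given the same answer. The class name, the quorum of 2 and the label strings are all assumptions, not the Sentiment Quiz’s actual implementation.

```python
from collections import defaultdict


class AgreementCollector:
    """Accept a sentence's label once `quorum` different players agree on it."""

    def __init__(self, quorum=2):
        self.quorum = quorum
        self.votes = defaultdict(dict)  # sentence -> {player: label}
        self.accepted = {}              # sentence -> agreed label

    def submit(self, player, sentence, label):
        """Record one answer; return the accepted label for the sentence, if any."""
        # Keyed by player, so one player repeating an answer never counts twice.
        self.votes[sentence][player] = label
        if sentence not in self.accepted:
            agreeing = sum(1 for given in self.votes[sentence].values()
                           if given == label)
            if agreeing >= self.quorum:
                self.accepted[sentence] = label
        return self.accepted.get(sentence)
```

Requiring agreement between independent players is what makes the collected labels usable as training data without an expert checking every sentence.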

Please contact me if you are interested in or have questions on any of the topics mentioned above. Also take a look at the Sentiment Quiz on Facebook; every month we give out prizes to the best players.

Some Impressions of ECAI 08:

Posted in computer science, conference, data mining, science, work | No Comments »

Styrian Spring in Vienna

June 1st, 2008 by admin

The event this post refers to took place in April 2008. So I’m reporting rather late about it, but I think you should tell people when you meet a star or get close to one like I did ;) no matter when it happened.

The “Styrian Spring in Vienna” is an event that promotes Styria and its treasures to the people living in Vienna. But what are Styrian treasures? Governor Schwarzenegger? Well, he definitely is pure gold for Styria, but not one of Styria’s treasures in the context of this post. The treasures mainly are Kernöl (pumpkin seed oil), Styrian wine and apples.

To bring a little countryside feeling to the big city, the Styrians put up some tents, benches and tables in front of Vienna City Hall:

This would be the City Hall


Here we’ve got the tents, benches and tables


Never forget to put a name tag on things people shall remember


Johann Lafer was just 5 meters away from our bench. He cooked for the VIPs in a little hut that was separated from the public. But we could peek at him through one of the hut’s windows:

Johann Lafer cooking for the VIPs in a hut in front of Vienna’s City Hall.



Meeting Johann was a really amazing moment of life ;)

Posted in Uncategorized | No Comments »
