CV
Work Experience
| Jan 08 - Dec 09 | Researcher, Modul University Vienna, New Media Technology Department |
| Jun 06 - Jan 08 | Software Developer, Know-Center Graz, Knowledge Extraction Department |
| Feb 06 - Jun 06 | Freelancer |
| Jul 00 - Jan 06 | Intern, ConfigWorks Klagenfurt, Infineon Technologies Villach, Uniquare Krumpendorf |
Education
| Sep 04 - Mar 06 | Master Studies in Computer Science at Klagenfurt University |
| Sep 01 - Sep 04 | Bachelor Studies in Computer Science at Klagenfurt University |
| Sep 96 - Jul 01 | Higher Technical College Villach – Department for Electrical Data Processing and Organization |
| Sep 92 - Jul 96 | Bundesrealgymnasium St. Martin, Villach |
Professional Skills
| Programming Languages | ABAP, Bash, Assembler, C, C++, COBOL, Fortran, Groovy, Java, JavaCC, JavaScript, LISP, MS Visual Basic, Perl, PHP, Prolog, Python |
| Web-Technologies | AJAX, Apache Server, CGI, CSS, Google Analytics, Google AdSense, HTML, Java RMI, Java Server Pages, Java Servlets, MooTools, RDF, Tomcat Server, Yahoo User Interface, XML |
| Retrieval Technologies | Amazon WS, Delicious API, GATE, Hadoop, HBase, JLSI, JWI (WordNet), LingPipe, Lucene, Maximum Entropy Models, Support Vector Machines, OpenNLP Tools, Yahoo BOSS API |
| Mathematical Software | Colt Library, Mathematica, MATLAB, Octave, R |
| Client Applications | Microsoft Office, Microsoft Visio, VMware |
| DB-Technologies | Derby, JPOX, HSQL, MS Access, MySQL, Oracle, PostgreSQL, Sesame Triple Store, SQL |
| Modeling and Specification | Entity Relationship Modeling, Petri Nets, Unified Modeling Language, Z Specification Language |
| Other | Eclipse, SAP, LaTeX, LINUX/UNIX, @enterprise Workflow Management System, JProfiler, Ethereal |
Projects I Contributed To
RAVEN Research Project
Project length: 18 Months
The aim of the RAVEN project was to improve the accuracy of the sentiment detection methods developed in previous projects at Vienna University of Economics and Business Administration. Sentiment detection (SD) methods try to determine automatically whether their input text expresses a positive or negative opinion, and how intense that opinion is. The original SD-methods used a tagged dictionary that mapped common words (e.g. “bad”, “good”, “evil”) to their associated sentiment values (e.g. “good” -> “1”). Roughly speaking, the original SD-methods looked up every word of a given text in this dictionary and summed all the associated sentiment values. If the sum was positive, the input text was taken to have a positive charge, otherwise a negative one. The first step in the project was a comprehensive evaluation of the existing SD-methods, for which their classification accuracy was calculated on a number of different test data sets. The outcome of the evaluation was summarized, presented at a conference and used as a baseline for further improvements to the SD-methods.
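The dictionary-based scoring described above can be illustrated with a minimal Python sketch. The three-word lexicon, its values and the function names are illustrative assumptions, not the project's actual dictionary or code.

```python
import re

# Hypothetical tagged dictionary: word -> sentiment value (illustrative values only).
SENTIMENT_LEXICON = {"good": 1.0, "bad": -1.0, "evil": -1.0}


def sentiment_score(text, lexicon=SENTIMENT_LEXICON):
    """Sum the sentiment values of all known words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


def classify(text, lexicon=SENTIMENT_LEXICON):
    """Positive sum -> 'positive', anything else -> 'negative'."""
    return "positive" if sentiment_score(text, lexicon) > 0 else "negative"
```

A text with two occurrences of “good” and one of “bad” sums to 1.0 and is classified as positive. A text without any dictionary words sums to 0 and falls into the negative class, which shows one weakness of the plain summing rule.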
The first improvements to the SD-methods were extensions of the dictionary. The lexical resource WordNet was used to find synonyms for the words in the original dictionary, which then inherited the sentiment values of those words. The original sentiment dictionary was also merged with a second one created by another research group. After that, supervised learning methods such as a Maximum Entropy model were used to learn a classifier from training data. The usefulness of part-of-speech information (e.g. “evil -> adjective”) in combination with the words from the sentiment dictionary was tested as well. Furthermore, grammatical patterns were learned from a large corpus to identify negation triggers more reliably. To generate more synonyms for the words in the tagged dictionary, we tried several techniques that calculate similar words (and in the best cases synonyms) from arbitrary input texts. Among the methods tried were Co-Occurrence Analysis, Latent Semantic Analysis and Pointwise Mutual Information IR.
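As a rough illustration of how word similarity can be estimated from co-occurrence, the following sketch computes pointwise mutual information over a small set of documents. It is a simplified, document-level variant written for this page; it is not the PMI-IR method (which relies on search engine hit counts) and not the project's code, and the function name is an assumption.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Compute document-level PMI for every word pair that co-occurs at least once."""
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Count each word (and each sorted word pair) once per document.
        words = sorted(set(doc.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    table = {}
    for (a, b), n_ab in pair_counts.items():
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
        table[(a, b)] = math.log2(n_ab * n_docs / (word_counts[a] * word_counts[b]))
    return table
```

Pairs with a high PMI occur together more often than chance would predict and are candidates for inheriting each other's sentiment value; pairs with a negative PMI co-occur less often than expected.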
When the project was first reviewed by the funding agency, the conclusion was that all the approaches tried so far could only boost recall, not precision. For the remaining time of the project it is planned to include topic information in the SD-methods. For this purpose, machine learning techniques (e.g. Support Vector Machines) and clustering techniques (e.g. k-means) will be used.
Lessons learned:
- In Machine Learning it’s all about data preparation.
- Communication among the team members is even more important than in non-research projects because the tight schedule and the limited funding force you to get things right the first time you do them.
- Unique and interesting research questions don’t come easy. The first ideas that come to mind are commonly the ones that have been tried before by others.
APA Brockhaus News Mashup
Project length: 6 Months
The aim of this project was to enrich and supplement press articles from the Austria Presse Agentur (APA) with bibliographic knowledge from the Brockhaus Encyclopedia. The APA maintains a search engine called PowerSearch. Search queries can be sent to PowerSearch to retrieve news articles. The main goal of the project was to build a web interface for PowerSearch that lets the user enter a query and then automatically (or manually) expands the query with synonyms for its terms. These synonyms come from the Brockhaus Synonym Dictionary. Hence, if the user enters “Abenteuer Tourismus”, the search also covers Erlebnis, Robinsonade, Aventüre, Experiment, Risiko, Unterfangen, Wagnis, Eskapade and Vabanquespiel as alternatives to Abenteuer, as well as Fremdenverkehr, Reisen, Reiseverkehr, Reiseverkehrswesen and Urlaubsreiseverkehr as alternatives to Tourismus. The main obstacles in achieving this goal were, firstly, the parsing needed to transform the Brockhaus Synonym Dictionary from its XML representation into a relational database and, secondly, the cumbersome data cleansing in the relational database.
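The query expansion step can be sketched as follows. The synonym table is a small hypothetical excerpt, the function names are assumptions, and the AND/OR query syntax is assumed for illustration only; the actual PowerSearch query language is not described here.

```python
# Hypothetical excerpt of a synonym table; the real data came from the
# Brockhaus Synonym Dictionary stored in a relational database.
SYNONYMS = {
    "abenteuer": ["Erlebnis", "Risiko", "Wagnis"],
    "tourismus": ["Fremdenverkehr", "Reisen"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return one list of alternatives per query term: the term plus its synonyms."""
    return [[term] + synonyms.get(term.lower(), []) for term in query.split()]


def to_boolean_query(expanded):
    """Render the alternatives as an AND of OR groups (an assumed query syntax)."""
    groups = ["(" + " OR ".join(alternatives) + ")" for alternatives in expanded]
    return " AND ".join(groups)
```

Terms without an entry in the synonym table pass through unchanged, so the expanded query never returns fewer alternatives than the original one.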
The second goal of the project was to search for persons and locations in news articles. As we came to the conclusion that the GATE (General Architecture for Text Engineering) framework was too bloated to fulfill this task in reasonable time, we decided to use a statistical named entity recognizer instead. We used the Stanford Named Entity Recognizer, which is based on Conditional Random Fields (similar to Maximum Entropy Models, but a more general concept). The Stanford NER is state of the art but unfortunately trained on English texts. Due to the lack of German training data for NER we could only use it to spot locations. Fortunately, the APA altered the goal so that the press articles were annotated with persons from PowerSearch. Once we knew the locations and persons in the press articles, the rest of the task was simple: the relevant articles had to be looked up in the Brockhaus Encyclopedia and presented to the user.
Lessons learned:
- Gained experience in writing XML parsers
- Data Cleansing is cumbersome but worth the time
- Statistical Named Entity Recognition is interesting
- Maximum Entropy Models are versatile
- There is a lack of German training data for NER
LexiScout
Project length: 12 Months
The outcome of this project was a browser extension called LexiScout that makes it possible to search selected Brockhaus data sources (both online and offline). LexiScout was integrated into the Google search page. In parallel with every Google search, a LexiScout search was conducted, and its results replaced (or augmented) the Google Search Ads. LexiScout went through the software development cycle for an entire year. The initial aims were achieved and LexiScout went into public beta. Unfortunately, the project was terminated in the public beta phase for non-technical reasons.
In terms of innovation, LexiScout was a real Web 2.0 tool and could have been a top seller for Brockhaus. Leveraging the popularity of Google by hooking into the page with a browser extension demanded a lot of technical research. To my knowledge we developed the first extension for Internet Explorer that could inject custom JavaScript into any page. This capability lets you alter every page and use it for your own purposes. From a computer security point of view it is problematic: injecting arbitrary JavaScript enables a lot of harmful things such as cross-site scripting, a technique which is actually prohibited by IE and Firefox.
Lessons learned:
- Trust your initial concerns about a project and get the project managers to think about them.
- Details make work cumbersome and dull.
- Gained very good JavaScript skills and improved my Java, database and XML skills.
- Never provoke a big player like Google
Email Information Extraction, Visualization & Processing
Project length: 4 Months
The aim of this project was to extract specific information from certain emails received in Outlook, store this information in a relational database, present the extracted information to the user in a GUI and allow the user to act on the data. The whole project was carried out using common software engineering techniques. First, the customer’s requirements were patiently elicited and a requirements specification was written. After the customer had confirmed the requirements, the system was modeled using UML use case diagrams, static class diagrams, ER diagrams and some drawings of the forms that would make up the GUI. After the diagrams had been discussed with the customer, the system was implemented and a handbook was written.
The emails contained information about cars that people wanted to sell on the internet. The information to be extracted consisted of product details such as horsepower, make or color and user details such as the seller’s address. Extracting this information wasn’t very hard and could be accomplished with a decent set of regular expressions. Surprisingly, the hardest part was “freeing” the emails from Outlook with Java and importing the extracted information into a DB. The rest of the project was straightforward. The GUI had to be developed with Java Swing and tailored to the wishes of the customer. This was achieved via rapid prototyping. Among the actions the user could perform on the data was sending batch notifications to a selected group of car sellers. This was also a little tricky because, instead of email addresses, certain URLs had to be used via GET and POST requests. As there was no public API specification for the URLs, their semantics had to be reverse-engineered using the packet sniffer Ethereal and some Firefox plugins. During and after the project the semantics of these URLs changed a couple of times, so the software had to be adapted.
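The regular-expression extraction can be sketched like this. The original implementation was written in Java; the sketch uses Python for brevity, and the field labels (“Make:”, “Color:”, “Power:”) and function name are hypothetical, since the real email layout is not described here.

```python
import re

# Hypothetical field patterns; the real email layout is an assumption.
PATTERNS = {
    "make": re.compile(r"^Make:\s*(.+)$", re.MULTILINE),
    "color": re.compile(r"^Color:\s*(.+)$", re.MULTILINE),
    "horsepower": re.compile(r"^Power:\s*(\d+)\s*(?:hp|HP)\b", re.MULTILINE),
}


def extract_fields(body):
    """Return a dict of the fields found in the email body; missing fields are omitted."""
    record = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(body)
        if match:
            record[name] = match.group(1).strip()
    return record
```

Keeping one pattern per field in a table makes it cheap to adjust a single expression when the layout of the incoming emails changes.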
Lessons learned:
- Systems behave differently under different circumstances – especially under heavy load.
- When there is no public API and Information Extraction methods are used, their adaptation to changing information and/or layout can be an ongoing effort.
- Though quality is defined by the customer, you have to define in advance what the limits of the system will be.
Master Thesis
Project length: 10 Months
I wrote my master thesis for the same company I did my last internship with – ConfigWorks. The company’s products are recommender systems for a broad spectrum of products (digital cameras, expensive wines, cigarettes, financial products like insurance). Over the years they developed a framework (the ConfigWorks Advisor Suite) that allows them to build recommender applications for new products and new customers very quickly. The principle of every generated recommender application is basically the same: domain experts are asked which questions they use to get an understanding of the customers’ requirements. This scenario is comparable to a shop where the salesperson asks you: “What do you want to buy?”, “How many megapixels should the camera have?”, “Should the camera be very compact?”. The salesperson guides you through a streamlined selling dialogue. This dialogue is converted into the recommender application and the final outcome is a set of filter rules. From all the possible products a customer could buy, these filters choose just the products that fit the customer’s requirements.
Normally, this whole process leads to very good recommendations, but situations can occur in which a small subset of the filters is contradictory. Contradictory means that two filters cannot be satisfied at the same time, which translates to: no products are recommended to the customer. This is a very frustrating experience for the customer, who has gone through a possibly long recommendation dialogue and receives no recommendation just because two filters are contradictory. The goal of my master thesis was to handle this problem by relaxing the set of filters (removing a subset of filters from the initial contradictory set) in a way that leads to product recommendations that are acceptable for the customer and satisfy as many of his requirements as possible.
My approach was to view the whole problem as a diagnostic problem and map it to model-based diagnosis techniques, which have their roots in Artificial Intelligence. I ended up implementing Reiter’s algorithm for finding minimal diagnoses in combination with Junker’s algorithm for finding minimal sets of conflicts. The output of the whole work was not only my master thesis but also a paper which was accepted at the IEA/AIE Conference 2006 and received a Best Paper Award on the spot.
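The relaxation problem itself can be illustrated with a brute-force sketch: try removing ever larger subsets of filters until at least one product remains. This is deliberately not Reiter’s or Junker’s algorithm, which avoid the exhaustive search; it only returns relaxations of minimum size, and the product fields and filter names in the usage note are made-up examples.

```python
from itertools import combinations


def matching_products(products, filters):
    """Return the products that satisfy every filter in the given list."""
    return [p for p in products if all(f(p) for f in filters)]


def minimal_relaxations(products, filters):
    """Find the smallest sets of filter names whose removal yields at least one product.

    `filters` maps a filter name to a predicate over a product dict.
    """
    names = list(filters)
    for size in range(len(names) + 1):
        found = []
        for removed in combinations(names, size):
            kept = [filters[n] for n in names if n not in removed]
            if matching_products(products, kept):
                found.append(set(removed))
        if found:
            return found  # all relaxations of minimum cardinality
    return []  # only reached when there are no products at all
```

For a catalogue with a 12-megapixel camera that is neither compact nor cheap and a compact, cheap 8-megapixel camera, the filters “at least 10 megapixels”, “compact” and “at most 200 euros” are contradictory, and dropping only the megapixel filter is the single smallest relaxation. The exhaustive search is exponential in the number of filters, which is why conflict-directed techniques are attractive in practice.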
Lessons learned:
- Massive accessory does not always lead to the best results
Professional and Research Interests
Data Mining, Information Retrieval, Machine Learning, Opinion Mining, Natural Language Processing, Named Entity Recognition, Decision Support Systems, Mobile Devices like the HTC Hero and its open operating system Android, Recommender Systems, Geo-based services, Web Technologies and Web-based Systems, Software Engineering, Data Modeling.
References
Dr. Michael Granitzer (Know-Center, Graz)
Prof. Dietmar Jannach (TU Dortmund)
Publications
Gindl S., Liegl J., Scharl, A. and Weichselbraun, A. (2009): “An Evaluation Framework and Adaptive Architecture for Automated Sentiment Detection”, Networked Knowledge – Networked Media (Springer Studies in Computational Intelligence). Eds. S. Schaffert, K. Tochtermann and T. Pellegrini. Berlin: Springer.
Gindl S. and Liegl J. (2008): “Evaluation of Different Sentiment Detection Methods for Polarity Classification on Web-Based Reviews”, 18th European Conference on Artificial Intelligence (ECAI-2008), ECAI Workshop on Computational Aspects of Affectual and Emotional Interaction, Patras, Greece.
Jannach D., Liegl J.: Conflict-Directed Relaxation of Constraints in Content-Based Recommender Systems, Proceedings of the 19th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE’06), Annecy, France, 2006.
Timmerer C., Kofler I., Liegl J., and Hellwagner H., An Evaluation of Existing Metadata Compression and Encoding Technologies for MPEG-21 applications, Proceedings of the First IEEE International Workshop on Multimedia Information Processing and Retrieval (IEEE-MIPR 2005), Irvine, California, USA, December 2005, pp. 534-539.
