Folksonomies

As Clay Shirky argues, and as the wild success of Wikipedia, WordNet, Digg, and del.icio.us demonstrates, folksonomy is a key collaborative knowledge creation method that overcomes the fatal flaws of ontology-centered design (and even the half-measures approach of Topic Maps). Essentially this is the OSS-over-proprietary, The Cathedral vs. The Bazaar story in another form.

But for some Artificial Intelligence approaches we still need ontologies. While PUP's focus on computer programs as source material remains a fundamental key to success, learning from folksonomy sources will also be effective.

Some folks are trying to use Wikipedia in various forms:

http://www.physorg.com/news87276588.html

http://dbpedia.org/
Extracts Wikipedia Infobox properties into RDF.

"Folksonomies - Cooperative Classification and Communication Through Shared Metadata" by Adam Mathes discusses how social tagging works. Interestingly the guy (Rashmi Sinha, uhh, turns out it is gal from India) who wrote that has a company that has created very impressive applications including SlideShare which operates over Amazon S3.

And "A cognitive analysis of tagging (or how the lower cognitive cost of tagging makes it popular)" is another article on the same topic.

One more article: "Tagging vs. Cataloging: What It's All About" by Chiara Fox.

A research report on tagging from the Pew Internet & American Life Project.

"Folksonomies:Tidying up Tags?", Guy & Tonkin, January 2006, D-Lib Magazine.

But I think the way to go is to do more processing of the sources for concept learning before trying to create general language applications.

http://www.informatik.uni-trier.de/~ley/db/conf/icsoft/icsoft2006-2.html

    @inproceedings{DBLP:conf/icsoft/UnalA06,
      author    = {Ozgul Unal and Hamideh Afsarmanesh},
      title     = {Using linguistic techniques for schema matching},
      booktitle = {ICSOFT (2)},
      year      = {2006},
      pages     = {115-120},
      crossref  = {DBLP:conf/icsoft/2006},
      bibsource = {DBLP, http://dblp.uni-trier.de}
    }
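As a toy illustration of the linguistic-matching idea in that paper (not the authors' actual algorithm), schema element names can be scored by WordNet similarity. This sketch assumes NLTK with the WordNet corpus installed, and the schema element names are made up:

    # Toy name-based schema matching: score element-name pairs from two
    # schemas by their best WordNet path similarity.  This illustrates the
    # general linguistic-matching idea, not the Unal & Afsarmanesh method.
    from itertools import product
    from nltk.corpus import wordnet as wn

    def name_similarity(a, b):
        """Best path similarity over all noun sense pairs, else 0."""
        best = 0.0
        for sa, sb in product(wn.synsets(a, pos=wn.NOUN),
                              wn.synsets(b, pos=wn.NOUN)):
            best = max(best, sa.path_similarity(sb) or 0.0)
        return best

    schema_a = ["car", "price", "vendor"]        # hypothetical schema
    schema_b = ["automobile", "cost", "seller"]  # hypothetical schema
    for a, b in product(schema_a, schema_b):
        print(f"{a:8} ~ {b:12} {name_similarity(a, b):.2f}")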

Many related concepts

http://www.semanticweb.gr/index.php?title=Semantic_IE&oldid=1796

"Extracting Meaning From the Structure of Networks": http://tech.slashdot.org/tech/08/05/03/2013207.shtml

A taxonomy of ontologies

See "Table 1. A Typology of Data Standards" in "Introduction to Metadata", edited by Murtha Baca: http://www.getty.edu/research/conducting_research/standards/intrometadata/setting.html

A discussion with Amnon


From an email I sent to Amnon on 2006/03/06:

One thing I find lamentable is the sorry state of public (and proprietary) ontologies due to the lack of cooperation and reuse (and being incomplete and brittle doesn't help either ;). A thought I had today is that, given the success of WordNet, it might be a promising point of integration (especially since formal methods will probably continue to lag too far behind to change the state of affairs). Kind of like what BRICO did for OpenCyc, but applied to the foundational concepts of each ontology system (rather than just synsets for vocabulary). Then each ontology could be encoded using WordNet sense keys, and the reasoning systems could import those using their NLP analyzers.
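A minimal sketch of the sense-key encoding, assuming NLTK with the WordNet corpus installed (the concept labels are made-up examples; picking the right sense among the candidates is the part the NLP analyzers or human reviewers would handle):

    # Look up the WordNet sense keys that an ontology concept label could
    # be anchored to.  Each key (e.g. 'organization%1:14:00::') uniquely
    # identifies one sense of one lemma across WordNet versions.
    from nltk.corpus import wordnet as wn

    def sense_keys(label):
        """Return the sense keys for every WordNet sense of a label."""
        keys = []
        for synset in wn.synsets(label):
            for lemma in synset.lemmas():
                if lemma.name().lower() == label.lower():
                    keys.append(lemma.key())
        return keys

    for concept in ["organization", "vehicle", "process"]:
        print(concept, "->", sense_keys(concept))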

Not a free lunch, but a first pass could be mostly automated using VisualText for many ontologies with just a few parsers (KIF, OWL). Once the mappings are public and in WordNet, the various ontology developers could be enticed to do further work to make their respective encodings correct.
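A sketch of what such a first pass might look like for the OWL case, using rdflib rather than VisualText (the file name is a placeholder; the extracted labels would feed the sense-key lookup above):

    # Parse an OWL ontology and pull out its class labels, the raw
    # material for mapping foundational concepts to WordNet sense keys.
    from rdflib import Graph, RDF, RDFS
    from rdflib.namespace import OWL

    g = Graph()
    g.parse("ontology.owl", format="xml")  # placeholder RDF/XML file

    for cls in g.subjects(RDF.type, OWL.Class):
        for label in g.objects(cls, RDFS.label):
            print(cls, "->", str(label))   # candidate for sense_keys()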

Ahh, Google. This paper http://acl.ldc.upenn.edu/W/W98/W98-0713.pdf ends with a comment that suggests my thought may not be new:

Lehmann, F. 1995. "Combining Ontologies, Thesauri, and Standards." IJCAI-95 Workshop on Basic Ontological Issues in Knowledge Sharing, August 19-20, 1995, Montreal.

Can't find that paper, but in any case we all know we aren't lacking for ideas. It's the execution that's limited. Even so, maybe the ontological engineers will learn something from the lexicographers about collaboration. Maybe even make peace with the topic map folks...

Here we go. Indeed someone has already implemented this idea: LOM (lexicon-based ontology mapping), http://projects.teknowledge.com/DAML/.

--Jim White, 26-Jan-2007


And a follow-up email to Amnon on 2006/03/06:

I've been stuck a long while on how to get just basic things going on PUP, but I've kept thinking.

Spent all night digging through Wikipedia, and let me tell you, it is the mother lode for information extraction. I already have a ton of ideas on multilingual analysis and ontology generation (you'll recall my earlier email with the long blurb about my first idea for using VisualText). I found a paper from some guys who extracted some WordNet entries from Wikipedia, but their flaw is attempting to statistically generate the rules. Also I can't find any sign of further work.

http://www.ii.uam.es/~castells/publications/nldb05.pdf

Wikipedia's extensive tagging and idiomatic usage make IE from it a slam-dunk application for TAIParse and your WordNet rules. The first thing I will do is extract Named Entities as OWL (mostly because RDF is the way to go for web annotation and the Open World model is necessary; SUMO is interesting because it is mapped to WordNet, but I don't want to deal with KIF). WordNet would be a good output format too, but I've seen a claim that it isn't well suited to proper nouns.
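A hedged sketch of that output step using rdflib (the namespace URI is a placeholder, and the hard-coded entity list stands in for whatever the parser would actually extract):

    # Emit extracted Named Entities as OWL individuals in RDF/XML.
    from rdflib import Graph, Literal, Namespace, RDF, RDFS
    from rdflib.namespace import OWL

    EX = Namespace("http://example.org/wikipedia-entities#")  # placeholder

    g = Graph()
    g.bind("ex", EX)
    g.bind("owl", OWL)

    # (entity name, entity class) pairs as a parser might produce them
    entities = [("Tim Berners-Lee", "Person"), ("Amazon S3", "Service")]

    for name, cls in entities:
        g.add((EX[cls], RDF.type, OWL.Class))
        individual = EX[name.replace(" ", "_")]
        g.add((individual, RDF.type, EX[cls]))
        g.add((individual, RDFS.label, Literal(name)))

    print(g.serialize(format="xml"))  # RDF/XML, ready for annotation use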

With just the extracted entities, a super sexy public service can be launched that provides Semantic Web annotation for Wikipedia (via Piggy Bank and Annotea). There are folks with plans for manual tagging, which is dandy since we'll provide a way for humans to examine the output and accept or reject it (simply by providing a different wiki tag). Naturally the entity KB will be a hot item itself.

http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation_english?publ_id=1055

http://meta.wikimedia.org/wiki/Transwiki:Wikimania05/Paper-IM1

http://www.tagpatterns.com/

But the best part is feeding the entity KB back into TAIParse to provide ultra-high-quality tagging of whatever text folks have. Beyond that is stuff that will take some work (web of trust and versioning) and thinking (language translation, concept/ontology analysis, et al., into the wild blue yonder).
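A sketch of the feedback idea, with a plain gazetteer matcher standing in for TAIParse (the tiny KB is hard-coded for illustration):

    # Use the extracted entity KB as a gazetteer to tag arbitrary text.
    import re

    entity_kb = {"Tim Berners-Lee": "Person", "Amazon S3": "Service"}

    # Longest names first so longer mentions win over shorter overlaps.
    pattern = re.compile("|".join(
        re.escape(name) for name in sorted(entity_kb, key=len, reverse=True)))

    def tag(text):
        """Wrap each known entity mention in an inline tag with its class."""
        return pattern.sub(lambda m: f"[{entity_kb[m.group(0)]}: {m.group(0)}]",
                           text)

    print(tag("Tim Berners-Lee's slides are hosted on Amazon S3."))
    # -> [Person: Tim Berners-Lee]'s slides are hosted on [Service: Amazon S3].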

Anyhow, I've got VisualText running on my PC and am exploring the interface and the TAIParse stuff. I've got some Wikipedia XML downloaded, and the first thing I'm gonna do is whip up the WikiWord rules.
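As a first approximation (a plain regular expression rather than actual VisualText/NLP++ rules), the WikiWord pattern is just a run of two or more capitalized word parts:

    # CamelCase WikiWord matcher: two or more capitalized runs glued together.
    import re

    WIKIWORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")

    text = "See the WikiWord rules linked from the FrontPage of MyWiki."
    print(WIKIWORD.findall(text))  # -> ['WikiWord', 'FrontPage', 'MyWiki']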

--Jim White, 26-Jan-2007