02
Jun 07

Classifier plus Tagger equals ClassiTagger

Path: mv.asterisco.pt!mvalente
From: mvale…@ruido-visual.pt (Mario Valente)
Newsgroups: mv
Subject: Classifier+Tagger=ClassiTagger
Date: Sat, 02 Jun 07 21:47:21 GMT

Imagine that I have a /home dir full of documents of
several different types (DOC, PDF, PPT, TXT, etc.) and,
although I have some specific folders for specific stuff,
most of these documents still need to be classified into
specific folders (or just tagged, if you use tags).

Although I have scoured the Internet for some utility
or app that would do it for me automatically (it’s *boooring*
going through each file, opening it, analyzing the content and
deciding which folder it should go to), I have found
nothing of the kind. No, I don’t want some app that indexes
the contents and lets me search: I want an app that
looks through the files and moves them to specific folders
(created on demand by the app itself), or just tags them
according to their content.

Does anyone know of something like that? Send suggestions to
mvalente@ruido-visual.pt and/or mfvalente@gmail.com.

— MV

PS – I’ve even gotten so desperate as to start hacking a
script in Python to do it for me; below follows the code
for the ClassiTagger….

88
from operator import itemgetter

MINFREQUENCY = 5    # an n-gram must occur at least this often to become a tag
MAXNGRAMTAGS = 10   # print at most this many tags per n-gram length

filename = 'texto.txt'

# Common words that should never be part of a tag
stopwords = set((
    'the', 'a', 'an', 'and', 'or', 'not', 'if', 'then', 'else', 'i', 'you', 'he', 'she', 'we', 'them', 'us', 'to',
    'your', 'of', 'off', 'is', 'in', 'on', 'for', 'that', 'this', 'can', 'have', 'are', 'it', 'be', 'at',
    'with', 'will', 'use', 'do', 'see', 'as', 'which', 'from', 'by', 'should', 'into', 'some', 'these',
    'when', 'what', 'but', 'other', 'may', 'all', 'has', 'my', 'out', 'make', 'sure', 'like', 'get',
    'so', 'one', 'how', 'after', 'before', '*', '+', 'about', 'any', 'look', 'no', 'yes',
    'where', 'who', 'there', 'here', 'same', 'dont', 'more', 'than', 'also', 'up', 'down', 'must', 'yet', 'many', 'why',
    'was', 'his', 'her',
    "don't", "doesn't", "you'll", "it's",
))

print("CLASSITAGGER")
print("Classifying/Tagging file", filename, "\n")

# Read file
with open(filename, 'r') as inFile:
    content = inFile.read()

# Split by words (lowercased once, so that counting is case-insensitive)
words = [word.lower() for word in content.split()]

# Extract N-grams
# NOTE: the counting loop was truncated in the post as published; this is a
# reconstruction that counts every 1-, 2- and 3-gram ending at word i.
tags = {}
for i in range(len(words)):
    for n in (1, 2, 3):
        if i - n + 1 >= 0:
            ngram = tuple(words[i - n + 1:i + 1])
            tags[ngram] = tags.get(ngram, 0) + 1

# Print the most frequent tags: 3-grams first, then 2-grams, then single words
ranked = sorted(tags.items(), key=itemgetter(1), reverse=True)
for n in (3, 2, 1):
    maxngramtags = 0
    for ngram, count in ranked:
        # Skip rare n-grams and any n-gram containing a stopword
        if len(ngram) == n and count >= MINFREQUENCY and not any(word in stopwords for word in ngram):
            print(len(ngram), ngram, count)
            maxngramtags = maxngramtags + 1
            if maxngramtags == MAXNGRAMTAGS: break

88
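
The script above only prints candidate tags. The other half of
the wish (moving a file into a folder named after its top tag,
created on demand) could be sketched roughly as below. This is
only a sketch under assumptions: the names folder_for_tag and
move_to_tag_folder are hypothetical, the folder naming scheme is
made up, and the demo runs in a temporary directory rather than
a real /home.

```python
import os
import shutil
import tempfile


def folder_for_tag(tag):
    """Turn an n-gram tuple such as ('web', 'development') into a folder name."""
    # Keep only letters and digits so punctuation never ends up in a path.
    cleaned = ["".join(ch for ch in word if ch.isalnum()) for word in tag]
    return "_".join(cleaned)


def move_to_tag_folder(path, tag, base_dir):
    """Move the file at `path` into base_dir/<tag folder>, creating it on demand."""
    target_dir = os.path.join(base_dir, folder_for_tag(tag))
    os.makedirs(target_dir, exist_ok=True)  # no error if the folder already exists
    target = os.path.join(target_dir, os.path.basename(path))
    shutil.move(path, target)
    return target


# Small demonstration inside a throwaway directory, so nothing real is touched.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "texto.txt")
with open(demo_file, "w") as handle:
    handle.write("zope zope zope")

moved = move_to_tag_folder(demo_file, ("zope",), demo_dir)
print(moved)
```

Picking which tag wins (the most frequent 3-gram, the top single
word, or something smarter) is left open here; the sketch just
takes whatever tag it is handed.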


26
May 07

Asterisco.PT

Path: mv.asterisco.pt!mvalente
From: mvale…@ruido-visual.pt (Mario Valente)
Newsgroups: mv
Subject: Asterisco.PT
Date: Sat, 26 May 07 03:12:21 GMT

(This one was originally written in Portuguese…)

After several years holding the registration for the domain *.pt
(read as asterisco.pt, “asterisco” being Portuguese for
asterisk :-), I have decided to put the thing to some use.
It is also a way to get my hands dirty again with Zope, DTML,
Python and HTML/CSS.

It will be used to publish, every week, a set of links
about technology, media and telecommunications
in Portugal. The ones that I personally find important.

So, issue 0: http://www.asterisco.pt/

The layout sucks in Internet Explorer, but: I couldn’t
care less…

— MV


26
May 07

E Justice

Path: mv.asterisco.pt!mvalente
From: mvale…@ruido-visual.pt (Mario Valente)
Newsgroups: mv
Subject: E Justice
Date: Sat, 22 May 07 00:08:21 GMT

Just got back from Brussels, after a meeting about
E-Justice in the European Council. Let me tell you,
the jetset lifestyle is highly overrated… planes
suck, airports suck, hotels suck, being alone in
Brussels sucks… On the other hand, Mort Subite,
Duvel, Chimay, Grimbergen and Delirium Tremens don’t,
so thank God for small things…

http://en.wikipedia.org/wiki/Belgian_beer

There are kids out there creating decentralized
and distributed content portals and virtual worlds,
out of free software, and generating more real cash
than some European countries. And yet I just had to
sit through a day-long meeting (10am-6pm) discussing
the definition of e-justice, whether it’s a good idea,
the so-called obvious need for a centralized agency
to manage it and the consequent need for a “serious”
feasibility study. This also sucked…

— MV


19
May 07

The Future of Web Development

Path: mv.asterisco.pt!mvalente
From: mvale…@ruido-visual.pt (Mario Valente)
Newsgroups: mv
Subject: The Future of Web Development
Date: Sat, 17 May 07 20:08:21 GMT

Ever since I chose Zope (back in 1999) as the basis for
easy web development, I’ve been constantly looking for better
alternatives. One such alternative, back in 2005, was (for
me) the choice of Nuxeo CPS (which is Zope based).

http://www.zope.org/
http://www.cps-project.org/

That constant search was definitely not helped by the
development of Zope 3; technically it’s brilliant; it’s a
total mess in pragmatic terms. Even I, with a brain the
size of the universe, am unable to totally grasp its
concepts. Which figures: being the brainchild of
Jim Fulton, a genius, trying to actually grasp it means
your brain could explode.

The search was also completely botched by Nuxeo’s decision
to switch from Python development to Java development. I
guess stupidity is a God-given right, so I won’t say much
about that choice.

Meanwhile, lots of stuff has come up: Django, Pylons,
TurboGears, Ruby on Rails, you name it… I don’t want to
go over extensively why I don’t like any of them, but let
me just state this: if the “framework” allows for the
mixing of layout/design/HTML with code/programming, that
is enough of a disqualifier for me.

A mix of CherryPy with PyMeld is the closest thing that
I have found to what I have in mind, but it’s
still not enough, which is why I’ve been thinking about
developing a “framework” of my own (don’t we all…)

http://www.cherrypy.org/
http://entrian.com/PyMeld/
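
To illustrate the kind of separation meant here: in the minimal
sketch below the layout is plain markup with id attributes and
no code in it, and the Python side fills it in from the outside.
It uses the standard xml.etree.ElementTree module as a stand-in;
it is not PyMeld’s API, and the LAYOUT string and the render
function are made up for the example.

```python
import xml.etree.ElementTree as ET

# The layout: plain markup with id attributes and no code in it.
# A designer could edit this without touching any Python.
LAYOUT = """<html>
  <body>
    <h1 id="title">Placeholder title</h1>
    <ul id="links"><li>Placeholder link</li></ul>
  </body>
</html>"""


def render(title, links):
    """Fill the layout by element id; all logic stays on the Python side."""
    page = ET.fromstring(LAYOUT)
    page.find(".//*[@id='title']").text = title
    link_list = page.find(".//*[@id='links']")
    for child in list(link_list):  # drop the placeholder items
        link_list.remove(child)
    for text in links:
        ET.SubElement(link_list, "li").text = text
    return ET.tostring(page, encoding="unicode")


html = render("Asterisco.PT", ["Zope", "CherryPy"])
print(html)
```

The markup never contains a loop or a condition; the only
contract between the two sides is the set of id names.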

The Wheat project also seems to have some good ideas
but unfortunately it seems to have stopped:

http://www.wheatfarm.org/

My explanation for the current lack of better choices
is that there’s a complete lack of understanding of the
relation between the 3-tier architectural pattern and the
MVC architectural pattern. If you think they are one and
the same thing, you don’t know jackshit; if you know they
aren’t the same thing but fail to understand the relation
between those two, expect a post from me on the subject
one of these days. Meanwhile, go and read this:

http://www.tonymarston.net/php-mysql/infrastructure-faq.html#faq26

Still, while on the lookout, I’ve always kept an eye out
for the ActiveGrid guys ( http://www.activegrid.com/ ). If
there’s someone who’s thinking long term and in terms of
future standards use (BPEL, XPath, XForms, etc.), it’s them.

Their latest screencast is a feast for the eyes and mind
and puts to shame some of the “ooh-aah” demos that have
been floating around for the last 3 or 4 years. Go watch
it and cry:

http://www.activegrid.com/demo_screencast/empnom2/empnom2.html

— MV


19
May 07

My Way

Path: mv.asterisco.pt!mvalente
From: mvale…@ruido-visual.pt (Mario Valente)
Newsgroups: mv
Subject: My Way
Date: Sat, 17 May 07 06:08:21 GMT

“Now, you do just what
You choose to do
And I will be alone again tonight its true”

The Damned – Alone Again Or
http://www.youtube.com/watch?v=CTXSirNZlqA

“Regrets, I’ve had a few
But then again, too few to mention
I did what I had to do and saw it through without exemption

But through it all, when there was doubt
I ate it up and spit it out
I faced it all and I stood tall and did it my way

To think I did all that
And may I say, not in a shy way,
Oh, no, oh, no, not me, I did it my way

For what is a man, what has he got?
If not himself, then he has naught
To say the things he truly feels and not the words of one who kneels
The record shows I took the blows and did it my way!”

Frank Sinatra – My Way
http://www.dailymotion.com/video/x18cy8_frank-sinatra-my-way

— MV