Keyphrases & Keywords Analyzer

A web tool for analyzing the semantics of Search Engine Keyphrases (SEK)

Semantic analysis of keywords and keyphrases

The problem we are going to tackle in this section, together with one possible solution, is the following:

Can a machine "understand" a (key)phrase?

By "understanding a phrase" we mean the ability to associate a topic (meaning) with the phrase. I think others have already thought about this question, and I'm sure entire volumes have been written on the subject. Still, this time I do not want to go googling the topic: I want to think it through myself first and then hear your opinions!

Definition of the problem:

Let N be a set of n phrases { F }. We want to obtain a set of pairs { (Fi, Ci) } where Ci is the "category" (meaning) to which the phrase Fi belongs. The categories are not known a priori but must be determined by a statistical analysis of the phrases supplied as input.

Applications:

A possible application is obtaining "intelligent and dynamic" statistics on the keyphrases (and keywords) typed into search engines to reach our web sites.

Many specialized web sites offer access statistics services for a URL. They provide detailed reports on the phrases (referrers) that visitors typed into search engines to reach the site, but they often report only the keyphrase and the number of times (hits) it was used to arrive at the URL.

It would be much more interesting to get a report that dynamically groups all the phrases into a few categories (and even subcategories), with totals obtained by adding up the hits. A report like this would be invaluable to a webmaster who wants to understand which sections of the website are visited most. In this way the website manager can make better optimization and web marketing choices.
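As a rough illustration of such a report, the sketch below sums the hits per category. The sample phrases, the hit counts and the `category_of` mapping are all made up for the example; how the categories could actually be derived is the subject of the rest of this section.

```python
from collections import defaultdict

def grouped_report(hits_by_phrase, category_of):
    """Sum the hits of all keyphrases that fall in the same category."""
    totals = defaultdict(int)
    for phrase, hits in hits_by_phrase.items():
        # Phrases without a known category are collected under "(other)".
        totals[category_of.get(phrase, "(other)")] += hits
    # Most visited categories first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical referrer data: keyphrase -> hits.
hits_by_phrase = {
    "wireless linux": 12,
    "linux kernel": 7,
    "windows 2003 SP1": 5,
    "paging ADO recordset": 3,
}
# Hypothetical phrase -> category assignment.
category_of = {
    "wireless linux": "linux",
    "linux kernel": "linux",
    "windows 2003 SP1": "windows",
}

report = grouped_report(hits_by_phrase, category_of)
for category, total in report:
    print(f"{category}: {total}")
```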

Analysis of the (Italian-language) phrases used in search engines:

Fortunately, the phrases typed into a search form share common characteristics that can simplify the logical analysis. The main ones are:

Number of words: a search phrase very rarely contains more than 5 words
Grammatical structure: a search phrase generally has one of 2 kinds of grammatical structure:
- Object of the search: one or more words are the object of the search, for example: "wireless linux". If there are several words, the first one is generally the topic while the remaining ones are details of the search, for example: "windows 2003 SP1"
- Action and object of the search: a verb, generally in the infinitive, followed by the object of the search, for example: "paging ADO recordset"
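The two structures can be told apart with a very small heuristic, sketched below. It rests on two assumptions that are not part of the description above: a leading verb is recognised only by the usual Italian infinitive endings (-are, -ere, -ire), which will also match some nouns, and the phrase "paginare ADO recordset" is a hypothetical Italian form of the last example. The function name `analyze_phrase` is likewise invented for the illustration.

```python
# Assumption: Italian infinitives are recognised only by their usual endings.
INFINITIVE_ENDINGS = ("are", "ere", "ire")

def analyze_phrase(phrase):
    """Split a search phrase into an optional action, a topic and details."""
    words = phrase.lower().split()
    if not words:
        return None
    action = None
    # "Action and object" structure: a leading infinitive followed by the object.
    if len(words) > 1 and words[0].endswith(INFINITIVE_ENDINGS):
        action, words = words[0], words[1:]
    # "Object" structure: the first word is the topic, the rest are details.
    return {"action": action, "topic": words[0], "details": words[1:]}

print(analyze_phrase("windows 2003 SP1"))
print(analyze_phrase("paginare ADO recordset"))
```

A real analyzer would probably need a verb dictionary instead of a suffix test, since many Italian nouns share the same endings.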

A possible algorithm for grouping keyphrases could consist of two phases: the first one (learning phase) builds a database of related keyphrases; the second one detects the corresponding meanings (categories) by means of statistical operations:

Phase 1: "Learning":

For every phrase, determine a possible "object" of the search (which presupposes a logical analysis of the phrase). The object can consist of one or more words; any remaining words that are not "common" (common words being verbs, conjunctions, prepositions, etc.) belong to the set of words "related" to that object.
If the object found does not yet exist in the set of objects, add it; otherwise increment the counter (k) stored for that object.
Merge any "related" words just found with the ones already stored for the object.

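' Objects holds one entry per candidate object: (object, related words, counter k)
for each phrase in period
    for each word in phrase
        if is_not_common(word) then
            ' related = the other non-common words of the same phrase
            ' (other_not_common_words is a hypothetical helper)
            related = other_not_common_words(phrase, word)
            if Not Objects.Exists(word) then
                ' first occurrence: store the object with k = 1
                Objects.Add(word, related, 1)
            else
                ' already known: count it and merge the new related words
                Objects(word).k = Objects(word).k + 1
                Merge(word, related)
            end if
        end if
    next
next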

At the end of the loop we will have a set of records of the type: { (String Object, String() Related, Int K) }
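The learning loop can be sketched in runnable form as follows. This is only one possible reading of the pseudocode: the stop list `COMMON_WORDS` is a tiny hand-made stand-in for the real list of "common" words, the words "related" to an object are assumed to be the other non-common words of the same phrase, and the sample phrases are invented.

```python
# Assumption: a tiny hand-made stop list stands in for the "common" words.
COMMON_WORDS = {"di", "il", "la", "per", "con", "come", "e", "in"}

def is_not_common(word):
    return word not in COMMON_WORDS

def learn(phrases):
    """Build {object: {"related": set_of_words, "k": count}} from the phrases."""
    objects = {}
    for phrase in phrases:
        words = [w for w in phrase.lower().split() if is_not_common(w)]
        for word in words:
            # Assumption: related words are the other non-common words.
            related = {w for w in words if w != word}
            if word not in objects:
                objects[word] = {"related": related, "k": 1}
            else:
                objects[word]["k"] += 1
                objects[word]["related"] |= related  # the "Merge" step
    return objects

# Hypothetical input phrases.
objects = learn(["wireless linux", "linux kernel", "come installare linux"])
print(objects["linux"]["k"], sorted(objects["linux"]["related"]))
```

Here each dictionary entry plays the role of one (Object, Related, K) record.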

Phase 2: "Statistical analysis":

Compute the mean (m) and the variance (s) of the K values; the "categories" are defined as all the objects for which

k >= m + s
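A minimal sketch of this selection step is given below. It follows the text literally and uses the population variance for s; since a mean and a variance do not share the same unit, the spread function is left as a parameter so that the standard deviation (`statistics.pstdev`) can be tried instead. That alternative, the function name `find_categories` and the sample counters are assumptions made for the example.

```python
from statistics import mean, pstdev, pvariance

def find_categories(k_by_object, spread=pvariance):
    """Return the objects whose counter k satisfies k >= m + s."""
    counts = list(k_by_object.values())
    if not counts:
        return []
    m = mean(counts)
    # s is the variance by default, as in the text; pass pstdev to use
    # the standard deviation instead (an assumption, not from the text).
    s = spread(counts)
    return sorted(obj for obj, k in k_by_object.items() if k >= m + s)

# Hypothetical counters produced by the learning phase.
k_by_object = {"linux": 5, "wireless": 2, "kernel": 1, "windows": 3, "ado": 1}
print(find_categories(k_by_object))
print(find_categories(k_by_object, spread=pstdev))
```

With these sample counters both choices of s select only "linux"; on real data the variance gives a much stricter threshold than the standard deviation.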

More info:

Keyphrases Analyzer discussion on GT Forum (Italian)