My Thoughts

This is the place where thoughts and notes go. I'll try to keep it as organized as possible, for both our sakes.

Meeting # 1: 1/27/2012 I've learned that I'm going to have to be specific. Most people will not have a detailed knowledge of what my project is or what its research topics are. Terms like fuzzy logic or artificial intelligence need explaining; the latter is itself a blanket term encompassing dozens of fields of study. Next, I'm going to have to define the scope of the project. I knew that would be a problem, seeing as there currently is no defined project scope. I was also prompted to put up a page specifically for documents, which I have already done as of today. Other than that, my notes from the meeting were surprisingly scarce. They did remind me of some other things I had to do, though. (My laundry still wasn't done. Whoops!)

Thoughts # 1: 2/2/2012 Alright, so I'm going to try to get through the backlog of notes and ideas I've written down on paper that need to be up here on the wiki. For a chatterbot program, I'd taken notes on both determiners and Chomsky's system of transformational grammar. A determiner is a word that establishes the reference of a noun or noun phrase, including quantity, rather than its attributes (which are expressed by adjectives). Apart from the cardinal numbers (cardinal in the linguistic sense, not the mathematical one), determiners form a closed class of around fifty words. This means that for a program to 'know' these words would be simplicity incarnate: they could be hard-coded into the program's methods rather than kept as a malleable set of data stored in an external file (the storage method of choice for nouns, verbs, etc.).
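The hard-coding idea above could look something like this in Java. The class and method names are my own, and the word list here is partial and illustrative, not the full fifty-word class:

```java
import java.util.Set;

// Sketch of hard-coding the closed class of determiners.
// The list below is a partial, illustrative sample of the class.
public class Determiners {
    private static final Set<String> WORDS = Set.of(
        "a", "an", "the",                                   // articles
        "this", "that", "these", "those",                   // demonstratives
        "my", "your", "his", "her", "its", "our", "their",  // possessives
        "some", "any", "no", "every", "each", "either", "neither",
        "much", "many", "few", "little", "several", "all", "both"
    );

    public static boolean isDeterminer(String word) {
        return WORDS.contains(word.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(isDeterminer("The"));  // true
        System.out.println(isDeterminer("dog"));  // false
    }
}
```

Since the class is closed, a fixed set like this never needs to grow at runtime, which is exactly why it can live in code instead of a data file.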

As for Chomsky's system of transformational grammar (I just love how that sounds!), a picture would do well to illustrate what it is. The picture is from [|here]. The sentence is divvied up into noun phrases and verb phrases, which are then subdivided into determiners, nouns, adjectives, verbs, adverbs, and pronouns. My theory is that if the program keeps a running file on each possible word, it could determine what kind of sentence it was given, and what the sentence means in a larger context. This is quite a large task, but I believe that if the user feeds in enough data, there would eventually be enough to make sense of many different sentence types with relative ease.
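The NP/VP breakdown described above could be represented as a simple tree in code. This is only a sketch of one possible data structure; the class name, labels, and layout are my own guesses at a starting point, not anything settled for the project:

```java
import java.util.List;

// A minimal phrase-structure node: a label like "S", "NP", "VP", "Det",
// "N", or "V", plus either a word (for leaves) or child nodes.
public class PhraseNode {
    final String label;
    final String word;                  // null for non-leaf nodes
    final List<PhraseNode> children;

    PhraseNode(String label, String word, List<PhraseNode> children) {
        this.label = label;
        this.word = word;
        this.children = children;
    }

    // Flatten the tree back into the original sentence text.
    String yield() {
        if (word != null) return word;
        StringBuilder sb = new StringBuilder();
        for (PhraseNode c : children) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.yield());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PhraseNode np = new PhraseNode("NP", null, List.of(
            new PhraseNode("Det", "the", List.of()),
            new PhraseNode("N", "dog", List.of())));
        PhraseNode vp = new PhraseNode("VP", null, List.of(
            new PhraseNode("V", "barks", List.of())));
        PhraseNode s = new PhraseNode("S", null, List.of(np, vp));
        System.out.println(s.yield());  // the dog barks
    }
}
```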

On the topic of building databases, I have done some research into practical designs for the files to be made. My conception of how the data is to be sorted and/or stored is a malleable style, as mentioned above, styled after various palindromes. The file convention, which I have lovingly named the '.tacocat' extension for now, is itself a palindrome. The data is written once, reversed, and then written again. The data is thus saved in two places, which gives a simple check on its integrity. The way the data is saved within the file is also interesting, in my opinion.

Each line within the file (which is, for all intents and purposes, a text file for convenience) has a header character that indicates its purpose. A line that starts with 'c' might have the class information for the word. An example file is in the Documents section of the wiki, and is illustrated below.



I hope to post a guide to interpreting tacocat files soon, as well as more of my notes. This post covers only the first few pages of my research from before the official SYP start.
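Until that guide is up, here is one possible reading of the palindrome layout described above, sketched in Java: the file body is the data followed by the same data reversed character-by-character, so a reader can check the two halves against each other. All of the names here are mine, and the real format may differ:

```java
// A sketch of writing and verifying a palindrome-style ".tacocat" body.
// Assumes the second half of the file is the exact character reversal
// of the first half; this is my guess at the layout, not a spec.
public class Tacocat {
    static String encode(String data) {
        return data + new StringBuilder(data).reverse();
    }

    static boolean verify(String fileBody) {
        if (fileBody.length() % 2 != 0) return false;
        String first = fileBody.substring(0, fileBody.length() / 2);
        String second = fileBody.substring(fileBody.length() / 2);
        return new StringBuilder(second).reverse().toString().equals(first);
    }

    static String decode(String fileBody) {
        if (!verify(fileBody)) throw new IllegalArgumentException("corrupt tacocat data");
        return fileBody.substring(0, fileBody.length() / 2);
    }

    public static void main(String[] args) {
        String body = encode("cNoun\n");  // a 'c' header line carrying class info
        System.out.println(verify(body));                   // true
        System.out.println(decode(body).equals("cNoun\n")); // true
    }
}
```

The duplicate copy means any single-half corruption is detectable, though not automatically repairable without deciding which half to trust.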

Thoughts # 2: 2/4/2012 I am continuing to type up my notes now. I've compiled a list of precursor programs I need to make in order for the chatterbot program to work out:

1) The Word Database Builder
2) The Sentence Particle Identifier
3) The Data Mixer
4) The Sentence Data Extractor

1) The Word Database Builder asks for information about a base word, which is input upon execution, and then compiles a tacocat file from it, with the information ordered and formatted correctly.
2) When you run the Sentence Particle Identifier, it asks for a sentence. It then breaks the sentence into parts, which are identified and given back to the user. This is perhaps the most critical piece of hard coding in the project.
3) The Data Mixer looks in two subdirectories, specified by the user of course, and identifies any differences between the files and/or their content. After that, it might ask the user what course of action to take: replace the files, or add them to the other directory.
4) The Sentence Data Extractor takes a sentence the user inputs and compiles the data into a proper tacocat file. Basically, it combines the functions of programs two and one.
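The core step of precursor 2 could start out something like this. This is a rough skeleton under my own assumptions: the tiny word lists and tag names are placeholders, and unknown words would eventually be looked up in their tacocat files instead of being shrugged off:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Skeleton for the Sentence Particle Identifier: split the input into
// words, then tag each one against whatever word lists exist so far.
public class ParticleIdentifier {
    static final Set<String> DETERMINERS = Set.of("a", "an", "the", "this", "that");
    static final Set<String> PRONOUNS = Set.of("i", "you", "he", "she", "it", "we", "they");

    static Map<String, String> identify(String sentence) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (String word : sentence.toLowerCase().split("\\s+")) {
            String w = word.replaceAll("[^a-z']", "");  // strip punctuation
            if (w.isEmpty()) continue;
            if (DETERMINERS.contains(w)) tags.put(w, "Determiner");
            else if (PRONOUNS.contains(w)) tags.put(w, "Pronoun");
            else tags.put(w, "Unknown");  // would consult tacocat files here
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(identify("The dog sees it."));
        // {the=Determiner, dog=Unknown, sees=Unknown, it=Pronoun}
    }
}
```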

I will not necessarily develop them in order, but I will definitely be devoting a large amount of time to them over the next couple of weeks. My next goal is to research better programming methods that can be applied to the development of these precursors.

I first turned my mind to finding a way to hold the characters parsed from the sentence input for precursor 2, and I landed on delimiters with Scanners. I've added the reference to the Artifact Research section, under a new header specifically for programming. On that note, I've also added a subsection to the Artifacts section for interesting topics in computer science. I'm going to get some sleep now.
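The Scanner-with-delimiters idea works like this in Java: set a delimiter pattern on the Scanner and pull tokens one at a time, instead of indexing into the string by hand. The delimiter pattern below is just one I picked for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Tokenize a sentence with java.util.Scanner and a custom delimiter.
public class DelimiterDemo {
    static List<String> tokens(String sentence) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(sentence)) {
            sc.useDelimiter("[\\s.,!?]+");  // whitespace and basic punctuation
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hello, world! How are you?"));
        // [Hello, world, How, are, you]
    }
}
```

One nice side effect is that punctuation disappears in the same pass as the splitting, so precursor 2 would get clean word tokens for free.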