Discussion061005log

Summarization
A summary of the night's discussions: there's some uncertainity of why the PAQ-programs should be considered having neural nets in them as well how the correlation between expert knowledge or the data size and the prediction efficiency. In the second article, there were a number of assumptions or claims that could be seen as not quite justified. Strong emphasis was put on the lack of incoperating the time of the agent' complications of non-halting computations and peripherals thereof. The measure of intelligence is not computable by practical means. it seems largely arbitrary. it lacks any meaningful use of the concept of time. some of the claims are overstated. nonetheless, hutter provides an elegant theoretical measure and an educational paper.

Complete discussion log
14:01:30 <AJC> So, the first paper is about lossless text compression, written by Matthew Mahoney. He introduces a novel technique which allows the compressor to combine the predictions of arbitrary models, and learns to select the best one online using a neural network.
14:01:43 <AJC> I think a good place to start the discussion is why prediction helps with compression, and how prediction can also relate to AI in general…
14:03:20 <cwenner> isn't that answered in the paper?
14:03:37 <AJC> it's arguable :P
14:03:43 <AJC> the second particularly
14:03:57 - - - join: poolio (n=ten.tsacmoc.dm.1dsh.701-3-152-96-c|oiloop#ten.tsacmoc.dm.1dsh.701-3-152-96-c|oiloop) joined #ai
14:03:58 <AJC> ttt- would have a field day.
14:04:42 <chessguy> airbot: discussion night?
14:04:42 <airbot> it's in 2 hours folks!
14:04:57 * chessguy snickers
14:04:57 <AJC> ok, we can move on then. why does context mixing specifically help prediction?
14:05:03 <poolio> oh really? awesome.
14:05:21 <cwenner> No, the discussion nigth is Now
14:05:28 - - - join: Shadow_mil2 (n=chuck@SilentFlame/Member/pdpc.active.Shadow-mil) joined #ai
14:05:31 <AJC> airbot: discussion night?
14:05:31 <airbot> it's in 2 hours folks!
14:05:43 <billfur1> uh, no.
14:05:51 - - - part: airbot left #ai
14:05:52 <AJC> hmm, i thought i'd fixed that.
14:05:57 <cwenner> it's now :)
14:05:57 - - - quit: Shadow_mil (Nick collision from services.)
14:06:16 <YaroslavVB> AJC: he uses a neural network? I missed that
14:06:23 - - - nick: Shadow_mil2 -> Shadow_mil
14:06:36 <AJC> YaroslavVB: i was trying to find the reference again… certainly one of his PAQ implementation does.
14:06:50 <cwenner> i also missed it at first, he mentions it however in the last paragraphs
14:07:09 <AJC> http://www.cs.fit.edu/~mmahoney/compression/mmahoney00.pdf
14:07:16 <YaroslavVB> so the training equation is (2)
14:07:19 <AJC> Fast Text Compression with Neural Networks
14:07:38 <YaroslavVB> I'm trying to see if this training equation is why he calls it "neural network"
14:07:55 - - - quit: pooh_beawr ("Ex-Chat")
14:08:00 <cwenner> "PAQ1 is derived fro m an eaerlier compressor, P12, which predicts a bit stream using … NN ..
14:08:02 <AJC> probably a perceptron
14:08:18 <AJC> one layer, easy to train
14:08:33 <cwenner> PAQ differs in that the submodels output a confidence in addition to a probability
14:09:00 <YaroslavVB> what do the Sn_li, S_1n_i stand for?
14:09:06 <YaroslavVB> (equation 2)
14:09:18 <AJC> he calls it a Maximum Entropy NN in his other paper
14:09:51 <cwenner> S n_li S_i n_i i guess. i would be S_0 or S_1
14:09:54 <cwenner> S * n_li
14:09:56 <cwenner> S_i * n_i
14:10:02 <cwenner> 1i*
14:10:23 <poolio> Is the * a multiplaction operator or a dereferencing
14:10:32 <YaroslavVB> cwenner: I don't understand, can you rephrase?
14:10:49 <AJC> cwenner: that's a bit weird the models outputting confidence. isn't it the job of the context "selector" to evaluate quality instead of the models doing so?
14:10:50 <cwenner> multiplication, the last one was for spelling
14:11:08 <cwenner> yaroslav: Sn_li is a multiplication between S and n_1i
14:11:22 <cwenner> S and n_1i are both defined above
14:11:23 <YaroslavVB> well yeah
14:11:42 <YaroslavVB> where's n_li defined?
14:11:50 <cwenner> though when i did a partial derivate dcost/dw_i, i got nothing like his equation
14:11:53 <cwenner> 1, not l
14:12:06 <YaroslavVB> yes, what is it
14:12:29 <AJC> poolio: multiplication
14:13:05 <YaroslavVB> cwenner: when you took the partial derivative, what was your formula for the cost?
14:13:08 <cwenner> yaroslav: n_xi would be the confidence (support) for classification x by model i
14:13:30 <YaroslavVB> ah ok
14:13:33 <cwenner> log_2 1/p_x? where x is the correct classification. is that wrong?
14:13:38 <poolio> AJC: yeah I got that a bit late (and cwenner said it)
14:13:53 - - - join: ows (n=tp.obacten.epc.302-89-231-38a|evoL#tp.obacten.epc.302-89-231-38a|evoL) joined #ai
14:14:02 - - - quit: ows (Read error: 131 (Connection reset by peer))
14:14:17 <YaroslavVB> cwenner: that's the minimum expected codelength
14:14:20 <cwenner> (AJC: don't have a good answer, that's why i'm quiet)
14:14:54 <YaroslavVB> cwenner: oh I see, that's the cost function you used?
14:15:00 <cwenner> what would be the proper cost?
14:15:20 <cwenner> dialog :/
14:15:47 <AJC> in which case i'll open up the question. is a model capable of estimating its own accuracy that well? i mean models are purposefully narrow, so it makes sense that a more global predictor get a better sense of which model to use.
14:16:23 <AJC> though i think the context selector does not in fact use any other information that what the contexts give it, so unless you gave it more information, that's the best it can do…
14:16:35 <cwenner> (yaroslav, d(log 1/p_x)/dw_i is nothing like what he put at least)
14:18:38 <YaroslavVB> hm, trying to derive it now
14:18:53 <cwenner> the weight is altered somewhat proportional to the models' confidence though, so if they are overly confident when they shouldn't they would quickly be ignored
14:19:20 - - - join: pooh_beawr (n=031.561.231.47|medamsah#031.561.231.47|medamsah) joined #ai
14:20:37 <cwenner> though it seems a model that claims it's very confident and is right > 0.5 could get a greater relative weight than one that is right more but with a lesser confidence. for instance C1 2/3 - C1 1/3 versus C2 3/4 - C2 1/4
14:21:16 <AJC> it seems so
14:21:20 <cwenner> not a good general answer though, any other takers?
14:21:30 <cwenner> (state your thoughts, people)
14:21:38 <cwenner> (on the question)
14:22:10 - - - quit: noclue ()
14:22:51 <cwenner> i think confidence is a great property otherwise. obviously you shouldn't force a specialized system to be as certain of datum in it's area as outside
14:23:31 <AJC> though i'm guessing each predictor is designed with those two purposes in mind, so a lot of human expertise went into the confidence i guess
14:23:33 - - - quit: narx_ (Read error: 110 (Connection timed out))
14:23:35 <cwenner> this update rule doesn't work very well though since you should bet max if p > 0.5
14:24:21 <cwenner> and since they only have two choices here, the model must always believe p >= 0.5
14:24:35 <cwenner> it seems n is efficiently useless
14:25:13 <AJC> heh
14:25:20 <cwenner> do you understand what i mean?
14:25:27 <AJC> so having a softer threashold would be a better strategy?
14:25:55 <_death> don't you ordinairily want to integrate over the confidence interval and utility?
14:25:57 <cwenner> what threshold?
14:26:10 <AJC> 0.5
14:27:06 - - - join: pattern (i=moc.natj.edemynag|sisong#moc.natj.edemynag|sisong) joined #ai
14:28:28 <AJC> is there a rate of adjustment somewhere?
14:28:29 <cwenner> you mean adding a factor to the update if the answer is wrong? (the 0.5 in p > 0.5 refers to the critical value for by what probability you must certain to at average increase our certainity)
14:29:20 <cwenner> _death: they only have two options here though, the next bit is 1 or 0
14:29:34 <_death> in that case, the integration is remarkably easy
14:29:45 <_death> and it is a summation
14:30:14 <YaroslavVB> I wonder what n_i means
14:30:26 <YaroslavVB> he has n_0i, n_1i and n_i
14:30:31 <cwenner> n_0i + n_1i, i.e. total confidence
14:30:36 <YaroslavVB> oh wait, it's there
14:31:21 <cwenner> _death: so how would it be here?
14:32:01 <YaroslavVB> I get something like n_1i x/S1 + n_01 (1-x)/S0 - (n_1i+n_01)/S when I take the derivative. Maybe it could be turned into what he has, not sure
14:32:08 <_death> if you only want to minimize error, the utility is 1, i think
14:32:21 <YaroslavVB> n_1i x/S1 + n_0i (1-x)/S0 - (n_1i+n_01)/S
14:32:36 <cwenner> what derivative, d(log 1/p_x)/dw_i ?
14:32:39 <YaroslavVB> yeah
14:32:59 <cwenner> don't you have a 1/p_x factor?
14:33:26 <YaroslavVB> you simplify log 1/p_x into -log p_x, then plugin the formula for p_x
14:33:42 <YaroslavVB> expand the logs, then take the derivative
14:34:00 <AJC> in the wikipedia compression challenge, they have different sections that compress better or worse… i'm wondering how fast the weights adjust, and if slower/faster adjustment gives better results. more to the point, how do you decide that?
14:34:25 <AJC> i guess running the compressor over and over and seeing how it does! :P
14:34:50 <cwenner> he only says that the initial weights changes the learning speed, but there's no factor in the update rule
14:35:11 - - - quit: chessguy (" HydraIRC -> http://www.hydrairc.com <- Leading Edge IRC")
14:35:17 <YaroslavVB> used p_x = Sum wi n_0i ^ (1-x) + Sum w_i n_1i ^ x as the formula…it's weird that he has the same weights (w_i) for both positive and negative predictions
14:35:32 <AJC> cwenner: the lack of factor surprised me a bit actually
14:37:29 <AJC> is it fair to say that context models are a crude way to approximate the true "prior" of a document, by mixing and matching a limited number of contexts?
14:37:32 <YaroslavVB> where was this paper published?
14:37:55 <AJC> tech-report it says
14:39:05 <AJC> i mean, ideally, we'd have a document specific "context" which could be created by analying the data in more depth, rather than having built-in contexts that are hand tuned by experts.
14:40:38 <YaroslavVB> the hard part would be specifying the language for creating the "context functions"
14:41:17 <AJC> isn't it just a program?
14:41:37 <YaroslavVB> are you talking about learning the context functions from data?
14:42:27 <AJC> yes
14:42:37 <cwenner> okay nm, got what he got now
14:42:46 <ttt-> discussion!
14:43:04 <AJC> a lot of this prediction stuff happens without even having seen the data. this is good because it works online, but if you have the data available, you can build a much better model.
14:43:06 <YaroslavVB> well, the more rich you are, the more data you'll need
14:43:13 <YaroslavVB> general program is pretty rich
14:43:26 <YaroslavVB> so you'll need a lot of data to learn a useful function
14:43:48 <AJC> well, for the wikipedia problem for example, that's a lot of data.
14:43:54 <AJC> yet most of that is compressed "online"
14:44:10 <AJC> the only thing that adjusts the contexts is a human tweaking the dictionaries
14:44:21 <AJC> shouldn't this be the next step of compression?
14:44:57 <YaroslavVB> There's similar work in reinforcement learning called "Predictive State Representation". The idea is to learn useful functions of the past history, as opposed to just previous 2 states
14:45:09 <AJC> allowing the compresson to spend as much time as it needs on the data, creating a good specific model for copmressing this document better, and then saving that to disk for the decompressor to use.
14:45:25 <_death> how does this compression technique compare against modern arithmetic compression algorithms?
14:45:33 <AJC> (this assumes the size of the context predictor is smaller than gained size :)
14:46:12 <AJC> _death: this uses arithmetic coding under the hood, ithey just predict differently.
14:46:21 <AJC> is that what you mean? compared to 7z or such?
14:46:24 <_death> AJC: thought so but wasn't sure
14:46:31 <cwenner> are the weights trained beforehand or merely online?
14:46:50 <AJC> _death: basically PAQ is much better, but much slower.
14:47:06 <AJC> cwenner: online, there's nothing in the equation that's not dynamic. that's what i was surprised about
14:47:43 <AJC> http://www.cs.fit.edu/~mmahoney/compression/text.html
14:47:49 <YaroslavVB> AJC: it's a compromise - - the more expert knowledge you put into an algorith, the better it'll be, because it just augments the data. If you have enough data, you can do away with the expert…but as google showed, with enough data, even the most simple learning algorithm works very well
14:47:51 <AJC> it has a bunch of benchmarks
14:48:26 <AJC> YaroslavVB: i don't think it's a tradeoff in terms of having data to learn from. if you have a little file, you should be able to make a simple context model.
14:48:42 <AJC> the only tradeoff i can see is the extra size of storing a document specific context
14:49:01 <AJC> global expert-tuned contexts are essentially packaged with the decompressor, so you don't include them with your archive.
14:49:09 <cwenner> not sure about much. according to the table at the end, 23% better than sbc but taking 288 times longer (or 1038?)
14:49:29 <YaroslavVB> what do you mean by "make a simple context model"? Do you mean learn the context functions? Or learn a model using some pre-specified context functions?
14:50:27 <AJC> YaroslavVB: well, i'm not sure what this context function would look like, but i'm certainly thinking about it in terms of algorithmic complexity.
14:50:47 <AJC> so a simple file you could compress to a simple program which outputs the exact data
14:51:30 <YaroslavVB> I'm thinking of context functions as the features of the context. IE, you can learn model that uses features like "previous word is a" or features like "previous two words are a dog", those are different context functions
14:51:34 <AJC> but instead of outputting the data, you use that as a prediction, and then do arithmetic coding (so you're not required to make your program perfect)
14:52:05 <cwenner> i'd be interested in seeing a table of data to Kolmogorov complexity
14:52:05 <AJC> YaroslavVB: yes, that's what i was thinking too
14:52:39 <AJC> cwenner: what kind of data?
14:52:50 <cwenner> binary for instance
14:53:21 <cwenner> merely going through the first thousand or so prefixes and listing their complexities
14:53:52 <AJC> oh, to see what kind of functions we'd get for, say, the wikipedia document?
14:54:23 <AJC> i think this is the approach that will win the contest btw
14:55:20 <cwenner> what wikipedia page
14:56:34 <AJC> the wikipedia comprossion contest
14:56:53 <AJC> http://www.hutter1.de/prize/index.htm
14:57:56 - - - quit: poolio ("leaving")
14:57:57 <AJC> so what do you guys think is the next step in lossless compression?
14:58:47 <_YKY_> Hi all…=)
14:59:10 <cwenner> hey _YKY_, discussion time, what do you think?
14:59:12 <_YKY_> AJC what do you use as the predictor?
14:59:42 <AJC> _YKY_: what do you mean, in what context? what does the paper describe?
15:00:05 <_YKY_> For the compressor
15:01:19 <_YKY_> It seems that you're trying to predict word associations
15:01:20 <cwenner> we were talking about the paper so the predictor would here be weighted submodels. not entirely sure how all of the submodels work
15:01:35 <AJC> _YKY_: well in PAQ, the author uses 8 expert tuned predictiors, based on bits, words, etc.
15:02:00 <_YKY_> I see..
15:02:03 <AJC> there's an ngram i think
15:02:47 <_YKY_> N-grams don't really have understanding of the document..
15:02:50 <AJC> anyone else have any questions or comments about the paper?
15:03:11 <AJC> _YKY_: no, none at all. most of the models in PAQ don't have any above the word level.
15:03:20 <cwenner> _YKY_: how is "understanding" relevant for compression?
15:03:46 <ttt-> how is compression relevant for understanding
15:04:36 <AJC> ttt-: if you can measure understanding with probability, then you can use prediction for understanding
15:04:46 <_YKY_> The argument is that compression can pass the turing test
15:04:50 <cwenner> i don't have a firm definition of what understanding means, so how is it related to compression?
15:05:20 <AJC> _YKY_: the argument is that better prediction can help you build an AI to pass the turing test.
15:05:31 <ttt-> i would ask it: give me your explanation of the data
15:05:36 <ttt-> now give me 3 other ones
15:05:38 <cwenner> whose argument?
15:05:42 <_YKY_> Conversely … humans are very good at the prediction test
15:05:43 <ttt-> a human could do that
15:05:54 <AJC> ttt-: now that's exactly what the current system does.
15:05:59 <ttt-> cool
15:06:22 <AJC> it'll say: "This is a string." "This is a strong." "That is a string."
15:06:22 <AJC> etc.
15:06:48 <AJC> but all at the letter level, now if you can add a gramatical model, then your prediction can be more accurate, so your compressor would be bettor.
15:07:21 <_YKY_> Do you have a semantic level as well?
15:07:23 <AJC> what's nice about this paper is that it provides a modular framework for plugging in your models, so if you could use Cyc for this, then it'd find out if it compressed better
15:07:34 <AJC> _YKY_: not in PAQ nor this paper
15:07:50 <cwenner> i believe this one generates arbitrary simple grammars though, adjusting the context models
15:07:58 <DanF_DrC> AJC, how's your sandbox coming along?
15:08:04 <AJC> DanF_DrC: discussion night :P
15:08:14 <DanF_DrC> ah ok
15:08:24 <cwenner> did you read the article(s), danf?
15:08:34 <DanF_DrC> no
15:08:56 <ttt-> i tried but didnt get very far :/
15:09:52 <cwenner> ttt-: if it was cause it was difficult to understand, you could discus it. i don't find it entirely clear either
15:10:07 <_death> i'm ready to discuss the second paper when everyone is ready :D
15:10:10 <cwenner> you could discus it| i mean go on here and ask away
15:10:14 <ttt-> does it save contexts?
15:10:18 <_death> k
15:10:19 <ttt-> to use them later?
15:10:35 <_death> before getting too far into the paper discussion, i think it is important that everyone knows what occam's razor is
15:11:07 <AJC> ttt-: a context in this case is used throughout the document
15:11:08 <_death> william of ockham is commonly attributed with having said, "entia non sunt multiplicanda praeter necessitatem," though no proof has ever been found
15:11:11 - - - quit: qsrv (Read error: 110 (Connection timed out))
15:11:20 <cwenner> you could describe it before the next
15:11:22 <cwenner> or nm
15:11:41 <_death> this statement translates to mean "don't multiply entities unnecessarily," or basically, the KISS principle
15:11:51 <AJC> ttt-: a context is not a piece of data that you store in a dictionary. it's a simple program that does prediction, some of which use small dictionaries yes.
15:11:52 <_death> it is intended as prescriptive advice and NOT as a normative claim
15:12:29 <cwenner> yaroslav: did you figure out whether they were justified or not to draw the conclusion of favoring simpler models?
15:12:38 <_death> people who utilize occam's razor often end up making powerful scientific claims and igniting explosive revolutions, but this does not, in any sense, mean that it is "correct"
15:12:57 <AJC> cwenner: who's "they"? Hutter?
15:13:12 <ttt-> shortest explanation has the highest probability of being correct?
15:13:20 <cwenner> was thinking about Hutter and Solomonoff
15:13:33 - - - join: CodeRun (n=561.82.841.98|aeDIon#561.82.841.98|aeDIon) joined #ai
15:13:35 <cwenner> after all, Hutter relies on Solomonoff and seems to make claims on top of that
15:14:14 <AJC> algorithmic probability says that long programs are less likely to be created randomly
15:14:29 <AJC> ttt-: pretty much
15:14:40 <_death> so, with that out of the way…
15:14:59 <_death> hutter complains that standard iq tests cannot be used to test machine intelligence in a meaningful way
15:15:04 <cwenner> hey?
15:15:17 <AJC> ho!
15:15:18 <ttt-> why not
15:15:37 <cwenner> AJC: well yaroslav aruged (but i may have misinterpreted) that it depends on the turing machine in question
15:16:07 <AJC> cwenner: algorithmic complexity remains valid for any turing machine.
15:16:09 <cwenner> and how this is related to the future works, they may very well have misused one or the other principle et.c.
15:16:11 <_death> it is interesting to note that hutter does not refer to "artificial" intelligence. he seems to consider the terms _artificial intelligence_ and _intelligence_ as synonymous. they are not intended to be, as embodied by the spirit of the turing test
15:16:21 <cwenner> for any Given turing machine
15:16:24 <ttt-> it has to read the question, do some reasoning and figure out how to answer. sounds pretty hard to me
15:16:35 <AJC> cwenner: the likelihood of it being a good TM in practice is a different issue though.
15:16:51 <cwenner> _death? wait a bit
15:16:56 <cwenner> _death: perhaps we should try to stop the other dsicussion before starting on the other paper
15:16:59 <_death> philosophical flaws extend deeply through the paper, but hutter does come up with a very interesting measure
15:17:10 <_death> cwenner: certainly
15:17:26 <cwenner> AJC: how about we continue later?
15:17:35 <AJC> sure
15:17:38 <ttt-> AJC: long programs are less likely to be generated in nature? or by a process?
15:18:04 <AJC> if you pick a bunch of random instructions including the halt instruction
15:18:05 - - - quit: Kraig (Excess Flood)
15:18:44 <ttt-> and you keep picking until the next instructions will never be reached and dont matter?
15:18:50 - - - join: Kraig (n=ten.acirbmalani.1tan|sabes#ten.acirbmalani.1tan|sabes) joined #ai
15:19:20 <AJC> ttt-: yes, those are called self-delimiting programs
15:19:33 <AJC> if you read something off a turing tape that has no instruction, you pick one randomly
15:19:34 <ttt-> hm?
15:20:23 <cwenner> ttt-: should we perhaps continue that discussion afterwards? (you could google for Algorithmic Complexity or Kolmogorov Complexity if you want to see the formal definitions)
15:20:24 <AJC> anyway, i think we should also discuss this some other time… it's not 100% related, and _death has prepared the next paper for us
15:20:28 <AJC> cwenner: yes :P
15:20:49 <ttt-> true, sorry
15:20:54 <_death> hutter states that the complexity of an environment can be measured by the kolmogorov complexity of the turing machine which represents the environment. he points out that kolmogorov complexity can be used to create a probability distribution, the algorithmic probability distribution, which he seems to believe is the prior distribution from which we expect our environments to be drawn. this leads to a simple but elegant formula which defines the intelligence o
15:20:59 <cwenner> (ttt-: i doubt AJC would be against discussing it after the other paper or privately)
15:21:20 <cwenner> okay, paper2, Hutter's A formal measure on machine intelligence. _death is giving an introduction
15:21:28 <AJC> he just did :P
15:22:02 <ttt-> how did he prove nature picks programs this way
15:22:29 <_death> it is not clear to me how to separate the agent and the environment
15:22:33 <AJC> ttt-: he didn't, he looked at it through the model of a turing machine
15:22:49 <_death> obviously they both need to be placed on the same universal turing machine, correct?
15:22:50 <cwenner> _death: try to keep them a bit shorter, some clients cuts them off otherwise
15:22:58 <AJC> what worries me is that my fingers are definitely not getting symbols as i type on the keyboard
15:23:06 <AJC> it's more of a contiuous stream of sensory data.
15:23:20 <AJC> so maybe this "theory" of his only applies to symbolic settings?
15:23:31 <ttt-> i think so
15:23:48 <AJC> _death: there are two TM that interact, sending symbols back and forth
15:24:02 <cwenner> any number of TMs are equivalent to another TM
15:24:19 <ttt-> if you add another halt instruction, you'll get programs on average
15:24:21 <cwenner> (or finite number?)
15:24:50 <_YKY_> AJC you certainly are typing symbols on your keyboard…no?
15:24:57 <ttt-> or another void instruction, you'll get longer ones
15:25:10 <AJC> does Hutter admit or acknowledge that his theory only applies to symbolic domains of AI?
15:25:14 - - - quit: kinoc (Read error: 145 (Connection timed out))
15:25:29 <_death> AJC: that makes sense, but doesn't the communication code add some overhead which would, even if we accepted the algorithmic distribution, skew the measure?
15:25:46 <_YKY_> All environments can be viewed as symbolic
15:25:54 <cwenner> AJC: do we in fact have any proof that they are continous though? we could very well function discretely
15:26:18 <AJC> _death: as cwenner said, i think the process of translating two TM to one is something that's probably barely polynomial complexity
15:26:38 <AJC> hmm
15:26:48 <_death> AJC: one turing machine must be able to time the other
15:26:50 <cwenner> i don't think the time complexity is of any relevance
15:26:51 <AJC> DanF_DrC? help!
15:27:07 <AJC> the world is symbolic and bayesian :P
15:27:26 <cwenner> _YKY_: What support do you have that any environment can be expressed symbolically?
15:27:37 <AJC> i think the fact that Hutter looks at the world through the lens of a turing machine is already a big assumption.
15:27:53 <_death> i think that time is of relevance for intelligence. hutter does not
15:28:02 <cwenner> an assumption that many computer scientists doesn't have any problems with
15:28:11 <_death> fast right answers are better than slow right answers
15:28:34 <ttt-> yeah all computation is turing computation. but to assume the types of computation in nature are distributed like turing machines is another thing
15:28:37 <_death> the environment should be able to judge that, right?
15:28:39 <cwenner> _death: hutter does state that as well though.
15:28:41 <AJC> cwenner: they have problems with it because nobody can solve any kind of real-world problem of suitable complexity
15:28:45 <_death> in cwenner's environment, it doesn't matter
15:28:50 <_death> cwenner: where?
15:28:52 <cwenner> _death: that it is relevant, not how to include it however
15:30:36 <_YKY_> Hutter's model is highly abstract… it doesn't ignore time… time can be handled by the turing machine
15:30:57 <_death> i am not satisfied with its method for handling time
15:31:40 <_YKY_> Any method can be expressed using a turing machine=)
15:31:50 <cwenner> _death: cannot find it now. i believe they somewhere state in the informal, intuitive definition, the speed should be included, not in the formal however.
15:32:01 <cwenner> _YKY_: That's an assumption
15:32:23 <cwenner> i agree though, _death, the time it tkes for the TM to halt should be taken into consideration
15:32:40 <_death> lol, yes cwenner
15:33:56 <_YKY_> Don't tell me we're here to debate the church-turing hypothesis..
15:33:58 <cwenner> but isn't that left out because it's a difficult question. if you do that, do you also require other models to be evaluted in their speeds in terms of the best equivalent TM? Doesn't TM-equivalence merely say that they will reach the same answer, if any, not on their relative speeds?
15:34:04 <_death> it is not entirely clear how learning relates to intelligence in cwenner's environments
15:34:20 <cwenner> my environments?
15:34:32 <_death> oops!
15:34:37 <_death> sorry, cwenner :(
15:34:50 <cwenner> hutter's?
15:34:52 <_death> it is not entirely clear how learning relates to intelligence in the paper's environments
15:34:54 <AJC> I thought TM equivalence said something about speed, i.e. not a huge O() different
15:35:29 <_death> yes, hutter's environments (though i am not sure how serious he means for the paper to be)
15:36:01 <ttt-> i dont get why a human IQ test isnt good for AI
15:36:02 <cwenner> _YKY_: Not if we don't want to, we don't have anything against using it as an assumption. I just commented cause you seemed to state it as an independent fact
15:36:07 <AJC> _death: the way Hutter looks at learning is that you should predict the future behaviour from the past observations, which is done by algorithmic complexity.
15:36:35 <_death> AJC: on the basis of reward?
15:37:02 <_YKY_> I think hutter's model is correct in an abstract way but not sure if it is practical
15:37:08 <cwenner> ttt-: The 1 Introduction section, first 5 paragraphs
15:37:14 - - - quit: pooh_beawr ("Ex-Chat")
15:37:16 <AJC> _death: reward is something else, learning and reward are separate.
15:37:19 <_death> i'm convinced it is utterly, horribly wrong
15:37:31 <AJC> _death: you can learn that something is bad, it's still learning.
15:37:38 <_death> AJC: i think that depends on the environment
15:37:48 <AJC> no, not at all.
15:37:53 <ttt-> we should figure out how humans learn things by seeing them once
15:37:54 <ttt-> not 1000 times in training :)
15:38:04 <_death> no, i mean learning and reward are not necessarily separate, depending on the environment
15:38:15 - - - join: rutski (n=ten.enilnotpo.nyd.53f66c44-loo|98ikstur#ten.enilnotpo.nyd.53f66c44-loo|98ikstur) joined #ai
15:38:20 <rutski> hi all
15:38:43 <AJC> _death: learning and reward are separate concepts.. learning happens within the agent internally to build its model, reward is something done by the other turing machine that models the environment.
15:38:53 <AJC> hi rutski, it's discussion night. read the topic.
15:39:01 <_death> i'm not sure about the claims at the end of the paper; the seem reasonable, but i would not be surprised if some of them fall under harsh scrutiny
15:39:03 <cwenner> hello rutski: we are in the middle of discussion an article ( http://www.idsia.ch/~marcus/ai/ior.pdf ), feel free to join in
15:39:30 <YaroslavVB> I think Turing equivalence refers to the set of problems that can be solved. A problem is computable if it's solvable by a Turing machine or an equivalent system
15:39:41 <_death> AJC: learning can come from reward, correct?
15:39:42 <AJC> _death: what claims in particular? i only skimmed the end…
15:40:01 - - - quit: henkol (Remote closed the connection)
15:40:23 <AJC> _death: no, learning happens regardless. you learn to predict reward. then, separately, once you've learnt to predict, then you can behave rationally according to the prediction.
15:40:39 <AJC> _death: the fact that humans learn only by carrot and stick is just an evolutionary hack :P
15:40:53 <_death> YaroslavVB: you know a lot more about turing machines than i
15:41:32 <YaroslavVB> I had to program some turing machines for one of my classes
15:41:38 <_death> AJC: that's an interesting claim
15:41:54 <ttt-> can someone make a program that solves human IQ tests
15:41:55 <YaroslavVB> that exercise really makes you appreciate higher level languages like assembly
15:42:17 <AJC> _death: there's a field of machine learning called classifier systems. they made progress in the 90s with XCS. the key insight was that classifiers are still good if they predict low reward. it's VERY useful to have that information, whereas before, only the classifiers that foretold high reward were kept.
15:42:29 <YaroslavVB> I've seen a program that can solve analogy problems at a level similar to highschoolers
15:42:39 <AJC> _death: XCS is just one of many examples.
15:43:00 <_death> are turing machines that give floating-point awards much more complex than those that give ratio awards?
15:43:20 <YaroslavVB> turing machines don't give any awards, they just read and write on tape
15:43:23 <ttt-> why dont people focus on those kind of problems instead of yet another statistical learners at low-level
15:43:26 <rutski> hey guys
15:43:29 <rutski> I'm pretty new to ai
15:43:44 <rutski> so I'm reading up about simple neural networks
15:43:48 <YaroslavVB> ttt-: the money lies in practical applications, especially classification
15:43:53 <cwenner> i believe turing machines must have finite alphabets, i.e. cannot give real rewards
15:43:56 <YaroslavVB> and research follows the money to an extent
15:44:01 <AJC> ttt-: the same reason that people try to make elegant computer languages and systems… it's easier to undrestand and solve problems like that.
15:44:07 - - - join: dnsa (i=lp.ten.golaid.nibul.5825-lsdx|b-a5nd#lp.ten.golaid.nibul.5825-lsdx|b-a5nd) joined #ai
15:44:10 <rutski> I'm at the level where I can only thus far understand simple 3 neuron networks that solve things like basic logic gate problems
15:44:11 <ttt-> isnt it time to move on
15:44:14 - - - join: pooh_beawr (n=031.561.231.47|medamsah#031.561.231.47|medamsah) joined #ai
15:44:14 <rutski> but I had a question about those
15:44:17 <_death> YaroslavVB: that's what i thought, but it wouldn't be hard to set up a protocol to allow two simulated universal turing machines to communicate, would it?
15:44:32 <cwenner> yaroslav: they interpret certain symbols/positions of the environment TM as rewards in this paper
15:44:33 <rutski> How do neural networks deal with variable size input data?
15:44:51 <AJC> rutski: they don't. maybe we can discuss it once our discussion is over?
15:44:56 <cwenner> rutski: look at recurrent/feedback neural networks
15:44:57 <YaroslavVB> _death: a Turing machine has a head and a tape. You could put two heads on one tape, so that would be like 2 turing machines communicating
15:45:21 <rutski> You can input 1 and 0 into a network with 2 input nodes to train it as a simple logic gate. But what if you want to employ a network with 4 input nodes to solve the same problem? How do you input "1 and 0" into a network that has 4 input nodes?
15:45:23 <YaroslavVB> But you could show that the number of problems you can solve with such setup is still the same
15:45:30 <cwenner> (which is equivalent to one TM)
15:45:32 <ttt-> rutski: almost all AI programs are designed for fixed chosen type of input for a special task and dont look beyond that.
15:46:00 <rutski> ttt-: type of input yes, but I'm talkig about size of input
15:46:11 <rutski> ttt-: are they mostly designed for a fixed size of input as well?
15:46:12 <YaroslavVB> rutski: there are 240 ways of doing it, you could just pick one
15:46:33 <cwenner> but having two TMs wouldn't really be difficult, you merely let one run, add a symbol to a designated position on hte other, repeat
15:46:37 <YaroslavVB> ie, you can input 1 by setting all inputs to 1 and 0 by setting all inputs to 0
15:46:41 <ttt-> just pick the biggest input size you expect and train it to ignore the other inputs
15:46:48 <ttt-> what YaroslavVB said
15:46:49 <cwenner> (well, the other runs, put a symbol on the first, then repeat)
15:47:00 <rutski> YaroslavVB: ahh, clever
15:47:04 <_death> someone asked which claims i didn't buy.. i didn't see any specifically, but i am going to have a look at them (closed the window and it is taking forever to load)
15:47:14 <rutski> or if you wanted to input 1 and 0 you set 2 inputs to 1 and the other two to 0?
15:47:43 <rutski> wait… but then how does the network distinguish that from the literal 4 inputs 0011?
15:48:00 <_death> this measure can never actually be calculated, correct?
15:48:52 <cwenner> _death: calculating the TMs is NP-complete i believe but it should still be calculatable in a 2^500 years
15:48:54 <cwenner> ;)
15:49:04 <cwenner> actually nm, all environments
15:49:49 <cwenner> i don't think there's a proof you cannot and they do claim AIXI is optimal
15:50:02 <_death> i don't see how he can make the claim that this measure is "valid" if it can't actually be measured
15:51:06 <YaroslavVB> you can approximate uncomputable quantities
15:51:31 <cwenner> perhaps measurable in infinite time
15:51:34 <_death> they would be 0
15:51:37 <ttt-> or send your computer to a universe where P=NP
15:51:38 <YaroslavVB> uncomputable simply means that you can't get guaranteed accuracy within certain time
15:52:01 <YaroslavVB> for instance, you can compute Chaitin's omega, an uncomputable number, in the same sense as you could compute Pi or e, just not to any guaranteed accuracy
15:52:41 - - - quit: billfur1 ("Leaving")
15:53:13 - - - quit: qbert (Read error: 110 (Connection timed out))
15:53:30 <_death> only extremely simple environments would have much of an impact on the measure
15:53:50 <_death> i suppose you could compare agents using this measure pretty easily
15:54:24 <cwenner> they have made a number of assumptions that aren't certain so i think it would be good to call it something like the legg-hutter intelligence rather than to coin the general term. it seems rather plausible til they introduce the kolmogorov complexity
15:54:24 <_death> you only need to calculate until there is a difference
15:56:02 <_death> i think that they could do something very similar to what they suggest doing with kolmogorov complexity
15:56:24 <cwenner> but it does seem as a practical measure for given environment distributions
15:56:36 <cwenner> such as the chess intelligence (if we assume it was efficiently measurable)
15:56:55 <_death> wouldn't answering null symbols be of the highest intelligence?
15:57:17 <_death> (assuming that they worked out the fine details to some degree of satisfaction)
15:57:22 <cwenner> i don't follow. a policy where it always picks the null action?
15:57:27 <_death> yes
15:57:49 <cwenner> can't we merely pick an environment which rewards if you do not pick the null action?
15:57:53 <rutski> so if neural networks are made to process fixed amounts of data then how do those spam filtering networks work? I mean, you can't guarantee the length of an email
15:58:01 <_death> wouldn't that environment be more complex?
15:58:20 <AJC> rutski: the neural network is given features of the email, not each character.
15:58:40 <AJC> e.g. number of $, does the word 'dollar' appear or 'V14GR4'
15:59:33 <cwenner> _death: are you talking about their measure or some sort of interpretation of what i said where Intelligence = Max e, Ypsilon_e
16:01:02 <_death> well.. i think the environment could handle that (so you could use summation)
16:01:35 <cwenner> _death: I do not follow, what are you argumenting against?
16:02:19 <_death> cwenner: sorry, i misread.. what is the question?
16:03:19 <cwenner> my statement was that it seems to be a decent measure of "intelligence" given an environment distribution. i don't like the introduction of the "KC" distribution
16:03:43 - - - quit: kristallpirat ("Verlassend")
16:03:50 <cwenner> (by the way, wikipedia states that Theorem. K is not a computable function.
16:03:52 <cwenner> In other words, there is no program which takes a string s as input and produces the integer K(s) as output. We show this by contradiction.
16:03:56 <cwenner> )
16:03:59 <cwenner> http://en.wikipedia.org/wiki/Kolmogorov_complexity
16:04:22 <_death> arbitrarily?
16:04:27 <_death> or at all?
16:04:52 <rutski> AJC: ah, ok, I see what you mean
16:05:17 <_death> i'm under the impression that there are several variants of kolmogorov complexity, but i'm not sure how that would impact the problem
16:05:24 <rutski> AJC: do we yet understand how the human brain processes variable sizes of data?
16:06:09 <AJC> sorry, you have to type smaller sentences… i can't read more than 32 characters
16:06:36 <_death> the "algorithmic distribution" seems to be chosen arbitrarily
16:06:37 <AJC> :P rutski: the human brain processes lots of data, but it's the same size.
16:06:57 <AJC> _death: how so?
16:07:07 <_death> why use the algorithmic distribution?
16:08:27 <cwenner> _death: they claim occam's razor is an intuitively important property of intelligence and perhaps also what objectively is justified
16:08:46 <_death> cwenner: that claim is patently false
16:09:01 <cwenner> i do not quite see how they justify it, that's all i can see given the paper
16:09:21 <AJC> _death: algorithmic probability makes sense if you require a specific solution.
16:09:22 - - - quit: AI_coder (Client Quit)
16:09:24 <cwenner> but perhaps they do not wish to do a strict claim
16:09:33 <_death> i think a flat distribution seems reasonable
16:09:42 <AJC> _death: you can use it to find the most probable solution to your program
16:09:49 <cwenner> but rather take something that appears to work pretty much as we would want it to do intuitiviely
16:09:50 <AJC> problem
16:10:06 <cwenner> thye claim you cannot since the possible environments is infinite
16:10:28 <AJC> _death: algorithmic probability makes even more sense with a speed prior.
16:10:55 <_death> what is a speed prior?
16:10:55 <AJC> that is, find the fastest program that simulates the enviroment , not the simplest
16:11:02 <cwenner> perhaps assuming a (generalized?) uniform distribution will lead to some counterinitutive or trivial results
16:11:15 <AJC> this assumes that the environment can't possibly simulate itself if the function is very complicated, so it must be fast.
16:11:18 <_death> the speed of the environment shouldn't matter
16:11:29 <AJC> if it's a turing machine, sure it does :P
16:11:42 <AJC> i questioned that assumption earlier, but it seemed OK for everyone eles.
16:12:07 <cwenner> i don't think we said anything about the environment
16:12:29 <_death> there are still more claims
16:13:35 <_death> for example, absolute
16:14:22 <_death> the claim that it is absolute proves that this measure cannot be calculated by any algorithm because the value is real
16:15:05 <_death> the claim that it is "wide range" is vauge
16:15:08 - - - quit: Kraig (Excess Flood)
16:15:19 <cwenner> In addition to sensibly ordering many simple learning agents, this formal definition has many significant and deiserable properties: … Absolute: Ypsilon (their intelligence definition) gives us a single real aboslute value, unlike the pass-fail Turing Test. This is important if we want to make distinctions between similar learning algorithms that are not close to human level intelligence.
16:15:28 <cwenner> (quote the paper)
16:15:37 <cwenner> (sorry AJC, c&p doesn't work well)
16:15:50 - - - join: Kraig (n=ten.acirbmalani.1tan|sabes#ten.acirbmalani.1tan|sabes) joined #ai
16:16:12 <AJC> that works
16:16:47 <_death> the claim that it is "general" is merely an attempt to do away with the requirement that otherwise-equal agents of differing speeds are of different intelligence measures
16:17:20 <cwenner> that seems a bit harsh :)
16:17:24 <_death> ok
16:17:47 <AJC> _death: the point of intelligence is that you can apply it to any problem
16:17:57 <AJC> so if you do, and get a result, then the score is "general"
16:17:58 <cwenner> as Yaroslav said: it can still be calculated approximately
16:18:04 <_death> this measure is certainly biased
16:18:12 <AJC> _death: how so?
16:18:19 <_death> by the null answer
16:18:31 <cwenner> whatabout it!?
16:20:02 <_death> the agent that gives the null answer is more intelligent than the one that gives any other answer
16:20:04 - - - quit: vertex (Remote closed the connection)
16:20:31 <AJC> heh, that makes everything biased?
16:20:42 <_death> yes, that destroys the entire paper
16:20:50 <AJC> it's like a clot to the brain
16:21:02 <cwenner> sorry, how does it follow that he is more intelligent?
16:21:12 <_death> that's pretty complicated
16:21:27 <AJC> maybe you misunderstood?
16:22:02 <_death> i don't think so; however, the paper is not completely formally specified, so it is hard to be absolutely sure
16:22:08 <cwenner> let's go with the case where the actual environment is one that doesn't care what the agent does, it rewards him if he doesn't pick the "null" action
16:22:49 <cwenner> in that environment, clearly the random agent gets a V of 1 and the null agent 0
16:22:59 <_death> the choice of prior distribution is entirely subjective
16:23:04 - - - join: rubymatt (n=matt@unaffiliated/rubymatt) joined #ai
16:23:29 <AJC> well, somehow outputting null rather than a wrong answer is smart; knowing when you don't know. that's how most multi-choice questionaires work, you get negative points for errors.
16:23:37 <AJC> otherwise any monkey could score 33%
16:23:53 <_YKY_> Let's take an environment that rewards stupidity…
16:23:58 <_death> cwenner: how complicated is that environment?
16:25:30 <_YKY_> That would also destroy the paper
16:26:03 <cwenner> null doesn't act as not leaving an answer, it's merely one of possible answers which recieves a reward [0,1]
16:27:19 <cwenner> _death: that environment would be very simple. So it succeeds for the environment where the right action is the null action but fails for all others, including those where it has been given previous information
16:28:01 <cwenner> the null agent is as good as any other constant-action agent. cannot we think of for example the non-stationary example in the compression paper
16:28:29 <cwenner> nevemrind the non-starionary example, it requires that you knew the right rpevious action
16:28:39 <_death> cwenner: there is an agent that gives a reward by matching on a symbol
16:28:51 <_death> there is a family of those agents
16:29:19 <_death> there is also an agent that does not give a reward by matching on a symbol though
16:29:34 <cwenner> what do you mean, the agent doesn't give a reward
16:29:40 <_death> are they of equal complexity? if so, i am quite possibly mistaken
16:29:46 <_death> an agent that gives a 0 reward
16:29:56 <cwenner> as in the environment?
16:30:08 <_death> err
16:30:09 <_death> yes
16:30:22 <cwenner> yes, i believe those are very simple for most U
16:30:25 <_death> an environment that gives a 0 reward
16:30:59 - - - quit: AJC ("Leaving")
16:31:46 <_death> so there are two environments, one that gives a 0 reward for the null symbol and 1 for any other symbol, and one that gives a 1 for the null symbol and a 0 for any other symbol
16:31:49 <cwenner> but after recieving 0 reward for action X out of (small) m actions, (large) n times in a row, it is likely that the environment gives higher rewards for other actions
16:31:52 <_death> these are of equal complexity?
16:32:05 <cwenner> we could assume so
16:32:07 - - - join: flatliner (i=zn.oc.guhi.pulaid.722-861-901-302|reehs#zn.oc.guhi.pulaid.722-861-901-302|reehs) joined #ai
16:32:09 <_death> right, but those are more complex environments
16:32:21 <_death> i had not thought of that
16:32:30 <cwenner> by the way people, we're taking up a bit too much on our personal dialog so it woud be good if you had another point in mind
16:32:49 <flatliner> re all
16:32:51 <cwenner> something else about the topic that is worth mentioning/pondering about
16:33:10 <_death> well… it appears as though the null symbol agent might not destroy the paper, so moving on…
16:33:19 <cwenner> hello flatliner, article discussion, Legg&Hutter a formal measure of machine intelligence
16:33:25 <_YKY_> You can actually choose a turing machine as the environment that gives a reward exactly opposite to that of a candidate's
16:33:47 <_YKY_> So you can make an arbitrary agent score zero
16:33:49 <cwenner> YKY That's true but it's also taken into consideration
16:33:51 <cwenner> it does
16:34:16 <_death> what are the practical implications of the similar computable measures of machine intelligence mentioned in the paper?
16:34:21 <cwenner> n't jhave any effect on the agent's choice but perhaps shows that you cannot get a 1
16:35:16 - - - quit: ttt- ()
16:35:34 <_death> cwenner: hutter mentions, by the way, that the measure will vary somewhat on different universal turing machines
16:35:51 <cwenner> (_YKY_ you do not pick the environment to score in for their general intelligence (Ypsilon) measure, you iterate over all programs on an universal TM which can "comptue environmental measure of summable reward with respect to .." )
16:35:59 <_death> cwenner: that statement very likely holds the key to resolving our discussion
16:36:28 <cwenner> could you formalize the problem?
16:37:22 <cwenner> is it about the incomputability?
16:37:24 <cwenner> non-*
16:37:39 <_death> cwenner: the paper needs work, but i think so
16:37:42 - - - join: oogaw (n=hc.niweulb.tsuc.58-1.471-312|oogaw#hc.niweulb.tsuc.58-1.471-312|oogaw) joined #ai
16:37:55 <_death> cwenner: (to the formalization question)
16:37:56 <cwenner> i think it's a great initiative otherwise
16:38:04 * flatliner is presently designing a language that I can pay someone to make that can develop a Ai 'bot', 'interface'
16:38:12 <_death> what are the practical implications of a machine intelligence measure?
16:39:11 - - - join: madjestic (n=ur.tenstm.tan-srpg.85|dsa#ur.tenstm.tan-srpg.85|dsa) joined #ai
16:39:23 <cwenner> i think an accepted measure is important by adding additional momentum to the subject
16:40:49 <cwenner> what does flatliner think?
16:41:27 <cwenner> what do you think about their assumption to normalize rewards?
16:42:19 <cwenner> i.e. they scaled rewards for each environment so that the maximum total reward of any action sequence is at most 1 in each
16:42:43 <_death> looks like that is fine with the right prior distribution
16:42:53 <cwenner> the only prior is the KC here
16:43:24 <_death> i don't think that the algorithmic distribution is the right one
16:43:49 <cwenner> perhaps i'm misinterpreting. do they scale it per or cross environments?
16:44:19 <cwenner> i.e. do they scale by a factor possibly unique for each environment or use the same factor for each environment
16:45:23 <cwenner> i.e. alpha_e := maxarg action_sequence: V(e, action_sequence), or alpha = maxarg e: maxarg action_sequence: V(e, action_sequence) ?
16:45:27 <flatliner> i'm trying to setup a internet business
16:45:29 <_death> well…
16:45:32 <_death> let's see
16:45:47 <_death> what sort of distribution is the algorithmic probability distribution?
16:45:51 <_death> i haven't found anything on it
16:46:07 <_death> i would assume that multiple algorithms would have the same kolmogorov complexity
16:46:39 - - - quit: Wagoo (Read error: 110 (Connection timed out))
16:46:45 <_death> (so actually, you can't even approximate this measure, can you?)
16:46:49 <cwenner> generalized: P(d) = 2^-K(d) where K is the kolmogorov komplexity for the UTM's Binary input string
16:47:12 - - - join: vIkSiT (n=Someone@unaffiliated/viksit) joined #ai
16:47:19 <cwenner> i don't think there's anything that says you cannot approximate it but i donä't think there are any great techniques today
16:47:24 <cwenner> perhaps ask yaroslav
16:47:51 <cwenner> as you said, he does say it depends on your choice of UTM but not a lot (but idk if this is true)
16:48:29 <_death> does anyone know where the algorithmic distribution is documented?
16:48:54 <_death> *algorithmic probability distribution
16:49:18 <YaroslavVB> You can view various probability distributions that assign low probability to complex strings as approximations
16:50:20 <YaroslavVB> For instance you can use the distribution P(x) \propto 2^H(x), where H(x) is the entropy of x
16:51:31 <_death> right, but the algorithmic probability distribution doesn't behave like a normal probability distribution, does it?
16:51:52 - - - quit: Kraig (Excess Flood)
16:51:56 <YaroslavVB> it's a function that assigns a number between 0 and 1 to every string, so it's just like any other probability distribution
16:52:25 <_death> (there are most certainly algorithms for checking to see if a turing program of length N halts)
16:52:35 - - - join: Kraig (n=ten.acirbmalani.1tan|sabes#ten.acirbmalani.1tan|sabes) joined #ai
16:53:02 <YaroslavVB> there are acceptors, but there are no recognizers
16:53:25 <YaroslavVB> in other words there are no algorithms which always tell you whether it halts or not
16:54:34 <YaroslavVB> if turing program halts, the algorithm will say "yes it halts". But what if it keeps running and doesn't say anything?
16:54:35 <_death> actually, i have to say that it would be within memory space of size N
16:55:04 <_death> then it would receive no reward
16:55:05 <cwenner> certainly, they have a finite number of states
16:55:21 <cwenner> ?
16:55:23 <cwenner> sorry, how does the reward come in?
16:55:48 <_death> ok.. so it might not be calculatable for size N (sorry, was confusing memory space and program size- -very silly on my part)
16:57:09 <YaroslavVB> You could simply run the turing machine, and output "yes" if it halts, so that could be your algorithm for telling if the TM halts
16:57:11 <_death> if it keeps running and doesn't say anything, if it were confined to a space of size N, it would eventually repeat its tape image
16:57:25 <_death> YaroslavVB: lol
16:58:01 <YaroslavVB> I think if you restrict the size of the tape you don't have a turing machine anymore - - in other words you can no longer solve some problems
16:58:37 <_death> right, but there would be no reward for those anyway
16:58:49 <cwenner> why wouldn't there be any reward for them?
16:59:10 <_death> because it would take forever to solve them
17:00:23 - - - quit: AIrelay (Remote closed the connection)
17:00:36 - - - join: AIrelay (n=43.26.252.46|zevitom#43.26.252.46|zevitom) joined #ai
17:00:41 <_death> right?
17:00:49 <cwenner> he does say it must be computable, so only environments that halts (even if we cannot determine which those are) needs to halt in order to compute the reward
17:01:01 <cwenner> or rather, expected (modified) reward V
17:01:49 <cwenner> well should we round things off?
17:01:50 <YaroslavVB> _death: if you limit size of the tape, you can compute the size of the smallest machine that produces given string, so it could be an approximation to true Kolmogorov complexity
17:03:19 <cwenner> something that seems weird to me though is that it seems there could be several program strings B that could produce the same environments u
17:03:36 <_death> YaroslavVB: are there any problems with a finite-size answer that cannot be solved with a finite-size tape yet can be solved with an infinite-size tape?
17:03:52 <YaroslavVB> cwenner: yes, so take the smallest one, and use it's length as the complexity
17:04:01 <_death> i think there are
17:04:04 <cwenner> in this case, they don't have to sum of to 1 but wouldn't this definition of the algorithmic probability distrubtion not sum up to 1?
17:04:22 <cwenner> distribution*
17:04:25 <_death> i'm trying to imagine one though
17:04:45 <YaroslavVB> _death: sure, if your tape is finite, say of length n, then you can't compute Pi to n+1 digits
17:05:24 <cwenner> _death: i believe you could say that you do not ever need infinite size (it would take infinite time) but you do need arbitrary large finite numbers
17:05:39 <cwenner> arbitrarily*
17:06:55 <_death> you can tell if an algorithm of length 1 halts just by looking at it
17:07:03 <_death> i think you can do that with an algorithm of length 2 as well
17:07:08 <_death> where does it break down?
17:07:31 - - - join: shea (n=moc.rr.ser.pacyn.442-17-622-27-epc|aehs#moc.rr.ser.pacyn.442-17-622-27-epc|aehs) joined #ai
17:07:32 <_death> actually..
17:07:38 <YaroslavVB> at length 3 ;)
17:07:52 <_death> lol
17:08:14 <_death> i need to look up the precise definition of universal turing machines and think about it a bit
17:08:37 <cwenner> _death: do you know of the halt/stop-problem?
17:08:47 <_death> cwenner: the halting problem? yes
17:08:56 <_death> for now, i need to get driving
17:09:03 <cwenner> well round the night off
17:09:09 <_death> i am going to new york
17:09:19 <cwenner> interesting :)
17:09:22 <cwenner> last time, chessguy gave a summary and minor speech
17:10:19 <cwenner> you do not want to do the honors?
17:10:25 <YaroslavVB> _death: actually we don't know where it ends. All the Halting theorem says is that there exists at least 1 Turing Machine which we can't prove that it halts (or doesn't)
17:11:04 <YaroslavVB> It could conceivably be that Halting problem is solvable for every turing machine except for the self-referential machine used in the proof
17:12:43 <YaroslavVB> I think it could be possible that there's a Turing machine of length n that can solve the Halting problem for all Turing machines of length n-1
17:13:52 <YaroslavVB> for an infinite number of different n's
17:14:18 <_death> YaroslavVB: i would very much like to give that subject some thought sometime
17:15:16 <YaroslavVB> http://www.scottaaronson.com/writings/bignumbers.html
17:15:20 <YaroslavVB> this is somewhat related
17:17:14 <_death> summary: the measure of intelligence is not computable by practical means. it seems largely arbitrary. it lacks any meaningful use of the concept of time. some of the claims are overstated. nonetheless, hutter provides an elegant theoretical measure and an educational paper.
17:18:18 <_death> YaroslavVB: 9 is the biggest number
17:18:28 <YaroslavVB> not computable means you can't compute it to arbitrary precision
17:18:43 <YaroslavVB> but it doesn't mean you can't compute it, to say, 100 digits of precisiou
17:20:10 <_death> so it is computable by practical means?
17:20:19 <cwenner> A summary of the night's discussions: there's some uncertainity of why the PAQ-programs should be considered having neural nets in them as well how the correlation between expert knowledge or the data size and the prediction efficiency. In the second article, there were a number of assumptions or claims that could be seen as not quite justified. Strong emphasis was put on the lack of incoperating the time of the agent'
17:20:25 <cwenner> complications of non-halting computations and peripherals thereof.
17:20:28 <cwenner> and as _death points out whether it's practical or not
17:20:35 <cwenner> good discussion, thanks for your time
17:20:38 <YaroslavVB> _death: yes, the open question is the precision to which you can compute it in reasonable time
17:21:33 <YaroslavVB> I think the first article was somewhat out of the mainstream. Many people have worked with ways of combining models before, but this paper doesn't cite them
17:22:15 <YaroslavVB> _death: you could always go for "bigest number describable in 40 words or less, plus 1"

page_revision: 3, last_edited: 1160102536|%e %b %Y, %H:%M %Z (%O ago)
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.