Dr. McManus has been a friend to our program for as long as we've been going. We started in 2010 and 2011, so he has known all of the graduates we have, and a lot of that comes from the help he was able to give us in the early days. When we had HRSA funding, we could have Bill here more often. Now we can't, because we have no money. I could complain about it, I don't know. Anyway, that's not to say that students haven't been able to contact him.

What I have done by way of having this session, and by offering it as a possible opportunity, is this: he does private consultation, and we don't have any other mechanism within the college, unfortunately. I'm going to try to get him to be official at Molloy at some point, so that you can sort of count on it that he would be here. But right now it's independent and it's private, and I just want to be sure that you know, and anybody who views this knows, that our relationship has been so good that he's allowed us to use his book and offer what we're calling individual modules from time to time. But he does this as a whole program. And what I'm hoping (unfortunately for Linda, you're kind of stuck being the student that has to look at stats today; I guess I hadn't thought about that for a while) is that the other students who will have a chance to look at this will have a chance to know what to ask him for. That way we set up that mechanism of being able to contact him. That's the important reason for having Bill with us today.

I'm not going to do a lot by way of introduction. You did see the flyer. He has his own consulting firm, called Stats Whisperer. His contact details are on there; it's all online, and there are bits and pieces that are incredibly valuable. He has done a newsletter in the past. I'm not sure if there are new ones; I haven't seen new ones. (There are some new ones? Yeah.) But in any case, they're all available. He's been amazingly available, and he'll tell you about one that
you can find for the question that you heard, actually: bootstrapping and cleaning. Those are the kind of high-end questions.

What I asked him to present today is a little bit of his style, of how he does the web tutorials, so that we can encourage this in the best way possible: if you're a newbie and this is all new to you, and you are interested in being part of it, you can contact Bill directly or through us. That's the official way that I want to be able to say our relationship at this point is sustained.

A couple of other colleagues of mine are here: Dr. Marcia [surname unclear], Dr. James Gorga, and Dr. Tobin. I'm doing an intro for anyone who watches this video. But an important thing for you to know, too, is that because our relationship goes right back, Bill's been kind enough to allow us to have these modules. He's going to talk about how he does the seven steps; that's a complete consulting opportunity for our students, and I've been trying to encourage it. Foster, you did the whole seven steps, right? And I think she's coming too. So there's the seven steps as a whole package, and this is sort of a sampling of what you would get.

But more importantly, I think, for everyone, my colleagues included, is that we never know when to call. You can't ask the statistician and always say, "I just want you to answer a quick question," and most people do. So we learned a great way to phrase it, and I don't know if you'll appreciate it, but the way to phrase it is: okay, if it's just a minute, then you're standing on one foot and asking your question. I think that's a really good way of sizing up the kinds of questions, and what people would perceive as a quick question. So my ask to him, as you saw on the flyer, was: when do I need a statistician? At what point? Do I have data collected? When you go through all of the steps, it's like this giant black box: "Oh, then I get there, and then what do I do?" I've asked him if he would do that. So without further ado, I'm going to get you started. You have all
of those. I'm going to get this guy out of your way, so that's up in the corner. You're going to see yourself right there in that corner, and that's what anybody who's logged in would see. You can talk to your slides, and you can use the up and down cursor or just the spacebar, so it's real easy. I think this guy works too, but I haven't tried it. So you're going to have to take it from here, Billy.

Yeah, thanks, Ronnie. Hi, everybody, what's happening? (He starts on the videotape.) Yeah, you know, just roping you in, being friendly before I hit you with stats. Okay, so, yeah, thank you, that's very nice.

So, basically: when do you need someone? There's a process to data analysis, right? Everything should be able to be broken down into a recognizable process: you sit down, you do this, you do that, you do this, you do that. Take surgery, for example. If you're a surgeon, there's stuff you're supposed to do and there's stuff you're not supposed to do. You're supposed to wash your hands and then give anesthesia, and all these things. You're not supposed to drop watches in people, which sometimes you read about in the paper. So that's like a model of surgery: you do these things, you don't do these things.

What would be great is if there were a method, a process, for data that you could follow. Right now there's not a process. It reminds me of early surgery, right? Bloodletting and all this stuff, so you're glad you came later. But there once was a point where people just did surgery the best they knew how, until there was a standardized model of how to do it. Right now statistics is at the point where a lot of people do the best they know how; some things are included and some things are left out. If we had a standard model of what to do and when for a data analysis study, we could do it knowing all our bases are covered, that it's comprehensive, and that it's valid. But right
now, like I said, there's no model. So I'm going to go through the Seven Steps of Data Analysis model, so you can see the entire process of data analysis all laid out. So when do you need somebody to help? I guess it's when you're going through the model and there's a snag, or you have questions. That's it. And then it's how big the questions are, whether they're a one-footer or a two-footer; then you know whether you might need to see someone or not.

So, binary logistic regression. We need something to apply in order to go through the seven steps of data analysis. Everybody listening? All right. We're going to go through binary logistic regression, which is the rock star of regression, and I'll tell you why.

But first, let's make sure the cursor is in the middle of the slide. Just move the mouse. Yep, and you can click on it. (All right, here's my chance to relieve some fears.) Okay, let me get your finger out of there. We're going to move up a little tiny bit so we don't lose any of your slide. That's super. I think you're missing some of that, so grab that. Look at how tiny this is, what we teach in this room. I can't see the screen. Yeah, now the whole slide is in view. Okay, all right, thank you.

So here we go. The goal should be to make statistical analysis clear and comprehensive. Is that possible? We're going to see. But that should be something you can expect from doing data analysis: it's clear what to do; it's comprehensive, so it includes everything we need to do; it's doable, so it's clear but you could also sit down, apply the techniques, and successfully do it; and it's fun. It's fun like this: anyone ever hear of Tai Chi? Well, the first thing they tell you in Tai Chi... (you see that? That was a good movement. That was the "blabbing professor.") And in Tai Chi, the first thing
they tell you is: look, it's helpful and beneficial, but first you have to learn all the moves, and that's stressful at first. So at first it's stressful, then you learn the moves, and then it's enjoyable and helpful. It's the same thing with fun and statistics: it's fun after you've kind of gotten a handle on what you're doing. All right? Yes? All right.

So today we're going to go through the seven steps of data analysis, which is like a recipe for a research study. You've got a recipe for a cake, you make a cake. You've got a recipe for a research study, you make the research study. Okay. We're going to implement binary logistic regression, which is beloved because it produces the odds ratio. Y'all ever hear, like, "If you eat fruit, you're ten times more likely to live to 100"? That "ten times" wakes people up, because you don't have to be a scientist to understand it.

All right. So binary logistic regression is used (this is a mnemonic; there's a lot of mnemonics going on around here) when you have two categories. Binary: two categories to test. That's when you use binary logistic regression: when you're predicting two categories. It could be yes/no; it could be recovered/didn't recover; patient satisfaction yes/no; any outcome that's in two categories.

So to implement our study we need examples. We need a dependent variable with a two-category response. Anybody ever see What About Bob? "I feel good, I feel great, I feel wonderful." Okay, so, two categories. When Bob first saw his psychiatrist in the office, the psychiatrist says, "Are you married?" He says, "I'm divorced." He said, "Why?" And Bob said, "Because there are two types of people in the world: those who like Neil Diamond and those who don't. My ex-wife loves him." And Leo Marvin said: I see. So what you're saying is that even though you are an almost-paralyzed, multiphobic personality who is in a
constant state of panic, your wife did not leave you, you left her because she liked Neil Diamond? All right, so that could be our dependent variable: do you like Neil Diamond, yes or no. And as I said, it isn't what the variable is that determines the statistics used; it's the structure of the variable. Any yes/no: like Neil Diamond, yes/no; recovered from the flu, yes/no. So if I eat more vitamin C, is it associated with the outcome of recovering, yes/no? It's just, you know, Neil Diamond. In statistics, what you're studying can actually distract from the statistical methods you're learning. If I say "lung cancer," everyone's feeling that, you know what I mean? If I say, "Do you like Neil Diamond?" it's kind of lighter. And who hasn't wondered? Because I like Neil Diamond, and I say, why on earth would anybody not like Neil Diamond? So we're going to find out. Everybody with me so far? All right, we haven't gotten into the stats yet.

So who is, technically, Neil Diamond? He's like a rock star. So technically we're using a rock star to examine a rock star, because the regression model is a rock star and Neil Diamond is a rock star. He's from Brooklyn, his best-known song is "Sweet Caroline," and he's been active since 1958. He was one of the first singer-songwriters. You know, they all used to sing, but they never really wrote their own stuff; he was one of the first. He should really call me and thank me, or send me money. Yeah.

So we need an independent variable. Now, for our model, the independent variable can be any structure. It can be continuous, that's at the interval and ratio level, like age: as it increases, one, two, three, four, five, six, seven, eight, it means a greater presence of or a lesser presence of something; it goes up and down. Or it can be categorical: categorical is like male/female, categories. So the independent variable could be either. Now, ours is: is your name Caroline? I think I know this because I had someone in class once when I was doing this, and we
were saying, "Who likes Neil Diamond?" and her name was Caroline, and she was like, "Of course." But this needs to be a salient predictor of what would make people like Neil Diamond. So that would be our independent variable. So: the independent variable and the dependent variable.

Now here's another kind of variable. A predictor predicts an outcome, and there could be two kinds of predictors for our purposes today. First, an independent variable, which is the variable of interest. If we're asking whether vitamin C cures a cold, yes/no, then vitamin C is the independent variable of interest. But there's other stuff that cures colds: stress levels, the heating in the house, diet, all that stuff. So if we want to know if vitamin C cures a cold, we want to control for all the other things that impact curing the cold.

It's like this. I might want to ask if a positive attitude is associated with reduced chances of having lung cancer, yes/no. So then I find positive attitude predicts lung cancer "no." But then I want to control for other things related to lung cancer, such as the number of cigarette packs you smoke a day. So then I put in the number of cigarette packs, like 20 packs a day, and all of a sudden your positive attitude is no longer related to developing lung cancer, because it doesn't matter how good your attitude is: if you're smoking 25 a day, it's not so good for you. So basically, covariates are other things that impact the outcome that we want to account for, so you can see the true impact of the independent variable.

So: age and region. Neil Diamond has been around since 1958. He was hip in 1958, and now people are like, "Who?" So we might want to control for age, and also region. If you're from Brooklyn, it's like how Brooklyn loves Barbra Streisand. So you might want to control for region raised. Those will be our covariates. So we might have these
hypothesized relationships. Here's what we have in mind: if you're in the Northeast, you're more likely to like Neil Diamond; higher age, more likely; if you're named Caroline, more likely. And if you're named Caroline, you live in Brooklyn, and you're older, well, unless you ran into him and he just ignored you ("Oh my God, I never liked that guy"). Okay.

So these are the seven steps. It's like a rabbit out of a hat: if you follow these seven steps, you can produce a comprehensive, legitimate, and effective quantitative research study. And if anyone wants these slides, are they available? (Yeah.)

So we're going to go through the first one, the study map. We're going to skip two and three, because I want you to be awake for four, five, and six. Then we'll do four, five, and six. Six is multivariate analysis; that's where the binary logistic regression is. And seven is the write-up. So we'll apply our study of what makes people like Neil Diamond to these seven steps.

Now, this is the study map. This is a name I made up. So if someone says, "I've never heard of that before," that's why: I made it up. You'd be surprised: when you say to people, "You're doing a research study. What exactly are you examining?" they go, "Oh..." And that's easy to do, because when you're smart you have a lot of ideas, but you've got to write them down. So here's what we do. We put on the right side the dependent variable: do you like Neil Diamond? Now we put on the left the predictors, which here are the two covariates and the independent variable. You can see from the arrows going toward the outcome that we're looking at how each of these predictors predicts the outcome. Cool?

Y'all ever hear they say a picture's worth a thousand words? This is what they mean. If I start explaining it out loud, people start zoning out, right? You just show the thing: so here's what I'm
looking at. Like, okay. And they don't have to be a scientist. That's why people love smartphones and Steve Jobs: because you could pick it up, not look at any directions, and start using it. It's the same with this. You just show them, "This is what I'm looking at," and it's immediately accessible.

Well, you know, I'm not promising anything, but what I'm thinking is we want to make that a page somewhere in the dissertation. It boils it down. We try to do it in models, and then they get kind of complicated; the models have too many words. There ought to be a page for the analysis of what exactly the pieces are. The clearest ones I've seen started out with a drawing, and people then addressed each of the pieces. I know some people who did that one at a time, and it kept it in line. So I think that's a really good piece of advice. I won't call it the Bannon diagram, but I think it's a good piece of advice.

Well, there's an old Chinese proverb where they say, "He who is lost should look back at the beginning, where he started." You ever do a literature review, or start doing a study, and then forget what the hell you were looking for? (Oh yeah.) Yeah, that's right.

Okay. So there are three stages, like The Bachelorette. Anyone watch The Bachelorette? No? I don't watch The Bachelorette, but if you do, it's okay, nothing wrong with that. So there are three stages of data analysis within the seven steps model: steps four, five, and six. Univariate, bivariate, and multivariate analysis. Does that make immediate sense? So they're like The Bachelorette show. Anyone know what I'm talking about? The Bachelorette is this show where there's this lady, and then there's like a hundred guys all around her saying, "Pick me, pick me." Sort of
regrettable television, but it's good for this illustration. So what happens is there's this lady, there's all these guys around her, and then she picks the ones that she likes, and then she makes a final choice at the end of who she is going to... whatever they do. I'm sure it's perfectly respectable.

Okay, so step four, the uni level, "uni" meaning one. In the first show of The Bachelorette, all these people are sitting there and they just talk about themselves. They're talking about their individual characteristics. So in the first show they talk about the individual characteristics of each contestant, and the Bachelorette too; she goes, "Well, I've been looking for...," you know what I mean. So, for example, Bachelor Bob here: from Ohio, likes Tai Chi, looking for an LTR, which is a long-term relationship. That's the first stage of data analysis, step four, univariate analysis: "uni" meaning one, you're looking at one variable at a time. Instead of describing one person, you're describing one variable. For example, if we looked at the dependent variable by itself, we'd say: do you like Neil Diamond? 50 people, which is 50 percent, say yes; 50 people, which is 50 percent, say no. So you're only looking at that one variable and the characteristics of it.

Next is the bivariate level, "bi" meaning two. In the second show of The Bachelorette, she starts hanging around with all the guys, and they're making comments, and she's kind of figuring out which ones she has a significant connection with. The ones she has a significant connection with go to the third round; the ones she doesn't have a significant connection with go home, or wherever they go, but they leave the show. So you've got Bachelor One, Two, and Three. If she had a significant relationship with Two and Three and not with One, One goes home and Two and Three go to the next round. So in bivariate analysis, you're doing a one on
one test between the predictor and the dependent variable. It's a one-on-one test: is your name Caroline, and do you like Neil Diamond? A one-on-one test, just like one Bachelorette and one bachelor. So: age and liking Neil Diamond; region raised and liking Neil Diamond. Each is a one-on-one test, and the variables that are significantly related to the outcome go to the next stage; the variables that are not, don't. This is sometimes called backward regression.

The third is the rose. In that show, The Bachelorette, she walks by all the men she has a significant relationship with and gives one the rose, and that means that's the strongest relationship. So, multivariate: "multi," many; many bachelors, many variables. The third stage in data analysis is that you take all the variables that were significantly related to the outcome at the bivariate level, you put them in one regression model, and you regress them all at the same time on the outcome. Some become not significant and some become even more significant, and you can see which is the strongest predictor of the outcome, liking Neil Diamond. So, everybody, that makes sense, right?

So how many statistical tests do you need for each stage of analysis? This is the univariate test key. There are two types of variables: categorical and continuous. For both of them you run a frequencies test in SPSS. In the book and everything, it shows you, like, one, two, three, four, five, step by step: click this, click that, click that, to come out with the output. I'm not going to do SPSS here; I just have the output. You just have to trust me, it's very doable to produce it. So for the categorical variables, we want to present the number and percentage within each category: the number and percentage male and female, or like Neil Diamond, yes/no. And then for the continuous variables we want to present the mean (age is continuous), so the average
age, the standard deviation, and the highest and lowest age. We're describing each variable. So there they are, plugged in there: the categorical ones are region raised, is your name Caroline, and do you like Neil Diamond, and then age is continuous. And this is what the output looks like. When you do this in SPSS, this is what the output looks like for the continuous variables. It's actually for both; I'm just selecting this part because you don't need it for the categorical ones. So the average age is 35, the standard deviation is 7.67, and the youngest and oldest are 18 and 50. It seems pretty innocuous at this point, right? That's the best.

Then we have the categorical variables: the number and percentage within each category. So the Northeast is 56, which is 56%, la-la-la, going down. Then, do you like Neil Diamond: 50/50, so we're not tilting it. And then, are you named Caroline. Now, you may notice there's an over-representation of girls named Caroline. That's because, you know, it's generated data. Then again, you never know; if you're in Brooklyn, right around where he's from...

Okay, so then the bivariate analysis. How many statistical tests do you need to do bivariate analysis? Well, there are four for our purposes here. Now, this is a cross-sectional study; that's where you have one time point, not like a longitudinal or pre/post study where people are followed. For our cross-sectional study we can consider four, and the one you select is based on the combination of the structure of the two variables. I'm going to show you what I'm talking about. If you have a categorical variable with two categories, such as like Neil Diamond yes/no, and a continuous variable, like age, you use the independent samples t-test. Now, we don't need the center two for this, because of the structure of variable B; we don't need them, trust me. And then the last one: categorical with categorical, you use the chi-square. Now, here you go: this is all we need to do for bivariate analysis.
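The bivariate test key described above can be sketched as a small lookup. This is a minimal illustration rather than anything shown on the slides: the function name `pick_bivariate_test` and the labels "categorical" and "continuous" are assumptions made for the example, and only the two pairings named in the talk are encoded, since the two middle tests on the key are not named.

```python
def pick_bivariate_test(predictor_structure: str) -> str:
    """Pick the bivariate test for a two-category outcome (hypothetical helper).

    Only the two pairings named in the talk are covered:
    continuous predictor  -> independent samples t-test
    categorical predictor -> chi-square
    """
    tests = {
        "continuous": "independent samples t-test",
        "categorical": "chi-square",
    }
    if predictor_structure not in tests:
        raise ValueError(f"structure not covered by this sketch: {predictor_structure!r}")
    return tests[predictor_structure]


# The three predictors from the Neil Diamond study map.
study_predictors = {
    "region raised": "categorical",
    "named Caroline": "categorical",
    "age": "continuous",
}

for name, structure in study_predictors.items():
    print(f"{name} vs. likes Neil Diamond: {pick_bivariate_test(structure)}")
```

Run as is, this lists one t-test (age) and two chi-squares (region raised, named Caroline), matching the count given in the talk.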
We need to do one independent samples t-test, because we have a combination of continuous (age) with categorical, two categories (Neil Diamond), and then the other variables are all categorical with categorical. You see variable B? That's the dependent variable. The predictors are on the left, and you can see where you plug them in. So that's the whole thing; you could plug them in here. The bivariate test key is great because once I'm gone you can plug it in yourself; you don't need me. So I'm making myself obsolete. So region raised and liking Neil Diamond are both categorical: chi-square. Named Caroline and liking Neil Diamond are both categorical: chi-square.

When you do a chi-square in SPSS (I'm not going to drive you through that; there are instructions in the book, and there are also instructions online for how to do chi-squares and all that, so you don't necessarily need to ask me), this is what the output looks like. So what's the problem? It's like, what do you look at? It's Where's Waldo. Y'all ever hear of the Pareto principle? You know it? It depends how you add it up, the different ways to get leverage. Exactly. It's like if you go to work somewhere and they give you a big manual, and then your friend who's been working there already says, "All you need to know is this and this." That's what you need someone to tell you with this stuff.

So there we go, that's what we're looking at. First we're looking at whether the difference is statistically significant (in other words, below .05 for our purposes), because if it's not statistically significant, then there's no difference at all; it's just considered no difference. Here there's a difference, because the significance is below .05; it's .000. So we know there's a significant difference. Then the question becomes: well, what is the significant difference? If we look in the Yes column, we can see the percentages, which are really better to look at than the raw count numbers.
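The two things to read off the chi-square output (is it significant, and what do the percentages say) can be reproduced by hand. The sketch below is an illustration under stated assumptions, not the SPSS output itself: the cell counts are back-calculated from figures quoted in the talk (100 respondents, 56 raised in the Northeast, a 50/50 split on liking Neil Diamond, and 71.4%, 23.1%, and 22.2% saying yes by region), and the function name `chi_square` is hypothetical.

```python
import math

# Rows: region raised. Columns: likes Neil Diamond (yes, no).
# ASSUMPTION: counts back-calculated from the quoted percentages and totals;
# the talk itself only reports percentages.
observed = {
    "Northeast": (40, 16),
    "Northwest": (6, 20),
    "Southeast": (4, 14),
}


def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square(observed)

# With exactly 2 degrees of freedom, the chi-square p-value is exp(-stat / 2).
assert df == 2
p_value = math.exp(-stat / 2)

# Step 1: is the difference significant (below .05)?
print(f"chi-square = {stat:.2f}, df = {df}, significant: {p_value < 0.05}")

# Step 2: what is the difference? Read the percentage saying yes in each row.
percent_yes = {region: round(100 * yes / (yes + no), 1)
               for region, (yes, no) in observed.items()}
print(percent_yes)
```

A p-value this small is what SPSS displays as .000, which is why the first check passes before the percentages are compared.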
People raised in the Northeast (you see that on the left there, Northeast): in the Yes column, 71.4% of them like Neil Diamond. Then in the Northwest, only 23.1% like Neil Diamond, and in the Southeast, 22.2% like Neil Diamond. So the conclusion: if you were raised in the Northeast, you're more likely to like Neil Diamond. A significantly higher percentage of people raised in the Northeast like Neil Diamond. So if we're looking at our study map, we can plug that right in there. We say on that arm that test one is statistically significant, so region is going to the next show, the next stage.

Then we have age. We do our t-test; that's what the output looks like in SPSS. And boom, we see under "Sig. (2-tailed)" that there is a statistically significant difference. And under "Mean" you see: do you like Neil Diamond, yes, the average age is 38.88; do you like Neil Diamond, no, the average age is about 30. So there's a significant difference, where people who have a higher average age are more likely to like Neil Diamond. So then, boom. This whole thing gets unwieldy really fast, guys. Look, you've got to keep everything managed like this.

All right, I'm going to ask you to do one thing for me. Go back a slide. Because it's on video, I want you to talk to people on the video and tell them about the Levene's test first: what they don't have to look at. So often they'll look at the Levene's test instead of the t-test and start reporting the Levene's test. A colleague and I both have that as an issue. If you look at the Sig. under Levene's test, ignore it, right? Unless it makes you look at a different number. I want you to go over quickly what that means.

I like the Levene's test. I always hear it like a lady, like, "Dolly, this is what I'm telling you," you know. I think I heard the name Marsha Levine once, and I always imagine her talking during stuff. So the
Levene's test: you see it as "Equal variances assumed" and "Equal variances not assumed," for when the variances are not equal. You have two categories, yes or no, and we're looking at the variance of the residuals for age. When equal variance holds, the Levene's test is not going to be statistically significant, which means it's above .05, and then you use the top row of numbers. If it's .05 or below, it's significant, but what that means is significantly unequal variances, so you use the bottom row. (Make a little arrow thing, because it'll show up on the video.) Just use the top line? Yes: if it's above .05, you use up here; if it's .05 or below, you use down here.

And that's important, because it's so often forgotten. SPSS gives you so much stuff, and when people want to see significance, they see that ".05" and suddenly they're saying that the test itself is significant, and it's not. You have to look at the t value to see what that is. Yeah, that's bad. Okay, so for the first one, .05 is bad, unlike everything else. That's a good way to put it.
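The row-selection rule just described fits in a few lines. This is a sketch of the decision only; the function name `t_test_row_to_read` is hypothetical, and the significance values in the example are invented purely to exercise the rule (they are not the values from the slide).

```python
def t_test_row_to_read(levene_sig: float, alpha: float = 0.05) -> str:
    """Say which row of the independent samples t-test output to report.

    Levene's test above alpha  -> variances treated as equal   -> top row.
    Levene's test at or below  -> variances treated as unequal -> bottom row.
    The Levene significance itself is never reported as the t-test result.
    """
    if levene_sig > alpha:
        return "Equal variances assumed"
    return "Equal variances not assumed"


# HYPOTHETICAL output values, invented for illustration only.
output = {
    "levene_sig": 0.21,
    "sig_2_tailed": {
        "Equal variances assumed": 0.003,
        "Equal variances not assumed": 0.004,
    },
}

row = t_test_row_to_read(output["levene_sig"])
print(f"Read the '{row}' row: Sig. (2-tailed) = {output['sig_2_tailed'][row]}")
```

The point of keeping the two steps separate is the one made in the discussion: the Levene significance only chooses the row, and the t-test significance on that row is what gets reported.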
Mrs. Levene is tricking you. I think it's the feeling like, "Oh, but if I had equal variance, I'd have significance." It's like saying, "If I had hair, my life would be whatever," you know what I mean? It isn't there, so it isn't there. Like in England they say, "If your aunt had a mustache, she'd be your uncle."

No, that's exactly right. I love what you do on the video, because people are hearing your voice and you have ways that people can remember that, which is why I wanted you to be able to say that on this particular recording.

Oh, you're absolutely right. It's very annoying when the numbers don't come out the way you want. It's like, come on, people. And a lot of the time the significance is different for equal variances assumed and not assumed, and you're like, "I want to read the other one." Yeah. You're welcome.

All right, so test two is significant. Then Caroline. This is a chi-square, and the first thing you look at is the bottom right: it's statistically significant, so there is a difference. What is the difference? If you look in the Yes column, you'll see that of people not named Caroline (named Caroline: no), 19.2% like Neil Diamond, but of people who are named Caroline, 83.3% like Neil Diamond. That's a big difference.

Now, notice something here. Everything here is one-on-one. We can say which variables are significantly related to liking Neil Diamond, but we can't see which variables are more strongly related than others. If you've got three people that like you, you're like, "I wonder who likes me the most?" You need, like, a test. That's what the next stage tells you: of the significant predictors, which is the strongest? You can't see it here. And you'll see that some are no longer significant and some are more significant. So we had the univariate test key, we had the bivariate test key, and now we have the multivariate test key. So this is if we're going to use
regression. If we have a continuous dependent variable, we do linear regression; categorical with two categories, binary logistic; categorical with three or more, multinomial logistic. In our case, the dependent variable is categorical with two categories (like Neil Diamond, yes/no), so we use binary logistic. You see, it all makes sense if you have these little things to follow. If you don't, it's rather nebulous and cumbersome.

All right, here we go. The first thing we need to do here is observe the structure of the predictors in the model. The first one is what region of the United States you were raised in; it has more than two categories. The second is whether you're named Caroline: two categories. And the third is continuous. Now, the problem we have here is that in a regression model, a predictor has to be either dichotomous, which means it has two categories, or continuous. Here, the first variable, region, has three categories, so we can't use it in the regression model as it is. So what to do? This is like the time you would call someone: "I just have a quick question." This is not a one-footer, but yeah, I have a question. And this is like a landmine sort of thing, because unless you knew it was a problem, you might not know it's a problem; you might just put it in as categorical without dummy coding it. But that's what we need to do: dummy code it. This is another mnemonic: you've got a dummy, you've got a code, you've got a dummy code.

So when you have a predictor that is one variable with more than two categories, you have to dummy code it, which means you make each category a yes/no variable. Some of you have heard this many times before, so thank you for listening again. So what does that mean? There are three categories within region: Northeast, Southeast, and Northwest. Each of those has to be a separate variable: one variable, Northeast yes/no; another variable, Southeast yes/no; another variable, Northwest yes/no.
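Dummy coding as described above, one yes/no variable per category, can be sketched in plain Python. The function name `dummy_code`, the sample respondents, and the optional `reference` argument are assumptions made for this illustration; dropping one category as a reference group is the usual convention when the indicators go into a regression model.

```python
def dummy_code(values, categories, reference=None):
    """Turn one multi-category variable into yes/no (1/0) indicator variables.

    `reference`, if given, names the category left out of the result so it
    can serve as the comparison group in a regression model.
    """
    indicators = {}
    for category in categories:
        if category == reference:
            continue  # the reference group gets no column of its own
        indicators[category] = [1 if v == category else 0 for v in values]
    return indicators


regions = ["Northeast", "Southeast", "Northwest"]

# HYPOTHETICAL respondents, invented for illustration.
region_raised = ["Northeast", "Southeast", "Northwest", "Northeast"]

# One yes/no variable per category.
print(dummy_code(region_raised, regions))

# The same coding with Northeast held out as the reference group.
print(dummy_code(region_raised, regions, reference="Northeast"))
```

In the second call, a respondent raised in the Northeast is the one with zeros on both remaining indicators.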
Northeast variable would have Northeast coded as one and the other two coded as zero, and you’ll see in the model what it looks like. Like I said, there’s all sorts of stuff online about dummy coding. You just have to know to do it. Once someone tells you that you have to do this, you go to Google, you google it, and you know how to do it. You just need the person to say, you need to do this.

So in the regression model the first category is going to be left out to serve as the reference group. You’ll see what I mean.

So what do we want to look at after we run the regression model? We put all the predictors in the model, then we run it, and then we have a lot of numbers, like a page of numbers. So what do you want to look at? First, the overall significance of the model. If the model is not significant, nothing else in it is believable. It’s like if you know someone who’s a known liar: you know they’re a liar, so whatever they say, you’re not going to believe it, right? So if the model’s not significant, we don’t believe anything it says. Then, if the model is significant, which predictors are significantly related to the outcome? And then we want to know, of the significant predictors, which is the strongest predictor of the outcome.

OK, because you can have a significant relationship where the effect size is small. I mean, sometimes they say you’re ten times more likely to do this, and then people say you’re 20% more likely to do that. It doesn’t sound much different, right? But it’s hugely different, because once you get below twice as likely, that’s when you get into percentages. So something is three times more likely, then it’s two times more likely, then if it’s less than two times more likely it’s ninety percent more likely, eighty percent more likely. Those are smaller effect sizes. So if I say to you, if you do this you’re ten times more likely to get rich, and if I say, if you do this you’re ninety percent more likely to get rich, what
would you do? The ten times more likely, right? You’ve got to know it. So we look at the effect size. This is a nice phrase, getting rich.

OK, so the overall regression model significance. You’ll see this is the omnibus test of model coefficients, and the highlighted value is .000, so the overall model is significant. So then we move on. I don’t know who that’s supposed to be, I just thought it looked kind of interesting. My wife’s like, so boring, you’ve got to put something there. Anyway.

So then we look at the individual predictors. Now you can see at the bottom, Caroline is .000, so statistically significant. Age is below .05, at .001, so it is statistically significant. But region was significant at the bivariate level, and now it’s not. That could suggest that one of the other variables explains the outcome over and above region. So if you were named Caroline and you’re in Iowa, you might be more likely to like Neil Diamond than if you’re not named Caroline and you live in Brooklyn, right? Something like that made region no longer significant in the context of the full model.

And that’s sort of annoying, because this is the reason you run this when you have an independent variable like named Caroline. It could have been the case that our independent variable was no longer statistically significant and region was the stronger predictor. That’s why you run this stuff: you want to be able to say, well, I controlled for this, that and the other thing, and the independent variable was still significantly related. Y’all dig? Because people say, did you control for this, did you consider that? And you want to say, yes, I controlled for all that stuff, and the independent variable was still related to the dependent variable.

Now, we can’t illustrate it here, but the Northeast is left out of the model, so these two findings, for the Northwest and Southeast, are in reference to the reference group. So suppose Northwest was .05. It’s not, but suppose it was,
then we would say, and there’s a minus sign, which means less likely, that in reference to the Northeast, people in the Northwest were significantly less likely to like Neil Diamond. So you just have to remember that although the reference category isn’t shown, the findings for the two other categories are in reference to it. You all see what I mean? Because if you don’t know to do that, you might put all three in there, and then that’s not valid.

All right, now we’re looking at the odds ratios. Age is continuous, so the odds ratio for age isn’t as interpretable as if it were categorical. For example, if you’re named Caroline you’re over 12 times more likely to like Neil Diamond. So that’s our finding, basically: when we control for region and we control for age, people named Caroline were still about 12 and a half times more likely than people not named Caroline to like Neil Diamond. And then, additionally, higher age was associated with a greater likelihood, and at the multivariate level region was not. All right, we all good? That was a lot of stuff I just laid on you in like 40 minutes.

Go back one slide for me. I’ll do some more highlighting. I go back to your video all the time, because you have a very clear way of showing it: you’ve yellowed the Exp(B) and said it’s the odds ratio. Why in the world doesn’t SPSS call that the odds ratio? But that’s the odds ratio column. The other thing that’s important to know, and he does it more slowly when he explains it, and again, I use it when I’ve got results, I go back to the Neil Diamond example with named Caroline: saying 12 times more likely is easy. It’s 12.49 times more likely. That’s easy to say. When you read that someone’s twice as likely, three times as likely, that’s where that number is going to come from. But when you get those fractions, they’re
harder to do. I’m not even going to try to tell you what they are, but he has a really clear way of being able to say, if those happen to be significant but they are fractions, how you’re able to say it. It’s a one-over. I don’t know if you’re going to do it in subsequent slides, I just want you to know.

That’s right. You divide one by the Exp(B). So one divided by .26 would come out to something like 3.8, and that would mean 3.8 times less likely. So you change the way you say it.

And like I said, all of those words blow me away. I’m a pretty smart person, but every time, I have to go back and rewrite it. It’s not something I can do easily. You can relate to this, right? You go back to that regression, you go back to that model, and then you need something to remind yourself. That’s when I go to Bill’s work. And even as you wrote them out on the next slide, it’s that step between, here’s a printout, like you said, and what do you say about it?

Right on. That’s something that I think is missing in lectures in statistics courses. They don’t get to that point: they show you the graph, they show you the chart, and then they don’t tell you the words you actually use.

And one of the good books [unclear] is one of the better books for giving you the words. But I really like that you do that. And even though your contrived data didn’t give you a .05 for Northwest, it would have been nice to be able to show how you could then say, for .25, one minus .25 is .75, so 75 percent less likely, or whatever it would be. You’d be able to say it and lay out those words. I always go back to old models when I’m doing that. I always go back to something I know is correct, and then I refer back to it. So those of you who’ve been my students know I
hand you another book and say, see how they did it, do it just like they did it. Because to try to say it [unclear], and I start thinking named Caroline. So thank you for giving me that chance, at least on the recording as well.

Well, and another thing is, I remember in my master’s class, the advanced statistics one, where I was learning the chi-square and the t-test, right? And it’s kind of not meaningful unless you know where it’s embedded in the seven steps model, because you’re like, well, I know these two tests, now what the hell good is that? But if you know that’s part of step five, the bivariate analysis, it’s like, OK, so now I know part of step five, and next is the rest of the steps, and then I can do it all. So it’s kind of meaningless to know just one little piece. Sometimes they say a little bit of knowledge is dangerous.

Yeah, right, it’s like this: if you don’t understand the whole outline and you’re just learning little pieces here and there, you don’t know what it’s good for. You need to know the whole outline.

So Exp(B) is the odds ratio. And like I said, I do stats a lot, but I always have to remember that column. People also like to talk about the confidence interval, and I’d like to talk about how I learned the confidence interval. I would actually look at Northwest and know that it’s not below 1.0: I would take the upper and the lower, and see that it crosses the 1.0 line. That was the way I learned it, a different way. If you show that column, the upper and lower, right, greater than one, right there.

Oh, you’re right, it goes around one. That’s the null value, which simply means that if the whole range is below one it’ll be significant, and if the whole range is above one it’ll be significant.

So we report those. If you look at medical studies, they will often report the confidence intervals. They give the odds ratio and the confidence intervals, and then it’s just, you know, spaghetti, and it’s still done with its
meaning. But if you look at that and it crosses the 1.0 line, it means it’s not significant. You could even take out the .05 column: everything else is statistically significant. And then what you see is the .07, which I like to talk about, because it’s close. I know the tentative language is "close", "it approaches", "it’s approaching", "there is a trend", or we have to be cautious in interpreting it, right? But you can see when it’s close to that 1.0 line, because it’s a ratio.

Whatever you set the alpha at, it just can’t include the one. It cannot include the one in the range.

I also have a graphic depiction that I learned for odds ratios, and I’ve asked people to do them in SPSS until my colleague did them, and I think they come out of [unclear]. They put in a 1.0 line, and for the variables, the regions, age, Caroline, they’ll show bars. I don’t know if you use it that way. They’ll show the bars and a dotted line at 1.0, and then you look for the ones that are above or below it, the ones that don’t cross. That’s another interpretation of an odds ratio, where you can easily see which of those are significant.

And the last question for you is one that I always get confused with, and I think with the way you said it, I get it: it’s the whole shebang taken together, right? So the language we can use, having put it all together, is "I’ve controlled for" those things. Because there is a difference between controlling for, or moderating by, or separating out to only do the Northeast Carolines. The language of "control for" means what he showed you: if you read each of those regressions alone, by themselves, they start looking like you could make some prediction from them, but it changes when you put them together. Last question before we [unclear].

So the last question I have, which I always get confused about as well, is when I get to multiple regression, or when I put it into SPSS and I do stepwise or hierarchical: if you change the order that you put them in, does that change the chart you get?

No. That last chart is going to be that way no matter which way you put them in. Sometimes I do the hierarchical, and that’s the point of it. You’ll put, say, age first, and it’s significant. Then you’ll put in named Caroline, and that’s significant again, but then age is no longer significant. Then you’ll have step three, which is the full model, like this. But within step three, whether age is above region and Caroline is below, usually it doesn’t change like that, in my experience.

But I’ve always, you know, tried putting them into the model and testing each one at a time and then adding another one. You look at the R-square change, right? You look at the change. But I still don’t know how to conclude. I’m still always confused about being able to conclude.

For the final step, in the final table, it’s all there, and that’s everything in the model, controlling for everything, right?

All right, thank you.

"Controlling for" is like It’s a Wonderful Life, right? Remember, Clarence came along and took him out of the life there, and then you saw how everything changed. So what he did was he controlled for his influence on others. They say, if you were never born, this is what would be going on. So you remove him: he controlled for him. Yeah.

All right, now, am I rambling again? If I ran this again, took out region, and only put in two variables, would I have a stronger model?

Well, one of the assumptions of regression is that all the significant predictors in the data set are supposed to be included in the model.

OK, so you start with the significant predictors only.

Well, you put in the ones that are significant at the bivariate level, yeah.

And then when you put those in and some get washed out, right, then your conclusions are... I feel so bad for people, and we just had it this past week, where when they all went into the model, suddenly, where she could have said something,
it didn’t do anything. But there’s not supposed to be too many either, and that one had like twelve [unclear] significant, right? So putting them in then... and the R-square was pretty small anyway, which is when you say other things account for it, right? But I watch people who have spent all this time to use all these stats, and when they get it, it only accounts for 5% of the model, and they go, what did I do that for? And then they say, quite frankly, I’m not doing that, I’m just going to do something else, at least I can make a statement. And that always frustrates me, because the important thing with a statistical consultant is that you want the inflection in your voice to be excited, so that it’s fun: look at this, you have significance here. And they have to be able to conclude that something’s going on with region. Something’s going on, even if you didn’t consider it, something about the region. You gave a great example: if you happen to be named Caroline and you live in Ohio, then it wouldn’t, right? Because by itself, something’s going on.

And there are always confounders, like someone in Indiana named Caroline who says, but I just moved from Brooklyn last year. Yeah, there are always confounders and stuff going on behind the scenes that you don’t know.

That’s right. Well, in this case, this accounts for a large part of the model, and I think that when you can say something like 12 times, you can make a good case that this is a model. But as I said, we lose the inflection. I know the one we’ve all had, where you do all the tests and then your p-value is .06, and then nothing happens. And that’s not true. I think that still tells you something, and we try to teach that language: if something should be related and it’s not, that’s interesting to note. I mean, because it should be this, and then it’s not. As I said earlier, you learn as much from non-significant results. You know, when the variance the model accounts for is very small, as in the family budget study, that was like, OK, what
is this? Something else, then. Something else is in this, something else that we didn’t catch.

Well, then there’s publication bias, which is a terrible thing. If I do a study, I do an intervention, and I do it a hundred times and it works one time, and then I publish it, how often does it look like it worked? It looks like it worked every time, when I only publish the one time it worked. They don’t want to talk about the 99 times it didn’t.

And with publication bias, if somebody has non-significant results, it doesn’t get published, and you lose that information, which I think can be important information, right?

Yeah, no, absolutely. Again, I’m preaching to the choir. I need the words to help us.

Oh yeah. So actually, in the book too there’s the write-up, and there’s a template, because the scientific language is complicated. So The Bachelorette, that’s just to show everyone smiling, you know. For peer review and writing up your research report it’s a bit more structured. It’s probably equal to the other six steps before.

OK, the elephant in the room. So, the seven steps data analysis model: addressing the elephant in the room. Like I said to start off with, it’s not widely discussed, but there’s no standard model of data analysis, right? You’ll notice in publications that some people will have bivariate and then they won’t have univariate, and then they’ll have bivariate but not multivariate. And also the tests of assumptions: some will look for outliers and some won’t look for outliers. So that’s like surgeons washing their hands only some of the time, right? That’s no good. So there should be an agreed-upon standard model of data analysis that people look at, and it’s replicable, and people say, yes, this is the way to do it. Not necessarily even this model, but there should be a model. Just like you can go to medical school and they teach you the steps of surgery, you should be able to do statistics
where they teach you the steps of the process of a quantitative study and all the foundations are agreed upon, right?

I think that classes teach statistical procedures, a very few statistical tests, right? Not the sequence, not the application.

Right, application, that’s the word. They don’t teach you how to organize it and apply it, which is what you need to know. Because you don’t need to know how to do those formulas by hand.

Yeah, nobody does it by hand. It’s all calculated.

Yes, but you need to know the global thing, right? So if there were a model, you could teach people, students and everyone, so they’re not frustrated knowing only little pieces: this is how you put it together, this is how you do it.

And that’s another thing, the way statistics is taught. A lot of the time, in the master’s they’ll teach you the bivariate stuff, and then in the doctorate they’ll teach you regression and stuff like that. But you can see right here, as they say in Brooklyn, it’s no big whoop, you know what I mean? You can handle it. It’s not like plutonium. That’s the second reference to nuclear things. But it’s not like plutonium: you could know what regression is. If you don’t know what it all is right away, then you’re lacking, right?

And as for published research, the quality can be uneven. For example, we skipped over the checks of data integrity, but one of them is looking at outliers. One outlier score can change the results of a study, and that’s not an accurate finding. It’s a finding based on an aberration, if one outlier is driving the findings. So if you’re reading research studies, and one says, we checked for outliers and there were none, and the other doesn’t say, we checked for outliers, how are you supposed to have any confidence whatsoever in the second one, which doesn’t say, we checked for outliers? And how many studies
say, we checked for outliers? All right, that’s what I mean. So when you’re writing up the stuff, you’re also going through those steps: here’s what we looked at, here are the checks of data integrity, we did this, here’s our univariate, bivariate, multivariate. Just as a point of reference. So a standard model might be a very rich contribution to the researchers now and all the future people we’re going to torture by teaching them statistical analysis.

OK, and then, for further reference, see the textbook, the webinars, et cetera. And that’s that. Thanks.

Thank you very much. You’ve been doing these live, right, when you’re teaching a class?

Sometimes. I had a lot of technical difficulties, so I recorded a lot too.

I just wonder if you hear from people when you’re doing it live like this. Do people say "yahoo" as you’re going along?

They have a little thing where they can write comments.

OK, yeah. Like, "what the hell are you talking about?" They sign out: "I’m out of here."

What I want to do, again: the purpose was not so much learning logistic regression but more the style that Bill will use for our students and for people we refer. The most important thing I want to hear from you guys is where you would see that you might need a statistician. Not necessarily the "can I just have a quick question" question. I mentioned to him bootstrapping, I mentioned to him outliers, I mentioned to him missing data. Those are the kinds of things where he said, that’s when you need a statistician. So is there anything... Marci, I know you’re working on a data set. What could be the most helpful to you?

I’ve never looked at the raw data yet.

I mean, I think he’ll say the first step is looking at your raw data, and one of the first things you look at is whether data are missing or whether you have an outlier. So then you have to figure out how to deal with it. That’s where I think Amber is really strong and can help, where we can, in the center. But when it gets to
something with statistical manipulation, transforming things, if your data don’t distribute nicely and you have to transform them, do you do these transformations, or talk with people about that?

The transformations are cool, but the reason I don’t like them is that when you transform the variable, say the mean age is 35 and it’s a non-normal distribution and you transform the variable, then the mean you get in the transformed version is no longer meaningful. The mean will be like 37, 38, 39. So it changes the actual statistical parameters of the variable. So a lot of the time what I might do is take out the outliers, and then it’s more normal, and then I’ll run the test with the outliers in and with the few outliers taken out. So it’s a hundred people, and then 97 with those three removed. If the results of the tests, the regression or what have you, are similar in terms of statistical significance and all that, it would suggest that the non-normal distribution and the outliers are not having an undue influence on the findings, so I might just use the whole group. But if those three are driving the findings, I’ll just leave them out. All right, there are three, and it’s three rotten apples.

And that’s what causes us stress. Everybody worries: I can’t do that, oh no, I can’t do that. So you get a statistician to say, yes, you can. It’s really funny that you have to have permission to make a change. Some would call it just manipulating to get the results you want. That’s the negative part. And we get into these arguments: you put two statisticians in the room and they’ll argue, and so you’re wondering what’s correct. I’m hoping I can convey that there are a lot of ways to slice it and a lot of ways to look at data, and then you get the confirmation from someone who’s run them in different ways.

Another thing that happens to us, a real common problem, is the
small sample size. And what no one learns, and I’m going to ask you guys if you ever learned them, are the nonparametric tests. Do you know Mann-Whitney? Do you know Wilcoxon? People don’t know those things, because we force our data into being interval, and we don’t teach them. Tell me, because I’m always reluctant: are they as believable?

Well, the nonparametric tests are based on rank order. The parametric tests are based on the variances, how much the scores go up and down. So when you get a small sample and you only have nine scores, the variance is really going up and down too much, because there are only nine scores, right? So a parametric test is based on rank...

You mean nonparametric, not parametric. Nonparametric is based on rank.

Yes, that’s right. I’m sorry.

I just wanted to make sure, in case you said something that threw us all off. But I want to ask you, from your experience with the med school, with drug trials [unclear].

Usually they agree. Usually, if you do a parametric and a nonparametric with, let’s say, 20 people, then in terms of statistical significance either they’re both significant, the nonparametric and the parametric, or they’re both not significant. You might even report the parametric and mention in the methods: we also did this with nonparametric tests, and the statistical significance was the same.

One that really shows up all the time, and there are so many clicks you can make in SPSS, is for correlations: do I run a Pearson, do I run a Spearman? We run them both, and you see they’re the same. But then the editor or some reviewer writes back: you obviously have ordinal data, so why did you run a Pearson? You should have run a Spearman. But that’s what I mean about having the confidence that they’re really the same. They’re really the same. [unclear] and they’re not going to
go near it. You know: it’s wrong, you’ve got ranked data, that’s wrong. So I’m hoping to convey that a lot of our students are going down the road away from statistics because they think they can’t do them, because somebody says they’re wrong. Am I relating?

If you are able to do things like putting in IVs, well, you can do statistics. Think about it like this: you do something to a patient when you don’t know what the hell they could do. It’s terrifying just thinking about it. You say this, you say that, it’s a combustible relationship, right? With the numbers, the numbers are there on the screen. You do this, this happens. It’s much more tame.

All right, but we have to get the message across that we’re terrified somebody’s going to say it’s wrong. And if you want me to bring this to an end for you... do you want to do that? I want to be able to say that we get the sense that it’s wrong when it’s not. What I want to know is, did I report Spearman or Pearson? And I want to be able to say, that’s when you call a statistician, and you say in your defense, when you go to your defense: my statistician said. [unclear] was on one of the committees, and one of the other methods people said, well, it’s not interval data, you can’t run that. And they’ll challenge you, and you certainly get nervous. I want everyone to be able to say with confidence, "my statistician said." So, to answer the question that we started with, do I need a statistics consultation? The answer, and thank you, Bill, is: when someone’s going to tell me I’m wrong, I’m going to call you.

But also, it’s like you said: even when you get top statisticians, they disagree with each other, because it’s not like a science, a lot of it’s an art. You get to one place and you could do A, B or C, and different authorities will even say to A, no, I never do that, and to B, I never do that.

And that makes all of us who are just applied people nervous.

Yeah.
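
The "run them both" advice for Pearson versus Spearman can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something shown in the session, where the work was done in SPSS: the function names `pearson`, `ranks` and `spearman` and the nine data points are made up for the example. It shows that Spearman is simply Pearson computed on the rank order, which is why the two usually agree and why a single extreme score can pull them apart.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation, computed from the raw scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank the scores from 1 to n; tied scores share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson applied to the rank order."""
    return pearson(ranks(x), ranks(y))


# Hypothetical small sample: nine scores, the last y value is an outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 4, 5, 4, 6, 7, 8, 9, 40]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}")
```

When the two coefficients tell the same story, either can be reported and defended; when they diverge noticeably, as they do with the outlier in this made-up sample, that is a cue to go back and look at the raw data.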
You just have to be able to defend your position. Someone will say, well, why didn’t you do this? Take multiple imputation: when you have missing data you could do multiple imputation, which estimates what the data would have been if the person had reported it. So someone might say, well, why didn’t you do multiple imputation? And then you might say... well, give the right answer. Yeah: because from an ideological standpoint I don’t want to work with estimated data, I want to work with the actual data they gave. The estimated data could be right on target or it could be way off target. So you’re giving up one of two things: you’re either giving up people, because some have missing data, or you’re giving up the authenticity of the data, because it’s estimated. Either way you’re gaining something and losing something. You just have to say, from an ideological standpoint, I feel like it’s better to gain this and lose this than to gain that and lose