Statistics and Linear Algebra 5

Posted 阿难的机器学习计划

1. Finding the row that holds the minimum value of a column in pandas:

  lowest_income_county = income["county"][income["median_income"].idxmin()] # idxmin() returns the index label of the row with the minimum median_income

  high_pop_county = income[income["pop_over_25"] > 500000]

  lowest_income_high_pop_county = high_pop_county["county"][high_pop_county["median_income"].idxmin()] # county with the lowest median income among those with more than 500,000 residents
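To see what idxmin() actually returns, here is a minimal, self-contained sketch on a small made-up DataFrame (the county names and numbers are hypothetical, not the post's income data). It uses .loc to fetch the county by the label that idxmin() returns, which is the same lookup as above written as one indexing step.

```python
import pandas as pd

# Hypothetical stand-in for the income DataFrame used above
income = pd.DataFrame(
    {
        "county": ["A County", "B County", "C County", "D County"],
        "median_income": [52000, 31000, 45000, 28000],
        "pop_over_25": [650000, 120000, 800000, 90000],
    },
    index=[10, 11, 12, 13],  # non-default labels show idxmin() returns a label, not a position
)

# idxmin() returns the index label of the smallest value
min_label = income["median_income"].idxmin()
lowest_income_county = income.loc[min_label, "county"]

# Same lookup restricted to counties with more than 500,000 residents
high_pop_county = income[income["pop_over_25"] > 500000]
lowest_income_high_pop_county = high_pop_county.loc[
    high_pop_county["median_income"].idxmin(), "county"
]

print(min_label, lowest_income_county)   # 13 D County
print(lowest_income_high_pop_county)     # C County
```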

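2. Seeding the random generator: random.seed() fixes the generator's starting state, so the numbers drawn after it are reproducible. Each later call continues from that state, so to get the same sequence again you must call random.seed() with the same value again: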

  import random

  random.seed(20) # set the random seed

  new_sequence = [random.randint(0,10) for _ in range(10)]
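A minimal sketch of this behaviour with the standard random module: continuing without re-seeding gives new numbers, re-seeding with the same value reproduces the earlier draws, and a separate random.Random instance keeps its own state. The variable names are only for illustration.

```python
import random

random.seed(20)
first_run = [random.randint(0, 10) for _ in range(10)]

# Without re-seeding, the next calls continue the same stream (new numbers)
continued = [random.randint(0, 10) for _ in range(10)]

# Re-seeding with the same value restarts the stream from the same state
random.seed(20)
second_run = [random.randint(0, 10) for _ in range(10)]

# A separate Random instance has its own state and does not disturb the module-level stream
rng = random.Random(20)
own_stream = [rng.randint(0, 10) for _ in range(10)]

print(first_run == second_run)   # True
print(own_stream == first_run)   # True: same seed, same algorithm
```

Using a dedicated random.Random instance is handy when other code also draws from the module-level generator, because those draws cannot shift your sequence.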

3. To draw a given number of items at random from a list:

  shopping_sample = random.sample(shopping, 4) # pick 4 distinct items from the list shopping (without replacement)
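random.sample() draws without replacement, so no element appears twice. If repeats should be allowed, random.choices() draws with replacement. A short sketch; the shopping list here is made up for illustration:

```python
import random

# Hypothetical list standing in for the post's `shopping` data
shopping = ["milk", "eggs", "bread", "apples", "rice", "tea", "butter"]

random.seed(1)
# sample(): k items WITHOUT replacement, so no item is picked twice
without_replacement = random.sample(shopping, 4)

# choices(): k items WITH replacement, so the same item may appear more than once
with_replacement = random.choices(shopping, k=4)

print(without_replacement)
print(with_replacement)
```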

4. Roll a die 10 times (values 1 to 6) and plot the results as a histogram with 6 bins:

  import random
  import matplotlib.pyplot as plt

  def roll():
    return random.randint(1, 6) # return a random integer from 1 to 6 (inclusive)

  random.seed(1)
  small_sample = [roll() for _ in range(10)]

  plt.hist(small_sample, 6)
  plt.show()
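plt.hist(small_sample, 6) splits the range between the smallest and largest roll into 6 equal-width bins, so the bars line up one-to-one with the faces only when both 1 and 6 were rolled; passing bins=range(1, 8) gives one bin per face regardless. As a plotting-free sketch, the same tallies can be counted directly with collections.Counter:

```python
import random
from collections import Counter

def roll():
    # One roll of a fair six-sided die: an integer from 1 to 6 inclusive
    return random.randint(1, 6)

random.seed(1)
small_sample = [roll() for _ in range(10)]

# Tally each face; these are the bar heights of a one-bin-per-face histogram
counts = Counter(small_sample)
for face in range(1, 7):
    print(face, counts[face])  # Counter returns 0 for faces that never came up
```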

5. Roll the die 50 times, record the proportion of ones, and repeat this experiment 300 times; the histogram shows how that proportion varies between experiments:

  def probability_of_one(num_trials, num_rolls):
    probabilities = []
    for _ in range(num_trials):
      die_rolls = [roll() for _ in range(num_rolls)]
      one_prob = len([d for d in die_rolls if d == 1]) / num_rolls # proportion of rolls that came up 1
      probabilities.append(one_prob)
    return probabilities

  random.seed(1)
  small_sample = probability_of_one(300, 50)
  plt.hist(small_sample, 20)
  plt.show()
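Why the histogram clusters where it does: each experiment's proportion of ones is a binomial proportion, so its mean should be close to p = 1/6 and its standard deviation close to sqrt(p(1 - p) / n), about 0.053 for n = 50 rolls. This standard result is not from the original post; the sketch below uses only the standard library (so no plot) and compares the simulated proportions with it:

```python
import math
import random
import statistics

def roll():
    # One roll of a fair six-sided die
    return random.randint(1, 6)

def probability_of_one(num_trials, num_rolls):
    # Fraction of ones in each of num_trials experiments of num_rolls rolls
    return [
        sum(1 for _ in range(num_rolls) if roll() == 1) / num_rolls
        for _ in range(num_trials)
    ]

random.seed(1)
proportions = probability_of_one(300, 50)

p = 1 / 6   # probability of rolling a one with a fair die
n = 50      # rolls per experiment
expected_sd = math.sqrt(p * (1 - p) / n)   # about 0.053

print(round(statistics.mean(proportions), 3), round(p, 3))
print(round(statistics.stdev(proportions), 3), round(expected_sd, 3))
```

Increasing num_rolls narrows the histogram in proportion to 1 / sqrt(num_rolls), while increasing num_trials only makes its shape smoother.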

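6. Random sampling gives more representative estimates than taking consecutive blocks of rows. First the non-random approach (means of consecutive blocks), then the random one: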

  mean_median_income = income["median_income"].mean()
  print(mean_median_income)

  def get_sample_mean(start, end):
    return income["median_income"].iloc[start:end].mean() # mean of rows start..end-1, selected by position

  def find_mean_incomes(row_step):
    mean_median_sample_incomes = []
    for i in range(0, income.shape[0], row_step):
      mean_median_sample_incomes.append(get_sample_mean(i, i+row_step)) # mean of consecutive, non-overlapping blocks: rows 0-99, 100-199, 200-299, ...
    return mean_median_sample_incomes

  nonrandom_sample = find_mean_incomes(100)
  plt.hist(nonrandom_sample, 20)
  plt.show()

 

  def select_random_sample(count):
    random_indices = random.sample(range(0, income.shape[0]), count)
    return income.iloc[random_indices]

  random.seed(1)

  random_sample = [select_random_sample(100)["median_income"].mean() for _ in range(1000)] # mean median_income of 100 randomly chosen rows, repeated 1000 times
  plt.hist(random_sample, 20)
  plt.show()
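The difference matters most when rows are ordered, for example grouped by region. A sketch on synthetic, deliberately sorted data (made-up values, not the income data set) shows the effect: means of consecutive blocks spread far more widely than means of random samples, which stay close to the population mean.

```python
import random
import statistics

random.seed(1)
# Synthetic "median incomes", sorted so that neighbouring rows are similar,
# mimicking a table ordered by region (hypothetical data)
values = sorted(random.gauss(50000, 10000) for _ in range(3000))
population_mean = statistics.mean(values)

# Non-random: mean of each consecutive block of 100 rows
block_means = [statistics.mean(values[i:i + 100]) for i in range(0, len(values), 100)]

# Random: mean of 100 rows picked at random, repeated 1000 times
random_means = [statistics.mean(random.sample(values, 100)) for _ in range(1000)]

print(round(population_mean))
print(round(statistics.stdev(block_means)), round(statistics.stdev(random_means)))
```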

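7. To compute something from several columns of each random sample, do the calculation inside the sampling loop. Here, for each of 1000 samples of 100 rows, we take the mean of median_income_hs / median_income_college: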

  def select_random_sample(count): # return count randomly chosen rows of the data set
    random_indices = random.sample(range(0, income.shape[0]), count)
    return income.iloc[random_indices]

  random.seed(1)

  mean_ratios = []
  for i in range(1000): # loop 1000 times
    sample = select_random_sample(100)
    ratio = sample["median_income_hs"] / sample["median_income_college"]
    mean_ratios.append(ratio.mean()) # append the mean of the per-row ratio between the two columns

  plt.hist(mean_ratios,20)
  plt.show()
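pandas can also draw the rows itself: DataFrame.sample(n=..., random_state=...) picks n rows without replacement, and an integer random_state makes each draw reproducible. A sketch of the loop above on a synthetic stand-in for income (made-up values; only the two column names come from the post):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the income data (synthetic values)
rng = np.random.default_rng(1)
college = rng.normal(60000, 8000, size=2000)
income = pd.DataFrame({
    "median_income_college": college,
    "median_income_hs": college * rng.uniform(0.55, 0.75, size=2000),
})

mean_ratios = []
for i in range(1000):
    # 100 rows without replacement; random_state=i makes each sample reproducible
    sample = income.sample(n=100, random_state=i)
    ratio = sample["median_income_hs"] / sample["median_income_college"]
    mean_ratios.append(ratio.mean())

print(round(float(np.mean(mean_ratios)), 3))
```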

8. Statistical significance: a way to judge whether a result is likely to hold for the population rather than to arise by chance:

  significance_value = None

  count = 0
  for i in mean_ratios:
    if i > 0.675: # 0.675 is the ratio observed in another data set, collected after the program
      count += 1
  significance_value = count / len(mean_ratios) # fraction of random-sample mean ratios above 0.675

The result is 0.014: only 1.4% of the random-sample mean ratios are higher than 0.675, the ratio seen in the salary data collected after the program. A ratio that high would rarely occur by chance, which suggests the program was successful.
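The counting loop can also be written as a single vectorized comparison. A minimal sketch with NumPy; empirical_significance is a hypothetical helper name and the ratios are made-up numbers, not results from the post:

```python
import numpy as np

def empirical_significance(sample_stats, observed):
    """Fraction of sampled statistics strictly greater than the observed value."""
    sample_stats = np.asarray(sample_stats, dtype=float)
    return float(np.mean(sample_stats > observed))

# Made-up mean ratios, only to show the arithmetic
mean_ratios = [0.62, 0.64, 0.66, 0.68, 0.70, 0.63, 0.65, 0.61, 0.67, 0.69]
print(empirical_significance(mean_ratios, 0.675))  # 0.3
```

Here 3 of the 10 values exceed 0.675, giving 0.3. A common (if arbitrary) convention treats values below 0.05 as significant, so 0.3 would not be, while the post's 0.014 would.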
