Statistics and Linear Algebra 5
Posted 阿难的机器学习计划
1. Finding the row with the minimum value in pandas:
# idxmin() returns the index label of the smallest value in the column
lowest_income_county = income["county"][income["median_income"].idxmin()]
# Keep only the counties with more than 500,000 residents
high_pop_county = income[income["pop_over_25"] > 500000]
# Among those, find the county with the lowest median income
lowest_income_high_pop_county = high_pop_county["county"][high_pop_county["median_income"].idxmin()]
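As a minimal sketch of the idxmin() lookup, the following uses a small made-up DataFrame (the county names and numbers are hypothetical, not taken from the income data set). It looks up the county with .loc, using the label that idxmin() returns, which is one way to avoid chained indexing.

```python
import pandas as pd

# Hypothetical toy data standing in for the real income table
income = pd.DataFrame({
    "county": ["A", "B", "C", "D"],
    "median_income": [52000, 31000, 47000, 39000],
    "pop_over_25": [800000, 20000, 650000, 900000],
})

# idxmin() returns the index label of the row with the smallest value
lowest_label = income["median_income"].idxmin()
lowest_income_county = income.loc[lowest_label, "county"]

# Filter first, then apply the same lookup to the filtered frame.
# Filtering keeps the original index labels, so .loc still works.
high_pop = income[income["pop_over_25"] > 500000]
lowest_income_high_pop_county = high_pop.loc[high_pop["median_income"].idxmin(), "county"]

print(lowest_income_county)           # B
print(lowest_income_high_pop_county)  # D
```

County B has the lowest income overall, but it is dropped by the population filter, so the second lookup returns D.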
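2. Seeding the random number generator: after random.seed(), the values returned by later random calls form a reproducible sequence. To get the same sequence again, set the same seed again first:
import random
random.seed(20)  # set the random seed
new_sequence = [random.randint(0, 10) for _ in range(10)]  # 10 integers from 0 to 10, inclusive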
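A short sketch of what the seed does and does not guarantee: calls made after one random.seed() keep drawing from the same stream, so two consecutive lists differ from each other, while re-seeding replays the stream from the start. The variable names here are illustrative only.

```python
import random

random.seed(20)
first = [random.randint(0, 10) for _ in range(5)]
second = [random.randint(0, 10) for _ in range(5)]  # continues the same stream

random.seed(20)  # re-seeding restarts the stream from the beginning
replay = [random.randint(0, 10) for _ in range(10)]

# The replayed 10 values are exactly the first 5 followed by the next 5
print(replay == first + second)  # True
```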
3. To select a given number of samples from the data:
shopping_sample = random.sample(shopping, 4)  # pick 4 items from the list shopping, without replacement
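The sketch below illustrates the "without replacement" behaviour of random.sample on a hypothetical shopping list (the items are made up for the example). Each position in the list can be picked at most once, and asking for more items than the list holds raises ValueError.

```python
import random

shopping = ["milk", "eggs", "bread", "rice", "tea", "salt"]  # hypothetical list

random.seed(1)
sample = random.sample(shopping, 4)
print(sample)

# No item is picked twice because sampling is without replacement
no_repeats = len(set(sample)) == len(sample)

# Requesting more items than the population holds is an error
try:
    random.sample(shopping, len(shopping) + 1)
    too_large_failed = False
except ValueError:
    too_large_failed = True
```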
4. Roll a die 10 times (values 1 to 6) and plot the results as a histogram with 6 bins:
import matplotlib.pyplot as plt
def roll():
    return random.randint(1, 6)  # random integer from 1 to 6, inclusive
random.seed(1)
small_sample = [roll() for _ in range(10)]
plt.hist(small_sample, 6)
plt.show()
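The histogram is a picture of how often each face came up. As a quick check without plotting, the same tallies can be computed with collections.Counter; this is a sketch added for illustration, not part of the original exercise.

```python
import random
from collections import Counter

def roll():
    return random.randint(1, 6)  # random integer from 1 to 6, inclusive

random.seed(1)
small_sample = [roll() for _ in range(10)]

# Count how many times each face appears; missing faces count as 0
counts = Counter(small_sample)
for face in range(1, 7):
    print(face, counts[face])
```

With only 10 rolls the counts are usually far from uniform, which is why the histogram looks uneven.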
5. Roll the die 50 times, compute the proportion of ones, and repeat this experiment 300 times:
def probability_of_one(num_trials, num_rolls):
    """Return the observed proportion of ones for each of num_trials trials."""
    probabilities = []
    for i in range(num_trials):
        die_rolls = [roll() for _ in range(num_rolls)]
        one_prob = len([d for d in die_rolls if d == 1]) / num_rolls
        probabilities.append(one_prob)
    return probabilities
random.seed(1)
small_sample = probability_of_one(300, 50)
plt.hist(small_sample, 20)
plt.show()
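To see why the number of rolls per trial matters, the sketch below compares the spread of the estimated probability for short and long trials. The trial sizes (10 and 500 rolls) are arbitrary choices for illustration; the estimates should cluster more tightly around 1/6 as the number of rolls grows.

```python
import random
import statistics

def roll():
    return random.randint(1, 6)

def probability_of_one(num_trials, num_rolls):
    """Observed proportion of ones in each of num_trials trials."""
    return [
        sum(1 for _ in range(num_rolls) if roll() == 1) / num_rolls
        for _ in range(num_trials)
    ]

random.seed(1)
few_rolls = probability_of_one(300, 10)    # 300 trials of 10 rolls each
many_rolls = probability_of_one(300, 500)  # 300 trials of 500 rolls each

spread_small = statistics.stdev(few_rolls)
spread_large = statistics.stdev(many_rolls)
print(spread_small, spread_large)
print(statistics.mean(many_rolls), 1 / 6)
```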
6. Random sampling is preferable to taking consecutive rows as samples:
mean_median_income = income["median_income"].mean()
print(mean_median_income)
def get_sample_mean(start, end):
    return income["median_income"][start:end].mean()
def find_mean_incomes(row_step):
    mean_median_sample_incomes = []
    for i in range(0, income.shape[0], row_step):
        # mean of consecutive blocks of rows: 0-99, 100-199, 200-299, ...
        mean_median_sample_incomes.append(get_sample_mean(i, i + row_step))
    return mean_median_sample_incomes
nonrandom_sample = find_mean_incomes(100)
plt.hist(nonrandom_sample, 20)
plt.show()
def select_random_sample(count):
    random_indices = random.sample(range(0, income.shape[0]), count)
    return income.iloc[random_indices]
random.seed(1)
# mean of 100 randomly chosen rows, repeated 1000 times
random_sample = [select_random_sample(100)["median_income"].mean() for _ in range(1000)]
plt.hist(random_sample, 20)
plt.show()
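A self-contained sketch of the same comparison, using only the standard library. It assumes a hypothetical population of 1000 values stored in sorted order, which exaggerates the effect; the real income table may be ordered differently. Consecutive blocks then give means that depend heavily on where the block starts, while random samples stay close to the population mean.

```python
import random
import statistics

# Hypothetical population: 1000 values stored in sorted order
population = list(range(1000))

# Means of consecutive blocks of 100 rows (0-99, 100-199, ...)
block_means = [
    statistics.mean(population[i:i + 100])
    for i in range(0, len(population), 100)
]

# Means of 1000 random samples of 100 rows each
random.seed(1)
random_means = [
    statistics.mean(random.sample(population, 100))
    for _ in range(1000)
]

print(statistics.mean(population))            # 499.5
print(min(block_means), max(block_means))     # 49.5 949.5
print(statistics.stdev(block_means), statistics.stdev(random_means))
```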
7. To compute a value from several columns of each sample, for example the ratio of two columns:
def select_random_sample(count):  # return "count" randomly chosen rows from the data set
    random_indices = random.sample(range(0, income.shape[0]), count)
    return income.iloc[random_indices]
random.seed(1)
mean_ratios = []
for i in range(1000):  # repeat 1000 times
    sample = select_random_sample(100)
    ratio = sample["median_income_hs"] / sample["median_income_college"]
    mean_ratios.append(ratio.mean())  # mean ratio of the two columns for this sample
plt.hist(mean_ratios, 20)
plt.show()
8. Statistical significance, a way to determine whether a result is valid for a population or could be due to chance:
significance_value = None
count = 0
for i in mean_ratios:
    if i > 0.675:  # 0.675 comes from another dataset
        count += 1
significance_value = count / len(mean_ratios)
# The result is 0.014: only 1.4% of the random-sample mean ratios are higher than
# the 0.675 obtained from the salary data collected after the program. A ratio
# that high is unlikely to occur by chance, which suggests the program worked.
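The counting loop can be wrapped in a small helper. The sketch below is illustrative: fraction_above is a hypothetical name, and the list of ratios is made up so the result is easy to verify by hand.

```python
def fraction_above(values, threshold):
    """Share of values strictly greater than threshold."""
    return sum(1 for v in values if v > threshold) / len(values)

# Hypothetical mean ratios; in the notes these come from 1000 random samples
mean_ratios = [0.61, 0.64, 0.66, 0.68, 0.70]

significance_value = fraction_above(mean_ratios, 0.675)
print(significance_value)  # 0.4, because 2 of the 5 values exceed 0.675
```

The smaller this fraction, the less likely it is that a value as large as the threshold arises from random sampling alone.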