JSON fuzzy matching
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fuzzy Matching Sample Notebook (String to String Relationships)\n",
"Andrew Stearns<br>\n",
"8/5/2018"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### References:\n",
"https://www.neudesic.com/blog/fuzzywuzzy-using-python/<br>\n",
"https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/<br>\n",
"https://www.youtube.com/watch?v=ahn-iyQPgpQ&t=1743s"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Import packages\n",
"import pandas as pd\n",
"from fuzzywuzzy import process\n",
"from fuzzywuzzy import fuzz"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Load Dataset__"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"#Full Dataset\n",
"#cities = pd.read_excel('Fuzzy Matching Dataset.xlsx')\n",
"d = {'State': ['Alabama','Alaska','Arizona','Arkansas','California','Colorado','Connecticut','Delaware','Florida','Georgia','Hawaii','Idaho','Illinois','Indiana','Iowa','Kansas','Kentucky','Louisiana','Maine','Maryland','Massachusetts','Michigan','Minnesota','Mississippi','Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico','New York','North Carolina','North Dakota','Ohio','Oklahoma','Oregon','Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas','Utah','Vermont','Virginia','Washington','West Virginia','Wisconsin','Wyoming'],\n",
"     'Abbreviation': ['AL','AK','AZ','AR','CA','CO','CT','DE','FL','GA','HI','ID','IL','IN','IA','KS','KY','LA','ME','MD','MA','MI','MN','MS','MO','MT','NE','NV','NH','NJ','NM','NY','NC','ND','OH','OK','OR','PA','RI','SC','SD','TN','TX','UT','VT','VA','WA','WV','WI','WY'],\n",
"     'No Vowels': ['lbm','lsk','rzn','rknss','Clfrn','Clrd','Cnnctct','Dlwr','Flrd','Grg','Hw','dh','llns','ndn','w','Knss','Kntcky','Lsn','Mn','Mrylnd','Msschstts','Mchgn','Mnnst','Msssspp','Mssr','Mntn','Nbrsk','Nvd','Nw Hmpshr','Nw Jrsy','Nw Mxc','Nw Yrk','Nrth Crln','Nrth Dkt','h','klhm','rgn','Pnnsylvn','Rhd slnd','Sth Crln','Sth Dkt','Tnnss','Txs','th','Vrmnt','Vrgn','Wshngtn','Wst Vrgn','Wscnsn','Wymng'],\n",
"     'Scrambeled Letters': ['abamaal','lksaaa','aoanriz','sarkaans','ofnaclairi','rooldaoc','tcuninectco','ewladear','lfidroa','geaiorg','iawiah','ohaid','inliolis','dianian','owia','akassn','tkukncey','saniaulio','inema','anydrlma','tssseataumchs','mhcagiin','antisomne','ispissmiisp','smruisio','atnnmao','bkaaesnr','edvnaa','nmhihesraew p','eyjnwsee r','nm xeeoiwc','nr eoykw','l caononhrtari','rtn ohdtaaok','hooi','lhooaakm','engroo','laipysnaenvn','ndlea rhsdoi','sah lcoroiutan','dooh utaatks','nsseentee','xaets','thau','notrmev','vrngiiia','itnowsahgn','gisrivit wane','niscsnwoi','mowigyn']\n",
"    }\n",
"\n",
"cities = pd.DataFrame(data=d)\n",
"\n",
"#Correct name\n",
"cities_correct = cities['State']\n",
"\n",
"cities_abbreviation = cities['Abbreviation']\n",
"cities_NoVowels = cities['No Vowels']\n",
"cities_ScrambeledLetters = cities['Scrambeled Letters']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Define Fuzzy Matching Test Functions__"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"#Ratio match: similarity of the two full strings\n",
"def get_matches_Ratio(query, choices, limit=1):\n",
"    results_Ratio = process.extract(query, choices, scorer=fuzz.ratio, limit=limit)\n",
"    return results_Ratio\n",
"\n",
"#Partial Ratio: the shorter string against its best-matching substring of the longer one\n",
"def get_matches_PartialRatio(query, choices, limit=1):\n",
"    results_PartialRatio = process.extract(query, choices, scorer=fuzz.partial_ratio, limit=limit)\n",
"    return results_PartialRatio\n",
"\n",
"#Token Sort Ratio: tokens are sorted alphabetically before comparing\n",
"def get_matches_TokenSortRatio(query, choices, limit=1):\n",
"    results_TokenSortRatio = process.extract(query, choices, scorer=fuzz.token_sort_ratio, limit=limit)\n",
"    return results_TokenSortRatio\n",
"\n",
"#Token Set Ratio: compares the shared tokens and the leftover tokens as sets\n",
"def get_matches_TokenSetRatio(query, choices, limit=1):\n",
"    results_TokenSetRatio = process.extract(query, choices, scorer=fuzz.token_set_ratio, limit=limit)\n",
"    return results_TokenSetRatio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test and Compare Outputs"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('South Carolina', 83, 39)]\n",
"[('North Carolina', 80, 32)]\n",
"[('South Carolina', 83, 39)]\n",
"[('South Carolina', 83, 39)]\n"
]
}
],
"source": [
"#Test text (some vowels removed)\n",
"test_text = 'sth carlna' #South Carolina\n",
"\n",
"#Compare the result of different string tests\n",
"print(get_matches_Ratio(test_text, cities_correct))\n",
"print(get_matches_PartialRatio(test_text, cities_correct))\n",
"print(get_matches_TokenSortRatio(test_text, cities_correct))\n",
"print(get_matches_TokenSetRatio(test_text, cities_correct))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
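
The notebook above compares four `fuzzywuzzy` scorers on the same query. As a rough illustration of why they can disagree, here is a minimal sketch that approximates two of them with the standard library's `difflib`. It is not fuzzywuzzy's own code: the helper names `simple_ratio`, `token_sort_ratio_sketch`, and `best_match` are hypothetical, the choice list is a three-state subset of the dataset, and scores may differ from fuzzywuzzy's, particularly when `python-Levenshtein` is installed.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Whole-string similarity on a 0-100 scale (difflib-based approximation)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def token_sort_ratio_sketch(a, b):
    """Sort the whitespace-separated tokens first, so word order stops mattering."""
    def sort_tokens(s):
        return " ".join(sorted(s.lower().split()))
    return simple_ratio(sort_tokens(a), sort_tokens(b))


def best_match(query, choices, scorer=simple_ratio):
    """Return the (choice, score) pair with the highest score."""
    return max(((c, scorer(query, c)) for c in choices), key=lambda pair: pair[1])


# Small subset of the states used in the notebook (assumption: enough to illustrate).
choices = ["South Carolina", "North Carolina", "South Dakota"]

# Same query as the notebook's test cell.
print(best_match("sth carlna", choices))

# Word order changes the whole-string score but not the token-sorted score.
print(simple_ratio("Carolina South", "South Carolina"))
print(token_sort_ratio_sketch("Carolina South", "South Carolina"))
```

Passing the scorer as a parameter mirrors how the notebook hands `fuzz.ratio` or `fuzz.token_sort_ratio` to `process.extract`, so swapping strategies takes a one-argument change.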