JSON: Fuzzy Matching

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fuzzy Matching Sample Notebook (String to String Relationships)\n",
    "Andrew Stearns<br>\n",
    "8/5/2018"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### References:\n",
    "https://www.neudesic.com/blog/fuzzywuzzy-using-python/<br>\n",
    "https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/<br>\n",
    "https://www.youtube.com/watch?v=ahn-iyQPgpQ&t=1743s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Import packages\n",
    "import pandas as pd\n",
    "from fuzzywuzzy import process\n",
    "from fuzzywuzzy import fuzz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Load Dataset__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Full Dataset\n",
    "#cities = pd.read_excel('Fuzzy Matching Dataset.xlsx')\n",
    "d={'State':['Alabama','Alaska','Arizona','Arkansas','California','Colorado','Connecticut','Delaware','Florida','Georgia','Hawaii','Idaho','Illinois','Indiana','Iowa','Kansas','Kentucky','Louisiana','Maine','Maryland','Massachusetts','Michigan','Minnesota','Mississippi','Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico','New York','North Carolina','North Dakota','Ohio','Oklahoma','Oregon','Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas','Utah','Vermont','Virginia','Washington','West Virginia','Wisconsin','Wyoming'],\n",
    "  'Abbreviation':['AL','AK','AZ','AR','CA','CO','CT','DE','FL','GA','HI','ID','IL','IN','IA','KS','KY','LA','ME','MD','MA','MI','MN','MS','MO','MT','NE','NV','NH','NJ','NM','NY','NC','ND','OH','OK','OR','PA','RI','SC','SD','TN','TX','UT','VT','VA','WA','WV','WI','WY'],\n",
    "  'No Vowels':['lbm','lsk','rzn','rknss','Clfrn','Clrd','Cnnctct','Dlwr','Flrd','Grg','Hw','dh','llns','ndn','w','Knss','Kntcky','Lsn','Mn','Mrylnd','Msschstts','Mchgn','Mnnst','Msssspp','Mssr','Mntn','Nbrsk','Nvd','Nw Hmpshr','Nw Jrsy','Nw Mxc','Nw Yrk','Nrth Crln','Nrth Dkt','h','klhm','rgn','Pnnsylvn','Rhd slnd','Sth Crln','Sth Dkt','Tnnss','Txs','th','Vrmnt','Vrgn','Wshngtn','Wst Vrgn','Wscnsn','Wymng'],\n",
    "  'Scrambeled Letters':['abamaal','lksaaa','aoanriz','sarkaans','ofnaclairi','rooldaoc','tcuninectco','ewladear','lfidroa','geaiorg','iawiah','ohaid','inliolis','dianian','owia','akassn','tkukncey','saniaulio','inema','anydrlma','tssseataumchs','mhcagiin','antisomne','ispissmiisp','smruisio','atnnmao','bkaaesnr','edvnaa','nmhihesraew p','eyjnwsee r','nm xeeoiwc','nr eoykw','l caononhrtari','rtn ohdtaaok','hooi','lhooaakm','engroo','laipysnaenvn','ndlea rhsdoi','sah lcoroiutan','dooh utaatks','nsseentee','xaets','thau','notrmev','vrngiiia','itnowsahgn','gisrivit wane','niscsnwoi','mowigyn']\n",
    "  }\n",
    "\n",
    "cities = pd.DataFrame(data=d)\n",
    "\n",
    "#Correct name\n",
    "cities_correct = cities['State']\n",
    "\n",
    "cities_abbreviation = cities['Abbreviation']\n",
    "cities_NoVowels = cities['No Vowels']\n",
    "cities_ScrambeledLetters = cities['Scrambeled Letters']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Define Fuzzy Matching Tests Functions__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
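    "#Each helper returns the top `limit` matches from process.extract.\n",
    "#With a pandas Series as choices, each match is a (choice, score, index) tuple.\n",
    "\n",
    "#Ratio: similarity of the two full strings, scored 0-100\n",
    "def get_matches_Ratio(query, choices, limit=1):\n",
    "    results_Ratio = process.extract(query, choices, scorer=fuzz.ratio, limit=limit)\n",
    "    return results_Ratio\n",
    "\n",
    "#Partial Ratio: scores the shorter string against the best-matching substring of the longer one\n",
    "def get_matches_PartialRatio(query, choices, limit=1):\n",
    "    results_PartialRatio = process.extract(query, choices, scorer=fuzz.partial_ratio, limit=limit)\n",
    "    return results_PartialRatio\n",
    "\n",
    "#Token Sort Ratio: sorts the words of each string before comparing, so word order is ignored\n",
    "def get_matches_TokenSortRatio(query, choices, limit=1):\n",
    "    results_TokenSortRatio = process.extract(query, choices, scorer=fuzz.token_sort_ratio, limit=limit)\n",
    "    return results_TokenSortRatio\n",
    "\n",
    "#Token Set Ratio: compares the shared words and the leftover words, so repeated or extra words matter less\n",
    "def get_matches_TokenSetRatio(query, choices, limit=1):\n",
    "    results_TokenSetRatio = process.extract(query, choices, scorer=fuzz.token_set_ratio, limit=limit)\n",
    "    return results_TokenSetRatio"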
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test and Compare Outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('South Carolina', 83, 39)]\n",
      "[('North Carolina', 80, 32)]\n",
      "[('South Carolina', 83, 39)]\n",
      "[('South Carolina', 83, 39)]\n"
     ]
    }
   ],
   "source": [
    "#Test text: misspelled input with some vowels dropped\n",
    "test_text = 'sth carlna' #Intended match: South Carolina\n",
    "\n",
    "#Compare the result of different string tests\n",
    "print(get_matches_Ratio(test_text, cities_correct))\n",
    "print(get_matches_PartialRatio(test_text, cities_correct))\n",
    "print(get_matches_TokenSortRatio(test_text, cities_correct))\n",
    "print(get_matches_TokenSetRatio(test_text, cities_correct))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
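
The notebook above depends on the `fuzzywuzzy` package. For readers who want to see roughly what two of its scorers do without installing anything, here is a minimal sketch built on the standard library's `difflib.SequenceMatcher`. The names `simple_ratio`, `simple_token_sort_ratio`, and `best_match` are hypothetical helpers written for this illustration. They are not part of fuzzywuzzy, and they only approximate it: the real library also preprocesses strings (for example, stripping punctuation) and can use a faster Levenshtein backend.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two whole strings as an integer from 0 to 100 (case-insensitive)."""
    matcher = SequenceMatcher(None, a.lower(), b.lower())
    return int(round(100 * matcher.ratio()))


def simple_token_sort_ratio(a, b):
    """Sort whitespace-separated words before comparing, so word order is ignored."""
    def sort_tokens(text):
        return " ".join(sorted(text.lower().split()))
    return simple_ratio(sort_tokens(a), sort_tokens(b))


def best_match(query, choices, scorer=simple_ratio):
    """Return the (choice, score) pair with the highest score."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return max(scored, key=lambda pair: pair[1])


# A small subset of the notebook's state names, used here only for illustration.
states = ["North Carolina", "South Carolina", "South Dakota", "California"]

print(best_match("sth carlna", states))
print(simple_token_sort_ratio("York New", "new york"))
```

On this small list the misspelled query should resolve to South Carolina, as in the notebook's recorded output, and the token-sort variant treats "York New" and "new york" as identical because both reduce to the same sorted word sequence.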
