如何自动将csv转换为pandas?
Posted
技术标签:
【中文标题】如何自动将csv转换为pandas?【英文标题】:How to automatically convert csv to pandas? 【发布时间】:2021-11-24 16:07:56 【问题描述】:我在将 csv 文件加载到 pandas Dataframe 时遇到问题。问题是 csv 是从 json 文件转换而来的,这些文件在一些记录中包含嵌套的 json。通常使用 pd.read_csv 读取 csv 文件后,一切都是字符串,即使是这些嵌套的 json(字典)。我想将它们作为字典,然后我可以将两列转换为数据框。这是csv的示例。我希望“LoadNodes”和“LoadMembers”作为具有该行的 SubProjectId 的数据框。
pd 的可视化示例:
ProjectId ModifiedBy UpdatedOn Nodes Members Pipes AirDucts CableTrays Objects3D LoadNodes LoadMembers LoadCases LoadObject3Ds GeometryObjects PipeRings Results-Id Results-Description Results-Name Results-AppData-Version Results-AppData-Application Results-AppData-ApplicationVersion Results-AppData-Name Results-AppData-ModelClientId Results-AppData-GlobalOrientation Results-Statistics-DurationTime Results-Statistics-LoadingDataDuration Results-Statistics-CalculationDuration Results-Statistics-LoadingResultsDuration Results-Statistics-IsCalculated Results-Statistics-AreExceptions Results-Statistics-CalculatorVersion Results-Combination Results-UlsResults Results-SlsResults Results-Exceptions Results-Utilization Results-Spacing Results-Buckling Results-LoadCases Results
SubProjectId
32212 10949 antonio_melicano@cameco.com 2021-04-22T23:02:18.073 ['Id': 1, 'Description': None, 'Name': 'Node ... ['Id': 1, 'Description': None, 'Name': 'Bar 1... [] [] [] [] [] ['Id': 1, 'Description': '', 'Name': 'Point',... ['Id': 255, 'Description': None, 'Name': 'Liv... NaN ['Id': 7404, 'Description': None, 'Name': Non... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
链接到 csv 文件:https://drive.google.com/file/d/1IyONYdnIS44avtxXMLZN7ju8r1i2bq7r/view?usp=sharing
你知道怎么做吗?
这里是csv文件的标题行和第一个数据行:
,ProjectId,SubProjectId,ModifiedBy,UpdatedOn,Nodes,Members,Pipes,AirDucts,CableTrays,Objects3D,LoadNodes,LoadMembers,LoadCases,LoadObject3Ds,GeometryObjects,PipeRings,Results-Id,Results-Description,Results-Name,Results-AppData-Version,Results-AppData-Application,Results-AppData-ApplicationVersion,Results-AppData-Name,Results-AppData-ModelClientId,Results-AppData-GlobalOrientation,Results-Statistics-DurationTime,Results-Statistics-LoadingDataDuration,Results-Statistics-CalculationDuration,Results-Statistics-LoadingResultsDuration,Results-Statistics-IsCalculated,Results-Statistics-AreExceptions,Results-Statistics-CalculatorVersion,Results-Combination,Results-UlsResults,Results-SlsResults,Results-Exceptions,Results-Utilization,Results-Spacing,Results-Buckling,Results-LoadCases,Results
0,10949,32212,antonio_melicano@cameco.com,2021-04-22T23:02:18.073,"['Id': 1, 'Description': None, 'Name': 'Node 1', 'No': 1, 'Position': 'X': -0.48595, 'Y': 0.0, 'Z': 0.56283, 'MemberIds': [1], 'Details': 'SelectedId': 339738627, 'ConnectorId': 547, 'ConnectorVariantId': 2099, 'AnchorId': None, 'AnchorVariantId': None, 'BuildingMaterialId': 1, 'Rotation': 0.0, 'IsHinge': False, 'IsAutoGenerated': False, 'PushButtonId': None, 'UserSelection': False, 'IsBracket': False, 'NodeOffset': 'X': 1.407575607292877e-06, 'Y': -1.1102230246251565e-16, 'Z': 0.06904959771156305, 'ArmDetails': ['ArmId': 1, 'RowHoleId': -1], 'Connectors': ['Id': 547, 'Name': 'MT-B-GXL S3 OC', 'VariantId': 2099, 'ArticleNumber': 2272108, 'VariantName': 'MT-B-GXL S3 OC (C_SW_MT-90_web)', 'IsBase': True, 'ArmDetails': ['Id': 1, 'MemberId': 1, 'BorderBeam': True, 'Dimension': 0.0, 'Resistances': None, 'ReleaseId': 0, 'ReleaseAlternativeId': 1, 'ConnectorId': 547, 'ConnectorName': None, 'Transformation': None, 'OffsetModels': None], 'Transformation': None, 'SelectedId': 339738627, 'Anchor': None, 'LogicVerifications': [], 'Weight': 0.0], 'FasteningMethod': 0, 'BeamClampingType': 5, 'NodeSupportId': 1, 'GeometryObjectId': 7405, 'Type': 0, 'Members': [], 'NodeVirtuals': [], 'PipeRing': None, 'Id': 2, 'Description': None, 'Name': 'Node 2', 'No': 2, 'Position': 'X': -0.48595, 'Y': 0.0, 'Z': -0.6, 'MemberIds': [1, 2], 'Details': 'SelectedId': -233701373, 'ConnectorId': 388, 'ConnectorVariantId': 2063, 'AnchorId': None, 'AnchorVariantId': None, 'BuildingMaterialId': 2, 'Rotation': 0.0, 'IsHinge': False, 'IsAutoGenerated': False, 'PushButtonId': None, 'UserSelection': False, 'IsBracket': False, 'NodeOffset': 'X': 0.0, 'Y': 0.0, 'Z': 0.0, 'ArmDetails': [], 'Connectors': ['Id': 388, 'Name': 'MT-C-GLP T OC', 'VariantId': 2063, 'ArticleNumber': 2272075, 'VariantName': 'MT-C-GLP T OC (C_D)', 'IsBase': False, 'ArmDetails': ['Id': 1, 'MemberId': 1, 'BorderBeam': False, 'Dimension': 0.0, 'Resistances': None, 'ReleaseId': 0, 'ReleaseAlternativeId': None, 'ConnectorId': 388, 'ConnectorName': None, 'Transformation': None, 'OffsetModels': None, 'Id': 2, 'MemberId': 2, 'BorderBeam': True, 'Dimension': 0.0, 'Resistances': None, 'ReleaseId': 0, 'ReleaseAlternativeId': None, 'ConnectorId': 388, 'ConnectorName': None, 'Transformation': None, 'OffsetModels': None], 'Transformation': None, 'SelectedId': -233701373, 'Anchor': None, 'LogicVerifications': [], 'Weight': 0.0], 'FasteningMethod': 0, 'BeamClampingType': None, 'NodeSupportId': None, 'GeometryObjectId': None, 'Type': 0, 'Members': [], 'NodeVirtuals': [], 'PipeRing': None, 'Id': 4, 'Description': None, 'Name': 'Node 4', 'No': 3, 'Position': 'X': 1.14756, 'Y': 0.0, 'Z': -0.6, 'MemberIds': [2, 3], 'Details': 'SelectedId': 907149315, 'ConnectorId': 388, 'ConnectorVariantId': 2063, 'AnchorId': None, 'AnchorVariantId': None, 'BuildingMaterialId': 2, 'Rotation': 0.0, 'IsHinge': False, 'IsAutoGenerated': False, 'PushButtonId': None, 'UserSelection': False, 'IsBracket': False, 'NodeOffset': 'X': 0.0, 'Y': 0.0, 'Z': 0.0, 'ArmDetails': [], 'Connectors': ['Id': 388, 'Name': 'MT-C-GLP T OC', 'VariantId': 2063, 'ArticleNumber': 2272075, 'VariantName': 'MT-C-GLP T OC (C_D)', 'IsBase': False, 'ArmDetails': ['Id': 1, 'MemberId': 2, 'BorderBeam': False, 'Dimension': 0.0, 'Resistances': None, 'ReleaseId': 0, 'ReleaseAlternativeId': None, 'ConnectorId': 388, 'ConnectorName': None, 'Transformation': None, 'OffsetModels': None, 'Id': 2, 'MemberId': 3, 'BorderBeam': True, 'Dimension': 0.0, 'Resistances': None, 'ReleaseId': 0, 'ReleaseAlternativeId': None, 'ConnectorId': 388, 'ConnectorName': None, 'Transformation': None, 'OffsetModels': None], 'Transformation': None, 'SelectedId': 907149315, 'Anchor': None, 'LogicVerifications': [], 'Weight': 0.0], 'FasteningMethod': 0, 'BeamClampingType': None, 'NodeSupportId': None, 'GeometryObjectId': None, 'Type': 0, 'Members': [], 'NodeVirtuals': [], 'PipeRing': None, 'Id': 9, 'Description': None, 'Name': 'Node 9', 'No': 4, 'Position': 'X': 1.14756, 'Y': 0.0, 'Z': 0.8, 'MemberIds': [3], 'Details': 'SelectedId': -1602224125, 'ConnectorId': 486, 'ConnectorVariantId': 1926, 'AnchorId': 146, 'AnchorVariantId': 95, 'BuildingMaterialId': 2, 'Rotation': 0.0, 'IsHinge': True, 'IsAutoGenerated': False, 'PushButtonId': None, 'UserSelection': True, 'IsBracket': False, 'NodeOffset': 'X': 0.0, 'Y': 0.0, 'Z': 0.0, 'ArmDetails': ['ArmId': 1, 'RowHoleId': -1], 'Connectors': ['Id': 486, 'Name': 'MT-C-GL OC', 'VariantId': 1926, 'ArticleNumber': 2272066, 'VariantName': 'MT-C-GL OC (C_2)', 'IsBase': True, 'ArmDetails': ['Id': 1, 'MemberId': 3, 'BorderBeam': True, 'Dimension': 0.0, 'Resistances': None, 'ReleaseId': 0, 'ReleaseAlternativeId': 1, 'ConnectorId': 486, 'ConnectorName': None, 'Transformation': None, 'OffsetModels': None], 'Transformation': None, 'SelectedId': -1602224125, 'Anchor': None, 'LogicVerifications': [], 'Weight': 0.0], 'FasteningMethod': 1, 'BeamClampingType': None, 'NodeSupportId': 1, 'GeometryObjectId': 7404, 'Type': 0, 'Members': [], 'NodeVirtuals': [], 'PipeRing': None]","['Id': 1, 'Description': None, 'Name': 'Bar 1', 'No': 1, 'Gamma': 0.0, 'NodeStartId': 1, 'NodeEndId': 2, 'CrossSectionId': 405, 'Details': 'HasSupport': False, 'BuildingMaterialId': 0, 'Type': None, 'AnchorLocation': '', 'Distance': 0.0, 'IsBracket': False, 'HingedStart': 'IsHinged': False, 'Code': 2, 'HingedEnd': 'IsHinged': False, 'Code': 0, 'CrossOverDir': None, 'ChannelSupportSingle': False, 'MediaInterface': 'IsActive': False, 'MemberId': 0, 'StartingOffset': 0.0, 'Length': 0.0, 'Details': 'RowHoleId': None, 'Normal': None, 'SelectedId': None, 'ConnectorId': None, 'ConnectorVariantId': None, 'Connectors': [], 'Static': 'IsStatic': False, 'NodeId': None, 'IsSeismicArm': False, 'NodeIds': [], 'DivisionId': None, 'ReleaseStartId': None, 'ReleaseEndId': None, 'EccentricityId': None, 'Type': 0, 'Buckling': 'IsActivate': True, 'BucklingLengthCoeffY': 2.0, 'BucklingLengthCoeffZ': 2.0, 'LtbLengthCoeff': 2.0, 'NonLinearId': None, 'Nodes': [], 'Length': 0.0, 'Id': 2, 'Description': None, 'Name': 'Bar 2', 'No': 2, 'Gamma': 0.0, 'NodeStartId': 2, 'NodeEndId': 4, 'CrossSectionId': 405, 'Details': 'HasSupport': False, 'BuildingMaterialId': 0, 'Type': None, 'AnchorLocation': '', 'Distance': 0.0, 'IsBracket': False, 'HingedStart': 'IsHinged': False, 'Code': 0, 'HingedEnd': 'IsHinged': False, 'Code': 0, 'CrossOverDir': None, 'ChannelSupportSingle': False, 'MediaInterface': 'IsActive': False, 'MemberId': 0, 'StartingOffset': 0.0, 'Length': 0.0, 'Details': 'RowHoleId': None, 'Normal': None, 'SelectedId': None, 'ConnectorId': None, 'ConnectorVariantId': None, 'Connectors': [], 'Static': 'IsStatic': False, 'NodeId': None, 'IsSeismicArm': False, 'NodeIds': [], 'DivisionId': None, 'ReleaseStartId': None, 'ReleaseEndId': None, 'EccentricityId': None, 'Type': 0, 'Buckling': 'IsActivate': True, 'BucklingLengthCoeffY': 2.0, 'BucklingLengthCoeffZ': 2.0, 'LtbLengthCoeff': 2.0, 'NonLinearId': None, 'Nodes': [], 'Length': 0.0, 'Id': 3, 'Description': None, 'Name': 'Bar 3', 'No': 3, 'Gamma': 0.0, 'NodeStartId': 4, 'NodeEndId': 9, 'CrossSectionId': 405, 'Details': 'HasSupport': False, 'BuildingMaterialId': 0, 'Type': None, 'AnchorLocation': '', 'Distance': 0.0, 'IsBracket': False, 'HingedStart': 'IsHinged': False, 'Code': 0, 'HingedEnd': 'IsHinged': True, 'Code': 2, 'CrossOverDir': None, 'ChannelSupportSingle': False, 'MediaInterface': 'IsActive': False, 'MemberId': 0, 'StartingOffset': 0.0, 'Length': 0.0, 'Details': 'RowHoleId': None, 'Normal': None, 'SelectedId': None, 'ConnectorId': None, 'ConnectorVariantId': None, 'Connectors': [], 'Static': 'IsStatic': False, 'NodeId': None, 'IsSeismicArm': False, 'NodeIds': [], 'DivisionId': None, 'ReleaseStartId': None, 'ReleaseEndId': 1, 'EccentricityId': None, 'Type': 0, 'Buckling': 'IsActivate': True, 'BucklingLengthCoeffY': 2.0, 'BucklingLengthCoeffZ': 2.0, 'LtbLengthCoeff': 2.0, 'NonLinearId': None, 'Nodes': [], 'Length': 0.0]",[],[],[],[],[],"['Id': 1, 'Description': '', 'Name': 'Point', 'No': 0, 'LoadCaseId': 255, 'Rotation': 'X': 0.0, 'Y': 0.0, 'Z': 0.0, 'LTBSupport': False, 'IsConcetrated': True, 'IsRelative': False, 'Points': ['Forces': 'X': 0.0, 'Y': 0.0, 'Z': -10000.0, 'Moments': 'X': 0.0, 'Y': 0.0, 'Z': 0.0, 'Position': 0.8121197179978898], 'MemberIds': [2], 'IsGlobal': True, 'IsProjected': False, 'Type': 'BarForce', 'DefinitionType': 0, 'Details': 'IsMirrored': False, 'MirroredLoadId': None, 'SeismicLoad': None, 'Members': [], 'Id': 2, 'Description': '', 'Name': 'Point (1)', 'No': 0, 'LoadCaseId': 261, 'Rotation': 'X': -0.0, 'Y': -0.0, 'Z': -0.0, 'LTBSupport': False, 'IsConcetrated': False, 'IsRelative': False, 'Points': ['Forces': 'X': 1000.0, 'Y': -0.0, 'Z': -0.0, 'Moments': 'X': -0.0, 'Y': -0.0, 'Z': -0.0, 'Position': 0.70616, 'Forces': 'X': 1000.0, 'Y': -0.0, 'Z': -0.0, 'Moments': 'X': -0.0, 'Y': -0.0, 'Z': -0.0, 'Position': 1.4], 'MemberIds': [3], 'IsGlobal': True, 'IsProjected': False, 'Type': 'BarForce', 'DefinitionType': 0, 'Details': 'IsMirrored': True, 'MirroredLoadId': 3, 'SeismicLoad': None, 'Members': [], 'Id': 3, 'Description': '', 'Name': 'Point (1) (1)', 'No': 0, 'LoadCaseId': 262, 'Rotation': 'X': -0.0, 'Y': -0.0, 'Z': -0.0, 'LTBSupport': False, 'IsConcetrated': False, 'IsRelative': False, 'Points': ['Forces': 'X': -1000.0, 'Y': -0.0, 'Z': -0.0, 'Moments': 'X': -0.0, 'Y': -0.0, 'Z': -0.0, 'Position': 0.70616, 'Forces': 'X': -1000.0, 'Y': -0.0, 'Z': -0.0, 'Moments': 'X': -0.0, 'Y': -0.0, 'Z': -0.0, 'Position': 1.4], 'MemberIds': [3], 'IsGlobal': True, 'IsProjected': False, 'Type': 'BarForce', 'DefinitionType': 0, 'Details': 'IsMirrored': True, 'MirroredLoadId': 2, 'SeismicLoad': None, 'Members': []]","['Id': 255, 'Description': None, 'Name': 'Live ', 'DesignBasisId': 2, 'LoadCombinationBasisId': 19, 'LoadActionId': 83, 'No': 5, 'DeadWeightDirection': None, 'LoadCaseType': 4, 'LoadActionType': 1, 'AnalysisType': 0, 'LeadingVariableAction': None, 'LifeFactor': 0.0, 'IsConvertedToSeismic': True, 'IsMirrored': False, 'MirroredWithLoadCaseId': None, 'SeismicLoadCase': None, 'MappingCode': None, 'Id': 261, 'Description': None, 'Name': 'Seismic (+x)', 'DesignBasisId': 2, 'LoadCombinationBasisId': 19, 'LoadActionId': 86, 'No': 13, 'DeadWeightDirection': None, 'LoadCaseType': 4, 'LoadActionType': 2, 'AnalysisType': 0, 'LeadingVariableAction': None, 'LifeFactor': 0.0, 'IsConvertedToSeismic': False, 'IsMirrored': True, 'MirroredWithLoadCaseId': 262, 'SeismicLoadCase': '+X', 'MappingCode': None, 'Id': 262, 'Description': None, 'Name': 'Seismic (-x)', 'DesignBasisId': 2, 'LoadCombinationBasisId': 19, 'LoadActionId': 86, 'No': 14, 'DeadWeightDirection': None, 'LoadCaseType': 4, 'LoadActionType': 2, 'AnalysisType': 0, 'LeadingVariableAction': None, 'LifeFactor': 0.0, 'IsConvertedToSeismic': False, 'IsMirrored': True, 'MirroredWithLoadCaseId': 261, 'SeismicLoadCase': '-X', 'MappingCode': None, 'Id': 251, 'Description': None, 'Name': 'Self Weight Support', 'DesignBasisId': 2, 'LoadCombinationBasisId': 19, 'LoadActionId': 82, 'No': 1, 'DeadWeightDirection': 'X': 0.0, 'Y': 0.0, 'Z': -1.0, 'LoadCaseType': 0, 'LoadActionType': 0, 'AnalysisType': 0, 'LeadingVariableAction': None, 'LifeFactor': 0.0, 'IsConvertedToSeismic': True, 'IsMirrored': False, 'MirroredWithLoadCaseId': None, 'SeismicLoadCase': None, 'MappingCode': None]",,"['Id': 7404, 'Description': None, 'Name': None, 'Min': 'X': 0.89756, 'Y': -0.25, 'Z': 0.8, 'Max': 'X': 1.39756, 'Y': 0.25, 'Z': 1.0, 'ElementType': 'Ceiling', 'BuildingMaterialId': 2, 'BoundBox': 'Min': 'X': 0.89756, 'Y': -0.25, 'Z': 0.8, 'Max': 'X': 1.39756, 'Y': 0.25, 'Z': 1.0, 'Size': 'X': 0.4999999999999999, 'Y': 0.5, 'Z': 0.19999999999999996, 'MidPoint': 'X': 1.14756, 'Y': 0.0, 'Z': 0.9, 'Direction': 'X': 0.0, 'Y': 1.0, 'Z': 0.0, 'Steel': None, 'Concrete': 'DrillingDepth': 0.15, 'Grade': 'C20/25', 'Thickness': 0.2, 'Id': 7405, 'Description': None, 'Name': None, 'Min': 'X': -0.6193, 'Y': -0.25, 'Z': 0.56283, 'Max': 'X': -0.3526, 'Y': 0.25, 'Z': 0.70883, 'ElementType': 'SteelElement', 'BuildingMaterialId': 1, 'BoundBox': 'Min': 'X': -0.6193, 'Y': -0.25, 'Z': 0.56283, 'Max': 'X': -0.3526, 'Y': 0.25, 'Z': 0.70883, 'Size': 'X': 0.26669999999999994, 'Y': 0.5, 'Z': 0.1459999999999999, 'MidPoint': 'X': -0.48595, 'Y': 0.0, 'Z': 0.63583, 'Direction': 'X': 0.0, 'Y': 1.0, 'Z': 0.0, 'Steel': 'SteelShapeId': 1721, 'Rotation': 1.5707963267948966, 'IsCustom': False, 'Parameters': 'ShapeCode': 'H', 'H': 0.2667, 'B': 0.146, 'Tfl': 0.0127, 'Tw': 0.0079, 'R': 0.0159, 'MateriaId': 22, 'Concrete': None]",[],,,,,,,,,,,,,,,,,,,,,,,,,
【问题讨论】:
首先,我们不需要一个巨大的 csv 文件的链接,但是您应该在问题本身中添加几行 as text 以便我们可以轻松地尝试加载文件(我已经为你完成了那部分)。但你真正的问题是这个 csv 文件只是垃圾。如果您最初有 json 文件,则应该在构建 csv 文件之前进行处理。它可以在之后完成,但在之前完成它肯定会更容易和更健壮。 【参考方案1】:您可以使用自定义转换器将字典的字符串表示形式加载为字典:
import pandas as pd
df = pd.read_csv('filename.csv', converters='LoadNodes': pd.eval,'LoadMembers': pd.eval)
【讨论】:
以上是关于如何自动将csv转换为pandas?的主要内容,如果未能解决你的问题,请参考以下文章
python - 如何将csv转换为python pandas中的嵌套json?
如何在使用 pandas.read_csv 读取 csv 文件时将 pandas.dataframe 中的元素转换为 np.float?
如何在使用 pandas.read_csv 读取 csv 文件时将 pandas.dataframe 中的元素转换为 np.float?