如何删除一定长度的字符串?
Posted
技术标签:
【中文标题】如何删除一定长度的字符串?【英文标题】:How to remove a string of a certain length? 【发布时间】:2020-11-30 20:07:35 【问题描述】:我有 pandas.core.series.Series 文本。我只想保留长度小于 512 的句子。
0 Lebanon, officially known as the Republic of Lebanon, is a country in Western Asia. It is bordered by Syria to the north and east and Israel to the south, while Cyprus lies west across the Mediterranean Sea. Lebanon's location at the cros-s-roads of the Mediterranean Basin and the Ar*** hinterland has contributed to its rich history and shaped a cultural identity of religious and ethnic diversity At just 10,452 km2 (4,036 mi2), it is the smallest recognized sovereign state on the mainland Asian continent The earliest evidence of civilization in Lebanon dates back more than seven thousand years, predating recorded history.
1 Lebanon was home to the Phoenicians, a maritime culture that flourished for almost three thousand years (c. 3200–539 BC). In 64 BC, the region came under the rule of the Roman Empire, and eventually became one of its leading centers of Christianity. The Mount Lebanon range saw the emergence of a onastic tradition known as the Maronite Church. As the Arab Muslims conquered the region, the Maronites held onto their religion and identity. However, a new religious group, the Druze, established themselves in Mount Lebanon as well, generating a religious divide that has lasted for centuries. During the Crusades, the Maronites re-established contact with the Roman Catholic Church and asserted their communion with Rome. These ties have influenced the region into the modern era.
如果有 len(sentence) > 512,我想删除。所以,输出将是:
0 Lebanon, officially known as the Republic of Lebanon, is a country in Western Asia. It is bordered by Syria to the north and east and Israel to the south, while Cyprus lies west across the Mediterranean Sea.
1 Lebanon was home to the Phoenicians, a maritime culture that flourished for almost three thousand years (c. 3200–539 BC). In 64 BC, the region came under the rule of the Roman Empire, and eventually became one of its leading centers of Christianity. The Mount Lebanon range saw the emergence of a monastic tradition known as the Maronite Church. As the Arab Muslims conquered the region, the Maronites held onto their religion and identity. However, a new religious group, the Druze, established themselves in Mount Lebanon as well, generating a religious divide that has lasted for centuries. During the Crusades, the Maronites re-established contact with the Roman Catholic Church and asserted their communion with Rome. These ties have influenced the region into the modern era.
我可以使用此代码吗?感谢您的帮助。
remove = [x for x in text if len(x) < 512]
我尝试了Python: Pandas filter string data based on its string length 的解决方案,但情况不同。
【问题讨论】:
试一试 【参考方案1】:对split
和join
使用列表推导式:
L = ['.'.join(y for y in x.split('.') if len(y) < 512) for x in s]
s = pd.Series(L, index=s.index)
或者使用Series.apply
的自定义函数:
s = s.apply(lambda x: '.'.join(y for y in x.split('.') if len(y) < 512))
【讨论】:
以上是关于如何删除一定长度的字符串?的主要内容,如果未能解决你的问题,请参考以下文章