Possible duplicate:
How can I remove duplicate lines from a file?
I have a file with duplicate entries that I want to remove. This is what I have tried:
import sys
for line in sys.stdin:
    line = line.rstrip()
    line = line.split()
    idlist = []
    if idlist == []:
        idlist = line[1]
    else:
        idlist.append(line[1])
    print line[0], idlist

for line in sys.stdin:
    line = line.rstrip()
    line = line.split()
    lines_seen = set()
    dup = line[1]
    if dup not in lines_seen:
        lines_seen = dup
    else:
        lines_seen.append(dup)
    print line[0], lines_seen
sys.stdin.close()
The file looks like this:
BLE 1234
BLE 1223
LLE 3456
ELE 1223
BLE 4444
ELE 5555
BLE 4444
The output I want is:
BLE 1234
BLE 1223
LLE 3456
BLE 4444
ELE 5555
Thanks! EDG
import sys

elem1_seen = set()  # first initialize an empty set of seen elem[1]
lines_out = []  # list of "unique" output lines
for line in sys.stdin:  # iterate over input
    elems = line.rstrip().split()  # split line into two elements
    if elems[1] not in elem1_seen:  # if second element not seen before...
        lines_out.append(line)  # append the whole line to output
        elem1_seen.add(elems[1])  # add this second element to the seen set
print lines_out  # print output
import fileinput

ss = '''BLE 1234
BLE 1223
LLE 3456
ELE 1223
BLE 4444
ELE 5555
BLE 4444
'''
# create a sample file to work on
with open('klmp.txt','w') as f:
    f.write(ss)

seen = []
# with inplace=1, standard output is redirected into the file being read
for line in fileinput.input('klmp.txt',inplace=1):
    b = line.split()[1]
    if b not in seen:
        seen.append(b)
        print line.strip()
Searching SO for the word "fileinput", I found:
The main problem is that you keep changing the types of your variables, which causes some confusion:
import sys
for line in sys.stdin:
    line = line.rstrip()  # line is a string
    line = line.split()  # line is a list
    idlist = []  # idlist is a list
    if idlist == []:
        idlist = line[1]  # idlist is a string
    else:
        idlist.append(line[1])  # and now?
    print line[0], idlist
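One way to avoid the type switching is to give each variable a single type and to create the "seen" collection once, before the loop, so it is not reset on every line. The following is only a minimal sketch, written for Python 3 (the snippets above use Python 2 print statements). The function name `unique_by_second_field` and the decision to skip lines with fewer than two fields are assumptions made for illustration, not something from the original post.

```python
def unique_by_second_field(lines):
    """Return lines whose second whitespace-separated field was not seen earlier."""
    seen_ids = set()  # created once, before the loop; always a set
    kept = []         # always a list of output strings
    for line in lines:
        fields = line.split()  # a new name, so `line` stays a string
        if len(fields) < 2:
            continue  # assumption: ignore blank or malformed lines
        if fields[1] not in seen_ids:
            seen_ids.add(fields[1])
            kept.append(line.rstrip("\n"))
    return kept


# Sample data copied from the question
sample = ["BLE 1234\n", "BLE 1223\n", "LLE 3456\n", "ELE 1223\n",
          "BLE 4444\n", "ELE 5555\n", "BLE 4444\n"]

for out_line in unique_by_second_field(sample):
    print(out_line)
```

To read from standard input as in the question, pass `sys.stdin` (after `import sys`) instead of `sample`; any iterable of lines should work.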