Deep learning in Python using TensorFlow

I am currently loading the training data set from a CSV file. Each CSV file is created before TensorFlow starts, but each file is larger than 100 GB, which causes problems with disk space and data transfer. So instead of creating the CSV file in advance, I'm looking for a way to build a list containing the same information inside the program and convert that list into a training dataset.

Below is part of the code I am currently using.

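import os
import tensorflow as tf
os.environ['CUDA_VISIBLE_DEVICES'] = "0"  # Specified because I am using a PC with multiple GPUs.

outfn = "stackoverflow.csv"
# The reader call was cut off in the post; tf.data.experimental.CsvDataset is assumed here.
dataset = tf.data.experimental.CsvDataset(outfn, [tf.string, tf.string])
# The following is for confirmation.
for element in dataset.as_numpy_iterator():
    print(element)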

stackoverflow.csv contains two columns of strings separated by commas.

aaaaaaaaaaa2, ccccccccccc1

(In the real data, each string has about 1000 characters and the file has between 1 million and 100 million lines.)

I'm looking for a way to build the information in stackoverflow.csv directly on my PC, without creating the file above, and store it in dataset.

If anyone understands, I would appreciate it if you could let me know.
Thank you for your cooperation.

2022-09-30 13:50

1 Answer

Sorry, I solved it myself.
I used from_tensor_slices.

For the record, the approach is described below.

The solution was to go List->pandas.dataframe->tensorflow.dataset instead of List->tensorflow.dataset.

For example, the pandas.dataframe->tensorflow.dataset part is

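import pandas as pd
import tensorflow as tf
# header=None gives integer column labels, so name the two columns explicitly.
df = pd.read_csv("stackoverflow.csv", header=None, names=["input", "target"])
input = df["input"]
target = df["target"]
dataset = tf.data.Dataset.from_tensor_slices((input.values, target.values))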
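
The snippet above still reads the CSV, so here is a minimal sketch of the full List->pandas.dataframe->tensorflow.dataset route with no file involved. The `rows` list and its contents are hypothetical stand-ins for whatever the program generates, and the column names "input" and "target" are simply reused from the snippet above. It may need adjusting for other TensorFlow or pandas versions.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical in-memory data standing in for the rows of stackoverflow.csv.
rows = [
    ("aaaaaaaaaaa1", "ccccccccccc1"),
    ("aaaaaaaaaaa2", "ccccccccccc2"),
    ("aaaaaaaaaaa3", "ccccccccccc3"),
]

# List -> pandas.DataFrame
df = pd.DataFrame(rows, columns=["input", "target"])

# pandas.DataFrame -> tf.data.Dataset (one element per row, as a pair of strings)
dataset = tf.data.Dataset.from_tensor_slices(
    (df["input"].to_numpy(), df["target"].to_numpy())
)

# Confirmation: each element is a pair of byte strings.
for inp, tgt in dataset.as_numpy_iterator():
    print(inp, tgt)
```

Note that from_tensor_slices keeps the whole data set in memory, so this only works when the list itself fits in RAM.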

2022-09-30 13:50


