COMP3425 Data Mining

Posted 2023-05-08 guhgf18

COMP3425 and COMP8410 Data Mining S1 2023
Assignment 2: Description of Data
Data and Metadata

The data supplied for the assignment arises from The Australian Data Archive’s ANU Poll
Dataverse [1]. As a student of the course, you are assumed to accept the Terms and Conditions
of Use reproduced below. Please read them carefully. The custodian of the data has requested
you delete your data at the end of the course.

In particular the data captures the results of a survey poll conducted in 2019 on the topic of
attitudes and behaviours towards Universities, amongst other things. You can find a complete
description of the purpose of the poll and coding of the data (metadata) and also a descriptive
summary of the poll results here:
https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/GOVGBB
The data is provided to you for the assignment in two forms. The first is the original dataset
as downloaded from the ADA called 2.ANUPoll2019RoleOfGovernment_CSV_01445.csv, in
comma-separated-values format. This data is described by the metadata in 1.
ADA.CODEBOOK.01445.xslx and the corresponding question text in 1.
ADA.QUESTIONNAIRE.01445.pdf

The second is a form derived from the original, pre-processed for the COMP3425 data mining
assignment, in comma-separated-values format called 3425_data.csv. Below you will find a
description of the pre-processing undertaken and this, in addition to the original metadata,
will be needed to assist your understanding of the data.

If you are a COMP3425 (undergraduate) student, you must work with the pre-processed
dataset 3425_data.csv.

If you are a COMP8410 (postgraduate) student, you may use either the original or the
pre-processed data, or both. The original will give you more opportunity to show off your
technical skills and creativity, while the pre-processed one is more constrained but may save
time, requiring you to spend less effort understanding the data and helping to avoid some data
errors. The same rubric will be used for marking in both cases, but the original dataset provides
an extended learning experience and a better opportunity for higher marks. Even if you use the
original data, you may find it useful to observe the pre-processing that has been undertaken to
produce 3425_data.csv to seed ideas or to solve problems you encounter.

Pre-processing applied with Excel to derive 3425_data.csv

• Only a selection of the original attributes has been retained.
• The Q15_safe_gambler column has been added, based on the respondent’s answers to
questions Q15a-i, which have answers that range from almost always to never.
Q15_safe_gambler is a normalized number in the range [0,1] that shows the rarity of
the various problem gambling behaviours raised in Q15a-i. Refused and Don’t know
options are replaced by the midpoint value for each question, and the field is null
when the Q15 questions were not asked.
Q15_safe_gambler = IF(NOT(Q14=" "),((IF(OR(Q15a=-98, Q15a=-99),2.5, Q15a)
+(IF(OR(Q15b=-98, Q15b=-99),2.5, Q15b))
+(IF(OR(Q15c=-98, Q15c=-99),2.5, Q15c))
+(IF(OR(Q15d=-98, Q15d=-99),2.5, Q15d))
+(IF(OR(Q15e=-98, Q15e=-99),2.5, Q15e))
+(IF(OR(Q15f=-98, Q15f=-99),2.5, Q15f))
+(IF(OR(Q15g=-98, Q15g=-99),2.5, Q15g))
+(IF(OR(Q15h=-98, Q15h=-99),2.5, Q15h))
+(IF(OR(Q15i=-98, Q15i=-99),2.5, Q15i)))-9)/27,"")

• The binary undecided voter column was added based on the given answer to Q4, and
is TRUE when the answer to Q4 is one of -98, -99, 95, 97 and FALSE otherwise. That
is, IF(OR(OR(OR(Q4=-99, Q4=-98),Q4=95), Q4=97),TRUE,FALSE).
• For two categorical columns, nominal Q2 and nominal StateMap, double quotation
marks were added to all non-empty cells. For the rest of the categorical columns,
you can use the same approach to help Rattle recognise categorical data in a column
if necessary. For example, for nominal StateMap, the formula CONCATENATE("""",
StateMap, """") is used. For nominal Q2, the formula CONCATENATE("""", TEXT(Q2,
"0"), """") is used.

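For readers working outside Excel, the three derived columns described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used to build 3425_data.csv: the function names are hypothetical, the 1 to 4 answer scale is inferred from the midpoint 2.5 and the constants 9 and 27 in the formula, and a row is assumed to be a dictionary keyed by column name (for example, as produced by `csv.DictReader`).

```python
# Illustrative sketch only; codes and column names come from the formulas above.
REFUSED_OR_DONT_KNOW = {-98, -99}
MIDPOINT = 2.5  # assumed midpoint of a 1..4 answer scale
UNDECIDED_CODES = {-99, -98, 95, 97}
Q15_ITEMS = ["Q15" + letter for letter in "abcdefghi"]


def safe_gambler(row):
    """Return the normalised score in [0, 1], or None when Q15 was not asked."""
    # The spreadsheet tests Q14 against a blank cell; a blank or missing Q14
    # is treated here as "Q15 not asked".
    q14 = row.get("Q14")
    if q14 is None or str(q14).strip() == "":
        return None
    total = 0.0
    for item in Q15_ITEMS:
        value = float(row[item])
        # Refused / Don't know answers are replaced by the midpoint.
        total += MIDPOINT if value in REFUSED_OR_DONT_KNOW else value
    # Nine answers of at least 1 each: subtract 9, then divide by the span 27.
    return (total - 9) / 27


def undecided_voter(q4):
    """Return True when the Q4 answer is one of the undecided codes."""
    if q4 is None or str(q4).strip() == "":
        return False
    return int(float(q4)) in UNDECIDED_CODES


def quote_nominal(value, as_integer=False):
    """Wrap a non-empty cell in double quotes; leave empty cells empty."""
    if value is None or str(value).strip() == "":
        return ""
    # as_integer mimics TEXT(Q2, "0") for codes that are whole numbers.
    text = str(int(float(value))) if as_integer else str(value)
    return '"' + text + '"'
```

Because every value is passed through `float` or `str`, the same functions work whether the cells arrive as strings or as numbers.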
References

[1] Biddle, Nicholas; and Reddy, Karuna, 2019, “ANU Poll 2019: Role of the University”,
doi:10.26193/GOVGBB

Terms and Conditions of Use

This data has been distributed exclusively for students of COMP3425 and COMP8410 S1
2023 only. Data must be destroyed at the end of the course but may be re-obtained by
request to the Australian Data Archive.

Furthermore, from https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/GOVGBB,

I acknowledge that:

1. Use of the material is restricted to use for analytical purposes and that this means that I can only
use the material to produce information of an analytical nature.

Examples of such uses are: (a) the manipulation of data to produce means, correlations or other
descriptive summary measures; (b) the estimation of population characteristics from sample data;
(c) the use of data as input to mathematical models and for other types of analyses (e.g. factor
analysis); and (d) to provide graphical and pictorial representation of characteristics of the
population or sub-sets of the population.

2. The material is not to be used for any non-analytical purposes, or for commercial or financial gain,
without the express written permission of the Australian Data Archive.
Examples of non-analytical purposes are: (a) transmitting or allowing access to the data in part or
whole to any other person / Department / Organisation not a party to this undertaking; and (b)
attempting to match unit record data in whole or in part with any other information for the
purposes of attempting to identify individuals.

3. Outputs (such as statistics, tables and graphs) obtained from analysis of these data may be further
disseminated provided that I:
(a) acknowledge both the original depositors and the Australian Data Archive; (b) acknowledge
another archive where the data file is made available through the Australian Data Archive by
another archive; and (c) declare that those who carried out the original analysis and collection of the
data bear no responsibility for the further analysis or interpretation of it.

4. Use of the material is solely at my risk and I indemnify the Australian Data Archive and its host
institution, The Australian National University.

5. The Australian Data Archive and its host institution, The Australian National University, shall not
be held liable for any breach of this undertaking.

6. The Australian Data Archive and its host institution, The Australian National University, shall not
be held responsible for the accuracy and completeness of the material supplied.


The server cannot locate the java:comp/env/jdbc/my_db data source ... Name comp/env/jdbc not found in context "java:"

Posted: 2013-08-02 09:36:12

Question:

I have a Java EE application with many modules. I am trying to make indirect JNDI lookups work.

I followed these steps:

ejb-jar.xml: in each module I defined an enterprise bean. All DAOs in the module inherit from this DAO (MyDataAccessObject).

<enterprise-beans>
        <session>
            <ejb-name>DataAccessObject</ejb-name>
            <ejb-class>com.mycompany.dao.MyDataAccessObject</ejb-class>
            <session-type>Stateless</session-type>
            <transaction-type>Container</transaction-type>
            <resource-ref id="MyRef">
                <description />
                <res-ref-name>jdbc/My_db</res-ref-name>
                <res-type>javax.sql.DataSource</res-type>
                <res-auth>Container</res-auth>
                <res-sharing-scope>Shareable</res-sharing-scope>
            </resource-ref>
        </session>
</enterprise-beans>

persistence.xml: in each persistence.xml (one per module) I define:

<jta-data-source>java:comp/env/jdbc/My_db</jta-data-source>

ibm-application-bnd.xml

 <resRefBindings xmi:id="MyRef" jndiName="jdbc/My_db">

    ?????? Should I use resRefBindings. If yes, how? 
</resRefBindings>

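What should I add to this file so that WebSphere knows about java:comp/env/jdbc/My_db?

Is what I have done so far sufficient and correct?

Currently, when I try to start the application, I get this error: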

The server cannot locate the java:comp/env/jdbc/my_db data source for the My_Modul persistence unit because it has encountered the following exception:
 Name comp/env/jdbc not found in context "java:".

Edit: I also found this error in the log file:

Caused by: <openjpa-2.1.2-SNAPSHOT-r422266:1384519 fatal user error> org.apache.openjpa.persistence.ArgumentException: A JDBC Driver or DataSource class name must be specified in the ConnectionDriverName property.
    at org.apache.openjpa.jdbc.schema.DataSourceFactory.newDataSource(DataSourceFactory.java:76)
    at org.apache.openjpa.jdbc.conf.JDBCConfigurationImpl.createConnectionFactory(JDBCConfigurationImpl.java:844)
    at org.apache.openjpa.jdbc.conf.JDBCConfigurationImpl.getDBDictionaryInstance(JDBCConfigurationImpl.java:602)
    at org.apache.openjpa.jdbc.meta.MappingRepository.endConfiguration(MappingRepository.java:1510)

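Comments:

Have you tried using just jdbc/My_db in jta-data-source? I admit I have not done exactly this, but I believe that however you map it, the way you reference it is through the res-ref-name. (We always map our references through the WebSphere console at deployment time rather than by creating binding files.)

WebSphere does support java:comp names in persistence.xml, although I don't believe that is defined in the specification. Please show the full exception (for "Currently I get this error at deployment"); it may give additional hints about why the lookup failed.

@bkail I added an error from the log file. Does it tell you anything?

@dbreaux Can you provide a sample of how it works?

@Kayser Yes, that error message usually means the DataSource lookup failed; the error should begin with CWWJP0013E.

Answer 1:

You are using XMI bindings (resRefBindings), which are supported in WAS 7.0 but are considered deprecated. XML bindings are recommended instead. There should be a file named ibm-ejb-jar-bnd.xml in META-INF with the following content: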

<ejb-jar-bnd xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://websphere.ibm.com/xml/ns/javaee"
    xsi:schemaLocation="http://websphere.ibm.com/xml/ns/javaee http://websphere.ibm.com/xml/ns/javaee/ibm-ejb-jar-bnd_1_0.xsd" version="1.0">
  <session name="DataAccessObject">
    <resource-ref name="**datasource_ref_in_your_EJB**" binding-name="jdbc/My_db"/>
  </session>
</ejb-jar-bnd>

I also assume that you have already configured a data source in WAS with the JNDI name "jdbc/My_db".
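One way to sanity-check a binding like the one above is to compare names across the two descriptors. The sketch below is in Python rather than Java and is only illustrative. It assumes that the `name` attribute of each `resource-ref` in ibm-ejb-jar-bnd.xml must match a `res-ref-name` declared for the same bean in ejb-jar.xml (here `jdbc/My_db`), which is how the placeholder in the answer reads. The function names are hypothetical and are not part of any WebSphere tooling.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace, e.g. '{ns}session' -> 'session'."""
    return tag.rsplit("}", 1)[-1]


def declared_refs(ejb_jar_xml):
    """Map each session bean's ejb-name to the set of its res-ref-name values."""
    refs = {}
    for session in ET.fromstring(ejb_jar_xml).iter():
        if local(session.tag) != "session":
            continue
        bean = None
        names = set()
        for el in session.iter():
            if local(el.tag) == "ejb-name":
                bean = (el.text or "").strip()
            elif local(el.tag) == "res-ref-name":
                names.add((el.text or "").strip())
        refs[bean] = names
    return refs


def bound_refs(bnd_xml):
    """Map each session name in the binding file to its bound resource-ref names."""
    refs = {}
    for session in ET.fromstring(bnd_xml).iter():
        if local(session.tag) != "session":
            continue
        refs[session.get("name")] = {
            el.get("name") for el in session.iter() if local(el.tag) == "resource-ref"
        }
    return refs


def unbound(ejb_jar_xml, bnd_xml):
    """Return (bean, res-ref-name) pairs declared in ejb-jar.xml but not bound."""
    bound = bound_refs(bnd_xml)
    return {
        (bean, ref)
        for bean, names in declared_refs(ejb_jar_xml).items()
        for ref in names - bound.get(bean, set())
    }
```

The comparison is case-sensitive on purpose, so a binding written as `jdbc/my_db` against a reference declared as `jdbc/My_db` is reported as unbound.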
